The Evoformer is the core neural network module of AlphaFold2, DeepMind's breakthrough protein structure prediction system. It processes evolutionary and pairwise residue information through a novel dual-track Transformer architecture to predict 3D protein structures with atomic accuracy.

Architecture Overview

The Evoformer maintains two parallel representations: an MSA (Multiple Sequence Alignment) representation of shape (N_seq × N_res × d_msa) capturing evolutionary patterns, and a pair representation of shape (N_res × N_res × d_pair) capturing relationships between all residue pairs.
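As a concrete reference for these two tensors (the example sizes are arbitrary; the feature dimensions 256 and 128 are the AlphaFold2 defaults):

```python
import numpy as np

# Example sizes: 64 aligned sequences, a 100-residue target.
# Feature dims d_msa=256 and d_pair=128 are the AlphaFold2 defaults.
N_seq, N_res, d_msa, d_pair = 64, 100, 256, 128

msa_repr = np.zeros((N_seq, N_res, d_msa))    # evolutionary track
pair_repr = np.zeros((N_res, N_res, d_pair))  # pairwise track
```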

Each Evoformer block consists of several operations that exchange information between these two tracks. The MSA track applies row-wise self-attention (with pair bias), column-wise self-attention, and a transition FFN. The pair track applies triangle multiplicative updates (outgoing and incoming), triangle self-attention (starting and ending node), and a transition FFN. An outer product mean operation transfers information from the MSA representation to the pair representation.
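The wiring of one block can be sketched as follows. The real sub-modules are learned networks; here each is a hypothetical stub returning zeros, so only the data flow and residual structure are shown:

```python
import numpy as np

# Hypothetical stand-ins for the learned sub-modules: each real operation
# is a residual update; these return zeros so only the wiring runs.
def _residual_stub(x, *_):
    return np.zeros_like(x)

def _opm_stub(msa, d_pair):
    n_res = msa.shape[1]
    return np.zeros((n_res, n_res, d_pair))

def evoformer_block(msa, pair):
    # MSA track: row attention (biased by the pair track), column attention, FFN
    msa = msa + _residual_stub(msa, pair)   # row-wise attention with pair bias
    msa = msa + _residual_stub(msa)         # column-wise attention
    msa = msa + _residual_stub(msa)         # MSA transition FFN
    # MSA -> pair communication
    pair = pair + _opm_stub(msa, pair.shape[-1])  # outer product mean
    # Pair track: triangle updates, triangle attention, FFN
    pair = pair + _residual_stub(pair)      # triangle multiplicative (outgoing)
    pair = pair + _residual_stub(pair)      # triangle multiplicative (incoming)
    pair = pair + _residual_stub(pair)      # triangle attention (starting node)
    pair = pair + _residual_stub(pair)      # triangle attention (ending node)
    pair = pair + _residual_stub(pair)      # pair transition FFN
    return msa, pair
```

Every operation is a residual update, so both representations keep their shapes through all 48 blocks.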

After 48 Evoformer blocks, the pair representation, together with the "single" representation (the first row of the MSA track), feeds into the Structure Module, which uses an Invariant Point Attention (IPA) mechanism to predict 3D backbone coordinates (a rotation and translation per residue) in an iterative refinement process.
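The refinement loop can be sketched as follows. IPA and the frame-update head are learned modules; both are hypothetical stubs here (returning zeros and identity updates), so the code shows only how per-residue rigid frames are carried and composed across iterations:

```python
import numpy as np

# Stubs for the learned parts: IPA returns a zero residual, and the
# frame-update head returns an identity rotation and zero translation.
def _ipa_stub(single, pair, R, t):
    return np.zeros_like(single)

def _frame_update_stub(single):
    n = single.shape[0]
    dR = np.tile(np.eye(3), (n, 1, 1))  # identity rotation update
    dt = np.zeros((n, 3))               # zero translation update
    return dR, dt

def structure_module(single, pair, n_layers=8):
    n = single.shape[0]
    R = np.tile(np.eye(3), (n, 1, 1))   # per-residue rotations, init identity
    t = np.zeros((n, 3))                # per-residue translations, init origin
    for _ in range(n_layers):
        single = single + _ipa_stub(single, pair, R, t)
        dR, dt = _frame_update_stub(single)
        t = t + np.einsum('nij,nj->ni', R, dt)  # translate in the local frame
        R = R @ dR                              # compose rotation updates
    return R, t
```

Starting every residue at the origin with an identity rotation ("black hole" initialization) and composing small rigid updates is what makes the refinement iterative rather than a one-shot coordinate regression.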

Key Innovations

  • Triangle updates: Operations that enforce geometric consistency in the pair representation—if residue A is near B and B is near C, then A should be near C—using multiplicative interactions along triangular paths
  • MSA-pair dual track: Continuous exchange of information between evolutionary (MSA) and structural (pair) representations allows each to inform the other
  • Invariant Point Attention: An attention mechanism in the Structure Module that operates in local residue reference frames, ensuring predictions are invariant to global rotations and translations
  • Recycling: The entire network is run repeatedly (three recycling iterations at inference by default), with outputs fed back as inputs, progressively refining the prediction
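The core of the outgoing triangle multiplicative update can be sketched as below. Layer normalization, gating, and the output projection are omitted for clarity, and the weight matrices are illustrative placeholders:

```python
import numpy as np

def triangle_multiply_outgoing(pair, W_a, W_b):
    """Core of the outgoing triangle multiplicative update (layer norm,
    gating, and the output projection omitted for clarity)."""
    a = pair @ W_a   # projected edge features for (i, k), shape (N, N, c)
    b = pair @ W_b   # projected edge features for (j, k), shape (N, N, c)
    # The update for edge (i, j) aggregates over the third vertex k, so
    # information about i-k and j-k proximity flows into the i-j entry.
    return np.einsum('ikc,jkc->ijc', a, b)
```

This is exactly the A-near-B, B-near-C pattern: the i-j entry is rewritten from all paths i-k-j through a shared third residue k.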

Common Use Cases

Protein structure prediction from amino acid sequences, drug discovery (understanding binding sites), protein engineering, understanding disease-related mutations, structural biology research, and as a foundation for protein design tools.

Notable Variants & Sizes

AlphaFold2 (original, 93M parameters), AlphaFold-Multimer (protein complexes), OpenFold (open-source reimplementation), ESMFold (single-sequence prediction using ESM-2 language model instead of MSA), RoseTTAFold (three-track architecture), AlphaFold3 (extends to ligands, DNA, RNA, and post-translational modifications using diffusion).

Technical Details

Evoformer: 48 blocks, MSA dim 256, pair dim 128; MSA attention uses 8 heads, triangle attention uses 4. MSA representation: (N_seq × N_res × 256), pair: (N_res × N_res × 128). Row-wise MSA attention adds projections of the pair representation as bias terms on the attention logits. The triangle multiplicative update projects pair features to a 128-dim hidden space, combines them multiplicatively along triangle edges, and projects back. Structure Module: 8 IPA layers with 12 attention heads, 16 scalar channels, 4 query points, and 8 value points per head. Training: ~170K protein structures from the PDB plus self-distillation on predicted structures, crop size 256 residues (384 during fine-tuning), roughly one week on 128 TPUv3 cores. Recycling: 3 iterations, with gradients flowing only through the final iteration.
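The outer product mean, which carries information from the MSA track into the pair track, can be sketched as follows. Layer normalization is omitted and the projection sizes are illustrative placeholders, not the exact AlphaFold2 ones:

```python
import numpy as np

def outer_product_mean(msa, W_left, W_right, W_out):
    """Outer product mean: the MSA -> pair transfer (layer norm omitted)."""
    a = msa @ W_left    # (N_seq, N_res, c)
    b = msa @ W_right   # (N_seq, N_res, c)
    # Outer product between columns i and j, averaged over the sequences:
    o = np.einsum('sic,sjd->ijcd', a, b) / msa.shape[0]
    n_res = msa.shape[1]
    # Flatten the (c, d) outer-product axes and project to the pair dim.
    return o.reshape(n_res, n_res, -1) @ W_out   # (N_res, N_res, d_pair)
```

Averaging over sequences means co-variation between two MSA columns, a classic signal of residue-residue contact, shows up directly in the corresponding pair entry.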