AlphaFold is an artificial intelligence system developed by Google DeepMind that predicts the three-dimensional structure of proteins from their amino acid sequences. Since its debut at the CASP13 competition in 2018, AlphaFold has transformed structural biology, making it possible to predict protein shapes with near-experimental accuracy. The system earned two of its creators, Demis Hassabis and John Jumper, a share of the 2024 Nobel Prize in Chemistry, and its predictions have been used by over two million researchers worldwide.
Proteins are molecular machines that carry out virtually every function in living cells, from catalyzing chemical reactions to transporting molecules across membranes. A protein's function is determined by its three-dimensional shape, which in turn is dictated by the sequence of amino acids in the protein chain. Determining these shapes experimentally, through techniques such as X-ray crystallography, cryo-electron microscopy, and nuclear magnetic resonance spectroscopy, is slow and expensive. A single structure determination can take months or years and cost hundreds of thousands of dollars.
The gap between known protein sequences and experimentally determined structures has widened steadily. By the early 2020s, roughly 200 million protein sequences had been cataloged in the UniProt database, but the Protein Data Bank (PDB) contained experimentally solved structures for fewer than 200,000 of them, or about one structure for every thousand sequences. This disparity made computational protein structure prediction one of the most important open problems in biology.
DeepMind entered the protein structure prediction field in 2018 with its first AlphaFold system. The team competed in the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP13), a biennial blind competition that had been running since 1994 and served as the primary benchmark for the field.
AlphaFold 1 used a deep learning approach based on a deep residual neural network (ResNet). Rather than predicting binary contact maps (whether two amino acid residues are close together or not), the system predicted full probability distributions over distances between every pair of residues. This distance prediction approach provided substantially more information than contact maps alone.
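The difference between a contact map and a distance distribution can be illustrated with a small sketch. The bin edges, the 8 Å contact threshold, and the probabilities below are made-up illustrative values, not AlphaFold 1's actual binning, and the function names are hypothetical. The two residue pairs have the same contact probability but clearly different distance distributions, which is the extra information a binary contact map discards.

```python
# Illustrative bin edges in angstroms; bin i covers [BIN_EDGES[i], BIN_EDGES[i + 1]).
# These values are assumptions for the example, not AlphaFold 1's real bins.
BIN_EDGES = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 16.0, 22.0]


def contact_probability(probs, edges=BIN_EDGES, threshold=8.0):
    """Sum the probability mass of every bin lying entirely below the threshold."""
    return sum(p for p, upper in zip(probs, edges[1:]) if upper <= threshold)


def expected_distance(probs, edges=BIN_EDGES):
    """Mean distance, using each bin's midpoint as its representative value."""
    midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    return sum(p * m for p, m in zip(probs, midpoints))


# Two hypothetical residue pairs (one probability per bin, summing to 1).
pair_a = [0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0]  # mass concentrated around 9 angstroms
pair_b = [0.0, 0.0, 0.3, 0.0, 0.0, 0.2, 0.5]  # either close or far apart

for name, probs in (("pair_a", pair_a), ("pair_b", pair_b)):
    # Both pairs report a contact probability of about 0.3,
    # but their expected distances differ (about 9.1 vs 14.4 angstroms).
    print(name, round(contact_probability(probs), 3), round(expected_distance(probs), 3))
```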
The system took as input a multiple sequence alignment (MSA), which compiles related protein sequences found in genetic databases. From the MSA, the neural network extracted co-evolutionary patterns, meaning pairs of positions in the sequence that tend to mutate together, suggesting spatial proximity in the folded structure. The predicted distance distributions were then used to construct a potential of mean force, which could be optimized through gradient descent to generate candidate 3D structures.
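To make the idea of co-evolution concrete, the sketch below computes the mutual information between two columns of a toy alignment. This is a classical hand-computed statistic used here only for illustration; AlphaFold's network learns such patterns from MSA features rather than computing this quantity, and the alignment and function names are invented for the example.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Toy alignment (hypothetical): columns 0 and 1 always change together
# (A with L, G with V), column 2 varies independently, column 3 is conserved.
TOY_MSA = ["ALKD", "ALRD", "GVKD", "GVRD"]

print(mutual_information(TOY_MSA, 0, 1))  # 1.0 (perfectly coupled positions)
print(mutual_information(TOY_MSA, 0, 2))  # 0.0 (independent positions)
```

A high value for a pair of columns is the kind of signal that suggests the two positions may be in contact in the folded structure.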
AlphaFold 1 placed first in the overall rankings at CASP13 in December 2018. It was particularly effective on the hardest category of targets, known as free modeling (FM) targets, where no homologous template structures existed. AlphaFold produced high-accuracy structures (with template modeling scores of 0.7 or higher) for 24 out of 43 free modeling domains, while the next best method achieved this level of accuracy for only 14 out of 43 domains. The margin of victory was the largest improvement in a single CASP cycle in the competition's history at that time.
| Metric | AlphaFold 1 | Next best method |
|---|---|---|
| FM domains with TM-score >= 0.7 | 24 / 43 | 14 / 43 |
| Overall ranking | 1st | 2nd |
| Year | 2018 | 2018 |
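The TM-score threshold used above can be illustrated with a minimal sketch of the standard TM-score formula. It assumes the predicted and reference structures have already been superimposed and that per-residue distances are given; the full metric also searches for the superposition that maximizes the score, which is omitted here. The function names and the 0.5 Å floor on the distance scale for very short chains are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in angstroms."""
    if target_length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1 / 3) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distance in angstroms for each aligned residue pair.
    target_length: number of residues in the reference structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect prediction of a 100-residue domain scores 1.0;
# aligning only half of the residues perfectly scores 0.5.
print(tm_score([0.0] * 100, 100))
print(tm_score([0.0] * 50, 100))
```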
AlphaFold 2, presented at CASP14 in November-December 2020, represented a fundamental redesign of the system. It achieved near-experimental accuracy on most targets and is widely considered the moment the protein structure prediction problem was effectively solved for single-chain proteins.
AlphaFold 2 replaced the ResNet architecture of the first version with a novel transformer-based design built around two main components: the Evoformer module and the Structure module.
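Evoformer module. The Evoformer is the core of AlphaFold 2. It operates on two data representations simultaneously: an MSA representation, which holds information about the aligned sequences, and a pair representation, which holds information about every pair of residues.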
The Evoformer consists of 48 blocks (with unshared weights), each of which updates both representations through a series of operations. The MSA representation is processed using axial self-attention, where attention is applied along the rows and columns of the alignment separately. The pair representation is processed using triangular updates and triangular self-attention, operations designed to enforce geometric consistency: if residue A is close to residue B and residue B is close to residue C, then the representation should reflect information about the A-C relationship as well. The two representations also exchange information, with the pair representation conditioning the MSA attention and the MSA representation feeding back into the pair representation.
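The A-B-C reasoning behind the triangular updates can be sketched in a heavily simplified form. The version below keeps only the core sum over a third residue k and uses a single number per residue pair; the real operation works on multi-channel features and, as far as the published architecture describes it, adds learned projections, gating, and normalization, all of which are omitted here. The function name is hypothetical.

```python
def triangle_update_outgoing(pair):
    """Add sum_k pair[i][k] * pair[j][k] to every entry pair[i][j].

    pair: square matrix (list of lists) with one scalar per residue pair.
    Returns a new matrix; the input is left unchanged.
    """
    n = len(pair)
    return [
        [pair[i][j] + sum(pair[i][k] * pair[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Three residues A, B, C (indices 0, 1, 2). Initially the representation
# records that A-B and B-C are close (1.0) and knows nothing about A-C (0.0).
pair = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
updated = triangle_update_outgoing(pair)
print(updated[0][2])  # 1.0: the A-C entry now carries information routed through B
```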
Structure module. The Structure module takes the final outputs from the Evoformer and converts them into explicit 3D atomic coordinates. It first generates a protein backbone by predicting a rigid-body transformation (rotation and translation) for each residue, then places side-chain atoms using predicted torsion angles. The Structure module contains 8 blocks with shared weights, operating in an iterative fashion.
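The backbone step can be pictured as applying one rotation and one translation per residue to a fixed local template. The sketch below does only that; the local atom coordinates are approximate illustrative values, the frames are supplied by hand rather than predicted, and side-chain placement from torsion angles is left out. All names are hypothetical.

```python
from math import cos, sin, pi

# Approximate local backbone template in angstroms, with CA at the origin.
# Illustrative values only.
LOCAL_BACKBONE = {"N": (-0.525, 1.363, 0.0), "CA": (0.0, 0.0, 0.0), "C": (1.526, 0.0, 0.0)}


def rotation_z(angle):
    """3x3 rotation matrix for a rotation about the z axis (radians)."""
    c, s = cos(angle), sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def apply_frame(rotation, translation, point):
    """Rigid-body transform: rotate the point, then translate it."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def place_backbone(frames):
    """frames: one (rotation, translation) pair per residue."""
    return [
        {atom: apply_frame(rot, trans, xyz) for atom, xyz in LOCAL_BACKBONE.items()}
        for rot, trans in frames
    ]


# Two hand-written frames standing in for the network's per-residue predictions.
frames = [
    (rotation_z(0.0), (0.0, 0.0, 0.0)),
    (rotation_z(pi / 2), (3.8, 0.0, 0.0)),
]
for residue in place_backbone(frames):
    print({atom: tuple(round(v, 3) for v in xyz) for atom, xyz in residue.items()})
```

Because every residue is placed independently by its own frame, the module can move and rotate residues freely before the chain geometry is refined.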
Recycling. AlphaFold 2 uses an iterative process called recycling, where the predicted MSA representation, pair representation, and 3D coordinates are fed back into the network for additional rounds of refinement. In practice, three recycling iterations are used during inference, with each pass improving the accuracy of the predicted structure.
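The control flow of recycling can be sketched with a toy stand-in for the network. The sketch assumes that three recycling iterations means one initial pass followed by three further passes that each receive the previous output; the toy model, which simply halves its remaining error on each pass, and all names are invented for illustration.

```python
def run_with_recycling(model, features, num_recycles=3):
    """Run the model once, then feed its output back num_recycles times."""
    prev = None  # nothing to recycle on the first pass
    for _ in range(num_recycles + 1):  # assumption: initial pass + recycles
        prev = model(features, prev)
    return prev


def toy_model(features, prev):
    """Stand-in for the network: halves the remaining error on each pass."""
    target = features["target"]
    estimate = 0.0 if prev is None else prev
    return estimate + 0.5 * (target - estimate)


# Passes produce 4.0, 6.0, 7.0, 7.5 for a target of 8.0.
print(run_with_recycling(toy_model, {"target": 8.0}))                  # 7.5
print(run_with_recycling(toy_model, {"target": 8.0}, num_recycles=0))  # 4.0
```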
| Component | Function | Blocks | Weight sharing |
|---|---|---|---|
| Input embedding | Generates initial MSA and pair representations from sequence and MSA | 1 | N/A |
| Evoformer | Refines MSA and pair representations through attention and triangular updates | 48 | No |
| Structure module | Converts representations to 3D atomic coordinates | 8 | Yes |
| Recycling | Feeds output back into the network for iterative refinement | 3 iterations | Reuses full network |
At CASP14, AlphaFold 2 achieved a median Global Distance Test (GDT) score of 92.4 across all targets, approaching the level of experimental uncertainty in many structure determination methods. GDT ranges from 0 to 100, and a score of 90 or above is generally considered competitive with experimentally determined structures. Approximately two-thirds of the 96 targets reached GDT scores above 90 in backbone accuracy.
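For readers unfamiliar with the metric, the commonly used GDT_TS variant averages the percentage of residues whose alpha-carbon lies within 1, 2, 4, and 8 Å of its position in the experimental structure. The sketch below assumes the two structures are already superimposed and takes per-residue distances as input; the full metric searches over superpositions for each cutoff, which is omitted. The function name is hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS (0 to 100) for one fixed superposition.

    distances: per-residue alpha-carbon distances in angstroms.
    """
    n = len(distances)
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in cutoffs]
    return 100 * sum(fractions) / len(cutoffs)


# Four residues at 0.5, 1.5, 3.0, and 9.0 angstroms from their true positions:
# 1/4 are within 1 A, 2/4 within 2 A, 3/4 within 4 A, and 3/4 within 8 A.
print(gdt_ts([0.5, 1.5, 3.0, 9.0]))  # 56.25
```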
The result stunned the structural biology community. John Moult, the co-founder and longtime organizer of CASP, stated that the protein structure prediction problem was, in a practical sense, solved. The performance gap between AlphaFold 2 and the second-place team was enormous, with AlphaFold 2 achieving accuracy far beyond what any other method had managed.
| Competition | Year | AlphaFold version | Median GDT score | Result |
|---|---|---|---|---|
| CASP13 | 2018 | AlphaFold 1 | ~60 (estimated) | 1st place |
| CASP14 | 2020 | AlphaFold 2 | 92.4 | 1st place (near-experimental) |
The AlphaFold 2 source code and trained model weights were released publicly on July 15, 2021, alongside the publication of the full research paper in Nature. The code was released under the Apache 2.0 license, allowing both academic and commercial use. This open release enabled researchers worldwide to run AlphaFold 2 predictions on their own hardware and to build on the architecture for new applications.
Subsequent updates expanded the system's capabilities. AlphaFold-Multimer, released in versions 2.1 and 2.2, extended the system to predict the structures of protein complexes containing multiple chains. Version 2.3 further improved accuracy on large multi-chain complexes.
The open-source community also produced independent implementations. OpenFold, developed by the OpenFold Consortium, provided a trainable PyTorch reimplementation of AlphaFold 2, giving researchers the ability to retrain the model on different datasets or for different tasks.
In partnership with the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), DeepMind launched the AlphaFold Protein Structure Database (AlphaFold DB) in July 2021. The database provides free, open access to AlphaFold's predicted structures.
The initial release covered the complete proteomes of 21 model organisms, including the human proteome. In July 2022, the database was massively expanded to include predicted structures for over 200 million proteins, covering nearly every protein sequence cataloged in the UniProt database. This expansion represented a roughly 500-fold increase from the initial release and effectively provided a structural prediction for almost every known protein.
As of 2024, the database contains over 214 million entries. In 2025, the database was synchronized with the UniProt 2025_03 release and received a comprehensive redesign of its entry pages, integrating annotations directly with an interactive 3D structure viewer.
In early 2026, the AlphaFold DB expanded to include predictions of protein complexes for the first time. The initial addition consisted of 1.7 million high-confidence homodimer predictions (complexes of two identical protein chains). These homodimer structures are drawn from a broader set of 30 million complex predictions computed by DeepMind and EMBL-EBI.
| Database milestone | Date | Entries |
|---|---|---|
| Initial launch (21 model organisms) | July 2021 | ~365,000 |
| Major expansion | July 2022 | 200+ million |
| UniProt-aligned update | 2024 | 214+ million |
| Homodimer predictions added | 2026 | 1.7 million complexes added |
The database has been accessed by over two million researchers from more than 190 countries. By November 2025, the AlphaFold 3 paper alone had been cited over 9,000 times, reflecting the broad adoption of AlphaFold across biological sciences.
AlphaFold 3 was announced on May 8, 2024, co-developed by Google DeepMind and Isomorphic Labs, a drug discovery company spun out of DeepMind.
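While AlphaFold 2 focused primarily on predicting the structures of individual proteins or protein complexes, AlphaFold 3 broadened the scope to predict structures involving proteins together with other biomolecules. It can model complexes that combine proteins with nucleic acids (DNA and RNA), small-molecule ligands, ions, and chemically modified residues.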
This expanded capability is particularly important for drug discovery, where understanding how a protein interacts with a small-molecule drug candidate is essential.
For interactions between proteins and other molecule types, AlphaFold 3 showed at least a 50% improvement over existing prediction methods. In some categories of molecular interaction, accuracy doubled compared to previous approaches.
AlphaFold 3 introduced a diffusion-based generative approach for the structure module, replacing the direct coordinate prediction used in AlphaFold 2. This diffusion model, similar in concept to those used in image generation systems, generates atomic coordinates by iteratively denoising a random initial configuration. The diffusion approach allows the model to better handle the increased complexity of multi-molecule systems and can represent structural uncertainty more naturally.
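The iterative denoising idea can be sketched with a toy sampling loop. This is a simplified, deterministic illustration and not AlphaFold 3's sampler: the denoiser here is an oracle that already knows the answer, standing in for a trained network, and real diffusion samplers typically also inject fresh noise between steps. All names and values are hypothetical.

```python
import random


def sample_structure(denoiser, num_atoms, num_steps=20, seed=0):
    """Start from random coordinates and repeatedly step toward the denoiser's estimate."""
    rng = random.Random(seed)
    # Pure noise: one (x, y, z) triple per atom.
    coords = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in range(num_atoms)]
    for step in range(num_steps):
        noise_level = 1.0 - step / num_steps  # decreases from 1.0 toward 0
        predicted_clean = denoiser(coords, noise_level)
        blend = 1.0 / (num_steps - step)  # the final step lands on the prediction
        coords = [
            [x + blend * (p - x) for x, p in zip(atom, pred)]
            for atom, pred in zip(coords, predicted_clean)
        ]
    return coords


# Oracle denoiser (assumption): always returns a fixed three-atom target.
TARGET = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]


def oracle_denoiser(noisy_coords, noise_level):
    return TARGET


result = sample_structure(oracle_denoiser, num_atoms=3)
print([[round(v, 3) for v in atom] for atom in result])
```

Changing the seed changes the starting noise, which in a real model is what allows repeated sampling to produce different plausible structures.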
AlphaFold 3 was initially made available through the AlphaFold Server, a free web-based interface that allows researchers to submit prediction jobs without needing local computational resources. The source code and model weights were released in stages: they were made available to the scientific community for non-commercial use in November 2024, and became publicly accessible on GitHub in February 2025, though still under a non-commercial license.
The OpenFold Consortium at Columbia University separately developed OpenFold 3, an independent open-source reimplementation aiming to reproduce AlphaFold 3's results without commercial restrictions.
On October 9, 2024, the Royal Swedish Academy of Sciences awarded the Nobel Prize in Chemistry to three scientists for their work on computational protein science. One half of the prize went to David Baker of the University of Washington for computational protein design. The other half was shared by Demis Hassabis and John Jumper of Google DeepMind for protein structure prediction with AlphaFold.
The prize, worth 11 million Swedish kronor (approximately $1 million USD), recognized what the Nobel Committee described as solving a 50-year-old problem in biology. The committee cited Levinthal's paradox and the long history of the protein folding problem as context for the significance of the achievement.
Hassabis, who co-founded DeepMind in 2010 and serves as its CEO, had originally trained as a neuroscientist and game designer. Jumper, a senior research scientist at DeepMind, holds a PhD in chemistry from the University of Chicago and led the technical development of AlphaFold 2.
Nobel Prize in Chemistry 2024:

| Laureate | Affiliation | Contribution |
|---|---|---|
| David Baker | University of Washington | Computational protein design |
| Demis Hassabis | Google DeepMind / Isomorphic Labs | Protein structure prediction (AlphaFold) |
| John Jumper | Google DeepMind | Protein structure prediction (AlphaFold) |
AlphaFold has accelerated structural biology research by providing instant structural hypotheses for proteins that previously had no known structure. Researchers use AlphaFold predictions to guide experimental work, design better crystallization constructs, interpret cryo-EM density maps, and identify functional sites on proteins.
The system has proven particularly useful for organisms whose proteins have been studied less extensively. For many bacterial and archaeal species, the AlphaFold database provides the only available structural information for the majority of their proteomes.
Isomorphic Labs, the drug discovery company founded by Hassabis in 2021, applies AlphaFold and related AI technology to pharmaceutical research. In January 2024, Isomorphic announced partnerships with Eli Lilly and Novartis worth a combined total of nearly $3 billion in upfront and potential milestone payments. The Lilly deal included $45 million upfront with up to $1.7 billion in milestones, while the Novartis deal included $37.5 million upfront with up to $1.2 billion in milestones. In February 2025, Novartis expanded this partnership with additional research programs, and Isomorphic raised $600 million in its first financing round in March 2025.
Beyond Isomorphic, the broader pharmaceutical industry has rapidly adopted AI-driven structure prediction. Companies and academic labs use AlphaFold predictions to identify binding sites, screen drug candidates computationally, and design molecules targeting previously "undruggable" proteins.
AlphaFold's impact extends beyond traditional drug discovery:
Despite its achievements, AlphaFold has notable limitations that researchers must account for:
Static structures. AlphaFold predicts a single static structure for each protein, but real proteins are dynamic molecules that adopt multiple conformations as part of their function. For proteins known to switch between different conformational states, AlphaFold typically predicts only one of these states, usually the one best represented in training data.
Intrinsically disordered regions. Many proteins contain regions that do not fold into stable structures. AlphaFold assigns low confidence scores to these intrinsically disordered regions, which is useful as a diagnostic, but it cannot characterize the conformational ensembles that these regions actually adopt.
Confidence calibration. AlphaFold provides per-residue confidence scores (pLDDT, on a scale from 0 to 100), but these scores do not always track actual accuracy. Users should treat low-confidence regions with caution and should not read a high score as a guarantee that the prediction is correct.
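In practice, AlphaFold's PDB-format output stores each residue's pLDDT in the B-factor column, so the scores can be read with a few lines of fixed-column parsing. The sketch below is a minimal illustration: the helper that fabricates sample ATOM records, the function names, and the sample scores are hypothetical, and the band thresholds (90, 70, 50) follow the convention commonly shown alongside AlphaFold predictions rather than anything stated in this article.

```python
def atom_line(serial, res_seq, plddt):
    """Build a minimal alpha-carbon ATOM record with dummy coordinates (test data only)."""
    return (
        f"ATOM  {serial:>5}  CA  ALA A{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


def plddt_per_residue(pdb_text):
    """Map residue number to pLDDT, read from the B-factor field of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])  # PDB columns 23-26 and 61-66
    return scores


def confidence_band(plddt):
    """Bucket a pLDDT value using assumed conventional thresholds."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


sample = "\n".join([atom_line(1, 1, 95.5), atom_line(2, 2, 62.0), atom_line(3, 3, 31.25)])
for residue, score in plddt_per_residue(sample).items():
    print(residue, score, confidence_band(score))
```

Residues in the lowest band often correspond to the intrinsically disordered regions described above, which is why the score is useful as a diagnostic even where the coordinates themselves are not.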
Novel folds. AlphaFold's accuracy depends on the availability of homologous sequences in the MSA. For proteins with very few known relatives ("orphan proteins") or genuinely novel folds not represented in training data, prediction accuracy can decrease.
Ligand and cofactor effects. While AlphaFold 3 can predict protein-ligand interactions, the accuracy of these predictions remains lower than for protein-only structures. Predicting how a protein changes shape upon binding a ligand remains a particular challenge.
AlphaFold's success spurred the development of several alternative protein structure prediction tools:
| Tool | Developer | Key features |
|---|---|---|
| RoseTTAFold | David Baker lab, University of Washington | Three-track neural network; open-source; extended to model protein-DNA-RNA complexes |
| ESMFold | Meta AI | Single-sequence input (no MSA needed); up to 60x faster than AlphaFold 2 for short sequences; built on a 15-billion-parameter protein language model |
| OpenFold | OpenFold Consortium | Trainable open-source reimplementation of AlphaFold 2 in PyTorch |
| OmegaFold | HeliXon | Single-sequence prediction using protein language models |
| Boltz-2 | MIT / Recursion | Co-folds protein-ligand pairs and predicts binding affinity; announced June 2025 |
| Pearl | Genesis Molecular AI | Interactive model allowing user-guided predictions; aimed at drug discovery |
RoseTTAFold, developed by David Baker's laboratory at the University of Washington, uses a three-track architecture that simultaneously processes one-dimensional sequence information, two-dimensional distance maps, and three-dimensional coordinates. ESMFold, developed by Meta AI, takes a different approach by using a large protein language model that eliminates the need for MSA computation entirely, making it dramatically faster for large-scale applications. Meta's ESM Metagenomic Atlas used ESMFold to predict structures for 772 million metagenomic protein sequences.
Ensemble approaches have also emerged. FiveFold, for example, combines predictions from AlphaFold 2, RoseTTAFold, OmegaFold, ESMFold, and EMBER3D to improve accuracy and better capture conformational diversity.
AlphaFold's dominance reshaped the CASP competition. DeepMind did not enter CASP15 (2022) or CASP16 (2024) as a competitor, but virtually all top-performing teams in both competitions used AlphaFold or modifications of it as the basis for their predictions.
CASP15 (2022) showed substantial progress in modeling protein complexes, a challenge that goes beyond single-chain structure prediction. CASP16 (2024), held in Punta Cana, Dominican Republic, tested 59 protein targets with 85 domains. The competition revealed that while AlphaFold-based methods dominated overall, they still struggled with certain types of targets, particularly antibody-antigen complexes. The team led by Sandor Vajda and Dima Kozakov won the protein complexes category by combining AlphaFold predictions with the ClusPro docking server, substantially outperforming teams that relied on AlphaFold alone.
The CASP competition itself faced an existential challenge in 2025 when the U.S. National Institutes of Health (NIH) declined to renew its long-running funding grant due to budget cuts. Google DeepMind stepped in with interim funding to keep the competition operational.
As of early 2026, the AlphaFold ecosystem continues to expand:
Predicting protein dynamics, the conformational ensembles of intrinsically disordered proteins, and the effects of mutations on protein stability and function remains the next frontier for the field that AlphaFold opened.