AlphaProteo is a machine learning system developed by Google DeepMind for the de novo design of high-affinity protein binders. Announced on September 5, 2024, the system accepts a target protein's three-dimensional structure as input and generates novel proteins that bind tightly to specified sites on that target. In laboratory experiments across seven target proteins, AlphaProteo achieved binding affinities 3 to 300 times better than the best existing computational design methods, and reached an 88% experimental binding success rate on one target. The system is trained on structural data from the Protein Data Bank (PDB) and more than 100 million protein structure predictions generated by AlphaFold.
Proteins carry out virtually every function in living cells, from catalyzing chemical reactions to transmitting signals across membranes. Many diseases arise when specific proteins malfunction or when pathogens exploit them. Designing molecules that bind tightly and selectively to a protein of interest is therefore a central challenge in drug discovery, diagnostics, and basic biology research.
Protein binders are proteins engineered to attach to a defined surface region, or epitope, on a target molecule. Tight binding, measured as a dissociation constant (Kd) in the nanomolar (nM) or sub-nanomolar (picomolar, pM) range, is generally required for a binder to be useful in research or therapeutic contexts. A lower Kd means the binder clings more tenaciously to its target.
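To make the meaning of Kd concrete, the sketch below assumes the simplest case: 1:1 binding at equilibrium, with the binder in excess so that its free concentration is close to its total concentration. Under that assumption the fraction of target occupied is [binder] / (Kd + [binder]). The function names and example concentrations are illustrative and do not come from the AlphaProteo paper.

```python
def fraction_bound(binder_conc_nm: float, kd_nm: float) -> float:
    """Fraction of target occupied under a simple 1:1 equilibrium model.

    Assumes the free binder concentration is approximately the total
    binder concentration (binder in excess over target).
    """
    if binder_conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return binder_conc_nm / (kd_nm + binder_conc_nm)


def nm_to_pm(value_nm: float) -> float:
    """Convert nanomolar to picomolar (1 nM = 1000 pM)."""
    return value_nm * 1000.0


if __name__ == "__main__":
    # When the binder concentration equals Kd, half of the target is occupied.
    print(fraction_bound(1.0, 1.0))
    # Hypothetical comparison at 1 nM binder: Kd of 0.1 nM versus Kd of 100 nM.
    print(round(fraction_bound(1.0, 0.1), 3))
    print(round(fraction_bound(1.0, 100.0), 3))
```

At the same binder concentration, the lower-Kd binder occupies most of the target while the higher-Kd binder occupies very little, which is why sub-nanomolar affinities matter in practice.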
For decades, researchers relied on antibodies as the default source of binding proteins. Antibodies are powerful but expensive to produce, difficult to engineer, and sometimes poorly suited for applications requiring small, stable proteins. An alternative approach, called de novo protein design, aims to construct entirely new protein sequences that fold into shapes predicted to bind a chosen target. Early de novo design using the Rosetta software suite could achieve this in principle, but success rates were extremely low, often below 1%, because the enormous sequence and structure space makes it hard to find designs that both fold correctly and bind the target.
The introduction of deep learning, particularly methods like AlphaFold 2 for structure prediction, changed what was possible. Using AlphaFold-based filters to score candidate designs improved success rates roughly tenfold over unfiltered Rosetta approaches. The RFdiffusion system, developed at the Baker Laboratory at the University of Washington and published in Nature in 2023, pushed further by framing backbone generation as a diffusion denoising problem adapted from the image generation literature. RFdiffusion generates protein backbone geometries; a separate model called ProteinMPNN then assigns amino acid sequences to those backbones, and AlphaFold2 is used to rank the resulting designs. This two-step pipeline became the leading method before AlphaProteo and produced experimental success rates reported in the range of roughly 9 to 43% depending on the target.
Even with those improvements, significant obstacles remained. Many challenging targets, including proteins with flat, featureless binding surfaces or difficult topologies, yielded very low success rates. High-throughput screening of large libraries was sometimes required to find functional binders. In addition, because sequence design followed backbone generation, errors made in the first step carried over into the second.
AlphaProteo is described in a preprint titled "De novo design of high-affinity protein binders with AlphaProteo," submitted to arXiv on September 12, 2024, with Vinicius Zambaldi and David La as lead authors and Demis Hassabis and Pushmeet Kohli as senior contributors. The 45-page paper includes 17 figures and carries a Creative Commons BY-NC-SA 4.0 license. The work involved 32 co-authors from Google DeepMind and the Francis Crick Institute in London.
The paper explicitly states that machine learning methods are not disclosed: "This report does not include machine learning methods due to biosecurity and commercial considerations." Despite this, the paper describes the system's inputs, outputs, and components at a functional level.
AlphaProteo consists of two components. The first is a generative model trained on protein structure and sequence data from the PDB and a distillation set of AlphaFold structure predictions. At inference time, a researcher supplies the three-dimensional structure of a target protein, typically derived from X-ray crystallography or cryo-electron microscopy, and optionally designates specific hotspot residues corresponding to the binding epitope of interest. The generative model then produces candidate binder structures and sequences. Generated binders range from 50 to 140 amino acids in length.
The second component is an automated filter that scores each generated design and predicts whether it will bind successfully when tested in the laboratory. This scoring step reduces the pool of generated candidates before wet-lab validation. For each target, between 47 and 172 candidates were selected for experimental testing after filtering.
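The generate-then-filter flow described above can be sketched as follows. Because the paper does not disclose its machine learning methods, everything here is hypothetical: the `Candidate` class, the meaning of `score`, and the cut-off are placeholders, and only the 50 to 140 residue length range is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    """A generated binder design with a hypothetical filter score."""
    name: str
    sequence: str
    score: float  # assumed convention: higher means more likely to bind


def select_for_testing(
    candidates: Sequence[Candidate],
    min_score: float,
    budget: int,
    min_len: int = 50,
    max_len: int = 140,
) -> List[Candidate]:
    """Keep designs passing the score cut-off and length range, then take the top `budget`."""
    passing = [
        c for c in candidates
        if c.score >= min_score and min_len <= len(c.sequence) <= max_len
    ]
    # Sort best-first; ties are broken by name so the result is deterministic.
    passing.sort(key=lambda c: (-c.score, c.name))
    return passing[:budget]


if __name__ == "__main__":
    pool = [
        Candidate("d1", "A" * 60, 0.91),
        Candidate("d2", "A" * 45, 0.99),   # too short, dropped
        Candidate("d3", "A" * 120, 0.40),  # below the cut-off, dropped
        Candidate("d4", "A" * 80, 0.75),
    ]
    chosen = select_for_testing(pool, min_score=0.5, budget=2)
    print([c.name for c in chosen])  # ['d1', 'd4']
```

The `budget` argument stands in for the number of designs a laboratory can afford to test per target, which is the quantity the filter is meant to keep small.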
AlphaFold predictions play a dual role. During training, the system learns from more than 100 million AlphaFold-predicted structures in addition to experimental PDB structures, giving it exposure to a far larger sample of protein structures than experimental data alone could provide. Without this corpus of predicted structures, the training set would be limited to the roughly 200,000 experimentally determined structures in the PDB as of 2024. After generation, structure prediction is also used as a scoring mechanism in the filter component.
AlphaFold 2, released in 2021, predicts the three-dimensional structure of a protein from its amino acid sequence, in many cases with near-experimental accuracy, and is widely regarded as having largely solved the single-chain structure prediction problem. AlphaFold 3, announced in May 2024, extended this capability to complexes involving DNA, RNA, and small-molecule ligands.
AlphaProteo is not a version of AlphaFold. It addresses the inverse problem: rather than predicting how a given sequence folds, it generates new sequences designed to bind a specified structure. However, the two systems are tightly linked. AlphaProteo's training corpus incorporates AlphaFold predictions at scale, and the filter component uses structure prediction as a scoring mechanism. By analogy, AlphaFold resembles an engineer who reads a blueprint and predicts what the finished building will look like, whereas AlphaProteo resembles an architect who draws a new blueprint for a building that must fit a given site.
In practical terms, AlphaFold provides the "language" of protein structures that AlphaProteo uses to reason about binding. Structural data on how proteins naturally interact with one another, as represented in PDB complexes and AlphaFold predictions of those complexes, informs the generative model's understanding of binding interfaces.
AlphaProteo was evaluated on eight target proteins. Seven yielded successful binders; one, TNF-alpha (TNFα), produced no successful candidates after computational filtering and wet-lab testing.
Experimental binding was assessed primarily by yeast display, a technique in which candidate binder sequences are expressed on the surface of yeast cells and tested for their ability to retain fluorescently labeled target protein. Binding affinity (Kd) was measured for confirmed binders using methods including biolayer interferometry. Structural validation for selected binders was carried out using X-ray crystallography and cryo-electron microscopy.
The table below shows experimental success rates and best measured binding affinities for each target, alongside the best-performing prior method tested on the same target.
| Target | Indication area | AlphaProteo success rate | Best AlphaProteo Kd | Best prior method success rate | Best prior method Kd |
|---|---|---|---|---|---|
| BHRF1 (viral protein) | Viral biology | 88% (n=94) | 8.5 nM | ~18% | ~58 nM |
| VEGF-A | Cancer, diabetes complications | 33% (n=94) | 0.48 nM | No prior binders reported | N/A |
| IL-7Rα | Autoimmune, cancer immunology | 25% (n=94) | 0.082 nM | 17% (RFdiffusion) | 14 nM |
| PD-L1 | Cancer immunotherapy | 15% (n=159) | 0.18 nM | 13% (RFdiffusion) | 0.9 nM |
| IL-17A | Autoimmune inflammation | 14% (n=63) | 8.4 nM | 0.02% | 47 nM |
| SC2RBD (SARS-CoV-2 spike receptor-binding domain) | COVID-19 | 12% (n=172) | 26 nM | 1.6% | 100 nM |
| TrkA (NTRK1) | Neuropathic pain, cancer | 9% (n=131) | 0.96 nM | 0% (RFdiffusion) | 3000 nM |
| TNFα | Autoimmune diseases | 0% (n=54) | N/A | N/A | N/A |
For BHRF1, an Epstein-Barr virus protein, AlphaProteo achieved an 88% binding success rate across 94 tested candidates. On SC2RBD and IL-17A, AlphaProteo showed respectively approximately 8-fold and 700-fold higher success rates than the next-best tested method.
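Because each success rate rests on a limited number of tested designs, it helps to attach an uncertainty range. The sketch below computes a Wilson score interval for a binomial proportion. This analysis is not from the paper, and the hit count of 83 for BHRF1 is an assumption inferred by rounding 88% of 94; the exact counts are not given in this article.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Assumed count: 83 hits out of 94 designs (88% rounded) for BHRF1.
    low, high = wilson_interval(83, 94)
    print(f"BHRF1: {low:.1%} to {high:.1%}")
    # TNF-alpha: 0 hits out of 54 designs.
    low, high = wilson_interval(0, 54)
    print(f"TNFa:  {low:.1%} to {high:.1%}")
```

Under these assumptions, zero hits in 54 designs does not rule out a true success rate of a few percent, which is worth keeping in mind when comparing low success rates across targets.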
For VEGF-A (vascular endothelial growth factor A), which plays roles in tumor angiogenesis and in diabetic retinopathy, AlphaProteo generated the first computationally designed successful binders ever reported for that target. Prior computational methods had not produced VEGF-A binders validated in the laboratory.
For TrkA (tropomyosin receptor kinase A, also called NTRK1), a target implicated in neuropathic pain and some cancers, RFdiffusion produced 0 successful binders in the authors' own testing, while AlphaProteo achieved a 9% success rate.
For IL-7Rα, a receptor relevant to T-cell development and some autoimmune conditions, AlphaProteo outperformed RFdiffusion in both success rate (25% vs 17%) and best binding affinity (0.082 nM vs 14 nM), representing roughly 170-fold tighter binding.
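A fold change in Kd can also be expressed as a difference in binding free energy through the standard relation between free energy and the dissociation constant. The worked calculation below is not from the paper; it assumes a temperature of 298 K (so that RT is about 0.592 kcal/mol) and uses the two Kd values quoted above.

```latex
\Delta G^\circ = RT \ln\!\left(\frac{K_d}{c^\circ}\right), \qquad c^\circ = 1\ \mathrm{M}

\Delta\Delta G
  = RT \ln\!\left(\frac{K_d^{\mathrm{RFdiffusion}}}{K_d^{\mathrm{AlphaProteo}}}\right)
  = 0.592\ \mathrm{kcal\,mol^{-1}} \times \ln\!\left(\frac{14\ \mathrm{nM}}{0.082\ \mathrm{nM}}\right)
  \approx 0.592 \times 5.14\ \mathrm{kcal\,mol^{-1}}
  \approx 3.0\ \mathrm{kcal\,mol^{-1}}
```

On this estimate, the roughly 170-fold difference in Kd corresponds to about 3 kcal/mol of additional binding free energy.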
Four of the seven successful targets yielded binders with sub-nanomolar binding affinities: IL-7Rα at 82 picomolar, PD-L1 at 180 picomolar, VEGF-A at 480 picomolar, and TrkA at 960 picomolar. These affinities were achieved without high-throughput screening and without experimental optimization rounds, meaning the designs from the first round of testing were already suitable for research use.
Before AlphaProteo, the dominant computational pipeline for de novo binder design combined RFdiffusion for backbone generation with ProteinMPNN for sequence design, followed by AlphaFold2 scoring and ranking. This pipeline was open-source and widely adopted by academic and industry labs.
RFdiffusion treats protein backbone generation as a denoising diffusion problem, iteratively refining a noisy initial backbone configuration into a physically reasonable protein structure given a target binding site. ProteinMPNN then assigns amino acid identities to that backbone using a graph neural network trained on PDB sequences. Together, the two models work sequentially: RFdiffusion proposes geometry, ProteinMPNN designs sequence within that geometry.
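The sequential structure of such a pipeline can be sketched in a few lines. This is not the RFdiffusion or ProteinMPNN API: the three stage functions are passed in as plain callables, and the `toy_*` stand-ins (including the 60 to 100 residue length range and the hydrophobic-fraction score) are arbitrary placeholders chosen only so that the example runs.

```python
import random
from typing import Callable, List, Tuple

Backbone = List[Tuple[float, float, float]]  # one (x, y, z) point per residue


def run_two_stage_pipeline(
    generate_backbone: Callable[[random.Random], Backbone],
    design_sequence: Callable[[Backbone, random.Random], str],
    score_design: Callable[[Backbone, str], float],
    n_designs: int,
    seed: int = 0,
) -> List[Tuple[float, str]]:
    """Generate backbones, design a sequence for each, then rank by score."""
    rng = random.Random(seed)
    results: List[Tuple[float, str]] = []
    for _ in range(n_designs):
        backbone = generate_backbone(rng)          # stage 1: geometry only
        sequence = design_sequence(backbone, rng)  # stage 2: sees only the fixed backbone
        results.append((score_design(backbone, sequence), sequence))
    results.sort(key=lambda item: item[0], reverse=True)  # best score first
    return results


# Toy stand-ins (hypothetical; they do not model real proteins).
def toy_backbone(rng: random.Random) -> Backbone:
    length = rng.randint(60, 100)
    return [(3.8 * i, 0.0, 0.0) for i in range(length)]


def toy_sequence(backbone: Backbone, rng: random.Random) -> str:
    return "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in backbone)


def toy_score(backbone: Backbone, sequence: str) -> float:
    # Placeholder metric: fraction of hydrophobic residues, not a binding predictor.
    return sum(sequence.count(aa) for aa in "AILMFVW") / len(sequence)


if __name__ == "__main__":
    ranked = run_two_stage_pipeline(toy_backbone, toy_sequence, toy_score, n_designs=5)
    for score, seq in ranked:
        print(f"{score:.2f}  length={len(seq)}")
```

In this layout the sequence stage never gets to revise the backbone, which is the structural property that allows a poor backbone to propagate into the final design.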
AlphaProteo differs in that it generates both structure and sequence jointly, through a single generative model rather than a modular two-step pipeline. The integrated approach avoids compounding errors between the two stages. The paper's authors tested RFdiffusion directly on IL-7Rα, PD-L1, and TrkA using their own yeast display assay to allow direct comparison under identical experimental conditions. On all three targets, AlphaProteo outperformed RFdiffusion.
The table below summarizes the direct head-to-head comparison on the three targets where both systems were benchmarked under identical conditions:
| Target | AlphaProteo success rate | RFdiffusion success rate | AlphaProteo best Kd | RFdiffusion best Kd |
|---|---|---|---|---|
| IL-7Rα | 25% | 17% | 0.082 nM | 14 nM |
| PD-L1 | 15% | 13% | 0.18 nM | 1.6 nM |
| TrkA | 9% | 0% | 0.96 nM | 370 nM |
On TrkA, the affinity gap was especially pronounced: AlphaProteo produced binders with Kd values around 1 nM, while RFdiffusion with yeast display screening yielded a best Kd of 370 nM in the authors' testing, a nearly 400-fold difference. For IL-7Rα, the best AlphaProteo binder bound approximately 170 times more tightly than the best RFdiffusion binder.
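The fold differences quoted above follow directly from the head-to-head table. The short script below recomputes them from the tabulated values; the dictionary layout and function names are illustrative choices, not part of the paper.

```python
from typing import Dict, Optional

# Values copied from the head-to-head table (success rates in %, Kd in nM).
HEAD_TO_HEAD: Dict[str, Dict[str, float]] = {
    "IL-7Rα": {"ap_rate": 25, "rf_rate": 17, "ap_kd_nm": 0.082, "rf_kd_nm": 14.0},
    "PD-L1": {"ap_rate": 15, "rf_rate": 13, "ap_kd_nm": 0.18, "rf_kd_nm": 1.6},
    "TrkA": {"ap_rate": 9, "rf_rate": 0, "ap_kd_nm": 0.96, "rf_kd_nm": 370.0},
}


def fold_tighter(reference_kd_nm: float, new_kd_nm: float) -> float:
    """How many times lower (tighter) the new Kd is than the reference Kd."""
    return reference_kd_nm / new_kd_nm


def rate_ratio(new_rate: float, reference_rate: float) -> Optional[float]:
    """Ratio of success rates, or None when the reference rate is zero."""
    if reference_rate == 0:
        return None
    return new_rate / reference_rate


if __name__ == "__main__":
    for target, row in HEAD_TO_HEAD.items():
        kd_fold = fold_tighter(row["rf_kd_nm"], row["ap_kd_nm"])
        ratio = rate_ratio(row["ap_rate"], row["rf_rate"])
        ratio_text = "undefined" if ratio is None else f"{ratio:.2f}x"
        print(f"{target}: Kd {kd_fold:.0f}-fold tighter, success rate {ratio_text}")
```

The success-rate ratio is left undefined for TrkA because the RFdiffusion rate is zero, so only the affinity comparison is meaningful for that target.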
One important caveat is that AlphaProteo is not publicly available as open-source software. RFdiffusion and ProteinMPNN are both freely available, which means the research community can run, modify, and extend them. AlphaProteo was offered through a limited trusted tester program with API-style access through Jupyter notebooks hosted on GitHub, but not as a software release. This distinction matters for academic adoption.
Four of the SC2RBD binders were tested for functional neutralization of live SARS-CoV-2 virus in collaboration with researchers at the Francis Crick Institute. The binders inhibited viral infection of human cells, with EC50 values (the concentration required to inhibit 50% of infection) in the range of 89 to 300 nanomolar for the ancestral SARS-CoV-2 strain. Two of the binders showed cross-reactive neutralization activity across three tested viral strains.
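The meaning of an EC50 value can be illustrated with a standard Hill dose-response curve. The sketch below is an assumption for illustration only: it uses a Hill slope of 1 and complete inhibition at saturating concentration, and it does not reproduce the fitting procedure used for the measurements reported here.

```python
def percent_inhibition(conc_nm: float, ec50_nm: float, hill: float = 1.0) -> float:
    """Percent inhibition from a Hill curve running from 0% to 100%."""
    if conc_nm < 0 or ec50_nm <= 0 or hill <= 0:
        raise ValueError("need conc >= 0, EC50 > 0 and Hill slope > 0")
    if conc_nm == 0:
        return 0.0
    ratio = (conc_nm / ec50_nm) ** hill
    return 100.0 * ratio / (1.0 + ratio)


if __name__ == "__main__":
    ec50 = 89.0  # nM, the lower end of the range quoted above
    for conc in (8.9, 89.0, 890.0):
        print(f"{conc:7.1f} nM -> {percent_inhibition(conc, ec50):5.1f}% inhibition")
```

With a slope of 1, going from 10% to 90% inhibition takes roughly a hundredfold change in binder concentration centred on the EC50.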
Cryo-electron microscopy structures were determined for four SC2RBD-spike complexes, with resolution ranging from 4.5 to 6.0 angstroms. The backbone root-mean-square deviation (Cα RMSD) between the experimentally determined binder structures and the design models ranged from 0.84 to 3.14 angstroms, confirming that the designed proteins fold approximately as intended.
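For reference, a Cα RMSD is the root of the mean squared distance between matched Cα atoms. The minimal sketch below assumes the two coordinate sets are already optimally superimposed and listed in the same residue order; real workflows first align the structures (for example with the Kabsch algorithm), which is omitted here. The function name and toy coordinates are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in angstroms


def ca_rmsd(model: Sequence[Point], experiment: Sequence[Point]) -> float:
    """RMSD between two pre-superimposed, equally ordered sets of C-alpha atoms."""
    if len(model) != len(experiment) or len(model) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, experiment)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    # Toy "experimental" coordinates: every atom displaced by 1 angstrom in y.
    solved = [(x, y + 1.0, z) for x, y, z in design]
    print(ca_rmsd(design, solved))  # 1.0
```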
For VEGF-A, an X-ray crystal structure of one binder (GDM_VEGFA_71) was solved at 2.65 angstroms resolution, showing a binder Cα RMSD of 0.78 angstroms relative to the design model. This level of structural accuracy is comparable to what is typically observed in computational protein design validated by crystallography.
Protein binders have a broad range of potential uses in pharmaceutical research and development. The field of AI in drug discovery has increasingly focused on generative protein design as a path toward faster and cheaper lead identification.
Protein therapeutics represent one major application area. Engineered protein binders can block harmful protein-protein interactions, prevent receptor activation by a pathological ligand, or deliver a payload (such as a cytotoxin) to a targeted cell type. The success of AlphaProteo on PD-L1 is particularly relevant here: PD-L1 is the target of several approved cancer immunotherapy drugs, including atezolizumab and durvalumab. Protein binders that block PD-L1 can prevent it from suppressing the immune system's response to tumors. The AlphaProteo binder against PD-L1, with a Kd of 0.18 nM, binds with an affinity in the range reported for approved anti-PD-L1 antibodies, though binding affinity alone does not determine clinical utility.
Diagnostics and biosensors represent a second major application area. Protein binders with high affinity and selectivity can serve as capture agents in rapid tests, replacing antibodies in lateral flow assays or enzyme-linked immunosorbent assay (ELISA) platforms. Computational design allows binders to be created for targets where natural antibodies may be difficult to raise in animals, or for targets that require very precise epitope specificity.
Cell and tissue imaging is a third area where tight, stable protein binders can be useful. Binders fused to fluorescent proteins can be used to track the location of target proteins in live cells, an approach similar to intrabody or nanobody-based imaging, which relies on small antibody-like scaffolds.
Isomorphic Labs, a Google DeepMind sister company focused on AI-driven drug discovery, was noted in the AlphaProteo blog post as exploring applications of the technology for pharmaceutical design. Isomorphic Labs has developed the Drug Design Engine, which applies machine learning models to the full drug discovery pipeline.
The AlphaProteo paper identifies several important limitations.
First, the system requires a three-dimensional structure of the target protein as input and is expected to work best with a high-resolution experimental structure. Proteins for which only homology models or low-resolution structures are available may produce less accurate binder designs. This restricts the immediate applicability of AlphaProteo to well-characterized targets with existing structural data, which excludes a large fraction of medically relevant proteins.
Second, the system failed on TNFα despite testing 54 candidate binders after filtering. TNFα is a trimeric cytokine associated with rheumatoid arthritis, Crohn's disease, and other autoimmune conditions, and is the target of several bestselling drugs including adalimumab (Humira). The paper notes that computational analysis indicated unusual in silico difficulty for TNFα, likely related to its homotrimeric structure and the geometry of its binding surfaces. This failure illustrates that some biologically important targets remain out of reach.
Third, specificity testing was limited in scope. For many intended applications, a binder must be shown not just to bind its intended target tightly but to not bind other proteins in the proteome. The paper acknowledges that thorough specificity testing against the full proteome was not carried out for most binders, and that such testing would be needed before downstream research or therapeutic applications.
Fourth, the number of validated targets is small. Only eight targets were tested, and the broader generalizability of AlphaProteo's performance across the full diversity of protein shapes and surface chemistries is unknown. The seven targets on which it succeeded may not be representative of the hardest cases.
Fifth, the machine learning methods are not disclosed. This makes it impossible for outside researchers to replicate the approach, understand its failure modes at a technical level, or build directly on the architecture. The biosecurity and commercial motivations cited are understandable, but the opacity distinguishes AlphaProteo from most academic work in the field.
Sixth, all binders in this study were designed as standalone proteins of 50 to 140 amino acids. The system was not evaluated on more complex design tasks such as designing binders that function as enzymes, scaffolding proteins into assemblies, or designing within constrained contexts such as membrane-embedded environments.
AlphaProteo was published at a moment of rapid progress in computational protein binder design. The Baker Laboratory's RFdiffusion and ProteinMPNN pipeline, released in 2023, had already improved success rates dramatically over earlier Rosetta-based methods. Chai Discovery (a startup founded by former OpenFold researchers) subsequently reported Chai-2, another closed-source binder design system, in 2025.
A system called BindCraft, developed by Pacesa and colleagues at EPFL, was posted as a preprint in late 2024 and subsequently published in Nature in 2025. It reported success rates ranging from 10% to 100% on various targets and claimed to outperform both AlphaProteo and RFdiffusion on some benchmarks. BindCraft is open-source, which allowed direct replication by external labs. These competing results illustrate that the field was advancing rapidly around the time of AlphaProteo's announcement and that the performance gap was contested.
By mid-2025, further systems including SeedProteo (ByteDance Seed) and Latent-X (an atom-level frontier model) had been reported, continuing the trend of rapid iteration in this subfield.
Within the broader landscape of AI in drug discovery, AlphaProteo is part of a wave of generative AI tools aimed at compressing the lead identification phase of pharmaceutical development. Traditional drug discovery timelines from target identification to clinical candidate selection typically span 3 to 6 years. Computational binder design, if reliable and general enough, could reduce the experimental screening burden substantially.
At the time of the September 2024 announcement, AlphaProteo was not released as open-source software or as a public API. Instead, Google DeepMind launched a trusted tester program for selected research institutions and pharmaceutical partners. Access was provided through Jupyter notebooks hosted in the google-deepmind/ap_trusted_testers repository on GitHub, which provided an interface for preparing input structures, defining hotspot residues, and submitting design requests. The program was intended to let early users validate the system on their own targets while DeepMind gathered feedback and assessed safety considerations.
The AlphaProteo paper lists 32 co-authors. On the Google DeepMind side, key contributors include Vinicius Zambaldi and David La (co-lead authors), Alexander E. Chu, Harshnira Patani, Amy E. Danson, Tristan O. C. Kwan, Thomas Frerix, Rosalia G. Schneider, David Saxton, Ashok Thillaisundaram, Zachary Wu, Eliseo Papa, Gabriella Stanton, Victor Martin, Sukhdeep Singh, Lai H. Wong, Simon A. Kohl, Josh Abramson, Andrew W. Senior, Rob Fergus, and Jue Wang. Demis Hassabis and Pushmeet Kohli are named as senior authors.
From the Francis Crick Institute in London, contributors include Yilmaz Alguel, Mary Y. Wu, Irene M. Aspalter, Katie Bentley, David L. V. Bauer, and Peter Cherepanov. The Crick team was responsible for cell-based assays and cryo-EM structural validation.
Several extensions of AlphaProteo's approach were discussed or implied in the paper and accompanying blog post. Addressing the TNFα failure case through further model development or alternative approaches is an obvious near-term goal. Expanding the set of validated targets to cover a broader range of protein topologies, surface geometries, and disease areas would strengthen confidence in the system's generalizability.
Improving specificity prediction, so that off-target binding can be assessed computationally before laboratory testing, would be important for therapeutic applications. Developing methods that work from lower-quality structural inputs, such as homology models rather than crystal structures, could broaden the set of tractable targets.
The integration of binder design with downstream optimization tasks, such as improving stability, extending serum half-life, or reducing immunogenicity, represents a further frontier. Current AlphaProteo designs are first-round candidates; turning them into clinical-grade therapeutic proteins would require additional engineering steps. Whether AlphaProteo or descendant systems can handle these steps in a single generative pass remains an open question.
The broader question of whether closed-source systems like AlphaProteo or the open-source community around RFdiffusion and BindCraft will produce more durable scientific progress is a live debate. The paper's authors acknowledge the tension but defend the non-disclosure on biosecurity and commercial grounds.