RFdiffusion

Updated Jul 23, 2026

RFdiffusion (short for RoseTTAFold diffusion) is a deep-learning system for de novo protein design developed at the University of Washington Institute for Protein Design (IPD) in David Baker's lab. It is built by fine-tuning the RoseTTAFold structure prediction network as a denoising diffusion probabilistic model that operates on residue rigid-body frames through an SE(3)-equivariant network, generating new protein backbones conditioned on user-supplied design targets such as fold topology, symmetry, binding hotspots, or functional motifs.^[1]^[2] Reported in Nature on 31 August 2023 by Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim and colleagues, the model designed de novo protein binders against five unrelated targets with an estimated 19 percent experimental success rate while testing fewer than 100 designs per target, roughly an order-of-magnitude improvement over previous physics-based and hallucination-based protein design methods.^[1]^[3] RFdiffusion was released under an open-source BSD license through the RosettaCommons GitHub organization and has spawned a family of follow-up tools, including RFdiffusion All-Atom and RFantibody.^[4]^[5]^[6] It belongs to the line of work recognized by the 2024 Nobel Prize in Chemistry, awarded in part to David Baker "for computational protein design," and RFdiffusion is widely regarded as the defining example of how generative diffusion can be repurposed beyond image generation to a high-stakes scientific domain.^[23]^[28]

Infobox

Property	Value
Type	Generative diffusion model for protein backbone design
Developer	Baker Lab, UW Institute for Protein Design, with collaborators at Columbia and MIT
Initial preprint	10 December 2022 (bioRxiv 2022.12.09.519842)^[7]
Code release	30 March 2023 (open source, BSD)^[4]
Peer-reviewed publication	Nature vol. 620, pp. 1089-1100, 31 August 2023^[1]
Backbone architecture	Fine-tuned RoseTTAFold three-track network with SE(3) equivariant attention^[1]^[2]
Sequence design partner	ProteinMPNN, with AlphaFold2 used for in silico filtering^[8]
Binder design success rate	~19% experimental hit rate, <100 designs tested per target^[1]
License	BSD (free for non-profit and commercial use)^[5]
Repository	github.com/RosettaCommons/RFdiffusion^[5]

What is RFdiffusion?

RFdiffusion is a generative artificial-intelligence model that designs entirely new proteins, structures that do not exist in nature, by reversing a noise process: it starts from a random cloud of residue frames and iteratively denoises them into a coherent, foldable protein backbone that satisfies a user's design specification.^[1]^[2] The name combines RoseTTAFold, the structure-prediction network it is fine-tuned from, with diffusion, the generative paradigm behind image models such as Stable Diffusion and DALL-E. In the authors' own framing, the breakthrough was carrying diffusion across that domain gap. The Nature abstract notes that "diffusion models have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence-structure relationships," and reports that fine-tuning RoseTTAFold on denoising tasks "yields a generative model of protein backbones that achieves outstanding performance" across an unusually wide range of design problems.^[1]

Background

De novo protein design, the task of computationally generating an amino-acid sequence that folds to a target structure and performs a desired function, was historically dominated by physics-based search inside the Rosetta modeling suite. The traditional Rosetta pipeline used Monte Carlo sampling against a hand-crafted energy function, then exhaustive experimental screening of tens of thousands of candidates per project, and even then yielded only modest success rates on hard problems such as protein binder design.^[3]^[9]

The launch of AlphaFold in 2020 and RoseTTAFold in 2021 demonstrated that deep neural networks could predict native protein structures with near-experimental accuracy, suggesting that the same networks could be inverted to drive design.^[10]^[11] Two early "structure-prediction-as-design" approaches emerged in the Baker lab: hallucination, which uses gradient descent through the network to optimize a sequence whose predicted structure matches a target, and inpainting (RFjoint), which conditions RoseTTAFold on partial structural inputs and fills in the rest.^[3] Both approaches scaled poorly: hallucination became unstable beyond about 100 residues, and inpainting could not perform unconstrained generation of long, novel structures.^[12]

In parallel, diffusion models led by DDPM (Ho, Jain and Abbeel, 2020) had become the dominant generative paradigm for images, powering systems such as Stable Diffusion, DALL-E, and Midjourney.^[13] The Baker-lab team explicitly drew the analogy: just as a text-to-image diffusion model can be conditioned by a prompt to denoise a random pixel grid into a coherent picture, a structure diffusion model could be conditioned on a protein-design specification to denoise random residue frames into a coherent protein backbone. As the IPD release put it, "drawing inspiration from image generation tools like DALL-E, the team developed RFdiffusion as a guided diffusion model for protein design."^[4]^[15] Several groups, including Trippe et al. on bioRxiv and Yim et al. on SE(3) diffusion, showed that the same denoising framework could be ported to protein backbones expressed as sequences of rigid frames.^[14]

RFdiffusion combined these threads. By fine-tuning RoseTTAFold's pretrained weights as a denoiser, the IPD team reused the structural inductive biases the network had already learned for prediction, while adapting it to the very different task of producing realistic protein geometries from pure noise under user-specified conditioning.^[1]^[7] The development team has described RFdiffusion's design as the moment generative diffusion fully arrived in structural biology, and the broader research literature treats it as the canonical example of repurposing a structure-prediction backbone as a generative model.^[2]^[23]

When was RFdiffusion released?

Date	Event
1 December 2022	IPD posts a blog announcement, "A diffusion model for protein design"^[15]
10 December 2022	Preprint posted on bioRxiv as "Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models"^[7]
30 March 2023	Code released under open-source BSD license on GitHub and via ColabFold^[4]
11 July 2023	Advance Online Publication of the Watson et al. Nature paper^[1]
31 August 2023	Issue date of the Nature paper, vol. 620, pp. 1089-1100^[1]
9 October 2023	RoseTTAFold All-Atom (RFAA) and RFdiffusion All-Atom (RFdiffusionAA) posted on bioRxiv^[16]
18 December 2023	Vázquez Torres et al. report picomolar-affinity peptide binders designed via partial diffusion, published in Nature^[17]
7 March 2024	RoseTTAFold All-Atom published in Science (Krishna et al.)^[6]
9 October 2024	David Baker shares the Nobel Prize in Chemistry "for computational protein design"^[28]

How does RFdiffusion work?

From RoseTTAFold to denoiser

RoseTTAFold is a three-track architecture that exchanges information between a one-dimensional sequence track, a two-dimensional pairwise residue track, and a three-dimensional structural track. During structure prediction, the structure track outputs the rigid frame of each residue, defined by a backbone Cα coordinate and a rotation matrix, derived from the N-Cα-C atoms, that fixes the orientation of the residue in space.^[1]^[11] RFdiffusion keeps this exact frame representation but repurposes the network as the denoising function of a diffusion probabilistic model.^[1]^[2]

In the forward (noising) process, a true protein structure from the Protein Data Bank is gradually corrupted by independent perturbations of each residue's translation and rotation: 3D Gaussian noise on the Cα coordinate, and Brownian motion on the manifold of rotation matrices SO(3) for the orientation.^[1]^[2] After roughly 200 timesteps the structure is reduced to an effectively random cloud of frames.^[1] In the reverse (denoising) process, RFdiffusion is asked, at each timestep, to predict the corresponding noiseless structure given the current noisy frames and any conditioning information. The predicted structure is then mixed back into the noise schedule to produce the input to the next step. Iterating from random noise down to t = 0 yields a fully formed protein backbone.^[1]^[2]
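As a rough illustration of the forward and reverse processes described above, the sketch below applies a DDPM-style schedule to Cα coordinates only. The linear schedule and its endpoint values, the function names, and the `predict_x0` callback standing in for the fine-tuned network are all assumptions made for illustration; rotation noise on SO(3) is omitted, and the update rule used by the released RFdiffusion code may differ.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.07):
    """Linear variance schedule; the endpoint values are illustrative assumptions."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """abar[t] is the product of (1 - beta) over steps 0..t."""
    abar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        abar.append(running)
    return abar


def noise_coords(x0, abar_t, rng):
    """Forward process: jump directly to the noise level given by abar_t."""
    keep, spread = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [tuple(keep * c + spread * rng.gauss(0.0, 1.0) for c in atom) for atom in x0]


def reverse_denoise(x_t, betas, abar, predict_x0, rng):
    """Reverse process: predict the clean structure, then step to a lower noise level."""
    for t in range(len(betas) - 1, -1, -1):
        x0_hat = predict_x0(x_t, t)  # hypothetical stand-in for the network
        abar_prev = abar[t - 1] if t > 0 else 1.0
        alpha_t = 1.0 - betas[t]
        # DDPM posterior: a weighted blend of the prediction and the current state.
        w_x0 = math.sqrt(abar_prev) * betas[t] / (1.0 - abar[t])
        w_xt = math.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar[t])
        sigma = math.sqrt(betas[t] * (1.0 - abar_prev) / (1.0 - abar[t]))
        x_t = [
            tuple(w_x0 * p + w_xt * c + sigma * rng.gauss(0.0, 1.0) for p, c in zip(pred, cur))
            for pred, cur in zip(x0_hat, x_t)
        ]
    return x_t
```

With an oracle `predict_x0` that always returns the true structure, the last step (where the posterior variance is zero) recovers that structure, which is a convenient sanity check on the blending weights.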

SE(3) equivariance and self-conditioning

Because rotating or translating a protein leaves its identity unchanged, the denoiser is built to be SE(3)-equivariant: applying a rigid transformation to the input frames produces exactly the same transformation in the output. RFdiffusion inherits this property from RoseTTAFold's structure track, which is built on an SE(3)-Transformer; the released code uses NVIDIA's SE(3)-Transformer implementation for its equivariant graph operations.^[2]^[5] Translations are perturbed and predicted in Cα coordinates with mean-squared-error loss, while rotations are handled on SO(3).^[1]^[2]
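The equivariance property can be checked numerically: transforming the input and then applying the layer should give the same result as applying the layer and then transforming the output. The sketch below does this for a toy update that pulls each point toward the centroid. The toy layer and all function names are hypothetical and are not part of RoseTTAFold or RFdiffusion; rotations are restricted to the z axis to keep the example short.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def transform(points, angle, shift):
    """Apply a rigid motion: rotate about z, then translate."""
    return [tuple(r + d for r, d in zip(rotate_z(p, angle), shift)) for p in points]


def centroid_pull(points, step=0.5):
    """Toy equivariant layer: move every point part of the way to the centroid."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    return [tuple(c + step * (m - c) for c, m in zip(p, centroid)) for p in points]


def max_equivariance_error(update, points, angle, shift):
    """Largest coordinate gap between update(T(x)) and T(update(x))."""
    moved_then_updated = update(transform(points, angle, shift))
    updated_then_moved = transform(update(points), angle, shift)
    return max(
        abs(u - v)
        for p, q in zip(moved_then_updated, updated_then_moved)
        for u, v in zip(p, q)
    )
```

An update that adds a fixed offset along x, by contrast, commutes with translations but not with rotations, so the same check reports a clearly nonzero error for it.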

A central trick is self-conditioning, inspired by AlphaFold2's "recycling": at each denoising step the network is given not only the noisy frames but also the structure it predicted at the previous step. This dramatically stabilizes long trajectories and is critical for achieving high in silico designability on long monomers.^[1]^[2] The model is fine-tuned from pretrained RoseTTAFold weights rather than trained from scratch, which both shortens training and lets the network exploit structural priors learned from the entire PDB.^[1] Training uses a mean-squared-error loss on Cα coordinates and on the rotation prediction in SO(3), rather than the frame-aligned point error (FAPE) used by AlphaFold2 for structure prediction; the authors found that this simpler objective worked better for the denoising setting and that the resulting model recovered the local stereochemistry of natural proteins (Ramachandran statistics, sensible secondary-structure proportions, well-packed cores) despite never being trained explicitly on those properties.^[1]^[2] Training is performed on roughly 100,000 high-resolution Protein Data Bank chains between 60 and 512 residues, with various data-augmentation schemes including random cropping and motif-conditioning examples.^[2]
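The control flow of self-conditioning can be sketched in a few lines: the prediction from one step is carried forward and passed to the denoiser at the next step. The `denoiser` and `update` callables below are hypothetical placeholders, not the RFdiffusion network or its sampler; the sketch only shows how the previous prediction is threaded through the loop.

```python
def sample_with_self_conditioning(x_start, num_steps, denoiser, update):
    """Run a reverse trajectory, feeding each prediction into the next denoiser call."""
    x_t, previous_x0 = x_start, None  # no prediction exists before the first step
    for t in range(num_steps, 0, -1):
        # The denoiser sees the noisy state and the prediction from the step before.
        x0_hat = denoiser(x_t, t, previous_x0)
        x_t = update(x_t, x0_hat, t)
        previous_x0 = x0_hat
    return x_t


def toy_denoiser(x_t, t, previous_x0):
    """Hypothetical stand-in: average the noisy input with the last prediction."""
    if previous_x0 is None:
        return [0.5 * v for v in x_t]
    return [0.5 * (v + p) for v, p in zip(x_t, previous_x0)]


def toy_update(x_t, x0_hat, t):
    """Hypothetical stand-in: move the state halfway toward the prediction."""
    return [0.5 * (v + p) for v, p in zip(x_t, x0_hat)]
```

A call such as `sample_with_self_conditioning([4.0, -2.0], 10, toy_denoiser, toy_update)` exercises the loop end to end on a two-number toy state.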

What can RFdiffusion design?

Unconditional RFdiffusion just denoises random noise into any plausible monomer. The real power of the system is that the same network supports a wide range of conditioning signals, all expressed by selectively freezing or partially noising features of the input frames or pair representation:^[1]^[2]

Motif scaffolding: hold a small fixed motif (for example a catalytic triad or epitope) and ask the model to design the rest of the protein around it.
Symmetric oligomer design: apply a symmetry operator (cyclic, dihedral, tetrahedral, octahedral, icosahedral) during denoising so that the resulting protein assembly has the requested symmetry.
Binder design: provide a target protein structure and one or more "hotspot" residues, and let RFdiffusion grow a binder against the target.
Fold conditioning: specify secondary-structure elements or a target topology (for example a TIM barrel or NTF2 fold).
Partial diffusion: take a starting design and noise only partway, then denoise back, generating diversified variants of the original.
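To make the symmetric-oligomer case above concrete, the sketch below builds a cyclic (C_n) assembly by rotating one subunit about the z axis, and re-imposes exact symmetry on a perturbed assembly by regenerating every chain from chain 0. Re-symmetrizing in this way is one plausible reading of "applying a symmetry operator during denoising"; it is an assumption here, as are all function names, and the released code may impose symmetry differently.

```python
import math


def rotate_about_z(point, angle):
    """Rotate a 3D point about the z axis."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def cyclic_copies(subunit, n):
    """Return n copies of a subunit related by rotations of 2*pi/n about z."""
    return [
        [rotate_about_z(p, 2.0 * math.pi * k / n) for p in subunit]
        for k in range(n)
    ]


def symmetrize(assembly):
    """Replace every chain with a rotated copy of chain 0, restoring exact C_n symmetry."""
    return cyclic_copies(assembly[0], len(assembly))
```

For example, `cyclic_copies(subunit, 3)` yields a C3 trimer, and calling `symmetrize` after a noisy update discards any drift that has accumulated in chains 1 to n-1.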

The pipeline is two-stage. RFdiffusion outputs a backbone; ProteinMPNN, a separate graph-based sequence design network from the Baker lab, then proposes amino-acid sequences for that backbone; AlphaFold2 or RoseTTAFold predicts the structure of each candidate sequence, and candidates whose predicted structure does not match the design model are filtered out.^[1]^[8] This sequence design and prediction loop has become the de facto standard "RFdiffusion plus ProteinMPNN plus AF2" workflow used across the field.^[8]^[9]
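The filtering stage of this workflow can be sketched as a simple selection over scored candidates. The `Candidate` record, the field names, and the default thresholds (2.0 Å self-consistency RMSD, pLDDT of 80) are illustrative assumptions rather than values taken from the paper; the scores themselves would come from an external structure predictor.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    backbone_id: str   # which generated backbone the sequence was designed for
    sequence: str      # designed amino-acid sequence
    sc_rmsd: float     # RMSD in Å between the design model and the predicted structure
    plddt: float       # predictor confidence on a 0-100 scale


def passes_filter(candidate, max_rmsd=2.0, min_plddt=80.0):
    """True when the prediction agrees with the design and is confident."""
    return candidate.sc_rmsd <= max_rmsd and candidate.plddt >= min_plddt


def select_designs(candidates, **thresholds):
    """Keep, for each backbone, the passing sequence with the lowest RMSD."""
    best = {}
    for candidate in candidates:
        if not passes_filter(candidate, **thresholds):
            continue
        current = best.get(candidate.backbone_id)
        if current is None or candidate.sc_rmsd < current.sc_rmsd:
            best[candidate.backbone_id] = candidate
    return best
```

Backbones for which no sequence passes simply drop out of the result, which mirrors how undesignable backbones are discarded before any wet-lab testing.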

How well does RFdiffusion perform experimentally?

The Nature paper accompanies its computational claims with a substantial wet-lab validation campaign, in which hundreds of designs across multiple problem classes were expressed and characterized.^[1]^[3]

Unconditional monomers

RFdiffusion produces diverse monomeric structures from roughly 50 to 600 residues, with AlphaFold2 self-consistency RMSDs of about 0.5 to 1.7 Å up to 400 residues, after which AlphaFold2 agreement begins to deteriorate.^[1]^[18] Generation is fast: a 100-residue protein can be generated in about 11 seconds on an NVIDIA RTX A4000 GPU, compared to about 8.5 minutes for RoseTTAFold hallucination at the same length.^[18]

Motif scaffolding

On a published benchmark of 25 motif-scaffolding problems, RFdiffusion solved 23, including challenging tasks such as scaffolding the catalytic site of a retroaldolase and the binding loops of complex epitopes.^[1] Designed proteins built around immunogenic motifs were experimentally shown to recapitulate the structure of the motif within near-atomic accuracy.^[1]

Symmetric oligomers

RFdiffusion produced cyclic (Cn), dihedral (Dn), tetrahedral, octahedral, and icosahedral assemblies that matched their designed symmetry by negative-stain and cryo-electron microscopy in dozens of independent cases.^[1]^[3] Symmetric metal-binding cages with C4 and other symmetries were designed to coordinate ions such as Ni2+ with low-nanomolar dissociation constants.^[1]

Binder design

The most discussed result is protein binder design. RFdiffusion was used to generate de novo binders against multiple unrelated targets, including influenza hemagglutinin (HA), interleukin-7 receptor alpha (IL-7Rα), insulin receptor (InsR), PD-L1, and tropomyosin receptor kinase A (TrkA).^[1] Across these targets, experimental hit rates from a single design pool reached approximately 19 percent for some campaigns while testing fewer than 100 designs per target, roughly a tenfold improvement over the prior Cao et al. Rosetta-based binder design pipeline; measured affinities ranged from nanomolar to picomolar.^[1]^[3]^[9] A cryo-EM structure of an RFdiffusion-designed HA binder bound to influenza hemagglutinin matched the design model within 0.63 Å backbone RMSD, confirming atomic-level accuracy.^[1]^[3]

In binder mode, the user provides the target structure, optional hotspot residues, and a desired length for the binder. RFdiffusion then grows a binder backbone that contacts the chosen hotspots. The IPD team highlighted the practical implication: with prior Rosetta-based binder design, researchers typically had to express and test tens of thousands of designs per campaign to discover a single confirmed binder, whereas the RFdiffusion pipeline could deliver experimentally validated binders from pools as small as 96 to a few hundred designs.^[3] This compression of the design-build-test loop is one of the central reasons RFdiffusion is described in popular coverage as transforming protein engineering rather than merely improving it.^[23]^[26]
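The practical effect of the hit rate can be worked through with basic probability. The sketch below treats each tested design as an independent trial with a fixed success probability, which is a simplifying assumption; the 1.9 percent comparison rate is a hypothetical value chosen to represent "roughly tenfold lower" and is not a figure from the paper.

```python
import math


def expected_hits(hit_rate, num_designs):
    """Expected number of binders in a pool, assuming independent trials."""
    return hit_rate * num_designs


def prob_at_least_one_hit(hit_rate, num_designs):
    """Chance that a pool of the given size contains at least one binder."""
    return 1.0 - (1.0 - hit_rate) ** num_designs


def designs_needed(hit_rate, target_prob=0.95):
    """Smallest pool size whose chance of at least one hit reaches target_prob."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - hit_rate))


# Reported rate versus a hypothetical tenfold-lower rate, for a 96-design pool.
for rate in (0.19, 0.019):
    print(rate, expected_hits(rate, 96), prob_at_least_one_hit(rate, 96), designs_needed(rate))
```

Under these assumptions a 96-design pool at a 19 percent hit rate is expected to contain about 18 binders, and 15 designs already give a 95 percent chance of at least one hit, whereas the tenfold-lower rate would need 157 designs for the same chance.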

Enzyme active-site scaffolding

The paper demonstrates that RFdiffusion can scaffold catalytic constellations of residues for multiple enzyme classes, holding the spatial geometry of the active site fixed while designing a stable surrounding fold.^[1] This established the foundation for follow-up work on de novo enzyme design with subsequent RFdiffusion variants, and dovetails with parallel efforts on enzyme design that leveraged the underlying RoseTTAFold network for inverse-folding tasks.^[19]

Designability and novelty

The authors report that RFdiffusion designs are highly designable (a high fraction of generated backbones admit at least one sequence whose predicted structure agrees with the design) and at the same time produce backbones that are dissimilar to anything in the PDB by TM-score, indicating genuine de novo generation rather than retrieval or paraphrase of known structures.^[1]^[18] On the unconditional generation benchmark, designability at length 100 reaches scTM scores around 0.97 and remains above 0.9 even at length 500, while the highest TM-score to any PDB structure decreases monotonically as length increases (that is, novelty rises with length), consistent with the intuition that longer designs explore more novel topologies.^[2]^[18]
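Both metrics reduce to short aggregations once the TM-scores are available. In the sketch below, a backbone counts as designable when at least one of its designed sequences has a self-consistency TM-score above a threshold, and novelty is summarized by the closest PDB match, taken here to be the maximum TM-score over reference structures (lower means more novel). The 0.5 threshold, the input layout, and the function names are assumptions for illustration; the TM-scores themselves would come from an external alignment tool.

```python
def designable_fraction(sc_tm_by_backbone, threshold=0.5):
    """Fraction of backbones with at least one sequence whose scTM exceeds the threshold."""
    if not sc_tm_by_backbone:
        return 0.0
    hits = sum(
        1
        for scores in sc_tm_by_backbone.values()
        if any(score > threshold for score in scores)
    )
    return hits / len(sc_tm_by_backbone)


def nearest_pdb_tm(tm_scores_to_pdb):
    """TM-score of the closest known structure; a lower value means a more novel design."""
    return max(tm_scores_to_pdb)
```

Reporting the two numbers together guards against a degenerate model that scores well on designability only by reproducing known folds.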

What are the RFdiffusion variants and follow-ups?

RFdiffusion is best understood as the founding member of a family of related Baker-lab generative models.

Bennett et al. 2023: deep learning improves binder design tenfold

A companion paper by Nathaniel R. Bennett, Brian Coventry, Inna Goreshnik and colleagues, "Improving de novo protein binder design with deep learning" (Nature Communications, 6 May 2023), showed that adding AlphaFold2 or RoseTTAFold post-hoc filters and ProteinMPNN sequence design on top of an otherwise Rosetta-style binder pipeline already raises experimental success rates roughly tenfold.^[9] This work supplied much of the validation methodology used in the RFdiffusion Nature paper.

Vázquez Torres et al. 2023: high-affinity peptide binders

Vázquez Torres, Leung, Venkatesh and collaborators reported, also in Nature in December 2023, the use of RFdiffusion (with partial-diffusion refinement) to generate de novo binders against intrinsically disordered bioactive peptides such as parathyroid hormone and glucagon, reaching picomolar affinities.^[17]

RoseTTAFold All-Atom and RFdiffusion All-Atom

Limitations of the original RFdiffusion (it sees only the protein backbone, not ligands, ions, nucleic acids, or post-translational modifications) motivated an "all-atom" successor. Krishna, Wang, Ahern and colleagues, "Generalized biomolecular modeling and design with RoseTTAFold All-Atom" (Science vol. 384, 7 March 2024), introduced RoseTTAFold All-Atom (RFAA), which models entire biological assemblies including small molecules, metals, DNA, RNA and covalent modifications, and RFdiffusion All-Atom (RFdiffusionAA), which extends the RFdiffusion design framework to grow new protein binders around small-molecule targets.^[6]^[16] The team experimentally validated proteins that bind the cardiac drug digoxigenin, the cofactor heme, and bilin chromophores relevant to engineered photosynthesis.^[6]^[16]

RFdiffusion2 and RFdiffusion3

Subsequent versions, RFdiffusion2 (inference code released September 2025) and RFdiffusion3 (announced December 2025), generalize the all-atom design framework further. They report improved enzyme design from sequence-level prompts, as well as designed DNA-binding proteins.^[20]^[21]

RFantibody

RFantibody, a Baker-lab system that fine-tunes the RFdiffusion machinery on antibody Fv complexes, was published in Nature in late 2025 and represents the first reported atomically accurate de novo design of antibodies entirely from scratch.^[22]

What is RFdiffusion used for?

RFdiffusion has rapidly become a default tool inside academic and industrial protein engineering. In the period following the paper's publication in 2023, multiple Baker-lab spinouts (including Xaira Therapeutics, Vilya, A-Alpha Bio, and others affiliated with the IPD) began integrating RFdiffusion or its derivatives into their internal therapeutic-discovery pipelines, and use spread quickly into academic groups outside Seattle. As of 2024, the GitHub repository was among the most-starred biology-oriented machine-learning projects on the platform, and NVIDIA, Microsoft, and other cloud vendors began offering hosted versions through their managed scientific-AI services.^[4]^[23]^[24]

Therapeutic protein binders: nanomolar to picomolar binders against cytokine receptors, viral surface proteins (influenza HA), checkpoint molecules (PD-L1), and disordered peptide hormones, intended as leads for biologics and diagnostics.^[1]^[17]
Vaccine and immunogen design: scaffolding of viral epitopes onto stable de novo carriers, an area where motif scaffolding is central.^[1]
Symmetric nanomaterials and protein cages: synthetic capsids and metal-binding cages with controlled symmetry, validated by electron microscopy.^[1]^[3]
Enzyme design: positioning catalytic residues on stable scaffolds, including the prompt-driven enzyme design pipelines reported with RFdiffusion2 and RFdiffusion3.^[1]^[20]
AI drug discovery pipelines that combine structure prediction with generative modeling for binder optimization, used by startups and pharma groups around the Baker lab and its commercial spinouts.^[4]^[23]
Cloud and platform deployments: NVIDIA distributes an RFdiffusion model as part of its BioNeMo NIM microservice catalog, lowering the entry barrier for industrial users.^[24]

What are RFdiffusion's limitations?

RFdiffusion is powerful but not magic, and several limitations are well documented in both the paper and follow-up work.

Backbone-only generation: the original RFdiffusion sees no side chains, ligands, ions, or nucleic acids; getting around this required the all-atom successors.^[6]^[16]
Two-stage pipeline: structure and sequence are designed sequentially (RFdiffusion plus ProteinMPNN plus AF2), which can produce backbones that no sequence folds to. ProteinMPNN and AlphaFold2 filtering are essential, and the workflow remains imperfect.^[1]^[8]
Variable success across targets: while binder design against HA, IL-7Rα and a few other targets achieved roughly 19 percent experimental hit rates, performance on harder targets is much lower, and independent benchmarks have reported sub-percent functional hit rates on some biochemical-detection targets.^[25]
Reliance on PDB-like training data: membrane proteins, very long monomers (beyond about 400 residues), non-canonical amino acids, and highly flexible regions are underrepresented in training and remain weaknesses.^[2]^[18]
Reproducibility caveats: like many large neural design tools, exact reproduction of paper-level results depends on specific hyperparameter settings, large GPU resources, and the surrounding ProteinMPNN and AlphaFold2 pipelines.^[2]
Biosecurity considerations: the ability to generate, on demand, binders against arbitrary proteins has prompted ongoing discussion in the protein-design community about responsible release and dual-use risk, including a 2023 voluntary commitment by Baker-lab leadership to safety practices.^[23]

How does RFdiffusion compare with related approaches?

Method	Year	Approach	Notes
Rosetta-based design	1990s onward	Physics-based Monte Carlo over fragments	Workhorse before deep learning; very large screening libraries^[3]
RoseTTAFold hallucination	2021-2022	Gradient descent through prediction net	Effective up to about 100 residues^[3]
RFjoint inpainting	2022	Conditional fill-in of structure with RoseTTAFold	Strong on partial-structure tasks but not unconstrained generation^[3]
Chroma (Generate Biomedicines)	2023	Programmable diffusion model for proteins	Concurrent with RFdiffusion^[23]
FrameDiff / SE(3) diffusion (Yim et al.)	2023	SE(3) diffusion trained from scratch	Inspired RFdiffusion's noise schedule^[14]
RFdiffusion	2023	RoseTTAFold fine-tuned as SE(3) denoiser	Roughly 10x experimental improvement over prior Baker-lab pipelines^[1]^[3]
RFdiffusion All-Atom	2024	All-atom extension with small molecules and cofactors	Protein-ligand binder design^[6]
RFantibody	2025	Antibody-specialized RFdiffusion	First atomically accurate de novo antibody design^[22]

How was RFdiffusion received?

The Nature paper attracted broad media attention. A Nature news article published alongside the paper's online release, "AI tools are designing entirely new proteins that could transform medicine" (12 July 2023), described diffusion-based protein design as representing an "explosion in capabilities" and surveyed work from the Baker lab and competing groups, quoting Gevorg Grigoryan of Generate Biomedicines and Mohammed AlQuraishi at Columbia among others.^[26] Quanta Magazine's feature "How AI Revolutionized Protein Science, but Didn't End It" (Yasemin Saplakoglu, 26 June 2024) likewise placed RFdiffusion at the center of the post-AlphaFold wave of generative design, quoting David Baker and others on the shift from prediction to programmable design.^[23] Trade publications such as Chemical & Engineering News covered the broader rise of generative protein design and the role of RFdiffusion within it.^[27]

On 9 October 2024, David Baker was awarded one half of the Nobel Prize in Chemistry "for computational protein design," with the other half shared by Demis Hassabis and John Jumper of Google DeepMind for protein structure prediction. The Nobel Committee said Baker "has succeeded with the almost impossible feat of building entirely new kinds of proteins," the line of work that produced RFdiffusion, and Baker responded that "we have entered an era where we can not only understand biological systems but also create new ones."^[28]

The community impact is also visible in the academic literature. RFdiffusion accumulated thousands of citations within the first two years of publication and became a baseline against which essentially every subsequent generative protein-design method is compared, including flow-matching variants, sparse-denoising models, antibody-specific systems, and combined sequence-structure co-design frameworks.^[2] Critics have noted that the reliance on a single hand-engineered pipeline (RFdiffusion plus ProteinMPNN plus AlphaFold2 filtering) can mask the limits of each component, and that head-to-head benchmarks on functional binder design have at times favored newer AlphaFold-driven design methods such as BindCraft for certain target classes.^[25] Despite this, RFdiffusion remains the canonical reference point for diffusion-based protein generation and the entry point for most newcomers to deep-learning-based protein design.^[23]

ELI5: explain RFdiffusion like I'm five

Imagine a sculptor who starts with a shapeless lump of clay and slowly carves it into a statue. RFdiffusion does the same thing with proteins, the tiny machines that run living cells. It begins with a random, jumbled blob of atoms and, step by step, cleans up the mess until a brand-new, useful protein appears, one that has never existed in nature. You can even give it instructions, like "make something that sticks to this virus" or "make a perfectly symmetric cage," and it builds a protein to match. It learned how to do this by first studying how real proteins fold (using a tool called RoseTTAFold), which is why the new shapes it dreams up actually hold together in the lab.

References

1. Watson, J. L.; Juergens, D.; Bennett, N. R.; Trippe, B. L.; Yim, J.; Eisenach, H. E.; Ahern, W.; et al., "De novo design of protein structure and function with RFdiffusion", *Nature* 620(7976):1089-1100, 2023-08-31. https://www.nature.com/articles/s41586-023-06415-8. Accessed 2026-06-25.
2. EmergentMind, "RFDiffusion: SE(3)-Equivariant Protein Design", topic page, 2026. https://www.emergentmind.com/topics/rfdiffusion. Accessed 2026-06-25.
3. Baker Lab, "RFdiffusion: A generative model for protein design", bakerlab.org, 2023-07-11. https://www.bakerlab.org/2023/07/11/diffusion-model-for-protein-design/. Accessed 2026-06-25.
4. Institute for Protein Design, "RFdiffusion now free and open source", ipd.uw.edu, 2023-03-30. https://www.ipd.uw.edu/2023/03/rf-diffusion-now-free-and-open-source/. Accessed 2026-06-25.
5. RosettaCommons, "RFdiffusion (Code for running RFdiffusion)", GitHub repository README, 2023 onward. https://github.com/RosettaCommons/RFdiffusion. Accessed 2026-06-25.
6. Krishna, R.; Wang, J.; Ahern, W.; et al., "Generalized biomolecular modeling and design with RoseTTAFold All-Atom", *Science* 384(6693), 2024-03-07. https://www.science.org/doi/10.1126/science.adl2528. Accessed 2026-06-25.
7. Watson, J. L.; Juergens, D.; Bennett, N. R.; et al., "Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models", bioRxiv 2022.12.09.519842, 2022-12-10. https://www.biorxiv.org/content/10.1101/2022.12.09.519842v1. Accessed 2026-06-25.
8. Ranomics, "ProteinMPNN: Sequence Design Explained", ranomics.com, 2024. https://www.ranomics.com/resource-hub/proteinmpnn-sequence-design-explained. Accessed 2026-06-25.
9. Bennett, N. R.; Coventry, B.; Goreshnik, I.; Huang, B.; Allen, A.; Vafeados, D.; et al., "Improving de novo protein binder design with deep learning", *Nature Communications* 14:2625, 2023-05-06. https://www.nature.com/articles/s41467-023-38328-5. Accessed 2026-06-25.
10. Jumper, J.; Evans, R.; Pritzel, A.; et al., "Highly accurate protein structure prediction with AlphaFold", *Nature* 596:583-589, 2021-07-15. https://www.nature.com/articles/s41586-021-03819-2. Accessed 2026-06-25.
11. Baek, M.; DiMaio, F.; Anishchenko, I.; et al., "Accurate prediction of protein structures and interactions using a three-track neural network", *Science* 373(6557):871-876, 2021-08-19. https://www.science.org/doi/10.1126/science.abj8754. Accessed 2026-06-25.
12. Institute for Protein Design, "A diffusion model for protein design", ipd.uw.edu, 2022-12-01. https://www.ipd.uw.edu/2022/12/a-diffusion-model-for-protein-design/. Accessed 2026-06-25.
13. Ho, J.; Jain, A.; Abbeel, P., "Denoising Diffusion Probabilistic Models", arXiv:2006.11239, 2020-06-19. https://arxiv.org/abs/2006.11239. Accessed 2026-06-25.
14. Yim, J.; Trippe, B. L.; De Bortoli, V.; Mathieu, E.; Doucet, A.; Barzilay, R.; Jaakkola, T., "SE(3) diffusion model with application to protein backbone generation", arXiv:2302.02277, 2023-02-05. https://arxiv.org/abs/2302.02277. Accessed 2026-06-25.
15. Institute for Protein Design, "A diffusion model for protein design", ipd.uw.edu, 2022-12-01. https://www.ipd.uw.edu/2022/12/a-diffusion-model-for-protein-design/. Accessed 2026-06-25.
16. Institute for Protein Design, "Introducing All-Atom versions of RoseTTAFold and RFdiffusion", ipd.uw.edu, 2023-10-30. https://www.ipd.uw.edu/2023/10/introducing-rosettafold-and-rfdiffusion-all-atom/. Accessed 2026-06-25.
17. Vázquez Torres, S.; Leung, P. J. Y.; Venkatesh, P.; et al., "De novo design of high-affinity binders of bioactive helical peptides", *Nature* 626:435-442, 2023-12-18. https://www.nature.com/articles/s41586-023-06953-1. Accessed 2026-06-25.
18. CBIRT, "Unveiling the Future of Protein Design: RFdiffusion's De Novo Revolution", cbirt.net, 2023. https://cbirt.net/unveiling-the-future-of-protein-design-rfdiffusions-de-novo-revolution/. Accessed 2026-06-25.
19. Baker Lab, "Introducing All-Atom versions of RoseTTAFold and RFdiffusion", bakerlab.org, 2023-10-30. https://www.bakerlab.org/2023/10/30/introducing-all-atom-versions-of-rosettafold-and-rfdiffusion/. Accessed 2026-06-25.
20. RosettaCommons, "RFdiffusion2 is Now Available On GitHub", rosettacommons.org, 2025-09-15. https://rosettacommons.org/2025/09/15/rfdiffusion2-is-now-available-on-github/. Accessed 2026-06-25.
21. Institute for Protein Design, "RFdiffusion3 now available", ipd.uw.edu, 2025-12. https://www.ipd.uw.edu/2025/12/rfdiffusion3-now-available/. Accessed 2026-06-25.
22. Institute for Protein Design, "Teaching AI to build antibodies from scratch (RFantibody in Nature)", ipd.uw.edu, 2025-11. https://www.ipd.uw.edu/2025/11/rfantibody-in-nature/. Accessed 2026-06-25.
23. Saplakoglu, Y., "How AI Revolutionized Protein Science, but Didn't End It", *Quanta Magazine*, 2024-06-26. https://www.quantamagazine.org/how-ai-revolutionized-protein-science-but-didnt-end-it-20240626/. Accessed 2026-06-25.
24. NVIDIA, "rfdiffusion Model by IPD", NVIDIA NIM model card, build.nvidia.com, 2024. https://build.nvidia.com/ipd/rfdiffusion/modelcard. Accessed 2026-06-25.
25. bioRxiv, "RFdiffusion Exhibits Low Success Rate in De Novo Design of Functional Protein Binders for Biochemical Detection", preprint 2025.02.07.636769, 2025. https://www.biorxiv.org/content/10.1101/2025.02.07.636769. Accessed 2026-06-25.
26. Institute for Protein Design (summarizing *Nature* news), "Nature: AI tools are designing entirely new proteins that could transform medicine", ipd.uw.edu, 2023-07-12. https://www.ipd.uw.edu/2023/07/ai-tools-are-designing-entirely-new-proteins-that-could-transform-medicine/. Accessed 2026-06-25.
27. Arnaud, C. H., "Generative AI is dreaming up new proteins", *Chemical & Engineering News* 101(12), 2023. https://cen.acs.org/physical-chemistry/protein-folding/Generative-AI-dreaming-new-proteins/101/i12. Accessed 2026-06-25.
28. NobelPrize.org, "The Nobel Prize in Chemistry 2024 (David Baker, 'for computational protein design')", nobelprize.org, 2024-10-09. https://www.nobelprize.org/prizes/chemistry/2024/press-release/. Accessed 2026-06-25.
