ESMFold

Updated Jun 25, 2026

ESMFold is a protein structure prediction model from Meta AI's Fundamental AI Research (FAIR) team that predicts the three-dimensional atomic structure of a protein directly from a single amino acid sequence, without the multiple sequence alignment (MSA) search that AlphaFold 2 depends on. ^[1]^[2] It is built on the ESM-2 protein language model (scaled up to 15 billion parameters) and is reported to be up to 60 times faster than prior state-of-the-art folders while maintaining competitive accuracy. ^[1]^[3] Meta used ESMFold to build the ESM Metagenomic Atlas, an open database of more than 617 million predicted protein structures released in 2022. ^[1]^[3]

ESMFold is an end-to-end protein structure prediction model developed by the Meta AI Fundamental AI Research (FAIR) Protein Team that infers atomic-level three-dimensional structures directly from a single amino acid sequence using the ESM-2 protein masked language model as a backbone. ^[1]^[2] Unlike earlier high-accuracy folders such as AlphaFold 2 and RoseTTAFold, ESMFold does not perform a multiple sequence alignment (MSA) search at inference time and does not require an external sequence or template database; structural information is extracted from the embeddings produced by ESM-2, which was pretrained on tens of millions of natural protein sequences from UniRef. ^[1]^[2]^[3] On benchmark sets, the system reaches accuracies competitive with single-sequence variants of AlphaFold 2 while running roughly an order of magnitude faster on a comparable GPU. ^[1]^[4] Meta released a 15-billion-parameter ESM-2 checkpoint together with ESMFold and used the model to fold more than 617 million metagenomic proteins, which were collected into the publicly accessible ESM Metagenomic Atlas. ^[3]^[5]

Infobox

Attribute	Value
Type	Single-sequence protein structure prediction model
Developer	Meta AI FAIR Protein Team
Lead authors	Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Alexander Rives, et al.
Backbone	ESM-2 transformer protein language model (8M to 15B parameters)
Folding head	Folding Trunk (48 blocks) plus Structure Module with Invariant Point Attention
Training data	UniRef50 / UR50/D (ESM-2 pretraining) and Protein Data Bank (folding head)
Pretraining objective	Masked language modeling on amino-acid sequences
Initial preprint	bioRxiv, 21 July 2022 (10.1101/2022.07.20.500902) ^[2]
Journal publication	Science 379(6637):1123-1130, 16 March 2023 (DOI 10.1126/science.ade2574) ^[1]
ESM Metagenomic Atlas launch	1 November 2022 ^[3]
Production checkpoint	facebook/esmfold_v1 on Hugging Face ^[6]
License	MIT (model code), CC BY 4.0 (Atlas data) ^[4]^[6]

What is ESMFold?

ESMFold predicts the full atomic-level structure of a protein from its raw single-letter amino acid sequence, with no MSA, no templates, and no external homology search at inference time. ^[1]^[4] The model has two parts: the ESM-2 protein language model, which produces per-residue embeddings and attention maps from the sequence, and a "folding head" that turns those representations into 3D coordinates and per-residue confidence scores. ^[1]^[2] The central scientific claim of the work is that structure is a learned property of a large language model trained only on sequences: as the authors put it, "As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations." ^[1]

The Hugging Face transformers documentation describes the design as follows: ESMFold "relies on the token embeddings from the large pre-trained protein language model stem and does not perform a multiple sequence alignment (MSA) step at inference time, which means that ESMFold checkpoints are fully 'standalone' and do not require a database of known protein sequences and structures with associated external query tools." ^[8]

Background

The use of unsupervised neural networks for protein sequences was pioneered by Alexander Rives and collaborators at FAIR, who in 2019 posted a preprint titled "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences," eventually published in PNAS in 2021. ^[7] That work introduced the ESM-1 family of transformer encoders, trained with a masked language modeling objective on UniParc-derived sequence sets, and showed that representations learned without any structural supervision encoded secondary structure, tertiary contacts and remote homology relationships. ^[7] The successor ESM-1b model (33 layers, 650M parameters) used the same masked objective on UniRef50 and became a widely cited baseline for protein representation learning. ^[4]^[7]

The protein structure prediction landscape was reshaped in 2020 to 2021 by DeepMind's AlphaFold 2, which combined deep evolutionary information from a multiple sequence alignment with a transformer-style "EvoFormer" trunk and an SE(3)-equivariant structure module to reach atomic accuracy at the 14th Critical Assessment of protein Structure Prediction (CASP14). ^[1]^[8] RoseTTAFold from David Baker's lab followed shortly afterward with a related three-track architecture. ^[8] Both models rely heavily on MSAs and templates, which are expensive to build and become unreliable for orphan sequences or rapidly evolving viral proteins. ^[8]^[9]

ESMFold was first announced through a bioRxiv preprint posted on 21 July 2022 by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido and Alexander Rives under the title "Language models of protein sequences at the scale of evolution enable accurate structure prediction." ^[2] A second version was posted in October 2022 expanding the analysis to metagenomic sequences; the title was changed to "Evolutionary-scale prediction of atomic level protein structure with a language model" for the Science submission. ^[2] The peer-reviewed paper appeared in Science volume 379, issue 6637, pages 1123 to 1130 on 16 March 2023, with the additional co-authors Nikita Smetanin, Robert Verkuil, Ori Kabeli and Yaniv Shmueli listed on the journal version. ^[1] On 1 November 2022 Meta AI's blog announced the ESM Metagenomic Atlas, releasing more than 600 million predicted structures along with the ESM-2 and ESMFold weights and code. ^[3]

In the weeks following the bioRxiv release, the Hugging Face team ported the ESMFold inference path into the transformers library, simplifying the original implementation (which depended on OpenFold, PyTorch Geometric and several Meta-internal utilities) into a stand-alone EsmForProteinFolding class that can be invoked through standard transformers conventions. ^[4]^[8] The Hugging Face blog and the transformers documentation became the most popular entry points for new ESMFold users, particularly in conjunction with Google Colab notebooks that wrapped a single-GPU inference loop. By the end of 2023, ESMFold had become one of the most-downloaded biology-specific models on the Hugging Face Hub and was integrated into a wide range of downstream applications. ^[6]^[8]

After Meta wound down its dedicated protein team in 2023, several core authors, including Rives, Lin and Hie, co-founded the New York startup EvolutionaryScale, which released ESM3 in June 2024 as a multimodal generative successor. ^[10] The Meta-published ESMFold weights remain freely available on GitHub and Hugging Face. ^[4]^[6]

How does ESMFold work?

Two-stage design

ESMFold is composed of two stacked modules trained largely independently. The first is the ESM-2 transformer language model, pretrained from scratch on raw amino-acid sequences with a masked language modeling objective. The second is a "folding head" that consumes the per-residue embeddings and attention maps from the frozen or lightly fine-tuned ESM-2 stem and outputs atomic coordinates and confidence scores. ^[1]^[2] Unlike AlphaFold 2, no MSA features, templates or external homology search are used at inference time; the input is the raw single-letter amino-acid string of one chain. ^[1]^[4]

ESM-2 protein language model

ESM-2 is an encoder-only Transformer with the same general topology as BERT but applied to protein sequences. The paper trained a family of six model sizes that span more than three orders of magnitude in parameters; the Meta GitHub repository and the Hugging Face transformers integration list them as follows: ^[4]

Model	Parameters	Layers	Hidden dim
ESM-2 (8M)	7.5M	6	320
ESM-2 (35M)	35M	12	480
ESM-2 (150M)	150M	30	640
ESM-2 (650M)	650M	33	1280
ESM-2 (3B)	3B	36	2560
ESM-2 (15B)	15B	48	5120
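As a rough sanity check on the table above, the parameter count of a standard Transformer encoder stack can be approximated as 12 × layers × d². The sketch below assumes a 4x feed-forward expansion and ignores embeddings, biases and layer norms; these are assumptions of the example, not figures taken from the ESM-2 paper, and `approx_params` is a hypothetical helper name.

```python
# Rough parameter estimate for a Transformer encoder stack.
# Assumption: each layer holds ~4*d^2 attention weights (Q, K, V, output)
# and ~8*d^2 feed-forward weights (4x expansion), i.e. ~12*d^2 in total.
# Embeddings, biases and layer norms are ignored.

# (layers, hidden dim) copied from the table above.
ESM2_CONFIGS = {
    "ESM-2 (8M)": (6, 320),
    "ESM-2 (35M)": (12, 480),
    "ESM-2 (150M)": (30, 640),
    "ESM-2 (650M)": (33, 1280),
    "ESM-2 (3B)": (36, 2560),
    "ESM-2 (15B)": (48, 5120),
}


def approx_params(layers: int, hidden_dim: int) -> int:
    """Approximate weight count of the encoder layers only."""
    return 12 * layers * hidden_dim ** 2


for name, (layers, hidden_dim) in ESM2_CONFIGS.items():
    estimate = approx_params(layers, hidden_dim)
    print(f"{name}: ~{estimate / 1e6:,.0f}M parameters")
```

Under these assumptions the estimates land close to the nominal sizes in the table, which suggests the listed layer counts and hidden dimensions are mutually consistent.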

All variants were pretrained on UniRef50, release 2021_04 (specifically the UR50/D variant that samples sequences from UniRef100 uniformly across UniRef50 clusters), comprising roughly 65 million unique sequences and around 48 million distinct cluster representatives. ^[4]^[11] Training used random masking of 15% of residue positions with a BERT-style masked language modeling loss, in which the model must reconstruct the masked amino acids conditioned on the surrounding sequence. ^[11] Compared with ESM-1b, the 2022 architecture replaces learned absolute positional embeddings with rotary position embedding (RoPE), allowing longer sequences and improved length generalization. ^[4]
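The masking step of the pretraining objective can be illustrated with a minimal sketch. It selects 15% of positions at random and replaces each with a mask token; the function name, the `<mask>` string and the choice to replace every selected position with the mask token are simplifying assumptions, since BERT-style recipes often leave some selected positions unchanged or swap in random tokens.

```python
import random

MASK_TOKEN = "<mask>"  # placeholder token name, assumed for illustration


def mask_sequence(sequence, mask_fraction=0.15, seed=None):
    """Mask a random subset of residues.

    Returns (tokens, targets), where targets maps each masked position
    to the original residue the model would be trained to reconstruct.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    n_mask = max(1, round(mask_fraction * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        tokens[pos] = MASK_TOKEN
    return tokens, targets


tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", seed=0)
print(tokens)
print(targets)
```

During training, the loss would be a cross-entropy between the model's predicted residue distribution and the entries of `targets`, evaluated only at the masked positions.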

Beyond raw perplexity, the central empirical finding of the ESM-2 study is that as the model is scaled from 8M up to 15B parameters, the structural information accessible from its representations grows monotonically: linear probes on attention maps recover progressively sharper contact predictions, and downstream folding accuracy rises with model size, mirroring the scaling behavior established for natural-language transformers. ^[1]^[2] The 15-billion-parameter ESM-2 was the largest publicly released protein language model at the time of publication; Meta described it as "the largest language model of proteins to date." ^[1]^[3]

Folding Trunk

The folding head consumes two tensors derived from the ESM-2 forward pass: a sequence representation of size (L, d_s) built from the final hidden states and a pairwise representation of size (L, L, d_p) initialized from the model's attention maps across heads and layers. These representations are fed into a "Folding Trunk" that consists of 48 stacked folding blocks. Each block applies axial-style sequence and pair attention plus updates that allow information to flow between the two tracks, analogous to a single-sequence simplification of the AlphaFold 2 EvoFormer that drops the MSA dimension and the row-wise gated MSA attention. ^[1]^[4]^[12]

The trunk is run multiple times in a recycling loop, with the default ESMFold v1 configuration using up to four recycles in which the trunk's output structure and pair representation are fed back as additional inputs. After the final recycle, the pair representation is passed to a structure module based on Invariant Point Attention (IPA), the SE(3)-equivariant attention introduced by AlphaFold 2 that operates on rigid-body frames per residue and emits up to 14 atom coordinates per amino acid, covering backbone and side-chain heavy atoms. ^[12] The structure module also predicts a per-residue pLDDT confidence score using the same network head and confidence calibration as AlphaFold 2. ^[1]^[12]

The published ESMFold checkpoints (v0 and v1) pair the 3-billion-parameter ESM-2 stem with the 48-block trunk; ablation experiments in the paper that train only the structure module on top of each of the six ESM-2 sizes (0 trunk blocks) demonstrate how downstream folding accuracy improves smoothly with backbone scale. ^[1]^[12]

Training of the folding head

The Folding Trunk and structure module were trained on a curated subset of the Protein Data Bank with a training cutoff of May 2020 to allow fair benchmarking against AlphaFold 2 on CASP14 targets and CAMEO assessments collected after that date. ^[1]^[9] Following AlphaFold 2 practice, the authors augmented the training set with high-confidence predicted structures (self-distillation), and the folding losses combined frame-aligned point error (FAPE), distogram, secondary-structure, masked-LM language consistency and a confidence-prediction loss. ^[1]^[2] During folding-head training the ESM-2 weights are lightly fine-tuned end-to-end with a low learning rate. ^[2]

The paper reports that training the folding head against PDB targets took roughly two weeks on tens of NVIDIA A100 GPUs, considerably less than the multi-month compute budget for the underlying ESM-2 language model. After the folding head converges, the authors apply an OpenFold-style refinement protocol using AMBER force-field relaxation to remove minor stereochemical violations, although this final step is optional and is omitted in the production Hugging Face inference path. ^[1]^[4]

How does ESMFold differ from AlphaFold?

The defining difference is the MSA: AlphaFold 2 reads a multiple sequence alignment of evolutionarily related proteins (often thousands of sequences deep) plus structural templates, whereas ESMFold reads only the one target sequence and substitutes the language model's learned representation for that evolutionary signal. ^[1]^[4] This makes ESMFold standalone (no sequence database needed at inference) and much faster, at the cost of some average accuracy when a high-quality MSA is available. ^[1]^[9]

While the Folding Trunk inherits many architectural ideas from AlphaFold 2's EvoFormer, several simplifications are notable. AlphaFold 2 operates on an (s, L, d) MSA tensor with depth s up to a few thousand, requiring row-wise gated attention with mean-pooled gating across the MSA dimension to keep memory tractable. ESMFold drops the MSA dimension entirely; the row-track becomes a single (L, d_s) sequence representation and the entire row-wise gated MSA attention is replaced by standard self-attention on a single track. The triangular multiplicative updates and triangle attention on the pair representation are retained, because they remain the most efficient way to propagate information across pairs of residues. AlphaFold 2's "extra MSA stack," which processes a much deeper but lower-dimensional MSA before the main EvoFormer, has no analog in ESMFold and is replaced by the embeddings produced by the upstream ESM-2 model. ^[1]^[4]^[12]

The performance gap depends on how much evolutionary signal is available: AlphaFold 2 keeps a clear accuracy lead on protein families with rich evolutionary histories, while ESMFold matches or beats single-sequence variants of AlphaFold 2 and RoseTTAFold (which lose roughly 0.3 to 0.5 in TM-score when their MSA input is removed) and narrows the gap sharply on targets with shallow or unavailable alignments. ^[1]^[2]^[9]

How accurate is ESMFold?

On CAMEO assessments (194 targets in the original paper window) ESMFold reaches an average TM-score of approximately 0.83, and on the 71 publicly assessed targets of CASP14 it reaches approximately 0.68. ^[1]^[2]^[9] The full AlphaFold 2 pipeline with MSAs and templates achieves about 0.88 on CAMEO and 0.85 on CASP14 on the same selections, and RoseTTAFold reaches roughly 0.82 and 0.81 respectively. ^[9] The gap to AlphaFold 2 narrows substantially on targets with shallow MSAs, where evolutionary signal is sparse, and ESMFold matches or beats single-sequence variants of AlphaFold 2 and RoseTTAFold, which lose roughly 0.3 to 0.5 in TM-score when their MSA input is removed. ^[1]^[2] An independent 2025 benchmark in Frontiers in Genetics reported a median TM-score of 0.95 for ESMFold versus 0.96 for AlphaFold 2 on a curated globular set, and a median pLDDT of 87.4 for ESMFold versus 92.65 for AlphaFold 2. ^[9]

Method	CAMEO TM-score	CASP14 TM-score	Inputs needed
AlphaFold 2 (full pipeline)	~0.88	~0.85	Sequence + MSA + templates
RoseTTAFold	~0.82	~0.81	Sequence + MSA
ESMFold (v1)	~0.83	~0.68	Sequence only
AlphaFold 2 (single-seq)	substantially lower	~0.37	Sequence only

Numbers are approximate values reported in the Science paper and a 2025 Frontiers benchmark. ^[1]^[9]
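For readers unfamiliar with the metric, the TM-scores quoted above follow the standard definition: each aligned residue contributes 1 / (1 + (d / d0)²), with d0 = 1.24 · (L − 15)^(1/3) − 1.8 for a target of length L, and the sum is divided by L. The sketch below scores a single fixed superposition; real evaluation tools also search over superpositions to maximize the score, which is not shown. The floor of 0.5 on d0 for very short targets is an assumption modeled on common tooling, and the function names are hypothetical.

```python
def d0(target_length: int) -> float:
    """Length-dependent distance scale in angstroms."""
    if target_length <= 21:
        # Assumption: floor used for very short chains, where the formula
        # below would otherwise become tiny or undefined.
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: C-alpha deviations (angstroms) for the aligned residues.
    Residues of the target that are not aligned contribute zero.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length


# A perfect match of all 100 residues versus a prediction that is
# uniformly 4 angstroms off.
print(tm_score([0.0] * 100, 100))
print(tm_score([4.0] * 100, 100))
```

Because the score is normalized by target length and bounded between 0 and 1, a difference such as 0.85 versus 0.68 is large; values above roughly 0.5 are conventionally read as sharing the same overall fold.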

How fast is ESMFold?

The Science paper reports that on a single NVIDIA V100, ESMFold predicts the structure of a 384-residue protein in roughly 14 seconds (14.2 seconds in the paper's benchmark), about 6 times faster than the AlphaFold 2 neural network alone and approximately 60 times faster than the AlphaFold 2 end-to-end pipeline including MSA construction and template search. ^[1]^[4]^[9] Meta summarized the headline result as "Predictions are up to 60x faster than the current state-of-the-art while maintaining accuracy." ^[3] At scale, Meta reported folding more than 600 million metagenomic proteins in approximately two weeks on a cluster of about 2,000 GPUs, an operation that the team described as previously requiring years of wall-clock time with MSA-based methods. ^[3]^[5]
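The Atlas figures can be turned into an average per-structure cost with simple arithmetic. The sketch assumes exactly 617 million structures, 2,000 GPUs and 14 days of continuous use; the source gives all three only approximately, so the result is an order-of-magnitude estimate.

```python
# Back-of-the-envelope throughput for the Atlas run.
# All three inputs are rounded figures from the text above.
structures = 617_000_000
gpus = 2_000
days = 14

gpu_seconds = gpus * days * 24 * 3600
seconds_per_structure = gpu_seconds / structures
structures_per_gpu_per_day = structures / (gpus * days)

print(f"{seconds_per_structure:.1f} GPU-seconds per structure")
print(f"{structures_per_gpu_per_day:,.0f} structures per GPU per day")
```

This works out to about four GPU-seconds per structure. The figure is an average over all sequence lengths and whatever hardware the cluster used, so it is not directly comparable to the 14.2-second V100 benchmark for a 384-residue protein.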

Confidence and failure modes

ESMFold inherits AlphaFold 2's pLDDT confidence metric and its calibration tracks the empirical accuracy reasonably well: predictions with pLDDT > 70 are usually globally correct, while regions below 50 are typically disordered or low confidence. ^[1] In the bulk metagenomic fold, roughly one third of predicted structures have a mean pLDDT above 70, and about 13% have mean pLDDT above 90 across the full sequence. ^[3]^[5]
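A small helper shows how the thresholds above might be applied to a prediction. It assumes per-residue pLDDT values on a 0 to 100 scale; the band labels and the function name are illustrative choices rather than terminology from the paper.

```python
def summarize_plddt(plddt):
    """Summarize per-residue pLDDT scores given on a 0-100 scale."""
    if not plddt:
        raise ValueError("need at least one residue")
    mean = sum(plddt) / len(plddt)
    # Thresholds follow the text: >90 and >70 for whole-chain confidence,
    # <50 for regions that are typically disordered or unreliable.
    if mean > 90:
        band = "very high"
    elif mean > 70:
        band = "confident"
    elif mean >= 50:
        band = "low"
    else:
        band = "very low"
    low_residues = [i for i, score in enumerate(plddt) if score < 50]
    return {"mean": mean, "band": band, "low_residues": low_residues}


print(summarize_plddt([95.0, 92.0, 40.0, 88.0]))
```

Reporting the low-confidence residue indices alongside the mean is useful because a chain can have a high average while still containing a disordered loop or terminus.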

Variants and implementations

Official checkpoints

The reference open-source repository, github.com/facebookresearch/esm, distributes both ESM-2 language model weights at six sizes (8M, 35M, 150M, 650M, 3B, 15B) and two ESMFold checkpoints, esmfold_v0 (used for the original Atlas release) and esmfold_v1 (the recommended production checkpoint). ^[4] Both production ESMFold checkpoints use the 3B-parameter ESM-2 stem with a 48-block folding trunk and an 8-block structure module, while ablation checkpoints that train only the structure module on top of each ESM-2 size are also published for research purposes. ^[4]^[12] The repository was archived as read-only in August 2024 after the team moved to EvolutionaryScale, but weights and inference code remain downloadable under an MIT license. ^[4]^[10]

Hugging Face integration

Hugging Face shipped ESMFold and ESM-2 in the transformers library in late 2022 via the EsmModel and EsmForProteinFolding classes, ported in part from the OpenFold reimplementation. The canonical production checkpoint is published at facebook/esmfold_v1, with separate repositories for each ESM-2 size (e.g. facebook/esm2_t33_650M_UR50D for the 650M variant and facebook/esm2_t48_15B_UR50D for the 15B model). ^[4]^[6] The transformers integration removed many heavyweight dependencies from the original Meta implementation, making single-GPU inference straightforward through the standard from_pretrained API. As of 2026 the facebook/esmfold_v1 model card reports roughly two million monthly downloads. ^[6]
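A minimal inference sketch with the transformers classes named above might look like the following. It assumes `torch` and `transformers` are installed and that the `facebook/esmfold_v1` weights (several gigabytes) can be downloaded. The helpers `clean_sequence` and `fold` are hypothetical, and restricting input to the 20 standard residues is a choice of this sketch, not a requirement of the model.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(sequence: str) -> str:
    """Strip whitespace, upper-case, and reject non-standard residues."""
    seq = "".join(sequence.split()).upper()
    unknown = set(seq) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return seq


def fold(sequence: str, checkpoint: str = "facebook/esmfold_v1"):
    """Fold one chain and return (atom positions, per-residue pLDDT)."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import AutoTokenizer, EsmForProteinFolding

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = EsmForProteinFolding.from_pretrained(checkpoint)
    model.eval()

    inputs = tokenizer(
        [clean_sequence(sequence)],
        return_tensors="pt",
        add_special_tokens=False,
    )
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.positions, outputs.plddt


# Example call (downloads the checkpoint on first use):
# positions, plddt = fold("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```

In practice the model is usually moved to a GPU before inference, and long chains may need reduced precision or chunked attention to fit in memory; neither is shown here.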

ESM Metagenomic Atlas API

Meta deployed a hosted API at api.esmatlas.com and a web search interface at esmatlas.com on 1 November 2022, providing folding-as-a-service and search-by-sequence over the 617-million-structure database under a CC BY 4.0 license. ^[3]^[4]^[5] The Atlas allows users to retrieve precomputed structures, search by sequence identity using MMseqs2-based tooling, and submit short sequences for on-demand folding via the API. ^[3]^[5]
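On-demand folding through the hosted API can be sketched with the standard library. The endpoint path below is an assumption based on the example in the facebookresearch/esm README at the time of the Atlas launch; it may have changed or been retired since, and the helper names are hypothetical.

```python
import urllib.request

# Assumption: fold endpoint as shown in the esm README; verify before use.
FOLD_URL = "https://api.esmatlas.com/foldSequence/v1/pdb/"


def build_fold_request(sequence: str) -> urllib.request.Request:
    """Build a POST request whose body is the raw amino-acid sequence."""
    return urllib.request.Request(
        FOLD_URL,
        data=sequence.encode("ascii"),
        method="POST",
    )


def fold_remote(sequence: str, timeout: float = 60.0) -> str:
    """Send the sequence and return the response body (expected: PDB text)."""
    request = build_fold_request(sequence)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (requires network access and a live service):
# pdb_text = fold_remote("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```

Separating request construction from the network call keeps the sketch easy to inspect offline and makes it simple to swap in a different endpoint if the service moves.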

What is the ESM Metagenomic Atlas?

The ESM Metagenomic Atlas is an open database of more than 617 million predicted protein structures, computed with ESMFold from MGnify-clustered metagenomic sequences collected from soil, ocean and host-associated microbiomes. ^[3]^[5] It was the flagship application announced alongside ESMFold and represents the first dense survey of the structural "dark matter" of the protein universe. According to the Science abstract, the Atlas covers "more than 617 million metagenomic protein sequences, including more than 225 million that are predicted with high confidence." ^[1] Meta described it as "the largest database of high resolution predicted structures, 3x larger than any existing protein structure database, and the first to cover metagenomic proteins comprehensively and at scale." ^[3] At release it was roughly three times larger than the AlphaFold Protein Structure Database snapshot then available. ^[3] Subsequent studies have used the Atlas to identify novel folds, characterize uncharacterized protein families and expand the known protein structural space. ^[5]^[9]

What is ESMFold used for?

Metagenomic dark matter

The flagship application announced alongside ESMFold was the ESM Metagenomic Atlas, a collection of more than 617 million predicted structures for proteins derived from MGnify-clustered metagenomic sequences from soil, ocean and host-associated microbiomes. ^[3]^[5] At the time of release, Meta's blog described it as "the largest database of high resolution predicted structures," and it was roughly three times larger than other public structure databases such as the AlphaFold Protein Structure Database snapshot then available. ^[3] Subsequent studies have used the Atlas to identify novel folds, characterize uncharacterized protein families and expand the known protein structural space. ^[5]^[9]

Drug discovery and protein design

Because ESMFold provides sequence-only inference, it is well suited to high-throughput screening pipelines in AI drug discovery and protein engineering campaigns where MSAs are difficult to construct, for example for designed proteins, antibodies, intrinsically disordered regions, or rapidly evolving pathogens. ESMFold is widely used as a fast structure oracle in diffusion model-based protein design loops, including the RFdiffusion family, where candidate sequences must be folded and scored thousands of times per design experiment. ^[9]

Embeddings for downstream tasks

Independently of the folding head, the underlying ESM-2 embeddings have become a default representation for downstream protein tasks including function prediction, variant effect prediction, contact and binding-site prediction, and remote homology detection. ^[1]^[4] The medium-sized 150M and 650M ESM-2 checkpoints are frequently used as feature extractors in academic pipelines because they balance accuracy with modest GPU memory requirements. ^[11] Independent transfer-learning studies, including a 2025 Scientific Reports analysis, found that the medium-sized ESM-2 checkpoints often match or exceed the 15B variant on practical downstream tasks once labeled fine-tuning data is available, suggesting a saturation effect for some tasks even though raw structural information continues to improve with backbone scale. ^[11]

Variant effect prediction and zero-shot fitness

The original Lin et al. 2023 paper reported that zero-shot fitness predictions made from masked-LM log-likelihood ratios on ESM-2 (15B) correlate strongly with experimentally measured deep mutational scanning datasets, replicating and extending earlier findings from the ESM-1 line. This zero-shot capability makes ESM-2 a popular starting point for predicting the functional effects of human genetic variants and for guiding directed evolution campaigns where no labeled data is available. ^[1]^[7]
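The log-likelihood ratio behind this kind of zero-shot scoring is simple to state: mask the mutated position, read off the model's predicted probabilities, and compare the mutant residue with the wild type. The sketch below uses made-up probabilities in place of real model output, and `mutation_score` is a hypothetical helper.

```python
import math


def mutation_score(position_probs, wildtype: str, mutant: str) -> float:
    """Log-likelihood ratio log p(mutant) - log p(wildtype).

    position_probs: residue -> probability predicted by the language model
    at the masked position. Negative scores mean the model finds the
    mutant less plausible than the wild type.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wildtype])


# Toy distribution for one masked position (illustrative numbers only).
probs = {"A": 0.60, "V": 0.30, "G": 0.10}
print(mutation_score(probs, wildtype="A", mutant="V"))
print(mutation_score(probs, wildtype="A", mutant="G"))
```

For multiple substitutions, a common simplification is to sum the per-position scores, which treats the mutations as independent.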

AI for Science context

ESMFold sits within a broader wave of foundation models applied to structural biology, including the AlphaFold family from Google DeepMind, RoseTTAFold from the Baker lab, OmegaFold from Helixon, and successor multimodal models such as ESM3 and AlphaFold 3. It is a recurring case study in the broader AI for Science literature. ^[9]^[10]

Limitations and criticisms

The Science paper and several follow-up evaluations note that ESMFold does not match the full AlphaFold 2 pipeline on average accuracy when high-quality MSAs are available; on CASP14 targets specifically the gap is roughly 0.17 TM-score points, which can translate into qualitatively wrong folds on hard cases. ^[1]^[9] ESMFold also produces lower confidence predictions on average than AlphaFold 2, with median pLDDT roughly five points lower on globular benchmarks. ^[9] The original release supports monomeric chains only and does not natively handle multimers, ligands, nucleic acids, post-translational modifications, or membrane environments; AlphaFold 3 and Chai-1 (2024) added these capabilities but use diffusion-style architectures that depart from the ESMFold design. ^[9]

Single-GPU inference on long sequences is memory limited, even though the released ESMFold checkpoints use the 3-billion-parameter ESM-2 stem rather than the 15-billion-parameter model. The Meta team initially reported that ESMFold handled proteins of up to roughly 3,000 residues on then-current data center hardware, with quadratic memory scaling in sequence length limiting longer chains. ^[5] Reproducibility of the headline benchmark numbers has been the subject of community discussion: an issue thread on the official GitHub repository records that users could not exactly reproduce the published CAMEO and CASP14 averages without using the precise target lists and evaluation protocol described in the paper supplement. ^[4]
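The quadratic memory scaling can be made concrete by sizing a single copy of the (L, L, d_p) pair representation. The pair width of 128 channels and 4-byte floats are assumptions of this sketch, and real peak memory is higher because attention intermediates and activations are also held.

```python
def pair_memory_gib(length: int, pair_dim: int = 128, bytes_per_value: int = 4) -> float:
    """Memory for one (L, L, pair_dim) tensor, in GiB."""
    # pair_dim=128 and float32 storage are assumed values for illustration.
    return length ** 2 * pair_dim * bytes_per_value / 2 ** 30


for length in (384, 1000, 2000, 3000):
    print(f"L={length}: {pair_memory_gib(length):.2f} GiB per pair tensor")
```

Doubling the sequence length quadruples this footprint, which is why long chains run out of memory well before the language model's own weights become the limiting factor.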

A more conceptual critique is that ESMFold remains a discriminative regressor that maps sequence to a single best-guess structure. It does not capture conformational ensembles, alternative folds, or unstructured-to-folded transitions, and its outputs should not be interpreted as physically sampled states. ^[9] Generative successors such as ESM3 and diffusion-based folders such as AlphaFold 3 explicitly address some of these limitations. ^[10]

There is also a continuing debate about whether single-sequence prediction can ever fully recover the information present in deep evolutionary alignments. Several benchmark studies have noted that ESMFold's accuracy advantage over MSA-based methods materializes only when MSAs are shallow or unavailable; on protein families with rich evolutionary histories, AlphaFold 2's accuracy lead remains substantial. ^[9] Critics have noted that the 15B parameter ESM-2 model effectively learns an internal representation of evolutionary signal during pretraining, raising the question of whether it has merely shifted the MSA bottleneck from inference time to training time rather than eliminating it. The compute resources required to train ESM-2 (15B), reported in the supplement at over 4 million A100 GPU hours, are themselves considerable, even though inference-time cost is dramatically reduced. ^[1]^[2]

Finally, the Atlas itself is a snapshot rather than a continuously updated resource. After the 2022 release Meta did not refresh the Atlas with newer metagenomic sequences, and maintenance of the esmatlas.com infrastructure transferred to the EvolutionaryScale team in 2024. Users seeking up-to-date metagenomic structures must either re-fold from updated MGnify releases or rely on third-party mirrors. ^[3]^[10]

Comparison

System	Year	Backbone	MSA needed	Approx. CASP14 TM	Speed vs AlphaFold 2 pipeline	Multimer / ligands
AlphaFold 2	2021	EvoFormer + IPA structure module	Yes	~0.85	1x (baseline)	Multimer extension (separate)
RoseTTAFold	2021	Three-track network	Yes	~0.81	Similar order to AlphaFold 2	Limited
OmegaFold	2022	OmegaPLM (670M) + Geoformer	No	Lower than AlphaFold 2 with MSA, higher than AlphaFold 2 without	Faster than AlphaFold 2	No
ESMFold	2022 to 2023	ESM-2 (up to 15B) + Folding Trunk + IPA	No	~0.68	~6x faster on neural net, ~60x with MSA search	Monomer only
ESM3	2024	Multimodal generative encoder-decoder	No	n/a (generative model)	Variable	Sequence/structure/function
AlphaFold 3	2024	Diffusion-based	Optional	Higher than AlphaFold 2	Comparable	Multimer + ligands + nucleic acids

Values are illustrative and drawn from the cited primary papers and benchmark studies. ^[1]^[9]^[10]

When was ESMFold released?

ESMFold was first released as a bioRxiv preprint on 21 July 2022, and Meta launched the ESM Metagenomic Atlas with the model weights and code on 1 November 2022. ^[2]^[3] The peer-reviewed paper was published in Science on 16 March 2023. ^[1] The table below summarizes the timeline.

Date	Milestone
21 July 2022	First bioRxiv preprint of ESMFold and ESM-2 ^[2]
October 2022	Updated preprint adding metagenomic analysis ^[2]
1 November 2022	ESM Metagenomic Atlas launch (>600M structures), weights and code released ^[3]
Late 2022	Hugging Face transformers integration (EsmForProteinFolding) ^[4]
16 March 2023	Peer-reviewed publication in Science 379(6637):1123-1130 ^[1]
June 2024	EvolutionaryScale releases ESM3, the generative successor ^[10]
August 2024	facebookresearch/esm repository archived as read-only ^[4]

Is ESMFold open source?

Yes. Meta released ESMFold and the ESM-2 weights publicly: the model code is distributed under an MIT license through github.com/facebookresearch/esm, the production checkpoint is available as facebook/esmfold_v1 on Hugging Face, and the ESM Metagenomic Atlas data is released under a CC BY 4.0 license. ^[4]^[6] The GitHub repository was archived as read-only in August 2024 after the core team moved to EvolutionaryScale, but the weights and inference code remain downloadable. ^[4]^[10]

Significance

ESMFold demonstrated that the scaling laws established for general-purpose language models transfer to biological sequence modeling: as ESM-2 grows from 8M to 15B parameters, structural information emerges steadily in its attention patterns, and the accuracy of the corresponding folding head improves smoothly. ^[1]^[2] By eliminating the MSA bottleneck, the model enabled the first dense survey of the structural universe of metagenomic proteins on a time scale of weeks rather than years, and helped catalyze the founding of EvolutionaryScale and the broader generative protein modeling agenda that followed in 2024. ^[3]^[10] AI drug discovery pipelines and self-supervised learning research alike continue to use ESMFold and ESM-2 as default baselines. Because of its standalone single-sequence operation and permissive MIT licensing, ESMFold also became a default piece of infrastructure for protein design pipelines that need to fold thousands of candidate sequences inside an inner loop. ^[4]^[6]

Related work

ESMFold belongs to a family of deep learning protein structure predictors that emerged after 2018. The most direct comparison is AlphaFold 2 (DeepMind, 2021), which set the bar for accuracy by using deep MSAs. RoseTTAFold (Baker lab, 2021) used a similar approach with a three-track design. OmegaFold (Helixon, 2022) is the closest single-sequence peer to ESMFold, pairing the smaller OmegaPLM language model with a Geoformer trunk. Inside Meta's lineage, ESMFold was preceded by ESM-1, ESM-1b and ESM-MSA-1, and was succeeded by ESM3 from EvolutionaryScale (2024), which generalizes the architecture to a generative multimodal foundation model jointly trained on sequence, structure and function. AlphaFold 3 (DeepMind, 2024) and Chai-1 (2024) extend protein folding to ligands, nucleic acids and complexes using diffusion-based generative architectures. ^[1]^[9]^[10]

ELI5

Proteins are tiny molecular machines that fold up into specific 3D shapes, and the shape decides what the protein does. Older AI tools like AlphaFold figure out the shape by lining up the protein against thousands of related proteins from other species, like solving a puzzle with lots of hints. ESMFold skips the hints: it read so many protein "sentences" during training that it learned the grammar of how proteins fold, so it can guess the shape from just one sequence in a few seconds instead of minutes or hours. That speed let Meta map the shapes of more than 617 million proteins from soil and ocean microbes that nobody had ever seen folded before. ^[1]^[3]

References

^[1]: Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives, "Evolutionary-scale prediction of atomic-level protein structure with a language model", Science 379(6637):1123-1130, 2023-03-16. https://www.science.org/doi/10.1126/science.ade2574. Accessed 2026-05-21.
^[2]: Zeming Lin, Halil Akin, Roshan Rao, Brian Hie et al., "Language models of protein sequences at the scale of evolution enable accurate structure prediction", bioRxiv preprint 2022.07.20.500902, 2022-07-21 (v1) and 2022-10 (v2 renamed). https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2. Accessed 2026-05-21.
^[3]: Meta AI, "ESM Metagenomic Atlas: The first view of the 'dark matter' of the protein universe", Meta AI Blog, 2022-11-01. https://ai.meta.com/blog/protein-folding-esmfold-metagenomics/. Accessed 2026-05-21.
^[4]: facebookresearch, "esm: Evolutionary Scale Modeling, pretrained language models for proteins", GitHub repository, archived 2024-08-01. https://github.com/facebookresearch/esm. Accessed 2026-05-21.
^[5]: Meta AI / EvolutionaryScale, "ESM Metagenomic Atlas", esmatlas.com, launched 2022-11-01. https://esmatlas.com/. Accessed 2026-05-21.
^[6]: Meta AI (Facebook), "facebook/esmfold_v1 model card", Hugging Face Hub, 2022-10-31. https://huggingface.co/facebook/esmfold_v1. Accessed 2026-05-21.
^[7]: Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, Rob Fergus, "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences", PNAS 118(15):e2016239118, 2021-04-13. https://www.pnas.org/doi/10.1073/pnas.2016239118. Accessed 2026-05-21.
^[8]: Hugging Face, "ESM (model documentation)", Hugging Face Transformers documentation, 2022-2024. https://huggingface.co/docs/transformers/en/model_doc/esm. Accessed 2026-05-21.
^[9]: Frontiers in Genetics editorial team / NCBI PMC, "Balancing speed and precision in protein folding: a comparison of AlphaFold2, ESMFold, and OmegaFold", 2025. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12844563/. Accessed 2026-05-21.
^[10]: EvolutionaryScale, "ESM3: Simulating 500 million years of evolution with a language model", EvolutionaryScale Blog, 2024-06-25. https://www.evolutionaryscale.ai/blog/esm3-release. Accessed 2026-05-21.
^[11]: NVIDIA, "esm2_uniref_pretraining_data dataset card", Hugging Face Datasets. https://huggingface.co/datasets/nvidia/esm2_uniref_pretraining_data. Accessed 2026-05-21.
^[12]: facebookresearch/esm DeepWiki, "ESMFold architecture page", DeepWiki, 2024. https://deepwiki.com/facebookresearch/esm/2.3-esmfold. Accessed 2026-05-21.
