ESM3

Updated Jun 24, 2026

ESM3 (Evolutionary Scale Modeling 3) is a frontier multimodal generative language model for biology, released by EvolutionaryScale on June 25, 2024. It was presented as the first model to reason jointly over the sequence, three dimensional structure, and biological function of proteins inside a single neural network ^[1]^[2]. Its flagship version has 98 billion parameters and was described at launch as the largest protein language model trained to date, and a smaller 1.4 billion parameter checkpoint was released the same day with open weights under a non commercial license ^[2]^[7]. EvolutionaryScale describes ESM3 as "the first generative model for biology that simultaneously reasons over the sequence, structure, and function of proteins" ^[2].

The model drew broad scientific attention for two reasons. First, its 98B training run consumed more than 1 x 10^24 FLOPs, which the company called "the most compute ever applied to training a biological model" ^[2]. Second, the launch demonstration produced a novel green fluorescent protein, esmGFP, whose amino acid sequence is only 58 percent identical to the closest known fluorescent protein ^[1]^[2]. The authors estimated that arriving at such a divergent functional sequence through natural mutation and selection would have taken roughly 500 million years, the headline figure that became the title of the Science publication ^[1].

What is ESM3?

ESM3 is the third generation of the ESM family of protein language models, which began at Meta AI Research with the original ESM work in 2019 and continued with ESM-2 and ESMFold in 2022. After Meta dissolved its protein team in 2023, the lead researchers founded EvolutionaryScale as an independent public benefit company and built ESM3 from scratch ^[12]. Unlike earlier ESM models, which were trained only on amino acid sequences, ESM3 ingests three discrete token tracks at once: sequence, structure, and function. The model is trained with a masked language modeling objective across all three tracks simultaneously, which lets users prompt it with any combination of the three modalities and receive completions in the others ^[1].

This multimodal design gives ESM3 a flexibility that pure structure prediction systems do not have. The same network can predict structure from sequence in the manner of ESMFold or AlphaFold, but it can also solve the inverse design problem of generating a sequence that folds into a desired structure, infer plausible function annotations for an unknown protein, or generate proteins conditioned on arbitrary combinations of structural motifs and functional tags. EvolutionaryScale frames the model as a programmable platform for protein engineering rather than a single task system, which is how it is positioned alongside specialist tools such as AlphaFold 3 in the broader landscape of AI for science and biological foundation models.

ESM3 is built on a transformer architecture but adds geometric attention in the first block so that atomic coordinates can be folded into the same latent space as sequence and function tokens. The architectural choices, the order of magnitude jump in training compute, and the open release of a usable checkpoint together pushed ESM3 to the center of the conversation about AI for drug discovery in 2024 and 2025.

Where did ESM3 come from?

The ESM line of work originated at Meta AI, then known as Facebook AI Research, where Alexander Rives and colleagues showed in 2019 that large transformer language models trained on amino acid sequences alone could learn rich representations of protein biology. The 2022 ESM-2 release scaled the family up to 15 billion parameters and was paired with ESMFold, a single sequence structure predictor that ran roughly 60 times faster than AlphaFold 2 on short proteins because it skipped the multiple sequence alignment step. ESM-2 and ESMFold were used to compute structures for more than 600 million metagenomic proteins, which were released as the ESM Metagenomic Atlas in 2022.

In the summer of 2023 Meta closed the protein team as part of a broader reorganization that refocused the lab on generative consumer AI. Rives and several colleagues spun out a new company, EvolutionaryScale, in July 2023 ^[12]. The startup operated in stealth for roughly a year before unveiling ESM3 and a 142 million dollar seed round on June 25, 2024 ^[4]^[12]. The round was led by Nat Friedman, Daniel Gross, and Lux Capital, with participation from Amazon, NVentures (the venture arm of NVIDIA), and angel investors ^[4]^[12]. The company is structured as a public benefit corporation, which the founders said was meant to give them flexibility to balance commercial pressure with open science commitments.

The original ESM3 preprint appeared on bioRxiv on July 1, 2024 under the title "Simulating 500 million years of evolution with a language model" ^[3]. After roughly six months of peer review the work was published in Science on January 17, 2025 under DOI 10.1126/science.ads0018, marking one of the most prominent publications of a frontier AI for biology system to date ^[1].

How does the ESM3 architecture work?

ESM3 is described in the Science paper as a multi track masked generative transformer ^[1]. Three independent token tracks are fed into a shared trunk that mixes them through bidirectional self attention. Each track has its own discrete vocabulary so that all three modalities can share the same modeling objective.

The sequence track uses the 20 standard amino acid tokens plus rare and gap tokens. The structure track converts the local three dimensional environment of each residue into a discrete code using a learned VQ-VAE that quantizes backbone geometry into 4,096 structural tokens, which means that protein structures can be expressed as a token string analogous to text. The function track uses a hierarchical vocabulary built from InterPro functional annotations along with keyword and Gene Ontology tags. A separate set of secondary structure and solvent accessibility tracks is included for finer grained control during inference.
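The quantization step in the structure track can be illustrated with a toy nearest code lookup. This is a minimal sketch, assuming a random stand-in codebook and a hypothetical 8 dimensional per residue feature vector; it is not the ESM3 encoder, whose learned features and codebook are not reproduced here. Only the codebook size of 4,096 comes from the text.

```python
import numpy as np

CODEBOOK_SIZE = 4096  # number of structure tokens stated in the text
FEATURE_DIM = 8       # hypothetical size of a per-residue geometry feature

rng = np.random.default_rng(0)
# Stand-in for a learned codebook; in a trained VQ-VAE these rows are learned.
codebook = rng.normal(size=(CODEBOOK_SIZE, FEATURE_DIM))


def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature vector to the index of its nearest codebook row."""
    # Squared Euclidean distance expanded as ||x||^2 - 2 x.c + ||c||^2.
    d2 = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)


# Ten hypothetical residues, each described by one encoder output vector.
residue_features = rng.normal(size=(10, FEATURE_DIM))
structure_tokens = quantize(residue_features, codebook)
print(structure_tokens)  # ten integers in the range [0, 4096)
```

Once every residue has an integer code, a structure becomes a token string of the same length as the sequence, which is what lets the structure track share a masked modeling objective with the other tracks.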

The trunk uses Pre Layer Normalization, rotary position embeddings, and SwiGLU feed forward non linearities, which are the same building blocks used in modern large language models such as Llama. The first transformer block is augmented with an SE(3) invariant geometric attention layer that conditions on raw backbone atomic coordinates when they are available, which lets ESM3 use real structures without first quantizing them into structure tokens when the user prefers to provide continuous geometry.
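As a rough illustration of two of these building blocks, the sketch below wires a SwiGLU feed forward layer into a Pre Layer Normalization residual block using NumPy. It is a simplified stand-in with assumed toy dimensions and random weights: it omits the learnable normalization gain and bias, the attention sublayer, rotary embeddings, and the geometric attention layer, and it is not taken from the ESM3 code.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each token vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU: a SiLU-gated linear unit followed by a down projection."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def pre_ln_ffn_block(x, w_gate, w_up, w_down):
    """Pre-LN residual block: normalize the sublayer input, then add the residual."""
    return x + swiglu_ffn(layer_norm(x), w_gate, w_up, w_down)


rng = np.random.default_rng(0)
d_model, d_hidden, seq_len = 16, 32, 5  # toy sizes, chosen for illustration
x = rng.normal(size=(seq_len, d_model))
w_gate = rng.normal(size=(d_model, d_hidden)) * 0.1
w_up = rng.normal(size=(d_model, d_hidden)) * 0.1
w_down = rng.normal(size=(d_hidden, d_model)) * 0.1
y = pre_ln_ffn_block(x, w_gate, w_up, w_down)
print(y.shape)
```

Placing the normalization before the sublayer keeps the residual path an identity map, which is the usual reason given for preferring Pre-LN in very deep stacks.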

The training objective is a generalized masked language modeling loss applied across all three tracks at varying mask ratios, which means the model has to learn to fill in any subset of tokens given any other subset. At inference time this is what enables prompting in any combination, since the model has been trained on a wide range of mask patterns rather than a single fixed one. EvolutionaryScale also describes a chain of thought style inference procedure in which the model generates an intermediate structural reasoning trace before producing a final sequence, conceptually similar to reasoning prompts used in language models ^[1].
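The masking scheme described above can be sketched as follows. The example draws an independent mask ratio per track and records which positions would contribute to the loss. The uniform ratio distribution, the shared mask id, and the function vocabulary size of 100 are assumptions made for illustration; the published noise schedule is not reproduced here.

```python
import numpy as np

MASK = -1  # hypothetical mask token id, shared by all tracks in this sketch


def mask_tracks(tracks: dict, rng: np.random.Generator):
    """Mask each track independently at a randomly drawn ratio."""
    masked, loss_positions = {}, {}
    for name, tokens in tracks.items():
        ratio = rng.uniform(0.0, 1.0)            # assumed: uniform mask ratio
        hide = rng.random(tokens.shape) < ratio  # True where the token is hidden
        masked[name] = np.where(hide, MASK, tokens)
        loss_positions[name] = hide              # loss is taken only at hidden positions
    return masked, loss_positions


rng = np.random.default_rng(0)
length = 12
tracks = {
    "sequence": rng.integers(0, 20, size=length),     # 20 standard amino acids
    "structure": rng.integers(0, 4096, size=length),  # 4,096 structure tokens
    "function": rng.integers(0, 100, size=length),    # hypothetical vocabulary size
}
masked, loss_positions = mask_tracks(tracks, rng)
for name in tracks:
    print(name, masked[name])
```

Because a track can end up almost fully visible or almost fully hidden, a single training example can resemble structure prediction, inverse folding, or function annotation depending on which track was masked most heavily.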

What sizes does ESM3 come in?

ESM3 is released in a family of three sizes that share architecture but differ in width, depth, and training compute.

Model	Parameters	Transformer blocks	Hidden dim (approx)	Availability
ESM3-small (open)	1.4 billion	48	1,536	Open weights on Hugging Face and GitHub under the Cambrian Non Commercial License ^[6]^[7]
ESM3-medium	7 billion	96	2,560	Available through the Forge API
ESM3-large (98B)	98 billion	216	6,144	Available through the Forge API and via partner platforms such as AWS and NVIDIA BioNeMo

The 1.4 billion parameter open model is officially named ESM3-sm-open-v1 and is the version that most academic researchers use ^[7]. The 7 billion medium model is positioned for users who want a balance of cost and capability through the API. The 98 billion parameter flagship is the model used for the headline generative experiments in the Science paper and is what powers EvolutionaryScale's commercial offerings ^[1].

What data was ESM3 trained on?

ESM3 was trained on a corpus assembled from public sequence and structure databases together with extensive metagenomic data, covering proteins from environments as diverse as the Amazon rainforest, the deep ocean, hydrothermal vents, and ordinary soil microbiomes. The published figures emphasize the breadth of natural diversity that the model has seen.

Quantity	Value
Unique protein sequences	2.78 billion ^[2]^[4]
Protein sequences after augmentation	3.15 billion
Protein structures	236 million
Function annotations	539 million
Total training tokens	771 billion
Compute for the 98B model	1.07 x 10^24 FLOPs (about one trillion teraflops) ^[2]
Largest model size	98 billion parameters

EvolutionaryScale describes the training as "the most compute ever applied to training a biological model" ^[2]. The infrastructure used was provided in part by Amazon Web Services and built around NVIDIA H100 GPU clusters ^[4]^[5]. The compute figure of roughly 10^24 FLOPs puts ESM3 in the same order of magnitude as contemporary general purpose large language models, which is a sharp break from the much smaller compute budgets that biology models typically received in earlier years.
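The unit conversion behind "about one trillion teraflops" can be checked in a few lines. The cluster in the second half is purely hypothetical, included only to give a feel for the scale: the device count and sustained throughput are assumptions, not figures reported for the ESM3 training run.

```python
TRAINING_FLOPS = 1.07e24  # total training compute quoted in the table above
TERA = 1e12

# 1.07e24 floating point operations expressed in units of 1e12 operations.
tera_flops = TRAINING_FLOPS / TERA
print(f"{tera_flops:.3g} tera-operations")  # about 1.07e12, i.e. roughly a trillion

# Hypothetical cluster, for scale only (NOT the actual training setup).
ASSUMED_DEVICES = 1_000
ASSUMED_SUSTAINED_FLOPS_PER_DEVICE = 4e14  # operations per second, assumed

seconds = TRAINING_FLOPS / (ASSUMED_DEVICES * ASSUMED_SUSTAINED_FLOPS_PER_DEVICE)
days = seconds / 86_400
print(f"{days:.0f} days on the assumed cluster")
```

Under these assumed numbers the run would take about a month, which illustrates why a budget of this size had previously been associated with general purpose language models rather than biology models.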

What can ESM3 do?

Because ESM3 is trained to predict masked tokens across all three tracks, the same checkpoint can perform a wide range of tasks depending on what is supplied at inference time.

Capability	What ESM3 does	Example use case
Sequence prediction	Generates a plausible amino acid sequence from partial sequence, structure, or function constraints	Filling in a missing loop in a known protein
Structure prediction	Predicts a three dimensional structure from sequence	Single sequence structure prediction comparable to ESMFold
Function annotation	Predicts likely InterPro or Gene Ontology tags for an unknown protein	Annotating metagenomic dark matter
Inverse folding	Designs a sequence that folds into a specified structural template	Protein engineering for a target backbone
Conditional generation	Generates novel proteins that satisfy combinations of constraints	Designing a binder to a specified surface with a desired active site geometry
Atomic coordination	Solves tasks where specific residues must form a precise geometric arrangement	Engineering metal binding sites or catalytic triads
Chain of thought reasoning	Produces intermediate structural reasoning before final sequence output	Multi step generative design with internal scratch space

The most striking demonstrated capability is conditional generation of fully novel proteins. In the Science paper, ESM3 is prompted with high level instructions such as "design a fluorescent protein" together with a small set of conserved residues, and the model produces candidate sequences whose synthesized versions actually express, fold, and fluoresce in the wet lab ^[1].

How did ESM3 design esmGFP?

The headline experiment in the launch announcement and in the Science paper was the design of a new green fluorescent protein, esmGFP ^[1]^[2]. Green fluorescent proteins are one of the most studied tool kits in modern biology, used as visual reporters in everything from neuroscience to gene expression studies. Every known natural and engineered GFP variant shares a deeply conserved chromophore forming motif and a characteristic eleven stranded beta barrel fold.

To test whether ESM3 could go beyond paraphrasing the GFPs in its training data, the team prompted the 98 billion parameter model with a small set of conserved chromophore residues plus the structure of residues 58 through 71, then ran a chain of thought style generation procedure ^[1]. After in silico ranking and laboratory screening of the top candidates, a 229 residue protein the authors named esmGFP folded correctly, formed the chromophore, and produced bright green fluorescence. Its sequence differed from the nearest known fluorescent protein, tagRFP, by 96 mutations out of 229 amino acids and was only 58 percent identical to it, far more diverged than any GFP that had been engineered before ^[1]^[2].
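The 58 percent figure follows directly from the two numbers in the paragraph above. The short check below assumes that identity is counted over all 229 positions of esmGFP with no alignment gaps, which is a simplification.

```python
def percent_identity(length: int, differences: int) -> float:
    """Percent of positions that are identical between two aligned sequences."""
    if not 0 <= differences <= length:
        raise ValueError("differences must be between 0 and length")
    return 100.0 * (length - differences) / length


ESMGFP_LENGTH = 229  # residues, from the text
MUTATIONS = 96       # differences relative to tagRFP, from the text

identity = percent_identity(ESMGFP_LENGTH, MUTATIONS)
print(f"{identity:.1f}% identical")  # prints "58.1% identical"
```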

Using published estimates for the rate at which the GFP family has diversified in nature, the authors calculated that traversing this much sequence space through ordinary mutation and selection would have taken roughly 500 million years of natural evolution ^[1]. The figure is approximate, since rates of molecular evolution vary, but it captured public attention because it made concrete the idea that a generative model could shortcut evolutionary time.

The esmGFP result is best understood as a proof of concept that ESM3 can produce functional proteins well outside the distribution of natural sequences while respecting the structural and chemical constraints needed for activity. The same generative procedure has since been used by EvolutionaryScale and its partners to design candidates for binding proteins, enzymes, and other targets, although most of those results remain unpublished.

How does ESM3 compare with AlphaFold?

ESM3 sits at the intersection of two earlier research traditions. The ESM family produced fast, sequence only models that learned biology through self supervised pretraining, while AlphaFold and AlphaFold 2 were highly accurate structure predictors that used multiple sequence alignments and equivariant attention over protein geometry. ESM3 is closer in spirit to ESM-2 than to AlphaFold 2 in that it relies on language modeling at scale, but it borrows the idea of geometry aware attention from the AlphaFold lineage and pushes into generative territory that AlphaFold 2 was never designed for.

Feature	ESM-2 (2022)	AlphaFold 2 (2021)	AlphaFold 3 (2024)	ESM3 (2024)
Developer	Meta AI	Google DeepMind	Google DeepMind, Isomorphic Labs	EvolutionaryScale
Modalities	Sequence only	Sequence with MSA	Sequence with MSA, plus ligands and nucleic acids	Sequence, structure, function
Largest version	15 billion parameters	Fixed architecture	Fixed architecture	98 billion parameters
Primary task	Representation learning, masked language modeling	Structure prediction	Structure prediction including biomolecular complexes	Multimodal generation, prediction, and design
Requires MSA	No	Yes	Yes	No
Generative	No	No	No	Yes
Structure accuracy	Below AlphaFold 2	State of the art at release	Improved over AlphaFold 2, especially for complexes	Below AlphaFold 2 on monomer accuracy benchmarks, but supports tasks the AlphaFold family cannot
Open weights	Yes, fully open	Yes, parameters released alongside the open source code	Not released initially, later restricted access	Yes for the 1.4B small model, API only for 7B and 98B
License of weights	MIT	CC BY 4.0	Restricted	Cambrian Non Commercial License for the open checkpoint

ESM3 does not displace AlphaFold 2 or AlphaFold 3 for pure structure prediction. Published benchmarks show that AlphaFold 2 remains the most accurate single chain structure predictor and that AlphaFold 3 improved on it for complexes and for biomolecular interactions involving small molecules and nucleic acids ^[10]. ESM3 trades some monomer accuracy for the flexibility of generative design, which is a different and largely complementary capability.

Is ESM3 open source?

The 1.4 billion parameter ESM3 model was released with open weights on launch day and hosted on Hugging Face under the name EvolutionaryScale/esm3-sm-open-v1, with code and inference utilities on the evolutionaryscale/esm GitHub repository ^[6]^[7]. EvolutionaryScale used a custom license that it initially called the ESM3 Community License Agreement and later revised under the name Cambrian Non Commercial License Agreement when releasing the related ESMC family ^[6].

The license allows use of the weights for non commercial research at universities, non profit research institutes, government laboratories, and similar organizations. It explicitly disallows hosting the model as a service, using outputs for commercial activities, and training a competing model on outputs of the released weights ^[6]. Users must accept the license before downloading the weights on Hugging Face. The Cambrian revision removed an earlier restriction that excluded drug development, added an attribution and naming requirement for derived models, and clarified that fine tuned weights are considered derivative works.
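For readers who have accepted the license, the sketch below shows roughly how the open checkpoint can be asked to fill in a masked stretch of sequence. The class and argument names (`ESM3.from_pretrained`, `ESMProtein`, `GenerationConfig`) follow the evolutionaryscale/esm README at the time of the open release and should be checked against the current repository; the helper `mask_span` and the example sequence are hypothetical. Running `fill_masked_sequence` requires the `esm` package, a Hugging Face login, and a weight download, so the library is imported lazily and the function is not called here.

```python
MASK_CHAR = "_"  # the esm README writes masked positions as underscores


def mask_span(sequence: str, start: int, end: int) -> str:
    """Replace residues in the half-open span [start, end) with mask characters."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("span must lie inside the sequence")
    return sequence[:start] + MASK_CHAR * (end - start) + sequence[end:]


def fill_masked_sequence(prompt: str, num_steps: int = 8, temperature: float = 0.7) -> str:
    """Ask the open ESM3 checkpoint to complete the masked positions."""
    # Imported lazily so this file loads even when esm is not installed.
    from esm.models.esm3 import ESM3
    from esm.sdk.api import ESMProtein, GenerationConfig

    # Downloads the weights on first use; the license must already be accepted.
    model = ESM3.from_pretrained("esm3_sm_open_v1")
    protein = ESMProtein(sequence=prompt)
    config = GenerationConfig(track="sequence", num_steps=num_steps, temperature=temperature)
    completed = model.generate(protein, config)
    return completed.sequence


# Hypothetical 22 residue example with positions 5 through 9 masked.
prompt = mask_span("MKTAYIAKQRQISFVKSHFSRQ", 5, 10)
print(prompt)  # MKTAY_____QISFVKSHFSRQ
```

The same pattern applies to the other tracks: changing the `track` argument to `"structure"` asks the model to predict structure for the given sequence rather than to complete the sequence itself.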

The 7 billion and 98 billion parameter models are not released as weights and are accessible only through the Forge API operated by EvolutionaryScale, which offers free tier access for academic users and paid commercial access for industry. The commercial path is also offered through enterprise partners, most notably AWS and the NVIDIA BioNeMo platform ^[5].

How is ESM3 used in industry?

ESM3 was positioned from launch as a platform for industrial drug discovery rather than a purely academic curiosity, and the surrounding partnerships reflected that.

NVIDIA announced on launch day that ESM3 would be optimized for inference and training through the NVIDIA BioNeMo platform and offered as a NIM microservice on NVIDIA AI Enterprise, which is the company's stack for deploying foundation models inside regulated industries ^[5]. This made ESM3 available alongside other biology models in the BioNeMo catalog, including AlphaFold derivatives, MolMIM, and DiffDock. NVIDIA cited collaborations with more than 200 biotech and pharmaceutical users of BioNeMo at the time of launch, with that number growing in subsequent updates ^[5].

Amazon Web Services partnered with EvolutionaryScale to host the Forge API and to make the full ESM3 family accessible on AWS to enterprise customers, reaching nine out of the top ten global pharmaceutical companies ^[4]. The collaboration included support for secure fine tuning on proprietary protein data without exposing the data to EvolutionaryScale.

The Chan Zuckerberg Initiative added ESM3 to its Virtual Cells Platform in 2025 as part of its catalog of open biology models ^[8]. Academic groups began publishing applications quickly, including a 2025 study from the Tranos lab that used ESM3 representations to build vESM, a variant effect predictor for clinical genetics. ESM3 has also been adopted as a baseline in benchmarking papers for protein structure prediction and design.

How was ESM3 received?

The scientific and trade press described ESM3 in unusually large terms. Several outlets called it a "ChatGPT moment for biology," referring both to the openness of the release and to the conceptual leap of treating proteins as a multimodal generative modeling problem rather than a structure prediction problem. The 142 million dollar seed round was the largest disclosed seed financing in AI for biology at the time and helped catalyze further venture investment in the space during the second half of 2024 ^[12].

The Science publication in January 2025 added formal peer reviewed weight to the launch claims, especially the esmGFP demonstration, which had initially appeared only in a preprint and a press release ^[1]^[9]. Commentators noted both the achievement and the limits of the result, observing that ESM3 still lags AlphaFold 2 on monomer structure prediction and that the open 1.4 billion parameter model captures only a fraction of the capability of the 98 billion parameter flagship ^[11]. The community response also raised familiar concerns about biosecurity that apply to any generative model that can design functional proteins, although the design of pathogen related proteins is restricted by the license and is not a marketed use case.

Who owns ESM3 now?

On November 6, 2025, the Chan Zuckerberg Initiative announced that EvolutionaryScale and its team would join the Chan Zuckerberg Biohub, a research nonprofit funded by Mark Zuckerberg and Priscilla Chan ^[13]^[14]. Roughly 50 EvolutionaryScale employees moved to Biohub, and co-founder Alexander Rives took the role of Biohub Head of Science, leading research across the organization ^[13]^[14]. Financial terms of the transaction were not disclosed ^[13]^[14].

The move shifts ESM3 and its successors out of a venture backed public benefit corporation and into a philanthropically funded research institute. CZI framed the deal as removing commercial pressure so the team can pursue long horizon science, and Biohub committed to continue sharing AI models and datasets, consistent with the open non commercial release model that ESM3 launched under ^[13]^[14]. As of mid 2026 the ESM3 open checkpoint remains available on Hugging Face under its existing license.

What came after ESM3?

In December 2024 EvolutionaryScale released ESMC, a refreshed family of sequence only ESM models that includes a 600 million parameter ESMC-600m checkpoint with open weights. ESMC is positioned as a focused upgrade to the sequence only path that ESM-2 occupied, designed for fast embedding generation and for downstream tasks such as variant effect prediction. The company has signaled that future ESM generations will continue to expand the multimodal training corpus and incorporate additional biological modalities.

Through 2025 EvolutionaryScale also expanded the Forge platform with new features including conditional generation templates and fine tuning interfaces, building on the foundation laid by the initial ESM3 release. Following the move to the Chan Zuckerberg Biohub in late 2025, the team has continued developing the ESM lineage as the foundation for Biohub's AI driven biology efforts ^[13].

References

1. Hayes, T. et al. "Simulating 500 million years of evolution with a language model." Science, January 17, 2025. DOI: 10.1126/science.ads0018
2. EvolutionaryScale. "ESM3: Simulating 500 million years of evolution with a language model." EvolutionaryScale blog, June 25, 2024. https://www.evolutionaryscale.ai/blog/esm3-release
3. Hayes, T. et al. "Simulating 500 million years of evolution with a language model." bioRxiv preprint, July 1, 2024. https://www.biorxiv.org/content/10.1101/2024.07.01.600583v1
4. "EvolutionaryScale Launches with ESM3: A Milestone AI Model for Biology." Amazon Press Center / Business Wire, June 25, 2024. https://press.aboutamazon.com/aws/2024/6/evolutionaryscale-launches-with-esm3-a-milestone-ai-model-for-biology
5. "EvolutionaryScale Debuts With ESM3 Generative AI Model for Protein Design." NVIDIA Blog, June 25, 2024. https://blogs.nvidia.com/blog/evolutionaryscale-esm3-generative-ai-nim-bionemo-h100/
6. EvolutionaryScale. "Cambrian Non Commercial License Agreement." GitHub repository evolutionaryscale/esm, LICENSE.md. https://github.com/evolutionaryscale/esm/blob/main/LICENSE.md
7. EvolutionaryScale/esm3-sm-open-v1 model card. Hugging Face. https://huggingface.co/EvolutionaryScale/esm3-sm-open-v1
8. ESM3 model entry, Chan Zuckerberg Initiative Virtual Cells Platform. https://virtualcellmodels.cziscience.com/model/esm3
9. "AI model simulates 500 million years of evolution to generate a new fluorescent protein." Phys.org, January 2025. https://phys.org/news/2025-01-ai-simulates-million-years-evolution.html
10. "Comparing AI Biology Foundation Models: AlphaFold 3 and ESM3." IntuitionLabs analysis. https://intuitionlabs.ai/articles/biology-foundation-models-comparison
11. "ESM3 and the Future of Protein Language Models." Chris Hayduk technical analysis. https://www.chrishayduk.com/p/esm3-and-the-future-of-protein-language
12. "EvolutionaryScale Emerges from Stealth with ESM3, the Largest Protein Language Model, and $142M Seed Round." Maginative, June 25, 2024. https://www.maginative.com/article/evolutionaryscale-emerges-from-stealth-with-esm3-the-largest-protein-language-model-and-142m-seed-round/
13. "Gunderson Client EvolutionaryScale Team Joins Biohub in Strategic Transaction with Chan Zuckerberg Initiative." Gunderson Dettmer, November 2025. https://www.gunder.com/en/news-insights/client-news/gunderson-client-evolutionaryscale-team-joins-biohub-in-strategic-transaction-with-chan-zuckerberg-initiative
14. "Zuckerberg, Chan pivot philanthropy to AI-powered biology." Axios, November 6, 2025. https://www.axios.com/2025/11/06/zuckerberg-chan-biohub-ai-disease
