Boltz is a family of open-source biomolecular structure prediction models developed primarily at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Jameel Clinic. The family includes Boltz-1, released in November 2024, and Boltz-2, released in June 2025 in collaboration with Recursion Pharmaceuticals. Both models are distributed under the MIT license, making them freely available for academic and commercial use.
Boltz-1 was the first fully open-source model to reach accuracy comparable to AlphaFold 3, Google DeepMind's closed-access structure prediction system. Boltz-2 extended this foundation by jointly predicting molecular structure and binding affinity in a single model, approaching the accuracy of physics-based free energy perturbation (FEP) methods while running roughly 1,000 times faster. Together, the two releases established Boltz as a significant open alternative to proprietary tools in computational structural biology.
Protein structure prediction has been transformed since AlphaFold 2 was released in 2021 under an Apache 2.0 license, enabling broad academic and commercial use. When Google DeepMind released AlphaFold 3 in 2024, the model was substantially more capable, handling proteins, RNA, DNA, and small-molecule ligands together in a single co-folding framework. However, AlphaFold 3 came with significant access restrictions. The model weights could only be accessed upon request and only for non-commercial research; commercial organizations were prohibited from using the model outputs. A web server was provided but capped the number of predictions per day, making large-scale work impractical.
This created a gap in the field. Researchers and companies working on drug discovery needed an open model that could handle biomolecular complexes at AlphaFold 3 accuracy without the access constraints. Several groups moved quickly to fill that gap. Chai Discovery released Chai-1 under an Apache 2.0 license. RoseTTAFold All-Atom, from the University of Washington, released its code under the MIT license but kept model weights non-commercial only. The MIT team behind Boltz-1 went furthest in openness, releasing not just the weights but the full training pipeline and datasets under the MIT license with no commercial restrictions.
Boltz-1 was developed over approximately four months in 2024 by a team led by Jeremy Wohlwend and Gabriele Corso, both MIT graduate students working at CSAIL and the MIT Jameel Clinic. Saro Passaro, a research affiliate at the Jameel Clinic, also contributed centrally. Faculty advisors included Tommi Jaakkola and Regina Barzilay, both professors in MIT's Department of Electrical Engineering and Computer Science. Additional contributors came from Genesis Research and CHARM Therapeutics.
For Boltz-2, the team expanded through a research partnership with Recursion Pharmaceuticals, a biotechnology company headquartered in Salt Lake City, Utah, that operates one of the largest supercomputers in the pharmaceutical industry (BioHive-2, an NVIDIA DGX SuperPOD with 504 H100 GPUs). Researchers from Valence Labs and ETH Zurich also contributed to the Boltz-2 paper, with Jaakkola and Barzilay again as senior authors.
Boltz-1 was first posted as a preprint on bioRxiv on November 19, 2024, and publicly announced at MIT's Stata Center on December 5, 2024. The official MIT News release followed on December 17, 2024.
Boltz-1 follows the general framework established by AlphaFold 3, which uses a diffusion model operating on atomic coordinates. The overall pipeline includes a trunk that processes sequence information through Multiple Sequence Alignments (MSAs) and pairwise representations, followed by a diffusion module that iteratively denoises atomic positions from noise to a final predicted structure.
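The iterative denoising at the core of this framework can be sketched in toy form. This is illustrative only: the real diffusion module is a learned network conditioned on the trunk's sequence and pair representations, whereas `predict_denoised` below is a dummy stand-in.

```python
import numpy as np

def predict_denoised(x_noisy, sigma):
    """Stand-in for the learned diffusion module: here it just shrinks
    coordinates toward the origin. The real model predicts denoised atom
    positions conditioned on trunk representations."""
    return x_noisy / (1.0 + sigma)

def sample_structure(n_atoms, sigmas, rng):
    """Iteratively denoise random atomic coordinates along a decreasing
    noise schedule (highest sigma first)."""
    x = rng.normal(scale=sigmas[0], size=(n_atoms, 3))
    for sigma_t, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = predict_denoised(x, sigma_t)   # model's estimate of clean coords
        d = (x - x0_hat) / sigma_t              # direction toward the estimate
        x = x + (sigma_next - sigma_t) * d      # Euler step to the next noise level
    return x

rng = np.random.default_rng(0)
coords = sample_structure(n_atoms=5, sigmas=[10.0, 5.0, 1.0, 0.1], rng=rng)
print(coords.shape)  # (5, 3)
```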
The team began by reproducing AlphaFold 3's published architecture, then introduced several modifications:
MSA processing. The order of operations in the MSAModule was changed so that updates to single and pair representations feed back to each other more effectively, improving information flow through the network.
Diffusion transformer. The DiffusionTransformer layer ordering was modified to include proper residual connections, correcting what the team identified as a structural issue in the baseline.
Confidence model. The confidence module was redesigned to use the full trunk composition rather than just PairFormer layers, giving more expressive confidence estimates.
Computational efficiency. Several optimizations reduced memory use and wall-clock time. These included sequence-local atom representations with GPU-efficient sparse attention, attention bias sharing and caching across diffusion timesteps, greedy symmetry correction for efficient chain alignment, and a custom Triton kernel called trifast that reduced the memory cost of Triangle Self Attention from O(n^3) to O(n^2).
Data processing. The team developed new algorithms for handling the Protein Data Bank (PDB), which contains structures submitted by thousands of research groups over 70 years with inconsistent formatting, missing data, and ambiguous chain annotations. They created a dense MSA pairing algorithm that preserves taxonomy information for multimeric complexes, a unified cropping strategy that interpolates between spatial and contiguous cropping, and a robust pocket-conditioning mechanism that allows a single model to handle ligand binding predictions without requiring separate models.
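The interpolated cropping idea can be illustrated with a simple scoring scheme. This is a hedged sketch of the concept, not the published algorithm: each token is scored by a blend of its sequence distance and spatial distance to a chosen center, with a parameter `alpha` sliding between purely contiguous (`alpha=0`) and purely spatial (`alpha=1`) crops.

```python
import numpy as np

def unified_crop(coords, center, crop_size, alpha):
    """Illustrative interpolation between contiguous (alpha=0) and spatial
    (alpha=1) cropping: score each token by a blend of sequence distance
    and spatial distance to the center, keep the crop_size lowest scores."""
    n = len(coords)
    seq_dist = np.abs(np.arange(n) - center)
    spa_dist = np.linalg.norm(coords - coords[center], axis=1)
    # Normalize both distances to [0, 1] so the blend is balanced.
    score = (1 - alpha) * seq_dist / max(seq_dist.max(), 1) \
            + alpha * spa_dist / max(spa_dist.max(), 1e-9)
    return np.sort(np.argsort(score)[:crop_size])

rng = np.random.default_rng(1)
coords = rng.normal(size=(100, 3))
tokens = unified_crop(coords, center=50, crop_size=16, alpha=0.0)
# With alpha=0 the crop is purely contiguous around the center token.
print(tokens)
```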
Boltz-1 was trained on PDB structures released before September 30, 2021, filtered to a resolution of 9 angstroms or better. Training ran for 68,000 steps with a batch size of 128, using progressive crop sizes increasing from 384 to 512 tokens. For reference, AlphaFold 3 trained for approximately 150,000 steps with a batch size of 256, using roughly four times the compute. MSA construction used colabfold_search with MMseqs2, with up to 4,096 sequences per input.
The base Boltz-1 model, like AlphaFold 3, can generate predicted structures with physical violations: steric clashes where atoms overlap, incorrect bond lengths and angles, chirality errors, chain overlaps, and non-planar aromatic rings. To address this, the team developed Boltz-steering, an inference-time correction method using physics-inspired potential functions.
Boltz-steering applies a rigid alignment step with the Kabsch algorithm after every step of the diffusion inference procedure, correcting geometrically non-physical intermediates before the next denoising step. The resulting model variant, called Boltz-1x, passes 97% of poses through the PoseBusters physical quality check, compared to 57% for the base model and 58% for AlphaFold 3.
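The Kabsch rigid-alignment step referenced above is a standard procedure for finding the optimal rotation and translation superimposing one point set onto another; a compact NumPy implementation follows (illustrative, not the Boltz source).

```python
import numpy as np

def kabsch_align(P, Q):
    """Optimally superimpose point set P onto Q (both (n, 3)) using the
    Kabsch algorithm: center both sets, find the rotation via SVD of the
    covariance matrix, and correct for reflections."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Pc @ R.T + Q.mean(0)

# Sanity check: aligning a rotated + translated copy recovers the original.
rng = np.random.default_rng(2)
Q = rng.normal(size=(10, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
P = Q @ Rz.T + np.array([1.0, -2.0, 3.0])
rmsd = np.sqrt(((kabsch_align(P, Q) - Q) ** 2).sum(1).mean())
print(round(rmsd, 6))  # ~0.0
```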
Boltz-1 was evaluated on a curated test set of 541 structures from the PDB (released after January 13, 2023) and on 66 CASP15 structures with ground-truth coordinates. The key benchmark results comparing Boltz-1, AlphaFold 3, and Chai-1 on top-1 predictions are shown below:
| Metric | Boltz-1 | AlphaFold 3 | Chai-1 |
|---|---|---|---|
| Mean LDDT | 0.716 | ~0.73 | ~0.71 |
| DockQ > 0.23 (protein-protein) | 0.625 | 0.63 | 0.60 |
| Protein-ligand interface LDDT | 0.580 | 0.59 | 0.57 |
| Ligand RMSD < 2 Å | 0.545 | 0.56 | 0.52 |
| PoseBusters physical quality | 57% | 58% | 27% |
| PoseBusters with steering (1x) | 97% | N/A | N/A |
Boltz-1 attained accuracy within the confidence interval of AlphaFold 3 across most metrics. Its performance on physical quality of ligand poses (57%) matched AlphaFold 3 (58%) and significantly exceeded Chai-1 (27%), with the Boltz-1x variant bringing physical quality to 97%.
Boltz-2 was released on June 6, 2025, as a joint project of MIT and Recursion Pharmaceuticals. The preprint appeared on bioRxiv on June 14, 2025.
The defining innovation of Boltz-2 is the integration of binding affinity prediction directly into the structure prediction pipeline. Prior models, including Boltz-1, predicted only the three-dimensional structure of a complex. Binding affinity, the strength with which a small molecule binds to its protein target, had to be estimated separately using physics-based methods or standalone machine learning models trained on affinity data.
Boltz-2 introduces a new affinity module built on PairFormer layers, added alongside the existing trunk, denoising, and confidence modules from Boltz-1. The affinity module has two output heads:
- `affinity_probability_binary`: A score between 0 and 1 indicating the probability that a ligand is a binder, used for virtual screening and hit discovery.
- `affinity_pred_value`: A continuous predicted log10(IC50) value in micromolar units, used for lead optimization.

This design means a single forward pass through Boltz-2 produces both a predicted 3D structure and a quantitative binding affinity estimate, making the model directly useful for two of the most computationally intensive steps in early-stage drug discovery.
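The two heads can be post-processed into more familiar quantities. A hedged sketch (the conversion follows from the stated units; the binder threshold and helper function are ours, not part of the model):

```python
def interpret_affinity(prob_binary, pred_value, binder_threshold=0.5):
    """Convert Boltz-2's two affinity outputs into familiar quantities.
    pred_value is log10(IC50) with IC50 in micromolar, so a value of -3
    corresponds to an IC50 of 1 nM. The threshold is an arbitrary choice."""
    ic50_um = 10.0 ** pred_value
    return {
        "is_binder": prob_binary >= binder_threshold,
        "ic50_uM": ic50_um,
        "ic50_nM": ic50_um * 1e3,
        "pIC50": 6.0 - pred_value,   # pIC50 = -log10(IC50 in molar)
    }

out = interpret_affinity(prob_binary=0.91, pred_value=-2.0)
print(out["ic50_nM"], out["pIC50"])  # 10 nM, pIC50 = 8.0
```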
Beyond the affinity module, Boltz-2 introduced several other changes relative to Boltz-1:
Controllability features. Users can now condition predictions on the experimental modality (X-ray crystallography, NMR, molecular dynamics), provide multi-chain templates with optional strict enforcement, specify distance constraints, and use contact or pocket conditioning to direct the model toward specific binding sites. These features are particularly useful when partial experimental data is available.
B-factor supervision. Boltz-2 was trained to predict B-factors (crystallographic temperature factors that encode local atomic mobility) using both experimental data and molecular dynamics simulation data. This gives the model some ability to capture local protein dynamics, not just a single static structure.
Enlarged crop size. Training crop size was increased from 512 to 768 tokens, allowing the model to handle larger complexes.
Mixed precision and kernel improvements. Training and inference use bfloat16 mixed precision, and the trifast kernel from Boltz-1 was carried forward.
Boltz-2x. Similar to Boltz-1x, Boltz-2x refers to the Boltz-2 model with inference-time steering potentials activated. These physics-inspired corrections reduce steric clashes and ligand chemistry errors (incorrect bond orders, aromaticity violations) at the cost of some additional inference time.
Boltz-2's training incorporated more data types than Boltz-1. In particular, binding affinity data was standardized to a log10 scale in micromolar units, allowing measurements from different assay types to be mixed in a single training signal.
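This standardization amounts to a unit conversion followed by a log transform; a minimal sketch (the conversion table is ours, for illustration):

```python
import math

# Illustrative unit-conversion table: factors mapping common affinity
# units to micromolar, so heterogeneous assays can share one scale.
TO_MICROMOLAR = {"M": 1e6, "mM": 1e3, "uM": 1.0, "nM": 1e-3, "pM": 1e-6}

def standardize_affinity(value, unit):
    """Return log10(affinity in micromolar) for a measurement in `unit`."""
    return math.log10(value * TO_MICROMOLAR[unit])

print(standardize_affinity(10.0, "nM"))   # log10(0.01 uM) = -2.0
print(standardize_affinity(1.0, "uM"))    # 0.0
```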
Boltz-2 was evaluated on several benchmarks covering both structure prediction and binding affinity.
Binding affinity (relative FEP). On the FEP+/OpenFE benchmark, which tests relative binding affinity prediction for lead optimization, Boltz-2 achieved an average Pearson correlation of 0.62. For comparison, OpenFE, a widely used open-source physics-based FEP pipeline, achieves similar correlation values but requires hours to days of GPU time per compound pair. A subset of 4 targets from the FEP+ commercial benchmark (CDK2, TYK2, JNK1, P38) showed a Pearson correlation of 0.66.
Binding affinity (CASP16). In the CASP16 affinity prediction challenge, which tested predictions on 140 complexes blind, Boltz-2 outperformed all other submitted methods.
Virtual screening. On the MF-PCBA benchmark for hit discovery, Boltz-2 nearly doubled the average precision of machine learning and docking-based baselines.
Structure prediction. Structure prediction performance was competitive with Chai-1 and showed improvements over Boltz-1, with particularly notable gains on RNA structures, DNA-protein complexes, and antibody-antigen interactions. Performance remained slightly below AlphaFold 3 on some metrics.
Molecular dynamics. When Boltz-2 was conditioned on molecular dynamics modality, its predicted B-factors correlated with experimental values at rates competitive with specialized dynamics models such as AlphaFlow and BioEmu.
The table below summarizes key properties of the major biomolecular structure prediction models as of mid-2025:
| Model | Developer | Release | License | Affinity prediction | Commercial use |
|---|---|---|---|---|---|
| AlphaFold 3 | Google DeepMind | May 2024 | Non-commercial only | No | No |
| Chai-1 | Chai Discovery | Sep 2024 | Apache 2.0 | No | Yes |
| Boltz-1 | MIT | Nov 2024 | MIT | No | Yes |
| RoseTTAFold All-Atom | University of Washington | 2024 | MIT (code), non-commercial (weights) | No | No (weights) |
| Boltz-2 | MIT + Recursion | Jun 2025 | MIT | Yes | Yes |
AlphaFold 3 remains the highest-accuracy model on most structure prediction benchmarks, but its license restricts use to academic research and prohibits commercial applications. Chai-1 matches or exceeds AlphaFold 3 on several benchmarks and allows commercial use under Apache 2.0. Boltz-1 attained similar accuracy to both while releasing everything including training code and data under the MIT license.
Boltz-2 is unique among this group in offering joint structure and affinity prediction in a single model. RoseTTAFold All-Atom, developed at the University of Washington and published in 2024, handles all-atom structure prediction and showed strong results on diverse molecular types, but its weights remain non-commercial, limiting industrial adoption.
Free energy perturbation (FEP) is the established computational chemistry approach for predicting small molecule binding affinity. It works by running molecular dynamics simulations that gradually transform one ligand into another while tracking changes in binding energy, using classical force field physics rather than learned representations. FEP is considered the gold standard for lead optimization because it produces physically grounded predictions.
The main drawback of FEP is computational cost. A single FEP calculation for a pair of ligands typically requires 10 or more GPU-hours on high-performance hardware, and large-scale virtual screening campaigns covering thousands of compounds can take days to weeks even with access to cloud computing infrastructure. The commercial software FEP+ (from Schrödinger) requires significant licensing fees on top of compute costs, making it inaccessible to smaller research groups.
Boltz-2 runs a complete structure-plus-affinity prediction in approximately 20 seconds on a single consumer GPU. The 1,000x speed advantage cited by the authors compares Boltz-2 to FEP on standard benchmarks in terms of wall-clock time. On the same FEP+/OpenFE benchmark targets, Boltz-2's Pearson correlation of 0.62 is comparable to OpenFE's, which achieves similar figures after orders of magnitude more compute.
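The order-of-magnitude claim follows from simple arithmetic on the figures quoted above (10 GPU-hours per FEP calculation versus roughly 20 seconds per Boltz-2 prediction):

```python
fep_seconds = 10 * 3600   # ~10 GPU-hours per ligand pair (figure from the text)
boltz_seconds = 20        # ~20 s per structure-plus-affinity prediction
speedup = fep_seconds / boltz_seconds
print(speedup)  # 1800.0 -- consistent with the "roughly 1,000x" order of magnitude
```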
Boltz-2's authors and outside commentators have noted that FEP and Boltz-2 are more complementary than competing tools. FEP is more reliable for fine-grained ranking of similar analogs, handles cofactors, ions, and explicit solvation, and produces physically interpretable results. Boltz-2 is better suited for early-stage screening of large compound libraries where speed matters and FEP's cost is prohibitive. Using Boltz-2 to narrow a large virtual library to a small set of candidates before applying FEP to the shortlist is a natural workflow that combines both advantages.
In October 2025, researchers from Recursion's Valence Labs division and Recursion Pharmaceuticals published Boltz-ABFE, a pipeline combining Boltz-2 with absolute binding free energy (ABFE) calculations. The pipeline addresses a gap in FEP's usual workflow: conventional FEP requires an experimental crystal structure of the protein-ligand complex as a starting point, which is rarely available in early drug discovery.
Boltz-ABFE works by first using Boltz-1 or Boltz-2 to predict a protein-ligand complex structure from sequence and SMILES alone, then refining the prediction to correct steric clashes and chemistry errors, re-docking the ligand using OpenEye POSIT, and finally running ABFE simulations on the refined structure. In benchmarking on four FEP+ targets (CDK2, TYK2, JNK1, P38), the Boltz-ABFE pipeline achieved mean unsigned errors below 1 kcal/mol, comparable to calculations that start from crystal structures. Boltz-1 with POSIT re-docking outperformed Boltz-2 alone on some targets due to subtle sidechain orientation differences.
Boltz-ABFE expands the practical applicability of FEP to targets where no experimental structure is available, which covers a substantial fraction of early drug discovery programs.
All models in the Boltz family are released under the MIT license, which permits unrestricted academic and commercial use, modification, and redistribution. The license covers the model weights, training code, inference code, datasets, and benchmarks. The GitHub repository is available at github.com/jwohlwend/boltz and had accumulated over 4,000 stars by mid-2025.
The MIT license stands in contrast to the access restrictions of AlphaFold 3, which prohibits commercial use and requires application for weight access. It also differs from RoseTTAFold All-Atom, whose code is MIT-licensed but whose weights are restricted to non-commercial use.
The decision to release everything openly was deliberate. Jeremy Wohlwend and Gabriele Corso stated at the December 2024 announcement event that their goal was to foster global collaboration and provide a platform for advancing biomolecular modeling worldwide. The full release of training infrastructure also allows researchers to fine-tune, distill, or extend the models without restriction.
Recursion Pharmaceuticals, a publicly traded biotechnology company (RXRX) based in Salt Lake City, collaborated with MIT on Boltz-2 and provided the computational infrastructure used for its development. Recursion operates BioHive-2, an NVIDIA DGX SuperPOD containing 504 NVIDIA H100 GPUs, which ranked 35th on the TOP500 list of the world's most powerful supercomputers at the time of its completion in May 2024.
Recursion's core business is using AI and high-throughput biology to accelerate drug discovery. The company maintains a biological and chemical database exceeding 50 petabytes. The collaboration with MIT on Boltz-2 aligned with Recursion's existing focus on machine learning for molecular biology and gave both parties access to each other's strengths: MIT's machine learning research and Recursion's data and compute resources.
Chris Gibson, Recursion's CEO, has described Boltz-2 as enabling researchers to "triage more effectively and focus resources on the most promising compounds," directly addressing bottlenecks in molecular selection during drug discovery. Regina Barzilay of MIT has noted that the model "helps scientists uncover new biological insights and ask questions they couldn't before with standard approaches that are more computationally intensive."
The Boltz models are designed to support multiple stages of the drug discovery pipeline, particularly the early computational stages before molecules are synthesized and tested in the laboratory.
Hit discovery. In the earliest stage of drug discovery, researchers screen large libraries of compounds to find ones that bind to a disease target. Boltz-2's affinity_probability_binary output enables virtual screening of large compound libraries at a fraction of the cost of traditional docking or FEP methods. On the MF-PCBA benchmark, Boltz-2 nearly doubled the average precision of existing ML and docking-based approaches.
Hit-to-lead optimization. Once a hit compound is identified, medicinal chemists synthesize analogs to improve potency, selectivity, and drug-like properties. Boltz-2's affinity_pred_value output provides a continuous affinity estimate (log10 IC50 in micromolar units) that can rank analogs without synthesizing them all. This allows chemists to prioritize which molecules to make.
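Because `affinity_pred_value` is log10(IC50) in micromolar, lower values mean more potent compounds, and ranking analogs is a simple sort. A sketch with hypothetical predictions:

```python
# Hypothetical predicted affinity_pred_value for a set of analogs;
# the value is log10(IC50 in micromolar), so lower = more potent.
predicted = {
    "analog_1": -1.2,
    "analog_2": -3.5,
    "analog_3": 0.4,
    "analog_4": -2.1,
}

ranked = sorted(predicted, key=predicted.get)  # most potent first
print(ranked)  # ['analog_2', 'analog_4', 'analog_1', 'analog_3']
```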
Structure-based design. Knowing the predicted structure of a protein-ligand complex, even without a crystal structure, allows structure-based drug design. Boltz-1 and Boltz-2 provide predicted complex structures that can guide medicinal chemistry decisions about where and how to modify a molecule.
Virtual screening at scale. Boltz-2 has been coupled with SynFlowNet, a generative model that proposes synthesizable compounds from known chemical reactions and purchasable building blocks. In a prospective study on the TYK2 kinase target, the Boltz-2 + SynFlowNet pipeline screened 524,120 commercially available compounds and also generated novel candidates from a space of 76 billion synthesizable molecules. All 10 of the top 10 predicted binders were confirmed as actual binders via alchemical binding free energy simulations.
Impact at Recursion. Recursion has reported that drug discovery programs using its AI platform, which now includes Boltz-2, have been completed in approximately 18 months compared to an industry average of 42 months. The number of compounds requiring physical synthesis has dropped to a few hundred per program, versus an industry average of 5,000 to 10,000.
Boltz-2's authors acknowledged several limitations in the model:
Cofactors, ions, and water. The model does not explicitly handle metal ions, cofactors, or water molecules, which can be important for correctly predicting binding modes in metalloproteins and other targets.
Dependence on accurate structure. The affinity prediction is conditional on the model predicting the correct binding pocket and protein conformation. If the structure prediction is wrong (for example, due to a large conformational change upon binding), the affinity prediction will likely also be wrong.
Variable performance across protein families. GPCRs, membrane proteins, and other structurally challenging targets tend to produce higher errors than soluble proteins.
MD ensemble diversity. Boltz-2's molecular dynamics conditioning produces B-factor predictions that correlate well with experiment, but the model does not fully capture the diversity of conformational ensembles that explicit MD simulations produce.
Stereochemistry errors. Even with Boltz-2x steering, some stereochemistry errors persist in predicted ligand poses, which can affect downstream FEP calculations.
For Boltz-1 specifically, the base model without steering (non-1x version) shows a 57% pass rate on PoseBusters physical quality checks, meaning approximately 43% of ligand poses have physically unrealistic geometries. Boltz-1x largely corrects this but adds inference time.
The Boltz GitHub repository accumulated over 4,000 stars in the months after release, and an active Slack community formed around the models. The repository tracks 17 releases as of late 2025, with the latest version (v2.2.1) released September 8, 2025.
Installation is straightforward via PyPI: `pip install boltz[cuda] -U`. A CPU-only installation is available but is significantly slower. The models accept protein sequences, SMILES strings for small molecules, and nucleotide sequences for RNA and DNA, with optional MSA input.
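A minimal input can be written as a small YAML file pairing a protein chain with a ligand. The snippet below sketches such a file; the schema follows the format documented in the Boltz repository, and the sequence and SMILES are toy placeholders.

```python
from pathlib import Path

# Minimal Boltz input pairing one protein chain with one ligand.
# Sequence and SMILES here are placeholders, not a real target.
input_yaml = """\
version: 1
sequences:
  - protein:
      id: A
      sequence: MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
  - ligand:
      id: B
      smiles: CC(=O)Oc1ccccc1C(=O)O
"""

Path("example_input.yaml").write_text(input_yaml)
# Prediction would then be run from the command line, e.g.:
#   boltz predict example_input.yaml
```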
ABCFold, an open-source tool for comparing predictions from AlphaFold 3, Boltz-1, and Chai-1 through a single interface, was published in Bioinformatics Advances in 2025. Third-party deployments include the Rowan computational chemistry platform, Tamarind Bio's web server, and Neurosnap's cloud API. A community fork added support for Tenstorrent AI hardware.
Several academic groups have published studies using Boltz-1 for specific prediction tasks. One study found that over 90% of ligands predicted by Boltz-1x for allosteric and orthosteric binding sites passed standard quality criteria. Another evaluated Boltz-1, AlphaFold 3, Chai-1, and RoseTTAFold All-Atom on nanobody-small molecule complexes.