AlphaMissense
Last reviewed
Jun 3, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 1,249 words
AlphaMissense is a machine learning model from Google DeepMind that predicts whether a missense variant, a single amino-acid substitution in a protein, is likely to cause disease. It was described in a paper published in the journal Science on 19 September 2023, and it adapts the protein structure prediction system AlphaFold to the problem of variant effect prediction. Alongside the paper, DeepMind released a catalogue of predictions covering roughly 71 million possible human missense variants, classifying about 89% of them as either likely pathogenic or likely benign. [1][2]
DNA encodes proteins as sequences of three-letter codons, each of which specifies an amino acid. A point mutation that changes a single DNA letter can change the codon so that it codes for a different amino acid. This kind of single amino-acid substitution is called a missense variant. Some missense variants have no meaningful effect on the resulting protein, while others disrupt its folding, stability, or function and can cause genetic disease. [1][3]
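The codon logic above can be illustrated with a short Python sketch. It is a minimal example rather than a full translator: `CODON_TABLE` holds only the four codons used here (taken from the standard genetic code), and the function name `classify_substitution` and its labels are hypothetical, chosen for illustration.

```python
# Partial codon table: only the codons needed for this example,
# taken from the standard genetic code (a full table has 64 entries).
CODON_TABLE = {
    "GAG": "Glu",
    "GAA": "Glu",
    "GTG": "Val",
    "TAG": "Stop",
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a single-codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid, protein sequence unchanged
    if alt_aa == "Stop":
        return "nonsense"    # premature stop codon, not a missense variant
    return "missense"        # a different amino acid is substituted


# One DNA letter changes (A -> T in the middle position), Glu becomes Val.
print(classify_substitution("GAG", "GTG"))  # missense
# One DNA letter changes (G -> A in the last position), still Glu.
print(classify_substitution("GAG", "GAA"))  # synonymous
```

Only the first kind of change, where one amino acid is replaced by another, falls within the scope of a missense predictor.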
The difficulty is one of scale and evidence. There are on the order of 71 million possible missense variants in the human genome, but at the time AlphaMissense was published only about 0.1% of them had been clinically annotated as pathogenic or benign by human experts. The remainder were "variants of uncertain significance," a gap that complicates the diagnosis of rare genetic disorders. AlphaMissense was built to fill in predictions across that entire space. [1][3][4]
AlphaMissense is adapted from AlphaFold, the model that predicts a protein's three-dimensional structure from its amino-acid sequence. The authors started from the AlphaFold codebase (a fork of version 2.3.2) and repurposed the architecture so that, instead of predicting a structure, the network predicts how damaging an amino-acid change is likely to be. The model combines structural context with a protein language model component, a neural network that learns the statistical distribution of amino acids from large collections of related sequences. [2][5]
Rather than training on labelled clinical data, the model was fine-tuned on the frequencies of variants observed in human and primate populations. The intuition is that substitutions that are common across closely related species are generally tolerated, while those that are absent from these populations are more likely to be harmful. After this weakly supervised training, the predictions were calibrated so that the output could be read as a probability of pathogenicity. A notable feature of this approach is that the model was not explicitly trained on known disease variants, yet it performs well on benchmarks built from them. [1][2]
For each variant, AlphaMissense outputs a pathogenicity score between 0 and 1. The released predictions use fixed thresholds to sort variants into three classes. [5]
| Score range | Classification |
|---|---|
| Below 0.34 | Likely benign |
| 0.34 to 0.564 | Ambiguous / uncertain |
| Above 0.564 | Likely pathogenic |
These cutoffs were chosen to achieve roughly 90% precision when checked against databases of known disease-causing variants. [4][5]
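The thresholding in the table can be written as a small Python function. This is a sketch of the rule as stated in the table, not code from the AlphaMissense release: the function name and label strings are hypothetical, and assigning scores that fall exactly on a cutoff to the ambiguous class is an assumption read off the table's wording.

```python
LIKELY_BENIGN_BELOW = 0.34       # cutoff from the table above
LIKELY_PATHOGENIC_ABOVE = 0.564  # cutoff from the table above


def classify_score(score: float) -> str:
    """Map a pathogenicity score in [0, 1] to one of the three classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < LIKELY_BENIGN_BELOW:
        return "likely_benign"
    if score > LIKELY_PATHOGENIC_ABOVE:
        return "likely_pathogenic"
    # Everything in between, including the two boundary values (assumption).
    return "ambiguous"


for s in (0.05, 0.34, 0.5, 0.564, 0.9):
    print(s, classify_score(s))
```

Keeping the cutoffs as named constants makes it easy to apply a different operating point if a stricter or looser precision target is wanted.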
DeepMind applied AlphaMissense across more than 19,000 human proteins and produced predictions for all of their possible single amino-acid substitutions, about 71 million variants in total. Using the calibrated thresholds, the model assigned a confident label to roughly 89% of those variants: about 57% were classified as likely benign and about 32% as likely pathogenic, with the remaining 11% or so left as uncertain. [1][2][6]
| Metric | Value |
|---|---|
| Possible missense variants scored | ~71 million |
| Human proteins covered | >19,000 |
| Variants given a confident label | ~89% |
| Likely benign | ~57% |
| Likely pathogenic | ~32% |
| Ambiguous / uncertain | ~11% |
| Previously clinically classified (at publication) | ~0.1% |
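To give a sense of the absolute numbers behind these percentages, the following Python sketch converts the rounded shares into approximate variant counts. Because the inputs are the rounded figures quoted above (71 million, 57%, 32%, 0.1%), the results are order-of-magnitude estimates and will not match the exact counts in the released data.

```python
TOTAL_VARIANTS = 71_000_000  # rounded total quoted above

# Rounded shares in whole percent; the ambiguous share is what is left over.
share_pct = {"likely_benign": 57, "likely_pathogenic": 32}
share_pct["ambiguous"] = 100 - sum(share_pct.values())

# Integer arithmetic avoids floating-point rounding noise.
approx_counts = {label: TOTAL_VARIANTS * pct // 100 for label, pct in share_pct.items()}

# About 0.1% (one in a thousand) had an expert clinical classification.
approx_clinically_annotated = TOTAL_VARIANTS // 1000

for label, count in approx_counts.items():
    print(f"{label}: ~{count:,}")
print(f"clinically annotated at publication: ~{approx_clinically_annotated:,}")
```

On these rounded inputs, roughly 40 million variants are labelled likely benign, about 23 million likely pathogenic, and about 8 million ambiguous, compared with roughly 71,000 that had an expert clinical annotation.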
In the Science paper the authors reported that AlphaMissense outperformed other computational variant effect predictors across a range of genetic and experimental benchmarks, and that its scores correlated with measurements from deep mutational scanning experiments and with clinical annotations. The Zenodo release also extends beyond the canonical 71 million figure, with larger datasets covering amino-acid substitutions across roughly 20,000 UniProt isoforms and additional non-canonical transcripts. [2][5][6]
The predictions were released as an open resource rather than as a hosted prediction service. DeepMind published the precomputed scores for all possible human missense variants, deposited the data on Zenodo, and open-sourced the model code on GitHub under the Apache 2.0 licence. The trained model weights themselves were not released. [2][5]
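Because the predictions are distributed as precomputed tables rather than through an API, working with them mostly means reading a large tab-separated file. The Python sketch below shows one way this might look. The column names (`#CHROM`, `protein_variant`, `am_pathogenicity`, `am_class` and so on) are an assumption about the file layout and should be checked against the release documentation, the helper `read_scores` is hypothetical, and the sample rows are invented placeholders, not real AlphaMissense scores.

```python
import io

# Invented placeholder data. Column names are an assumed layout,
# and none of these rows is a real prediction.
SAMPLE_TSV = """\
# Placeholder comment line
#CHROM\tPOS\tREF\tALT\tuniprot_id\tprotein_variant\tam_pathogenicity\tam_class
chr1\t100\tA\tG\tP00000\tM1V\t0.91\tlikely_pathogenic
chr1\t100\tA\tC\tP00000\tM1L\t0.12\tlikely_benign
chr1\t103\tC\tT\tP00000\tA2V\t0.45\tambiguous
"""


def read_scores(handle):
    """Yield one dict per variant row, skipping comment lines before the header."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#CHROM"):
            # The header line names the columns; drop the leading '#'.
            header = line.lstrip("#").split("\t")
            continue
        if line.startswith("#"):
            continue  # any other comment line
        if header is None:
            raise ValueError("data row found before the header line")
        row = dict(zip(header, line.split("\t")))
        row["am_pathogenicity"] = float(row["am_pathogenicity"])
        yield row


rows = list(read_scores(io.StringIO(SAMPLE_TSV)))
pathogenic = [r["protein_variant"] for r in rows if r["am_class"] == "likely_pathogenic"]
print(pathogenic)  # ['M1V']
```

For a gzip-compressed download, the same generator could be fed from `gzip.open(path, "rt")` in place of the in-memory sample; streaming row by row avoids loading tens of millions of lines into memory at once.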
To make the scores usable in existing genomics workflows, the predictions were integrated into the Ensembl Variant Effect Predictor (VEP) at EMBL-EBI, so that researchers analysing genetic variants could pull AlphaMissense scores directly. The scores sit alongside the AlphaFold protein structures that EMBL-EBI already hosts. [7][8]
The licensing of the predictions changed after launch. They were initially distributed under a non-commercial Creative Commons licence, and in March 2024 the predictions were re-released under the more permissive Creative Commons Attribution 4.0 (CC BY) licence, allowing both commercial and academic reuse. [2][5]
DeepMind and the paper's authors were explicit that AlphaMissense is a research tool, not a clinical diagnostic. The GitHub repository states that the predictions are "not intended to be a substitute for professional medical advice, diagnosis, or treatment" and that the system "has not been validated for, and is not approved for, any clinical use." DeepMind's accompanying blog post similarly stressed that the predictions are not designed to be used in the clinic directly and should be interpreted together with other lines of evidence. [2][5]
There are technical limits as well. AlphaMissense only addresses missense variants and does not cover other classes of genetic change, such as variants in non-coding regulatory regions, insertions and deletions, or variants that affect splicing. Because it is trained largely on evolutionary and population-frequency signals, it predicts whether a variant is likely to be damaging but does not by itself explain the molecular mechanism or link a variant to a specific disease. Commentators have also cautioned against treating its confident-looking 0-to-1 scores as definitive clinical calls. [3][4]
AlphaMissense was widely covered as a significant step in applying the AlphaFold lineage to human genetics, and follow-up work in journals including Nature Reviews Genetics and Nature Biotechnology examined its accuracy and its place among existing variant predictors. Independent analyses generally found it competitive with or stronger than prior tools, while reiterating that its outputs are predictions to be weighed alongside experimental and clinical evidence rather than used on their own. [3][4]
The model also fits into a broader DeepMind effort to predict the functional consequences of genetic variation. AlphaMissense focuses on protein-coding regions, and the company later introduced AlphaGenome, a DNA sequence model aimed at the regulatory, largely non-coding parts of the genome, positioning the two as complementary approaches to interpreting human DNA. [9]