AlphaMissense

Updated Jun 3, 2026

AlphaMissense is a machine learning model from Google DeepMind that predicts whether a missense variant, a single amino-acid substitution in a protein, is likely to cause disease. It was described in a paper published in the journal Science on 19 September 2023, and it adapts the protein structure prediction system AlphaFold to the problem of variant effect prediction. Alongside the paper, DeepMind released a catalogue of predictions covering roughly 71 million possible human missense variants, classifying about 89% of them as either likely pathogenic or likely benign. ^[1]^[2]

Background (missense variants)

DNA encodes proteins as sequences of three-nucleotide codons, each of which specifies an amino acid or a stop signal. A point mutation that changes a single nucleotide can alter a codon so that it specifies a different amino acid. This kind of single amino-acid substitution is called a missense variant. Some missense variants have no meaningful effect on the resulting protein, while others disrupt its folding, stability, or function and can cause genetic disease. ^[1]^[3]
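The relationship between single-nucleotide changes and amino-acid changes can be made concrete with the standard genetic code. The sketch below, whose function and variable names are hypothetical, enumerates every single-nucleotide substitution in an in-frame coding sequence and counts how many leave the amino acid unchanged (synonymous), introduce a stop codon (nonsense) or replace one amino acid with another (missense). It assumes the input is an in-frame DNA string whose length is a multiple of three.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons TTT, TTC, TTA, ... GGG in order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snvs(cds: str) -> Counter:
    """Count synonymous, missense and nonsense outcomes of every possible
    single-nucleotide substitution in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter()
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[codon]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # same base: not a substitution
                alt_aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                if alt_aa == ref_aa:
                    counts["synonymous"] += 1
                elif ref_aa == "*":
                    counts["stop_lost"] += 1  # reference codon was itself a stop
                elif alt_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts


print(classify_snvs("GAG"))
```

For the codon GAG (glutamate), seven of the nine possible single-nucleotide changes are missense, one (GAG to GAA) is synonymous and one (GAG to TAG) is nonsense. GAG to GTG, a glutamate-to-valine change, is the missense substitution behind sickle cell disease.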

The difficulty is one of scale and evidence. There are roughly 71 million possible missense variants in the human genome, but when AlphaMissense was published only about 0.1% of them had been confidently classified as pathogenic or benign by human experts. The rest had no expert classification, and many of those that turn up in patients are reported as "variants of uncertain significance," a gap that complicates the diagnosis of rare genetic disorders. AlphaMissense was built to provide predictions across that entire space. ^[1]^[3]^[4]

How it works (AlphaFold basis)

AlphaMissense is adapted from AlphaFold, the model that predicts a protein's three-dimensional structure from its amino-acid sequence. The authors started from the AlphaFold codebase (a fork of version 2.3.2) and repurposed the architecture so that, instead of predicting a structure, the network predicts how damaging an amino-acid change is likely to be. The model combines structural context with a protein language model component, a neural network that learns the statistical distribution of amino acids from large collections of related sequences. ^[2]^[5]
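AlphaMissense learns these amino-acid distributions with a deep network. As a much simpler illustration of the underlying idea, and not AlphaMissense's own method, the sketch below estimates residue frequencies at a single position of a multiple sequence alignment and scores a substitution by the log ratio of the alternative and reference residues' frequencies. The function name, the pseudocount of 1 and the input format (a list of one-letter residues from one alignment column) are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List

STANDARD_AAS = "ACDEFGHIKLMNPQRSTVWY"


def column_log_odds(column: List[str], ref: str, alt: str, pseudocount: float = 1.0) -> float:
    """Score ref -> alt at one alignment column as log(p(alt) / p(ref)).

    p() is the residue frequency in the column plus a uniform pseudocount, so
    residues never seen at this position still get a small probability. Gaps
    and non-standard characters are ignored. Negative scores mean alt is rarer
    than ref among the aligned sequences.
    """
    counts = Counter(aa for aa in column if aa in STANDARD_AAS)
    total = sum(counts.values()) + pseudocount * len(STANDARD_AAS)

    def freq(aa: str) -> float:
        return (counts[aa] + pseudocount) / total

    return math.log(freq(alt)) - math.log(freq(ref))


# A position where related sequences almost always carry leucine (L); '-' is a gap.
column = list("LLLIL-L")
print(column_log_odds(column, "L", "I"))  # seen in relatives: mildly negative
print(column_log_odds(column, "L", "P"))  # never seen here: more negative
```

A conserved position penalises a substitution to a residue never seen in related sequences more than one to a residue that relatives already tolerate, which is the intuition a protein language model captures at far greater scale and with context from the whole sequence.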

Rather than training on labelled clinical data, the model was fine-tuned on the frequencies of variants observed in human and primate populations. The intuition is that substitutions frequently observed in humans and closely related primates are generally tolerated, while those never observed are more likely to be harmful. After this weakly supervised training, in which population data stands in for expert labels, the predictions were calibrated so that the output could be read as a probability of pathogenicity. Notably, the model was not explicitly trained on known disease variants, yet it performs well on benchmarks built from them. ^[1]^[2]
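The weak-labelling idea can be sketched as a function that turns population observations into training targets. This is a hypothetical simplification: the cut-off `COMMON_FREQ`, the decision to leave rare-but-observed variants unlabelled, and the function name are assumptions for illustration, not the labelling rule used in the paper.

```python
from typing import Optional

# Hypothetical cut-off for "frequently observed"; not the value used in the paper.
COMMON_FREQ = 1e-3


def weak_label(frequency: Optional[float], common_freq: float = COMMON_FREQ) -> Optional[int]:
    """Turn a population observation into a weak training label.

    frequency: allele frequency of the variant in pooled human and primate data,
    or None if it was never observed.
    Returns 0 (treated as tolerated), 1 (treated as likely damaging) or None
    (left out of training because the evidence is too thin either way).
    """
    if not frequency:  # None or 0.0: never observed
        return 1
    if frequency >= common_freq:
        return 0
    return None


print([weak_label(f) for f in (None, 0.2, 1e-5)])  # [1, 0, None]
```

The labels are noisy by design: an unobserved variant is not necessarily harmful, only more likely to be so on average, which is why this counts as weak rather than full supervision.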

For each variant, AlphaMissense outputs a pathogenicity score between 0 and 1. The released predictions use fixed thresholds to sort variants into three classes. ^[5]

Score range	Classification
Below 0.34	Likely benign
0.34 to 0.564 (inclusive)	Ambiguous / uncertain
Above 0.564	Likely pathogenic

These cutoffs were chosen to achieve roughly 90% precision when checked against databases of known disease-causing variants. ^[4]^[5]
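The thresholds in the table translate directly into a small lookup. This sketch treats the boundary values 0.34 and 0.564 as ambiguous, matching the "below" and "above" wording in the table; the function name and label strings are illustrative and are not necessarily the exact field values used in the released files.

```python
LIKELY_BENIGN_BELOW = 0.34       # scores below this are "likely benign"
LIKELY_PATHOGENIC_ABOVE = 0.564  # scores above this are "likely pathogenic"


def classify_score(score: float) -> str:
    """Map a pathogenicity score in [0, 1] to one of the three published classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1], got {score!r}")
    if score < LIKELY_BENIGN_BELOW:
        return "likely benign"
    if score > LIKELY_PATHOGENIC_ABOVE:
        return "likely pathogenic"
    return "ambiguous"  # 0.34 <= score <= 0.564


for s in (0.05, 0.34, 0.5, 0.9):
    print(s, classify_score(s))
```

Keeping the cutoffs as named constants makes it easy to see that the middle band is a deliberate abstention zone: the model declines to call variants whose scores fall between the two precision-tuned thresholds.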

Results and scale

DeepMind applied AlphaMissense across more than 19,000 human proteins and produced predictions for every missense variant that a single nucleotide change can produce in them, about 71 million variants in total. Using the calibrated thresholds, the model assigned a confident label to roughly 89% of those variants: about 57% were classified as likely benign and about 32% as likely pathogenic, with the remaining variants left as ambiguous. ^[1]^[2]^[6]

Figure	Value
Possible missense variants scored	~71 million
Human proteins covered	>19,000
Variants given a confident label	~89%
Likely benign	~57%
Likely pathogenic	~32%
Previously clinically classified (at publication)	~0.1%
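Because the percentages in the table are rounded, the variant counts they imply are approximate. The arithmetic below simply converts the table's figures into counts, taking 71 million as the total; the variable names are illustrative.

```python
TOTAL = 71_000_000  # "~71 million" possible missense variants

# Rounded shares from the table; "clinically classified" is the share at publication.
SHARES = {"likely benign": 0.57, "likely pathogenic": 0.32, "clinically classified": 0.001}

counts = {label: round(TOTAL * share) for label, share in SHARES.items()}
# Whatever is not confidently labelled is ambiguous (about 11%).
counts["ambiguous"] = TOTAL - counts["likely benign"] - counts["likely pathogenic"]

for label, n in counts.items():
    print(f"{label:>22}: ~{n:,}")
```

This gives about 40.5 million likely benign, 22.7 million likely pathogenic and 7.8 million ambiguous variants, against roughly 71,000 variants that had an expert classification when the paper appeared.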

In the Science paper the authors reported that AlphaMissense outperformed other computational variant effect predictors across a range of genetic and experimental benchmarks, and that its scores correlated with measurements from deep mutational scanning experiments and with clinical annotations. The Zenodo release also goes beyond the 71 million headline figure, with larger datasets covering all possible single amino-acid substitutions in roughly 20,000 UniProt canonical isoforms, plus predictions for additional non-canonical transcripts. ^[2]^[5]^[6]

The predictions database

The predictions were released as an open resource rather than as a hosted prediction service. DeepMind published the precomputed scores for all possible human missense variants, deposited the data on Zenodo, and open-sourced the model code on GitHub under the Apache 2.0 licence. The trained model weights themselves were not released. ^[2]^[5]
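Because the scores ship as precomputed tables rather than as a service, using them mostly means streaming a large tab-separated file. The sketch below assumes a layout like the one documented for the Zenodo files: '#'-prefixed licence lines, a '#'-prefixed header row, and a score column named `am_pathogenicity`. Treat those names, and the helper function names, as assumptions to check against the file you actually download.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Optional


def read_predictions(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a tab-separated prediction table.

    Assumes every leading line starting with '#' is a comment, and that the last
    such line (for example '#CHROM<TAB>POS<TAB>...') holds the column names.
    """
    header: Optional[List[str]] = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("data row found before any header line")
        yield dict(zip(header, line.split("\t")))


def likely_pathogenic(rows: Iterable[Dict[str, str]],
                      score_column: str = "am_pathogenicity",
                      cutoff: float = 0.564) -> Iterator[Dict[str, str]]:
    """Keep rows whose score is above the likely-pathogenic threshold."""
    for row in rows:
        if float(row[score_column]) > cutoff:
            yield row


def open_predictions(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows from a plain or gzip-compressed TSV file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from read_predictions(handle)
```

For example, `likely_pathogenic(open_predictions("AlphaMissense_hg38.tsv.gz"))` would yield only the high-scoring rows (the file name is an assumption based on the Zenodo listing). Generators keep memory use flat, which matters for a table with tens of millions of rows.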

To make the scores usable in existing genomics workflows, the predictions were integrated into the Ensembl Variant Effect Predictor (VEP) at EMBL-EBI, so that researchers analysing genetic variants can retrieve AlphaMissense scores directly. EMBL-EBI also presents the scores alongside the AlphaFold protein structure predictions it already hosts. ^[7]^[8]

The licensing of the predictions changed after launch. They were initially distributed under a non-commercial Creative Commons licence, and in March 2024 the predictions were re-released under the more permissive Creative Commons Attribution 4.0 (CC BY) licence, allowing both commercial and academic reuse. ^[2]^[5]

Limitations and clinical caveats

DeepMind and the paper's authors were explicit that AlphaMissense is a research tool, not a clinical diagnostic. The GitHub repository states that the predictions are "not intended to be a substitute for professional medical advice, diagnosis, or treatment" and that the system "has not been validated for, and is not approved for, any clinical use." DeepMind's accompanying blog post similarly stressed that the predictions are not designed to be used in the clinic directly and should be interpreted together with other lines of evidence. ^[2]^[5]

There are technical limits as well. AlphaMissense only addresses missense variants and does not cover other classes of genetic change, such as variants in non-coding regulatory regions, insertions and deletions, or variants that affect splicing. Because it is trained largely on evolutionary and population-frequency signals, it predicts whether a variant is likely damaging but does not by itself explain the molecular mechanism or link a variant to a specific disease. Commentators have also cautioned against treating its confident-looking 0-to-1 scores as definitive clinical calls. ^[3]^[4]

Reception

AlphaMissense was widely covered as a significant step in applying the AlphaFold lineage to human genetics, and follow-up work in journals including Nature Reviews Genetics and Nature Biotechnology examined its accuracy and its place among existing variant predictors. Independent analyses generally found it competitive with or stronger than prior tools, while reiterating that its outputs are predictions to be weighed alongside experimental and clinical evidence rather than used on their own. ^[3]^[4]

The model also fits into a broader DeepMind effort to predict the functional consequences of genetic variation. AlphaMissense focuses on protein-coding regions, and the company later introduced AlphaGenome, a DNA sequence model aimed at the regulatory, largely non-coding parts of the genome, positioning the two as complementary approaches to interpreting human DNA. ^[9]

References

1. Google DeepMind. "A catalogue of genetic mutations to help pinpoint the cause of diseases." 19 September 2023. https://deepmind.google/discover/blog/a-catalogue-of-genetic-mutations-to-help-pinpoint-the-cause-of-diseases/
2. Cheng, J. et al. "Accurate proteome-wide missense variant effect prediction with AlphaMissense." *Science* 381, 6664 (22 September 2023), published online 19 September 2023. DOI: 10.1126/science.adg7492. https://www.science.org/doi/10.1126/science.adg7492
3. "Predicting variant pathogenicity with AlphaMissense." *Nature Reviews Genetics* (2023). https://www.nature.com/articles/s41576-023-00668-9
4. "Advancing missense variant pathogenicity prediction." *Nature Biotechnology* (2023). https://www.nature.com/articles/s41587-023-01999-y
5. google-deepmind/alphamissense, GitHub repository (README, licence, and disclaimer). https://github.com/google-deepmind/alphamissense
6. "Predictions for AlphaMissense." Zenodo dataset record. https://zenodo.org/records/10813168
7. EMBL-EBI. "New predictions of genetic variant pathogenicity using AlphaFold protein structures." https://www.ebi.ac.uk/about/news/technology-and-innovation/genetic-variant-pathogenicity/
8. Inside Precision Medicine. "AlphaMissense Classifies Mutation Pathogenicity." https://www.insideprecisionmedicine.com/topics/precision-medicine/alphamissense-classifies-mutation-pathogenicity/
9. Google DeepMind. "AlphaGenome: AI for better understanding the genome." https://deepmind.google/blog/alphagenome-ai-for-better-understanding-the-genome/