TxGemma
Last reviewed
Jun 3, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,157 words
TxGemma is a collection of open language models from Google built for therapeutic development and drug discovery. The models are fine-tuned from Gemma 2, the second generation of Google's lightweight open Gemma family, and are trained to predict and reason about the properties of molecules and other entities that matter across the drug development pipeline. Announced by Google DeepMind and Google Research on March 25, 2025, TxGemma is the open successor to an earlier internal research model called Tx-LLM, and it ships alongside an agentic companion system named Agentic-Tx that is orchestrated by Gemini.[1][2]
The models are released under the Health AI Developer Foundations (HAI-DEF) program, the same umbrella that hosts Google's MedGemma models for medical text and imaging. Within that program TxGemma is the line aimed specifically at therapeutics rather than clinical applications.[3]
TxGemma builds directly on Tx-LLM, a research model that Google Research and DeepMind described on October 9, 2024. Tx-LLM was fine-tuned from PaLM-2 and trained to predict properties of a wide range of entities relevant to therapeutic development, including small molecules, proteins, nucleic acids, cell lines, and diseases. It learned from 66 drug discovery datasets spanning the pipeline from early target gene identification to late-stage clinical trial approval, which the team reformatted into instruction-answer prompts through a collection they called Therapeutics Instruction Tuning (TxT). Those tasks fall into three categories: classification, regression, and generation.[4]
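To make the instruction-answer format concrete, the following Python sketch shows how one dataset record could be rendered as a prompt and target pair for any of the three task categories. The `TaskRecord` type, the field names, the template wording, and the example record are all hypothetical illustrations; they are not the actual TxT templates.

```python
from dataclasses import dataclass

# Hypothetical record type; the real TxT field names and templates may differ.
@dataclass
class TaskRecord:
    task_type: str    # "classification", "regression", or "generation"
    instruction: str  # what the model is asked to do
    context: str      # short background on the task
    entity: str       # e.g. a SMILES string or an amino-acid sequence
    answer: str       # ground-truth label, value, or generated string

VALID_TASK_TYPES = {"classification", "regression", "generation"}

def to_prompt(record: TaskRecord) -> tuple[str, str]:
    """Return a (prompt, target) pair in an instruction-answer layout."""
    if record.task_type not in VALID_TASK_TYPES:
        raise ValueError(f"unknown task type: {record.task_type}")
    prompt = (
        f"Instructions: {record.instruction}\n"
        f"Context: {record.context}\n"
        f"Input: {record.entity}\n"
        "Answer:"
    )
    return prompt, record.answer

example = TaskRecord(
    task_type="classification",
    instruction="Answer whether the molecule is toxic: (A) no (B) yes.",
    context="Toxicity is a common cause of drug candidate failure.",
    entity="CCO",   # SMILES for ethanol, used only as a placeholder input
    answer="(A)",   # placeholder label, not a real measurement
)
prompt, target = to_prompt(example)
print(prompt)
```

Under this layout a regression task would differ only in its instruction and in having a numeric string as the target, which is what lets a single model be trained on all three categories with the same text-in, text-out interface.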
Tx-LLM was never released as open weights. Google described it as a research result and invited outside scientists to submit an expression of interest rather than download the model.[4] TxGemma was created to fill that gap: an open, practically sized family of models that developers can download and fine-tune on their own therapeutic data, while carrying forward the task design and data foundation that Tx-LLM established. Where Tx-LLM rested on the proprietary PaLM-2 base, TxGemma swaps in the openly available Gemma 2 backbone, which is what allows the weights to be distributed.[1][2]
TxGemma comes in three parameter sizes (2B, 9B, and 27B) and two functional variants, Predict and Chat. The Predict models exist at all three sizes and are tuned for narrow, structured prompting on specific therapeutic tasks. The Chat models exist only at the 9B and 27B sizes and are conversational: they handle multi-turn dialogue and can explain the reasoning behind a prediction, at the cost of some raw predictive accuracy. The Chat variants are trained on a mixture of therapeutic data and general Gemma 2 instruction-tuning data so that they keep broad conversational ability.[1][5]
| Model | Sizes | Type | Purpose |
|---|---|---|---|
| TxGemma-Predict | 2B, 9B, 27B | Prediction | Narrow, structured prompts for classification, regression, and generation across therapeutic tasks |
| TxGemma-Chat | 9B, 27B | Conversational | Multi-turn dialogue and reasoning, including explaining the rationale for a prediction |
All variants use the decoder-only transformer architecture inherited from Gemma 2. The Hugging Face model cards list the model as created on March 18, 2025, a week before the public announcement, and distribute the weights under the Health AI Developer Foundations terms of use, which require explicit acceptance before download.[5]
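The size and variant combinations in the table can be captured in a small lookup, sketched below in Python. The `google/txgemma-{size}-{variant}` repository naming pattern is an assumption made for illustration and should be checked against the model cards; `AVAILABLE` and `repo_id` are hypothetical names.

```python
# Released sizes per variant, as listed in the table above.
AVAILABLE = {
    "predict": ("2b", "9b", "27b"),
    "chat": ("9b", "27b"),
}

def repo_id(variant: str, size: str) -> str:
    """Build a Hugging Face repo ID for a variant/size pair.

    The naming pattern is an assumption; confirm it on the model cards.
    """
    variant, size = variant.lower(), size.lower()
    if variant not in AVAILABLE:
        raise ValueError(f"unknown variant: {variant}")
    if size not in AVAILABLE[variant]:
        raise ValueError(
            f"TxGemma-{variant.capitalize()} is not released at {size.upper()}"
        )
    return f"google/txgemma-{size}-{variant}"

print(repo_id("predict", "2B"))
print(repo_id("chat", "27b"))
```

Asking for `repo_id("chat", "2b")` raises a `ValueError`, reflecting that the smallest size ships only as a Predict model.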
TxGemma's training and evaluation are anchored to the Therapeutics Data Commons (TDC), a public collection of curated drug discovery datasets. Google drew on TDC to assemble roughly 7 million training examples: the model card cites more than 15 million data points across biomedical entities, of which 7,080,338 were used for training and the remainder held out for validation and testing.[1][5] These cover 66 therapeutic development tasks spanning the discovery and development of safe and effective medicines.[1][5]
The entities the models reason about include small molecules, proteins, nucleic acids, diseases, and cell lines.[3] In practice the tasks cover work such as predicting molecular toxicity, estimating drug-target binding affinity, identifying targets, and predicting clinical trial approval, and they take the same three forms inherited from Tx-LLM: classification, regression, and generation.[1][5]
On benchmark performance, Google reported that the largest model, TxGemma-27B-Predict, outperforms or roughly matches Tx-LLM, its predecessor and the previous state-of-the-art generalist model, on 64 of the 66 tasks, beating it outright on 45. Measured against specialized, single-task models, the same model is at or near the state of the art on 50 of the tasks and surpasses them on 26.[1][2] These results are notable because a single generalist model is competing with models purpose-built for individual assays.
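As a quick check on what those counts mean as shares of the 66-task suite, the arithmetic can be written out in a few lines of Python. The labels and the `share` helper are illustrative names; the counts are the ones quoted in the paragraph above.

```python
TOTAL_TASKS = 66

# Counts reported for TxGemma-27B-Predict in the paragraph above.
reported = {
    "matches or beats prior generalist": 64,
    "beats prior generalist": 45,
    "at or near specialist state of the art": 50,
    "beats specialist models": 26,
}

def share(count: int, total: int = TOTAL_TASKS) -> float:
    """Percentage of tasks, rounded to one decimal place."""
    return round(100 * count / total, 1)

shares = {label: share(count) for label, count in reported.items()}
for label, pct in shares.items():
    print(f"{label}: {reported[label]}/{TOTAL_TASKS} = {pct}%")
```

This works out to roughly 97% and 68% of tasks against the generalist baseline, and roughly 76% and 39% against specialized models.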
Because the models can be fine-tuned, Google also published Colab notebooks showing how to adapt TxGemma to proprietary datasets, with one example using the TrialBench dataset to predict adverse events in clinical trials.[1]
Alongside the standalone models, Google described Agentic-Tx, an agentic system that addresses a known weakness of fixed language models: tasks that need up-to-date external knowledge or several steps of reasoning. Agentic-Tx is powered by Gemini 2.0 Pro and is equipped with 18 tools, among them TxGemma-Predict and TxGemma-Chat themselves, general search utilities (PubMed, Wikipedia, and web search), and specialized molecular and gene or protein tools. With these it can run multi-step workflows and answer research questions in either autonomous or interactive settings.[1][2]
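The general idea of an orchestrator that routes steps to named tools can be sketched in Python as follows. This is a simplified illustration and not Agentic-Tx itself: the tool names, the stub outputs, and the fixed plan are hypothetical, and in a real agent the orchestrating model would choose each step after reading the previous observation.

```python
from typing import Callable, Dict, List, Tuple

# Stub tools standing in for real ones; names and outputs are hypothetical.
def predict_property(query: str) -> str:
    return f"[stub TxGemma-Predict output for: {query}]"

def search_literature(query: str) -> str:
    return f"[stub PubMed results for: {query}]"

TOOLS: Dict[str, Callable[[str], str]] = {
    "txgemma_predict": predict_property,
    "pubmed_search": search_literature,
}

def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute (tool_name, tool_input) steps in order and collect observations.

    The plan is fixed up front here; an agent would build it step by step.
    """
    observations = []
    for tool_name, tool_input in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            # Record the failure instead of aborting the whole workflow.
            observations.append(f"[error: unknown tool {tool_name!r}]")
            continue
        observations.append(tool(tool_input))
    return observations

observations = run_plan([
    ("pubmed_search", "hERG inhibition assays"),
    ("txgemma_predict", "Is CCO likely to inhibit hERG?"),
])
for obs in observations:
    print(obs)
```

Keeping tools behind a single name-to-function registry is one common way to let an orchestrator mix predictive models, search utilities, and domain-specific tools without hard-coding any of them into the control loop.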
Google evaluated Agentic-Tx on reasoning-heavy chemistry and biology problems and reported state-of-the-art results on benchmarks including Humanity's Last Exam (Chemistry and Biology) and ChemBench, with gains over competing reasoning models on those tasks.[1][2] A demonstration of the agentic workflow was released through Google's Gemma cookbook on GitHub.[1]
TxGemma is available to download from Hugging Face and through Google Cloud's Vertex AI Model Garden, where developers can run the models or fine-tune them on their own data.[1][3] As an open release under the HAI-DEF terms, it gives researchers control over their infrastructure and data, which is a meaningful consideration for sensitive biomedical work.[3]
The models carry clear limitations. They are intended as a starting point for research and development, not as validated tools for clinical or diagnostic decisions, and the Chat variants trade away some predictive accuracy for their conversational flexibility.[1][5] Performance is also bounded by the scope of the Therapeutics Data Commons tasks the models were trained on, so adaptation to a specific use case generally calls for additional fine-tuning. Google positions TxGemma as a way to compress the time and cost of early therapeutic development by letting one efficient model stand in for many specialized predictors, while leaving experimental validation to the laboratory.[1][2]
TxGemma sits within a broader Google effort to apply machine learning to biology and chemistry that also includes structure-prediction systems such as AlphaFold, though TxGemma addresses a different part of the problem: predicting development-relevant properties and outcomes rather than 3D structure.[2]