MedGemma
Last reviewed
Jun 3, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,253 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,253 words
Add missing citations, update stale details, or suggest a clearer explanation.
MedGemma is a collection of open generative models from Google built on the Gemma 3 architecture and tuned for understanding medical text and medical images. The models were first released on 20 May 2025 during Google I/O and are distributed as part of Google's Health AI Developer Foundations (HAI-DEF) program. Rather than a finished clinical tool, MedGemma is positioned as a starting point that developers can fine-tune and validate for specific healthcare and life-sciences applications. [1][2]
Google's medical language work began with research systems such as Med-PaLM, which were large but closed. MedGemma takes a different path by reusing the open-weight Gemma family so that the model weights can be downloaded, inspected, run locally, and adapted on private infrastructure. This matters in healthcare, where data governance and the ability to keep patient information on-premises are recurring concerns. [1][3]
MedGemma is produced by teams associated with Google DeepMind and Google Research, and it inherits the multimodal design of Gemma 3, which couples a text decoder with a vision encoder. That lineage connects it to Google's broader strategy of releasing domain-specialized derivatives of Gemma, alongside general-purpose vision-language work such as PaliGemma and the proprietary Gemini line. The medical variants are trained on de-identified medical data so that the same general recipe transfers to clinical text and imaging. [1][2]
The initial May 2025 release included a 4B multimodal model and a 27B text-only model. On 9 July 2025 Google added a 27B multimodal model and the separate MedSigLIP image encoder. A subsequent update, MedGemma 1.5, was released on 13 January 2026 and focused on the 4B multimodal model with expanded imaging support. Sizes below follow Google's own naming, which rounds the larger checkpoints to "27B." [1][2][4]
| Model | Modality | Base | First released | Notes |
|---|---|---|---|---|
| MedGemma 4B | Multimodal (image + text in, text out) | Gemma 3 4B | 20 May 2025 | Compute-efficient; pairs the text decoder with a medical SigLIP image encoder |
| MedGemma 27B (text-only) | Text in, text out | Gemma 3 27B | 20 May 2025 | Optimized for medical text and inference-time reasoning |
| MedGemma 27B (multimodal) | Image + text in, text out | Gemma 3 27B | 9 July 2025 | Adds image and longitudinal electronic-health-record interpretation |
| MedGemma 1.5 (4B) | Multimodal | Gemma 3 4B | 13 January 2026 | Adds CT, MRI, whole-slide pathology, longitudinal imaging, anatomical localization |
A reported benchmark figure for the text-only 27B model is 87.7% on the MedQA (four-option) medical question-answering set in a zero-shot setting, which Google described as within a few points of leading open reasoning models at roughly a tenth of the inference cost. The 4B model scored 64.4% on the same benchmark. These numbers are research evaluations rather than evidence of clinical readiness. [1][5]
The multimodal MedGemma models accept an image together with a text prompt and return free-form text. Typical research uses include generating draft radiology reports, answering visual questions about an image, classifying findings, and summarizing or reasoning over clinical notes. The image encoder underpinning the multimodal variants was pre-trained on de-identified data spanning several modalities: chest X-rays, dermatology photographs, ophthalmology fundus images, and histopathology slides. In MedGemma, images are normalized to 896 by 896 resolution and encoded into 256 tokens. [1][3]
In one evaluation Google reported, 81% of chest X-ray reports generated by MedGemma 4B were judged by a US board-certified radiologist as accurate enough to lead to similar patient management as the original report. Google frames such results as promising research signals, not as validated clinical performance. [1]
The January 2026 MedGemma 1.5 4B update extended the model to higher-dimensional imaging. It added the ability to interpret 3D computed tomography (CT) and magnetic resonance imaging (MRI) volumes, whole-slide histopathology images, time-series of chest X-rays for longitudinal review, anatomical localization, and structured extraction from lab reports. Google reported gains over the earlier 4B model on internal tasks, including CT disease classification (61% versus 58%) and MRI findings (65% versus 51%). The same release introduced MedASR, a separate open speech-to-text model fine-tuned for medical dictation. [4][6]
MedGemma sits inside Health AI Developer Foundations, a Google collection of open models, tooling, and recipes for building medical AI that was introduced in November 2024. HAI-DEF predates MedGemma and already included embedding-oriented foundation models such as CXR Foundation, trained on more than 800,000 chest X-rays, and Path Foundation for pathology, along with related models for dermatology and health acoustics. MedGemma added generative, instruction-following capability to that toolkit. [2][7]
Alongside the July 2025 expansion, Google released MedSigLIP, a lightweight vision-language encoder adapted from SigLIP-400M. It contains roughly a 400M-parameter vision encoder paired with a text encoder, operates at 448 by 448 image resolution, and was trained on de-identified medical image and text pairs (chest X-rays, dermatology, ophthalmology, histopathology, and CT and MRI slices) mixed with natural images to retain general visual understanding. The same encoder powers the vision capabilities of the MedGemma multimodal models. Google recommends MedSigLIP for tasks with structured outputs, such as data-efficient classification, zero-shot classification, and semantic image retrieval, where text generation is not required, while suggesting MedGemma for tasks that need generated text. [3][8]
Google is explicit that MedGemma is a development starting point rather than a clinical product. The model card states that MedGemma is intended to enable more efficient development of downstream healthcare applications and that it is not meant to be used "without appropriate validation, adaptation and/or making meaningful modification by developers for their specific use case." Outputs "are not intended to directly inform clinical diagnosis, patient management decisions, treatment recommendations, or any other direct clinical practice applications" and "should be considered preliminary and require independent verification, clinical correlation, and further investigation." [3]
The documentation also notes evaluation gaps. The multimodal capabilities were assessed primarily on single-image tasks and had not been validated for multi-image comprehension, and the models were not optimized for multi-turn conversational use. Google characterizes MedGemma as not yet clinical-grade and likely to require further fine-tuning before deployment, and the models are not cleared or approved as medical devices. Anyone building with them is responsible for the regulatory, safety, and validation work appropriate to their jurisdiction and use case. [2][3]
The MedGemma weights are open and distributed on Hugging Face and through Google Cloud's Vertex AI Model Garden, under the Health AI Developer Foundations terms of use, which users must accept before downloading. Because the weights are open, developers can run the models locally or on their own cloud infrastructure and fine-tune them on proprietary data. Google has supported the ecosystem with tutorial notebooks, reference code on GitHub, and community programs, including a MedGemma Impact Challenge that drew hundreds of participating teams and a Kaggle challenge tied to the 1.5 release. [1][2][6]