Med-PaLM 2

Updated Jun 3, 2026

Med-PaLM 2 is a medical large language model developed by Google Research and Google DeepMind, built on the PaLM 2 foundation model and tuned to answer questions about medicine and health. It was the first AI system reported to reach an "expert" test-taker level on questions styled after the United States Medical Licensing Examination (USMLE), and it served as the research foundation for Google's later commercial healthcare offering, MedLM. Google positioned it as a research and limited-access system rather than a clinical product, and the team repeatedly cautioned that it was not ready for use in patient care.

Background (Med-PaLM)

Med-PaLM 2 is the second generation of Google's Med-PaLM line. The original Med-PaLM, described in late 2022, adapted Google's first PaLM model to the medical domain through instruction prompt tuning and a curated set of expert demonstrations. It became the first model to pass a benchmark of USMLE-style questions, reaching roughly 67% accuracy on the MedQA dataset, just above the commonly cited passing threshold of about 60% ^[1]^[2]. Med-PaLM also introduced an evaluation rubric in which clinicians rated long-form answers along axes such as scientific consensus, reasoning, and potential for harm. That groundwork carried directly into the second generation. For the lineage and the first-generation methods, see Med-PaLM.

Built on PaLM 2

Med-PaLM 2 replaces the underlying model with PaLM 2, the general-purpose language model Google unveiled at its I/O developer conference in May 2023. On top of that stronger base, the Med-PaLM team applied medical domain finetuning and new prompting strategies. The most notable of these was "ensemble refinement," in which the model generates several reasoning paths for a question and then conditions on those candidate answers to produce a refined final response. The published work also describes a chain-of-retrieval approach for grounding answers in relevant context ^[1]^[3]. The combination of a better base model, targeted finetuning, and improved inference-time reasoning is what the authors credit for the jump in performance over the first Med-PaLM.
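The two-stage shape of ensemble refinement described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Med-PaLM team's implementation: `generate` is a hypothetical callable standing in for any text-generation call, the prompt wording is invented for the example, and `toy_model` is a dummy that lets the sketch run offline.

```python
from typing import Callable, List


def ensemble_refinement(question: str,
                        generate: Callable[[str], str],
                        num_paths: int = 3) -> str:
    """Sample several reasoning paths, then condition on them for a final answer.

    `generate` is a hypothetical model call (prompt in, text out); the prompt
    templates below are assumptions made for illustration.
    """
    # Stage 1: sample several independent reasoning paths for the question.
    first_prompt = f"Question: {question}\nReason step by step, then answer."
    paths: List[str] = [generate(first_prompt) for _ in range(num_paths)]

    # Stage 2: show the model its own candidates and ask for a refined answer.
    numbered = "\n".join(
        f"Candidate {i}: {path}" for i, path in enumerate(paths, start=1)
    )
    refine_prompt = (
        f"Question: {question}\n{numbered}\n"
        "Considering the candidates above, give a refined final answer."
    )
    return generate(refine_prompt)


def toy_model(prompt: str) -> str:
    """Dummy stand-in for a real model so the sketch runs without any API."""
    if "Candidate 1:" in prompt:
        return "refined answer"
    return "draft reasoning"


print(ensemble_refinement("Which vitamin deficiency causes scurvy?", toy_model))
```

With a real model, the first-stage calls would normally be sampled with some randomness so that the candidate paths differ; the dummy here returns the same draft every time, which is enough to show the control flow.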

Performance (MedQA and expert evaluation)

Med-PaLM 2 scored up to 86.5% on MedQA, the dataset of USMLE-style multiple-choice questions. That was more than 19 percentage points above the original Med-PaLM's roughly 67% and set a new state of the art at the time ^[1]^[3]. Google framed the result as the first time a large language model had performed at an expert test-taker level on the benchmark. The model also performed at or near the state of the art on other medical question-answering datasets, and it was reported as the first AI system to reach a passing score on MedMCQA, a set drawn from Indian AIIMS and NEET medical entrance examinations, where it scored 72.3% ^[4].

Benchmark	Med-PaLM 2	Notes
MedQA (USMLE-style)	up to 86.5%	More than 19 points above Med-PaLM ^[1]^[3]
MedMCQA (AIIMS/NEET)	72.3%	Reported as first passing score ^[4]
MMLU clinical topics, PubMedQA	At or near state of the art	Specific figures vary by configuration ^[1]

Multiple-choice accuracy was only part of the assessment. In a pairwise study of 1,066 consumer medical questions, a panel of physicians compared Med-PaLM 2's long-form answers against answers written by other physicians. The model's responses were preferred on eight of nine axes related to clinical utility, a result the authors reported as statistically significant ^[1]^[3]. In the peer-reviewed version of the work, a pilot using real-world questions found that specialists preferred Med-PaLM 2 answers to those from generalist physicians about 65% of the time, while both specialist and generalist raters judged the model's answers to be as safe as physician answers ^[3]. The team also built adversarial question sets designed to surface weaknesses, and Med-PaLM 2 showed marked gains over its predecessor on those harder probes.
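To make the idea of a pairwise preference result concrete, the sketch below tallies rater choices on a single axis and computes a two-sided exact sign test. The article does not say which statistical test the authors used, so the sign test is a swapped-in assumption, and the ratings are made-up toy labels, not study data.

```python
from math import comb
from typing import Iterable, Tuple


def preference_summary(ratings: Iterable[str]) -> Tuple[int, int, float]:
    """Count pairwise preferences on one axis and test them against 50/50.

    Each rating is "model", "physician", or "tie". Ties are dropped, which is
    the usual convention for a sign test (an assumption for this sketch).
    """
    labels = list(ratings)
    model_wins = labels.count("model")
    physician_wins = labels.count("physician")
    decided = model_wins + physician_wins
    if decided == 0:
        return model_wins, physician_wins, 1.0

    # Probability of a split at least this lopsided if raters had no preference.
    larger = max(model_wins, physician_wins)
    tail = sum(comb(decided, i) for i in range(larger, decided + 1)) / 2 ** decided
    p_value = min(1.0, 2 * tail)  # two-sided
    return model_wins, physician_wins, p_value


# Toy labels for illustration only.
toy_ratings = ["model"] * 8 + ["physician"] * 2 + ["tie"] * 2
print(preference_summary(toy_ratings))
```

On twelve toy ratings, an 8-to-2 split is not significant at the 0.05 level, which shows why a study needs many questions before a preference on any one axis can be called statistically significant.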

The research was first posted as a preprint, "Towards Expert-Level Medical Question Answering with Large Language Models," on 16 May 2023, with Karan Singhal as lead author and roughly thirty collaborators from Google ^[1]. A peer-reviewed version, "Toward expert-level medical question answering with large language models," appeared in Nature Medicine in 2025 (published online in January 2025 and printed in the March 2025 issue) ^[3].

Testing and deployment

Google first discussed Med-PaLM 2 publicly at The Check Up, its annual health event, on 14 March 2023, where company leaders demonstrated answers to questions such as warning signs of pneumonia and showed that the model could match or exceed clinician-written answers in some cases ^[5]. In mid-April 2023 (13 to 14 April), Google Cloud announced that it would open limited access to Med-PaLM 2 for a small group of customers to explore use cases and give feedback, while stressing a focus on safety, equity, and evaluation of unfair bias ^[2].

In July 2023, the Wall Street Journal reported that Google had been testing Med-PaLM 2 with hospital customers since around April, including the Mayo Clinic, and other coverage noted HCA Healthcare among the systems experimenting with Google's language-model technology ^[6]^[7]. Greg Corrado, a senior Google research director who worked on the project, was widely quoted as saying he did not feel the technology was yet at a place where he would want it in his own family's healthcare journey, even as he described its potential to expand the areas of healthcare where AI can help ^[6]. Google productized the underlying model in December 2023 as MedLM, a family of healthcare foundation models built on Med-PaLM 2 and offered to allowlisted Google Cloud customers through Vertex AI, initially in two sizes for different tasks ^[8].

A separate research effort, Med-PaLM M, extended the line into multimodal inputs such as chest X-rays and mammograms; it is a distinct model from the text-focused Med-PaLM 2 rather than a direct upgrade of it ^[4].

Limitations

Google consistently described Med-PaLM 2 as a research system, not a finished clinical tool. At The Check Up, the company acknowledged that significant gaps remained between benchmark performance and real-world medical use and that the model did not meet its internal bar for a clinical product ^[5]. Strong exam scores do not guarantee safe behavior on novel or ambiguous cases, which is why the evaluation work included adversarial questions built specifically to expose failure modes. Like other large language models, Med-PaLM 2 can produce fluent but incorrect statements, and the team highlighted ongoing concerns about factual accuracy, reasoning errors, and potential bias or harm. Access was deliberately restricted to vetted partners under feedback agreements rather than offered as an open or consumer service, reflecting the regulatory and patient-safety stakes of medical advice.

Successors (MedGemma / Gemini in medicine)

Med-PaLM 2 sat at the leading edge of Google's medical AI work in 2023, but the field moved quickly. The commercial MedLM line was slated to incorporate models based on Gemini, Google's next-generation multimodal family, as those became available. Google's open medical models then shifted to the Gemma architecture: MedGemma, announced in 2025, is a collection of Gemma-based variants for medical text and image understanding, released as open weights for developers to build on. Later MedGemma releases reported MedQA scores in the high 80s for the larger text variant, illustrating how the expert-level performance Med-PaLM 2 first demonstrated has since been delivered in smaller, more deployable, and openly available models. Med-PaLM 2 remains significant as the system that crossed the expert-level threshold on USMLE-style questions and bridged Google's medical research from PaLM-era models toward this newer generation.

References

[1] Singhal, K. et al. "Towards Expert-Level Medical Question Answering with Large Language Models." arXiv preprint, 16 May 2023. arxiv.org/abs/2305.09617
[2] Google Cloud. "Sharing Google's Med-PaLM 2 medical large language model, or LLM." Google Cloud Blog, 14 April 2023. cloud.google.com/blog/topics/healthcare-life-sciences/sharing-google-med-palm-2-medical-large-language-model
[3] Singhal, K. et al. "Toward expert-level medical question answering with large language models." Nature Medicine, vol. 31, no. 3, 2025, pp. 943-950. nature.com/articles/s41591-024-03423-7 (PubMed: pubmed.ncbi.nlm.nih.gov/39779926)
[4] Google Research. "Med-PaLM: A large language model from Google Research, designed for the medical domain." sites.research.google/gr/med-palm/
[5] Google. "Our latest health AI research updates." The Keyword (blog.google), 14 March 2023. blog.google/technology/health/ai-llm-medpalm-research-thecheckup
[6] Reuters / Fortune. "Google wants its A.I. to transform health care next, as it partners with the Mayo Clinic, report says." Fortune, 10 July 2023. fortune.com/2023/07/10/google-ai-mayo-clinic-healthcare-med-palm-2-large-language-model
[7] Healthcare Dive. "Google expands generative AI model Med-PaLM to more health customers." healthcaredive.com/news/google-med-palm-ai-expansion-healthcare/691677
[8] Google Cloud. "Introducing MedLM for the healthcare industry." Google Cloud Blog, 14 December 2023. cloud.google.com/blog/topics/healthcare-life-sciences/introducing-medlm-for-the-healthcare-industry

Related Articles

Med-PaLM

MedGemma

BioBERT

BioGPT

LaMDA

Bard
