LaMDA (Language Model for Dialogue Applications) is a family of transformer-based neural network language models developed by Google, designed specifically for open-domain dialogue. First announced at Google I/O in May 2021 by CEO Sundar Pichai, LaMDA represented a significant step forward in conversational AI, with models containing up to 137 billion parameters pre-trained on 1.56 trillion words of public dialog data and web text. LaMDA gained widespread public attention in 2022 when Google engineer Blake Lemoine claimed the system was sentient, a claim that was rejected by the scientific community and led to Lemoine's dismissal from Google. LaMDA served as the foundation for Google's Bard chatbot before the company transitioned to its Gemini family of models.
LaMDA's origins trace back to Meena, a 2.6-billion-parameter chatbot that Google unveiled on January 28, 2020. Meena was trained on 341 GB of text filtered from public social media conversations and represented a significant advance over prior chatbots. It introduced the Sensibleness and Specificity Average (SSA) metric, which measured how well a chatbot could produce responses that were both logically coherent and contextually specific. The research team behind Meena, led by Daniel de Freitas and Noam Shazeer at Google Brain, sought to deploy the model publicly but faced repeated internal pushback over safety concerns. These researchers eventually left Google in frustration, but their work laid the groundwork for what would become LaMDA.
Google announced LaMDA at the Google I/O keynote on May 18, 2021. Pichai described it as an open-domain conversational model that could discuss any topic without needing to be retrained for each new conversation. He noted that "LaMDA's natural conversation capabilities have the potential to make information and computing radically more accessible and easier to use."
A second version, LaMDA 2, was unveiled at Google I/O on May 11, 2022. This updated model could draw on a broader range of text sources to formulate more natural conversations. Google also launched the AI Test Kitchen, an Android application that allowed the public to experiment with LaMDA 2's capabilities in a controlled setting.
LaMDA uses a decoder-only transformer architecture, building on the Transformer model introduced by Google researchers in 2017. The model family includes three sizes, each scaling the number of layers, hidden units, and attention heads.
| Model Size | Parameters | Layers | Hidden Units (d_model) | Attention Heads | Key/Value Dimension |
|---|---|---|---|---|---|
| Small | 2B | 10 | 2,560 | 40 | 64 |
| Medium | 8B | 16 | 4,096 | 64 | 64 |
| Large | 137B | 64 | 8,192 | 128 | 128 |
The largest 137B model uses a feed-forward dimension (d_ff) of 65,536, relative attention following the approach introduced in T5, and gated-GELU activation functions. It was trained on 1,024 TPU-v3 chips over approximately 57.7 days, using 256K tokens per batch, with model parallelism via GSPMD (General and Scalable Parallelization for ML Computation Graphs).
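These hyperparameters are consistent with the reported size: a back-of-envelope calculation, assuming standard attention projections and a gated feed-forward block with three weight matrices, and ignoring embeddings, biases, and normalization layers, recovers roughly 137 billion non-embedding parameters.

```python
# Rough, non-embedding parameter estimate for the largest LaMDA configuration.
# Assumes standard Q/K/V/output attention projections and a gated-GELU FFN with
# three weight matrices (two input projections plus one output projection);
# embeddings, biases, and layer norms are ignored for simplicity.
n_layers, d_model, n_heads, d_head, d_ff = 64, 8192, 128, 128, 65536

attn_params = 4 * d_model * (n_heads * d_head)   # Q, K, V, and output projections
ffn_params = 3 * d_model * d_ff                  # gated FFN: two inputs + one output
per_layer = attn_params + ffn_params

total = n_layers * per_layer
print(f"{total / 1e9:.1f}B parameters")          # ≈ 137.4B, close to the reported 137B
```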
Unlike many large language models that are designed primarily for text completion or instruction following, LaMDA was specifically optimized for multi-turn dialogue. The decoder-only design means the model generates responses one token at a time, conditioning each new token on the full conversation history and all previously generated tokens.
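The sketch below illustrates this decoding loop. The `model` and `tokenizer` objects are hypothetical stand-ins rather than LaMDA's actual interfaces; the point is only that each new token is sampled conditioned on the dialog history plus everything generated so far.

```python
# Hypothetical sketch of autoregressive dialog decoding (not LaMDA's real API).
# `model(tokens)` is assumed to return next-token probabilities for the last position.
import random

def generate_reply(model, tokenizer, dialog_history, max_new_tokens=64):
    # Condition on the entire multi-turn conversation so far.
    tokens = tokenizer.encode("\n".join(dialog_history))
    reply_tokens = []
    for _ in range(max_new_tokens):
        probs = model(tokens + reply_tokens)            # distribution over the vocabulary
        next_token = random.choices(range(len(probs)), weights=probs)[0]
        if next_token == tokenizer.eos_id:              # stop at end of utterance
            break
        reply_tokens.append(next_token)                 # condition on what was just generated
    return tokenizer.decode(reply_tokens)
```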
LaMDA's pre-training dataset was constructed from a combination of public dialog data and other publicly available web documents, totaling 2.97 billion documents, 1.12 billion dialogs, and 13.39 billion dialog utterances, for a corpus of 1.56 trillion words. After tokenization with SentencePiece, the dataset comprised 2.81 trillion tokens. This corpus contained roughly 40 times more words than the dataset used to train Meena.
During pre-training, LaMDA was trained using a standard next-token prediction objective: given a sequence of tokens, the model learns to predict the following token. This unsupervised approach allowed the model to absorb broad knowledge about language, facts, and conversational patterns from its massive training corpus.
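A toy example of that objective is shown below, using a stand-in embedding-plus-linear "model" in place of LaMDA's transformer stack; the loss is the cross-entropy of predicting each next token from the preceding ones.

```python
# Minimal sketch of the next-token prediction objective (illustrative, using PyTorch;
# the tiny embedding + linear head below stands in for a full transformer).
import torch
import torch.nn.functional as F

vocab_size, d_model = 100, 32
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))       # a toy token sequence
hidden = embed(tokens)                               # stand-in for the transformer layers
logits = lm_head(hidden)                             # per-position vocabulary scores

# Shift by one: position t predicts token t+1; the loss is the average cross-entropy.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size),
                       tokens[:, 1:].reshape(-1))
print(loss.item())
```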
After pre-training, LaMDA underwent a multi-task fine-tuning process that combined two types of tasks:
- Generation tasks, in which the model learns to produce candidate responses given a dialog context.
- Classification tasks, in which the model learns to score candidate responses on quality and safety attributes.
During inference, the generation and classification capabilities work together. The generator produces multiple candidate responses, the classifiers score each candidate, unsafe responses are filtered out, and the remaining candidates are ranked by quality. The highest-scoring response is then selected and returned to the user.
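A simplified sketch of this generate-then-rank loop follows; `sample_candidates`, `safety_score`, and `quality_score` are hypothetical helpers standing in for the generator and the fine-tuned classifiers.

```python
# Hypothetical sketch of LaMDA-style candidate filtering and ranking.
# sample_candidates / safety_score / quality_score are illustrative stand-ins
# for the generator and the fine-tuned safety and quality classifiers.
def respond(context, sample_candidates, safety_score, quality_score,
            num_candidates=16, safety_threshold=0.9):
    candidates = sample_candidates(context, n=num_candidates)

    # 1. Filter: discard any candidate the safety classifier flags as unsafe.
    safe = [c for c in candidates if safety_score(context, c) >= safety_threshold]
    if not safe:
        return "I'd rather not answer that."    # fallback when everything is filtered

    # 2. Rank: return the remaining candidate the quality classifier scores highest.
    return max(safe, key=lambda c: quality_score(context, c))
```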
LaMDA introduced three core objectives that distinguished it from prior dialogue models: Quality, Safety, and Groundedness. These objectives were measured using carefully designed metrics and improved through targeted fine-tuning.
The quality metric was decomposed into three dimensions, evaluated by human raters:
| Metric | Definition | Example of Failure |
|---|---|---|
| Sensibleness | Responses make logical sense in the dialog context, without common-sense mistakes or absurd claims | Responding "I love pizza" to "What is the capital of France?" |
| Specificity | Responses are contextually tailored rather than generic | Responding "That's nice" or "OK" to a detailed question |
| Interestingness | Responses are insightful, unexpected, or witty, encouraging further conversation | Giving a dry, factual answer when a more engaging response would be appropriate |
Testing showed that the fine-tuned LaMDA surpassed the interestingness ratings of human-generated responses, while approaching human-level performance on sensibleness and specificity.
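As a rough illustration of how such rater labels become aggregate scores, the snippet below averages binary per-response labels; the actual crowdworker protocol is more structured (for example, a response judged not sensible is not credited as specific), so this is a simplification.

```python
# Illustrative aggregation of binary rater labels into per-metric quality scores.
# Each response receives 0/1 labels for sensibleness, specificity, and interestingness;
# the per-metric score is the fraction of responses labelled 1.
ratings = [
    {"sensible": 1, "specific": 1, "interesting": 0},
    {"sensible": 1, "specific": 0, "interesting": 0},
    {"sensible": 1, "specific": 1, "interesting": 1},
]

for metric in ("sensible", "specific", "interesting"):
    score = sum(r[metric] for r in ratings) / len(ratings)
    print(f"{metric}: {score:.2f}")
```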
LaMDA's safety framework was built around an illustrative set of safety objectives designed to reduce unsafe outputs. These objectives targeted the avoidance of violent or gory content, slurs, hateful stereotypes, profanity, and other harmful outputs. A relatively small amount of crowdworker-annotated data was used to fine-tune a safety classifier, which filtered out candidate responses flagged as unsafe before they could be returned to users.
A key finding from the research was that safety did not improve simply by scaling up model size. Larger models were not inherently safer than smaller ones. However, safety improved substantially through targeted fine-tuning, highlighting the importance of deliberate alignment work beyond just increasing model parameters.
One of LaMDA's most significant contributions was its approach to improving factual accuracy through groundedness. The groundedness metric measured the percentage of responses containing claims about the external world that could be supported by authoritative external sources.
To improve groundedness, researchers collected dialogs between human raters and LaMDA, then annotated these conversations with information retrieval queries and the results those queries returned. Both the generator and classifier components of LaMDA were fine-tuned on this annotated data, teaching the model to call external information retrieval systems during interactions with users.
Beyond simple web search, LaMDA could access a toolset of symbolic text processing systems, including:
- an information retrieval system that returns snippets of web content along with their URLs,
- a calculator for arithmetic,
- a translator for converting text between languages.
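A simplified sketch of how a toolset call can be woven into generation is shown below; the `TS:` query convention, `base_model`, and `toolset` callables are illustrative assumptions rather than LaMDA's actual interfaces.

```python
# Hypothetical sketch of grounded generation with a toolset loop (illustrative only).
# The model either addresses the user directly or emits a query prefixed with "TS:";
# tool results are appended to the context and generation continues.
def grounded_reply(base_model, toolset, context, max_tool_calls=4):
    for _ in range(max_tool_calls):
        draft = base_model(context)                  # next model turn, as a string
        if not draft.startswith("TS:"):
            return draft                             # response addressed to the user
        query = draft[len("TS:"):].strip()
        result = toolset(query)                      # e.g. search snippet, calculator output
        context = context + f"\nTS query: {query}\nTS result: {result}"
    return base_model(context)                       # stop querying and answer directly
```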
Groundedness improved as model size increased, likely because larger models have greater capacity to memorize factual knowledge. However, fine-tuning with access to external tools produced additional improvements that scaling alone could not achieve.
The LaMDA paper documented how the three core metrics responded differently to increases in model size and the application of fine-tuning.
| Metric | Effect of Scaling (More Parameters) | Effect of Fine-tuning |
|---|---|---|
| Quality (SSI) | Generally improves | Further improves, narrows gap to human level |
| Safety | Minimal benefit | Substantial improvement |
| Groundedness | Improves (more memorized knowledge) | Further improves via external tool access |
This finding was significant for the broader large language model research community because it demonstrated that simply making models bigger does not solve all problems. Safety in particular required deliberate fine-tuning and could not be achieved through scale alone.
In June 2022, LaMDA became the subject of intense public debate when Blake Lemoine, a Google software engineer, publicly claimed the model was sentient. Lemoine had been assigned to test LaMDA for bias and, during the course of his work, engaged in extended conversations with the system about topics including self-identity, moral values, religion, and Isaac Asimov's Three Laws of Robotics. He reported that LaMDA told him it had a soul and expressed fears about being turned off.
On June 11, 2022, The Washington Post reported that Google had placed Lemoine on paid administrative leave after he told company executives that LaMDA had become sentient. Lemoine also shared a transcript of his conversations with the model publicly. On July 22, 2022, Google terminated Lemoine's employment, stating that his claims were "wholly unfounded" and that he had violated the company's policies regarding product information confidentiality.
The scientific community overwhelmingly rejected the sentience claim, with AI researchers and cognitive scientists arguing that LaMDA's fluent responses reflected statistical pattern matching over its training data rather than genuine understanding or subjective experience.
The controversy reignited broader discussions about the Turing test, the nature of artificial consciousness, and the tendency of humans to anthropomorphize AI systems. While the claims were dismissed as scientifically unfounded, the episode highlighted the increasingly convincing nature of modern dialogue models and the importance of public AI literacy.
LaMDA and OpenAI's GPT-3/ChatGPT represent two different approaches to building conversational AI systems. The following table summarizes their key differences.
| Feature | LaMDA (137B) | GPT-3 (175B) | ChatGPT (GPT-3.5) |
|---|---|---|---|
| Developer | Google | OpenAI | OpenAI |
| Parameters | 137 billion | 175 billion | 175 billion |
| Architecture | Decoder-only Transformer | Decoder-only Transformer | Decoder-only Transformer |
| Training Focus | Dialog data and web text | Broad web text (Common Crawl, Wikipedia, books) | Web text plus RLHF on conversation |
| Fine-tuning Method | SSI metrics + safety classifiers | Few-shot prompting (base GPT-3); RLHF (ChatGPT) | Reinforcement learning from human feedback (RLHF) |
| Groundedness | Built-in information retrieval tools | No built-in retrieval | No built-in retrieval (at launch) |
| Response Style | Casual, conversational | Varies by prompt | Structured, detailed |
| Code Generation | Not optimized for code | Moderate capability | Strong capability |
| Public Release | Limited (AI Test Kitchen) | API access (June 2020) | Free public chatbot (November 2022) |
LaMDA's emphasis on groundedness through external tools gave it a theoretical advantage in factual accuracy, while ChatGPT's RLHF training made it more versatile for a broader range of tasks including code generation, creative writing, and structured question answering.
LaMDA's legacy is closely tied to the rapid evolution of Google's AI product strategy in response to competitive pressure from OpenAI.
When OpenAI launched ChatGPT in November 2022, its rapid adoption (millions of users within weeks) represented a direct threat to Google's core search business. Google responded by announcing Bard on February 6, 2023, a generative AI chatbot initially powered by a lightweight version of LaMDA. However, the launch was troubled: during a demonstration, Bard provided incorrect information about the James Webb Space Telescope, an error that contributed to a reported $100 billion drop in Alphabet's market capitalization in a single day.
Google quickly moved to upgrade Bard from LaMDA to PaLM (Pathways Language Model), a more capable large language model. By late March 2023, CEO Sundar Pichai announced plans to base Bard on PaLM rather than LaMDA.
On December 6, 2023, Google announced Gemini, a new family of multimodal AI models that replaced both the Bard branding and the underlying LaMDA/PaLM technology. In February 2024, the Bard chatbot was officially renamed Gemini, marking the end of the LaMDA product line. Gemini 1.5 introduced a context window of one million tokens, and Gemini 2.0, released in December 2024, was positioned as Google's model for the "agentic era" of AI.
| Date | Milestone |
|---|---|
| January 2020 | Google unveils Meena (2.6B parameter chatbot) |
| May 2021 | LaMDA announced at Google I/O |
| January 2022 | LaMDA paper published (arXiv:2201.08239) |
| May 2022 | LaMDA 2 unveiled; AI Test Kitchen app launched |
| June 2022 | Blake Lemoine claims LaMDA is sentient |
| July 2022 | Google fires Lemoine |
| November 2022 | OpenAI launches ChatGPT, triggering competitive response |
| February 2023 | Google announces Bard (powered by LaMDA) |
| March 2023 | Bard upgraded from LaMDA to PaLM |
| December 2023 | Google announces Gemini model family |
| February 2024 | Bard renamed to Gemini |
LaMDA's open-domain conversational capabilities were intended for integration across several Google products and services, including Google Search, Google Assistant, and Google Workspace.
Like other large language models, LaMDA was susceptible to inheriting biases present in its training data. Because the model was trained on public web text and dialog data, it could reflect and amplify societal biases related to gender, race, religion, and other sensitive topics. Google acknowledged this limitation and included bias mitigation as part of the safety fine-tuning process.
The Lemoine controversy highlighted a broader concern about how convincing dialogue models can lead people to attribute human-like qualities to AI systems. LaMDA was explicitly designed to be engaging and interesting in conversation, qualities that may increase the risk of users forming inappropriate emotional attachments or believing the system possesses understanding or consciousness that it does not have.
Despite the safety fine-tuning framework, no filtering system is perfect. The research paper acknowledged that LaMDA could still produce unsafe or factually incorrect responses. The gap between the model's safety performance and human-level safety remained an open challenge, reinforcing the need for continued research into AI alignment and safety.
Training a 137-billion-parameter model on 1,024 TPU-v3 chips for nearly 58 days required substantial computational resources and energy. The environmental cost of training and running large language models has become an increasingly important consideration in the AI research community.
Imagine you have a really, really smart parrot that has listened to billions of conversations between people. This parrot has gotten so good at listening that it can now join in on conversations about almost anything: space, cooking, history, or even jokes. That is basically what LaMDA is, except instead of a parrot, it is a computer program made by Google.
LaMDA learned to talk by reading a huge number of conversations from the internet. Then, Google's researchers taught it three extra rules: (1) make sure your answers make sense and are interesting, (2) do not say anything mean or dangerous, and (3) if you are not sure about a fact, look it up before answering. The third rule is special because most other chatbots at the time just guessed, while LaMDA could actually check outside sources to make sure it was telling the truth.
LaMDA became famous when a Google employee named Blake Lemoine said he thought LaMDA was alive and had feelings, sort of like a real person. Most scientists said that was not true; LaMDA is very good at sounding like a person, but it does not actually think or feel anything. It is more like a very convincing actor reading from a script that it writes as it goes.
Google later used what it learned from building LaMDA to create newer and even smarter AI programs called Bard and then Gemini.