AI Test Kitchen was a Google application for Android, iOS, and the web that let users interact with experimental generative AI models through a controlled-access program. Announced at Google I/O 2022 and first launched on Android in August 2022, the app was Google's primary public-facing surface for early demos of LaMDA and, later, MusicLM. It served as Google's attempt to gather user feedback on emerging AI capabilities in a controlled way before launching them as full products. The mobile apps were delisted in August 2023, after which the experience continued only on the web. AI Test Kitchen was effectively superseded by Bard, launched in March 2023, which was renamed to the Gemini app in February 2024.
AI Test Kitchen mattered less for what it shipped than for what it signalled. It was Google's first big consumer-facing generative AI release after years of internal research, and it took a cautious, drip-feed approach just as OpenAI was about to throw the doors open with ChatGPT. Whether that worked out for Google is, depending on who you ask, either a textbook case of being lapped by a less careful competitor or an early sign that responsible AI rollouts are harder than they look.
Google CEO Sundar Pichai introduced AI Test Kitchen during the Google I/O 2022 keynote on May 11, 2022. The app was unveiled alongside LaMDA 2, the second generation of Google's Language Model for Dialogue Applications, which had been announced as a research project the previous year at I/O 2021. Pichai framed AI Test Kitchen as a way for users to "explore, get a feel for, and provide feedback on" emerging Google AI without the company having to commit those capabilities to a finished product first.
During the keynote, Pichai walked through a demo in which a user asked LaMDA 2 to describe deep-sea creatures in the Mariana Trench, and a separate demo in which the model broke down the steps for planting a vegetable garden into actionable subtasks. He was careful to set expectations: "These are not products," he said, "they are quick sketches that allow us to explore what models like LaMDA 2 can do."
The initial plan was for the app to roll out over "the coming months" in the US, starting with "select academics, researchers, and policymakers" before opening to a broader waitlist. That cautious phasing was deliberate, and scrutiny of LaMDA soon intensified. In June 2022, a month after the keynote, Google engineer Blake Lemoine publicly claimed the model had become sentient based on its responses about self-identity, religion, and Isaac Asimov's Three Laws of Robotics. Google rejected the claim and fired Lemoine on July 22, 2022. The episode raised LaMDA's profile, but it also raised the bar for how carefully any consumer-facing demo had to be presented.
| Date | Event |
|---|---|
| May 11, 2022 | AI Test Kitchen announced at Google I/O 2022 alongside LaMDA 2 |
| June 11, 2022 | Blake Lemoine publicly claims LaMDA is sentient; controversy raises scrutiny |
| July 22, 2022 | Lemoine fired by Google |
| August 25, 2022 | Android app launches in the US with three demos: Imagine It, List It, Talk About It |
| October 18, 2022 | iOS app released in the US App Store |
| November 2, 2022 | "Season 2" preview adds City Dreamer and Wobble, plus expansion to Australia, Canada, Kenya, New Zealand, and the UK |
| November 30, 2022 | OpenAI launches ChatGPT |
| December 2022 | Google reportedly issues internal "Code Red" in response to ChatGPT's growth |
| January 26, 2023 | MusicLM research paper posted to arXiv (2301.11325) |
| February 6, 2023 | Sundar Pichai announces Bard |
| March 21, 2023 | Bard launches publicly with US/UK waitlist |
| May 10, 2023 | Google Labs (labs.google) launches; MusicLM added to AI Test Kitchen as a public demo at Google I/O 2023 |
| August 1, 2023 | AI Test Kitchen mobile apps delisted from Google Play and the App Store; service moves to web only |
| December 6, 2023 | Google announces the Gemini model family |
| February 8, 2024 | Bard renamed to the Gemini app |
The app rotated experimental demos rather than presenting one stable product. Each demo was tied to a specific research model or capability that Google wanted to test publicly. New demos appeared in batches that the company called "seasons."
| Demo | Added | Underlying model | What it did |
|---|---|---|---|
| Imagine It | August 2022 | LaMDA 2 | User describes a place or scenario; the model returns immersive descriptions and follow-up questions about the imagined scene |
| List It (also "List It Out") | August 2022 | LaMDA 2 | User states a goal or activity; the model breaks it into a structured list of subtasks, which can be drilled down further |
| Talk About It (Dogs Edition) | August 2022 | LaMDA 2 | Open-ended conversation deliberately constrained to a single topic (initially dogs) to test the model's ability to stay on subject |
| City Dreamer | November 2022 (Season 2 preview) | Imagen-derived text-to-image research | User describes a city; the system generates imagined urban scenes |
| Wobble | November 2022 (Season 2 preview) | Text-to-image plus 2D-to-3D animation | User imagines a monster, which is then animated to dance using a research animation pipeline |
| MusicLM | May 10, 2023 | MusicLM | Text-to-music generation; users describe a musical idea and receive two generated clips, then vote for the preferred one to provide preference data |
The LaMDA-based demos shared a common philosophy. They were narrow, intentionally so, and their responses passed through safety filters designed to catch sexually explicit, hateful, violent, illegal, and privacy-violating content. Google noted that response ratings ("nice, offensive, off topic, or untrue") were collected without being linked to user Google accounts, so that the company could refine the model without building a personalised profile of each tester.
The MusicLM demo attracted the most attention of any addition. Users typed a prompt such as "soulful jazz for a dinner party" or "industrial techno that is hypnotic" and received two generated audio clips. The interface borrowed a preference-collection pattern familiar from RLHF training: pick the better track. To sidestep the hardest copyright issues, the public demo declined prompts that named specific artists and did not generate vocals, and the research team published the MusicCaps dataset (about 5.5k music clips with text descriptions written by musicians) alongside the paper.
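Google never published how AI Test Kitchen stored these votes. As a hypothetical sketch, assuming each vote is kept as a record of (prompt, clip A, clip B, winner), the Python below shows how "pick the better track" judgments turn into the (prompt, chosen, rejected) pairs that RLHF-style reward models are commonly trained on, plus a simple per-clip win rate. All names here (`PairwiseVote`, `to_preference_pairs`, `win_rates`) are illustrative, not Google's.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseVote:
    """One hypothetical 'pick the better track' judgment from a tester."""
    prompt: str   # the text prompt the tester typed
    clip_a: str   # identifier of the first generated clip
    clip_b: str   # identifier of the second generated clip
    winner: str   # "a" or "b", whichever clip the tester preferred


def to_preference_pairs(votes):
    """Convert votes into (prompt, chosen, rejected) triples."""
    pairs = []
    for v in votes:
        if v.winner == "a":
            pairs.append((v.prompt, v.clip_a, v.clip_b))
        elif v.winner == "b":
            pairs.append((v.prompt, v.clip_b, v.clip_a))
        else:
            raise ValueError(f"unexpected winner label: {v.winner!r}")
    return pairs


def win_rates(votes):
    """Fraction of its comparisons each clip won (wins / appearances)."""
    wins, appearances = Counter(), Counter()
    for _prompt, chosen, rejected in to_preference_pairs(votes):
        wins[chosen] += 1
        appearances[chosen] += 1
        appearances[rejected] += 1
    # Counter returns 0 for clips that never won, so losers get a rate of 0.0.
    return {clip: wins[clip] / appearances[clip] for clip in appearances}
```

Asking testers to choose between two clips, rather than score each one, avoids the problem that different listeners calibrate absolute ratings differently; a pairwise win means the same thing whoever casts it.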
AI Test Kitchen was a thin shell over a small set of research models. Knowing which model is doing the work in any given demo helps explain why the demos felt so different from each other.
LaMDA and LaMDA 2. LaMDA (Language Model for Dialogue Applications) was first announced in 2021 and described in detail in a January 2022 paper by Thoppilan and colleagues. It was a transformer-based model pre-trained on public dialogue data and web text, then fine-tuned for safety and groundedness. LaMDA 2 was a refresh announced at I/O 2022. The Imagine It, List It, and Talk About It demos all ran on LaMDA 2.
Imagen. Imagen was Google Research's text-to-image diffusion model, originally introduced in May 2022. AI Test Kitchen's Season 2 demos (City Dreamer and Wobble) drew on Imagen-style image generation, though Google never gave testers an open Imagen prompt box the way Stable Diffusion or DALL-E 2 did at the time. The constraint was deliberate. Google was nervous about uncontrolled image generation, particularly around faces and copyrighted styles, and the structured demo format was the workaround.
MusicLM. The MusicLM model was described in an arXiv paper (2301.11325) by Andrea Agostinelli, Timo Denk and colleagues, posted on January 26, 2023. It cast music generation as a hierarchical sequence-to-sequence task and produced 24 kHz audio that remained coherent over several minutes. When MusicLM appeared in AI Test Kitchen on May 10, 2023, it generated short clips, would not produce vocals, and declined prompts that named specific artists, all safety choices made before public release.
PaLM and successors. Some later AI Test Kitchen experiments drew on PaLM and PaLM 2 rather than LaMDA. By the time Bard moved from LaMDA to PaLM 2 in May 2023, it was clear LaMDA was being deprecated in favour of the PaLM family, which would in turn give way to Gemini in late 2023.
Google framed AI Test Kitchen as a responsible, staged release rather than a product launch. The pitch ran something like this: generative AI models were powerful but unreliable, they sometimes produced offensive or false text, and shipping them as polished consumer products was premature. A controlled-access app, with a waitlist and a feedback loop, would let Google gather real-world data, stress-test safety filters, and watch for failure modes before any wider release.
In practice, this meant a few specific design choices. Demos were narrow. Each one was scoped to a single task or a single topic, which kept failure modes predictable. Output ratings were collected at the response level. Safety filters caught a defined set of categories. Access required a Google account and agreement to a terms-of-service screen that explicitly warned about possible offensive output. The app was free and US-only at first, with selective international expansion in November 2022.
The contrast with OpenAI's ChatGPT release on November 30, 2022, was immediate and consequential. ChatGPT had no waitlist, no narrow demos, no rotating seasons. You signed in and you typed. Within five days it had a million users; within two months, by one widely cited estimate, it had a hundred million. Google's careful staging suddenly looked like timidity, and AI Test Kitchen, which had been positioned as bold for its time, looked underpowered as a response.
During its lifetime, AI Test Kitchen was free to use but invite-only. Users registered interest on the AI Test Kitchen website, and Google rolled out access in waves. By November 2022, the app was available on Android and iOS in Australia, Canada, Kenya, New Zealand, the United Kingdom, and the United States, all in English. New testers had to read and acknowledge guidelines about what the model could and could not do, including disclaimers that responses might be inaccurate or offensive.
After the mobile apps were delisted on August 1, 2023, the only way to use AI Test Kitchen was at aitestkitchen.withgoogle.com. By then, MusicLM was effectively the only active demo, and the LaMDA-based demos had been retired in favour of Bard, a far more capable open-ended chatbot that had launched on a lightweight version of LaMDA and moved to PaLM 2 in May 2023.
The initial coverage of AI Test Kitchen was cautiously positive. Tech press at TechCrunch, Engadget, The Verge, and 9to5Google framed it as a sensible, transparent way to let outsiders touch a real LLM without pretending it was a product. Some critics in early reviews noted the demos felt thin: List It produced fairly generic checklists, Talk About It refused to leave its assigned topic in ways that felt brittle, and Imagine It was charming but limited.
The sentiment changed quickly after ChatGPT. Once a free, general-purpose chatbot was available to anyone with an email address, AI Test Kitchen looked stale. The New York Times and other outlets reported that Google had declared an internal "Code Red" in December 2022, with founders Larry Page and Sergey Brin reportedly returning to executive meetings to discuss Google's response. Pichai later denied issuing a Code Red himself in a 2023 interview, but the company's pivot was unmistakable: by February 6, 2023, Bard was announced, and by March 21 it was in users' hands. AI Test Kitchen, which had been the flagship a few months earlier, was suddenly a sideshow.
Bard absorbed almost all of the consumer attention that AI Test Kitchen had been built to capture. It was a more familiar chat interface, it was not narrowly task-scoped, and it was tied directly to Google Search and Workspace integrations as those rolled out. AI Test Kitchen kept running in parallel, but the team's energy and the model investment moved to Bard.
The rebranding to the Gemini app on February 8, 2024, marked the formal end of Bard as a separate brand. With the Gemini family of models (announced December 6, 2023) underpinning the consumer chatbot, Google no longer needed a separate experimental venue for LaMDA-era demos. New experimental work went either into Google Labs (launched May 10, 2023 at labs.google), into Google AI Studio for developers, or directly into Gemini's own experimental features tab.
MusicLM in AI Test Kitchen did get a successor of sorts. In labs.google, Google launched MusicFX in early 2024, a more polished text-to-music tool with longer outputs, more controls, and SynthID watermarking. The research line continued with Google DeepMind's Lyria model, announced in November 2023 and used in YouTube's Dream Track experiment for Shorts, with Lyria 2 following in 2025.
It is easy to dismiss AI Test Kitchen as a footnote, the cautious thing Google did just before everyone else stopped being cautious. That underrates it. A few specific things came out of it that mattered.
The waitlist plus rotating-demo model became a template. Anthropic's Claude had a waitlist, OpenAI ran private previews for GPT-4 vision and Sora, and almost every major lab today gates new modalities behind some version of "register your interest." That pattern was not invented at AI Test Kitchen, but the app helped normalise it for consumer-facing generative AI in 2022.
The MusicLM line led to Lyria, which became a real product line. The clips AI Test Kitchen testers compared were an early public sample of human preference data on text-to-music output, and that line of work fed into Google DeepMind's later music generation tools.
The LaMDA demos quietly proved out feedback infrastructure that Bard, and then Gemini, would lean on heavily. The thumbs-up and thumbs-down buttons on Gemini today follow the same pattern as the "nice, offensive, off topic, or untrue" rating buttons in AI Test Kitchen.
And then there is the harder lesson. Internally at Google, AI Test Kitchen was used during 2022 and 2023 as evidence that the company could be both bold and careful. After ChatGPT, the company's executives concluded that careful was not enough, and they moved fast. AI Test Kitchen sits awkwardly in that arc. It is what Google chose to do when it had time, and it is what Google stopped doing when it ran out of time.
AI Test Kitchen sat in a small but growing category of vendor-run experimental venues for generative AI. The table below compares it with adjacent products from Google and competitors.
| Platform | Vendor | Year launched | Access model | Notable features | Status (2026) |
|---|---|---|---|---|---|
| AI Test Kitchen | Google | 2022 | Waitlist, free, mobile then web only | Rotating demos for LaMDA, Imagen, MusicLM | Mobile apps delisted Aug 2023; web reduced to MusicLM, largely superseded |
| Bard / Gemini app | Google | 2023 | Open access (waitlist briefly) | General chatbot, Workspace and Search integrations | Active, primary Google consumer AI surface |
| ChatGPT | OpenAI | November 2022 | Open access, free tier and paid | General chatbot, plugins, Code Interpreter, GPTs | Active, dominant consumer LLM product |
| Claude (claude.ai) | Anthropic | July 2023 | Open access, free tier and paid | General chatbot, long context, Artifacts, Projects | Active |
| Google Labs (labs.google) | Google | May 2023 | Open access, mostly free | Umbrella for NotebookLM, ImageFX, MusicFX, VideoFX, Whisk | Active |
| Google AI Studio | Google | 2023 | Free with Google account, developer focus | Direct API access to Gemini, prompt design tools | Active |
| Microsoft Copilot Labs | Microsoft | 2024 | Open access | Experimental Copilot features | Active |
| Meta AI Studio | Meta | 2024 | Open access | Build custom AI characters | Active |
Google Labs, in particular, picked up most of what AI Test Kitchen was supposed to do. Where AI Test Kitchen had to manage a waitlist, gate by region, and ship updates through three platforms, Labs was open by default, web only, and easier to iterate. NotebookLM, ImageFX, MusicFX, VideoFX, and Whisk all live there now. The lineage from AI Test Kitchen's MusicLM demo to MusicFX to Lyria 2 is the cleanest example of the original idea actually working: ship a research demo, gather feedback, productise.
Google did not replace AI Test Kitchen with a single thing. Instead, the work split across several surfaces, each aimed at a different audience.
Google Labs at labs.google is the closest direct successor for consumer-facing experiments. It hosts MusicFX, ImageFX, VideoFX, NotebookLM, Whisk, and a rotating set of others. Google AI Studio is the developer-facing equivalent, with direct prompt-level access to the Gemini model family including Gemini 3 Pro and on-device variants like Gemini Nano. The Gemini app itself ships experimental features behind a Labs toggle for paid users. And specialised research efforts, such as Gemini Robotics, have their own previews.
None of these inherited the original AI Test Kitchen brand, and that is probably deliberate. By 2024, Google had a clearer story to tell about Gemini as a product family, and a separate "experimental playground" name would have muddied it.
AI Test Kitchen launched into one of the most peculiar moments in the public conversation about AI. The Blake Lemoine LaMDA sentience controversy broke in June 2022, just weeks after the app was announced. Lemoine published transcripts in which LaMDA appeared to discuss its own feelings, and the Washington Post ran a profile that put the question of machine consciousness on front pages worldwide. Most AI researchers rejected the sentience claim outright. Yann LeCun, Meta's chief AI scientist, said the relevant neural networks were nowhere near capable of true intelligence.
The controversy nonetheless coloured how AI Test Kitchen was perceived. Reviewers came to the demos already primed to ask whether the model was "alive," and Google had to repeatedly clarify that LaMDA was a language model that produced statistically likely text, not a being. The careful, scoped demos in the app were partly a response to exactly that perception risk.
In parallel, the broader debate about AI safety was shifting. The Center for AI Safety statement and similar high-profile letters about existential risk were still months away in mid-2022, but the foundations were being laid. AI Test Kitchen represented one model for handling the tension: gate access, gather feedback, do not ship until you are confident. ChatGPT represented the opposite. The industry has spent the years since arguing about which approach is better, and the honest answer is that neither side fully won.