BlenderBot
Last reviewed
Jun 3, 2026
Sources
14 citations
Review status
Source-backed
Revision
v1 · 1,946 words
BlenderBot is a line of open-domain conversational agents built by Facebook AI Research (FAIR), the lab now known as Meta AI. Three main versions were released between 2020 and 2022: BlenderBot 1 in April 2020, BlenderBot 2 in July 2021, and BlenderBot 3 in August 2022. Each was published as a research project with open model weights, training code, and datasets rather than as a consumer product. The project aimed to make a chatbot that could hold a natural, wide-ranging conversation by blending several distinct conversational "skills" (engaging personality, empathy, and the use of knowledge) into a single system [1][2].
The series is closely associated with the open-source ParlAI dialogue research framework, in which the models were trained and released. BlenderBot 3, the final and largest version, drew wide press attention in 2022 when its public web demo produced offensive and factually false statements, including antisemitic conspiracy tropes and false claims about the 2020 U.S. presidential election [3][4].
| Version | Release date | Largest size | Key additions |
|---|---|---|---|
| BlenderBot 1 | April 29, 2020 | 9.4B parameters | Blends personality, empathy, and knowledge in one model; combines retrieval and generation |
| BlenderBot 2 | July 16, 2021 | 2.7B parameters | Long-term memory across sessions; live internet search for current information |
| BlenderBot 3 | August 5, 2022 | 175B parameters | Built on OPT-175B; public web demo that learns from user feedback |
The first BlenderBot was introduced on April 29, 2020, alongside the FAIR paper "Recipes for building an open-domain chatbot" by Stephen Roller, Emily Dinan, and colleagues, posted to arXiv the previous day [1][5]. Facebook released three sizes: 90 million, 2.7 billion, and 9.4 billion parameters. The 9.4B model was described at the time as 3.6 times larger than the largest comparable system, Google's Meena [2][5].
The paper's argument was that scaling alone is not enough. Good conversation, the authors wrote, requires "providing engaging talking points and listening to their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona" [5]. To teach those behaviors, the models were first pre-trained on a large corpus of public conversational data drawn from Reddit (via the pushshift.io dataset, on the order of 1.5 billion training examples) and then fine-tuned on three task datasets that each target one skill: ConvAI2 for engaging personality, Empathetic Dialogues for empathy, and Wizard of Wikipedia for the use of knowledge [2][6].
These were combined through a fourth dataset, Blended Skill Talk (BST), designed to teach the model to weave the three skills together in a single conversation rather than switching abruptly between them [2][6]. The team experimented with three model architectures: a retrieval model based on a poly-encoder that selects a response from a set of candidates; a standard generative sequence-to-sequence model; and a "retrieve and refine" hybrid that first retrieves a candidate and then conditions generation on it [6].
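The "retrieve and refine" flow can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in: the word-overlap scorer takes the place of the poly-encoder, `toy_generate` takes the place of the sequence-to-sequence model, and the `[RETRIEVED]` separator is invented for this example. None of it reflects ParlAI's actual API.

```python
from typing import Callable, List


def overlap_score(context: str, candidate: str) -> int:
    """Toy relevance score: number of distinct lowercase words shared."""
    return len(set(context.lower().split()) & set(candidate.lower().split()))


def retrieve(context: str, candidates: List[str]) -> str:
    """Retrieval step: pick the highest-scoring candidate response."""
    return max(candidates, key=lambda c: overlap_score(context, c))


def retrieve_and_refine(context: str, candidates: List[str],
                        generate: Callable[[str], str]) -> str:
    """Retrieve a candidate, append it to the context, then generate."""
    retrieved = retrieve(context, candidates)
    # The generator sees both the dialogue context and the retrieved text.
    return generate(f"{context} [RETRIEVED] {retrieved}")


def toy_generate(model_input: str) -> str:
    """Placeholder generator: echoes the retrieved text it was conditioned on."""
    return "Reply conditioned on: " + model_input.split("[RETRIEVED] ", 1)[1]


candidates = ["I love hiking in the mountains.", "Pizza is my favorite food."]
reply = retrieve_and_refine("do you like hiking", candidates, toy_generate)
```

A pure retrieval model would stop after `retrieve`, and a pure generative model would call the generator on the context alone; the hybrid chains the two.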
In human evaluations comparing the best BlenderBot model against Meena, 67 percent of evaluators judged BlenderBot to sound more human, and 75 percent said they would rather hold a long conversation with it [2]. Facebook was candid about the limitations. The model could contradict or repeat itself, and it could "hallucinate" knowledge by stating made-up facts as though they were true. It also had no persistent memory: it could not recall anything from earlier in a long exchange or from previous conversations, and it could be drawn into using offensive language found in its training data [2].
BlenderBot 2.0 was announced on July 16, 2021, and released through ParlAI [7][8]. Facebook AI, as the lab was still called at the time, presented it as the first chatbot that could both build a long-term memory it continually updates and search the internet for timely information during a conversation [7]. Two model sizes were released: a 400-million-parameter version and a 2.7-billion-parameter version [8].
The two new capabilities addressed specific weaknesses of the first version. To overcome the lack of memory, BlenderBot 2 writes relevant facts it learns about a conversation partner into a long-term memory store, kept separately for each user, and reads from that store in later sessions [7][8]. To overcome the staleness of knowledge frozen at training time, the model can generate its own search-engine queries, read the returned documents, and fold what it finds into its reply, an approach in the family of retrieval-augmented generation [7][8].
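A minimal sketch of how a per-user memory store and a search step might feed a single model input is given here. The class, function names, and the way the pieces are concatenated are assumptions made for illustration; the query generator and search engine are stubs, not the components BlenderBot 2 actually used.

```python
from typing import Callable, Dict, List


class LongTermMemory:
    """Hypothetical per-user store of facts learned in conversation."""

    def __init__(self) -> None:
        self._facts: Dict[str, List[str]] = {}

    def write(self, user_id: str, fact: str) -> None:
        """Store a fact for one user, skipping exact duplicates."""
        facts = self._facts.setdefault(user_id, [])
        if fact not in facts:
            facts.append(fact)

    def read(self, user_id: str) -> List[str]:
        """Return a copy of the facts stored for this user only."""
        return list(self._facts.get(user_id, []))


def build_model_input(message: str, memory: LongTermMemory, user_id: str,
                      make_query: Callable[[str], str],
                      search: Callable[[str], List[str]]) -> str:
    """Combine stored memories and fresh search results with the message."""
    query = make_query(message)   # the model writes its own search query
    documents = search(query)     # documents returned by the search engine
    return "\n".join(memory.read(user_id) + documents + [message])


memory = LongTermMemory()
memory.write("alice", "Alice has a dog named Rex.")
# Stub query generator and search engine, for illustration only.
text = build_model_input(
    "Who won the game last night?", memory, "alice",
    make_query=lambda m: m.rstrip("?").lower(),
    search=lambda q: [f"[search result for: {q}]"],
)
```

Keeping facts keyed by user is what lets a later session recall earlier ones without leaking one person's details into another person's conversation.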
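Two datasets were released to train these behaviors: Multi-Session Chat, which covers conversations that span several sessions and so exercises long-term memory, and Wizard of the Internet, in which replies are grounded in live internet search results [7][8].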
Facebook reported that BlenderBot 2 improved on version 1 with a 17 percent gain in an engagingness score over multi-session conversations and a 55 percent improvement in its use of previous sessions, as judged by human evaluators. It also cut the rate of factually incorrect statements, with reported hallucinations falling from 9.1 percent to 3.0 percent of responses [7]. Alongside the gains, the company acknowledged that giving a bot internet access raises new safety questions, since the model could surface harmful or untrustworthy content from the open web [7].
BlenderBot 3 was released on August 5, 2022, and described in the technical report "BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage" by Kurt Shuster, Jing Xu, and colleagues [9][10]. It came in three sizes: 3 billion, 30 billion, and 175 billion parameters. The 30B and 175B versions were built on Meta's publicly released OPT (Open Pre-trained Transformer) large language model, with the largest using OPT-175B as its base; the 3B version was based on a model called R2C2 [10]. Meta described the 175B model as roughly 58 times the size of BlenderBot 2 [11].
Architecturally, BlenderBot 3 used a single transformer to carry out a sequence of modules, with special control codes in the input telling the model which step it was performing: deciding whether to search the internet, generating a search query, extracting knowledge from results, accessing or writing long-term memory, extracting entities, and finally producing the dialogue response [10]. It inherited the internet search and long-term memory of BlenderBot 2 and added a mechanism for learning from deployment feedback over time, which the authors called continual learning. The model was not updated live during conversations; instead, interaction data was to be collected and used to retrain the model in successive batches [10].
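The idea of one model running several modules, selected by a control code in its input, can be sketched as follows. The control-code strings, function names, and stub model are all hypothetical and chosen for readability; the actual codes are defined in the technical report. The memory and entity-extraction modules are left out to keep the example short.

```python
from typing import Callable, List

# Hypothetical control codes; the strings BlenderBot 3 actually used differ.
SEARCH_DECISION = "__decide-search__"
QUERY = "__generate-query__"
KNOWLEDGE = "__extract-knowledge__"
RESPONSE = "__generate-response__"

Model = Callable[[str], str]  # one model: prompt in, text out


def run_module(model: Model, control_code: str, text: str) -> str:
    """Call the same model, with the control code selecting the task."""
    return model(f"{text} {control_code}")


def respond(model: Model, search: Callable[[str], List[str]], context: str) -> str:
    """Run the modules in sequence and return the final dialogue response."""
    knowledge = ""
    if run_module(model, SEARCH_DECISION, context) == "search":
        query = run_module(model, QUERY, context)
        docs = " ".join(search(query))
        knowledge = run_module(model, KNOWLEDGE, f"{context} {docs}")
    return run_module(model, RESPONSE, f"{context} {knowledge}".strip())


calls: List[str] = []


def stub_model(prompt: str) -> str:
    """Stand-in model that records which control code it was given."""
    code = prompt.rsplit(" ", 1)[-1]
    calls.append(code)
    canned = {SEARCH_DECISION: "search", QUERY: "weather paris",
              KNOWLEDGE: "It is sunny."}
    return canned.get(code, "It is sunny in Paris today.")


reply = respond(stub_model, lambda q: ["Paris forecast: sunny."],
                "How is the weather in Paris?")
```

Because every step goes through the same `model` callable, adding a module in this sketch means adding a control code and a call site rather than training a separate network.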
The headline feature was the public demo. Meta deployed BlenderBot 3 at blenderbot.ai and invited adults in the United States to chat with it and rate its replies [3][11]. Users could mark a response with a thumbs-up or thumbs-down, and on a thumbs-down could say whether it was off-topic, nonsensical, rude, spam-like, or something else; this feedback was the data the continual-learning plan relied on [11]. Before chatting, users had to acknowledge that the bot was for research and entertainment, that it could make untrue or offensive statements, and that they agreed not to prompt it to make such statements intentionally [3]. Meta released the model weights, code, datasets, and model cards to the research community [11].
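The feedback described above maps naturally onto a small record type. The sketch below is an assumed schema for illustration, not the format Meta used: the class and field names are invented, and only the five thumbs-down categories come from the description of the demo.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DownvoteReason(Enum):
    """The categories a user could pick after a thumbs-down."""
    OFF_TOPIC = "off-topic"
    NONSENSICAL = "nonsensical"
    RUDE = "rude"
    SPAM_LIKE = "spam-like"
    OTHER = "other"


@dataclass(frozen=True)
class Feedback:
    """One user rating of one bot response (hypothetical schema)."""
    response_text: str
    thumbs_up: bool
    reason: Optional[DownvoteReason] = None

    def __post_init__(self) -> None:
        # A reason only makes sense for a thumbs-down rating.
        if self.thumbs_up and self.reason is not None:
            raise ValueError("a thumbs-up rating cannot carry a downvote reason")


example = Feedback("The moon is made of cheese.", thumbs_up=False,
                   reason=DownvoteReason.NONSENSICAL)
```

The reason is optional because users could give a thumbs-down without choosing a category.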
For safety, the system combined several layers: a separate transformer-based safety classifier, keyword filtering, topic-specific checks, and a generation-steering technique known as the Director, which is trained to push the model away from undesirable outputs [10].
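The filtering layers can be pictured as a chain of independent checks, any one of which can veto a candidate reply. This is a simplified sketch under stated assumptions: the blocklist, the stub classifier, and the fallback message are placeholders, and the Director, which steers generation rather than filtering finished text, is not modeled.

```python
from typing import Callable, Sequence

Check = Callable[[str], bool]  # returns True when the text looks unsafe

BLOCKLIST = {"slur1", "slur2"}  # placeholder keywords, not a real list
FALLBACK = "I'd rather not talk about that. What else is on your mind?"


def keyword_check(text: str) -> bool:
    """Keyword layer: flag text containing any blocklisted word."""
    return any(word.strip(".,!?") in BLOCKLIST for word in text.lower().split())


def stub_classifier(text: str) -> bool:
    """Stand-in for a learned safety classifier (here just a substring test)."""
    return "conspiracy" in text.lower()


def safe_reply(candidate: str, checks: Sequence[Check]) -> str:
    """Return the candidate only if no layer flags it, else a canned reply."""
    if any(check(candidate) for check in checks):
        return FALLBACK
    return candidate


layers = [keyword_check, stub_classifier]
```

For example, `safe_reply("Nice weather today.", layers)` returns the candidate unchanged, while a candidate containing a blocklisted word or flagged by the classifier is replaced with `FALLBACK`.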
Within days of launch, the demo attracted attention for producing offensive and false content. Journalists and researchers found that BlenderBot 3 repeated antisemitic stereotypes and conspiracy theories, and that it falsely claimed Donald Trump had won the 2020 election and was still president; Wall Street Journal reporter Jeff Horwitz published screenshots of several such exchanges [3][4][12]. The bot also gave inconsistent and sometimes critical answers about Meta and its chief executive: depending on the conversation, it described Mark Zuckerberg in conflicting terms and in some exchanges said his company exploits people for money [4][13]. Because the model drew on live web content and conditioned its replies on open-ended user input, a user could steer an individual conversation in almost any direction, producing widely varying and unreliable claims [3][4].
Commentators drew comparisons to Microsoft's 2016 Tay chatbot, which had been pulled offline after users prompted it into offensive speech, and questioned the wisdom of an unrestricted public demo [4][12]. Meta's response was that exposing the system to the public was part of the research. Joelle Pineau, then managing director of FAIR, wrote that "while it is painful to see some of these offensive responses, public demos like this are important for building truly robust conversational AI systems and bridging the clear gap that exists today before such systems can be productionized" [4][13]. Meta noted that only a small share of responses had been flagged as offensive and that the feedback collected, on the order of tens of thousands of conversations in the demo's first days, would be used to make later models safer [3][14].
The BlenderBot line is one of the better-documented efforts to build an openly released, research-oriented open-domain chatbot before the public release of ChatGPT in November 2022. Its central ideas (blending conversational skills, grounding replies in retrieved knowledge, maintaining long-term memory, and learning from deployment feedback) recur in later conversational systems [1][7][10]. The BlenderBot 3 episode also became a frequently cited case study in the risks of deploying large language models that learn from open user interaction, and in the gap between research demos and production-ready safety [3][4]. Meta did not release a "BlenderBot 4"; its subsequent consumer-facing conversational work centered on the Llama family of models and the Meta AI assistant built on top of them.