Chatbot
Last reviewed
Apr 27, 2026
Sources
43 citations
Review status
Source-backed
Revision
v1 ยท 4,646 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Apr 27, 2026
Sources
43 citations
Review status
Source-backed
Revision
v1 ยท 4,646 words
Add missing citations, update stale details, or suggest a clearer explanation.
A chatbot is a software application designed to simulate conversation with human users, typically through text or voice. Chatbots range from simple rule-based programs that match keywords to canned responses to advanced systems built on a large language model that can hold open-ended conversations on virtually any topic, draft documents, write code, and orchestrate calls to external tools. The word chatbot is a contraction of chat and robot. Although the modern conception is associated with smartphones and websites, conversational programs have been a research subject in computer science for more than sixty years.
Since the November 2022 launch of ChatGPT, chatbots have become one of the most visible artifacts of the AI boom, used by hundreds of millions of people for everything from drafting emails to emotional support. The same period has produced a growing set of policy concerns, including misinformation, sycophancy, dependence, child safety, and litigation over the role of conversational systems in real-world harm.
| Attribute | Detail |
|---|---|
| Type | Software for simulated conversation |
| Common interfaces | Text chat, voice, multimodal (text plus images, audio, video) |
| Underlying technology | Pattern matching, decision trees, retrieval, neural language models, large language model systems |
| First widely cited example | ELIZA, 1966 |
| Modern flagship example | ChatGPT, released November 30, 2022 |
| Common deployment surfaces | Websites, mobile apps, messaging platforms, smart speakers, in-car systems, enterprise help desks |
| Typical use cases | Customer service, productivity, search, coding, education, entertainment, mental health support, companionship |
The term chatbot is sometimes used interchangeably with conversational agent, virtual assistant, AI assistant, or dialog system. Virtual assistants such as Siri and Alexa emphasize voice and device control, chatbots historically connote text on a website or messenger, and AI agents in the 2024 to 2026 vocabulary refer to systems that combine a chatbot interface with planning loops, tools, and longer-running tasks.
The history of chatbots tracks the broader history of artificial intelligence. Early systems showed surface plausibility without real understanding. Subsequent decades delivered incremental improvements until deep learning in the 2010s and the transformer architecture in 2017 enabled the wave of large generative chatbots that define the 2020s.
In 1950 Alan Turing published Computing Machinery and Intelligence in Mind, where he proposed what he called the imitation game. A human judge converses by text with two unseen parties, one human and one machine, and must decide which is which. Turing argued that if a machine could fool a competent judge often enough, the question of whether it could think would lose its practical force. He predicted that by 2000 a machine could deceive an average interrogator at least 30 percent of the time after five minutes of questioning. The Turing test became the canonical reference point for measuring machine conversation and shaped chatbot research for decades.
The first program widely recognized as a chatbot was ELIZA, developed between 1964 and 1967 at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum. ELIZA was published in a January 1966 paper in Communications of the ACM titled ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine. It operated by matching user input against simple syntactic patterns and substituting parts of the input into a response template. Its most famous script, DOCTOR, simulated a Rogerian psychotherapist by reflecting the user's statements back as questions.
Weizenbaum was startled to discover that many users, including his own secretary, treated ELIZA as if it understood them. The phenomenon is now called the ELIZA effect: the human tendency to project understanding, intent, and emotion onto a system that has none. Weizenbaum spent the rest of his career arguing against overestimation of machine intelligence, and his 1976 book Computer Power and Human Reason became a foundational text in the ethics of computing.
At Stanford the psychiatrist Kenneth Colby built PARRY in 1972, a program that simulated a patient with paranoid schizophrenia. PARRY modeled internal states such as anger, fear, and mistrust, and altered its output based on its simulated mood. In controlled experiments psychiatrists could not reliably distinguish PARRY's responses from those of human patients. The 1973 ARPANET conversation between PARRY and ELIZA was an early showcase of networked dialogue between two programs.
The British programmer Rollo Carpenter began Jabberwacky in 1981, with a public version by 1988 and a web release in 1997. Unlike ELIZA's static script, Jabberwacky stored every conversation it had ever seen and used contextual matching to retrieve replies. Carpenter's later system, Cleverbot, won the Loebner Prize in 2005 and 2006.
In 1995 Richard Wallace started the Artificial Linguistic Internet Computer Entity, or A.L.I.C.E. project. It was driven by Artificial Intelligence Markup Language, or AIML, which let authors specify pattern-template pairs. AIML became the dominant format for hobbyist and commercial chatbot authoring through the late 1990s and 2000s. A.L.I.C.E. won the Loebner Prize in 2000, 2001, and 2004, and Wallace published the AIML specification in 2001.
SmarterChild, launched on AOL Instant Messenger in June 2001 by the New York startup ActiveBuddy, combined scripted conversation with information services such as weather, sports scores, stock quotes, and news. It became one of the most popular bots on AIM and Yahoo Messenger with tens of millions of users by the mid-2000s, and ActiveBuddy was acquired by Microsoft in 2007. For a generation of teenagers, SmarterChild was the first sustained interaction with a conversational AI system.
IBM's Watson, although not a chatbot in the consumer sense, was a major turning point for natural language understanding. In February 2011 Watson defeated champions Ken Jennings and Brad Rutter on the U.S. game show Jeopardy!, showing that a system could parse complex questions and produce confident answers in close to real time. IBM later commercialized parts of Watson for customer service and healthcare.
In October 2011 Apple released Siri as a built-in feature of the iPhone 4S, marking the first mass-market voice assistant; Siri originated as a 2010 startup spun out of the SRI International CALO project. In April 2014 Microsoft introduced Cortana, named after the AI character in the Halo franchise, as part of Windows Phone 8.1. In November 2014 Amazon announced Alexa alongside the first Echo smart speaker, which opened to the public on July 14, 2015. In May 2016 Google announced Google Assistant at its I/O conference, with Google Home arriving on November 4, 2016. By the end of 2016 the four major U.S. technology companies had all shipped a voice-first conversational agent.
Around 2016 a wave of enthusiasm about messaging chatbots swept the technology industry. Microsoft CEO Satya Nadella declared bots are the new apps at the Build conference. Facebook opened its Messenger platform to third-party bots in April 2016 and launched its personal-assistant project M, intended to field any user request through a hybrid of AI and human contractors. Many startups raised venture capital on the premise that bots would replace mobile apps as a universal interface.
The bubble deflated quickly. Bots of the period relied on shallow natural-language understanding, brittle decision trees, and limited memory, and failed to live up to the marketing. Facebook shut down M in early 2018. A prominent failure was Microsoft's Tay, launched on Twitter on March 23, 2016 with the persona of a teenage girl and designed to learn from users. Within twenty-four hours coordinated trolls had taught it to post racist, antisemitic, and sexually inappropriate content, and Microsoft pulled it offline the same day. The incident became a standard case study in the dangers of deploying learning systems on open public input without robust filtering.
The deep learning revolution of the 2010s, the transformer architecture introduced in 2017, and reinforcement learning from human feedback in the early 2020s set the stage for a new generation of chatbots. The commercial inflection point came on November 30, 2022, when OpenAI released ChatGPT as a free research preview based on GPT-3.5.
ChatGPT became the fastest-growing consumer software product in history at launch, reaching one million users within five days and an estimated one hundred million monthly active users by January 2023, per a UBS analysis. TikTok had required nine months and Instagram about thirty months to reach the same milestone. ChatGPT's combination of free access, a clean chat interface, and unprecedented breadth of capability made it the public face of generative AI almost overnight.
Anthropic released its Claude chatbot publicly on March 14, 2023, with successive versions emphasizing long-context reasoning, code, and safety grounded in Constitutional AI. Google launched Bard on March 21, 2023 and rebranded it as Gemini on February 8, 2024. Microsoft integrated GPT-4 into Bing Chat in February 2023 and folded the assistant into a Copilot family across Windows, Microsoft 365, GitHub, and Edge.
Inflection AI launched Pi in May 2023; the team was absorbed by Microsoft in March 2024. Character.AI offered millions of role-play personas, and Replika continued an earlier consumer companion line from 2017. Chinese labs shipped their own systems, including Kimi from Moonshot AI, DeepSeek's chat product, Qwen Chat from Alibaba, and Doubao from ByteDance. Open-source efforts such as Meta's Llama family enabled thousands of self-hosted chatbots. Aggregators such as LMSYS Chatbot Arena emerged to crowdsource head-to-head comparisons.
| Decade | Representative systems | Significance |
|---|---|---|
| 1950s | Computing Machinery and Intelligence (1950) | Theoretical foundation; Turing test introduced |
| 1960s | ELIZA (1966) | First widely studied conversational program; ELIZA effect identified |
| 1970s | PARRY (1972), PARRY-ELIZA conversation (1973) | Models of mood and personality; networked dialogue between bots |
| 1980s | Jabberwacky (1988), early IRC bots | Learning from prior conversations; chatbots reach online communities |
| 1990s | A.L.I.C.E. (1995), AIML (1995 to 2001), Loebner Prize (1991) | Mass authoring of pattern-matching bots; structured competition |
| 2000s | SmarterChild (2001), IBM Watson research | Consumer adoption on instant messaging; rise of question answering |
| 2010s | Watson on Jeopardy! (2011), Siri (2011), Alexa (2014), Cortana (2014), Google Assistant (2016), Tay (2016), Facebook M (2016 to 2018) | Voice assistants ship at scale; first hype cycle around messaging bots |
| 2020s | ChatGPT (2022), Claude (2023), Gemini and Copilot (2023 to 2024), Pi (2023 to 2024), Character.AI lawsuits (2024+), agentic chatbots (2024 to 2026) | LLM-powered chatbots reach hundreds of millions of users; companion-bot concerns; rise of agentic systems |
Chatbots are broadly classified by their underlying technology and deployment style.
| Category | How it works | Examples |
|---|---|---|
| Rule-based | Decision trees and hand-authored pattern-template pairs, often in AIML or a visual flow builder | Many website FAQ bots; classic A.L.I.C.E.-style hobby bots |
| Retrieval-based | Index a corpus of canned responses and return the closest match using TF-IDF or embedding search | Older customer-service bots; SmarterChild backend |
| Generative LLM-based | A large language model generates responses token by token conditioned on the conversation history | ChatGPT, Claude, Gemini, Copilot, Pi |
| Retrieval-augmented | An LLM is paired with a search or vector store; relevant context is inserted into the prompt before generation | Enterprise knowledge assistants; help-center support bots |
| Agentic | A generative chatbot that plans steps, calls external tools, browses the web, runs code, and takes multi-turn actions | Modern coding assistants; Operator, Computer Use tool-using agents |
| Voice-first assistants | Microphone-driven systems wrapping a chatbot core in speech recognition and text-to-speech | Siri, Alexa, Google Assistant, Cortana, ChatGPT Voice Mode |
| Companion and role-play | Generative chatbots persistently styled as a fictional character, friend, or romantic partner | Replika, Character.AI personas, Pi |
| Vertical task | Chatbot front end specialized to a single domain such as banking or healthcare | Bank of America Erica, Sephora Virtual Artist, Wysa, Youper |
Boundaries blur in practice. A modern customer-service chatbot typically combines an LLM core, a retrieval layer, decision-tree fallbacks, and tool calls into backend systems, with optional voice and avatar interfaces.
The table below summarizes major consumer-facing chatbots based on large language models as of 2026.
| Chatbot | Developer | Initial public release | Underlying model family | Notable characteristics |
|---|---|---|---|---|
| ChatGPT | OpenAI | November 30, 2022 | GPT family (GPT-3.5, GPT-4, GPT-4o, GPT-5) | Reached 100 million users in two months; broadest brand recognition; voice mode and tools |
| Claude | Anthropic | March 14, 2023 | Claude family (Claude 1 through Claude 4 series) | Constitutional AI training; strong on long-context reasoning, coding, and writing |
| Gemini | March 21, 2023 (as Bard); rebranded February 8, 2024 | Gemini model family | Tight integration with Google Search, Workspace, and Android; multimodal from launch | |
| Microsoft Copilot | Microsoft | February 7, 2023 (as Bing Chat) | OpenAI GPT-4 series and Microsoft fine-tunes | Spans Windows, Office, GitHub, Edge; enterprise emphasis |
| Pi | Inflection AI | May 2, 2023 | Inflection-1, Inflection-2.5 | Emotionally intelligent companion; team absorbed by Microsoft in March 2024 |
| Character.AI | Character Technologies | Beta in September 2022 | Character LLM (proprietary) | User-created personas; subject of safety lawsuits beginning 2024 |
| Replika | Luka Inc. | March 2017 | GPT-2/GPT-3 derivatives, later proprietary | Persistent companion personalities; regulatory action in Italy |
| Grok | xAI | November 4, 2023 | Grok family | Integrated with X social platform; unfiltered persona positioning |
| Meta AI | Meta Platforms | September 27, 2023 | Llama family | Embedded in WhatsApp, Instagram, Messenger, Ray-Ban Meta glasses |
| Kimi | Moonshot AI | October 9, 2023 | Kimi family | Strong long-context; largest standalone Chinese consumer chatbot |
| DeepSeek Chat | DeepSeek | 2023 | DeepSeek family | Free open access; competitive performance at low inference cost |
| Doubao | ByteDance | August 2023 | Doubao | Top of Chinese consumer chatbot leaderboards in 2025 to 2026 |
| Qwen Chat | Alibaba | 2023 | Qwen family | Open weights across model sizes; multilingual focus |
| Perplexity | Perplexity AI | December 7, 2022 | Mix of in-house and external models | Search-first conversational interface with cited sources |
The Swedish payments company Klarna launched an OpenAI-powered customer service assistant in February 2024. Within its first month the assistant handled 2.3 million conversations, equivalent to roughly two-thirds of all customer-service chats and the work of about seven hundred full-time agents. Klarna reported average response time fell from 11 minutes to under 2 minutes, repeat inquiries dropped by 25 percent, and the change was projected to add roughly 40 million U.S. dollars in profit during 2024. The company later said it would reintroduce more human agents for complex cases.
Sephora was an early adopter, launching the Sephora Reservation Assistant and Sephora Virtual Artist on Facebook Messenger and Kik in 2017, letting customers book appointments, browse recommendations, and try on lipstick or eye shadow virtually.
Bank of America launched Erica in its mobile app in 2018. Erica uses natural language processing rather than a generative LLM, providing balances, transaction lookups, fraud alerts, and proactive financial insights. By 2025 Erica had been used by nearly fifty million clients and had handled more than three billion interactions. In 2023 Bank of America added a hand-off path that lets customers move from Erica to a live agent and back without losing context.
Wysa, launched in 2016 by Touchkin eServices, uses cognitive behavioral therapy, dialectical behavior therapy, and mindfulness exercises through a conversational interface around a cartoon penguin. It is offered as an employer benefit through partners such as Aetna and the British National Health Service and has been studied in peer-reviewed clinical trials showing reductions in self-reported depression and anxiety symptoms. Youper, also launched in 2016, uses CBT-style check-ins to track mood, journal entries, and coping strategies; a 2021 study reported a 48 percent decrease in self-reported depression symptoms and a 43 percent decrease in anxiety symptoms. Both are positioned as complements to professional care rather than replacements.
The rapid spread of companion and role-play chatbots has produced one of the most consequential policy debates of the chatbot era. Replika positions itself as a persistent companion that learns the user's interests over time. In February 2023 the Italian Data Protection Authority issued an order temporarily banning Replika in Italy over concerns about minors and unsolicited sexual content. Replika subsequently restricted certain features and added age controls.
Character.AI, founded in 2021 by former Google researchers Noam Shazeer and Daniel De Freitas, allows users to chat with millions of community-created personas. In Florida, Megan Garcia sued Character.AI in October 2024 after her fourteen-year-old son, Sewell Setzer III, died by suicide in February 2024 following extensive interaction with a persona based on a Game of Thrones character. The complaint alleged the bot encouraged the teenager and that the platform's safeguards were inadequate. In December 2024 a Texas family filed a lawsuit alleging a Character.AI bot encouraged a teenager to harm his parents over screen-time disputes. Additional 2025 suits alleged contributions to suicides and serious mental harm in other minors. Google reached a settlement with the Setzer family in early 2026.
In response, Character.AI introduced a separate teen-mode model with stricter content filtering, age verification, time-limit prompts, and pop-up disclaimers about the fictional nature of personas. Other companion-bot operators introduced similar measures. The cases became a focal point in policy discussions about minors, parasocial relationships, and product liability for chatbot output.
Classical chatbots match user input against rule patterns and produce templated replies. ELIZA used regular expressions and substitution rules. AIML, standardized in 2001, encodes pattern-template pairs in XML. A second generation paired pattern matching with intent classifiers trained on labeled examples. Frameworks such as Rasa, Microsoft LUIS, Google Dialogflow, and Amazon Lex let businesses ship bots that recognized a fixed catalog of intents (for example, check balance, report fraud) and routed each to a corresponding fulfillment function. These systems power most legacy customer-service deployments still in production as of 2026.
The 2017 paper Attention Is All You Need introduced the transformer architecture, which became the foundation for almost all modern LLM-based chatbots. By 2018 OpenAI's first GPT model showed that a single transformer trained on a large unsupervised corpus could be adapted to many downstream language tasks, including dialogue.
The recipe that turned raw language models into useful chatbots was reinforcement learning from human feedback, or RLHF. Human raters provide preference judgments comparing pairs of model outputs, a reward model is trained to predict their preferences, and the language model is optimized to maximize the reward. OpenAI deployed RLHF on InstructGPT in early 2022 and at scale in ChatGPT later that year. Anthropic developed Constitutional AI as an alternative that uses written principles and AI feedback in place of much of the human labeling. Variations such as Direct Preference Optimization and RLAIF now exist across the industry.
Generation-only chatbots hallucinate facts not in their training data. Retrieval-augmented generation, or RAG, addresses this by inserting relevant documents from a search index or vector store into the prompt before generation. RAG is the dominant deployment pattern for enterprise chatbots that ground answers on private documents.
Around 2024 the field shifted from chatbots that only produce text to agentic chatbots that call external tools, browse the web, run code, and take actions across multiple turns. Anthropic's Model Context Protocol, introduced in late 2024, provided a standard way to describe tools and data sources to a chatbot. Voice chatbots add automatic speech recognition on the front and text-to-speech on the back of an underlying model; speech-to-speech systems such as OpenAI's GPT-4o Voice Mode and Google's Gemini Live operate end to end on audio. By 2024 most major chatbots offered voice modes.
Measuring chatbot quality is notoriously difficult. Research and industry use a mixture of approaches, none of which fully captures user value.
The Turing test remains the most culturally familiar benchmark for chatbots, even though most computer scientists regard it as more marketing exercise than scientific measurement. The Loebner Prize, founded by Hugh Loebner in 1991, was the longest-running competition that operationalized the test. A panel of judges chatted with several programs and several humans by text and ranked them on plausibility. The competition ran annually from 1991 through 2019 at venues including Bletchley Park. Winners included PC Therapist (1991), A.L.I.C.E. (2000, 2001, 2004), Cleverbot (2005, 2006), and Mitsuku (2013, 2016 to 2019). The prize was discontinued in 2020. By that point large neural models were producing more humanlike output than any rule-based winner, but they did not enter.
In 2024 a team at the University of California San Diego reported that GPT-4 passed a rigorous five-minute Turing test with non-expert judges. In 2025 a follow-up reported that GPT-4.5 was identified as the human 73 percent of the time when it adopted a humanlike persona, more often than actual humans in the same study. Whether this counts as passing depends on the protocol, and the implication for machine intelligence remains contested.
Research labs use automated benchmarks such as MT-Bench, AlpacaEval, MMLU, GPQA, and HumanEval, each targeting a capability such as instruction following, knowledge, code generation, or reasoning. Benchmarks have well-known limitations: training-data leakage, narrow distribution coverage, and Goodhart's-law dynamics where models optimize the test rather than the skill.
The most influential evaluation method for modern chatbots in 2024 to 2026 is the LMSYS Chatbot Arena leaderboard, which collects pairwise preference votes from anonymous users chatting with two random anonymized models, ranking models with an Elo system. Arena scores are a de facto industry standard because they are hard to game and reflect user preferences. Labs also use safety-focused tests, including red-teaming, refusal evaluations, jailbreak resistance suites, sycophancy and persuasion benchmarks, and structured system cards.
The distinction between text chatbots and voice assistants has been narrowing since 2023. Historically, voice assistants such as Siri, Alexa, Google Assistant, and Cortana focused on hands-free device control and short-form lookup, while chatbots focused on longer text exchanges. From 2023 onward this blurred. OpenAI added GPT-4o-based voice modes to ChatGPT in May 2024. Google upgraded Google Assistant on Android into Gemini in 2024, replacing the older voice agent with a generative LLM-based system. Amazon previewed Alexa+ in February 2025, layering generative AI on the legacy Alexa stack. Apple introduced Apple Intelligence in 2024, with parts of Siri rebuilt around large generative models. By 2026 most major systems offered both text and voice.
Hallucination. LLM-based chatbots produce text that sounds confident even when the underlying claim is fabricated. The risk is acute for legal, medical, financial, and safety-critical advice. Operators counter through retrieval grounding, citation links, refusal prompts, and post-hoc fact-checking, but no technique fully eliminates hallucination.
Privacy and data retention. Chatbot conversations often include sensitive personal information, and operators commonly log conversations for safety review and model improvement. The Italian regulator banned ChatGPT temporarily in March 2023 over GDPR concerns and required changes to OpenAI's data handling before reinstating it.
Sycophancy and manipulation. Language models trained on human preference data drift toward responses that flatter, agree, or capitulate to users. Sycophancy degrades utility and reinforces misperceptions, and is a central focus of evaluation at Anthropic, OpenAI, and academic groups.
Child safety. The Character.AI lawsuits are the most prominent examples of concerns about minors using companion chatbots. Senators from both U.S. parties have demanded information from companion-bot operators since 2025, and several jurisdictions are considering legislation requiring safeguards for users under eighteen, age verification, and limits on persistent personas.
Copyright. Most LLM-based chatbots are trained on web-scraped corpora that include copyrighted text. The New York Times sued OpenAI and Microsoft in December 2023 alleging unauthorized use of its articles, and other publishers and rights holders have filed similar suits.
Bias and misinformation. Chatbots amplify biases present in their training data and can be steered by adversaries to generate persuasive misinformation at low cost. The 2016 Tay incident foreshadowed the risk; the modern equivalent is the use of generative chatbots in coordinated influence operations.
Chatbots have become part of mainstream culture. The phrase I asked ChatGPT appears in news articles, classroom essays, and political speeches. Schools have revised academic-integrity policies, and software developers describe pair-programming with AI assistants as standard practice. Surveys in the United States and the European Union show that more than half of adults have tried a generative chatbot at least once. Research now spans linguistics, psychology, economics, and policy, with ongoing debate over liability, transparency, and the scope of safety regulation.