Language Learning
Last reviewed
May 13, 2026
Sources
50 citations
Review status
Source-backed
Revision
v2 ยท 5,302 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 13, 2026
Sources
50 citations
Review status
Source-backed
Revision
v2 ยท 5,302 words
Add missing citations, update stale details, or suggest a clearer explanation.
See also: Language Learning ChatGPT Plugins
Language learning has been one of the first consumer markets reshaped by large language models. Beginning in 2023, established apps like Duolingo, Babbel, Pimsleur, Rosetta Stone, and Memrise added generative AI tutors built on GPT-4 and similar systems, while newer startups such as Speak, ELSA, Loora, and Quazel built their products from the ground up around AI voice interaction. By the end of 2024, AI tutors had become the central marketing pitch of nearly every major language app, and the OpenAI Startup Fund had pushed Speak to a $1 billion valuation. The same year, Duolingo laid off around 10 percent of its contractor translators in favor of AI generation, which sparked one of the first high-profile public arguments about generative AI replacing knowledge workers.
Language learning is also one of the first places where the limits of current AI are easy to feel. Speech recognition still struggles with accents from outside the US, UK, and Australia. Generative models hallucinate grammar rules. AI tutors do not get tired, but they also do not actually understand a learner's life, motivations, or progress in the way a human tutor does. The field sits at the intersection of three things people argue about: the future of education, the future of work, and how much we should trust machines to teach us how to talk.
AI in language learning today covers four overlapping product categories. The first is conversational practice, where learners speak or chat with an AI character that plays roles such as a barista, an interviewer, or a friend. The second is pronunciation feedback, where speech recognition models score how close a learner's pronunciation is to a reference. The third is adaptive content generation, where models write practice sentences, lesson plans, and assessment items on demand. The fourth is automated assessment, where machine learning systems score speaking and writing on high-stakes tests like the Duolingo English Test, Pearson Test of English, and Cambridge Linguaskill.
The table below lists the major products and the AI systems they rely on.
| Product | Company | Type of AI | Released |
|---|---|---|---|
| Duolingo Max (Roleplay, Explain My Answer) | Duolingo | GPT-4 | March 2023 |
| Video Call with Lily | Duolingo | GPT-4o | September 2024 |
| Duolingo English Test | Duolingo | In-house plus GPT-3 for item generation | 2016 (pilot 2014) |
| Speak Tutor | Speak | GPT-4 | 2023 |
| ELSA AI Tutor | ELSA | In-house speech recognition plus generative AI | September 2023 |
| Babbel Speak | Babbel | In-house and partner LLMs | September 2025 |
| MemBot | Memrise | GPT-3, later GPT-4 | 2023 |
| Pimsleur AI Conversation Coach | Pimsleur | GPT-4o mini | 2024 |
| Khanmigo | Khan Academy | GPT-4 | March 2023 (pilot) |
| Loora | Loora | In-house plus OpenAI models | 2023 (stealth lift) |
| Univerbal (formerly Quazel) | Univerbal | OpenAI models | 2023 (YC W23) |
| AiBC speaking practice | British Council | In-house plus partner LLMs | 2025 |
| Smart Lesson Generator | Pearson | LLM aligned with Global Scale of English | 2024 |
| Reading Coach | Microsoft | In-house speech recognition | 2022 |
Language technology in education is older than the personal computer. Computer-assisted language learning, usually shortened to CALL, has been a research field since the 1960s. The PLATO mainframe system at the University of Illinois ran Russian and French drills in the 1960s and 1970s. In the 1980s and 1990s, CD-ROM products like the original Rosetta Stone moved CALL into homes. By the late 1990s, integrative CALL combined multimedia with the internet, allowing real-time chat with native speakers through tools like the Mixxer and later Tandem.
The 2010s shifted the field toward mobile apps. Duolingo launched its public site in 2012 and its mobile app in 2013. Babbel had been on the web since 2008 and on mobile since 2010. Memrise opened in 2010 with spaced-repetition flashcards. Busuu, Lingvist, Drops, and HelloTalk all arrived in the same window. By 2019, mobile-assisted language learning, or MALL, had become the dominant form of language study outside formal classrooms. Speech recognition was the main piece of AI in these apps: a learner read a phrase aloud, and the app marked the words it thought it heard in red or green.
The next jump came from transformer models. ChatGPT launched in November 2022 and reached 100 million users by January 2023. Within four months, both Duolingo and Khan Academy launched paid products built on GPT-4, on the same day GPT-4 itself was announced. Speak, which had quietly raised money from the OpenAI Startup Fund in late 2022, became the public face of the OpenAI partnership model. Memrise rebranded its old chatbot ideas around GPT-3 in early 2023. Smaller startups including Loora, Quazel, and Lingotion emerged from stealth or accelerator programs the same year, all built on top of OpenAI APIs. By 2024, almost every consumer language product on the market either had a GPT-powered tutor or was promising one.
Duolingo is the largest consumer language learning app in the world, with more than 100 million monthly active users by 2024. The company was founded in 2011 by Carnegie Mellon professor Luis von Ahn and his graduate student Severin Hacker. Von Ahn had previously invented CAPTCHA and reCAPTCHA; Google acquired reCAPTCHA in 2009. Duolingo went public on the Nasdaq in 2021 and uses a freemium model in which most lessons are free and an ad-free tier called Super Duolingo costs around $7 per month.
The AI story at Duolingo began before ChatGPT. The company has used machine learning since the early 2010s to model spaced repetition (its Half-Life Regression model was published in 2016) and to generate exercises. The Duolingo English Test, which is discussed in its own section below, has been ML-driven since its 2014 pilot.
Duolingo Max was announced on March 14, 2023, the same day OpenAI publicly launched GPT-4. It sits above the Super Duolingo tier and originally cost $29.99 per month or $167.99 per year in the US market. At launch, Max offered two new features:
Duolingo and OpenAI had been collaborating since September 2022. The two teams worked on prompt tuning and safety guardrails so the model would behave like a patient tutor rather than a general chatbot, and they double-checked the accuracy of GPT-4's grammar explanations against Duolingo's curriculum.
At Duocon 2024 on September 24, 2024, Duolingo announced Video Call with Lily, a real-time spoken conversation feature that lets Max subscribers talk to Lily, one of the company's most recognizable characters. Video Call uses GPT-4o under the hood. Lily has a memory system that lets her bring up topics the learner mentioned in previous calls. At launch the feature was available for learners studying English, Spanish, and French.
A 2025 Duolingo white paper on Japanese learners of English reported that learners assigned to use Video Call outperformed a control group on post-test speaking proficiency and on combined listening and speaking proficiency. The same Duocon also introduced Adventures, a story-based feature that drops the learner into animated scenarios with Duolingo characters, and a partnership with Loog for a $249 digital piano for the Music course.
Duolingo has discussed using GPT-class models internally to generate first drafts of sentences, distractors for multiple choice items, and acceptable translation variants. CEO Luis von Ahn has said publicly that AI lets the company build courses dramatically faster than human writers alone, citing in 2025 that AI had completed work in roughly 12 months that previously took 12 years.
Speak is the most prominent example of a language app built around a tight partnership with OpenAI. The company was founded in 2016 by Connor Zwick and Andrew Hsu, both Thiel Fellowship recipients. Zwick had previously built Flashcards+, an education app that Chegg acquired. The two went through Y Combinator's W17 batch and spent the following years developing speech recognition models and an AI tutor system before LLMs were widely available.
Speak launched commercially in Korea in 2019 and grew quietly there for several years. In November 2022, the OpenAI Startup Fund led a $27 million Series B, the first time OpenAI publicly committed venture capital to a consumer education startup. The deal also gave Speak early access to new OpenAI systems. In August 2023, the company raised a $16 million Series B-2 to fund expansion into the US market, with participation from Dropbox co-founders Drew Houston and Arash Ferdowsi.
Speak's product is built around a voice-first AI tutor. Learners speak aloud in scripted lessons and in open-ended conversation, and the tutor responds in real time with corrections, follow-up questions, and explanations. Unlike Duolingo, Speak does not focus on gamified text exercises; the company's pitch is that the only way to learn to speak a language is to actually speak it, out loud, every day.
In December 2024, Speak raised a $78 million Series C led by Accel at a $1 billion valuation, doubling the company's $500 million valuation from a Series B extension six months earlier. Existing investors OpenAI Startup Fund, Khosla Ventures, and Y Combinator participated. At the time of the round, Speak had more than 10 million registered learners in over 40 countries and was reported to be approaching nine-figure annual revenue. Nearly six percent of South Korea's population was learning English on Speak. Total funding raised reached around $162 million.
ELSA Speak, short for English Learning Speech Assistant, is the leading pronunciation-focused language app. It was founded in 2015 by Vu Van, a Vietnamese-born Stanford MBA, and Xavier Anguera, a speech recognition researcher. Vu has said publicly that the idea came from her own experience as an MBA student whose strong Vietnamese accent made it hard to be understood in seminars and at networking events.
ELSA's core differentiator is a proprietary speech recognition model trained on the speech of non-native English speakers, not just native US, UK, and Australian voices. The app gives color-coded phoneme-level feedback on each word the learner says, scoring pronunciation, intonation, fluency, and word stress. ELSA won SXSW's 2016 startup pitch competition, after which the app went viral, gaining 30,000 users in 24 hours.
ELSA's funding history reflects the international nature of the company:
| Round | Year | Amount | Notable investors |
|---|---|---|---|
| Series A | 2019 | $7 million | Gradient Ventures (Google), Monk's Hill Ventures, SOSV |
| Series B | 2021 | $15 million | Vietnam Investments Group, SIG, Gradient Ventures |
| Series C | 2023 | $23 million | UOB Venture Management |
In September 2023, ELSA launched ELSA AI Tutor, a generative AI feature that lets learners hold open-ended conversations with an AI partner. The system uses ELSA's proprietary speech model plus generative AI to manage dialogue and feedback. The tutor remembers a learner's past mistakes and tailors role-play scenarios such as job interview prep, negotiating a raise, and meeting facilitation. By 2024 ELSA reported over 50 million registered users worldwide, with the strongest user bases in Vietnam, Brazil, Japan, India, and Indonesia.
The broader apps market has moved from speech recognition to generative tutors over the 2023 to 2025 window. The table below covers the largest paid apps after Duolingo and Speak.
| App | Founded | AI features | Notes |
|---|---|---|---|
| Babbel | 2007 (Berlin) | Babbel Speak (launched September 2025), pronunciation scoring, AI scenario practice | Subscription-based, lesson content written by linguists. Babbel Speak guides learners through 28 real-life scenarios in English, Spanish, French, Italian, and German, and uses calming animations to reduce anxiety. |
| Pimsleur | 1963 (Simon & Schuster brand since 1997) | AI Voice Coach, AI Conversation Coach, both built on GPT-4o mini | The Conversation Coach launched for Latin American Spanish in 2024 and scaled to 175,000 learners with around 4,000 weekly active users by mid-2025, according to development partner AE Studio. |
| Rosetta Stone | 1992 | TruAccent speech recognition (machine-learning-based pronunciation scoring), revamped human tutoring with AI-assisted lesson plans | Acquired by IXL Learning in March 2021. TruAccent has been improved iteratively rather than relaunched with one big feature. |
| Memrise | 2010 (London) | MemBot (2023), AI Buddies | MemBot was first built on GPT-3 and later upgraded. A Memrise study reported learners had 45 percent lower stress practicing with MemBot than with a human tutor, and that motivation more than doubled. |
| Busuu | 2008 (London) | AI-generated practice prompts, conversation feedback | Owned by Chegg since 2022. |
| Lingvist | 2013 (Tallinn) | Machine-learning-driven adaptive vocabulary order | Acquired by Bitwig in 2020 and later sold; its smart review model was an early production use of machine learning in a language app. |
| Drops | 2015 (Budapest) | Image-based vocabulary, in-app generative scenarios | Acquired by Kahoot in 2020. |
| HelloTalk and Tandem | 2012 and 2015 | AI translation, AI grammar correction layered onto human chat | These are social apps where users practice with each other. AI features sit alongside, not in place of, the human exchange. |
| Loora | 2022 (Tel Aviv) | Voice-first generative English tutor on iOS | Raised $9.25 million seed in June 2023 and $12 million Series A in February 2024. Targets adult professionals. |
| Univerbal (formerly Quazel) | 2022 (Zurich) | Conversational AI tutor across 21 languages with scene builder and grammar analysis | Founded by Philipp Hadjimina, David Niederberger, and Samuel Bissegger out of ETH Zurich. Y Combinator W23 batch. Raised CHF 1.5 million in 2024. |
| Praktika | 2022 | AI avatar tutor for English speaking practice | Reported strong growth on iOS through 2024. |
| TalkPal | 2023 | GPT-4 conversational tutor for around 60 languages | Backed by EU venture investors. |
| LangAI, SmallTalk2Me, Tutor Lily, Sonik, Lingostar | 2023 to 2025 | Various voice and chat tutors | Most are thin wrappers over OpenAI or Anthropic APIs and have struggled to differentiate. |
Babbel's flagship AI feature, Babbel Speak, opened in beta in September 2025. It is positioned as a beginner-friendly conversation trainer that walks the user through short, low-stakes scripts before turning them loose on freer dialogue. The pronunciation analysis behind Babbel Speak was built in-house with Babbel's linguistic team, comparing thousands of phoneme samples per utterance. Babbel had previously added smaller speech-based features in late 2023 and through 2024 as steps toward this product.
The Pimsleur audio method, originally developed by Paul Pimsleur in the 1960s, has long been one of the most respected programs for spoken language. In 2024, Pimsleur began rolling out an AI Conversation Coach for Spanish, built on top of GPT-4o mini with a custom real-time voice processing pipeline. AE Studio, which built the architecture with Pimsleur, has publicly described the pipeline as voice to text, text to model, model output to text-to-speech audio, with cost optimization at every layer. Engagement reportedly reached five times Pimsleur's initial projections.
High-stakes English testing has been one of the most consequential applications of AI in the field. Three tests in particular have built AI into their scoring pipelines: the Duolingo English Test, the Pearson Test of English Academic, and Cambridge Linguaskill.
The Duolingo English Test (DET) is an online, on-demand English proficiency exam. Duolingo started a pilot version called Test Center in 2014, and the test launched commercially in 2016. The DET is the only high-stakes English test that is end-to-end machine learning driven: items are generated by ML, the test is computer-adaptive, the speaking and writing responses are scored by ML, and plagiarism and proctoring are also automated. It uses GPT-3 and later models to generate fresh test items so that questions are not reused across test takers.
The DET takes about one hour, can be taken from a laptop at home 24 hours a day, and costs around $65 compared to roughly $250 for TOEFL or IELTS. Scores range from 10 to 160. Adoption exploded during the COVID-19 pandemic when in-person testing centers closed. By the end of 2024, more than 3,000 institutions accepted DET scores, including all eight Ivy League universities and 98 of the top 100 US News-ranked US schools. In Canada, every English-language U15 university accepts DET. Europe had 489 accepting institutions and programs as of December 2024.
The Pearson Test of English Academic (PTE Academic) was the first major English proficiency test to use automated scoring at scale, beginning in 2009. PTE's writing module uses the Intelligent Essay Assessor, a vector-space model originally developed by Knowledge Analysis Technologies and later acquired by Pearson, while the speaking module is scored by acoustic and language models trained on hundreds of thousands of test taker responses. Pearson reported feeding more than 678,000 examiner-aligned responses into its scoring system in 2020 alone. Test results are usually delivered within 2 days, which is a competitive selling point against the multi-week turnaround of older tests.
Linguaskill is Cambridge University Press and Assessment's computer-adaptive English test. It launched globally in 2018 and uses AI scoring backed by human examiners, a combination Cambridge calls hybrid marking. Reading and listening are auto-marked, writing is also automatically scored, and speaking uses a mix of automated scoring with human review. The test had passed 1 million test takers by 2023 and typically delivers 99 percent of results within 12 hours of completion.
Khanmigo is Khan Academy's AI tutor, announced on March 14, 2023, the same day as GPT-4 and Duolingo Max. The system is built on GPT-4 with custom system prompts that constrain Khanmigo to act as a Socratic tutor: rather than giving an answer, it asks the learner leading questions. Khan Academy used early access to GPT-4 from August 2022 to develop and test the product before its public reveal.
Khanmigo's language tutoring is structured around Khan Academy's existing courses, especially its AP and SAT prep content, but also includes Spanish practice and tutoring built around Khan Academy's grammar exercises. By the 2024 to 2025 school year, Khanmigo had grown from roughly 68,000 pilot users in 2023 to 2024 to more than 700,000 users, mostly inside US schools that piloted the tool. Khan Academy is a nonprofit, which has shaped the way it has marketed Khanmigo: pitched as a way to give every student a personal tutor at low cost rather than as a paid app.
Microsoft has integrated AI language tools into its Education suite. Reading Coach is a free tool, originally launched as a feature of Reading Progress in Microsoft Teams in 2022 and broadened in 2024, that gives learners real-time feedback on pronunciation, syllabification, and reading fluency as they read aloud. Reading Coach auto-detects more than 67 languages and locales and is widely used in US K-12 classrooms for early literacy and English language learner support. Microsoft Teams also bundles Speaker Coach for adult presenters, which gives feedback on pace, filler words, and inclusive language.
Pearson launched its Smart Lesson Generator for English teachers in 2024. The tool uses a generative model behind a teacher-facing interface that produces curriculum-aligned activities, including lesson hooks, vocabulary lists, grammar drills, conversation starters, and exit tickets. Output is filtered and refined against the Global Scale of English, Pearson's proprietary 10 to 90 scale that maps to the Common European Framework of Reference. By 2025 more than 4,000 English teachers worldwide were reported to have access through Pearson's platforms.
The British Council rolled out a similar tool called AiBC in 2025. AiBC simulates conversations with learners and produces a downloadable report on grammar, vocabulary, fluency, and clarity. The Council has been explicit that it positions AiBC as a supplement, not a replacement, for human teachers.
Many learners use general purpose chatbots for language practice rather than dedicated apps. ChatGPT supports voice conversation in dozens of languages, and the GPT Store hosts thousands of custom GPTs for language learning, from beginner Spanish conversation partners to Japanese keigo coaches and IELTS prep tutors. Claude is widely used for written language practice, especially for advanced learners working on writing style, formal register, or comparative grammar.
Research on ChatGPT as a language tutor has expanded rapidly since 2023. A 2024 meta-analysis in Humanities and Social Sciences Communications found that ChatGPT had a large positive effect on learning performance and a moderate positive effect on learning perception and higher-order thinking across studies, although the analysis was later retracted on methodological grounds and resubmitted. Smaller studies in journals such as Computer Assisted Language Learning and ReCALL have reported gains in writing fluency, vocabulary acquisition, and learner motivation when ChatGPT is used as a supplemental tutor.
The limits are also well documented. Studies in Education and Information Technologies and elsewhere have reported that students worry about accuracy, overcorrection, stylistic homogenization toward American English, and a reduction in real human interaction. Teachers report concerns about overreliance and reduced classroom discussion.
For independent learners, a typical workflow is to use a structured app like Duolingo or Pimsleur for daily practice, an AI conversation partner like Speak or ChatGPT Voice for free speaking practice, and a written tool like Claude or ChatGPT for grammar questions, sentence rewrites, and reading comprehension.
In January 2024, news broke that Duolingo had cut around 10 percent of its contractor workforce at the end of 2023, including translators whose work was being replaced by AI-generated drafts that smaller teams of human reviewers cleaned up. Coverage in the Washington Post, CNN, Bloomberg, TechCrunch, and Fortune turned the story into one of the first widely covered examples of generative AI directly displacing knowledge workers in a consumer tech company. Some former contractors said publicly that the work they had been doing took years of formal training and that AI-only drafts produced subtle errors that non-native reviewers were not always positioned to catch.
Duolingo confirmed that GPT-class models were being used to generate sentences for courses, to produce lists of acceptable translations, and to review user error reports faster. The company said human experts continued to validate output against Common European Framework of Reference for Languages standards. Around six months later, in October 2024, Duolingo also let go of contract writers.
The story escalated in April 2025 when CEO Luis von Ahn sent an all-hands email, also posted publicly on LinkedIn, declaring Duolingo an "AI-first" company that would gradually stop using contractors to do work that AI can handle. The reaction online was sharply negative, with many users threatening to cancel subscriptions. A week later, von Ahn walked the framing back, saying he did not see AI replacing employees and that Duolingo was "continuing to hire at the same speed as before." The episode became a case study in how to message AI transitions inside consumer brands.
A second recurring controversy concerns whether AI characters can authentically represent the cultures of the languages they teach. Reviewers and teachers have noted that GPT-style models default to a flat, US-influenced version of formality and small talk that does not reflect, for example, the social conventions of Japanese keigo, Korean nunchi, or French argot. ELSA, Speak, and Duolingo all rely on prompt engineering and curated character backstories to push against this, but the underlying model still has training data biases that surface in small ways.
Several concerns recur across academic literature, journalism, and reviewer commentary.
Accent bias in speech recognition. Reference speech models for pronunciation scoring are usually trained on corpora dominated by US, UK, and Australian native speakers. When a learner with a Nigerian, Indian, Filipino, or Korean accent says a word intelligibly, the system may still mark it wrong because the phoneme boundaries do not match the model's expected distribution. ELSA built its product specifically to address this; other apps have lagged. Studies of consumer speech recognition systems including those from Amazon, Apple, Google, IBM, and Microsoft have documented systematic higher error rates for Black speakers in the US, and similar disparities have been replicated internationally.
Hallucinated grammar. Generative tutors sometimes invent rules, mark correct sentences as wrong, or give contradictory explanations across consecutive responses. This is particularly common when the learner asks questions about morphology in low-resource languages such as Welsh, Tagalog, or Quechua, where the model's training data is thin. Duolingo's Explain My Answer feature has been adjusted multiple times to reduce hallucinated rules, and the company has said human language experts review a sample of model outputs.
Loss of human contact. Critics including the British Council, several CALL researchers, and many language teachers argue that AI tutors are best understood as practice partners, not teachers. The motivation, emotional regulation, and cultural insight that a good teacher provides is, at least so far, not replicable. Several studies have reported that learners self-report higher comfort with AI than with human tutors but lower long-term motivation when human contact is removed entirely.
Privacy of voice data. Apps that collect millions of hours of voice samples from non-native speakers have potential biometric exposure. ELSA, Speak, and Duolingo all publish privacy policies covering voice data, but the underlying business pressure to collect more data to improve models tends to push against minimization.
Replacement of working linguists. The Duolingo case has fueled wider discussion in CALL and translation studies journals about the ethics of training models on the work of professional translators, then using those models to replace them. The 2024 to 2025 controversy did not change the trajectory; if anything, more language companies followed Duolingo's framing through 2025.
Cost of running real-time voice models. Real-time voice tutors are expensive to operate. Cost-per-conversation has been a recurring constraint, which is why companies like Pimsleur and Speak invest heavily in custom voice pipelines using smaller models like GPT-4o mini rather than running every call through a large frontier model.