No Language Left Behind (NLLB)
Last reviewed
Jun 3, 2026
Sources
7 citations
Review status
Source-backed
Revision
v1 · 1,548 words
No Language Left Behind (NLLB) is a machine translation research project and model family from Meta AI, announced in July 2022. Its flagship system, NLLB-200, performs direct translation between 200 languages, with a deliberate focus on low-resource languages that earlier commercial and research systems handled poorly or not at all. Meta released the models, the training and mining code, and a new evaluation benchmark called FLORES-200 under an open license for research use. The work was later expanded and peer-reviewed, appearing in Nature in June 2024 under the title "Scaling neural machine translation to 200 languages." [1][2][6]
Most machine translation research has concentrated on a few dozen languages that have large amounts of digitized parallel text. The hundreds of languages spoken by smaller or less digitally represented communities are typically left out, which limits access to information for billions of people. The stated goal of NLLB was to narrow the quality gap between high-resource and low-resource languages while keeping a single model that translates in any direction between the supported languages, rather than routing everything through English. [1][2]
The arXiv technical report frames the question directly in its abstract: "What does it take to break the 200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind?" The team began with interviews of native speakers of low-resource languages to understand needs before building datasets and models. [2]
NLLB-200 is built on the Transformer encoder-decoder design but uses conditional computation. The largest model is a Sparsely Gated Mixture of Experts (MoE), in which the network activates only a subset of its parameters for any given input. In NLLB-200, every fourth Transformer block has its feed-forward layer replaced by an MoE layer, which lets the model add specialized capacity for individual languages and language families without a proportional increase in compute per token. [2][3]
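The routing idea can be illustrated with a toy sketch. It is not NLLB's implementation: it assumes top-2 gating, four hypothetical scalar "experts" standing in for feed-forward networks, and a 24-block stack chosen only for illustration. The names `top_k_gate`, `moe_layer`, and `moe_block_indices` are invented here. The point is that only the selected experts run for a given input, and that placing an MoE layer in every fourth block touches a minority of the blocks.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the softmax probabilities of the chosen experts,
    renormalized so that they sum to 1.
    """
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_layer(x, gate_logits, experts, k=2):
    """Run only the k selected experts on x and mix their outputs."""
    used = []
    out = 0.0
    for idx, weight in top_k_gate(gate_logits, k):
        used.append(idx)
        out += weight * experts[idx](x)
    return out, used

def moe_block_indices(num_blocks, every=4):
    """1-based indices of blocks whose feed-forward layer is an MoE layer."""
    return [b for b in range(1, num_blocks + 1) if b % every == 0]

# Hypothetical experts and gate scores (assumptions for illustration only)
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 1.5]

output, used = moe_layer(3.0, gate_logits, experts)
moe_blocks = moe_block_indices(24)  # 24 blocks is an assumed stack depth
```

Here two of the four experts are evaluated for the input, and `moe_blocks` lists the positions that would carry an MoE layer under the every-fourth-block rule.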
The flagship MoE model has roughly 54.5 billion parameters. (Meta's launch blog rounds this to "54B" and the Nature paper rounds it to "55B"; the figure cited consistently in technical write-ups is 54.5B.) It was trained on a corpus of more than 18 billion sentence pairs. Alongside the full MoE model, Meta released several smaller dense models and distilled variants so that researchers without large GPU clusters could use the system. [1][3][6]
| Variant | Parameters | Type | Notes |
|---|---|---|---|
| NLLB-200 (MoE) | ~54.5B | Sparsely Gated Mixture of Experts | Flagship; highest quality |
| NLLB-200 (dense) | 3.3B | Dense Transformer | Largest dense model |
| NLLB-200 (dense) | 1.3B | Dense Transformer | N/A |
| NLLB-200-distilled-1.3B | 1.3B | Distilled dense | Distilled from the full model |
| NLLB-200-distilled-600M | 600M | Distilled dense | Smallest; most widely downloaded variant |
The distilled 600M and 1.3B checkpoints became the most commonly used in practice because they run on modest hardware while retaining much of the quality of the larger models. The model cards describe NLLB-200 as a research model that is "not released for production deployment," with the primary intended users being the machine translation research community. [4][6]
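As a rough illustration of how the distilled 600M checkpoint is typically loaded for research use, the sketch below goes through the Hugging Face `transformers` library. The Hub identifier `facebook/nllb-200-distilled-600M` and the language codes `eng_Latn` and `fra_Latn` are assumptions not stated in this article, and the helper name `translate` is invented here. The function downloads the weights on first call, so it is defined but not invoked.

```python
# Assumed Hugging Face Hub name for the distilled 600M checkpoint
MODEL_ID = "facebook/nllb-200-distilled-600M"

def translate(text, src_lang="eng_Latn", tgt_lang="fra_Latn", max_length=128):
    """Translate one sentence; language codes are assumed FLORES-200 style codes."""
    # Imported lazily so that defining this function does not require the dependency
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(text, return_tensors="pt")
    # The target language is selected by forcing its code as the first generated token
    target_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    generated = model.generate(
        **inputs, forced_bos_token_id=target_id, max_length=max_length
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

A call such as `translate("No language should be left behind.")` would return a French string. Because the model cards describe the release as research-only, a sketch like this is suited to experimentation rather than production use.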
A large share of the engineering work in NLLB went into building parallel data for languages that have very little of it. The team created NLLB-Seed, a set of about 6,000 sentences drawn from Wikipedia and professionally translated into 39 low-resource languages, and NLLB-MD, about 3,000 sentences from non-Wikipedia sources professionally translated into six low-resource languages, to test how well models generalize beyond Wikipedia text. [3]
To scale beyond these seed sets, the project relied on automatic bitext mining: a multilingual sentence-embedding system (LASER3) identifies sentence pairs in large web corpora that are translations of each other. This pipeline produced over one billion mined sentence pairs covering 148 languages, which were combined with existing parallel corpora and back-translated data to form the training set. The open-source mining and data-cleaning tooling was released under the name stopes. [1][3]
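The core of embedding-based mining can be sketched on toy data. This is a deliberate simplification, not the stopes pipeline: the vectors are hand-written rather than produced by LASER3, the score is plain cosine similarity with a mutual-best-match rule, and the threshold of 0.9 and the name `mine_pairs` are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mine_pairs(src_vecs, tgt_vecs, threshold=0.9):
    """Return (src_index, tgt_index, score) for mutual best matches above threshold."""
    # Best target candidate for each source sentence
    best_for_src = {
        i: max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        for i, s in enumerate(src_vecs)
    }
    # Best source candidate for each target sentence
    best_for_tgt = {
        j: max(range(len(src_vecs)), key=lambda i: cosine(src_vecs[i], t))
        for j, t in enumerate(tgt_vecs)
    }
    pairs = []
    for i, j in best_for_src.items():
        score = cosine(src_vecs[i], tgt_vecs[j])
        # Keep a pair only if each side picks the other and the score is high
        if best_for_tgt[j] == i and score >= threshold:
            pairs.append((i, j, score))
    return pairs

# Hypothetical 3-dimensional embeddings; real sentence encoders use far more dimensions
src = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
tgt = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.1], [-1.0, 0.2, 0.0]]

mined = mine_pairs(src, tgt)
```

In this toy run the first two source sentences each find a mutual match, while the third has no sufficiently close counterpart and is dropped, which mirrors how mining discards sentences without a plausible translation.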
To measure quality across so many languages, Meta built and released FLORES-200, an evaluation benchmark that extends the earlier FLORES-101 set to 200 languages. FLORES-200 consists of about 3,000 sentences sampled from the English Wikipedia and professionally translated into each supported language, which allows evaluation across roughly 40,000 translation directions (every supported language paired with every other). Because the same source sentences are translated into all languages, scores are comparable across language pairs. [1][3]
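The direction counts follow from simple combinatorics: with n languages there are n × (n − 1) ordered source-target pairs. A minimal check, using a helper name invented here:

```python
def translation_directions(num_languages):
    """Number of ordered (source, target) pairs with source != target."""
    return num_languages * (num_languages - 1)

flores_200_directions = translation_directions(200)
flores_101_directions = translation_directions(101)
```

For exactly 200 languages this gives 39,800 directions, in line with the "roughly 40,000" figure; the exact count depends on how many language varieties are included in the benchmark.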
NLLB-200 was scored with automatic metrics including BLEU, SentencePiece-based BLEU (spBLEU), and chrF++. These automatic metrics were complemented by human evaluation using a protocol called XSTS (Cross-lingual Semantic Text Similarity), and by a toxicity benchmark, ETOX, that checks for added toxic content across all 200 languages. The Nature paper reports an overall mean calibrated XSTS score of 4.26, with the majority of evaluated directions scoring above the 4.0 "high quality" threshold. [2][6]
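To give a sense of what a character-level metric measures, the sketch below computes a simplified character n-gram F-score for one sentence pair. It is not chrF++ (which also uses word n-grams) and not the scoring code used in the NLLB evaluations; the defaults of n-gram orders 1 to 6 and beta = 2 and the name `simple_chrf` are assumptions for illustration. Published scores should come from an established tool such as sacreBLEU.

```python
from collections import Counter

def char_ngrams(text, n):
    """Counter of character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified character n-gram F-score in [0, 1] for one sentence pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta > 1 weights recall more heavily than precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

An identical hypothesis and reference score 1.0, strings with no shared characters score 0.0, and partial overlaps fall in between.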
The headline result is an average improvement of about 44 percent over the previous state of the art. Both Meta's launch materials and the Nature abstract state this figure; the launch blog specifies that it is measured across all 10,000 directions of the FLORES-101 benchmark, and the Nature paper states that it is "as measured by BLEU." In absolute terms, NLLB-200 outperformed the nearest prior system by almost 7.3 spBLEU points on average, which is the gain that the 44 percent relative figure expresses. [1][2][6]
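The relationship between the relative and absolute figures can be checked with a line of arithmetic. The values computed below are implied by the two rounded numbers in the text, not scores reported by the sources, and the helper name `implied_baseline` is invented here.

```python
def implied_baseline(absolute_gain, relative_gain):
    """Baseline score implied by an absolute gain and its relative size."""
    return absolute_gain / relative_gain

absolute_gain = 7.3   # spBLEU points, "almost 7.3" in the text
relative_gain = 0.44  # "about 44 percent" in the text

baseline = implied_baseline(absolute_gain, relative_gain)
new_score = baseline + absolute_gain
```

Under these rounded inputs the prior system would average roughly 16.6 spBLEU and NLLB-200 roughly 23.9, which shows why a gain of about 7 points reads as a large relative improvement when baseline scores are low.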
| Metric | Result | Source |
|---|---|---|
| Average quality gain vs. prior SOTA | +44% | Meta blog; Nature (2024) |
| Average gain in absolute spBLEU | +7.3 spBLEU | Nature (2024); launch coverage |
| Gain on some African and Indian languages | > 70% | Meta blog (2022) |
| Translation directions evaluated | ~40,000 | FLORES-200 |
| Mean calibrated XSTS (human eval) | 4.26 | Nature (2024) |
Gains were largest for low-resource languages. Meta reported improvements of more than 70 percent over recent systems for several African and Indian languages, and expanded high-quality African language support to 55 languages, up from fewer than 25 in prior tools. The model covers about three times as many low-resource languages as high-resource ones. [1][6]
Meta open-sourced the NLLB-200 models, the FLORES-200 benchmark, the model training code, and the code for recreating the training dataset, with the goal of letting other researchers reproduce and extend the work. The models are distributed under a Creative Commons Attribution-NonCommercial license (CC-BY-NC 4.0), meaning they are available for research and other non-commercial uses with attribution. Meta also announced grants of up to 200,000 US dollars for nonprofit and research uses of NLLB-200 aligned with the UN Sustainable Development Goals. [1][4]
Meta worked with the Wikimedia Foundation to apply the technology behind NLLB-200 to Wikipedia's Content Translation tool, which helps volunteer editors translate articles between language editions. With NLLB-200 as a back end, editors gained machine-assisted translation for more than 20 low-resource languages, including 10 that were not previously supported by any machine translation tool. One example cited by Meta is Lingala, spoken by tens of millions of people across Central Africa but represented by only a few thousand Wikipedia articles at the time. [1][5]
An expanded, peer-reviewed account of the project, "Scaling neural machine translation to 200 languages," was published in Nature (volume 630, issue 8018, pages 841 to 846) on 5 June 2024, with the DOI 10.1038/s41586-024-07335-x. The lead authors include Marta R. Costa-jussà, James Cross, and Onur Çelebi, with the broader "NLLB Team" credited as authors. The paper describes the conditional-compute MoE model, the FLORES-200 benchmark, the XSTS human evaluation, and the toxicity work, and confirms the 44 percent average BLEU improvement over the prior state of the art. [6]
The original technical report, "No Language Left Behind: Scaling Human-Centered Machine Translation," was posted to arXiv on 11 July 2022 (arXiv:2207.04672) and runs to about 190 pages. [2]
NLLB serves as a foundation for later Meta translation work, most directly SeamlessM4T, the multilingual and multimodal model released in August 2023. SeamlessM4T handles speech-to-text, speech-to-speech, text-to-text, and text-to-speech translation plus automatic speech recognition across about 100 languages. Its text encoder is based on the NLLB architecture, and during training its decoder is guided by token-level knowledge distillation from the NLLB text-to-text model, so NLLB's text translation capability is carried forward into the multimodal system. [7]
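Token-level knowledge distillation can be sketched generically as matching the student's per-token output distribution to the teacher's. The code below is a minimal illustration of that idea, not SeamlessM4T's training code: the use of KL divergence, the toy three-word vocabulary, and the name `token_level_kd_loss` are assumptions.

```python
import math

def token_level_kd_loss(teacher_probs, student_probs, eps=1e-12):
    """Mean over token positions of KL(teacher || student).

    Each argument is a list of probability distributions, one per
    target-token position, over the same vocabulary.
    """
    total = 0.0
    for t_dist, s_dist in zip(teacher_probs, student_probs):
        total += sum(
            t * math.log((t + eps) / (s + eps))
            for t, s in zip(t_dist, s_dist)
            if t > 0
        )
    return total / len(teacher_probs)

# Hypothetical distributions over a 3-word vocabulary at 2 token positions
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
student = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]

loss = token_level_kd_loss(teacher, student)
matched = token_level_kd_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's distributions exactly and grows as they diverge, so minimizing it pushes the student toward the teacher's token-by-token behavior.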