No Language Left Behind (NLLB)

Updated Jul 16, 2026

No Language Left Behind (NLLB) is a machine translation research project and model family from Meta AI, announced in July 2022. Its flagship system, NLLB-200, performs direct translation between 200 languages, with a deliberate focus on low-resource languages that earlier commercial and research systems handled poorly or not at all. Meta released the models, the training and mining code, and a new evaluation benchmark called FLORES-200 under an open license for research use. The work was later expanded and peer-reviewed, appearing in Nature in June 2024 under the title "Scaling neural machine translation to 200 languages." ^[1]^[2]^[6]

Goal and context

Most machine translation research has concentrated on a few dozen languages that have large amounts of digitized parallel text. The hundreds of languages spoken by smaller or less digitally represented communities are typically left out, which limits access to information for billions of people. The stated goal of NLLB was to narrow the quality gap between high-resource and low-resource languages while keeping a single model that translates in any direction between the supported languages, rather than routing everything through English. ^[1]^[2]

The arXiv technical report frames the question directly in its abstract: "What does it take to break the 200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind?" The team began with interviews of native speakers of low-resource languages to understand needs before building datasets and models. ^[2]

The NLLB-200 model

NLLB-200 is built on the Transformer encoder-decoder design but uses conditional computation. The largest model is a Sparsely Gated mixture of experts (MoE), in which the network activates only a subset of its parameters for any given input. In NLLB-200, every fourth Transformer block has its feed-forward layer replaced by a Sparsely Gated Mixture of Experts layer, which lets the model add specialized capacity for individual languages and language families without a proportional increase in compute per token. ^[2]^[3]
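The routing idea can be illustrated with a minimal sketch of sparse top-k gating for a single token. This is not the NLLB-200 implementation: the choice of `k=2`, the function names, and the toy experts are assumptions made for illustration, and a real router uses learned weights and additional load-balancing terms that are not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are the softmax probabilities of the chosen experts,
    renormalized so they sum to 1. k=2 is an assumption of this sketch.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


def moe_feed_forward(token, router_logits, experts, k=2):
    """Mix the outputs of only the selected experts for one token."""
    gate = top_k_gate(router_logits, k)
    output = [0.0] * len(token)
    for index, weight in gate.items():
        expert_out = experts[index](token)  # unselected experts never run
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, sorted(gate)


# Hypothetical toy experts: each one just scales the token vector.
toy_experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
mixed, used = moe_feed_forward([1.0, -1.0], [0.1, 2.0, -1.0, 1.5], toy_experts)
print(used)  # indices of the experts that were actually evaluated
```

Only two of the four toy experts are evaluated for this token, which is the sense in which total parameter count can grow without a proportional increase in compute per token.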

The flagship MoE model has roughly 54.5 billion parameters. (Meta's launch blog rounds this to "54B" and the Nature paper rounds it to "55B"; the figure cited consistently in technical write-ups is 54.5B.) It was trained on a corpus of more than 18 billion sentence pairs. Alongside the full MoE model, Meta released several smaller dense models and distilled variants so that researchers without large GPU clusters could use the system. ^[1]^[3]^[6]

Variant	Parameters	Type	Notes
NLLB-200 (MoE)	~54.5B	Sparsely Gated Mixture of Experts	Flagship; highest quality
NLLB-200 (dense)	3.3B	Dense Transformer	Largest dense model
NLLB-200 (dense)	1.3B	Dense Transformer	N/A
NLLB-200-distilled	1.3B	Distilled dense	Distilled from the full model
NLLB-200-distilled-600M	600M	Distilled dense	Smallest; most widely downloaded variant

The distilled 600M and 1.3B checkpoints became the most commonly used in practice because they run on modest hardware while retaining much of the quality of the larger models. The model cards describe NLLB-200 as a research model that is "not released for production deployment," with the primary intended users being the machine translation research community. ^[4]^[6]
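For readers who want to try a distilled checkpoint, the following is a minimal sketch that assumes the Hugging Face `transformers` library (with PyTorch) is installed and that the checkpoint is published on the Hugging Face hub as `facebook/nllb-200-distilled-600M`. The `translate` helper and its default language codes are illustrative choices, not part of the NLLB release. Calling it downloads the model weights on first use.

```python
MODEL_ID = "facebook/nllb-200-distilled-600M"  # assumed hub name of the 600M checkpoint


def translate(text, src_lang="eng_Latn", tgt_lang="fra_Latn", max_new_tokens=64):
    """Translate one string with an NLLB-200 checkpoint (illustrative helper)."""
    # Imported lazily so this module still loads where transformers is absent.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # Force the decoder to begin with the target-language tag.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


# Example call (requires network access and several gigabytes of disk and RAM):
# print(translate("No language should be left behind."))
```

Language tags such as `eng_Latn` and `fra_Latn` combine a language code with a script code, so the same helper can translate between any two supported languages without pivoting through English.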

Training data and mining

A large share of the engineering work in NLLB went into building parallel data for languages that have very little of it. The team created two professionally translated datasets. NLLB-Seed is a set of about 6,000 sentences drawn from Wikipedia and translated into 39 low-resource languages. NLLB-MD contains about 3,000 sentences from non-Wikipedia sources, translated into six low-resource languages, and is used to test how well models generalize beyond Wikipedia text. ^[3]

To scale beyond these seed sets, the project used automatic bitext mining: a multilingual sentence-embedding system (LASER3) was used to find sentence pairs that are translations of each other across large web corpora. This pipeline produced a training dataset of over one billion mined sentence pairs covering 148 languages, which was combined with existing parallel corpora and back-translated data. The open-source mining and data-cleaning tooling was released under the name stopes. ^[1]^[3]
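The core of embedding-based bitext mining can be sketched as a nearest-neighbor search over sentence vectors. The example below is a simplified stand-in and not the stopes pipeline: the embeddings are made-up three-dimensional vectors rather than LASER3 outputs, the `0.9` threshold is arbitrary, and large-scale mining systems typically rely on margin-based scores and approximate nearest-neighbor indexes, none of which are modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(src_embs, tgt_embs, threshold=0.9):
    """Pair each source sentence with its nearest target sentence.

    A pair is kept only if its similarity clears the threshold, so
    sentences without a plausible translation are dropped.
    """
    pairs = []
    for src_id, src_vec in src_embs.items():
        best_id, best_score = None, -1.0
        for tgt_id, tgt_vec in tgt_embs.items():
            score = cosine(src_vec, tgt_vec)
            if score > best_score:
                best_id, best_score = tgt_id, score
        if best_id is not None and best_score >= threshold:
            pairs.append((src_id, best_id, round(best_score, 3)))
    return pairs


# Made-up toy "embeddings"; a real multilingual encoder emits far larger vectors.
source = {"en-0": [1.0, 0.0, 0.1], "en-1": [0.0, 1.0, 0.0]}
target = {"ln-0": [0.9, 0.1, 0.1], "ln-1": [0.1, 0.1, 1.0]}
print(mine_pairs(source, target))
```

In this toy run only the first source sentence finds a close enough neighbor, while the second is discarded. The same filtering is what keeps mined data from being flooded with unrelated sentence pairs.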

FLORES-200 benchmark

To measure quality across so many languages, Meta built and released FLORES-200, an evaluation benchmark that extends the earlier FLORES-101 set to 200 languages. FLORES-200 consists of about 3,000 sentences sampled from the English Wikipedia and professionally translated into each supported language, which allows evaluation across roughly 40,000 translation directions (every supported language paired with every other). Because the same source sentences are translated into all languages, scores are comparable across language pairs. ^[1]^[3]
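The direction count follows from pairing every language with every other language in both directions. A small sketch of that arithmetic, assuming exactly 200 and 101 languages respectively (the benchmark's precise count of language varieties may differ slightly, which is why the article's figures are approximate):

```python
def translation_directions(num_languages):
    """Count ordered (source, target) pairs with source != target."""
    return num_languages * (num_languages - 1)


print(translation_directions(200))  # 39800
print(translation_directions(101))  # 10100
```

The second value matches the roughly 10,000 directions quoted for the earlier FLORES-101 benchmark.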

NLLB-200 was scored with automatic metrics including BLEU, SentencePiece-based BLEU (spBLEU), and chrF++. These were complemented by human evaluation using a protocol called XSTS (Cross-lingual Semantic Text Similarity), and by a toxicity benchmark, ETOX, that checks for added toxic content across all 200 languages. The Nature paper reports an overall mean calibrated XSTS score of 4.26, with the majority of evaluated directions scoring above the 4.0 "high quality" threshold. ^[2]^[6]
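To give a feel for what a character-level metric measures, here is a simplified chrF-style score. It is a sketch under stated assumptions, not the metric used in the evaluation: it omits the word n-grams that chrF++ adds, it ignores spaces, and its details (averaging precision and recall over n-gram orders 1 to 6, with `beta=2`) are illustrative defaults that may not match any reference implementation.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Character n-gram F-score with precision and recall averaged over n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(simple_chrf("the cat sat on the mat", "the cat sits on the mat"))
```

Because it works on characters rather than whole words, a score of this kind gives partial credit for near-miss word forms, which matters for morphologically rich languages.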

Quality results

The headline result is an average improvement of about 44 percent over the previous state of the art. Meta's launch materials and the Nature abstract both state the 44 percent figure: the launch blog specifies that it is measured across all 10,000 directions of the FLORES-101 benchmark, and the Nature paper states that it is "as measured by BLEU." In absolute terms, NLLB-200 outperformed the nearest prior system by about 7.3 spBLEU points on average, which corresponds to the 44 percent relative improvement. ^[1]^[2]^[6]

Metric	Result	Source
Average quality gain vs. prior SOTA	+44%	Meta blog; Nature (2024)
Average gain in absolute spBLEU	+7.3 spBLEU	Nature (2024); launch coverage
Gain on some African and Indian languages	> 70%	Meta blog (2022)
Translation directions evaluated	~40,000	FLORES-200
Mean calibrated XSTS (human eval)	4.26	Nature (2024)

Gains were largest for low-resource languages. Meta reported improvements of more than 70 percent over recent systems for several African and Indian languages, and expanded high-quality African language support to 55 languages, up from fewer than 25 in prior tools. The model covers about three times as many low-resource languages as high-resource ones. ^[1]^[6]

Open-source release and license

Meta open-sourced the NLLB-200 models, the FLORES-200 benchmark, the model training code, and the code for recreating the training dataset, with the goal of letting other researchers reproduce and extend the work. The models are distributed under a Creative Commons Attribution-NonCommercial license (CC-BY-NC 4.0), meaning they are available for research and other non-commercial uses with attribution. Meta also announced grants of up to 200,000 US dollars for nonprofit and research uses of NLLB-200 aligned with the UN Sustainable Development Goals. ^[1]^[4]

Application to Wikipedia

Meta worked with the Wikimedia Foundation to apply the technology behind NLLB-200 to Wikipedia's Content Translation tool, which helps volunteer editors translate articles between language editions. With NLLB-200 as a back end, editors gained machine-assisted translation for more than 20 low-resource languages, including 10 that were not previously supported by any machine translation tool. One example cited by Meta is Lingala, spoken by tens of millions of people across Central Africa but represented by only a few thousand Wikipedia articles at the time. ^[1]^[5]

Nature publication (2024)

An expanded, peer-reviewed account of the project, "Scaling neural machine translation to 200 languages," was published in Nature (volume 630, issue 8018, pages 841 to 846) on 5 June 2024, with the DOI 10.1038/s41586-024-07335-x. The paper is credited to the "NLLB Team," and its lead authors include Marta R. Costa-jussà, James Cross, and Onur Çelebi. It describes the conditional-compute MoE model, the FLORES-200 benchmark, the XSTS human evaluation, and the toxicity work, and confirms the 44 percent average BLEU improvement over the prior state of the art. ^[6]

The original technical report, "No Language Left Behind: Scaling Human-Centered Machine Translation," was posted to arXiv on 11 July 2022 (arXiv:2207.04672) and runs to about 190 pages. ^[2]

Relationship to SeamlessM4T

NLLB serves as a foundation for later Meta translation work, most directly SeamlessM4T, the multilingual and multimodal model released in August 2023. SeamlessM4T handles speech-to-text, speech-to-speech, text-to-text, and text-to-speech translation plus automatic speech recognition across about 100 languages. Its text encoder is based on the NLLB architecture, and during training its decoder is guided by token-level knowledge distillation from the NLLB text-to-text model, so NLLB's text translation capability is carried forward into the multimodal system. ^[7]
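Token-level knowledge distillation can be sketched as a loss that pulls the student's next-token distribution toward the teacher's at every target position. The version below is a generic illustration and not SeamlessM4T's training code: the use of KL divergence, the function names, and the toy logits are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_level_kd_loss(student_logits, teacher_logits):
    """Mean KL(teacher || student) over the positions of one target sequence.

    Each argument is a list of per-position logit vectors over the same
    vocabulary, aligned position by position.
    """
    total = 0.0
    for s_logits, t_logits in zip(student_logits, teacher_logits):
        student, teacher = softmax(s_logits), softmax(t_logits)
        total += sum(
            t * math.log(t / s) for t, s in zip(teacher, student) if t > 0
        )
    return total / len(student_logits)


# Toy example: two target positions over a made-up 3-token vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.0, 3.0, 0.0]]
student = [[1.0, 1.0, -1.0], [0.5, 2.0, 0.0]]
print(token_level_kd_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution at every position and grows as the two diverge, so minimizing it transfers the teacher's token-by-token preferences rather than only its final output text.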

References

1. Meta AI, "200 languages within a single AI model: A breakthrough in high-quality machine translation." https://ai.meta.com/blog/nllb-200-high-quality-machine-translation/
2. NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, et al., "No Language Left Behind: Scaling Human-Centered Machine Translation," arXiv:2207.04672 (11 July 2022). https://arxiv.org/abs/2207.04672
3. InfoQ, "Meta Open-Sources 200 Language Translation AI NLLB-200" (August 2022). https://www.infoq.com/news/2022/08/meta-translation-ai-nllb/
4. Hugging Face, "facebook/nllb-200-3.3B" model card (license, intended use, variants). https://huggingface.co/facebook/nllb-200-3.3B
5. MediaWiki, "Content translation / Machine Translation / NLLB-200." https://www.mediawiki.org/wiki/Content_translation/Machine_Translation/NLLB-200
6. NLLB Team et al., "Scaling neural machine translation to 200 languages," Nature 630, 841-846 (2024). DOI 10.1038/s41586-024-07335-x. https://www.nature.com/articles/s41586-024-07335-x
7. Seamless Communication / Meta AI, "SeamlessM4T: Massively Multilingual & Multimodal Machine Translation," arXiv:2308.11596 (2023). https://arxiv.org/abs/2308.11596
