Yi (language model)
Last reviewed
Jun 3, 2026
Sources
16 citations
Review status
Source-backed
Revision
v1 · 1,864 words
Yi is a series of open large language models developed by the Chinese startup 01.AI (Chinese: 零一万物, Lingyiwanwu). The first models, Yi-6B and Yi-34B, were released in November 2023 as bilingual English and Chinese base models trained from scratch. They were notable both for ranking at or near the top of open-model leaderboards and for a public debate over their reuse of the Llama 2 architecture. Over the following year 01.AI extended the family with chat models, 200,000-token long-context variants, the depth-upscaled Yi-9B, the multimodal Yi-VL, the Yi-1.5 refresh, and the code-specialized Yi-Coder. The accompanying technical report, "Yi: Open Foundation Models by 01.AI", was posted to arXiv in March 2024 as arXiv:2403.04652. [1]
The company's later closed API models, Yi-Large and Yi-Lightning, are covered in separate articles and are only summarized here.
01.AI was founded in March 2023 by Kai-Fu Lee, a former president of Google China and head of Microsoft Research Asia who also co-founded the venture firm Sinovation Ventures. [2][3] The company is headquartered in Beijing. Within roughly eight months of its founding it reached a valuation above one billion US dollars (unicorn status), with backing that later included Alibaba and Xiaomi. [3][4] Lee positioned 01.AI as China's answer to leading US labs, and the Yi series was its flagship open release.
The name "Yi" is written 一 in Chinese (the numeral "one"), tying the model line to the company's "zero-one" branding.
01.AI open-sourced the Yi-6B and Yi-34B base models on 2 November 2023, with a default context window of 4,096 tokens. [5] Both were pretrained on a deduplicated, quality-filtered bilingual corpus of 3.1 trillion tokens of English and Chinese text; the emphasis was on data quality, enforced through a multi-stage filtering pipeline, rather than on raw scale. [1] Three days later, on 5 November 2023, the long-context variants Yi-6B-200K and Yi-34B-200K were released, extending the usable context to 200,000 tokens. [5][6] Instruction-tuned chat models (Yi-6B-Chat and Yi-34B-Chat), plus 4-bit (AWQ) and 8-bit (GPTQ) quantized versions, followed on 23 November 2023. [5]
Yi-34B drew immediate attention because, on the day it appeared, Hugging Face's Open LLM Leaderboard placed it first among open pretrained base models, ahead of larger systems such as Llama 2 70B and Falcon-180B, in both English and Chinese evaluations. [4][5] The 34B model is small enough to run inference on a single high-memory GPU after quantization, which contributed to its adoption.
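A rough back-of-the-envelope estimate shows why quantization matters for single-GPU inference. The sketch below assumes a nominal 34 billion parameters and counts weight storage only; it ignores activations, the key/value cache, and the scale metadata that real quantization formats add, so actual memory use is somewhat higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count; the exact figure differs slightly.
N_PARAMS = 34e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

Under these assumptions the weights alone need about 68 GB at 16 bits, 34 GB at 8 bits, and 17 GB at 4 bits.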
The release lineup is summarized below.
| Model | Parameters | Context | Released | Notes |
|---|---|---|---|---|
| Yi-6B / Yi-34B | 6B, 34B | 4K | 2 Nov 2023 | Base models |
| Yi-6B-200K / Yi-34B-200K | 6B, 34B | 200K | 5 Nov 2023 | Long context |
| Yi-6B-Chat / Yi-34B-Chat | 6B, 34B | 4K | 23 Nov 2023 | Instruction-tuned |
| Yi-VL-6B / Yi-VL-34B | 6B, 34B | 4K | 23 Jan 2024 | Vision-language |
| Yi-9B | 9B | 4K | 6 Mar 2024 | Depth-upscaled |
| Yi-9B-200K | 9B | 200K | 16 Mar 2024 | Long context |
| Yi-1.5 (6B/9B/34B) | 6B, 9B, 34B | 4K to 32K | 13 May 2024 | Apache 2.0 refresh |
| Yi-Coder (1.5B/9B) | 1.5B, 9B | 128K | 5 Sep 2024 | Code models |
Yi uses the now-standard decoder-only transformer design popularized by Llama. According to the technical report, Yi sets a vocabulary size of 64,000, uses Grouped-Query Attention (GQA), Rotary Position Embedding (RoPE) with an adjusted base frequency, and a SwiGLU feed-forward block. [1] The two released base sizes have the following configuration:
| Hyperparameter | Yi-6B | Yi-34B |
|---|---|---|
| Hidden size | 4,096 | 7,168 |
| Layers | 32 | 60 |
| Query heads | 32 | 56 |
| Key/value heads | 4 | 8 |
| Pretraining sequence length | 4,096 | 4,096 |
| Vocabulary | 64,000 | 64,000 |
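One practical effect of Grouped-Query Attention can be estimated from the table alone. The sketch below computes the key/value cache size per token, assuming 16-bit cache entries and a head dimension equal to hidden size divided by query heads (128 for both models). These are simplifying assumptions for illustration, not figures from the technical report, and the names `YiConfig` and `kv_cache_bytes_per_token` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YiConfig:
    hidden_size: int
    layers: int
    query_heads: int
    kv_heads: int

    @property
    def head_dim(self) -> int:
        # Assumption: head dimension = hidden size / number of query heads.
        return self.hidden_size // self.query_heads


def kv_cache_bytes_per_token(cfg, kv_heads=None, bytes_per_value=2):
    """Bytes of KV cache one token occupies across all layers."""
    heads = cfg.kv_heads if kv_heads is None else kv_heads
    # Factor of 2: one key vector and one value vector per head, per layer.
    return 2 * cfg.layers * heads * cfg.head_dim * bytes_per_value


YI_6B = YiConfig(hidden_size=4096, layers=32, query_heads=32, kv_heads=4)
YI_34B = YiConfig(hidden_size=7168, layers=60, query_heads=56, kv_heads=8)

for name, cfg in (("Yi-6B", YI_6B), ("Yi-34B", YI_34B)):
    gqa = kv_cache_bytes_per_token(cfg)
    mha = kv_cache_bytes_per_token(cfg, kv_heads=cfg.query_heads)
    print(f"{name}: {gqa} bytes/token with GQA; "
          f"full multi-head attention would need {mha // gqa}x as much")
```

With these assumptions, Yi-34B needs 245,760 bytes of cache per token, or roughly 49 GB at a 200,000-token context, which is one seventh of what the same model would need if every query head had its own key/value head.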
Shortly after the November 2023 release, developers noticed that Yi-34B's weights on Hugging Face were structurally identical to Llama 2 except that two tensors had been renamed, so that tooling built for Llama did not load Yi out of the box. [7][8] A community member opened a pull request asking 01.AI to revert the names for compatibility, and the episode was picked up by the press as a question of whether 01.AI had obscured the model's lineage. [9]
01.AI acknowledged the renaming. Richard Lin, the company's head of open-source work, called it "an oversight," saying the team had renamed components "to meet experimental requirements" during training and then "kinda dropped the ball and didn't switch them back before pushing out our release," adding "We're sorry for the confusion." [9] The company said it had changed the names to test the model thoroughly and had no intention of masking the source, and it committed to restoring Llama-compatible tensor names. [7][9]
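The compatibility fix the community asked for amounts to renaming keys in a checkpoint's state dictionary. The sketch below is illustrative only: the specific names in `RENAMES` (`ln1` and `ln2` mapped to Llama-style `input_layernorm` and `post_attention_layernorm`) are an assumption made for the example and are not taken from the cited sources, and the helper `to_llama_names` is hypothetical.

```python
# Assumption: illustrative old -> new component names, not confirmed by the sources.
RENAMES = {
    "ln1": "input_layernorm",
    "ln2": "post_attention_layernorm",
}


def to_llama_names(state_dict: dict) -> dict:
    """Return a copy of state_dict with renamed components mapped back."""
    renamed = {}
    for key, tensor in state_dict.items():
        # Rename whole dotted components only, never substrings.
        parts = [RENAMES.get(part, part) for part in key.split(".")]
        renamed[".".join(parts)] = tensor
    return renamed


# Strings stand in for real tensors to keep the example dependency-free.
example = {
    "model.layers.0.ln1.weight": "tensor-a",
    "model.layers.0.ln2.weight": "tensor-b",
    "model.layers.0.self_attn.q_proj.weight": "tensor-c",
}
for key in to_llama_names(example):
    print(key)
```

Because only the keys change and the tensors are untouched, a mapping of this kind leaves the model's behavior identical while letting Llama-oriented tooling find the parameters it expects.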
Researchers were careful to separate two distinct claims. Sharing the Llama architecture is not the same as being a derivative of Llama: EleutherAI noted in a fact-check that essentially all modern LLMs are assembled from the same published building blocks (the 2017 transformer, RoPE, SwiGLU, pre-normalization, and multi-query or grouped-query attention), none of which originated with Meta, and that Yi did not use Llama's weights. [10] What actually differentiates such models is the training data, pipeline, and infrastructure, all of which 01.AI states it built independently. [1][10] 01.AI's own model card frames it the same way: "The Yi series models adopt the same model architecture as Llama but are NOT derivatives of Llama." [11]
The 200K variants were produced by lightweight continual pretraining on longer sequences rather than by training from scratch at full length. [1] On the "needle in a haystack" retrieval test, which probes whether a model can recover a single planted fact from a long document, 01.AI reported that Yi-34B-200K reached about 99.8 percent retrieval accuracy. [6] A 200,000-token window corresponds to roughly 150,000 English words (at the common rule of thumb of about 0.75 words per token) or several hundred thousand Chinese characters, which at the time was among the longest context lengths available in any openly released model.
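A minimal sketch of how a needle-in-a-haystack test can be constructed is given below. Everything in it is hypothetical: the filler sentence, the needle, the question, and especially `fake_model`, which is a plain string search standing in for a real model call. An actual evaluation would query a model and sweep both context length and needle depth.

```python
FILLER = "The sky was grey and the streets were quiet."
NEEDLE = "The secret number is 7481."
QUESTION = "What is the secret number mentioned in the document?"


def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Return n_sentences of filler with the needle placed at a relative depth in [0, 1]."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    sentences = [filler] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def retrieval_accuracy(model, needle, expected, depths, n_sentences=1000):
    """Fraction of depths at which the model's answer contains the expected fact."""
    hits = 0
    for depth in depths:
        document = build_haystack(FILLER, needle, n_sentences, depth)
        answer = model(document + "\n\n" + QUESTION)
        hits += expected in answer
    return hits / len(depths)


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model: locates the needle by string search.
    marker = "The secret number is "
    start = prompt.find(marker)
    if start < 0:
        return ""
    return prompt[start + len(marker): start + len(marker) + 4]


score = retrieval_accuracy(fake_model, NEEDLE, "7481", [0.0, 0.25, 0.5, 0.75, 1.0])
print(f"retrieval accuracy: {score:.0%}")
```

Reported results such as the 99.8 percent figure are averages of this kind of hit-or-miss score over many combinations of document length and needle position.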
Yi-VL, released on 23 January 2024 in 6B and 34B sizes, is the multimodal member of the family. It pairs a vision encoder initialized from CLIP ViT-H/14 with a two-layer multilayer perceptron projection and the Yi-Chat language model, trained in stages at 224 and 448 pixel resolutions. [1] 01.AI reported that Yi-VL-34B led open models on the MMMU and CMMMU multimodal benchmarks at release. [12]
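The two training resolutions translate directly into different numbers of image tokens. The sketch below assumes the usual ViT naming convention, in which "/14" denotes 14-by-14-pixel patches, and that each patch yields one token passed through the projection; any extra class token or pooling step is ignored. The function name `patch_tokens` is hypothetical.

```python
def patch_tokens(image_size: int, patch_size: int = 14) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    per_side = image_size // patch_size
    return per_side * per_side


for size in (224, 448):
    print(f"{size}x{size} image -> {patch_tokens(size)} patch tokens")
```

Doubling the resolution from 224 to 448 pixels quadruples the patch count from 256 to 1,024 under these assumptions, which is a common motivation for training at the lower resolution first.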
Yi-9B (6 March 2024) was created by depth upscaling rather than training a 9B model directly: 01.AI duplicated the middle layers of Yi-6B (extending it from 32 to 48 layers) and continued pretraining on roughly 800 billion additional tokens, yielding a model that was strong for its size in code and mathematics. [1][5] A Yi-9B-200K long-context version followed on 16 March 2024. [5]
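Depth upscaling by layer duplication can be sketched on layer indices alone. The example below repeats a contiguous block of 16 middle layers to grow a 32-layer stack to 48. The block boundaries used (`start=12`, `end=28`) are an assumption for illustration rather than a detail stated here, and `depth_upscale` is a hypothetical helper; in a real model each duplicated entry would be an independent copy of the layer's weights, which then diverge during continued pretraining.

```python
def depth_upscale(layers: list, start: int, end: int) -> list:
    """Repeat layers[start:end] immediately after the original block."""
    if not 0 <= start < end <= len(layers):
        raise ValueError("invalid block boundaries")
    return layers[:end] + layers[start:end] + layers[end:]


base = list(range(32))  # indices of the original 32 layers
upscaled = depth_upscale(base, start=12, end=28)  # assumed block boundaries

print(len(upscaled))
print(upscaled[26:32])
```

The first line printed is 48, and the slice shows the seam where the original block ends and its copy begins: `[26, 27, 12, 13, 14, 15]`.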
Yi-1.5, released on 13 May 2024 in 6B, 9B, and 34B sizes, continued pretraining the original Yi models on a further 500 billion tokens for a total corpus of 3.6 trillion tokens, then fine-tuned on a larger instruction set, improving coding, math, and reasoning. [11][13] Yi-1.5 base and chat models shipped with 4K context plus 16K and 32K extended variants, and this release moved the weights to the permissive Apache 2.0 license. [11]
Yi-Coder, released on 5 September 2024, is a code-specialized line in 1.5B and 9B sizes with a 128,000-token context window. The 9B version was built by continuing to train Yi-9B on 2.4 trillion tokens drawn from a GitHub code corpus and filtered web data spanning 52 programming languages. [14][15] 01.AI reported that Yi-Coder-9B-Chat scored 23.4 percent on LiveCodeBench, which it described as the only model under 10 billion parameters to pass 20 percent and as competitive with much larger code models such as DeepSeek-Coder-33B-Instruct. [14][15]
The technical report and model cards report the following figures for the original Yi models. Yi-34B-Chat placed second on the AlpacaEval leaderboard behind GPT-4 Turbo, ahead of GPT-4, Mixtral, and Claude, based on data available in early 2024. [11][16]
| Model | MMLU (5-shot) | AlpacaEval win-rate | Chatbot Arena Elo |
|---|---|---|---|
| Yi-34B (base) | 73.5 | N/A | N/A |
| Yi-34B-Chat | N/A | 94.08 | ~1110 |
| Yi-1.5-34B (base) | 76.3 | N/A | N/A |
For comparison, Yi-34B-Chat's 0-shot MMLU score was reported as 67.6, against 73.5 for the base model at 5-shot. [1] The Chatbot Arena and AlpacaEval numbers reflect leaderboard snapshots from late 2023 to early 2024 and change as new models are added.
The original Yi models were not released under a standard open-source license. They shipped under the "Yi Series Models Community License Agreement" (version 2.1, dated 23 November 2023), which permitted research use but required a free application or registration for commercial use. [5][11] With the Yi-1.5 release in May 2024, 01.AI moved the code and weights to the Apache 2.0 license, removing the commercial-registration requirement and making the weights free for personal, academic, and commercial use. [11][13] The Apache 2.0 terms were applied to the Yi-1.5 family and to subsequent open releases such as Yi-Coder. [14]
Alongside the open Yi weights, 01.AI built closed models served only through an API. Yi-Large is a large proprietary model that 01.AI priced aggressively against Western competitors, at about 20 RMB (roughly 2.70 US dollars) per million tokens, well below the price of GPT-4 Turbo. [4] Yi-Lightning, released in October 2024, is a faster and cheaper model that 01.AI reported reached the top tier of the LMSYS Chatbot Arena blind-test rankings, with inference priced around 14 US cents per million tokens. [4] These models are covered in the Yi-Large and Yi-Lightning articles.
In 2025, reporting indicated that 01.AI scaled back its own large-scale pretraining and shifted toward selling tailored business solutions, in part building on third-party open models, after the rise of low-cost Chinese competitors. [3] The open Yi weights nonetheless remained widely used and frequently fine-tuned within the open-model community.