Yi (language model)

Updated Jun 27, 2026

Yi is a series of open, bilingual (English and Chinese) large language models developed by the Chinese startup 01.AI (Chinese: 零一万物, Lingyiwanwu), the company founded in March 2023 by Kai-Fu Lee. The first models, Yi-6B and Yi-34B, were released on 2 November 2023 as base models trained from scratch on 3.1 trillion tokens, and Yi-34B briefly ranked first among open pretrained base models on Hugging Face's Open LLM Leaderboard, ahead of larger systems such as Llama 2 70B and Falcon-180B. ^[1]^[4]^[5] The Yi-34B architecture reuses the Llama transformer design, which 01.AI credits in its model card. Over the following year 01.AI extended the family with chat models, 200,000-token long-context variants, the depth-upscaled Yi-9B, the multimodal Yi-VL, the Apache-2.0-licensed Yi-1.5 refresh, and the code-specialized Yi-Coder. ^[1]^[11] The accompanying technical report, "Yi: Open Foundation Models by 01.AI", was posted to arXiv in March 2024 as arXiv:2403.04652. ^[1]

The company's later closed API models, Yi-Large and Yi-Lightning, are covered in separate articles and are only summarized here.

What is Yi and who developed it?

Yi is the flagship open model line of 01.AI, a Beijing-based large language model startup founded in March 2023 by Kai-Fu Lee, a former president of Google China and head of Microsoft Research Asia who also co-founded the venture firm Sinovation Ventures. ^[2]^[3] Within roughly eight months of its founding the company reached unicorn status, with a valuation above one billion US dollars and backing that later included Alibaba and Xiaomi. ^[3]^[4] Lee positioned 01.AI as China's answer to leading US labs, framing the open release as a way to "give back" and arguing that demand for a homegrown model was acute: "necessity is the mother of innovation, and there's clearly a huge necessity in China," since China lacked access to OpenAI and Google products. ^[2]

The name "Yi" is written 一 in Chinese (the numeral "one"), tying the model line to the company's "zero-one" branding.

When was Yi released? Yi-6B and Yi-34B

01.AI open-sourced the Yi-6B and Yi-34B base models on 2 November 2023, with a default context window of 4,096 tokens. ^[5] Both were pretrained on a deduplicated, quality-filtered bilingual corpus of 3.1 trillion tokens of English and Chinese text; the data strategy emphasized quality, enforced through a multi-stage filtering pipeline, over raw scale. ^[1] Three days later, on 5 November 2023, the long-context variants Yi-6B-200K and Yi-34B-200K were released, extending the usable context to 200,000 tokens. ^[5]^[6] Instruction-tuned chat models (Yi-6B-Chat and Yi-34B-Chat), plus 4-bit (AWQ) and 8-bit (GPTQ) quantized versions, followed on 23 November 2023. ^[5]

Yi-34B drew immediate attention because, on the day it appeared, Hugging Face's Open LLM Leaderboard placed it first among open pretrained base models, ahead of larger systems such as Llama 2 70B and Falcon-180B, in both English and Chinese evaluations. ^[4]^[5] The 34B model is small enough to run inference on a single high-memory GPU after quantization, which contributed to its adoption.
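The single-GPU point can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration only: it assumes a nominal 34 billion parameters (taken from the model name) and counts weight storage alone, ignoring the KV cache, activations, and any per-group overhead that real AWQ or GPTQ checkpoints carry. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight-only storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params_34b = 34e9  # nominal count from the model name, an assumption
    for bits in (16, 8, 4):
        # Weight storage only; real deployments need extra headroom.
        print(f"{bits}-bit weights: {weight_memory_gb(params_34b, bits):.0f} GB")
```

Under these assumptions the weights shrink from about 68 GB at 16-bit precision to about 17 GB at 4-bit, which is why quantized checkpoints fit on one high-memory card while the unquantized model generally does not.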

The release lineup is summarized below.

Model	Parameters	Context	Released	Notes
Yi-6B / Yi-34B	6B, 34B	4K	2 Nov 2023	Base models
Yi-6B-200K / Yi-34B-200K	6B, 34B	200K	5 Nov 2023	Long context
Yi-6B-Chat / Yi-34B-Chat	6B, 34B	4K	23 Nov 2023	Instruction-tuned
Yi-VL-6B / Yi-VL-34B	6B, 34B	4K	23 Jan 2024	Vision-language
Yi-9B	9B	4K	6 Mar 2024	Depth-upscaled
Yi-9B-200K	9B	200K	16 Mar 2024	Long context
Yi-1.5 (6B/9B/34B)	6B, 9B, 34B	4K to 32K	13 May 2024	Apache 2.0 refresh
Yi-Coder (1.5B/9B)	1.5B, 9B	128K	5 Sep 2024	Code models

What is the Yi architecture, and why did the Llama naming episode happen?

Yi uses the now-standard decoder-only transformer design popularized by Llama. According to the technical report, Yi uses a vocabulary of 64,000 tokens, Grouped-Query Attention (GQA), Rotary Position Embedding (RoPE) with an adjusted base frequency, and a SwiGLU feed-forward block. ^[1] The two released base sizes have the following configuration:

Hyperparameter	Yi-6B	Yi-34B
Hidden size	4,096	7,168
Layers	32	60
Query heads	32	56
Key/value heads	4	8
Pretraining sequence length	4,096	4,096
Vocabulary	64,000	64,000
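The table's figures can be written as a small configuration object to make the derived quantities explicit. This is a minimal sketch, not 01.AI's code: the class name `YiConfig` and its field names are hypothetical, and only the numeric values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YiConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    hidden_size: int
    num_layers: int
    num_query_heads: int
    num_kv_heads: int
    vocab_size: int = 64_000
    pretrain_seq_len: int = 4_096

    @property
    def head_dim(self) -> int:
        # Width of each attention head.
        return self.hidden_size // self.num_query_heads

    @property
    def queries_per_kv_head(self) -> int:
        # GQA group size: how many query heads share one key/value head.
        return self.num_query_heads // self.num_kv_heads


YI_6B = YiConfig(hidden_size=4096, num_layers=32, num_query_heads=32, num_kv_heads=4)
YI_34B = YiConfig(hidden_size=7168, num_layers=60, num_query_heads=56, num_kv_heads=8)

if __name__ == "__main__":
    for name, cfg in (("Yi-6B", YI_6B), ("Yi-34B", YI_34B)):
        print(name, cfg.head_dim, cfg.queries_per_kv_head)
```

Both sizes work out to a head width of 128. Eight query heads share each key/value head in Yi-6B and seven in Yi-34B, so the key/value cache is correspondingly smaller than it would be with one key/value head per query head.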

Shortly after the November 2023 release, developers noticed that Yi-34B's weights on Hugging Face were structurally identical to Llama 2 except that two tensors, input_layernorm and post_attention_layernorm, had been renamed, so that tooling built for Llama did not load Yi out of the box. ^[7]^[8] A community member opened a pull request asking 01.AI to revert the names for compatibility, and the episode was picked up by the press as a question of whether 01.AI had obscured the model's lineage. ^[9]

01.AI acknowledged the renaming. Richard Lin, the company's head of open-source work, said the team had renamed components during "extensive training experiments ... to meet experimental requirements," then added: "we kinda dropped the ball and didn't switch them back before pushing out our release ... We're sorry for the confusion." ^[9] (Press coverage characterized the incident as "an oversight.") The company said it had changed the tensor names to test the model thoroughly, had no intention of masking the source, and would restore Llama-compatible tensor names. ^[7]^[9]
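The compatibility fix the community asked for amounts to a key rename over the checkpoint. The sketch below shows the idea on a plain dictionary. The short names `ln1` and `ln2` are assumptions used for illustration (this article does not state what the tensors were renamed to), and the helper names `to_llama_key` and `to_llama_state_dict` are hypothetical.

```python
# Assumed short names (left) mapped to the Llama-style names cited above (right).
RENAMES = {
    "ln1": "input_layernorm",
    "ln2": "post_attention_layernorm",
}


def to_llama_key(key: str) -> str:
    """Rename any dotted path component that matches an assumed short name."""
    return ".".join(RENAMES.get(part, part) for part in key.split("."))


def to_llama_state_dict(state_dict: dict) -> dict:
    """Return a copy of the checkpoint mapping with Llama-compatible keys."""
    return {to_llama_key(key): value for key, value in state_dict.items()}


if __name__ == "__main__":
    checkpoint = {"model.layers.0.ln1.weight": "tensor-a",
                  "model.layers.0.ln2.weight": "tensor-b",
                  "model.layers.0.self_attn.q_proj.weight": "tensor-c"}
    for key in to_llama_state_dict(checkpoint):
        print(key)
```

Only the key names change; the tensor values are untouched, which fits the observation that the weights were otherwise structurally identical to the Llama layout.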

Researchers were careful to separate two distinct claims. Sharing the Llama architecture is not the same as being a derivative of Llama: EleutherAI noted in a fact-check that essentially all modern LLMs are assembled from the same published building blocks (the 2017 transformer, RoPE, SwiGLU, pre-normalization, and multi-query or grouped-query attention), none of which originated with Meta, and that Yi did not use Llama's weights. ^[10] RoPE, for instance, was introduced by researchers at Zhuiyi Technology in 2021, and SwiGLU by Noam Shazeer at Google in 2020, so neither component is specific to Llama. ^[10] What actually differentiates such models is the training data, pipeline, and infrastructure, all of which 01.AI states it built independently. ^[1]^[10] 01.AI's own model card frames it the same way: "The Yi series models adopt the same model architecture as Llama but are NOT derivatives of Llama." ^[11]

How does Yi handle long context?

The 200K variants were produced by lightweight continual pretraining on longer sequences rather than by training from scratch at full length. ^[1] On the "needle in a haystack" retrieval test, which probes whether a model can recover a single planted fact from a long document, 01.AI reported that Yi-34B-200K reached about 99.8 percent retrieval accuracy. ^[6] Under the common rule of thumb of about 0.75 English words per token, a 200,000-token window corresponds to roughly 150,000 English words, or several hundred thousand Chinese characters; at the time this was among the longest context lengths available in any openly released model.
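A needle-in-a-haystack evaluation can be sketched in a few lines. This is an illustrative harness, not 01.AI's test: the filler text, the planted fact, the depth grid, and the two stand-in "models" are all hypothetical, and a real run would call an actual language model and measure context length in tokens rather than sentences.

```python
FILLER = "The sky was clear that day."
NEEDLE = "The secret code is 7421."
QUESTION = "What is the secret code?"
EXPECTED = "7421"


def build_haystack(needle, num_sentences, depth, filler=FILLER):
    """Plant the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    sentences = [filler] * num_sentences
    sentences.insert(round(depth * num_sentences), needle)
    return " ".join(sentences)


def retrieval_accuracy(answer_fn, lengths, depths):
    """Fraction of (length, depth) trials whose reply contains the planted fact."""
    hits = 0
    trials = 0
    for num_sentences in lengths:
        for depth in depths:
            context = build_haystack(NEEDLE, num_sentences, depth)
            hits += EXPECTED in answer_fn(context, QUESTION)
            trials += 1
    return hits / trials


def oracle_model(context, question):
    """Stand-in with perfect recall: it simply echoes the whole context."""
    return context


def truncating_model(context, question, window=200):
    """Stand-in that only 'sees' the last few characters of the context."""
    return context[-window:]


if __name__ == "__main__":
    grid = dict(lengths=(50, 100), depths=(0.0, 0.5, 1.0))
    print("oracle:", retrieval_accuracy(oracle_model, **grid))
    print("truncating:", retrieval_accuracy(truncating_model, **grid))
```

Sweeping both document length and needle depth is what makes the test informative: a model with a short effective window, like the truncating stand-in here, only recovers needles planted near the end.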

What are the Yi-1.5, Yi-VL, and Yi-Coder variants?

Yi-VL, released on 23 January 2024 in 6B and 34B sizes, is the multimodal member of the family. It pairs a vision encoder initialized from CLIP ViT-H/14 with a two-layer multilayer perceptron projection and the Yi-Chat language model, trained in stages at 224 and 448 pixel resolutions. ^[1] 01.AI reported that Yi-VL-34B led open models on the MMMU and CMMMU multimodal benchmarks at release. ^[12]

Yi-9B (6 March 2024) was created by depth upscaling rather than training a 9B model directly: 01.AI duplicated the middle layers of Yi-6B (extending it from 32 to 48 layers) and continued pretraining on roughly 800 billion additional tokens, yielding a model that was strong for its size in code and mathematics. ^[1]^[5] A Yi-9B-200K long-context version followed on 16 March 2024. ^[5]
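Depth upscaling of this kind can be illustrated on a plain list of layers. The sketch below is a simplification under stated assumptions: the function name `depth_upscale` is hypothetical, the layers are placeholder strings rather than transformer blocks, and the choice of layers 12 through 27 as the "middle" span is an assumption made so that 16 duplicated layers take the stack from 32 to 48.

```python
import copy


def depth_upscale(layers, start, end):
    """Return a new stack in which layers[start:end] appear twice in a row.

    The duplicated span is deep-copied so that, with real modules, the copies
    would hold independent weights during continued pretraining.
    """
    duplicated = copy.deepcopy(layers[start:end])
    return layers[:end] + duplicated + layers[end:]


if __name__ == "__main__":
    base = [f"layer_{i}" for i in range(32)]            # 32-layer source stack
    upscaled = depth_upscale(base, start=12, end=28)    # assumed middle span
    print(len(base), "->", len(upscaled))
```

The duplicated layers start as exact copies, so the upscaled model begins close to the behaviour of the smaller one; the continued pretraining described above is what lets the added depth become useful.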

Yi-1.5, released on 13 May 2024 in 6B, 9B, and 34B sizes, continued pretraining the original Yi models on a further 500 billion tokens for a total corpus of 3.6 trillion tokens, then fine-tuned on a larger instruction set, improving coding, math, and reasoning. ^[11]^[13] Yi-1.5 base and chat models shipped with 4K context plus 16K and 32K extended variants, and this release moved the weights to the permissive Apache 2.0 license. ^[11]^[13]

Yi-Coder, released on 5 September 2024, is a code-specialized line in 1.5B and 9B sizes with a 128,000-token context window. The 9B version was built by continuing to train Yi-9B on 2.4 trillion tokens drawn from a GitHub code corpus and filtered web data spanning 52 programming languages. ^[14]^[15] 01.AI reported that Yi-Coder-9B-Chat scored 23.4 percent on LiveCodeBench, which it described as the only model under 10 billion parameters to pass 20 percent and as competitive with much larger code models such as DeepSeek-Coder-33B-Instruct. ^[14]^[15]

How does Yi perform on benchmarks?

The technical report and model cards give the following figures for the original Yi models. Yi-34B-Chat placed second on the AlpacaEval leaderboard behind GPT-4 Turbo, ahead of GPT-4, Mixtral, and Claude, based on data available in early 2024. ^[11]^[16]

Model	MMLU (5-shot)	AlpacaEval win-rate	Chatbot Arena Elo
Yi-34B (base)	73.5	N/A	N/A
Yi-34B-Chat	N/A	94.08	~1110
Yi-1.5-34B (base)	76.3	N/A	N/A

(Yi-34B-Chat MMLU 0-shot was reported at 67.6, and the 5-shot base figure at 73.5.) ^[1] Reported Chatbot Arena and AlpacaEval numbers reflect leaderboard snapshots from late 2023 to early 2024 and change as new models are added.

Is Yi open source? Licensing

The original Yi models were not released under a standard open-source license. They shipped under the "Yi Series Models Community License Agreement" (version 2.1, dated 23 November 2023), which permitted research use but required a free application or registration for commercial use. ^[5]^[11] With the Yi-1.5 release in May 2024, 01.AI moved the code and weights to the Apache 2.0 license, removing the commercial-registration requirement and making the weights free for personal, academic, and commercial use. ^[11]^[13] The Apache 2.0 terms were applied to the Yi-1.5 family and to subsequent open releases such as Yi-Coder. ^[14]

What are 01.AI's commercial models and 2025 strategy?

Alongside the open Yi weights, 01.AI built closed models served only through an API. Yi-Large is a large proprietary model that 01.AI priced aggressively against Western competitors, at about 20 RMB (roughly 2.70 US dollars) per million tokens, well below GPT-4 Turbo. ^[4] Yi-Lightning, released in October 2024, is a faster and cheaper model that 01.AI reported reached the top tier of the LMSYS Chatbot Arena blind-test rankings, with inference priced around 14 US cents per million tokens. ^[4] These models are covered in the Yi-Large and Yi-Lightning articles.

In the second half of 2024, 01.AI shifted its focus from frontier pretraining toward enterprise applications and smaller, industry-specific models, on the view that, as Kai-Fu Lee put it, only the largest technology companies can afford to train super-large models. ^[3]^[17] In mid-December 2024 the company reorganized: reporting indicated it wound down its dedicated pretraining algorithm and infrastructure teams, with members reportedly receiving offers from Alibaba's Tongyi and Alibaba Cloud, though Lee publicly denied that 01.AI had sold those teams to Alibaba. ^[17]^[18] Lee said 01.AI would prioritize AI applications in 2025, pointing to traction in sectors such as finance, energy, and gaming, and the company increasingly built solutions on third-party open models including DeepSeek. ^[17] The open Yi weights nonetheless remained widely used and frequently fine-tuned within the open-model community. ^[3]

References

1. Yi: Open Foundation Models by 01.AI, arXiv:2403.04652. https://arxiv.org/abs/2403.04652
2. Valued at $1B, Kai-Fu Lee's LLM startup unveils open source model, TechCrunch. https://techcrunch.com/2023/11/05/valued-at-1b-kai-fu-lees-llm-startup-unveils-open-source-model/
3. 01.AI, Wikipedia. https://en.wikipedia.org/wiki/01.AI
4. AI Pioneer Kai-Fu Lee Builds $1 Billion Startup in Eight Months, Bloomberg. https://www.bloomberg.com/news/articles/2023-11-05/kai-fu-lee-s-open-source-01-ai-bests-llama-2-according-to-hugging-face
5. 01-ai/Yi GitHub repository (release news and license history). https://github.com/01-ai/Yi
6. 01-ai/Yi-34B-200K model card, Hugging Face. https://huggingface.co/01-ai/Yi-34B-200K
7. 01-ai/Yi-34B llama-compatibility discussion, Hugging Face. https://huggingface.co/01-ai/Yi-34B/discussions/11
8. Kai-Fu Lee's Yi-34B uses exactly Llama's architecture except for 2 tensors renamed, Hacker News. https://news.ycombinator.com/item?id=38258015
9. Chinese tech unicorn 01.AI admits 'oversight' in changing name of AI model built on Meta's Llama system, South China Morning Post. https://www.scmp.com/tech/tech-trends/article/3241680/chinese-tech-unicorn-01ai-admits-oversight-changing-name-ai-model-built-meta-platforms-llama-system
10. Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times, EleutherAI Blog. https://blog.eleuther.ai/nyt-yi-34b-response/
11. 01-ai/Yi-34B model card, Hugging Face. https://huggingface.co/01-ai/Yi-34B
12. 01-ai/Yi-VL-34B model card, Hugging Face. https://huggingface.co/01-ai/Yi-VL-34B
13. 01-ai/Yi-1.5-34B model card, Hugging Face. https://huggingface.co/01-ai/Yi-1.5-34B
14. Yi-Coder Released by 01.AI: A Powerful Small-Scale Code LLM Series, MarkTechPost. https://www.marktechpost.com/2024/09/05/yi-coder-released-by-01-ai-a-powerful-small-scale-code-llm-series-delivering-exceptional-performance-in-code-generation-editing-and-long-context-comprehension/
15. Meet Yi-Coder: A Small but Mighty LLM for Code, Hugging Face Blog. https://huggingface.co/blog/lorinma/yi-coder
16. 01-ai/Yi-34B-Chat model card, Hugging Face. https://huggingface.co/01-ai/Yi-34B-Chat
17. Kai-Fu Lee sets the record straight on 01.AI's pivot, KrASIA. https://kr-asia.com/kai-fu-lee-sets-the-record-straight-on-01-ais-pivot
18. Chinese start-up 01.AI founder Lee Kai-fu denies rumours of asset sale to Alibaba Cloud, South China Morning Post. https://www.scmp.com/tech/tech-trends/article/3293791/chinese-start-01ai-founder-lee-kai-fu-denies-rumours-asset-sale-alibaba
