Yi-Lightning
Last reviewed
May 16, 2026
Sources
18 citations
Review status
Source-backed
Revision
v1 · 3,882 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 16, 2026
Sources
18 citations
Review status
Source-backed
Revision
v1 · 3,882 words
Add missing citations, update stale details, or suggest a clearer explanation.
Yi-Lightning is a closed-source large language model developed by Chinese artificial intelligence company 01.AI (零一万物, Língyī Wànwù), the company founded by Kai-Fu Lee. Released on October 16, 2024, Yi-Lightning is built on a Mixture-of-Experts architecture and was positioned by 01.AI as a faster, cheaper successor to its earlier dense flagship Yi-Large. On its debut, the model placed 6th overall on the Chatbot Arena operated by LMSYS, briefly trailing only a handful of frontier systems from OpenAI, Anthropic and Google DeepMind while sitting at the top of the Chinese-developed cohort.
The model entered the market with an API price of 0.99 RMB (about $0.14 at the time) per million tokens for both input and output, a rate that 01.AI used to argue that frontier-class capability had become a commodity for enterprise buyers in China. Kai-Fu Lee described the launch as evidence that the lag between the best Chinese closed APIs and the most recent OpenAI checkpoints had narrowed to roughly five months. Within weeks of the release, Yi-Lightning had become the model 01.AI promoted most heavily for enterprise deployments, while the company also published a 25-page technical report on arXiv (2412.01253) documenting its architecture, training pipeline and safety stack.
Yi-Lightning marked both a peak and a turning point for 01.AI. It was the company's last frontier-class pre-training effort to date. By the end of December 2024, reporting in the Chinese technology press indicated that 01.AI had reassigned much of its pre-training group, and in January 2025 Kai-Fu Lee publicly framed the company's future around enterprise applications rather than continued large-scale pre-training. By March 2025, 01.AI had stopped pre-training its own foundation models and reoriented toward business solutions built on third-party open weights, leaving Yi-Lightning as the final flagship in the original Yi roadmap.
This article focuses on Yi-Lightning itself. Sister articles cover the parent company in 01.AI, the earlier dense flagship in Yi-Large, the open-source family in Yi, and the model's leaderboard context in Chatbot Arena.
01.AI was incorporated in May 2023 by Kai-Fu Lee, a former president of Google China and the founder of Sinovation Ventures. Lee assembled the founding team during the months that followed ChatGPT's public release, framing large models as a generational opportunity for Chinese industry. By November 2023, the company had reached unicorn status with a valuation above $1 billion after a funding round backed by Alibaba Cloud, Tencent, Xiaomi and Sinovation Ventures.
From the start, 01.AI operated two parallel tracks. The open track produced the Yi-6B and Yi-34B base models released in November 2023 under Apache 2.0, followed by Yi-1.5, Yi-VL and Yi-Coder over the next year. The closed track was held back until May 13, 2024, when the company unveiled the proprietary Yi-Large API along with smaller siblings Yi-Large-Turbo, Yi-Medium, Yi-Medium-200K, Yi-Vision and Yi-Spark. At the May 2024 launch, 01.AI also confirmed that a larger Mixture-of-Experts model code-named Yi-XLarge was already in training.
Yi-Lightning emerged from that MoE program. Where Yi-Large had been a dense transformer trained at "tens of billions" of parameters, Yi-Lightning was a sparse model designed to deliver higher capability per active parameter at significantly lower inference cost. By the time the model launched, 01.AI was operating in a market where rival Chinese labs (Alibaba, Baidu, ByteDance, DeepSeek, Tencent, Moonshot AI) had cut their per-token prices by an order of magnitude over the summer, and the bar for a competitive flagship had moved.
01.AI announced Yi-Lightning on October 16, 2024 at the inaugural Yi Open Day held in Beijing. Kai-Fu Lee presented the model himself, focusing on three claims: that Yi-Lightning had ranked 6th globally and first among Chinese models on LMSYS Chatbot Arena, that the API was priced at 0.99 RMB per million tokens with 01.AI still making a positive margin, and that the company had narrowed the lag behind the latest OpenAI checkpoints to roughly five months. The launch was accompanied by the release of Yi-Lightning's smaller sibling Yi-Lightning-Lite for lower-cost workloads and the rollout of an "AI 2.0" digital human product built on top of the new flagship.
The Chinese trade press treated the announcement as a significant moment in the country's 2024 model race. Coverage by Pandaily, TMTPost and 1AI all noted that the Arena ranking was the first time a Chinese-developed closed model had cleared a particular GPT-4o checkpoint on the public leaderboard, while pricing coverage focused on the contrast between 0.99 RMB and the dollar-denominated rates that OpenAI charged for GPT-4o class models in the same window.
Yi-Lightning's architecture was documented in the "Yi-Lightning Technical Report" posted to arXiv on December 2, 2024 (paper 2412.01253). Unlike Yi-Large, for which 01.AI never published a full technical paper, Yi-Lightning's design was set out in some detail.
Yi-Lightning uses what the report calls an "enhanced" Mixture-of-Experts design. Each transformer block replaces the dense feed-forward network with a routed set of expert sub-networks, and each token activates only a small number of experts during the forward pass. The result is a model that has substantially more parameters in storage than it actually computes per token, which keeps the per-token compute (and the per-token cost) low even when total parameter count is high.
The specific innovations claimed by the 01.AI team include fine-grained expert segmentation, in which each expert FFN is partitioned into smaller specialized functional units so that several experts can contribute to the same token, and a three-tier load balancing system. The three tiers (Switch-Transformer, Expert Parallel and Partitioned Expert Parallel) each contribute auxiliary losses with coefficients of 10⁻⁶, 10⁻⁴ and 10⁻³ respectively. The report frames the segmentation choice as a tradeoff between parameter utilization and training throughput, and argues that the smaller expert granularity improves capability without proportionally inflating training cost.
The report does not disclose the total parameter count, the number of activated parameters per token, or the number of experts per layer. 01.AI representatives have indicated that this opacity is deliberate; the company has consistently declined to share parameter counts for any of its closed API models, beginning with Yi-Large in May 2024 and continuing with Yi-Lightning.
Yi-Lightning interleaves three sliding-window attention layers with one full attention layer in a repeating pattern, a hybrid arrangement the report justifies as a way to keep most of the model's attention computation local while still passing global context through the full-attention layers at regular intervals. The sliding-window layers use a 4,096-token window, while the full-attention layers operate on the entire active context.
The report also describes a cross-layer key-value cache sharing scheme that reduces inference-time memory. According to the paper, this design plus the sliding-window pattern together cut KV cache memory by up to 82.8% relative to a comparable dense model with full attention at every layer. The reduction was framed by 01.AI as the main lever that made the 0.99 RMB per million token price economical at scale.
The model supports a context window of up to 64,000 tokens. A long-context extension phase of 20 billion tokens during training was used to stretch the effective context from a shorter pre-training default to the 64K target.
Yi-Lightning uses a tokenizer with a vocabulary of 100,352 tokens, with digits decomposed into individual numerals to help arithmetic and code generation. The tokenizer carries forward the bilingual Chinese-English emphasis of the broader Yi family.
The Yi-Lightning report describes a multi-stage training pipeline consisting of pre-training, two stages of supervised fine-tuning, and a reinforcement learning post-training phase.
01.AI does not publish a single headline pre-training token count for Yi-Lightning, but the report identifies several distinct phases. The bulk of pre-training is described as a multilingual mixture of web documents, books, academic papers, code repositories from GitHub, and question-answer pairs, with a deliberate emphasis on mathematics and programming content sourced through an iterative classification pipeline. The math and code emphasis was the company's response to the strong showing of competitor models like DeepSeek-V2 and Qwen 2 on reasoning-heavy evaluations earlier in 2024.
A long-context extension phase of 20 billion tokens was run separately to extend the effective context from the pre-training default to the 64K maximum advertised at API launch. The extension phase uses curated long documents and synthetic long-context tasks.
The report breaks supervised fine-tuning into two stages. Stage one uses approximately 1.3 million examples covering broad instruction-following, dialogue and chain-of-thought reasoning. Stage two uses around 300,000 higher-quality examples concentrated on harder reasoning, multi-turn dialogue and edge cases. Synthetic training data for the math and coding subsets is generated by Monte Carlo Tree Search and depth-first-search procedures over candidate solutions.
For preference learning, the report describes two iterations of an online Direct Preference Optimization (online DPO) loop, with 16 candidate responses generated per prompt and re-ranked by a reward model. The 01.AI team frames the online DPO setup as a cost-efficient substitute for full reinforcement learning from human feedback, although the report also discusses a parallel reinforcement learning track used for selected high-value domains.
Yi-Lightning incorporates a safety framework that 01.AI calls RAISE (Responsible AI Safety Engine). The report describes RAISE as a four-component system covering pre-training data filtering, supervised fine-tuning targets, runtime guardrails and post-serving monitoring. The framework was developed in part to comply with Chinese regulatory requirements for generative AI services, which require providers to register models with the Cyberspace Administration of China and to filter outputs for politically sensitive content.
Reporting at the time of launch placed Yi-Lightning's training cost at approximately $3 million on a cluster of around 2,000 GPUs, a figure Kai-Fu Lee cited in interviews to contrast with the much larger training budgets attributed to GPT-4 and other Western frontier models. The exact GPU type used for Yi-Lightning was not officially disclosed; press coverage variously described the cluster as H100-equivalent or H800 hardware (the export-control-compliant H100 variant sold in China). The $3 million figure represents marginal training-run cost and does not include data acquisition, salaries or the cost of prior models in the Yi family.
The Yi-Lightning technical report contains a substantial benchmark section against contemporaneous frontier models. The figures below are drawn directly from the December 2024 paper. Most evaluations were run zero-shot or in the few-shot configuration standard for the named benchmark.
| Benchmark | Yi-Lightning score | Notes |
|---|---|---|
| MMLU | 81.5 | Standard 5-shot multitask language understanding |
| MATH (competition math) | 76.4% | 0-shot, accuracy |
| HumanEval (Python code) | 83.5% | 0-shot, pass@1 |
| MBPP (Python code) | 76.0% | 0-shot, pass@1 |
| GSM8K (grade-school math) | 92.7% | 0-shot |
| GPQA (graduate-level QA) | 50.9% | 0-shot |
| Arena-Hard v0.1 | 91.8 | LLM-as-judge, win rate against GPT-4-0314 |
| AlignBench v1.1 (Chinese alignment) | 7.54 | Out of 10 |
| MT-Bench | 8.75 | Out of 10, GPT-4 judge |
| IFEval (instruction following) | 81.9 | Strict-prompt accuracy |
The report's headline takeaway is that Yi-Lightning performs at or above the level of the GPT-4o-2024-05-13 checkpoint on most knowledge and reasoning benchmarks while trailing it on coding (HumanEval), graduate-level science (GPQA) and strict instruction following (IFEval). On mathematics (MATH) the model edges ahead of the same GPT-4o checkpoint by 0.4 points. The Chinese-language category, captured by AlignBench and the Chinese subset of Arena prompts, is where Yi-Lightning shows its largest advantage relative to Western peers.
The report cautions that the benchmark comparisons are against the GPT-4o-2024-05-13 release, which was several months old by the time Yi-Lightning launched. Newer GPT-4o checkpoints and o1-class reasoning models had appeared in the interim, and the 01.AI team explicitly acknowledged that Yi-Lightning would not be expected to match those newer checkpoints on tasks that rewarded long-form reasoning traces.
The public leaderboard result that 01.AI emphasized most heavily at launch came from the LMSYS Chatbot Arena, an open evaluation in which human users submit prompts to two anonymized models and vote for the preferred answer. The accumulated votes are converted to an Elo-style rating that produces a continuously updated global ranking.
When Yi-Lightning was first surfaced on the public leaderboard in mid-October 2024, it entered at an Arena rating of approximately 1287 and a global ranking of 6th overall. The five models ahead of it at that moment were checkpoints from OpenAI, Anthropic and Google. Crucially for 01.AI's marketing, the GPT-4o-2024-05-13 checkpoint was ranked 7th, behind Yi-Lightning. This was the first time a Chinese-developed closed model had cleared a GPT-4o checkpoint on the Arena leaderboard, and the 01.AI team used the result heavily in launch materials.
The subcategory rankings were also strong. The technical report lists Yi-Lightning as 2nd globally on the Chinese-language category, 3rd on Math and Multi-Turn, and 4th on Coding and Hard Prompts. A subsequent ranking published by UC Berkeley's Sky Computing Lab in late October 2024 placed 01.AI joint third among LLM companies by Arena scores, alongside xAI's Grok-2 and behind OpenAI and Google.
The 6th place ranking did not last. As newer Anthropic, OpenAI and Google releases hit the leaderboard through November and December 2024, Yi-Lightning drifted down the table. By early 2025 the model was generally placed in the upper-tier middle of the global leaderboard rather than the top ten, although it remained near the top among models with comparable API pricing.
Yi-Lightning was distributed through 01.AI's developer platform at platform.lingyiwanwu.com using an OpenAI-compatible chat-completions interface. Customers signed up for an API key, charged a wallet with RMB credit and called the model directly.
The launch-day API rate was 0.99 RMB per million tokens, applied symmetrically to input and output. At the prevailing exchange rate in October 2024 this worked out to roughly $0.14 per million tokens, a figure 01.AI used in English-language coverage of the launch.
Kai-Fu Lee said during the launch presentation that the 0.99 RMB rate still produced a positive gross margin for 01.AI on a per-call basis, attributing the headroom to the MoE design and the cross-layer KV cache sharing. The company also released Yi-Lightning-Lite, a smaller and faster sibling, at the same per-token list price.
The table below summarizes the pricing tiers active on platform.lingyiwanwu.com after the Yi-Lightning launch.
| API model | Target workload | Launch price (RMB per 1M tokens) | Approximate USD-equivalent |
|---|---|---|---|
| Yi-Lightning | High-capability MoE flagship | 0.99 | About $0.14 |
| Yi-Lightning-Lite | Lower latency, lighter weights | 0.99 | About $0.14 |
| Yi-Large | Earlier dense flagship | 20 | About $2.80 |
| Yi-Large-Turbo | Speed-optimized Yi-Large variant | 12 | About $1.70 |
| Yi-Medium | General-purpose mid-tier | 2.5 | About $0.35 |
| Yi-Medium-200K | Long-context mid-tier | 12 | About $1.70 |
| Yi-Vision | Image-text multimodal | 6 | About $0.85 |
| Yi-Spark | Lightweight low-cost workloads | 1 | About $0.14 |
The gap between the 0.99 RMB rate for Yi-Lightning and the 20 RMB launch rate for Yi-Large was the most consequential pricing change in 01.AI's history, and it explicitly framed Yi-Lightning as a replacement for Yi-Large rather than as a parallel option. Existing Yi-Large customers were free to keep using the older model, but the price differential, roughly twenty to one, made the migration argument straightforward.
In addition to the developer API, Yi-Lightning powered 01.AI's consumer chatbot Wanzhi (万知), the company's new "AI 2.0 Digital Human" product line introduced at the same October event, and a handful of enterprise pilots in finance, gaming and energy that 01.AI publicized in late 2024. Selected partners ran Yi-Lightning on dedicated private deployments rather than the shared multi-tenant API.
The table below contrasts Yi-Lightning with three contemporary cost-sensitive flagship models from peer labs that were widely tested against it in late 2024. Figures are drawn from each provider's official documentation and from the Yi-Lightning technical report; entries marked "not disclosed" reflect the source labs' published opacity on those metrics.
| Property | Yi-Lightning (Oct 2024) | GPT-4o mini (Jul 2024) | Claude 3 Haiku (Mar 2024) | DeepSeek V2.5 (Sep 2024) |
|---|---|---|---|---|
| Developer | 01.AI | OpenAI | Anthropic | DeepSeek |
| Availability | Closed, API only | Closed, API only | Closed, API only | Open weights plus API |
| Architecture | Mixture-of-Experts with hybrid attention | Not disclosed | Not disclosed | Mixture-of-Experts |
| Parameter count | Not disclosed | Not disclosed | Not disclosed | 236B total / 21B active (V2 base; V2.5 inherits) |
| Context window | 64K | 128K | 200K | 128K |
| API price, input (per 1M tokens, USD) | About $0.14 | $0.15 | $0.25 | $0.14 |
| API price, output (per 1M tokens, USD) | About $0.14 | $0.60 | $1.25 | $0.28 |
| LMSYS Arena rank at debut | 6th globally | 18th globally (initial appearance) | 22nd globally (initial appearance) | Top-10 open-weights at debut |
| Notable strength | Chinese-language tasks, Math, Arena-Hard | English knowledge, multimodal, function calling | Speed, refusal calibration, long context | Open weights, very strong code and math |
The comparison is best read as a snapshot of October to December 2024, since the relevant peer landscape shifted again within weeks: DeepSeek V3 launched at the end of December 2024, Qwen 2.5 had begun rolling out in parallel, and Anthropic announced Claude 3.5 Haiku in early November. Within that snapshot, Yi-Lightning's distinguishing claim was its combination of frontier-tier Arena performance and very low USD-equivalent pricing. The 01.AI team explicitly framed the model as offering "top-tier capability at bargain pricing" rather than as a pure speed or efficiency play.
On the cost side, the table understates the practical price advantage somewhat. GPT-4o mini and Claude 3 Haiku both charge significantly more for output tokens than input tokens, while Yi-Lightning's symmetric 0.99 RMB rate makes it especially cheap on workloads that generate long responses. DeepSeek V2.5, the closest peer on raw pricing, was at the time still less prominent on LMSYS Arena.
Reception to Yi-Lightning was generally positive in the Chinese technology press and cautious among Western analysts. Pandaily, TMTPost, 1AI and Caixin all gave the launch front-page treatment, focusing on the Arena ranking, the price and the claim that the Sino-American capability gap had closed to five months. The Pandaily piece reproduced Kai-Fu Lee's "top model at a bargain price" framing, and 1AI's coverage emphasized that the LMSYS result was the first time a Chinese closed model had outranked a GPT-4o checkpoint.
International coverage was thinner. The South China Morning Post and TechRadar both ran summaries that paired the Arena result with the reported $3 million training cost and the $0.14 per million token API rate, framing Yi-Lightning as evidence of how rapidly Chinese labs had closed the gap on cost. A widely shared post by Kai-Fu Lee on LinkedIn drew attention to the 6th place Arena ranking and the price; the post was accompanied by a screenshot of the leaderboard taken in mid-October 2024.
Researchers and developers focused on a different set of details. The arXiv technical report drew interest for the cross-layer KV cache sharing scheme and the fine-grained expert segmentation, both of which fit into a broader 2024 trend in MoE design also pursued by DeepSeek, Mixtral and the Qwen family. Independent reproduction was limited by the closed nature of the weights; outside the LMSYS Arena votes and the published benchmark numbers, the public record relies on 01.AI's own materials.
Reception evolved as 01.AI's strategic posture shifted. Through November and December 2024 the company continued to promote Yi-Lightning to enterprise customers, and the technical report was posted to arXiv on December 2, 2024. But in mid-December 2024 reports surfaced in the Chinese press that 01.AI had reassigned its pre-training algorithm and infrastructure teams. By late December, Tencent Tech and Caixin reported that members of the pre-training group had received offers from Alibaba's Tongyi (通义) division while infrastructure engineers had received offers from Alibaba Cloud. Kai-Fu Lee initially denied that 01.AI had sold its team or its assets, calling some of the reports "vicious slander."
In January 2025, 01.AI and Alibaba Cloud announced a joint laboratory focused on industrial AI applications. In a public letter explaining the move, Kai-Fu Lee wrote that "only tech giants can bear the costs of training super-large models" and that the company would refocus on tailored business solutions rather than continuing to chase frontier scale. By March 2025, 01.AI had effectively stopped pre-training its own foundation models, with enterprise products increasingly built on top of DeepSeek's open-source models rather than on Yi-Lightning. The company reported full-year 2024 revenue of more than 100 million yuan (roughly $13.7 million), with about 70% coming from domestic enterprise contracts and the remainder from consumer products.
As of mid-2026, Yi-Lightning remained available through platform.lingyiwanwu.com but had not been succeeded by a comparably ambitious flagship from 01.AI. Industry observers reading the company's history therefore tend to treat Yi-Lightning as both the technical high-water mark of 01.AI's pre-training era and the last public artifact of that era, with the post-2025 company functioning more as an applied AI integrator than as a foundation model lab.