ERNIE 5.0
Last reviewed
Jun 3, 2026
Sources
18 citations
Review status
Source-backed
Revision
v2 · 2,492 words
ERNIE 5.0 is a natively omni-modal foundation model from Baidu, unveiled at the company's annual Baidu World 2025 conference in Beijing on 13 November 2025 as the flagship in the ERNIE line at launch. [1][2] Baidu describes it as a single model that jointly handles text, images, audio, and video, taking those modalities as input and producing them as output, rather than stitching separate text, vision, and speech systems together. [1] At launch the model was offered as a public preview through Baidu's ERNIE Bot assistant and to enterprise users through Baidu AI Cloud's Qianfan platform. [1][3] In May 2026 Baidu released a successor, ERNIE 5.1, a smaller and cheaper text model that took over as the company's headline system on public leaderboards. [14][15]
The release sits inside a broader push that Baidu staged at the same event, where chief executive Robin Li also introduced new Kunlun AI accelerators and a slate of agent and application products. [2][4] ERNIE 5.0 is the part of that announcement aimed squarely at the frontier of large models, and Baidu positioned it as evidence that its in-house systems can compete with the strongest models from American and other Chinese labs. [5][6]
Baidu World 2025 ran under the theme "AI in Action," and ERNIE 5.0 was the headline model. [4] In Baidu's own description, the model is built on what it calls natively unified omni-modal modeling technology, meaning the network learns text, images, audio, and video together from the start instead of adding perception or speech onto a text-first core. [1] The company lists the model's strengths as multimodal understanding and generation, instruction following, creative writing, factual reasoning, agentic planning, and tool use. [1]
Robin Li framed the launch around a view of where value in AI is moving. He argued that applications, not the underlying models, create most of the economic value, and he used the keynote to push Baidu's agent and application layer alongside the model itself. [2] One line from the event that Baidu repeated in its materials was that "when you internalize AI, it becomes a native capability and transforms intelligence from a cost into a source of productivity." [1]
The model arrived through several preview iterations rather than a single hard launch. Baidu first put an ERNIE 5.0 preview into public testing in early November 2025, ahead of the conference, and then continued to update the version through late 2025 and into early 2026. [7] That staged rollout is worth keeping in mind when reading performance claims, since the version that posted Baidu's strongest leaderboard results came later than the November unveiling.
The model's size and architecture are where care matters most. The official Baidu press release announcing ERNIE 5.0 does not state a parameter count. [1] The widely cited figure of 2.4 trillion parameters comes from news reporting around the event, including the South China Morning Post, Bloomberg, and others, rather than from a Baidu technical report. [5][6][8] As of that reporting, Baidu had not published model weights or a detailed technical paper for ERNIE 5.0, so the architecture details below should be read as reported and attributed, not as specifications Baidu has formally documented. [8]
According to that reporting, ERNIE 5.0 uses an ultra-large mixture-of-experts design with highly sparse activation. The South China Morning Post wrote that the model activates less than 3 percent of its parameters for any given query, a routing approach meant to keep the cost of running such a large model down without giving up quality. [5] A mixture-of-experts layout routes each token through a small subset of specialized sub-networks, so the total parameter count can be very large while the compute actually spent on each token stays much smaller. That pattern is the same one Baidu used in the earlier ERNIE 4.5 family, and it is common across recent large language models. [9]
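The routing idea can be illustrated with a minimal sketch. This is a generic top-k mixture-of-experts layer in plain Python, not Baidu's implementation: the expert count, the value of k, the toy experts, and all function names are hypothetical, and Baidu has not published the router it actually uses. The last lines apply the reported figures (2.4 trillion total parameters, under 3 percent active) to get an upper bound on active parameters.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_layer(x, experts, router_scores, k):
    """Run only the selected experts and return their gate-weighted sum."""
    gates = route_token(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in gates.items())
    return output, gates

# Hypothetical toy setup: 128 experts, 3 active per token (not Baidu's configuration).
NUM_EXPERTS, TOP_K = 128, 3
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router logits
output, gates = moe_layer(1.0, experts, scores, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts that ran for this token

# Upper bound implied by the reported figures, which Baidu has not confirmed.
REPORTED_TOTAL_PARAMS = 2.4e12
REPORTED_ACTIVE_CEILING = 0.03
max_active_params = REPORTED_TOTAL_PARAMS * REPORTED_ACTIVE_CEILING
```

In this toy layer only 3 of 128 experts run for the token, about 2.3 percent of them, which is the sense in which total size and per-token compute come apart. Taken at face value, the reported figures would cap ERNIE 5.0's active parameters at roughly 72 billion per query.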
The table below separates what Baidu has stated from what is only reported.
| Item | Status | Detail |
|---|---|---|
| Omni-modal input and output | Stated by Baidu | Text, image, audio, video, jointly modeled [1] |
| Listed capabilities | Stated by Baidu | Multimodal understanding and generation, instruction following, creative writing, factual reasoning, agentic planning, tool use [1] |
| 2.4 trillion parameters | Reported, not in official release | Cited by SCMP, Bloomberg, and others [5][6] |
| Mixture-of-experts, sparse activation | Reported | Less than 3 percent of parameters active per query, per SCMP [5] |
| Model weights and technical report | Not published as of reporting | No open weights or detailed paper for 5.0 [8] |
Baidu made ERNIE 5.0 reachable in two ways at launch. Consumers could try the preview inside ERNIE Bot at ernie.baidu.com, and enterprise users and developers could access it through Baidu AI Cloud's Qianfan model-as-a-service platform. [1][3] This mirrors how Baidu has shipped earlier ERNIE models, where the consumer assistant and the Qianfan API are the two front doors.
Pricing for the model has been reported in secondary sources rather than confirmed in Baidu's launch release, so it should be treated as indicative. Reporting put ERNIE 5.0 on the Qianfan API at roughly $0.60 per million input tokens and about $2.10 per million output tokens, with later versions in the ERNIE 5 series listed at similar levels and with volume discounts for enterprise commitments. [10] Unlike ERNIE 4.5, which Baidu released as open weights, ERNIE 5.0 has been offered as a hosted service only, with no public weights at the time of writing. [8][9]
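For a sense of scale, the reported rates can be turned into a rough cost estimate. The sketch below assumes the secondary-source prices quoted above, ignores volume discounts, and uses a made-up workload. The function name and the workload numbers are illustrative only and do not come from Baidu.

```python
# Reported, unconfirmed Qianfan list prices for ERNIE 5.0, in US dollars per million tokens.
REPORTED_INPUT_USD_PER_M = 0.60
REPORTED_OUTPUT_USD_PER_M = 2.10

def estimate_cost_usd(input_tokens, output_tokens,
                      input_rate=REPORTED_INPUT_USD_PER_M,
                      output_rate=REPORTED_OUTPUT_USD_PER_M):
    """Estimate the charge for a workload at per-million-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical workload: 10,000 requests, each with 2,000 prompt tokens and 500 reply tokens.
requests = 10_000
workload_cost = estimate_cost_usd(requests * 2_000, requests * 500)
print(f"estimated cost: ${workload_cost:.2f}")
```

At those rates the hypothetical workload comes to about $22.50, with the 5 million output tokens accounting for nearly half the bill despite being a quarter of the input volume.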
ERNIE 5.0 is the direct successor to ERNIE 4.5, and the two models took different paths to the public. Baidu open-sourced the ERNIE 4.5 family at the end of June 2025, releasing a range of models under the Apache 2.0 license, with the largest a mixture-of-experts model of roughly 424 billion total parameters and a heterogeneous design that shared some parameters across modalities while keeping others modality-specific. [9][11] That open release made ERNIE 4.5 available on Hugging Face, GitHub, and Baidu's own PaddlePaddle ecosystem for both research and commercial use. [9]
ERNIE 5.0 reverses that openness, at least so far. It is a closed, hosted flagship rather than a downloadable model, and the jump in scale, from roughly 424 billion total parameters in the largest 4.5 model to a reported 2.4 trillion in 5.0, tracks the broader industry pattern of reserving the largest models for paid API and assistant access. The deeper change Baidu emphasizes is the move to native omni-modality. Where ERNIE 4.5 already mixed text and vision, Baidu presents ERNIE 5.0 as a model designed from the ground up to take in and generate text, images, audio, and video together. [1]
The ERNIE name, short for Enhanced Representation through Knowledge Integration, has been Baidu's main model brand since 2019, evolving from a multimodal and knowledge-enhanced language model into the chatbot known in China as Ernie Bot or Wenxin Yiyan. [9] ERNIE 5.0 is the latest generation of that line, and Baidu treated it as its flagship until ERNIE 5.1 followed in May 2026. [14]
ERNIE 5.0's most concrete public result came from the LMArena leaderboard, a community ranking based on head-to-head user preference votes. Baidu reported that a version labeled ERNIE-5.0-0110 scored 1,460 points on the text leaderboard around 15 January 2026, placing it No. 8 globally and No. 1 among Chinese models. [12] In Baidu's account the model outperformed several leading systems on that board, including OpenAI's GPT-5.1-High and Google's Gemini-2.5-Pro, and it ranked second worldwide on the math category, behind only GPT-5.2-High. [5][12] An earlier preview, ERNIE-5.0-Preview-1022, had already reached a high position on the same leaderboard during November testing, which is what drew attention to the model before its scores settled. [7]
These are leaderboard standings from a specific snapshot, and rankings on LMArena shift as new models are added and more votes come in, so the No. 8 figure describes a moment rather than a fixed standing. The result still mattered as the first time a Baidu model cracked the global top ten on that board. [12]
The launch landed in the middle of an intense competition among Chinese AI developers, where Baidu is one of several large players alongside DeepSeek, Alibaba's Qwen, and Moonshot AI, among others. [6] Baidu also used Baidu World 2025 to address the hardware side of that race. The company laid out a roadmap for its Kunlun accelerators, with the M100 chip aimed at large-scale inference and slated for early 2026 and the M300 aimed at ultra-large multimodal training and inference and planned for early 2027. [4][13] It paired those with new supernode systems it calls Tianchi, a 256-chip Tianchi 256 targeted at the first half of 2026 and a 512-chip Tianchi 512 for the second half. [13] The thrust of the message was self-reliance, with Baidu pitching a full stack of domestic chips and models at a time when access to American accelerators is constrained by export controls. [4]
Adoption of Baidu's consumer assistant gave the launch added weight. Around the time ERNIE 5.0 moved from preview to its official release in early 2026, reporting put the Ernie assistant at roughly 200 million monthly active users, a figure Baidu and outlets such as the South China Morning Post cited as the model rolled out more widely. [5] Whether ERNIE 5.0 holds its leaderboard standing as rivals release new models remains to be seen, and the lack of an open technical report means independent verification of its reported scale and internals is still limited. [8]
Baidu released ERNIE 5.1 in early May 2026, roughly six months after the 5.0 unveiling, and made it the company's new headline model on public leaderboards. [14][15] The release followed the same staged pattern as 5.0: Baidu put an ERNIE-5.1-Preview onto LMArena around 30 April 2026, then shipped the fuller ERNIE-5.1-0508 version about a week later. [16][17] Chinese coverage on 9 May 2026 reported the launch, and the official ERNIE blog labels the headline build 0508, consistent with a release on or around 8 May. [14][15][17] Like 5.0, the model was offered as a hosted service through Baidu AI Cloud's Qianfan platform and the consumer site at ernie.baidu.com, with no public weights. [17][18]
The headline of the 5.1 release is efficiency rather than raw scale. Baidu says it compressed the model's total parameters to roughly one third of ERNIE 5.0's while cutting the active parameters per token to about one half, so the newer model is both smaller overall and lighter to run for each query. [14][16][17] Against ERNIE 5.0's reported 2.4 trillion parameters that one-third figure implies a model in the high hundreds of billions, though Baidu did not publish an exact count. [14][16] The company also makes a striking cost claim, stating that ERNIE 5.1's pre-training cost was only about 6 percent of that of comparable models, which it attributes in part to training methods it describes as disaggregated, fully asynchronous reinforcement learning and an architecture that is elastic in depth, width, and sparsity. [14][15][18]
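The implied size follows from simple arithmetic, sketched below. The inputs are the reported 2.4 trillion total and the "less than 3 percent" active share for ERNIE 5.0, combined with Baidu's approximate one-third and one-half ratios. The results are back-of-envelope estimates rather than published figures, and the active-parameter bound further assumes that the per-query share reported for 5.0 is comparable to the per-token ratio Baidu gives for 5.1.

```python
from fractions import Fraction

# Reported figures for ERNIE 5.0 (news reports, not a Baidu technical document).
ERNIE_5_0_TOTAL = 2_400_000_000_000          # 2.4 trillion total parameters
ERNIE_5_0_ACTIVE_CEILING = Fraction(3, 100)  # "less than 3 percent" active per query

# Baidu's stated ratios for ERNIE 5.1, both approximate.
TOTAL_RATIO = Fraction(1, 3)
ACTIVE_RATIO = Fraction(1, 2)

implied_total = ERNIE_5_0_TOTAL * TOTAL_RATIO
active_5_0_ceiling = ERNIE_5_0_TOTAL * ERNIE_5_0_ACTIVE_CEILING
implied_active_ceiling = active_5_0_ceiling * ACTIVE_RATIO

print(f"implied ERNIE 5.1 total: ~{float(implied_total) / 1e9:.0f}B parameters")
print(f"implied ERNIE 5.1 active ceiling: <{float(implied_active_ceiling) / 1e9:.0f}B per token")
```

Under those assumptions the estimate lands at roughly 800 billion total parameters and fewer than about 36 billion active per token. Because every input is either reported or approximate, the output is an order-of-magnitude guide only.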
One point worth flagging is modality. Some early coverage framed 5.1 as carrying over ERNIE 5.0's omni-modal abilities, but Baidu's own release for the 0508 version, model-reference listings, and most reporting describe ERNIE 5.1 as a text model that is not recommended for vision or document-understanding work. [17][18] In other words, ERNIE 5.0 remains Baidu's omni-modal flagship, while 5.1 is the text specialist that posts the leaderboard numbers. The model uses a sparse mixture-of-experts design, supports tool use and function calling, and is listed with a 128,000-token context window and a maximum output of 65,536 tokens. [18] On the Qianfan API, ERNIE 5.1 has been listed at about $0.59 per million input tokens and $2.65 per million output tokens, which is close to 5.0's reported input price and roughly a quarter above its reported output price of about $2.10, although Baidu did not state pricing in the launch post itself. [17][18]
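The gap between the two listed price points can be checked directly. This sketch compares the secondary-source figures for ERNIE 5.0 (about $0.60 input, $2.10 output) with those listed for ERNIE 5.1. The dictionary layout and the helper name are illustrative, and none of the prices has been confirmed by Baidu.

```python
# Listed prices in US dollars per million tokens, as reported in secondary sources.
PRICES = {
    "ERNIE 5.0": {"input": 0.60, "output": 2.10},
    "ERNIE 5.1": {"input": 0.59, "output": 2.65},
}

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

changes = {
    kind: percent_change(PRICES["ERNIE 5.0"][kind], PRICES["ERNIE 5.1"][kind])
    for kind in ("input", "output")
}
for kind, change in changes.items():
    print(f"{kind}: {change:+.1f}%")
```

On these figures the input price is about 1.7 percent lower and the output price about 26.2 percent higher, so the comparison depends on how output-heavy a workload is.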
The result that drew the most attention came from LMArena's Search Arena, the leaderboard that ranks models on search-augmented question answering. Baidu reported that ERNIE 5.1 reached No. 4 globally on the Search Arena with a score of 1,223, and that it was the only Chinese model in that top group, which several outlets described as the first time a Chinese large language model placed in the global top five for search. [14][15][17] On the broader Text Arena the picture was more middling: the preview build had ranked around No. 13 worldwide while still leading Chinese models, so the No. 4 figure is specific to the search task rather than to overall chat quality. [16][17] Baidu also pointed to strong benchmark scores, including 99.6 on the AIME26 math test with tool use, which it placed second only to Google's Gemini 3.1 Pro, and competitive results on GPQA and MMLU-Pro. [14][17] As with the 5.0 numbers, these are vendor-reported standings on a board whose rankings move over time, and Baidu has not released a technical report for 5.1, so the scale and cost claims remain attributed rather than independently confirmed. [17][18]