Baidu ERNIE
Last reviewed
May 21, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,963 words
ERNIE (Enhanced Representation through Knowledge Integration) is a family of large language and multimodal models developed by the Chinese technology company Baidu and its research arm, beginning with a knowledge-aware BERT-style pretraining paper in April 2019 and expanding into the consumer chatbot Wenxin Yiyan (文心一言, marketed in English as ERNIE Bot) launched on March 16, 2023.[^1][^2] Across more than half a decade the lineage has progressed from a 100-million-parameter Chinese encoder to a 260-billion-parameter dense model (ERNIE 3.0 Titan), and then to native-multimodal mixture-of-experts systems with as many as 424 billion parameters in the open-source ERNIE 4.5 release of mid-2025.[^3][^4] ERNIE serves both as the brand for Baidu's foundation models and as the engine of the Qianfan (千帆) cloud platform that distributes them to enterprises.[^5] It is the canonical Baidu entry in the Chinese-language model field that also includes Alibaba's Tongyi Qianwen, Tencent's Hunyuan, ByteDance's Doubao, and DeepSeek.
| Field | Value |
|---|---|
| Developer | Baidu, Inc. (Baidu AI / Baidu Research) |
| Origin paper | Sun et al., "ERNIE: Enhanced Representation through Knowledge Integration", arXiv:1904.09223, April 19, 2019[^1] |
| First product release | ERNIE Bot / Wenxin Yiyan, invited beta March 16, 2023[^2] |
| Public release | August 31, 2023 (after Chinese regulatory approval)[^6] |
| Latest flagship cited | ERNIE 4.5, ERNIE X1 (announced March 16, 2025)[^7] |
| Open-source release | ERNIE 4.5 family on Apache 2.0, June 30, 2025[^4] |
| Cloud platform | Baidu Qianfan Foundation Model Platform[^5] |
| Underlying framework | Transformer architectures on Baidu PaddlePaddle[^8] |
The first ERNIE paper was posted to arXiv on April 19, 2019 by Yu Sun, Shuohuan Wang and colleagues at Baidu under the title "ERNIE: Enhanced Representation through Knowledge Integration".[^1] Its central idea was that the original BERT-style masked language modelling objective, which randomly masks individual subword tokens, fails to capture the multi-word units that carry knowledge in natural language.[^1] ERNIE 1.0 therefore introduced two additional masking strategies on top of word-level masking: phrase-level masking, in which whole conceptual phrases (for example, idioms or noun phrases) are masked together, and entity-level masking, in which named entities such as people, locations, and organizations identified by a named entity recognition tagger are masked as units.[^1] The model was pretrained on a Chinese corpus drawn from Baidu Baike, Baidu News, and Baidu Tieba, and reported new state-of-the-art results on five Chinese NLP tasks: natural language inference, semantic similarity, NER, sentiment analysis, and question answering.[^1] ERNIE 1.0 was released open-source through the company's deep-learning framework PaddlePaddle.[^8]
ERNIE 2.0, posted to arXiv on July 29, 2019 and accepted to AAAI 2020, generalized the knowledge-masking idea into a continual pretraining framework.[^9] Rather than a fixed set of objectives, ERNIE 2.0 incrementally adds new self-supervised tasks, training them jointly with previously learned tasks through what the authors call continual multi-task learning.[^9] The framework defines tasks at three levels of linguistic information: word-aware tasks (such as knowledge masking and capitalization prediction), structure-aware tasks (sentence reordering, sentence distance), and semantic-aware tasks (discourse relation, IR relevance).[^9] The published model reported gains over BERT and XLNet on 16 English and Chinese benchmarks, including GLUE.[^9] Baidu open-sourced both the framework and pretrained weights through the PaddlePaddle ERNIE repository.[^8]
On July 5, 2021 Baidu posted arXiv:2107.02137, "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation".[^3] ERNIE 3.0 was a 10-billion-parameter knowledge-enhanced model trained on a 4-terabyte corpus combining web text, Baidu Search sources, domain-specific data (medical, legal, financial), and Baidu's knowledge graph containing more than 50 million facts.[^3] Architecturally it unified an autoregressive and an autoencoding component, allowing the same backbone to be used for both natural-language understanding and natural-language generation tasks.[^3] An English variant of the model topped the SuperGLUE benchmark on July 3, 2021, with a reported score of 90.6, against 89.8 for the human baseline.[^3]
In December 2021 Baidu Research, jointly with the Peng Cheng Laboratory (PCL) in Shenzhen, released ERNIE 3.0 Titan (arXiv:2112.12731), scaling the architecture up to 260 billion parameters.[^10] Baidu described it as the largest Chinese dense pretrained language model at the time, claiming new state-of-the-art results on more than 68 Chinese NLP datasets.[^10] The Titan paper added two distinctive training innovations: a self-supervised adversarial loss intended to make the model distinguish generated from real text, and a controllable language modelling loss that conditioned generation on attributes such as genre or sentiment.[^10] To control the compute footprint of distillation, the paper also proposed an online distillation framework in which the teacher continued training while teaching students.[^10] The 260-billion-parameter checkpoint was contemporary with rivals such as the Yuan 1.0 model and Huawei's Pangu-α and Pangu-Σ series.
On March 16, 2023 Baidu cofounder and chief executive Robin Li unveiled Wenxin Yiyan (文心一言), branded in English as ERNIE Bot, at a press event in Beijing.[^2] The product, positioned as the company's ChatGPT competitor, launched as an invited beta and was demonstrated through prerecorded video clips covering mathematical reasoning, marketing copy, Chinese literary trivia, and multimodal output, including text-to-image illustrations and audio in regional Chinese dialects.[^2] MIT Technology Review reported that Baidu's Hong Kong-listed shares fell 6.4 percent on the launch day, that 650 enterprise partners had pre-registered, and that over 30,000 companies had applied for API access by the close of the event.[^2] The Wenxin Yiyan brand sat above earlier ERNIE foundation-model versions; Baidu's public communications described ERNIE Bot as powered initially by the ERNIE 3.0 family and quickly upgraded to ERNIE 3.5 in June 2023.[^11]
ERNIE Bot was opened to the general public in mainland China on August 31, 2023, after Baidu became one of the first companies to receive approval under China's interim measures for generative AI services that took effect on August 15, 2023.[^6] Reuters and TechNode reported that this regulatory approval was a precondition for mass-market rollout in China, where, unlike in most Western jurisdictions, consumer AI services require security assessments.[^6] By December 2023 Baidu's published figures put ERNIE Bot users above 100 million; the company subsequently reported approximately 200 million users in April 2024 and 300 million by mid-2024.[^11]
On October 17, 2023, at the in-person Baidu World 2023 conference in Beijing, Baidu launched the ERNIE 4.0 foundation model.[^12] Baidu chief technology officer Haifeng Wang framed ERNIE 4.0 as substantially better than ERNIE 3.5 across four axes the company calls understanding, generation, reasoning, and memory, claiming a roughly 30 percent overall improvement during beta testing.[^12] The launch was tied to a re-platforming of Baidu's flagship consumer products, with Baidu Search, Baidu Wenku, Baidu Drive, Baidu Maps, the Infoflow workplace app, and the Baidu GBI business-intelligence product all being rebuilt around generative AI, and the Qianfan Foundation Model Platform offering 42 foundation models to enterprise customers.[^12] No parameter count was publicly disclosed.[^12]
On June 28, 2024 Baidu announced ERNIE 4.0 Turbo, presented as a faster, cheaper, higher-quality upgrade.[^13] Baidu disclosed that ERNIE 4.0 Turbo was priced at 30 RMB (about USD 4.13) per million input tokens and 60 RMB per million output tokens, and that price cuts of up to 83 percent had been applied to earlier ERNIE tiers to remain competitive.[^13] The same announcement was accompanied by PaddlePaddle 3.0, Baidu's deep-learning framework upgrade.[^13]
In May 2024 Baidu announced that the lightweight tier models ERNIE Speed and ERNIE Lite were free for API users through Qianfan, marking the company's most aggressive move in the Chinese model-pricing war.[^5] Baidu followed in early 2025 with an announcement that consumer ERNIE Bot would become free for individual users by April 1, 2025, a date later brought forward to coincide with the ERNIE 4.5 launch.[^14]
On March 16, 2025, exactly two years after the original Wenxin Yiyan press event, Baidu announced two models simultaneously: ERNIE 4.5, the new flagship multimodal foundation model, and ERNIE X1, the company's first deep-thinking reasoning model.[^7] ERNIE 4.5 was positioned as a native multimodal model trained jointly on text, images, audio, and video; ERNIE X1 was framed as Baidu's response to DeepSeek-R1, offering similar reasoning quality at roughly half the API cost.[^7][^15] At the same event Baidu announced that ERNIE Bot would become free for individual users ahead of its previously scheduled April 1 date.[^7]
At its Create 2025 developer conference in April 2025, Baidu announced ERNIE 4.5 Turbo and ERNIE X1 Turbo, again advertising aggressive price reductions of up to 80 percent versus the standard 4.5 tier.[^16] On June 30, 2025 Baidu released the ERNIE 4.5 model family as open-source under the Apache 2.0 licence through Hugging Face, GitHub and the PaddlePaddle ecosystem.[^4] The release comprised 10 models, including mixture-of-experts (MoE) variants with up to 424 billion total parameters and roughly 47 billion active parameters per token, alongside a 3-billion-active MoE and a 0.3-billion dense model.[^4]
At Baidu World 2025 on November 13, 2025 Baidu announced ERNIE 5.0, described as a native multimodal model with up to 2.4 trillion parameters capable of jointly understanding and generating text, images, audio and video.[^5] As of the time of the announcement, public technical documentation focused on capability demonstrations rather than architecture papers, in keeping with the company's recent shift away from publishing model details for its flagship systems.
The original 2019 ERNIE paper kept the Transformer encoder architecture of BERT but replaced the random subword masking objective with a three-stage knowledge-integration masking schedule.[^1] In the first stage, basic word-level masking trains the model the way BERT does. In the second stage, phrase-level masking masks contiguous spans identified as phrases by chunking tools and lexicons. In the third stage, entity-level masking masks spans identified by a named entity recognizer.[^1] The intent is that to reconstruct a masked entity such as a country name or scientist's name, the model must learn about that entity's relationships from context rather than from surface co-occurrence.[^1]
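The three masking stages described above can be illustrated with a minimal sketch. This is not Baidu's implementation: the helper `mask_spans`, the example sentence, the hand-written span boundaries, and the 15 percent masking rate (borrowed from common BERT practice) are all assumptions made for illustration. In a real pipeline the phrase and entity spans would come from a chunker, a lexicon, or a named entity recognizer.

```python
import random

MASK = "[MASK]"


def mask_spans(tokens, spans, mask_prob, rng):
    """Mask each (start, end) span as one unit with probability mask_prob.

    `spans` are half-open index ranges into `tokens`. Returns the masked
    sequence and a dict mapping each masked position to its original token.
    """
    masked = list(tokens)
    targets = {}
    for start, end in spans:
        if rng.random() < mask_prob:
            for i in range(start, end):
                targets[i] = tokens[i]
                masked[i] = MASK
    return masked, targets


# Illustrative sentence; span boundaries below are written by hand.
tokens = ["Harry", "Potter", "is", "a", "series", "of", "fantasy", "novels",
          "written", "by", "J.", "K.", "Rowling"]

# Stage 1: every token is its own span (BERT-style basic masking).
word_spans = [(i, i + 1) for i in range(len(tokens))]
# Stage 2: phrase spans, assumed to come from a chunker or lexicon.
phrase_spans = [(3, 6), (6, 8), (8, 10)]
# Stage 3: entity spans, assumed to come from a named entity recognizer.
entity_spans = [(0, 2), (10, 13)]

rng = random.Random(0)
stages = [("word", word_spans), ("phrase", phrase_spans), ("entity", entity_spans)]
for name, spans in stages:
    masked, targets = mask_spans(tokens, spans, mask_prob=0.15, rng=rng)
    print(name, " ".join(masked))
```

The only difference between the stages is the granularity of the spans: when an entity span is selected, all of its tokens disappear together, so the model cannot recover "Rowling" simply from seeing "J." and "K." beside it.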
ERNIE 2.0 retained the encoder architecture but introduced a sequence-of-tasks design.[^9] Each new pretraining task is added to a continually growing pool, and at every step the model is trained on a mix of all current tasks with their losses summed.[^9] Tasks are deliberately heterogeneous: knowledge masking (lexical), capitalization prediction (lexical), token-document relation prediction (lexical), sentence reordering and sentence distance (structural), and discourse and information-retrieval relevance (semantic).[^9] The authors contrast this setting with training on one task after another, where learning each new task tends to cause catastrophic forgetting of the earlier ones.
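A simplified sketch of the growing task pool follows. It shows only the scheduling idea: each stage introduces one task and keeps training on all earlier ones, with the active losses summed. The function names, the toy loss functions, and the fixed number of steps per stage are hypothetical; the paper's actual allocation of training iterations across tasks is not reproduced here.

```python
from typing import Callable, Dict, List

# A "task" here is a hypothetical function mapping a step number to a scalar loss.
TaskLoss = Callable[[int], float]


def continual_multitask_schedule(
    task_order: List[str], steps_per_stage: int
) -> List[List[str]]:
    """Return, for every training step, the list of tasks active at that step.

    Stage k introduces task k and keeps all previously introduced tasks active.
    """
    schedule = []
    for stage in range(len(task_order)):
        active = task_order[: stage + 1]
        schedule.extend([list(active) for _ in range(steps_per_stage)])
    return schedule


def summed_loss(active: List[str], losses: Dict[str, TaskLoss], step: int) -> float:
    """Sum the losses of all currently active tasks."""
    return sum(losses[name](step) for name in active)


task_order = ["knowledge_masking", "sentence_reordering", "ir_relevance"]
# Toy losses that shrink with the step count; the weights are arbitrary.
toy_losses = {
    name: (lambda step, w=i + 1: w / (step + 1))
    for i, name in enumerate(task_order)
}

schedule = continual_multitask_schedule(task_order, steps_per_stage=2)
for step, active in enumerate(schedule):
    print(step, active, round(summed_loss(active, toy_losses, step), 3))
```

Because earlier tasks never leave the pool, their losses keep contributing gradients after new tasks arrive, which is the property the continual multi-task design relies on to avoid forgetting.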
ERNIE 3.0 added explicit support for natural language generation by stacking a top "task-specific" set of Transformer layers on a shared "universal representation" backbone.[^3] The universal backbone is bidirectional; the task-specific layers are unidirectional for generation and bidirectional for understanding.[^3] During pretraining the model is trained jointly on autoregressive next-token prediction and autoencoding masked-token prediction, and structured knowledge from a knowledge graph is integrated through "universal knowledge-text prediction", a task that masks tokens in triples drawn from Baidu's knowledge graph and asks the model to predict them given the surrounding free text, or vice versa.[^3] This is the core mechanism through which ERNIE distinguishes itself from purely text-pretrained models.
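The distinction between bidirectional and unidirectional layers is commonly expressed as an attention mask. The sketch below shows the two mask shapes for a short sequence; it illustrates the general technique rather than ERNIE 3.0's code, and the function names are hypothetical.

```python
from typing import List


def bidirectional_mask(n: int) -> List[List[int]]:
    """Every position may attend to every position (understanding-style layers)."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Position i may attend only to positions 0..i (generation-style layers)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


for name, mask in [("bidirectional", bidirectional_mask(4)), ("causal", causal_mask(4))]:
    print(name)
    for row in mask:
        print(" ".join(str(v) for v in row))
```

A 1 in row i, column j means token i can see token j. The lower-triangular causal mask prevents a token from seeing later tokens, which is what makes next-token prediction a meaningful training objective.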
ERNIE 3.0 Titan grew the network to 260 billion parameters, trained on roughly 4 terabytes of Chinese text covering 11 source types.[^10] Two additional losses were used: a self-supervised adversarial loss in which the model is trained to discriminate between text it produced and text drawn from the training distribution, intended to reduce repetition and hallucination; and a controllable language modelling loss that conditions generation on attribute tokens such as genre, sentiment, length, or topic.[^10] The paper also proposed an online distillation pipeline that trains many smaller student models from the Titan teacher in parallel, sharing forward passes to reduce compute cost.[^10]
Baidu extended the family to vision and language with ERNIE-ViL, a vision-language pretraining model, and ERNIE-ViLG, an early text-to-image model.[^17] ERNIE-ViLG 2.0, posted to arXiv on October 27, 2022 and accepted to CVPR 2023, is a 24-billion-parameter diffusion model for Chinese text-to-image generation.[^17] It departs from a single Stable Diffusion-style UNet by using a mixture-of-denoising-experts architecture in which different expert networks are applied at different stages of the denoising schedule, and by injecting fine-grained scene knowledge into the cross-attention pathway.[^17] On MS-COCO it reported a state-of-the-art zero-shot FID-30k of 6.75 at the time of publication.[^17]
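The idea of applying different experts at different stages of the denoising schedule can be sketched as a routing function from timestep to expert index. The equal-sized contiguous blocks, the function name, and the example values of 1,000 steps and 10 experts are assumptions made for illustration, not details taken from the text above.

```python
def expert_for_timestep(t: int, total_steps: int, num_experts: int) -> int:
    """Map denoising timestep t (0 <= t < total_steps) to an expert index.

    The schedule is split into num_experts contiguous blocks of roughly
    equal size; every timestep inside a block uses the same expert.
    """
    if not 0 <= t < total_steps:
        raise ValueError("t must satisfy 0 <= t < total_steps")
    return t * num_experts // total_steps


# Illustrative configuration: 1,000 timesteps shared among 10 experts.
for t in (0, 99, 100, 550, 999):
    print(t, expert_for_timestep(t, total_steps=1000, num_experts=10))
```

Only one expert runs at any given timestep, so the total parameter count can grow with the number of experts while the per-step inference cost stays close to that of a single denoiser.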
The ERNIE 4.5 family released on June 30, 2025 mixes dense and MoE models.[^4] The largest configuration uses 424 billion total parameters with 47 billion active per token; a second MoE size uses 3 billion active parameters; and a 0.3-billion dense model targets edge inference.[^4] Baidu described the line as natively multimodal, accepting interleaved text, image, audio, and video inputs, and shipped it with ERNIEKit, a PaddlePaddle-based training toolkit covering pre-training, supervised fine-tuning, LoRA, DPO, and quantization-aware training.[^4]
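As a small worked example of what total versus active parameters means for the largest configuration, the following computes the share of the network used per token from the two figures quoted above. The function name is hypothetical, and the result is only a rough proxy for per-token compute.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters used per token in an MoE model."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_params_b / total_params_b


# Figures quoted in the text: 424 billion total, 47 billion active per token.
fraction = active_fraction(424, 47)
print(f"{fraction:.1%}")  # prints 11.1%
```

Roughly one parameter in nine participates in any single token's forward pass, which is why an MoE model of this size can be served at a cost closer to that of a much smaller dense model.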
| Variant | First public reference | Notes |
|---|---|---|
| ERNIE 1.0 | arXiv 1904.09223, Apr 2019[^1] | Knowledge-masking BERT for Chinese |
| ERNIE 2.0 | arXiv 1907.12412, Jul 2019[^9] | Continual multi-task pretraining (AAAI 2020) |
| ERNIE 3.0 | arXiv 2107.02137, Jul 2021[^3] | 10B parameter unified LM, SuperGLUE first place |
| ERNIE 3.0 Titan | arXiv 2112.12731, Dec 2021[^10] | 260B parameters, Baidu and PCL |
| ERNIE-ViLG 2.0 | arXiv 2210.15257, Oct 2022[^17] | 24B-parameter text-to-image diffusion |
| ERNIE Bot / Wenxin Yiyan | March 16, 2023[^2] | Consumer chatbot launch |
| ERNIE 3.5 | June 2023[^11] | Backend upgrade behind ERNIE Bot |
| ERNIE 4.0 | October 17, 2023[^12] | Baidu World 2023, AI-native products |
| ERNIE Speed / ERNIE Lite | May 2024[^5] | Free lightweight API tiers on Qianfan |
| ERNIE 4.0 Turbo | June 28, 2024[^13] | Faster, cheaper variant |
| ERNIE 4.5 | March 16, 2025[^7] | Native multimodal flagship |
| ERNIE X1 | March 16, 2025[^7] | Deep-thinking reasoning model |
| ERNIE 4.5 Turbo / X1 Turbo | April 2025[^16] | Up-to-80-percent price cut tier |
| ERNIE 4.5 open-source | June 30, 2025[^4] | Apache 2.0, 10 model variants |
| ERNIE 5.0 | November 13, 2025[^5] | 2.4-trillion-parameter multimodal model |
The Qianfan (千帆) Foundation Model Platform is the cloud-side distribution point for these models, offering them through Baidu AI Cloud with API access, fine-tuning, dataset management, prompt engineering, and an "AppBuilder" layer for assembling enterprise applications.[^5] The platform also hosts third-party open-source models alongside ERNIE.[^5]
Baidu reports that ERNIE Bot reached over 100 million users by December 2023, 200 million by April 2024, and 300 million by mid-2024.[^11] Beyond the chatbot, ERNIE is woven into Baidu's existing consumer franchise: it powers generative summaries in Baidu Search, AI features inside Baidu Maps, AI-assisted document creation in Baidu Wenku, and Wenku-style retrieval over the Baidu Drive cloud-storage service.[^12] Enterprise deployments run through Qianfan, which Baidu describes as the first one-stop enterprise foundation-model platform in China and which is closely integrated with the PaddlePaddle deep-learning framework.[^5][^8] On the device side, Baidu announced an integration in which ERNIE provided the AI features available to mainland China users of Samsung Galaxy S24 smartphones in early 2024.[^11]
The Qianfan distribution mode is significant because it inverts the architecture in which the chatbot is the visible product and the model is hidden. With Qianfan, Baidu monetises the underlying ERNIE family in three concentric layers: a free tier (ERNIE Speed, ERNIE Lite) intended to acquire developers and small companies; a paid Turbo tier (ERNIE 4.0 Turbo, 4.5 Turbo, X1 Turbo) priced aggressively against DeepSeek and OpenAI to capture cost-sensitive enterprise traffic; and a managed-services AppBuilder layer for system integrators.[^5][^16] Public Baidu disclosures position ERNIE as the dominant traffic source on the Qianfan platform, with third-party open-weight models (such as Llama derivatives and locally fine-tuned Chinese checkpoints) available as alternatives but not as the default.[^5]
In academic and developer benchmark coverage, ERNIE has been compared to BERT, XLNet, GPT-4, GPT-4.5, DeepSeek V3, and DeepSeek-R1 depending on the version. ERNIE 1.0 and 2.0 were the first major Chinese-origin pretrained transformers to outperform BERT on Chinese benchmarks; ERNIE 3.0 was the first non-English-origin model to top SuperGLUE; and ERNIE 4.5 in 2025 has been benchmarked against GPT-4.5 in Baidu's own marketing literature.[^1][^9][^3][^7]
The ERNIE lineage sits within a small group of Chinese-domestic foundation-model families, each with a distinct strategy:
| Company | Model family | Approach |
|---|---|---|
| Baidu | ERNIE / Wenxin | Long-running knowledge-integration line, vertically integrated with search and PaddlePaddle, monetized via Qianfan API and consumer ERNIE Bot.[^1][^5] |
| Alibaba | Tongyi Qianwen / Qwen | Aggressive open-weights strategy from 2023 onward, with Qwen3 series widely deployed internationally.[^18] |
| Tencent | Hunyuan | Closed-source models initially, expanding to open multimodal and video variants. |
| ByteDance | Doubao | Consumer-first chatbot strategy, leveraging ByteDance's content distribution. |
| DeepSeek (Hangzhou) | DeepSeek-R1, DeepSeek V3 | Specialist research startup with aggressive open-weights releases. |
Baidu's defining strategy across this period has been to position ERNIE as the company's flagship under the unified "Wenxin" (文心, "literary heart") family brand and to bind it tightly to its Qianfan cloud and PaddlePaddle framework.[^5][^8] Reuters, Bloomberg and Financial Times reporting through 2023 and 2024 characterised the company as the early Chinese mover (releasing a ChatGPT competitor before regulators authorised public access) but as one losing ground to newer entrants such as DeepSeek until the March 2025 ERNIE 4.5 and X1 announcements.[^7][^15] Baidu's response to DeepSeek-R1 in March 2025, including a reasoning model and aggressive price cuts, was widely framed by analysts as a strategic shift toward open-source release and commodity pricing.[^4][^7][^15]
ERNIE is the model layer of a stack that Baidu commercialises in several ways. The lowest layer is the PaddlePaddle deep-learning framework, which Baidu has positioned as a domestic competitor to PyTorch and TensorFlow since 2016, and which provides the training and serving foundation for every ERNIE model.[^8] On top sits Baidu AI Cloud, the public-cloud business through which ERNIE compute is sold. The Qianfan Foundation Model Platform layers model-management, fine-tuning, and AppBuilder workflows on top of the cloud.[^5] At the application surface, Wenxin Yiyan / ERNIE Bot is the consumer-facing chatbot, while embedded ERNIE features appear in Baidu Search, Maps, Wenku, Drive, GBI, Infoflow, and the Apollo autonomous-driving stack.[^12]
This vertical integration is in contrast to the strategies of several peers. Alibaba has emphasised open weights via Qwen and Tongyi Qianwen, distributing checkpoints internationally through Hugging Face. ByteDance's Doubao has been positioned mainly as a consumer chat assistant inside the ByteDance app ecosystem rather than as a developer cloud product. Tencent's Hunyuan family has emphasised multimodal video and 3D variants (such as HunyuanVideo and Hunyuan 3D) and ties to the WeChat ecosystem. DeepSeek is a research-led startup with no consumer app, releasing weights aggressively to attract developer attention. Baidu, by contrast, runs the consumer chatbot, the cloud, the framework, and the model line in-house.[^5][^8][^11]
This integration was reportedly a constraint on Baidu's early agility. Reuters and Bloomberg coverage in 2024 noted that consumer adoption of ERNIE Bot grew steadily but was outpaced in absolute downloads by Doubao, and that the January 2025 release of DeepSeek-R1 forced Baidu to accelerate its own pricing and open-source plans.[^14][^15] The March 2025 ERNIE 4.5 and ERNIE X1 dual announcement, with simultaneous price cuts and free consumer access, and the June 2025 open-sourcing of ERNIE 4.5, mark the strategic pivot toward a more aggressive distribution posture.[^4][^7]
Several persistent criticisms apply across the ERNIE history: