The Beijing Academy of Artificial Intelligence (BAAI, Chinese: 北京智源人工智能研究院, often abbreviated 智源研究院 or Zhiyuan Research Institute) is a Chinese nonprofit artificial intelligence research organization based in Beijing. Founded on November 14, 2018 with backing from the Beijing Municipal Science and Technology Commission and the Zhongguancun Science Park Management Committee, BAAI has become one of China's most influential AI research institutes. The academy is best known for the Wu Dao large model series, including the 2021 Wu Dao 2.0 trillion-parameter model, and for the BGE (BAAI General Embedding) family, which has become one of the most downloaded open-source embedding model families on Hugging Face. BAAI also runs an annual conference that has grown into one of the most-watched AI gatherings in Asia.
| Field | Detail |
|---|---|
| Name | Beijing Academy of Artificial Intelligence (BAAI) |
| Chinese name | 北京智源人工智能研究院 |
| Founded | November 14, 2018 |
| Founder | Beijing Municipal Science and Technology Commission; Zhongguancun Science Park Management Committee |
| Founding chairman | Zhang Hongjiang |
| Type | Nonprofit AI research institute |
| Headquarters | Wudaokou, Haidian District, Beijing, China |
| President (current) | Wang Zhongyuan |
| Chairman / Dean | Huang Tiejun |
| Focus areas | Large language models, multimodal AI, embeddings, embodied AI, AI safety, foundation model research |
| Notable models | Wu Dao 1.0/2.0/3.0, Aquila / Aquila2, Emu / Emu2 / Emu3, BGE family, SegGPT, Painter |
| Open-source ecosystem | FlagOpen, FlagAI, FlagEmbedding |
| Website | baai.ac.cn |
BAAI was inaugurated on November 14, 2018 in Beijing's Haidian District as part of the Beijing AI Action Plan. The academy was established with support from the Ministry of Science and Technology and the Beijing municipal government, and pulled together a coalition of academic and industry partners that included Peking University, Tsinghua University, the Chinese Academy of Sciences, Baidu, ByteDance, Megvii, Meituan-Dianping, and Xiaomi. Zhang Hongjiang, a former CEO of Kingsoft and former managing director of Microsoft's Advanced Technology Center, was named founding chairman.
The organization was set up as a nonprofit. Its stated mission centers on long-term, fundamental AI research and on linking academic labs with the Chinese tech industry, with the Beijing municipal government providing the bulk of the funding.
In April 2019, BAAI launched the BAAI Scholars Program, which was designed to back roughly 100 senior AI researchers each year. The next month, the academy released the Beijing AI Principles (also referred to in some translations as the Beijing Consensus on Artificial Intelligence), a public statement on the responsible development and use of AI signed by Chinese universities, companies, and government bodies.
The first BAAI Conference was held in October 2019. From the start it was framed as an international event, and the academy used it to showcase its ongoing work and to host visiting researchers from outside China.
Through 2020 and into early 2021, BAAI shifted a large share of its research budget toward very large pre-trained models. On March 20, 2021, the academy released Wu Dao 1.0, presented as China's first large-scale pre-trained model system. The work was led by Tang Jie, a Tsinghua University professor and BAAI's academic vice president, with contributions from more than 100 researchers across Peking University, Tsinghua, Renmin University of China, and the Chinese Academy of Sciences. Wu Dao 1.0 comprised four models: Wen Yuan for language, Wen Lan for vision-language, Wen Hui for cognitive tasks, and Wen Su for protein structure prediction.
A few months later, on June 1, 2021, BAAI announced Wu Dao 2.0 at its annual conference. With 1.75 trillion parameters, it was, at the time of release, the largest neural network ever publicly disclosed. Crucially, it used a mixture of experts (MoE) architecture rather than the dense transformer used by GPT-3, which let the team push past a trillion parameters without a proportional jump in compute. Wu Dao 2.0 was also multimodal, trained on roughly 4.9 terabytes of text and images, with about 1.2 TB of Chinese text and 1.2 TB of English text in the mix. The model became a regular reference point in 2021 commentary about a closing US-China gap in foundation model research.
After Wu Dao 2.0, BAAI shifted toward a portfolio strategy, releasing many smaller open-weight models rather than chasing a single record-setting one. In mid-2023 the academy released the Aquila bilingual LLM family, including 7-billion-parameter base and chat models, followed in late 2023 by the Aquila2 series with 7B, 34B, and an experimental 70B variant. Around the same time, BAAI's vision team released the first Emu multimodal model, then the 37-billion-parameter Emu2 (accepted at CVPR 2024), and later Emu3, which framed image, text, and video as a single next-token prediction problem.
In August 2023, BAAI released the first version of the BGE embedding family, followed by BGE v1.5 in September of the same year. BGE quickly climbed the MTEB and C-MTEB leaderboards and became one of the most-cited open-source embedding model series. The BGE-M3 release in early 2024 added multilingual coverage (more than 100 languages), an 8,192-token context, and a single model that handles dense, sparse, and multi-vector retrieval at once.
Alongside the model releases, BAAI launched FlagOpen, an umbrella for its open-source projects (FlagAI, FlagEmbedding, FlagPerf, FlagScale, and others). On June 12, 2023, at the 5th BAAI Conference, the academy announced the Wu Dao 3.0 series, which was less a single model than a refresh of the language, vision, and other research lines under one banner.
In 2024 the academy turned more attention to embodied intelligence, biological computing models, and robotics, with FlagOpen 2.0 announced at the 6th BAAI Conference in June 2024. President Wang Zhongyuan, who had earlier worked at Microsoft Research Asia, Facebook, Kuaishou, and Meituan, used the conference to lay out the year's roadmap.
On March 25, 2025, the U.S. Department of Commerce's Bureau of Industry and Security added BAAI to its Entity List, citing alleged work supporting Chinese military modernization. The listing restricts U.S. firms from exporting certain technologies to the academy without a license. BAAI has continued to publish open models on Hugging Face and to host its annual conference; the 2025 BAAI Conference opened on June 6, 2025 at the Zhongguancun National Independent Innovation Demonstration Zone exhibition center, where Wang Zhongyuan announced a new WuJie (悟界) series of large models focused on the physical world.
BAAI is registered as a nonprofit and is funded primarily by the Beijing municipal government, with additional support tied to specific national science and technology programs. It is headquartered in the Wudaokou area of Haidian District, Beijing, close to Tsinghua and Peking University, and reported around 100 employees as of 2023.
Rather than acting as a traditional university lab, BAAI operates as a hub. Most of its researchers hold joint appointments at partner universities (Tsinghua, Peking, Renmin, and the Chinese Academy of Sciences are the most active partners) or come from industry on secondment. The academy organizes its work into research centers covering language, vision, multimodal learning, foundation model evaluation, AI safety, and more recently embodied intelligence and bio-computing. The FlagOpen ecosystem and a separate evaluation arm (FlagEval) cut across these centers.
Industry partners include Alibaba, Tencent, Baidu, ByteDance, Megvii, Meituan, and Xiaomi, all of which were associated with BAAI's founding coalition. The academy also works with regional government labs, including a January 2025 joint laboratory agreement with the Shenzhen-based Leju Robot.
BAAI's leadership has rotated several times since 2018. The academy distinguishes between a chairman/dean role, which has tended to go to a senior academic, and a president/director role focused on day-to-day research operations.
| Period | Role | Person | Notes |
|---|---|---|---|
| 2018 onward | Founding chairman | Zhang Hongjiang | Former CEO of Kingsoft, former MD of Microsoft Advanced Technology Center, elected to U.S. National Academy of Engineering in 2022 |
| Early years | First dean | Huang Tiejun | Peking University professor, vision and neuromorphic computing researcher |
| 2018 onward | Academic vice president | Tang Jie | Tsinghua University professor; led Wu Dao 1.0 and Wu Dao 2.0 |
| Recent | Chairman / Dean | Huang Tiejun | Continues in a senior leadership role at the academy |
| Around 2024 onward | President | Wang Zhongyuan | Former Microsoft Research Asia, Facebook, Kuaishou, and Meituan |
| Recent | Director | Zhu Songchun | Peking University professor; spent 14 years at UCLA before returning to China in 2020 |
Tang Jie, who is also one of the creators of the ChatGLM language model line associated with Tsinghua spin-off Zhipu AI, has long been the most public technical face of BAAI's large-model work.
The Wu Dao series is what put BAAI on the map outside China. Wu Dao 1.0, released March 20, 2021, was a four-part family: a 2.6-billion-parameter language model (Wen Yuan), a 1-billion-parameter vision-language model trained on 50 million image-text pairs (Wen Lan), an 11.3-billion-parameter cognitive model (Wen Hui), and a protein folding model (Wen Su) built on top of BERT.
Wu Dao 2.0, announced on June 1, 2021, jumped to 1.75 trillion parameters via a mixture of experts architecture. The team used FastMoE, an open-source MoE library that does not require Google's TPU stack, which was part of the academy's pitch that very large models did not have to be locked behind proprietary infrastructure. Training data totaled around 4.9 TB and was multimodal, mixing roughly 1.2 TB each of Chinese and English text with image and dialogue data. Wu Dao 2.0 was demonstrated writing essays and classical Chinese poetry, generating images from text, captioning images, and predicting protein structures. The academy also unveiled Hua Zhibing, a virtual student powered by the model, around the same time.
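The compute advantage of the MoE approach can be illustrated with a minimal top-k routing sketch. This is an assumption-laden toy, not FastMoE or the Wu Dao 2.0 configuration (which is not public at this level of detail): a gate scores every expert per token, but only the top-k experts actually run, so per-token compute scales with k rather than with the total expert count.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(token, experts, gate_weights, k=2):
    """Route one token through the top-k experts only (sparse compute)."""
    # One gate score per expert.
    scores = [sum(w * x for w, x in zip(gw, token)) for gw in gate_weights]
    probs = softmax(scores)
    # Keep only the top-k experts; the rest contribute nothing.
    topk = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    out = [0.0] * len(token)
    for i in topk:
        # Each "expert" here is just a small linear map.
        expert_out = [sum(w * x for w, x in zip(row, token))
                      for row in experts[i]]
        out = [o + probs[i] * e for o, e in zip(out, expert_out)]
    return out, topk

random.seed(0)
dim, n_experts = 4, 8
experts = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
           for _ in range(n_experts)]
gate_weights = [[random.gauss(0, 1) for _ in range(dim)]
                for _ in range(n_experts)]

out, used = moe_forward([0.5, -1.0, 2.0, 0.1], experts, gate_weights, k=2)
print(len(used))  # only 2 of the 8 experts run for this token
```

Scaling total parameters then means adding experts, not widening the per-token computation, which is how a trillion-parameter model can stay trainable on a fixed compute budget.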
Wu Dao 3.0 was launched on June 12, 2023 at the 5th BAAI Conference. It was less a single model than a relaunch of several research lines, including the Aquila language models, the Vision team's work, and updated multimodal systems, all positioned under the Wu Dao 3.0 brand.
The Aquila series is BAAI's open-weight bilingual LLM line. The first release covered Aquila-7B (a base model) and AquilaChat-7B (a chat-tuned variant), trained from scratch on a roughly 60-40 English-Chinese corpus. The architecture borrows from GPT-3 and LLaMA but uses a redesigned tokenizer for Chinese and English and a set of more efficient operator implementations.
Aquila2, released in late 2023, expanded the line with Aquila2-7B and Aquila2-34B base models, AquilaChat2-7B and AquilaChat2-34B chat models, AquilaSQL for natural-language-to-SQL, and the experimental Aquila2-70B-Expr and AquilaChat2-70B-Expr. AquilaChat2-34B-16K extended context to 16K tokens. The model code is Apache 2.0 licensed; the model weights themselves are under a custom BAAI Aquila Model License Agreement.
The Emu series comes out of the BAAI Vision team and is BAAI's main answer to the GPT-4V class of multimodal models. The first Emu was a generative multimodal model trained on interleaved image-text sequences. Emu2 scaled to 37 billion parameters with a unified autoregressive objective and was accepted at CVPR 2024. The team highlighted Emu2's in-context multimodal learning, including visual prompting and object-grounded generation.
Emu3, released in 2024, abandoned the diffusion or compositional architectures common in the field. Instead, it tokenizes images, text, and videos into a single discrete space and trains a single transformer with next-token prediction. BAAI reported that Emu3 outperformed several task-specific models, including SDXL on text-to-image generation, LLaVA-1.6 on vision-language understanding, and OpenSora-1.2 on video generation, all without relying on a CLIP encoder or pretrained LLM. Emu3.5 followed, pre-trained on more than 10 trillion interleaved tokens drawn from video frames and transcripts, and added support for interleaved image-text generation and stronger spatiotemporal modeling.
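The unified next-token formulation can be sketched in a few lines. This is an illustrative assumption, not the released Emu3 code, and the vocabulary sizes are hypothetical: each modality's tokens are offset into disjoint ranges of one shared vocabulary, so the transformer sees a single flat stream regardless of whether a position holds text or image content.

```python
# Hypothetical vocabulary sizes for illustration only.
TEXT_VOCAB = 32000
VISION_VOCAB = 8192

def text_token(tid):
    # Text ids occupy [0, TEXT_VOCAB).
    return tid

def vision_token(vid):
    # Vision ids are offset to [TEXT_VOCAB, TEXT_VOCAB + VISION_VOCAB).
    return TEXT_VOCAB + vid

# An interleaved sample: caption tokens followed by image tokens.
sequence = ([text_token(t) for t in (17, 912, 4051)]
            + [vision_token(v) for v in (5, 777, 4096)])

# Next-token prediction treats the merged stream uniformly:
# each token is trained to predict its successor, whatever the modality.
pairs = list(zip(sequence[:-1], sequence[1:]))
print(len(pairs))  # 5 (input, target) pairs from a 6-token sequence
```

The design choice this illustrates is that once every modality lives in one discrete space, no diffusion head, CLIP encoder, or per-task decoder is needed; a single autoregressive objective covers generation and understanding alike.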
The BAAI General Embedding family (BGE) is, by download volume, BAAI's most-used contribution. The first BGE models appeared in August 2023, with the v1.5 update in September 2023 fixing the similarity score distribution and improving retrieval without requiring an explicit instruction prefix.
| Model | Size | Embedding dim | Notes |
|---|---|---|---|
| BGE-large-en-v1.5 | ~335M params | 1024 | Topped MTEB at release; over 14 million Hugging Face downloads per month as of 2024 |
| BGE-base-en-v1.5 | ~110M params | 768 | Smaller English variant |
| BGE-small-en-v1.5 | ~33M params | 384 | Lightweight variant |
| BGE-large-zh-v1.5 | ~335M params | 1024 | Chinese counterpart, top of C-MTEB at release |
| BGE-M3 | ~568M params | 1024 | Multilingual (100+ languages), 8K context, dense + sparse + multi-vector retrieval in one model |
| BGE-reranker-base / large | ~278M / ~559M | n/a | Cross-encoders for re-ranking retrieval results |
| BGE-VL | 0.1B / 0.4B / 4B / 8B variants | varies | Vision-language embedding models for multimodal retrieval |
BGE has been integrated into many retrieval-augmented generation stacks, vector databases, and LangChain-style frameworks, and it is offered through services like NVIDIA NIM. The code lives in the FlagEmbedding repository, distributed under the MIT license.
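The retrieval pattern BGE models serve can be shown with a self-contained sketch. In a real RAG stack the vectors below would come from a model such as BAAI/bge-large-en-v1.5 (via the FlagEmbedding library, for example); the tiny hand-written vectors here are stand-ins so the ranking logic runs without downloading weights.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Stand-in document embeddings (a real system would store model outputs,
# typically 1024-dimensional for BGE-large, in a vector database).
corpus = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # stand-in query embedding

# Rank documents by cosine similarity to the query embedding;
# the top hits are then fed to an LLM as retrieved context.
ranked = sorted(corpus, key=lambda d: cosine(query, corpus[d]), reverse=True)
print(ranked[0])  # doc_a is closest to the query
```

Cross-encoder rerankers like BGE-reranker are typically applied after this first dense pass, re-scoring only the top candidates, since they are more accurate but too slow to run over a whole corpus.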
The BAAI Vision team's Painter (CVPR 2023) and SegGPT (ICCV 2023) are paired generalist vision models. Painter is framed as a general-purpose model for in-context visual learning across a wide set of dense prediction tasks. SegGPT specializes that idea for segmentation: a single decoder-only transformer takes an input image plus a prompt image and a prompt mask, and outputs a segmentation mask for the input. Training is set up as an in-context coloring problem with random color mappings for each sample, which forces the model to read the prompt and not memorize fixed colors.
SegGPT supports object instance, stuff, part, contour, and text segmentation in both images and video, and reported one-shot scores of 56.1 mIoU on COCO-20i and 85.6 mIoU on FSS-1000. The model was added to Hugging Face Transformers as an officially supported model.
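The random-color-mapping trick can be illustrated with a toy sketch (an assumption-based illustration, not the SegGPT training code): for each training sample, segment labels are mapped to freshly sampled colors, and the same per-sample mapping is applied to both the prompt mask and the target mask, so the model must read the color scheme off the prompt rather than memorize a fixed label-to-color table.

```python
import random

def remap_colors(mask, palette, rng):
    """Map each segment label in `mask` to a randomly chosen color."""
    labels = sorted({v for row in mask for v in row})
    colors = rng.sample(palette, len(labels))  # distinct colors per label
    mapping = dict(zip(labels, colors))
    return [[mapping[v] for v in row] for row in mask], mapping

rng = random.Random(0)
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]

# Toy masks: label 0 = background, label 1 = object.
prompt_mask = [[0, 0, 1], [0, 1, 1]]
target_mask = [[1, 0, 0], [1, 1, 0]]

colored_prompt, mapping = remap_colors(prompt_mask, palette, rng)
# The SAME sample-specific mapping colors the target the model must predict.
colored_target = [[mapping[v] for v in row] for row in target_mask]
```

Because the mapping is resampled per training example, the only way to predict the target's colors is to infer the scheme from the prompt pair, which is exactly the in-context behavior the model needs at inference time.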
FlagOpen is BAAI's umbrella project for open-source AI infrastructure. The main components include:
| Project | Purpose |
|---|---|
| FlagAI | General toolkit for training and inference of large models, accepted as a sandbox-level project at the Linux Foundation |
| FlagEmbedding | Repository hosting BGE and related retrieval models |
| FlagScale | LLM toolkit for distributed training across heterogeneous chips |
| FlagPerf | Benchmark suite for AI hardware |
| FlagEval | Evaluation platform and leaderboards for LLMs and multimodal models |
The academy reframed this collection as FlagOpen 2.0 at the 2024 conference, with more emphasis on cross-chip support and on evaluation tooling.
BAAI has also published research on biological-style computational models, including MetaWorm and BAAIWorm, which simulate the C. elegans nervous system together with a digital body. The Jiuding AI computing platform has been described by the academy as offering on the order of 1,000 PFLOPs of capacity for member institutions. More recent work covers embodied AI agents (RoboBrain, RoboBrain-Dex) and the WuJie series of physical-world models announced in 2025.
BAAI maintains one of the most active organizational profiles on Hugging Face, hosting roughly 169 models, more than 120 datasets, and around 17 spaces under huggingface.co/BAAI, plus a separate huggingface.co/baaivision profile for the vision group. The bulk of monthly downloads comes from BGE, with bge-large-en-v1.5 alone reporting more than 14 million downloads per month. Datasets such as Infinity-Instruct (around 21.9M instances) and CI-VID see steady use. On GitHub, BAAI publishes through the FlagAI-Open and FlagOpen organizations, and through baaivision for computer vision projects.
Because many BAAI models ship under permissive licenses (MIT for BGE, Apache 2.0 for much of the Aquila code, custom but commercially usable terms for Aquila weights), they have ended up inside a long list of downstream products, especially in retrieval-augmented generation systems.
The BAAI Conference began in 2019 and has run every year since. It typically draws tens of thousands of registrants, including international researchers and Turing Award laureates, and is often described in Chinese tech media as the country's flagship AI conference. Over the years it has hosted speakers including Sam Altman of OpenAI as well as researchers from DeepMind and Anthropic.
| Edition | Year | Notes |
|---|---|---|
| 1st | October 2019 | Inaugural edition |
| 2nd | June 2020 | Held online due to COVID-19 |
| 3rd | June 2021 | Wu Dao 2.0 announced on June 1 at the conference |
| 4th | May 2022 | Held online; opened May 31, 2022 |
| 5th | June 2023 | Wu Dao 3.0 announced on June 12, 2023 |
| 6th | June 2024 | FlagOpen 2.0 announced; theme on language, multimodal, embodied, and bio-computing models |
| 7th | June 2025 | Held at the Zhongguancun National Independent Innovation Demonstration Zone; WuJie series announced |
The conference is organized as a multi-day event with keynote talks, technical sessions, model launches, and side events. Mayor of Beijing Yin Yong attended and spoke at the 2025 edition.
BAAI's partner network is unusually broad for a single research institute. Founding partners and ongoing collaborators include Peking University, Tsinghua University, the Chinese Academy of Sciences, Baidu, ByteDance, Megvii, Meituan, and Xiaomi.
The academy has also been involved in international AI safety dialogues and is a member of China's AI Safety and Development Association (CnAISDA), set up in January 2025.