Huawei PanGu
Last reviewed
Jun 3, 2026
Sources
17 citations
Review status
Source-backed
Revision
v1 · 2,250 words
PanGu (盘古, sometimes written Pangu) is a family of large AI models developed by Huawei, primarily through Huawei Cloud and Huawei's Noah's Ark Lab (诺亚方舟实验室). The name comes from Pangu, the giant who separates heaven and earth in Chinese creation mythology. The first model, PanGu-Alpha, appeared in 2021 as one of the largest Chinese language models of its time, and the family later grew to span text, vision, multimodal, and scientific models. A distinctive feature of the program is that it is trained almost entirely on Huawei's own Ascend processors using the company's MindSpore framework rather than on Nvidia GPUs and the more common Western software stack. In 2025 Huawei began releasing open-weight models under the name openPangu, and the same year the family became the subject of a public dispute over whether one of its models resembled Alibaba's Qwen.
PanGu-Alpha (written PanGu-α) was introduced in a paper titled "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation," posted to arXiv on 26 April 2021 (arXiv:2104.12369) and credited to a large team with Wei Zeng as first author [1]. It is a decoder-only autoregressive model in the style of GPT, trained on Chinese text. The paper describes several configurations rather than a single model, listing 1.3 billion, 2.6 billion, 13 billion, and 200 billion parameter versions, with the 200B model as the headline result [1].
The models were trained on 1.1TB of high-quality Chinese text drawn from a range of domains, and the largest version was trained on a cluster of 2,048 Ascend 910 processors using MindSpore's auto-parallel features to split the model across devices [1]. The paper reported that PanGu-Alpha could perform tasks such as text summarization, question answering, and dialogue generation in few-shot and zero-shot settings. Smaller variants (notably the 2.6B and 13B models) were made available to researchers, while the full 200B model was not openly released. Huawei dates the public launch of the PanGu program to July 2021 [2].
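Splitting one layer across devices, the basic move behind the auto-parallel training described above, can be illustrated in a few lines. The sketch below is a minimal pure-Python illustration and not MindSpore's API: it assumes a single matrix multiply whose weight matrix is cut column-wise into equal shards, one per hypothetical device, and checks that concatenating the per-device outputs reproduces the unsharded result. The helper names are made up for this example, and real auto-parallel systems also decide how to split each operator and combine this with other forms of parallelism, none of which is shown.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_columns(w: Matrix, num_devices: int) -> List[Matrix]:
    """Cut weight matrix w column-wise into num_devices equal shards."""
    cols = len(w[0])
    if cols % num_devices:
        raise ValueError("column count must divide evenly across devices")
    width = cols // num_devices
    return [[row[d * width:(d + 1) * width] for row in w]
            for d in range(num_devices)]


def sharded_forward(x: Matrix, w: Matrix, num_devices: int) -> Matrix:
    """Each hypothetical device multiplies x by its own shard of w."""
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, num_devices)]
    # Concatenate the partial outputs along the column axis
    # (in a real cluster this would be a collective such as all-gather).
    return [sum((out[i] for out in partial_outputs), []) for i in range(len(x))]
```

For example, with a 2×4 weight matrix split over two "devices", `sharded_forward(x, w, 2)` returns exactly the same matrix as `matmul(x, w)`; each device only ever holds half of `w`, which is the point of splitting a model too large for one processor.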
Alongside the research models, Huawei Cloud built PanGu into a commercial product line of "industry" models aimed at specific sectors rather than at consumers. These versions were announced at the Huawei Developer Conference (HDC) and at Huawei Connect, and the version numbers (2.0, 3.0, 5.0) track product generations rather than a single underlying network.
PanGu Models 3.0 were unveiled at the Huawei Developer Conference on 7 July 2023 [3]. Huawei described a layered "5+N+X" architecture: an L0 layer of five foundation models (natural language, computer vision, multimodal, prediction, and scientific computing), an L1 layer of industry-specific models trained on domain data, and an L2 layer of narrow scenario models [3]. The 3.0 line was offered in sizes of roughly 10 billion, 38 billion, 71 billion, and 100 billion parameters so customers could trade off latency against capability [3]. Huawei marketed versions for government, finance, manufacturing, mining, railway, and meteorology, and cited deployments such as the PanGu Mine model used by Shandong Energy Group [4].
PanGu Models 5.0 followed at the Huawei Developer Conference on 21 June 2024, presented by Huawei Cloud CEO Zhang Pingan [5]. Huawei said the 5.0 line spanned sizes "in the billions, tens of billions, hundreds of billions, and trillions" of parameters, grouped into tiers for embedded devices, professional use, and large general models [5]. By the time of Huawei Connect 2024, the company said PanGu models had been applied across more than 30 industries and 400 scenarios [5].
PanGu-Weather is a deep-learning model for medium-range global weather forecasting, and it is the part of the family that drew the most attention from the scientific community. A paper titled "Accurate medium-range global weather forecasting with 3D neural networks," written by Huawei Cloud researchers, was published in the journal Nature on 5 July 2023 [6]. According to Huawei, citing Nature Index, this was the first Nature paper authored solely by employees of a Chinese technology company [7].
The model uses an architecture Huawei calls the 3D Earth-Specific Transformer (3DEST), designed to handle the non-uniform three-dimensional structure of atmospheric data. It was trained on 43 years of reanalysis data (1979 to 2021) [7]. Huawei reported that PanGu-Weather matched or exceeded the accuracy of traditional numerical weather prediction for forecasts ranging from one hour to seven days, while running roughly 10,000 times faster: a 24-hour global forecast could be produced in about 1.4 seconds on a single Nvidia V100 GPU rather than hours on a supercomputer [7]. In August 2023 the European Centre for Medium-Range Weather Forecasts (ECMWF) began publishing PanGu-Weather's 10-day forecasts on its public charts, alongside other AI models [8].
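Forecasts at many lead times can be produced by running models autoregressively, feeding each output back in as the next input. The Nature paper describes training separate models for 1-, 3-, 6- and 24-hour lead times and combining them greedily, so that a target lead time is reached in as few steps as possible and less error accumulates. The sketch below only plans that sequence of steps; `plan_steps` and the fixed lead-time tuple are illustrative assumptions, not Huawei's code.

```python
from typing import List, Tuple

# Lead times (hours) of the individual forecast models; assumed here to be
# the 1/3/6/24-hour set described for PanGu-Weather.
LEAD_TIMES: Tuple[int, ...] = (24, 6, 3, 1)


def plan_steps(target_hours: int,
               lead_times: Tuple[int, ...] = LEAD_TIMES) -> List[int]:
    """Greedily use the longest lead time that still fits, so the target is
    reached with the fewest autoregressive model calls."""
    if target_hours < 0:
        raise ValueError("target must be non-negative")
    steps: List[int] = []
    remaining = target_hours
    for lead in sorted(lead_times, reverse=True):
        count, remaining = divmod(remaining, lead)
        steps.extend([lead] * count)
    if remaining:
        raise ValueError("target cannot be composed from these lead times")
    return steps
```

With these lead times, a seven-day (168-hour) forecast is seven 24-hour steps, while `plan_steps(23)` returns `[6, 6, 6, 3, 1, 1]`. Because each lead time divides the next one, the greedy choice is also the minimum number of steps.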
PanGu-Sigma (written PanGu-Σ) is a sparse mixture-of-experts language model described in a paper, "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing," posted to arXiv on 20 March 2023 (arXiv:2303.10845) [9]. It scaled the PanGu approach to about 1.085 trillion parameters by extending a dense Transformer into a sparse one, with most parameters inactive for any given input [9].
The paper introduced a routing scheme called Random Routed Experts (RRE), a two-level design in which experts are grouped by domain and tokens are assigned to an expert within the appropriate group without a learned gating function [9]. A second technique, Expert Computation and Storage Separation (ECSS), was used to keep training efficient on Ascend hardware. PanGu-Sigma was trained on 329 billion tokens on a cluster of Ascend 910 processors, and Huawei reported a 6.3 times increase in training throughput compared with a conventional MoE model using the same hyperparameters [9]. Press coverage at the time, drawing on the technical report, described training for more than 100 days on 512 Ascend 910 chips, using data in more than 40 natural and programming languages [10].
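The two-level routing idea can be illustrated with a toy router. This sketch assumes, as a simplification, that the first level is chosen by an explicit domain label and that the second level is a fixed, randomly generated lookup table from token ID to expert. The function names, the balanced table construction and the sizes are hypothetical and are not taken from the PanGu-Σ code.

```python
import random
from typing import Dict, List


def build_rre_tables(domains: List[str], experts_per_group: int,
                     vocab_size: int, seed: int = 0) -> Dict[str, List[int]]:
    """Pre-generate, per domain, a frozen random map token_id -> local expert.

    Every expert gets (almost) the same number of vocabulary entries, and
    nothing here is learned: there is no gating network to train.
    """
    rng = random.Random(seed)
    tables: Dict[str, List[int]] = {}
    for domain in domains:
        table = [i % experts_per_group for i in range(vocab_size)]
        rng.shuffle(table)  # random but balanced assignment
        tables[domain] = table
    return tables


def rre_route(token_ids: List[int], domain: str, domains: List[str],
              tables: Dict[str, List[int]], experts_per_group: int) -> List[int]:
    """Return a global expert index for each token."""
    group = domains.index(domain)          # level 1: domain picks the group
    offset = group * experts_per_group
    table = tables[domain]
    return [offset + table[t] for t in token_ids]  # level 2: fixed lookup
```

Because the mapping is fixed, the same token in the same domain always reaches the same expert, and only that expert's parameters are used for it, which is how a model of about a trillion parameters can leave most of them inactive for any given input.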
In 2025 Huawei published a cluster of technical reports describing newer PanGu models trained on its Ascend NPUs. Pangu Ultra is a dense 135 billion parameter model with a 128K-token context window, reported as trained on 13.2 trillion tokens using 8,192 Ascend NPUs [11]. Pangu Ultra MoE is a 718 billion parameter sparse model using 256 experts per layer with 8 activated per token, described in a report on training large MoE models on more than 6,000 Ascend NPUs (arXiv:2505.04519) [11].
The model that became most widely discussed is Pangu Pro MoE, detailed in "Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity," posted to arXiv on 27 May 2025 (arXiv:2505.21411) [12]. It has 72 billion total parameters, of which 16 billion are activated per token, and 48 layers with 64 routed experts and 8 activated per layer [12]. Its central idea is Mixture of Grouped Experts (MoGE), which partitions the experts into groups and forces each token to activate the same number of experts within every group, evening out the load across devices [12]. Huawei reported that Pangu Pro MoE was tuned for the Ascend 300I Duo and 800I A2, reaching about 1,148 tokens per second per card (up to 1,528 with speculative decoding), and that it outperformed open models such as GLM-Z1-32B and Qwen3-32B on its reported benchmarks [12].
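The difference between conventional top-k routing and grouped routing is easiest to see on a toy example. The sketch assumes that each group of experts sits on one device and ignores the router network, gating weights and any shared experts. It only shows that MoGE-style selection gives every group, and hence every device, the same number of active experts, while a global top-k can pile them onto one device. The function names, group counts and scores are illustrative and are not Pangu Pro MoE's configuration.

```python
from typing import List


def global_topk(scores: List[float], k: int) -> List[int]:
    """Conventional MoE routing: the k highest-scoring experts overall."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def grouped_topk(scores: List[float], num_groups: int, k: int) -> List[int]:
    """MoGE-style routing: split experts into equal groups and take the same
    number (k // num_groups) of top-scoring experts from every group."""
    n = len(scores)
    if n % num_groups or k % num_groups:
        raise ValueError("experts and k must divide evenly into groups")
    size, per_group = n // num_groups, k // num_groups
    chosen: List[int] = []
    for g in range(num_groups):
        members = range(g * size, (g + 1) * size)
        best = sorted(members, key=lambda i: scores[i], reverse=True)[:per_group]
        chosen.extend(sorted(best))
    return chosen


def load_per_group(chosen: List[int], group_size: int, num_groups: int) -> List[int]:
    """Count how many selected experts fall in each group (i.e. on each device)."""
    counts = [0] * num_groups
    for e in chosen:
        counts[e // group_size] += 1
    return counts
```

With eight experts in two groups and scores `[0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.05]`, a global top-4 selects experts 0 to 3, so the load is `[4, 0]`. Grouped routing selects `[0, 1, 5, 6]`, a load of `[2, 2]`, at the cost of sometimes skipping a higher-scoring expert in a crowded group.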
Huawei moved PanGu toward open weights in mid-2025. On 30 June 2025 it announced that it would open-source the family under the name openPangu, releasing a dense 7 billion parameter model and the 72 billion parameter Pangu Pro MoE, together with inference code optimized for Ascend hardware [2][13]. The weights and inference code were published on the GitCode platform (under an Ascend Tribe organization) and mirrored to Hugging Face [13]. At Huawei Connect 2025, the company went further, saying it would fully open-source its AI software stack, including MindSpore toolchains, openPangu models, and the CANN compute architecture, by 31 December 2025 [2].
The 7B model corresponds to "Pangu Embedded," described in a separate report (arXiv:2505.22375) as a dual-system reasoner that can switch between a "fast" mode for routine queries and a "slow" mode for harder problems, with both manual and automatic mode selection [14]. Huawei reported that, with 7 billion parameters, it outperformed similarly sized models such as Qwen3-8B and GLM4-9B on benchmarks including AIME 2024, GPQA, and LiveCodeBench [14].
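The combination of manual and automatic selection between Pangu Embedded's fast and slow modes can be sketched as a small dispatcher. Everything here is a hypothetical illustration: the `Request` type and `choose_mode` are made up, and `toy_complexity` is a keyword placeholder standing in for whatever complexity-aware judgment the report describes, not Huawei's method.

```python
from dataclasses import dataclass
from typing import Callable, Optional

FAST, SLOW = "fast", "slow"


@dataclass
class Request:
    prompt: str
    mode: Optional[str] = None  # manual override: "fast", "slow", or None (automatic)


def choose_mode(req: Request, complexity: Callable[[str], float],
                threshold: float = 0.5) -> str:
    """An explicit user choice wins; otherwise defer to a complexity estimate."""
    if req.mode is not None:
        if req.mode not in (FAST, SLOW):
            raise ValueError(f"unknown mode: {req.mode}")
        return req.mode
    return SLOW if complexity(req.prompt) >= threshold else FAST


def toy_complexity(prompt: str) -> float:
    """Hypothetical heuristic: long prompts or math/code keywords count as hard."""
    hard_words = ("prove", "integral", "algorithm", "optimize")
    score = min(len(prompt) / 400, 1.0)
    if any(word in prompt.lower() for word in hard_words):
        score = max(score, 0.8)
    return score
```

A short factual question is routed to the fast mode, a request to prove a theorem to the slow mode, and setting `mode="fast"` on the request overrides the estimate entirely.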
| Model | Year | Type | Parameters | Notes |
|---|---|---|---|---|
| PanGu-Alpha | 2021 | Dense autoregressive LM | 1.3B / 2.6B / 13B / 200B | Chinese text, arXiv:2104.12369 |
| PanGu Models 3.0 | 2023 | Industry model line | 10B / 38B / 71B / 100B | 5+N+X architecture |
| PanGu-Weather | 2023 | Weather forecasting | Not stated | 3DEST, Nature paper |
| PanGu-Sigma | 2023 | Sparse MoE LM | 1.085T total | Random Routed Experts |
| PanGu Models 5.0 | 2024 | Industry model line | Billions to trillions | Tiered by use case |
| Pangu Ultra | 2025 | Dense LM | 135B | 128K context, Ascend |
| Pangu Ultra MoE | 2025 | Sparse MoE LM | 718B total | 256 experts, 8 active |
| Pangu Pro MoE / openPangu Pro MoE | 2025 | Sparse MoE LM | 72B total, 16B active | MoGE, open-weight |
| Pangu Embedded / openPangu-Embedded-7B | 2025 | Dense reasoning LM | 7B | Fast and slow thinking, open-weight |
A consistent thread across the PanGu family is that the models are trained and served on Huawei's own silicon rather than on the Nvidia GPUs used for most other large foundation models. PanGu-Alpha was trained on 2,048 Ascend 910 processors with MindSpore in 2021, and the 2025 models scaled to thousands of Ascend NPUs, with Pangu Ultra reported on 8,192 NPUs and Pangu Ultra MoE on more than 6,000 [1][11]. This focus is part of Huawei's broader effort to build a domestic AI stack that does not depend on US-made accelerators, and the PanGu technical reports double as demonstrations that competitive large models can be trained on Ascend. The released inference code for openPangu is correspondingly optimized for Ascend parts such as the 300I Duo and 800I A2 rather than for GPUs [12][13].
In July 2025 PanGu became the center of a public dispute. On 6 July 2025 an anonymous party using the name HonestAGI posted a report to GitHub alleging that Pangu Pro MoE was an "upcycled" derivative of Alibaba's Qwen 2.5 14B model rather than an independent build [15]. The report relied on a "fingerprinting" method comparing internal parameter statistics, and claimed a correlation of about 0.927 between the two models' attention parameters, along with similar patterns in QKV bias projections and attention LayerNorm weights [15]. Commentators also noted that a Qwen license file had appeared inside the Pangu code repository on GitCode, and the report's author said they had received messages from purported Huawei whistleblowers [15].
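The report's exact procedure is not reproduced in this article. The sketch below assumes one plausible form of layer-wise fingerprinting: summarize each layer's attention weights by their standard deviation, then compare the two models' layer-by-layer profiles with a Pearson correlation. The function names and the toy weights are made up for illustration and are not real model parameters.

```python
import statistics
from typing import List, Sequence


def layer_profile(layers: Sequence[Sequence[float]]) -> List[float]:
    """Summarize each layer's (flattened) weights by their standard deviation."""
    return [statistics.pstdev(weights) for weights in layers]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5
```

If one toy model's weights are simply the other's scaled by 1.1, their profiles correlate at 1.0. A high value shows only that the per-layer statistics rise and fall together; whether that is evidence of copying is the point the two sides went on to dispute.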
Huawei's Noah's Ark Lab responded on 7 July 2025 with a statement rejecting the allegation. It said Pangu Pro MoE was a foundation model developed in-house and trained on Ascend hardware, and that it was "not based on incremental training of other manufacturers' models" [16]. The lab added that any third-party open-source code used in the model had been employed in compliance with the relevant licenses and properly attributed [16]. Alibaba did not publicly comment on the claims, and HonestAGI's identity was not established [16]. The original GitHub repository and its technical evidence were later taken down, leaving the dispute unresolved in public, and supporters of each side disagreed over whether the report's statistical method was a reliable test of copying [15][16].
The PanGu research papers were published openly, but the early large models were not released as open weights: only smaller PanGu-Alpha variants (such as the 2.6B and 13B models) circulated for research use. The 2025 openPangu releases are distributed under Huawei's own OPENPANGU MODEL LICENSE AGREEMENT (version 1.0), which Huawei presents as a permissive license intended to allow further development of the models [17]. The license, model weights, and inference code for openPangu-Embedded-7B and Pangu Pro MoE are hosted on GitCode and on Hugging Face [13][17].