Huawei PanGu

Updated Jul 16, 2026

PanGu (盘古, sometimes written Pangu) is a family of large AI models developed by Huawei, primarily through Huawei Cloud and Huawei's Noah's Ark Lab (诺亚方舟实验室). The name comes from Pangu, the giant who separates heaven and earth in Chinese creation mythology. The first model, PanGu-Alpha, appeared in 2021 as one of the largest Chinese language models of its time, and the family later grew to span text, vision, multimodal, and scientific models. A distinctive feature of the program is that its models are trained almost entirely on Huawei's own Ascend processors with the company's MindSpore framework, rather than on Nvidia GPUs and the more common Western software stack. In 2025 Huawei began releasing open-weight models under the name openPangu, and in the same year the family became the subject of a public dispute over whether one of its models had been derived from Alibaba's Qwen.

PanGu-Alpha

PanGu-Alpha (written PanGu-α) was introduced in a paper titled "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation," posted to arXiv on 26 April 2021 (arXiv:2104.12369) and credited to a large team led by Wei Zeng ^[1]. It is a decoder-only autoregressive model in the style of GPT, but trained on Chinese text. The work was released with several configurations rather than a single model: the paper lists 1.3 billion, 2.6 billion, 13 billion, and 200 billion parameter versions, with the 200B model being the headline result ^[1].

The models were trained on 1.1TB of high-quality Chinese text drawn from a range of domains, and the largest version was trained on a cluster of 2,048 Ascend 910 processors using MindSpore's auto-parallel features to split the model across devices ^[1]. The paper reported that PanGu-Alpha could perform tasks such as text summarization, question answering, and dialogue generation in few-shot and zero-shot settings. Smaller variants (notably the 2.6B and 13B models) were made available to researchers, while the full 200B model was not openly released. Huawei dates the public launch of the PanGu program to July 2021 ^[2].
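A rough estimate shows why the largest configuration has to be split across devices at all. The sketch below is illustrative only: it assumes 2 bytes per parameter (half-precision weights) and 32 GB of memory per accelerator, and it ignores optimizer state and activations. None of these figures, nor the function name, come from the paper.

```python
import math

def min_devices_for_weights(num_params: float,
                            bytes_per_param: int = 2,
                            device_mem_gb: float = 32.0) -> int:
    """Lower bound on devices needed just to store the weights.

    Assumptions (not from the paper): 2 bytes per parameter and 32 GB of
    memory per device; optimizer state and activations are ignored.
    """
    weight_gb = num_params * bytes_per_param / 1e9
    return math.ceil(weight_gb / device_mem_gb)

# Parameter counts quoted for three of the PanGu-Alpha configurations.
sizes = {"2.6B": 2.6e9, "13B": 13e9, "200B": 200e9}
for name, params in sizes.items():
    print(f"{name}: at least {min_devices_for_weights(params)} device(s)")
```

Under these assumptions the 200B weights alone occupy more than a dozen devices before any training state is counted, which is the situation that model-parallel tooling such as MindSpore's auto-parallel features is meant to handle.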

Industry models (2.0, 3.0, 5.0)

Alongside the research models, Huawei Cloud built PanGu into a commercial product line of "industry" models aimed at specific sectors rather than at consumers. These versions were announced at the Huawei Developer Conference and Huawei Connect events, and the numbering (2.0, 3.0, 5.0) tracks the product generation rather than a single underlying network.

PanGu Models 3.0 were unveiled at the Huawei Developer Conference on 7 July 2023 ^[3]. Huawei described a layered "5+N+X" architecture: an L0 layer of five foundation models (natural language, computer vision, multimodal, prediction, and scientific computing), an L1 layer of industry-specific models trained on domain data, and an L2 layer of narrow scenario models ^[3]. The 3.0 line was offered in sizes of roughly 10 billion, 38 billion, 71 billion, and 100 billion parameters so customers could trade off latency against capability ^[3]. Huawei marketed versions for government, finance, manufacturing, mining, railway, and meteorology, and cited deployments such as the PanGu Mine model used by Shandong Energy Group ^[4].

PanGu Models 5.0 followed at the Huawei Developer Conference on 21 June 2024, presented by Huawei Cloud CEO Zhang Pingan ^[5]. Huawei said the 5.0 line spanned sizes "in the billions, tens of billions, hundreds of billions, and trillions" of parameters, grouped into tiers for embedded devices, professional use, and large general models ^[5]. By the time of Huawei Connect 2024, the company said PanGu models had been applied across more than 30 industries and 400 scenarios ^[5].

PanGu-Weather

PanGu-Weather is a deep-learning model for medium-range global weather forecasting, and it is the part of the family that drew the most attention from the scientific community. A paper titled "Accurate medium-range global weather forecasting with 3D neural networks," written by Huawei Cloud researchers, was published in the journal Nature on 5 July 2023 ^[6]. According to Huawei, citing Nature Index, this was the first Nature paper authored solely by employees of a Chinese technology company ^[7].

The model uses an architecture Huawei calls the 3D Earth-Specific Transformer (3DEST), designed to handle the non-uniform three-dimensional structure of atmospheric data. It was trained on 43 years of reanalysis data (1979 to 2021) ^[7]. Huawei reported that PanGu-Weather matched or exceeded the accuracy of traditional numerical weather prediction for forecasts ranging from one hour to seven days, while running roughly 10,000 times faster: a 24-hour global forecast could be produced in about 1.4 seconds on a single Nvidia V100 GPU rather than hours on a supercomputer ^[7]. In August 2023 the European Centre for Medium-Range Weather Forecasts (ECMWF) began publishing PanGu-Weather's 10-day forecasts on its public charts, alongside other AI models ^[8].
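The speed figures can be checked with simple arithmetic, and the same arithmetic shows how a multi-day forecast might be assembled from shorter steps. The sketch below assumes separate models with 24-, 6-, 3-, and 1-hour lead times that are chained greedily, largest step first, as the Nature paper describes. It also assumes that every call costs the 1.4 seconds quoted for a 24-hour forecast, which is a simplification. The function and constant names are hypothetical.

```python
def greedy_steps(lead_hours: int, step_sizes=(24, 6, 3, 1)) -> list:
    """Split a lead time into model calls, always taking the largest step that fits."""
    if lead_hours < 0:
        raise ValueError("lead_hours must be non-negative")
    steps = []
    remaining = lead_hours
    for size in step_sizes:
        count, remaining = divmod(remaining, size)
        steps.extend([size] * count)
    return steps

SECONDS_PER_CALL = 1.4     # reported for a 24-hour forecast; assumed here for every step size
REPORTED_SPEEDUP = 10_000  # Huawei's reported factor over numerical weather prediction

def inference_seconds(lead_hours: int) -> float:
    """Estimated inference time if every chained call costs SECONDS_PER_CALL."""
    return len(greedy_steps(lead_hours)) * SECONDS_PER_CALL

print(greedy_steps(56))        # step sizes used for a 56-hour forecast
print(inference_seconds(240))  # seconds for a 10-day forecast under the assumptions above
print(SECONDS_PER_CALL * REPORTED_SPEEDUP / 3600)  # implied conventional runtime in hours
```

Multiplying 1.4 seconds by the reported factor of 10,000 gives roughly 3.9 hours, which matches the "hours on a supercomputer" comparison.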

PanGu-Sigma

PanGu-Sigma (written PanGu-Σ) is a sparse mixture-of-experts language model described in a paper, "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing," posted to arXiv on 20 March 2023 (arXiv:2303.10845) ^[9]. It scaled the PanGu approach to about 1.085 trillion parameters by extending a dense Transformer into a sparse one, with most parameters inactive for any given input ^[9].

The paper introduced a routing scheme called Random Routed Experts (RRE), a two-level design in which experts are grouped by domain and each token is assigned to an expert within the appropriate group without a learned gating function ^[9]. A second technique, Expert Computation and Storage Separation (ECSS), was used to keep training efficient on Ascend hardware. PanGu-Sigma was trained on 329 billion tokens on a cluster of Ascend 910 processors, and Huawei reported a 6.3-fold increase in training throughput compared with a conventional MoE model using the same hyperparameters ^[9]. Press coverage at the time, drawing on the technical report, described training that ran for more than 100 days on 512 Ascend 910 chips and covered more than 40 natural and programming languages ^[10].
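The two-level routing idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the second-level assignment is a fixed random lookup table from token ID to expert, built once and never trained. The class name, domain labels, expert counts, and vocabulary size are hypothetical placeholders.

```python
import random

class RandomRoutedExperts:
    """Two-level routing sketch: pick a domain group, then a fixed random expert."""

    def __init__(self, domains, experts_per_domain, vocab_size, seed=0):
        rng = random.Random(seed)
        self.groups = {}  # domain -> expert IDs owned by that domain
        self.table = {}   # domain -> lookup list indexed by token ID
        next_id = 0
        for domain in domains:
            ids = list(range(next_id, next_id + experts_per_domain))
            next_id += experts_per_domain
            self.groups[domain] = ids
            # Level 2: a random token-to-expert mapping, fixed after construction.
            self.table[domain] = [rng.choice(ids) for _ in range(vocab_size)]

    def route(self, domain, token_id):
        """Level 1 selects the domain group; level 2 is a plain table lookup."""
        return self.table[domain][token_id]

# Illustrative setup: three domains with four experts each and a tiny vocabulary.
router = RandomRoutedExperts(["zh", "en", "code"], experts_per_domain=4, vocab_size=100)
expert = router.route("code", token_id=42)
print(expert, expert in router.groups["code"])
```

Because the mapping is a lookup rather than a learned gate, the same token in the same domain always reaches the same expert, and no gating parameters need to be trained or balanced.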

Recent models on Ascend (2025)

In 2025 Huawei published a cluster of technical reports describing newer PanGu models trained on its Ascend NPUs. Pangu Ultra is a dense 135 billion parameter model with a 128K-token context window, reported as trained on 13.2 trillion tokens using 8,192 Ascend NPUs ^[11]. Pangu Ultra MoE is a 718 billion parameter sparse model using 256 experts per layer with 8 activated per token, described in a report on training large MoE models on more than 6,000 Ascend NPUs (arXiv:2505.04519) ^[11].

The model that became most widely discussed is Pangu Pro MoE, detailed in "Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity," posted to arXiv on 27 May 2025 (arXiv:2505.21411) ^[12]. It has 72 billion total parameters, of which 16 billion are activated per token, and 48 layers, each with 64 routed experts of which 8 are activated for a given token ^[12]. Its central idea is Mixture of Grouped Experts (MoGE), which partitions the experts into groups and forces each token to activate the same number of experts within every group, evening out the load across devices ^[12]. Huawei reported that Pangu Pro MoE was tuned for the Ascend 300I Duo and 800I A2, reaching about 1,148 tokens per second per card (up to 1,528 with speculative decoding), and that it outperformed open models such as GLM-Z1-32B and Qwen3-32B on its reported benchmarks ^[12].
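The difference between grouped selection and ordinary top-k routing is easiest to see side by side. The sketch below is an illustration under stated assumptions rather than Huawei's code: it assumes the 64 routed experts are split into 8 equal groups with one expert chosen per group, which is one grouping consistent with the "64 routed experts, 8 activated" figures. The router scores are made up, and both function names are hypothetical.

```python
def global_topk(scores, k):
    """Conventional routing: the k highest-scoring experts, wherever they sit."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def moge_route(scores, num_groups, k_per_group):
    """Grouped routing: the top k_per_group experts inside every group."""
    if len(scores) % num_groups != 0:
        raise ValueError("experts must divide evenly into groups")
    group_size = len(scores) // num_groups
    chosen = []
    for g in range(num_groups):
        members = range(g * group_size, (g + 1) * group_size)
        best = sorted(members, key=lambda i: scores[i], reverse=True)[:k_per_group]
        chosen.extend(sorted(best))
    return chosen

# Made-up scores that favour low-numbered experts, a worst case for load balance.
scores = [float(64 - i) for i in range(64)]
print(global_topk(scores, 8))    # every pick falls in the first group
print(moge_route(scores, 8, 1))  # exactly one pick per group
```

If each group of experts lives on its own device, the grouped variant gives every device the same amount of work for every token, whereas global top-k can send all of a token's work to one device.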

openPangu open-source release

Huawei moved PanGu toward open weights in mid-2025. On 30 June 2025 it announced that it would open-source the family under the name openPangu, releasing a dense 7 billion parameter model and the 72 billion parameter Pangu Pro MoE, together with inference code optimized for Ascend hardware ^[2]^[13]. The weights and inference code were published on the GitCode platform (under an Ascend Tribe organization) and mirrored to Hugging Face ^[13].

The 7B model corresponds to "Pangu Embedded," described in a separate report (arXiv:2505.22375) as a dual-system reasoner that can switch between a "fast" mode for routine queries and a "slow" mode for harder problems, with both manual and automatic mode selection ^[14]. Huawei reported that, with 7 billion parameters, it outperformed similarly sized models such as Qwen3-8B and GLM4-9B on benchmarks including AIME 2024, GPQA, and LiveCodeBench ^[14]. At Huawei Connect 2025, the company went further, saying it would fully open-source its AI software stack, including MindSpore toolchains, openPangu models, and the CANN compute architecture, by 31 December 2025 ^[2].

Model timeline and sizes

Model	Year	Type	Parameters	Notes
PanGu-Alpha	2021	Dense autoregressive LM	1.3B / 2.6B / 13B / 200B	Chinese text, arXiv:2104.12369
PanGu Models 3.0	2023	Industry model line	10B / 38B / 71B / 100B	5+N+X architecture
PanGu-Weather	2023	Weather forecasting	N/A	3DEST, Nature paper
PanGu-Sigma	2023	Sparse MoE LM	1.085T total	Random Routed Experts
PanGu Models 5.0	2024	Industry model line	Billions to trillions	Tiered by use case
Pangu Ultra	2025	Dense LM	135B	128K context, Ascend
Pangu Ultra MoE	2025	Sparse MoE LM	718B total	256 experts, 8 active
Pangu Pro MoE / openPangu Pro MoE	2025	Sparse MoE LM	72B total, 16B active	MoGE, open-weight
Pangu Embedded / openPangu-Embedded-7B	2025	Dense reasoning LM	7B	Fast and slow thinking, open-weight

Training on Ascend and MindSpore

A consistent thread across the PanGu family is that the models are trained and served on Huawei's own silicon rather than on the hardware most foundation models use. PanGu-Alpha was trained on 2,048 Ascend 910 processors with MindSpore in 2021, and the 2025 models scaled to thousands of Ascend NPUs, with Pangu Ultra reported on 8,192 NPUs and Pangu Ultra MoE on more than 6,000 ^[1]^[11]. This focus is part of Huawei's broader effort to build a domestic AI stack that does not depend on US-made accelerators, and the PanGu technical reports double as demonstrations that competitive large models can be trained on Ascend. The released inference code for openPangu is correspondingly optimized for Ascend parts such as the 300I Duo and 800I A2 rather than for GPUs ^[12]^[13].

The 2025 similarity allegation

In July 2025 PanGu became the center of a public dispute. On 6 July 2025 an anonymous party using the name HonestAGI posted a report to GitHub alleging that Pangu Pro MoE was an "upcycled" derivative of Alibaba's Qwen 2.5 14B model rather than an independent build ^[15]. The report relied on a "fingerprinting" method comparing internal parameter statistics, and claimed a correlation of about 0.927 between the two models' attention parameters, along with similar patterns in QKV bias projections and attention LayerNorm weights ^[15]. Commentators also noted that a Qwen license file had appeared inside the Pangu code repository on GitCode, and the report's author said they had received messages from purported Huawei whistleblowers ^[15].
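The general shape of such a fingerprint comparison can be sketched as follows. This is not the report's code: it assumes that "internal parameter statistics" can be reduced to the standard deviation of each layer's attention weights, and that two models are compared by the Pearson correlation of those per-layer values. The toy weights and all names are hypothetical.

```python
import math

def layer_stds(layers):
    """Population standard deviation of each layer's flattened weights."""
    stds = []
    for weights in layers:
        mean = sum(weights) / len(weights)
        stds.append(math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights)))
    return stds

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Toy "models": each layer is a fixed pattern multiplied by a per-layer scale.
BASE = [-1.0, -0.5, 0.0, 0.5, 1.0]

def make_model(scales):
    return [[s * v for v in BASE] for s in scales]

model_a = make_model([1.0, 1.5, 1.2, 2.0, 1.8])
model_b = make_model([1.1 * s for s in [1.0, 1.5, 1.2, 2.0, 1.8]])  # rescaled copy of A
model_c = make_model([2.0, 1.0, 1.9, 1.1, 1.4])                     # unrelated layer profile

stds_a, stds_b, stds_c = layer_stds(model_a), layer_stds(model_b), layer_stds(model_c)
print(pearson(stds_a, stds_b))  # close to 1 for the rescaled copy
print(pearson(stds_a, stds_c))  # much lower for the unrelated profile
```

A high correlation shows only that two layer-wise profiles move together. Whether that is evidence of copying depends on how often independently trained models produce similar profiles, which is the point the two sides disputed.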

Huawei's Noah's Ark Lab responded on 7 July 2025 with a statement rejecting the allegation. It said Pangu Pro MoE was a foundation model developed in-house and trained on Ascend hardware, and that it was "not based on incremental training of other manufacturers' models" ^[16]. The lab added that any third-party open-source code used in the model had been employed in compliance with the relevant licenses and properly attributed ^[16]. Alibaba did not publicly comment on the claims, and HonestAGI's identity was not established ^[16]. The original GitHub repository was later taken down, with the technical evidence removed, which left the dispute unresolved in public; supporters of each side disagreed over whether the statistical method used in the report was a reliable test of copying ^[15]^[16].

Licensing

The PanGu research papers were published openly, but the early large models were not released as open weights: only smaller PanGu-Alpha variants (such as the 2.6B and 13B models) circulated for research use. The 2025 openPangu releases are distributed under Huawei's own OPENPANGU MODEL LICENSE AGREEMENT (version 1.0), which Huawei presents as a permissive license intended to allow further development of the models ^[17]. The license, model weights, and inference code for openPangu-Embedded-7B and Pangu Pro MoE are hosted on GitCode and on Hugging Face ^[13]^[17].

References

1. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation, arXiv. https://arxiv.org/abs/2104.12369
2. Huawei PanGu, Wikipedia. https://en.wikipedia.org/wiki/Huawei_PanGu
3. Reshaping Industries with AI: Huawei Cloud Launches Pangu Models 3.0 and Ascend AI Cloud Services, Huawei Cloud. https://www.huaweicloud.com/intl/en-us/news/20230707180809498.html
4. Huawei: Reshaping Industries with AI as the Cloud Foundation for an Intelligent World, Huawei. https://www.huawei.com/en/news/2023/9/ascend-aicloud-service
5. Huawei Cloud unveils Pangu Large Model 5.0, Huawei Central. https://www.huaweicentral.com/huawei-cloud-unveils-pangu-large-model-5-0/
6. Accurate medium-range global weather forecasting with 3D neural networks, Nature. https://www.nature.com/articles/s41586-023-06185-3
7. Prestigious science journal Nature publishes paper about Pangu Weather AI Model authored by HUAWEI CLOUD researchers, Huawei. https://www.huawei.com/en/news/2023/7/pangu-ai-model-nature-publish
8. Huawei Cloud Pangu-Weather Model Now Available on European Weather Agency Website, Huawei. https://www.huawei.com/en/news/2023/8/pangu-weather-forcast
9. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing, arXiv. https://arxiv.org/abs/2303.10845
10. Huawei Researchers Develop Pangu-Σ: A Large Language Model With Sparse Architecture And 1.085 Trillion Parameters, MarkTechPost. https://www.marktechpost.com/2023/07/10/huawei-researchers-develop-pangu-%CF%83-a-large-language-model-with-sparse-architecture-and-1-085-trillion-parameters/
11. Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs, arXiv. https://arxiv.org/abs/2505.04519
12. Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity, arXiv. https://arxiv.org/abs/2505.21411
13. Huawei open sources its 7B and 72B Pangu AI models and some of its model reasoning tech, CNBC via Techmeme. https://www.techmeme.com/250701/p7
14. Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition, arXiv. https://arxiv.org/abs/2505.22375
15. Huawei Rejects Claims It Copied Alibaba's Qwen AI Model for "Pangu Pro", WinBuzzer. https://winbuzzer.com/2025/07/07/huawei-rejects-claims-it-copied-alibabas-ai-escalating-chinas-tech-war-xcxwbn/
16. Huawei denies Pangu model was based on Alibaba's Qwen, Developer Tech. https://www.developer-tech.com/news/huawei-denies-pangu-model-was-based-on-alibaba-qwen/
17. openPangu-Embedded-7B model card, Hugging Face. https://huggingface.co/FreedomIntelligence/openPangu-Embedded-7B
