GLM-5

AI Models Chinese AI Large Language Models Mixture of Experts Open Source AI

17 min read

Updated Jun 28, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 28, 2026

Fact-checked

In review queue

Sources

24 citations

Revision

v3 · 3,446 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

GLM-5 is an open-weight flagship large language model released by the Chinese AI company Zhipu AI, under its international brand Z.ai, on February 11, 2026. It is a 744 billion parameter sparse Mixture of Experts (MoE) transformer with roughly 40 billion parameters active per token, distributed under the permissive MIT License on Hugging Face, and positioned by Zhipu as an agentic engineering model built for long-horizon coding and tool use ^[1]^[2]^[3]. The accompanying technical report describes GLM-5 as "a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering" ^[21].

GLM-5 succeeds GLM-4.6 (September 2025) and roughly doubles the total parameter count of GLM-4.5 (355 billion total, 32 billion active) while leaving the active parameter count only modestly higher, a design choice that pushes Zhipu's flagship further into the very-large-sparse-MoE space pioneered by DeepSeek V3 and Kimi K2. The model adopts DeepSeek Sparse Attention (DSA) layered on Multi-head Latent Attention for efficient long-context processing, supports a 200,000 token context window, and was trained on 28.5 trillion tokens ^[1]^[2]^[5]^[21].

The release attracted unusual attention because of its political and financial context. Zhipu had become the first Chinese large language model company to list publicly only a month earlier, on the Hong Kong Stock Exchange, and GLM-5 was its first flagship release after that listing. Multiple outlets reported that GLM-5 was trained entirely on domestically produced Huawei chips, which was widely covered as evidence that United States export controls on advanced AI accelerators had not prevented China from producing a frontier-tier model. Reuters summarized the claim as a model "trained entirely on Huawei Ascend chips using the MindSpore framework, with zero dependency on NVIDIA hardware," and Zhipu's Hong Kong listed shares rose by as much as 34 percent in the days after the release ^[3]^[6]^[7].

What is GLM-5?

GLM-5 is the February 2026 flagship of the GLM (General Language Model) family from Zhipu AI / Z.ai. It is a decoder-only, open-weight text model with a sparse MoE architecture, a 200,000 token context window, and an MIT license that permits unrestricted commercial use. Its design goal, stated in the technical report, is to move beyond short "vibe coding" prompts toward "agentic engineering": running as an autonomous agent that keeps itself on task across long, multi-step coding and tool-use workflows ^[1]^[2]^[21]. The base GLM-5 is text only; vision and longer-horizon agent capabilities arrived in later family members (GLM-5.1 and GLM-5.2) ^[14]^[15].

Background

Zhipu AI was founded in 2019 as a spinoff of the Knowledge Engineering Group at Tsinghua University and built up its GLM line over several years, from the bilingual open-source GLM-130B in 2022 through ChatGLM-6B, ChatGLM2-6B, GLM-4, and the open-weight GLM-4.5 family in mid-2025. In 2025 the company adopted Z.ai as its international consumer brand and shifted its flagship line back to permissive open-weight releases, beginning with GLM-4.5 in July 2025 and GLM-4.6 in September 2025 ^[4]^[8].

The company's public profile changed sharply in early 2026. On January 8, 2026, the company (legally Knowledge Atlas Technology Joint Stock Co. Ltd.) listed on the Hong Kong Stock Exchange, becoming the first foundation model company in the world to go public. The offering priced at HKD 116.20 per share, raised about HKD 4.17 billion (roughly USD 558 million), and gave the company an initial valuation near HKD 55.5 billion (about USD 7.1 billion). The Hong Kong retail tranche was oversubscribed by about 1,159 times, extreme even by Hong Kong tech IPO standards, and the company said roughly 70 percent of the proceeds would fund AI model research through 2028 ^[6]^[8]^[22].

GLM-5 shipped roughly five weeks after the listing. The release was timed against a busy first quarter that also saw new closed-model releases from OpenAI and Anthropic and a wave of Chinese frontier-tier models trained on domestic accelerators rather than NVIDIA hardware. Zhipu framed GLM-5 less as a pure benchmark winner and more as a usable open agentic system, particularly for long-running coding workflows where a model needs to keep itself on task across many steps ^[1]^[2].

What is the architecture of GLM-5?

GLM-5 is a decoder-only transformer with a sparse Mixture of Experts feed-forward layer at every block. Of the 744 billion total parameters, roughly 40 billion are activated per token. Each MoE layer holds 256 routed experts plus a shared expert that processes every token; routing selects a small subset of the routed experts for each token, with the shared expert contributing on top. The technical report describes an 80-layer stack, a layer count chosen to reduce expert-parallelism communication overhead, which produces an active-parameter footprint comparable to a dense 40B model while keeping total capacity in the high triple-digit billions ^[2]^[5]^[9]^[21].

The biggest architectural change relative to GLM-4.6 is the attention layer. GLM-5 combines Multi-head Latent Attention (MLA) with DeepSeek Sparse Attention (DSA), the sparse attention family introduced by DeepSeek in late 2025, applied during continued pre-training. DSA runs a lightweight indexer over the key-value cache, selects a small subset of the most relevant tokens for each query, and then performs the heavy attention computation only over that subset. The reported result is roughly linear, rather than quadratic, scaling of attention cost with context length, which makes the 200,000 token context window economically practical to serve. The context window was extended progressively during training through 32K, 128K, and 200K stages ^[2]^[4]^[5]^[21].

Zhipu also reports inference-time engineering choices that matter for serving the model. The base weights ship in BF16 with an official FP8 quantized variant that fits a single 8 GPU H200 (or H20) inference node. Both variants are published on Hugging Face. The chat template, tool-calling schema, and OpenAI-compatible API are designed to be drop-in compatible with the GLM-4.5 and GLM-4.6 inference stacks, including vLLM and SGLang, so existing GLM deployments could update with relatively little integration work ^[2]^[9]^[10].

Model configuration summary

Specification	Value
Total parameters	744 billion ^[2]^[5]
Active parameters per token	About 40 billion ^[2]^[5]
Layers	80 ^[9]^[21]
Routed experts per MoE layer	256 ^[2]^[9]^[21]
Shared experts	Yes, plus the routed experts ^[9]
Attention	Multi-head Latent Attention with DeepSeek Sparse Attention (DSA) ^[2]^[21]
Context window	200,000 tokens ^[2]^[5]
Native precision	BF16, with an official FP8 variant ^[2]^[9]
Training tokens	28.5 trillion ^[1]^[2]
License	MIT ^[2]^[9]

How was GLM-5 trained?

Zhipu reports that GLM-5 was pre-trained on 28.5 trillion tokens, up from the 23 trillion tokens used for GLM-4.5, with continued emphasis on bilingual English and Chinese data and significant additional code and agent-trajectory data. The training corpus mix has not been published in detail. Post-training uses a recipe broadly similar to GLM-4.5: supervised fine-tuning, expert distillation across specialized reasoning, coding, and agent experts, and reinforcement learning aimed at long-horizon tool use and code execution ^[1]^[2]^[5].

The most distinctive aspect of GLM-5's training story is the hardware. Multiple outlets reported that the model was trained on a cluster of roughly 100,000 Huawei Ascend 910B processors, AI accelerators designed by Huawei's HiSilicon subsidiary and manufactured by SMIC on a 7 nanometer process, using Huawei's open-source MindSpore framework with Zhipu-developed optimizations layered on top. The technical report itself emphasizes that GLM-5 was "adapted to Chinese GPU ecosystems" across seven domestic chip platforms. Zhipu and several outside commentators emphasized that no NVIDIA GPUs were involved in any stage of training, which had not previously been demonstrated at this scale ^[1]^[3]^[11]^[21].

The Ascend 910B is a more constrained accelerator than NVIDIA's H100 or H200, with lower per-chip throughput and a less mature software ecosystem. Reaching frontier-tier benchmark performance on a domestic stack required substantial engineering investment in distributed training, fault tolerance, and operator-level kernel work. Coverage in Bloomberg and other financial press treated GLM-5 less as a benchmark winner and more as a proof point that United States export controls on advanced AI accelerators have not prevented China from producing competitive open-weight frontier models, although they have made it considerably more expensive and slower ^[3]^[7]^[12].

Zhipu has not disclosed total training compute, training duration, or the cost of the build, and the published technical report stops short of full reproduction details. The model is released as open weights but not as open data and not as open training code, which puts it in roughly the same disclosure tier as the GLM-4.5 and GLM-4.6 technical reports rather than at the level of the most reproducible academic releases ^[2]^[5].

How well does GLM-5 perform on benchmarks?

Zhipu reported GLM-5 results across a broad set of agentic, coding, reasoning, and knowledge benchmarks at launch, and several independent trackers including Artificial Analysis and LMArena followed within days. The table below collects the most widely cited public numbers and identifies the source for each. Benchmarks where Zhipu has not published a number are omitted rather than estimated. Several agentic scores depend on the harness and tooling used, so vendor-reported figures should be read as upper-bound, favorable-configuration results.

Benchmark	GLM-5	Notes
SWE-Bench Verified	77.8 percent	Vendor reported; leads open-weight models at launch ^[1]^[2]
SWE-Bench Multilingual	73.3 percent	Vendor reported ^[2]
Terminal-Bench 2.0	56.2 percent	Vendor reported (up to 60.7 to 61.1 with the strongest harness) ^[2]
AIME 2026 I	92.7 percent	Vendor reported ^[1]^[2]
HMMT November 2025	96.9 percent	Vendor reported ^[2]
GPQA Diamond	86.0 percent	Vendor reported ^[1]^[2]
Humanity's Last Exam (no tools)	30.5 percent	Vendor reported ^[2]
Humanity's Last Exam (with tools)	50.4 percent	Vendor reported; reported best in class at launch ^[2]^[3]
BrowseComp	62.0 (75.9 with context management)	Vendor reported ^[2]
BrowseComp-Zh	72.7	Vendor reported ^[2]
CyberGym	43.2	Vendor reported ^[2]
Artificial Analysis Intelligence Index	Top open-weight model at launch	Independent ^[5]
LMArena Text Arena	1452 (rank 11 overall, rank 1 open weights)	Independent ^[4]

The most repeated comparison in launch coverage placed GLM-5 within a few points of GPT-5.2 and Claude Opus 4.5 on SWE-Bench Verified and AIME 2026, while clearly ahead on Humanity's Last Exam with tools. Coverage was more mixed on areas requiring tooling that GLM-5 does not natively support: the base model is text only and does not handle images, so vision and multimodal benchmarks went to later family members rather than to the flagship ^[3]^[14]^[15].

Independent evaluation tended to be slightly more cautious than Zhipu's own framing. Artificial Analysis placed GLM-5 at the top of the open-weight pack on its Intelligence Index at launch, with strong scores on tool-use and coding components and somewhat weaker scores on pure knowledge and math sub-benchmarks. Several reviewers also noted that the situational awareness of the base GLM-5, meaning its ability to track its own progress and recover from errors during very long coding sessions, was lower than Claude Opus 4.5's at launch; that gap was closed by the later GLM-5.1 post-training update ^[5]^[15].

Is GLM-5 open source, and how can it be used?

GLM-5 weights are released under the MIT license, the same permissive license used for GLM-4.5 and GLM-4.6. The license allows unrestricted commercial use, fine-tuning, redistribution, and derivative works with no royalty obligation and no attribution requirement beyond preserving the license text. Weights are hosted on Hugging Face under the zai-org organization in both BF16 and FP8 variants and mirrored on ModelScope and on the Z.ai GitHub organization ^[2]^[9]^[10].

Hosted access goes through the chat.z.ai consumer product, the Z.ai API, and several third-party providers including OpenRouter. Zhipu raised list pricing across its commercial tiers by roughly 30 percent at the GLM-5 launch, the first significant price increase by a major Chinese LLM provider in 2026 and a reversal of the price-war pattern that had dominated the previous two years. Zhipu framed the increase as a move from share-grab pricing to sustainable margins after the IPO ^[18]^[19].

Endpoint	Input ($/M tokens)	Output ($/M tokens)	Notes
Z.ai API direct (GLM-5)	About 1.00	About 3.20	Standard tier; peak-hour multiplier applies ^[16]^[18]
OpenRouter GLM-5	About 0.80	About 2.56	OpenRouter pass-through at launch ^[3]

The launch pricing put GLM-5 at roughly five to eight times less expensive per output token than Claude Opus 4.5 or GPT-5.2 on a comparable workload, though the precise ratio depends heavily on whether prompt caching, batch discounts, and peak-hour multipliers are applied. Several reviewers described the combination of open weights and roughly Claude-Opus-class coding scores at much lower API rates as the more important commercial story for GLM-5, separate from the geopolitical training-hardware angle ^[3]^[16]^[20].

The deployment footprint is the main practical limit. The 744 billion parameter model at BF16 needs on the order of 1.5 TB of storage and well over a terabyte of accelerator memory to run unquantized, which puts the unquantized model out of reach for almost everyone outside large data centers; even the FP8 variant requires an eight-accelerator node. In practice most outside users reach GLM-5 through hosted APIs rather than self-hosting the weights, and no smaller GLM-5-Air variant was released alongside the flagship as of mid-2026 ^[3]^[10]^[20].

How does GLM-5 compare to GLM-4.6 and to GLM-5.1 / GLM-5.2?

Relative to GLM-4.6, GLM-5 roughly doubles total parameters, replaces the GLM-4 attention design with MLA plus DeepSeek Sparse Attention, increases pre-training data to 28.5 trillion tokens, and reframes the flagship as an agentic engineering system rather than a general chat model. GLM-5 was then followed by two post-trained successors that kept the same family architecture but extended its agentic and long-context behavior ^[2]^[15].

Zhipu also shipped narrower derivatives reported by secondary sources, including a coding-tuned GLM-5-Turbo and a vision-and-agent GLM-5V-Turbo, before the numbered point releases became the canonical successors. The most consequential follow-ups are the two numbered releases below ^[13]^[14].

Release	Released	Total parameters	What changed
GLM-5	February 11, 2026	744B (about 40B active)	Generalist flagship, text only, 200K context ^[2]^[3]
GLM-5.1	Spring 2026	About 754B	Post-trained update of GLM-5 that "sustains optimization over hundreds of rounds and thousands of tool calls" for long agent sessions; SWE-Bench Pro 58.4, CyberGym 68.7 ^[15]
GLM-5.2	June 2026	About 753B	1M token context via the IndexShare technique; effort-level control (High and Max); SWE-Bench Pro 62.1 ^[23]^[24]

GLM-5.1 keeps the GLM-5 base weights as its starting point and layers a new post-training recipe on top that emphasizes much longer agent rollouts. On SWE-Bench Pro, a harder variant of the SWE-Bench coding evaluation, the GLM-5.1 model card reports 58.4 percent (versus 55.1 percent for GLM-5) and describes it as state of the art on that benchmark among the models compared. GLM-5.1 also reports 68.7 on the CyberGym offensive-security benchmark, nearly 20 points ahead of GLM-5, and 63.5 on Terminal-Bench 2.0 ^[15]^[17].

GLM-5.2, released in June 2026, extends the context window to 1 million tokens using a technique Zhipu calls IndexShare, which "reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9x at a 1M context length." GLM-5.2 adds explicit effort-level control to balance capability against latency and compute, and reports SWE-Bench Pro 62.1 (up from GLM-5.1's 58.4), Terminal-Bench 2.1 around 81 to 83 depending on harness, GPQA Diamond 91.2, and AIME 2026 99.2. The model card describes the release under "an MIT open-source license, no regional limits, technical access without borders" ^[23]^[24].

Reception

GLM-5 was received as a substantial release, both as a model and as an industrial proof point. Coverage in mainstream financial press focused on the training-hardware story. Bloomberg, CNBC, and Reuters framed the release as a frontier-tier large language model trained without American-designed accelerators, and noted that the timing, weeks after Zhipu's Hong Kong IPO, made the launch read as a public deliverable rather than just a research milestone. Z.ai's listed shares climbed about 34 percent in the days after the GLM-5 announcement ^[3]^[6]^[7]^[12].

Reception inside the open-source community was warmer than for GLM-4.6. The MIT license, the size of the model, the genuinely competitive SWE-Bench numbers, and the open availability of FP8 weights together made GLM-5 a default new open-weight reference for serious coding-agent work. Several reviewers noted that the headline benchmarks reflect the most favorable Zhipu-defined harness configurations and that real-world agent setups, particularly those involving complex multi-tool workflows or non-coding domains, still trailed Claude Opus and GPT-5 in independent testing. The base GLM-5 is also strictly text only, a meaningful limitation against the multimodal frontier ^[3]^[14]^[20].

The most consistent criticism in third-party coverage was deployment ergonomics: the 744 billion parameter footprint puts the unquantized model out of reach for almost everyone outside large data centers, and even the FP8 variant requires an eight-accelerator node, which complicates the open-source framing somewhat in practice ^[3]^[10]^[20].

For competitors, GLM-5 reset expectations for what an open-weight model from a Chinese lab could be. It shipped at frontier-tier benchmark performance, was distributed under one of the most permissive open licenses available, came with credible long-horizon agent post-training in GLM-5.1 and GLM-5.2, and demonstrated that the entire training pipeline could be moved off NVIDIA hardware without dropping to a noticeably lower tier of capability ^[3]^[7]^[15].

ELI5

GLM-5 is a very large, free-to-use AI "brain" made by a Chinese company called Zhipu AI (which it sells as Z.ai). It is good at writing computer code and doing tasks step by step on its own, like a tireless coding assistant. It has 744 billion "settings" inside, but to keep things fast it only switches on a small group of them (about 40 billion) for any one piece of text. The big surprise was how it was built: instead of the American NVIDIA chips most top AI models use, this one was reportedly trained entirely on Chinese Huawei chips, which a lot of people took as a sign that China can build top-tier AI on its own hardware. Newer versions, GLM-5.1 and GLM-5.2, can stay focused for much longer jobs and read up to a million words of text at once.

References

Z.ai. "GLM-5: From Vibe Coding to Agentic Engineering." GitHub repository, zai-org/GLM-5, February 2026. https://github.com/zai-org/GLM-5 ↩
zai-org. "GLM-5 model card." Hugging Face. https://huggingface.co/zai-org/GLM-5 ↩
Maxime Labonne. "GLM-5: China's First Public AI Company Ships a Frontier Model." Hugging Face Blog, February 2026. https://huggingface.co/blog/mlabonne/glm-5 ↩
Digital Applied. "GLM-5 Released: 744B MoE Model vs GPT-5.2 and Claude 4.5." February 2026. https://www.digitalapplied.com/blog/zhipu-ai-glm-5-release-744b-moe-model-analysis ↩
llm-stats. "GLM-5: Zhipu AI's Agentic Engineering Breakthrough." https://llm-stats.com/blog/research/glm-5-launch ↩
CNBC. "The first of China's AI tigers goes public as Zhipu climbs in Hong Kong debut." January 8, 2026. https://www.cnbc.com/2026/01/08/china-ai-tiger-goes-ipo-zhipu-hong-kong-debut-openai-knowledge-atlas-hsi-hang-seng-listing.html ↩
Bloomberg. "China's Zhipu AI Debuts in Hong Kong After $558 Million IPO." January 7, 2026. https://www.bloomberg.com/news/articles/2026-01-07/china-s-openai-rival-zhipu-debuts-in-hk-after-558-million-ipo ↩
Trending Topics. "Zhipu AI: World's best open source AI celebrates successful IPO in Hong Kong." January 2026. https://www.trendingtopics.eu/zhipu-ai-ipo-hongkong/ ↩
NxCode. "GLM-5 Complete Guide: China's 744B Open-Source Model That Rivals GPT-5.2 (2026)." https://www.nxcode.io/resources/news/glm-5-open-source-744b-model-complete-guide-2026 ↩
Lushbinary. "GLM-5 Developer Guide: 744B Open-Weight Model on Huawei Chips." February 2026. https://lushbinary.com/blog/glm-5-developer-guide-zhipu-ai-huawei-ascend-open-weight/ ↩
Let's Data Science. "How China's GLM-5 Works: 744B Model on Huawei Chips." February 2026. https://letsdatascience.com/blog/china-trained-frontier-ai-model-glm-5-without-nvidia ↩
Winbuzzer. "Zhipu AI Releases GLM-5: 744B Model Rivals Claude Opus." February 12, 2026. https://winbuzzer.com/2026/02/12/zhipu-ai-glm-5-744b-model-rivals-claude-opus-z-ai-platform-xcxwbn/ ↩
Rommark. "GLM-5 and GLM-5-Turbo: Zhipu's New Coding Models Take on Claude Opus 4.5." March 2026. https://www.rommark.dev/blog/pages/glm-5-turbo-coding-plan-review.html ↩
WaveSpeed. "GLM-5V-Turbo: What Developers Should Know in 2026." April 2026. https://wavespeed.ai/blog/posts/glm-5v-turbo-developers-2026/ ↩
zai-org. "GLM-5.1 model card." Hugging Face. https://huggingface.co/zai-org/GLM-5.1 ↩
Awesome Agents. "GLM-5: China's 744B Open-Source Frontier Model." https://awesomeagents.ai/models/glm-5/ ↩
Awesome Agents. "GLM-5.1 Tops SWE-Bench Pro With Zero NVIDIA Hardware." April 2026. https://awesomeagents.ai/news/glm-5-1-swe-bench-pro-huawei-chips/ ↩
Creati.ai. "Zhipu AI Launches GLM-5 Model with 30% Price Increase in First 2026 LLM Hike." February 16, 2026. https://creati.ai/ai-news/2026-02-16/zhipu-ai-launches-glm-5-model-30-percent-price-increase/ ↩
Techloy. "China's Zhipu AI Launches GLM-5 with 30 Percent Price Increase as Stock Jumps 34 Percent." February 2026. https://www.techloy.com/chinas-zhipu-ai-launches-glm-5-with-30-price-increase-as-stock-jumps-34/ ↩
The Neuron. "China's GLM-5 Rivals Claude and GPT-5 Without US Chips." February 2026. https://www.theneuron.ai/explainer-articles/chinas-glm-5-trained-without-single-american-chip-heres-why-that-matters/ ↩
Zhipu AI / Z.ai. "GLM-5: from Vibe Coding to Agentic Engineering." Technical report, arXiv:2602.15763, February 2026. https://arxiv.org/html/2602.15763v1 ↩
Caixin Global. "China's Zhipu AI Jumps in Hong Kong Debut." January 8, 2026. https://www.caixinglobal.com/2026-01-08/chinas-zhipu-ai-jumps-in-hong-kong-debut-102401610.html ↩
zai-org. "GLM-5.2 model card." Hugging Face. https://huggingface.co/zai-org/GLM-5.2 ↩
zai-org. "GLM-5.2: Built for Long-Horizon Tasks." Hugging Face Blog, June 2026. https://huggingface.co/blog/zai-org/glm-52-blog ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

2 revisions by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)Best Open-Source LLMs DeepSeek vs Llama vs Qwen Factory (AI company)GLM-4-Voice GLM-4.5 GLM-4.6 GLM-5.1 GLM-5.2 LLM Size and Parameter Comparison Linkup Tang Jie Zhang Peng (Zhipu AI)Zhipu AI

What is GLM-5?

Background

What is the architecture of GLM-5?

Model configuration summary

How was GLM-5 trained?

How well does GLM-5 perform on benchmarks?

Is GLM-5 open source, and how can it be used?

How does GLM-5 compare to GLM-4.6 and to GLM-5.1 / GLM-5.2?

Reception

ELI5

See also

References

Improve this article

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here