GLM-5
Last reviewed
May 16, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 ยท 3,252 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 16, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 ยท 3,252 words
Add missing citations, update stale details, or suggest a clearer explanation.
GLM-5 is an open-weight flagship large language model released by Zhipu AI under its international brand Z.ai on February 11, 2026. The model is a 744 billion parameter sparse Mixture of Experts (MoE) transformer with roughly 40 billion parameters active per token, distributed under the MIT License on Hugging Face. It was trained on a cluster of approximately 100,000 Huawei Ascend 910B processors using Huawei's MindSpore framework rather than NVIDIA GPUs, which made it the first claimed frontier-class language model produced without any American-made AI accelerators [1][2][3].
GLM-5 succeeds GLM-4.6 (September 2025) and a short-lived intermediate GLM-4.7 release from late 2025. It roughly doubles the total parameter count of GLM-4.5 while leaving the active parameter count only modestly higher, a design choice that pushes Zhipu's flagship further into the very-large-sparse-MoE space pioneered by DeepSeek V3 and Kimi K2. The model uses DeepSeek Sparse Attention (DSA) for efficient long-context processing, supports a 200,000 token context window, and is positioned by Zhipu as a generalist agentic engineering model with particular focus on long-horizon coding and tool use [2][4][5].
The release attracted unusual attention because of its political context. Zhipu had become the first Chinese large language model company to list publicly only a month earlier, on the Hong Kong Stock Exchange, and GLM-5 was the first flagship release after that listing. Its training entirely on domestically produced Huawei chips was widely covered as evidence that United States export controls on advanced AI accelerators had not prevented China from producing a frontier-tier model. Zhipu's Hong Kong listed shares rose by as much as 34 percent in the days after the release [3][6][7].
Zhipu AI was founded in 2019 as a spinoff of the Knowledge Engineering Group at Tsinghua University and built up its GLM line over several years, from the bilingual open-source GLM-130B in 2022 through ChatGLM-6B, ChatGLM2-6B, GLM-4, and the open-weight GLM-4.5 family in mid-2025. In 2025 the company adopted Z.ai as its international consumer brand and shifted its flagship line back to permissive open-weight releases, beginning with GLM-4.5 in July 2025 and GLM-4.6 in September 2025 [4][8].
The company's public profile changed sharply in early 2026. On January 8, 2026, Z.ai listed on the Hong Kong Stock Exchange under ticker 02513.HK, becoming the first of the Chinese AI tigers to go public. The offering priced at HKD 116.20 per share, opened at HKD 120.00, raised about HKD 4.35 billion (roughly USD 559 million), and gave the company an initial market capitalization near HKD 52.8 billion. The Hong Kong retail tranche was oversubscribed by more than 1,100 times, which was extreme even by Hong Kong tech IPO standards [6][8].
GLM-5 was previewed during the IPO roadshow as the company's next flagship and shipped roughly five weeks after the listing. The release was timed against a busy first quarter that also saw new closed-model releases from OpenAI and Anthropic and the first wave of Chinese frontier-tier models trained on domestic accelerators rather than NVIDIA hardware. Zhipu framed GLM-5 less as a benchmark winner and more as a usable open agentic system, particularly for long-running coding workflows where a model needs to keep itself on task across thousands of steps [1][2].
GLM-5 is a decoder-only transformer with a sparse Mixture of Experts feed-forward layer at every block. Of the 744 billion total parameters, roughly 40 billion are activated per token, giving a sparsity ratio of about 18 to 1. Each MoE layer holds 256 routed experts plus a smaller set of shared experts that process every token. Routing chooses 8 of the 256 routed experts for each token, with the shared experts contributing on top. The 80-layer stack produces an active-parameter footprint comparable to a dense 40B model while keeping total capacity in the high triple-digit billions [2][5][9].
The biggest architectural change relative to GLM-4.6 is the attention layer. GLM-5 replaces the partial Rotary Position Embedding plus Grouped-Query Attention design used in the GLM-4 family with DeepSeek Sparse Attention (DSA), the same family of sparse attention introduced by DeepSeek in late 2025. DSA runs a lightweight indexer over the key-value cache, selects a small subset of tokens that are most relevant to each query, and then performs the heavy attention computation only over that subset. The reported result is roughly linear, rather than quadratic, scaling of attention cost with context length, which makes the 200,000 token context window economically practical to serve [2][4][5].
Zhipu also reports a small set of inference-time engineering choices that matter for serving the model. The base weights ship in BF16 with an official FP8 quantized variant that fits a single 8 GPU H200 (or H20) inference node. Both variants are published on Hugging Face. The chat template, tool calling schema, and OpenAI-compatible API are designed to be drop-in compatible with the GLM-4.5 and GLM-4.6 inference stacks, including vLLM and SGLang, so existing GLM deployments could update with relatively little integration work [2][9][10].
| Specification | Value |
|---|---|
| Total parameters | 744 billion [2][5] |
| Active parameters per token | About 40 billion [2][5] |
| Layers | 80 [9] |
| Routed experts per MoE layer | 256, with 8 activated per token [2][9] |
| Shared experts | Yes, plus the routed experts [9] |
| Attention | DeepSeek Sparse Attention (DSA) [2][4] |
| Context window | 200,000 tokens [2][5] |
| Maximum output | 128,000 tokens [2][10] |
| Native precision | BF16, with an official FP8 variant [9] |
| License | MIT [2][9] |
Zhipu reports that GLM-5 was pre-trained on 28.5 trillion tokens, up from the 23 trillion tokens used for GLM-4.5, with continued emphasis on bilingual English and Chinese data and significant additional code and agent trajectory data. The training corpus mix has not been published in detail. Post-training uses a three stage recipe broadly similar to GLM-4.5: supervised fine-tuning, expert distillation across specialized reasoning, coding, and agent experts, and reinforcement learning aimed at long-horizon tool use and code execution [1][2][5].
The most distinctive aspect of GLM-5's training story is the hardware. The model was trained on roughly 100,000 Huawei Ascend 910B processors, AI accelerators designed by Huawei's HiSilicon subsidiary and manufactured by SMIC at a 7 nanometer process. The training framework was MindSpore, Huawei's open-source deep learning framework, with a set of Zhipu-developed optimizations layered on top, including multi-level pipelined deployment, custom fusion kernels for Ascend, and an updated version of Zhipu's open-source SGLang-native post-training runtime called slime that adds Active Partial Rollouts (APRIL) for asynchronous reinforcement learning over long trajectories. Zhipu and several outside commentators emphasized that no NVIDIA GPUs were involved in any stage of training, which had not previously been demonstrated at the frontier scale [1][3][11].
The Ascend 910B is a more constrained accelerator than NVIDIA's H100 or H200, with lower per-chip throughput and a much less mature software ecosystem. Reaching frontier-tier benchmark performance on a fully domestic stack required a substantial engineering investment in distributed training, fault tolerance, and operator level kernel work. Outside coverage in Bloomberg and other financial press treated GLM-5 less as a benchmark winner and more as a proof point that United States export controls on advanced AI accelerators have not prevented China from producing competitive open-weight frontier models, although they have made it considerably more expensive and slower [3][7][12].
Zhipu has not disclosed total training compute, training duration, or the cost of the build, and the published technical report stops short of providing full reproduction details. The model is released as open weights but not as open data and not as open training code, which puts it in roughly the same disclosure tier as the GLM-4.5 and GLM-4.6 technical reports rather than at the level of the most reproducible academic releases [2][5].
GLM-5 was followed by a series of related models over the spring of 2026 that together form a small family. The base GLM-5 ships as the generalist text model. A coding focused GLM-5-Turbo follows in March, a vision and agent focused GLM-5V-Turbo follows in early April, and a post-trained update labeled GLM-5.1 ships shortly after. Total parameter count and broad architecture remain consistent across the family, with variants differing primarily in post-training and head configuration [13][14][15].
| Variant | Released | Total parameters | Notes |
|---|---|---|---|
| GLM-5 | February 11, 2026 | 744B (about 40B active) | Generalist flagship, text only [2][3] |
| GLM-5-Turbo | March 16, 2026 | Same backbone | Tuned for tool use and agent loops, lower tool-call error rates [13] |
| GLM-5V-Turbo | April 1, 2026 | Same backbone, new visual encoder | Native multimodal (image, video, text) with the CogViT visual encoder; targeted at design-to-code and GUI agents [14] |
| GLM-5.1 | April 7, 2026 | 744B (about 40B active) | Post-trained update of GLM-5 with much improved long-horizon agent execution; reportedly able to run autonomous workflows of up to about 1,700 steps over roughly eight hours, versus around 20 step horizons for the base GLM-5 [15] |
GLM-5.1 is the most consequential of the follow-ups. It keeps the GLM-5 base weights as its starting point and layers a new post-training recipe on top that emphasizes much longer agent rollouts. On the SWE-Bench Pro benchmark, an extended and harder variant of the SWE-Bench coding evaluation, GLM-5.1 reaches 58.4 percent, narrowly ahead of GPT-5.4 at 57.7 percent and Claude Opus 4.6 at 57.3 percent according to the Z.ai release page. That result, taken with the original GLM-5's SWE-Bench Verified score of 77.8 percent, is what produced most of the headlines describing GLM-5.1 as the highest scoring open-weight coding model in the world [15][16].
Zhipu reported GLM-5 results across a broad set of agentic, coding, reasoning, and knowledge benchmarks at launch, and several independent trackers including Artificial Analysis and LMArena followed within days. The table below collects the most widely cited public numbers for GLM-5 and identifies the source for each. Benchmarks where Zhipu has not published a number are omitted rather than estimated.
| Benchmark | GLM-5 | Notes |
|---|---|---|
| SWE-Bench Verified | 77.8 percent | Vendor reported; leads open-weight models, trails Claude Opus 4.5 at 80.9 [1][16] |
| SWE-Bench Multilingual | 73.3 percent | Vendor reported [16] |
| Terminal-Bench 2.0 | 56.2 percent | Vendor reported [5][16] |
| AIME 2026 I | 92.7 percent | Vendor reported [1][5] |
| GPQA Diamond | 86.0 percent | Vendor reported [1][5] |
| Humanity's Last Exam (with tools) | 50.4 percent | Vendor reported; reported as best in class at launch [3][16] |
| BrowseComp | 75.9 | Vendor reported; reported best in class among open-weight models [16] |
| MCP-Atlas | 67.8 | Vendor reported [16] |
| Tau squared Bench | 89.7 | Vendor reported [16] |
| Vending Bench 2 (USD) | 4,432 | Vendor reported; agent simulation benchmark [16] |
| CC-Bench V2 frontend build | 98 percent | Vendor reported, internal harness [4] |
| CC-Bench V2 end-to-end correctness | 74.8 percent | Vendor reported [4] |
| Artificial Analysis Intelligence Index | 50 | Independent [5] |
| LMArena Text Arena | 1452 (rank 11 overall, rank 1 open weights) | Independent [4] |
The most repeated comparison in launch coverage placed GLM-5 within a few points of GPT-5.2 and Claude Opus 4.5 on SWE-Bench Verified and AIME 2026, while clearly ahead of both on Humanity's Last Exam with tools and on the BrowseComp web research benchmark. Coverage was more mixed on areas where the comparison required tooling that GLM-5 does not natively support. The base model is text only and does not handle images, so vision and multimodal benchmarks went to the later GLM-5V-Turbo variant rather than to the flagship [3][14][16].
GLM-5.1 picked up a separate set of benchmark wins about two months later. On SWE-Bench Pro, the harder variant of SWE-Bench used in 2026 to differentiate frontier-tier coding models, Zhipu reports 58.4 percent for GLM-5.1, edging out GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on the same evaluation. On CyberGym, a long-horizon offensive security and toolchain benchmark, GLM-5.1 scores 68.7, nearly 20 points ahead of GLM-5. Those wins came against closed proprietary models in a category, SWE-Bench Pro, where no other open-weight model had previously held the top slot [15][17].
Independent evaluation has tended to be slightly more cautious than Zhipu's own framing. Artificial Analysis put GLM-5 around the middle of the frontier pack on its Intelligence Index at launch, with strong scores on tool-use and coding components and somewhat weaker scores on pure knowledge and math reasoning sub-benchmarks. Several reviewers also noted that the situational awareness of the base GLM-5, meaning its ability to track its own progress and recover from errors during very long coding sessions, was lower than Claude Opus 4.5's at launch, although GLM-5.1 closed much of that gap [5][15].
GLM-5 weights are released under the MIT license, the same permissive license used for GLM-4.5 and GLM-4.6. The license allows unrestricted commercial use, fine-tuning, redistribution, and derivative works with no royalty obligation and no attribution requirement beyond preserving the license text. Weights are hosted on Hugging Face under the zai-org organization in both BF16 and FP8 variants and mirrored on ModelScope and on the Z.ai GitHub organization [2][9][10].
Hosted access goes through the chat.z.ai consumer product, the Z.ai API, and several third party providers including OpenRouter. Zhipu raised list pricing across its commercial tiers by roughly 30 to 60 percent at the GLM-5 launch, the first significant price increase by a Chinese LLM provider in 2026 and a sharp reversal of the price war pattern that had dominated the previous two years. Subscription tiers, API tokens, and enterprise dedicated nodes were all affected, with subscription prices rising from CNY 20 per month to CNY 26 to 32 per month and API output token prices rising from CNY 60 to CNY 100 to 120 per million tokens. Zhipu framed the increase as a move from share-grab pricing to sustainable margins after the IPO [18][19].
| Endpoint | Input ($/M tokens) | Output ($/M tokens) | Notes |
|---|---|---|---|
| Z.ai API direct (GLM-5) | About 1.00 | About 3.20 | Standard tier; peak hour multiplier applies [16][18] |
| OpenRouter GLM-5 | About 0.80 | About 2.56 | OpenRouter pass-through at launch [3] |
| Z.ai API direct (GLM-5.1) | 1.40 | 4.40 | Cached input at 0.26; 3x multiplier during 14:00 to 18:00 Beijing time [15] |
| Z.ai API direct (GLM-5V-Turbo) | 1.20 | 4.00 | Vision-language model; cached input at 0.24 [14] |
For reference, the launch pricing put GLM-5 at roughly five to eight times less expensive per output token than Claude Opus 4.5 or GPT-5.2 on a comparable workload, although the precise ratio depends heavily on whether prompt caching, batch discounts, and peak hour multipliers are applied. Several reviewers described the combination of open weights and roughly Claude-Opus-class coding scores at much lower API rates as the more important commercial story for GLM-5, separate from the geopolitical training-hardware angle [3][16][20].
GLM-5 was received as a substantial release, both as a model and as an industrial proof point. Coverage in mainstream financial press focused on the training hardware story. Bloomberg, CNBC, and Reuters all framed the release as the first frontier-tier large language model trained without any American-designed accelerators, and noted that the timing, weeks after Zhipu's Hong Kong IPO, made the launch read as a public deliverable rather than just a research milestone. Z.ai's listed shares climbed about 34 percent in the days after the GLM-5 announcement and continued to outperform the broader Hang Seng technology index through the spring [3][6][7][12].
Reception inside the open-source community was warmer than for GLM-4.6. The MIT license, the size of the model, the genuinely competitive SWE-Bench numbers, and the open availability of FP8 weights together made GLM-5 the default new open-weight reference for serious coding agent work. Several reviewers noted that the headline benchmarks reflect the most favorable Zhipu-defined harness configurations and that real-world agent setups, particularly those involving complex multi-tool workflows or non-coding domains, still trail Claude Opus and GPT-5 in independent testing. The base GLM-5 is also strictly text only, which is a meaningful limitation against the multimodal frontier; that gap was closed only later by GLM-5V-Turbo [3][14][20].
The most consistent criticism in third party coverage was about deployment ergonomics. The 744 billion parameter footprint at BF16 needs roughly 1.5 TB of storage and around 1,500 GB of accelerator memory to run unquantized, which puts the unquantized model out of reach for almost everyone outside large data centers. Even the FP8 variant requires an eight-accelerator node. In practice most outside users hit GLM-5 through hosted APIs rather than self-host the weights, which complicates the open-source framing somewhat. Several reviewers contrasted this with the smaller Air-style variants in the GLM-4.5 family, and noted that no equivalent smaller GLM-5-Air had been released as of mid-2026 [3][10][20].
For competitors, GLM-5 reset the expectations for what an open-weight model from a Chinese lab could be. It shipped at frontier-tier benchmark performance, was distributed under one of the most permissive open licenses available, came with credible long-horizon agent post training in GLM-5.1, and demonstrated that the entire training pipeline could be moved off NVIDIA hardware without dropping to a noticeably lower tier of capability. That combination of features had not previously been seen in any single release [3][7][15].