# GLM-5.1

> Source: https://aiwiki.ai/wiki/glm_5_1
> Updated: 2026-06-02
> Categories: Chinese AI, Large Language Models, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**GLM-5.1** is an open-weight [large language model](/wiki/large_language_model) developed by the Chinese AI company [Zhipu AI](/wiki/zhipu_ai), which markets its products internationally under the brand [Z.ai](/wiki/z_ai). Released on April 7, 2026, with weights published the following day, it is the company's flagship model for agentic engineering and the direct successor to [GLM-5](/wiki/glm_5).[1][2][3] GLM-5.1 is a [mixture-of-experts](/wiki/mixture_of_experts) (MoE) model with roughly 754 billion total parameters and about 40 billion active per token, distributed on [Hugging Face](/wiki/hugging_face) under the permissive [MIT License](/wiki/mit_license).[2][4] Its headline feature is long-horizon autonomy: Z.ai says the model can work continuously on a single coding task for up to eight hours, refining its own strategy across hundreds of iterations rather than answering in one shot.[1][5]

## Overview

GLM-5.1 is a text-in, text-out model positioned squarely at software engineering and tool-driven agent work.[1][6] Z.ai describes it as a refinement of the GLM-5 series tuned for "long-horizon" execution, the ability to keep making forward progress on a hard problem long after a typical model has, in the company's words, run out of ideas.[5] Where earlier releases were built for minute-scale interactions, GLM-5.1 is designed to plan, run experiments, read the results, find the blockers, and try again over many rounds without a human stepping in.[1][5]

The model reaches state-of-the-art results among open models on several agentic and coding benchmarks, and Z.ai claims it edges past closed frontier systems including [GPT-5.1-Codex-Max](/wiki/gpt_5_1_codex_max)'s lineage, [Claude Opus 4.6](/wiki/claude_opus_4_6), and [Gemini 3 Pro](/wiki/gemini_3_pro) on the [SWE-Bench Pro](/wiki/swe_bench_pro) software-engineering benchmark.[1][3][7] It ships with a 200,000-token context window and supports up to 128,000 output tokens in a single response.[6][8]

## Developer: Zhipu AI / Z.ai and the GLM line

Zhipu AI is a Beijing-based AI company that spun out of [Tsinghua University](/wiki/tsinghua_university) and develops the GLM (General Language Model) family.[3][9] The company uses the consumer and developer brand Z.ai for its international platform and API, while operating the BigModel.cn platform in China.[3][7] Zhipu has been on the United States Entity List since January 2025, which restricts its access to American-made AI accelerators, a constraint that shaped how the GLM-5 series was trained.[10][9]

The GLM line had already moved through several generations before GLM-5.1. The wiki documents [GLM-4.5](/wiki/glm_4_5) and [GLM-4.6](/wiki/glm_4_6) from the 4.x series, the multimodal [GLM-4-Voice](/wiki/glm_4_voice), and the GLM-5 base model. GLM-5.1 is not a separate architecture from GLM-5 so much as the next iteration of it: a model in the same MoE family, retrained and post-trained to push much harder on coding, tool use, and sustained autonomous execution.[1][5][2]

## Release

Z.ai announced GLM-5.1 on April 7, 2026, and open-sourced the weights on Hugging Face on April 8, 2026, alongside an API rollout.[1][2][11] Regional and English-language press carried the launch over April 7 through April 9, 2026, with most outlets dating the announcement to April 7 or 8.[3][7][12] The release was framed by the company as its "most intelligent flagship model to date" and, in Z.ai's own marketing, the strongest open-source model then available.[12][7]

The launch coincided with a price increase for Zhipu's cloud API. The company raised pricing on model calls by roughly 8 to 17 percent relative to its earlier GLM-5 Turbo offering, a move several outlets noted ran against the broader industry trend of falling token prices.[11][12] Coverage tied the pricing and release to Zhipu's business position: the company had reported a widening loss for 2025 and had raised prices repeatedly through early 2026, and its shares moved sharply on launch-week disclosures.[11]

## Variants and sizes

At launch GLM-5.1 was distributed as a single flagship model rather than a family of differently sized checkpoints. The base weights are published in BF16, and a large ecosystem of quantized builds (for runtimes such as llama.cpp, Ollama, and LM Studio) appeared on Hugging Face shortly after release.[2][4]

| Variant | Description | Source |
|---|---|---|
| GLM-5.1 (base weights) | Full BF16 weights, ~754B total parameters, MIT-licensed | [2][4] |
| Quantized community builds | Numerous GGUF and other quantizations for local runtimes (llama.cpp, Ollama, LM Studio, Jan) | [2][4] |
| API model `glm-5.1` | Hosted model served via the Z.ai / BigModel API and third-party clouds | [6][7] |

Z.ai also exposed the model through its GLM Coding Plan tiers (Lite, Pro, and Max) and made it compatible with agentic coding clients including Claude Code, Cline, and other OpenAI-compatible tools.[1][13]

## Architecture and training

GLM-5.1 uses the architecture Z.ai labels `glm_moe_dsa`, a mixture-of-experts design combined with a sparse-attention mechanism the company derived from research on efficient long-context attention.[2][8] Independent technical summaries and the published model configuration describe a network of roughly 754 billion total parameters with about 40 billion active per token, built from 256 routed experts plus one shared expert (with on the order of eight routed experts plus the shared expert active per token) spread across roughly 78 layers.[8][14] The attention stack mixes a linear-attention component with conventional attention, which is what keeps the 200,000-token context window tractable.[8][14] These configuration-level details come from the model card and third-party analyses rather than a Z.ai datasheet, so the exact expert-routing numbers should be read as the disclosed configuration rather than a fully audited specification.[8][14]

The GLM-5 series is notable for the hardware it ran on. Z.ai disclosed that the GLM-5 line, including GLM-5.1, was trained without [Nvidia](/wiki/nvidia) accelerators, using a large cluster of [Huawei](/wiki/huawei_ai) Ascend 910-series chips together with Huawei's MindSpore framework.[10][15] Reporting on the cluster put it among the largest non-Nvidia training deployments disclosed to date, which several outlets framed as evidence that competitive frontier models can be built outside the Nvidia ecosystem under export-control pressure.[10][15] For context, the GLM-5 base model scaled the family from 355 billion parameters (32 billion active) up to 744 billion (40 billion active) and was trained on about 28.5 trillion tokens of pre-training data, integrating a sparse-attention mechanism for the first time.[16] GLM-5.1 carries that lineage forward with a post-training and [reinforcement learning](/wiki/reinforcement_learning) recipe aimed at long-horizon agency, using what Z.ai describes as asynchronous RL infrastructure.[1][5]

## Capabilities

The defining capability of GLM-5.1 is sustained autonomous execution. Rather than producing a one-shot answer, the model runs an experiment, analyzes, and optimizes loop, repeatedly reviewing its own strategy, recognizing dead ends, and trying new approaches.[1][5] Z.ai reports demonstrations in which the model worked for more than eight hours on a single task, including building a complete Linux desktop environment from scratch through hundreds of autonomous plan-execute-analyze-optimize iterations in one uninterrupted session.[1][5] In other demonstrations it iterated on performance tuning over hundreds of rounds and thousands of tool calls, for example pushing a CUDA kernel from a roughly 2.6x to a far larger speedup and improving a vector-database workload through hundreds of iterations.[5][6]

On coding and [agentic AI](/wiki/agentic_ai) workflows, GLM-5.1 is optimized for clients such as Claude Code and supports function calling, structured tool use, and OpenAI-SDK-compatible access.[6][7] Beyond software engineering, the model posts strong scores on general reasoning, mathematics, and web-browsing agent benchmarks, indicating it is a broadly capable system rather than a coding-only specialist.[2][4]

## Benchmarks

The figures below are drawn from Z.ai's published model card and release materials together with independent coverage. Scores are self-reported by Z.ai unless otherwise noted, and benchmark results for any model should be read with that caveat.[2][4][1]

| Benchmark | GLM-5.1 | Notes | Source |
|---|---|---|---|
| SWE-Bench Pro | 58.4 | Reported as #1 globally at release, ahead of GPT-5.4 (57.7) and Claude Opus 4.6 (57.3) | [1][3][7] |
| SWE-bench Verified | 77.8 | Competitive with leading closed models, a few points behind Claude Opus 4.6 | [3][16] |
| Terminal-Bench 2.0 | 63.5 (69.0 with Claude Code scaffolding) | Real-world terminal tasks | [2][4] |
| NL2Repo | 42.7 | Repository generation | [2][4] |
| CyberGym | 68.7 | Up sharply from GLM-5's 48.3 | [2][4] |
| BrowseComp | 68.0 | Web-browsing agent benchmark | [2][4] |
| GPQA Diamond | 86.2 | Graduate-level science questions | [2][4][5] |
| AIME 2026 | 95.3 | Competition mathematics | [2][4] |
| τ³-Bench | 70.6 | Tool-use agent benchmark | [4] |

Z.ai summarized the coding results by saying that across the combined average of SWE-Bench Pro, Terminal-Bench, and NL2Repo, GLM-5.1 ranked third globally, first among Chinese models, and first among open-source models.[12][2] Reviewers measuring it against Claude Opus 4.6 reported coding performance in the low-to-mid-90s percent range relative to that model, with Z.ai's CEO stating that coding capability "approaches" Claude Opus 4.6.[11][17]

## License and availability

| Aspect | Detail | Source |
|---|---|---|
| License | MIT License (commercial use allowed, no royalty) | [2][4] |
| Weights | Published on Hugging Face at `zai-org/GLM-5.1` (BF16) | [2] |
| Context window | 200,000 tokens | [6][8] |
| Max output | 128,000 tokens | [6][8] |
| API access | Z.ai (`api.z.ai`) and BigModel.cn; OpenAI-SDK compatible | [6][7] |
| API model ID | `glm-5.1` | [6] |
| Coding plans | GLM Coding Plan Lite / Pro / Max tiers | [1][13] |
| Client compatibility | Claude Code, Cline, OpenClaw, and OpenAI-compatible coding tools | [1][7] |
| Third-party hosting | Offered through clouds such as Together AI and others | [18] |

The permissive MIT license, rather than a custom community license, was repeatedly highlighted as a draw: it allows download, modification, fine-tuning, and commercial deployment without usage restrictions or fees, subject to the practical reality that self-hosting a 754B model requires substantial hardware.[2][17]

## Reception

Coverage of GLM-5.1 was broadly positive about its open-weight, low-cost challenge to closed frontier labs. Outlets framed it as the first open-source model to lead the SWE-Bench Pro leaderboard, and several described it as the strongest open-weight model available at the time of release.[1][3][7] Technical writers emphasized two themes in particular: the long-horizon autonomy, which they treated as a meaningful step beyond minute-scale chat interactions, and the training story, since the model reached frontier-adjacent results on Huawei Ascend hardware with no Nvidia chips, an outcome read as significant given U.S. export controls.[5][10][15]

Constellation Research and others noted that GLM-5.1 topped public leaderboards on Hugging Face and chat arenas around launch, while some reviewers tempered the enthusiasm by stressing that the most striking claims came from Z.ai's own benchmarks and demonstrations.[3][1] The simultaneous price increase drew comment as an unusual move in a market trending toward cheaper tokens, and was generally read against Zhipu's financial position rather than as a sign of weakness in the model.[11][12]

## Limitations

Several limitations and caveats apply. Many of the most dramatic capability claims, the eight-hour autonomous runs and the leaderboard-topping benchmark scores, are self-reported by Z.ai or shown in company demonstrations, and independent reproduction of the long-horizon results at the scale Z.ai describes was limited at launch.[1][5] Detailed architecture figures such as the exact expert-routing configuration come from the model card and third-party analyses rather than a comprehensive Z.ai technical datasheet, so specific internal numbers should be treated as the disclosed configuration rather than independently audited facts.[8][14]

Practical access is constrained by scale: at roughly 754 billion parameters the full model is out of reach for typical local hardware, so most users rely on the hosted API or quantized community builds, and the MIT license does not change those hardware requirements.[2][17] The model is text-only, without the image or audio input found in some rival systems, and its knowledge and benchmark coverage reflect an early-2026 snapshot.[6][2] Finally, as with any frontier coding agent, sustained autonomous operation raises the usual concerns about unsupervised tool use, which the long-horizon framing makes more salient rather than less.[1][5]

## References

1. MarkTechPost, "Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution," https://www.marktechpost.com/2026/04/08/z-ai-introduces-glm-5-1-an-open-weight-754b-agentic-model-that-achieves-sota-on-swe-bench-pro-and-sustains-8-hour-autonomous-execution/
2. Hugging Face, "zai-org/GLM-5.1" model card, https://huggingface.co/zai-org/GLM-5.1
3. Constellation Research, "Z.ai ups ante in open-source LLMs with GLM-5.1," https://www.constellationr.com/insights/news/zai-ups-ante-open-source-llms-glm-51
4. LLM-Stats, "GLM-5.1 Benchmarks, Pricing & Context Window," https://llm-stats.com/models/glm-5.1
5. The Decoder, "Zhipu AI's GLM-5.1 can rethink its own coding strategy across hundreds of iterations," https://the-decoder.com/zhipu-ais-glm-5-1-can-rethink-its-own-coding-strategy-across-hundreds-of-iterations/
6. Z.ai Developer Documentation, "GLM-5.1 - Overview," https://docs.z.ai/guides/llm/glm-5.1
7. TestingCatalog, "Zhipu AI launches open-source GLM-5.1 model for coding tasks," https://www.testingcatalog.com/zhipu-ai-launches-open-source-glm-5-1-model-for-coding-tasks/
8. APXML, "GLM-5.1: Specifications and GPU VRAM Requirements," https://apxml.com/models/glm-51
9. AIMadeTools, "What is Z.ai (Zhipu)? The Lab Behind GLM-5.1," https://www.aimadetools.com/blog/what-is-z-ai-zhipu/
10. Awesome Agents, "GLM-5.1 Tops SWE-Bench Pro With Zero NVIDIA Hardware," https://awesomeagents.ai/news/glm-5-1-swe-bench-pro-huawei-chips/
11. Yahoo Finance / Bloomberg, "Zhipu Raises AI Model Prices 8%-17% As GLM-5.1 Launches," https://finance.yahoo.com/sectors/technology/articles/zhipu-raises-ai-model-prices-131652529.html
12. Futunn News, "The release of Zhipu GLM-5.1: A 10% price increase against market trends, open-source models surpass closed-source counterparts," https://news.futunn.com/en/post/71197414/the-release-of-zhipu-glm-5-1-a-10-price
13. Atlas Cloud Blog, "LLM GLM-5.1 is coming soon to Atlas Cloud: Opus 4.6-Level Coding with 8-Hour Autonomous Execution," https://www.atlascloud.ai/blog/ai-updates/llm-glm-5-1-is-coming-soon-to-atlas-cloud-opus-4-6-level-coding-with-8-hour-autonomous-execution
14. Spheron Blog, "Deploy GLM-5.1 on GPU Cloud: Self-Host the 754B MoE Model (2026 Guide)," https://www.spheron.network/blog/deploy-glm-5-1-gpu-cloud/
15. Let's Data Science, "How China's GLM-5 Works: 744B Model on Huawei Chips," https://letsdatascience.com/blog/china-trained-frontier-ai-model-glm-5-without-nvidia
16. Z.ai Developer Documentation, "GLM-5 - Overview," https://docs.z.ai/guides/llm/glm-5
17. MindStudio, "GLM 5.1: The Open-Source Model That Matches GPT and Claude on Coding," https://www.mindstudio.ai/blog/glm-5-1-open-source-coding-model
18. Together AI, "GLM-5.1 API," https://www.together.ai/models/glm-51