GLM-4.6

AI Models Chinese AI Large Language Models Mixture of Experts Open Source AI

15 min read

Updated Jun 24, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 24, 2026

Fact-checked

In review queue

Sources

17 citations

Revision

v2 · 3,021 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

GLM-4.6 is a flagship open-weight large language model released by Zhipu AI under its international brand Z.ai on September 30, 2025, built on a sparse Mixture of Experts (MoE) architecture with roughly 357 billion total parameters and about 32 billion active per token. As the successor to GLM-4.5, it expands the context window from 128,000 to 200,000 tokens, with a maximum output of 128,000 tokens, and is published under the MIT License on Hugging Face and ModelScope ^[1]^[2]^[3]. Its single most cited result: on Zhipu's CC-Bench coding harness, GLM-4.6 wins 48.6 percent of head-to-head tasks against Claude Sonnet 4 while using roughly 15 percent fewer tokens than GLM-4.5, the closest an open-weight Chinese model had come to a leading Western coding model at the time ^[1]^[2]^[4].

GLM-4.6 is positioned by Zhipu as a frontier-class general purpose model with particular strength in real-world coding, long-context processing, reasoning, search, writing, and agentic tool use. On CC-Bench, an extended human-graded coding harness run as 74 tasks in isolated Docker containers, the model records a 48.6 percent win rate, a 9.5 percent tie rate, and a 41.9 percent loss rate against Claude Sonnet 4. It still trails Claude Sonnet 4.5 on the hardest coding evaluations, which Zhipu's own release post acknowledges ^[1]^[2]^[4]^[16]. The model is available through Z.ai's API, OpenRouter, Together AI, and a growing list of inference partners, and it ships with integrations for Claude Code, Cline, Roo Code, Kilo Code, and OpenCode out of the box ^[3]^[5].

What is GLM-4.6 used for?

Zhipu describes GLM-4.6 as a general purpose model tuned for developer and agent workloads rather than a narrow chat assistant. The release post lists five priority capabilities: real-world coding inside agentic harnesses, long-context processing up to 200K tokens, reasoning with optional tool use, search-based agents, and natural-sounding writing. In Zhipu's own words, GLM-4.6 "better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios," and shows "stronger performance in tool using and search-based agents" relative to GLM-4.5 ^[1]^[6]. In practice the model is most often deployed as the engine behind coding agents such as Claude Code, Cline, Roo Code, and Kilo Code, and for long-document drafting and analysis that benefits from the expanded context window ^[1]^[3]^[5].

Background

The GLM series was started by Zhipu AI, a Beijing-based research lab that spun out of the Knowledge Engineering Group at Tsinghua University in 2019. Early GLM models used a General Language Model objective that combined autoregressive and span-corruption pretraining, and the team published the open-source ChatGLM-6B chatbot in March 2023, which became one of the most widely downloaded Chinese language models on Hugging Face during the year. GLM-4 followed in January 2024 as a closed proprietary model, and the GLM-4.5 family in July 2025 returned the line to open weights with a 355B-A32B MoE flagship and a lighter 106B-A12B "Air" variant ^[3]^[6].

GLM-4.6 was announced under the Z.ai brand, which Zhipu adopted in mid-2025 for its international product line. The release was timed against a busy week in late September that also saw the launch of DeepSeek V3.2 and the steady ramp of Qwen3-Max, and Zhipu framed GLM-4.6 as the strongest open-weight Chinese coding model on the market at the time. The team explicitly compared itself to Anthropic and to the rest of the domestic Chinese pack, calling the model competitive with "leading domestic and international models such as DeepSeek-V3.1-Terminus and Claude Sonnet 4" ^[1]^[6]^[7]. The release also coincided with broader investor attention on Zhipu, which had raised substantial funding from Saudi Arabian and other strategic backers earlier in 2025 and was preparing the Hong Kong listing that it completed on January 8, 2026, becoming the first Chinese large language model company to go public ^[16].

Architecture

GLM-4.6 keeps the broad architectural template of GLM-4.5 but tunes several details and pushes context length to a new ceiling. The model is a decoder-only transformer with a sparse MoE feed-forward layer at every block. Of the 357 billion total parameters, only about 32 billion are activated per token, giving it roughly an 11 to 1 sparsity ratio. That choice mirrors the design space staked out by DeepSeek V3 and Qwen3-235B-A22B and reflects the field's growing consensus that very large MoE models with modest active parameter counts are the most cost-efficient way to push toward frontier quality on commodity inference hardware ^[1]^[8]^[9].

The attention stack uses Grouped-Query Attention with 96 query heads, which keeps the inference memory footprint manageable for long contexts. Position information is encoded with a partial Rotary Position Embedding scheme, and attention logits are stabilized with QK-Norm. Routing inside the MoE layers uses loss-free balance routing with sigmoid gates, an approach designed to encourage roughly uniform expert utilization without introducing the explicit auxiliary balancing loss that earlier MoE designs depended on. Zhipu reports BF16 and F32 tensor types as native, and the published weights on Hugging Face are distributed in BF16 ^[1]^[8].

Context window and output budget

The headline architectural change relative to GLM-4.5 is the expansion of the context window from 128K tokens to 200K tokens. The model can still emit up to 128,000 output tokens in a single response, which makes it usable for long-form code generation, end-to-end document drafting, and multi-turn agentic workflows that need to keep large amounts of state in context. Zhipu reports that the longer window was achieved primarily through changes to position embeddings and continued pre-training on long sequence data, rather than through purely inference-time tricks like YaRN extrapolation ^[1]^[2]^[5].

Tool use during reasoning

A notable behavioral change is that GLM-4.6 is trained to invoke tools during its internal reasoning trace rather than only after producing a final chain of thought. In practice that means an agent built on GLM-4.6 can interleave search queries, code execution, and file operations with reasoning steps without breaking out of the model loop. Zhipu credits this design with much of the model's improvement on agentic and search-based benchmarks, and the same pattern shows up in third-party reviews of how the model behaves inside coding harnesses like Claude Code and Kilo Code ^[1]^[4]^[10].

Training

Zhipu has not published a full standalone technical report for GLM-4.6. The model card on Hugging Face references the GLM-4.5 technical report on arXiv (2508.06471) as the primary architectural and training source, and notes that GLM-4.6 inherits most of the GLM-4.5 recipe with continued pre-training on additional code, agent, and long-context data ^[1]^[8]. Public material describes three high level stages: pre-training on a large multilingual corpus with heavy emphasis on English and Chinese, a long-context continued pre-training phase that extended the effective window to 200,000 tokens, and a post-training phase combining supervised fine-tuning with reinforcement learning aimed at coding, tool use, and reasoning quality ^[1]^[8].

Beyond those high-level claims, Zhipu has not disclosed the size of the GLM-4.6 training corpus, the exact mix of data sources, or the reinforcement learning algorithm used in post training. The release blog focuses instead on outcome measures, in particular the CC-Bench win rate against Claude Sonnet 4 and the 15 percent token efficiency improvement over GLM-4.5 on multi-turn coding tasks. The model is open weight under the MIT license, but it is not open data, and the training corpus is not published ^[1]^[2]^[4].

How does GLM-4.6 differ from GLM-4.5?

GLM-4.6 is a refresh rather than a new architecture. The headline improvements are concentrated in long context, coding, agentic tool use, and writing quality. Zhipu reports that on CC-Bench the newer model completes the same tasks with about 15 percent fewer tokens than GLM-4.5, averaging roughly 651,000 tokens per task trajectory, a 14.6 percent reduction on the expanded test set ^[1]^[4]^[16]. The table below summarizes the most-cited differences between the two generations as reported in Zhipu's own materials and in independent reviews.

Area	GLM-4.5	GLM-4.6	Notes
Total parameters	355B	357B	Slight increase, same broad architecture ^[1]^[8]
Active parameters	~32B	~32B	Sparsity ratio unchanged ^[8]^[9]
Context window	128K tokens	200K tokens	About 56 percent longer input window ^[1]^[2]
Maximum output	96K tokens	128K tokens	Allows longer single-turn generations ^[2]^[5]
CC-Bench vs Claude Sonnet 4	Lower win rate	48.6 percent win rate	Near parity in extended human-graded coding tests ^[1]^[4]
Token efficiency on CC-Bench	Baseline	About 15 percent fewer tokens	~651K tokens per trajectory, a 14.6 percent reduction ^[1]^[4]^[16]
LiveCodeBench v6	63.3 percent	82.8 percent	Large jump on competitive coding benchmark ^[10]
Tool use inside reasoning	Limited	Native	Tools can be called mid-reasoning ^[1]^[4]
Writing alignment	GLM-4.5 baseline	Better human preference scores	"More naturally in role-playing scenarios" per Zhipu ^[1]^[6]

The model is otherwise drop-in compatible with GLM-4.5 deployments. Inference servers like vLLM and SGLang added day-one support for GLM-4.6, and the chat template, tokenizer, and tool-calling schema remain compatible with the GLM-4.5 specification ^[1]^[3].

Benchmark performance

Zhipu reported GLM-4.6 results across eight public benchmarks at launch, covering math, science, coding, agentic reasoning, and general knowledge. Independent measurement by Artificial Analysis and other third party trackers followed within days. The table below collects the most widely cited numbers and identifies which are vendor reported and which are independent. Benchmarks where Zhipu has not published a number are omitted rather than estimated.

Benchmark	Score	Source
AIME 2025 (math, standard)	93.9 percent	Vendor reported ^[1]^[10]
AIME 2025 (with tools enabled)	98.6 percent	Vendor reported ^[10]
GPQA Diamond	81.0 percent	Vendor reported ^[3]^[10]
GPQA (with tools)	82.9 percent	Vendor reported ^[10]
LiveCodeBench v6	82.8 percent	Vendor reported ^[3]^[10]
SWE-bench Verified	68.0 percent	Vendor reported ^[10]
Humanity's Last Exam (with tools)	30.4 percent	Vendor reported ^[10]
CC-Bench win rate vs Claude Sonnet 4	48.6 percent	Vendor reported ^[1]^[4]
CC-Bench win rate vs DeepSeek V3.1-Terminus	64.9 percent	Vendor reported ^[10]
Artificial Analysis Intelligence Index	33	Independent ^[11]
Output speed (Artificial Analysis)	42.2 tokens per second	Independent ^[11]
Time to first token (Artificial Analysis)	1.16 seconds	Independent ^[11]

The LiveCodeBench v6 result is a roughly 19 point jump over GLM-4.5, which scored 63.3 on the same benchmark. The SWE-bench Verified score of 68.0 is competitive with frontier open-weight models from the same generation but trails Claude Sonnet 4.5, which Anthropic reported in the high 70s on the same benchmark. AIME 2025 at 93.9 percent without tools, and 98.6 percent with a code interpreter, is comparable to top reasoning-focused models in the period. Humanity's Last Exam at 30.4 percent with tools is well below frontier reasoning models, which is consistent with GLM-4.6 being a non reasoning flagship rather than a dedicated chain of thought specialist ^[3]^[10].

Artificial Analysis placed GLM-4.6 at 33 on its Intelligence Index in October 2025, slightly above the median of 30 for comparable open weight models. The same harness measured output speed at 42.2 tokens per second, which the analysts described as below average for its class, and time to first token at 1.16 seconds, which they called very competitive. Total output volume across the full Intelligence Index run was about 57 million tokens, on the verbose end of the range ^[11].

Is GLM-4.6 open source, and how is it accessed?

The weights are released under the MIT license, which permits commercial use, fine-tuning, redistribution, and derivative works without royalties. Zhipu publishes the safetensors files on Hugging Face under the zai-org/GLM-4.6 repository and mirrors them on ModelScope. Local deployment is supported through vLLM, SGLang, and the standard Hugging Face Transformers library, and the recommended inference settings are a temperature of 1.0 for general evaluation and top_p 0.95 with top_k 40 for code-focused workloads ^[1]^[3].

Commercial access goes through the Z.ai API at https://api.z.ai/api/paas/v4/chat/completions, which uses an OpenAI-compatible request schema and accepts an explicit thinking parameter that can be set to enabled or disabled per request. The same endpoint is mirrored by external gateways. OpenRouter lists GLM-4.6 at $0.43 per million input tokens and $1.74 per million output tokens, while Together AI lists it at $0.60 input and $2.20 output. Artificial Analysis reports a blended price of about $0.96 per million tokens at a 3 to 1 input output ratio, which they describe as slightly expensive relative to comparable open weight peers but cheaper than most closed Western alternatives ^[5]^[11]^[12].

Provider	Input ($/M tokens)	Output ($/M tokens)	Context
Z.ai direct API	Tiered (varies by plan)	Tiered (varies by plan)	200K
OpenRouter	0.43	1.74	203K
Together AI	0.60	2.20	200K
Artificial Analysis blended	0.96 (3:1 mix)	n/a	200K

Zhipu also offers a coding-focused subscription product called GLM Coding Plan that bundles GLM-4.6 access with coding tool integrations at a fixed monthly rate, aimed at developers who use the model inside Claude Code, Cline, or similar harnesses for everyday work ^[1]^[5].

Reception and comparisons

GLM-4.6 was received as a strong incremental release rather than a step change. The CC-Bench result against Claude Sonnet 4 attracted the most attention, since reaching a 48.6 percent win rate on a Zhipu-defined harness is close to parity with one of the leading Western coding models of the same period. Several reviewers noted the same harness still puts the model behind Claude Sonnet 4.5, which Zhipu itself acknowledged in the release post ^[1]^[4]^[7]. Inside the open weight community, the MIT license drew positive attention because it imposes fewer restrictions than the licenses used by some other large Chinese releases.

The table below compares GLM-4.6 with three of its closest 2025 peers based on the most widely cited publicly reported numbers. Where a benchmark was not reported by a given vendor, the cell is left blank rather than filled with an estimate.

Feature	GLM-4.6	DeepSeek V3.1	Qwen3-Max-Instruct	Kimi K2
Total parameters	357B	671B	1T+	1T
Active parameters	~32B	~37B	Not disclosed	32B
Context window	200K	128K	262K	128K
Open weights	Yes (MIT)	Yes	No (closed)	Yes
LiveCodeBench v6	82.8	Comparable	57-75 reported range	Lower reported
SWE-bench Verified	68.0	Comparable	69.6	Lower reported
Released	September 30, 2025	August 2025	September 24, 2025	July 2025
Primary distribution	Z.ai API, Hugging Face	DeepSeek API, Hugging Face	Alibaba Model Studio API	Moonshot API, Hugging Face

The most consistent praise for GLM-4.6 in third party reviews concerns its front-end coding output, which several reviewers said produces more visually polished web pages than its open weight peers, and its agentic behavior inside coding harnesses, where the mid-reasoning tool use design is described as producing fewer broken tool calls and tighter end to end traces. The most consistent criticism is that pure reasoning benchmarks like Humanity's Last Exam still leave a clear gap between GLM-4.6 and the strongest closed reasoning models, and that output speed is below average for the open weight tier ^[4]^[7]^[10]^[11].

What came after GLM-4.6?

Zhipu continued to iterate on the line rapidly over the following months. On December 23, 2025, the company open-sourced GLM-4.7 under the Z.ai brand, positioned as the direct successor built for real development workflows ^[13]^[17]. That was followed by GLM-5, a roughly 744B-parameter MoE with about 40B active parameters released on February 12, 2026, and by GLM-5.1, open-sourced on April 8, 2026, with the line later extending the context window to one million tokens. Between the GLM-4.6 and GLM-5 releases, Zhipu (trading as Z.ai) listed on the Hong Kong Stock Exchange on January 8, 2026, in an offering that raised about 559 million US dollars and made it the first Chinese large language model company to go public ^[16]. GLM-4.6 nonetheless remains an important reference point as the model that first established 200K context on the open-weight Chinese frontier and that made coding parity with Claude Sonnet 4 a credible claim for an open weight model.

References

Zhipu AI (Z.ai). "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities." Z.ai blog, September 30, 2025. https://z.ai/blog/glm-4.6 ↩
Z.ai Developer Documentation. "GLM-4.6 Overview." https://docs.z.ai/guides/llm/glm-4.6 ↩
zai-org. "GLM-4.6 model card." Hugging Face. https://huggingface.co/zai-org/GLM-4.6 ↩
Asif Razzaq. "Zhipu AI Releases GLM-4.6: Achieving Enhancements in Real-World Coding, Long-Context Processing, Reasoning, Searching and Agentic AI." MarkTechPost, September 30, 2025. https://www.marktechpost.com/2025/09/30/zhipu-ai-releases-glm-4-6-achieving-enhancements-in-real-world-coding-long-context-processing-reasoning-searching-and-agentic-ai/ ↩
OpenRouter. "GLM 4.6: API Pricing and Benchmarks." https://openrouter.ai/z-ai/glm-4.6 ↩
zai-org. "GLM-4.5 / GLM-4.6 / GLM-4.7 README." GitHub. https://github.com/zai-org/GLM-4.5/blob/main/README.md ↩
CodeGPT. "Zhipu GLM 4.6: The Open-Source Frontier AI Model Guide." CodeGPT blog. https://www.codegpt.co/blog/zhipu-glm-4-6-open-source-ai ↩
Zhipu AI. "GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models." arXiv:2508.06471. https://arxiv.org/abs/2508.06471 ↩
APXML. "GLM-4.6: Specifications and GPU VRAM Requirements." https://apxml.com/models/glm-46 ↩
Barnacle Goose. "GLM-4.6 Review: Zhipu AI's Latest Leap." Medium, October 2025. https://medium.com/@leucopsis/glm-4-6-review-0600e9425c73 ↩
Artificial Analysis. "GLM-4.6 (Reasoning): Intelligence, Performance and Price Analysis." https://artificialanalysis.ai/models/glm-4-6-reasoning ↩
Together AI. "GLM-4.6 API." https://www.together.ai/models/glm-4-6 ↩
Z.ai. "Z.ai Open-Sources GLM-4.7, a New Generation Large Language Model Built for Real Development Workflows." BusinessWire, December 23, 2025. https://www.businesswire.com/news/home/20251223393714/en/Z.ai-Open-Sources-GLM-4.7-a-New-Generation-Large-Language-Model-Built-for-Real-Development-Workflows ↩
Cirra. "GLM-4.6 vs Claude Sonnet: A Performance and Cost Analysis." https://cirra.ai/articles/glm-4-6-vs-claude-sonnet-comparison
llm-stats. "GLM-4.6 Benchmarks, Pricing and Context Window." https://llm-stats.com/models/glm-4.6
Implicator.ai. "GLM-4.6: Open Weights, Real Coding Tests, Cheaper Tokens." October 2025. https://www.implicator.ai/glm-4-6-puts-receipts-on-the-table-open-weights-real-coding-runs-cheaper-tokens/ ↩
Barnacle Goose. "A Technical Analysis of GLM-4.7." Medium, December 2025. https://medium.com/@leucopsis/a-technical-analysis-of-glm-4-7-db7fcc54210a ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)ChatGLM Doubao Seed 1.6 GLM-4 GLM-4-Voice GLM-4.5 GLM-5 GLM-5.1 Generalized Linear Model LLM Size and Parameter Comparison Mixture of Experts (MoE)Open-Weight LLM License Comparison Zhipu AI

What is GLM-4.6 used for?

Background

Architecture

Context window and output budget

Tool use during reasoning

Training

How does GLM-4.6 differ from GLM-4.5?

Benchmark performance

Is GLM-4.6 open source, and how is it accessed?

Reception and comparisons

What came after GLM-4.6?

See also

References

Improve this article

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here