# GLM-4.6

> Source: https://aiwiki.ai/wiki/glm_4_6
> Updated: 2026-06-24
> Categories: AI Models, Chinese AI, Large Language Models, Mixture of Experts, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**GLM-4.6** is a flagship open-weight [large language model](/wiki/large_language_model) released by [Zhipu AI](/wiki/zhipu_ai) under its international brand [Z.ai](/wiki/z_ai) on September 30, 2025, built on a sparse [Mixture of Experts](/wiki/mixture_of_experts) (MoE) architecture with roughly 357 billion total parameters and about 32 billion active per token. As the successor to [GLM-4.5](/wiki/glm_4_5), it expands the context window from 128,000 to 200,000 tokens, with a maximum output of 128,000 tokens, and is published under the [MIT License](/wiki/mit_license) on Hugging Face and ModelScope [1][2][3]. Its single most cited result: on Zhipu's CC-Bench coding harness, GLM-4.6 wins 48.6 percent of head-to-head tasks against [Claude Sonnet 4](/wiki/claude_sonnet_4) while using roughly 15 percent fewer tokens than GLM-4.5, the closest an open-weight Chinese model had come to a leading Western coding model at the time [1][2][4].

GLM-4.6 is positioned by Zhipu as a frontier-class general purpose model with particular strength in real-world coding, long-context processing, reasoning, search, writing, and agentic tool use. On CC-Bench, an extended human-graded coding harness run as 74 tasks in isolated Docker containers, the model records a 48.6 percent win rate, a 9.5 percent tie rate, and a 41.9 percent loss rate against Claude Sonnet 4. It still trails [Claude Sonnet 4.5](/wiki/claude_sonnet_4_5) on the hardest coding evaluations, which Zhipu's own release post acknowledges [1][2][4][16]. The model is available through Z.ai's API, OpenRouter, Together AI, and a growing list of inference partners, and it ships with integrations for [Claude Code](/wiki/claude_code), Cline, Roo Code, Kilo Code, and OpenCode out of the box [3][5].

## What is GLM-4.6 used for?

Zhipu describes GLM-4.6 as a general purpose model tuned for developer and agent workloads rather than a narrow chat assistant. The release post lists five priority capabilities: real-world coding inside agentic harnesses, long-context processing up to 200K tokens, reasoning with optional tool use, search-based agents, and natural-sounding writing. In Zhipu's own words, GLM-4.6 "better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios," and shows "stronger performance in tool using and search-based agents" relative to GLM-4.5 [1][6]. In practice the model is most often deployed as the engine behind coding agents such as Claude Code, Cline, Roo Code, and Kilo Code, and for long-document drafting and analysis that benefits from the expanded context window [1][3][5].

## Background

The GLM series was started by Zhipu AI, a Beijing-based research lab that spun out of the Knowledge Engineering Group at [Tsinghua University](/wiki/tsinghua_university) in 2019. Early GLM models used a General Language Model objective that combined autoregressive and span-corruption pretraining, and the team published the open-source ChatGLM-6B chatbot in March 2023, which became one of the most widely downloaded Chinese language models on Hugging Face during the year. GLM-4 followed in January 2024 as a closed proprietary model, and the GLM-4.5 family in July 2025 returned the line to open weights with a 355B-A32B MoE flagship and a lighter 106B-A12B "Air" variant [3][6].

GLM-4.6 was announced under the Z.ai brand, which Zhipu adopted in mid-2025 for its international product line. The release was timed against a busy week in late September that also saw the launch of [DeepSeek V3.2](/wiki/deepseek_v3_2) and the steady ramp of [Qwen3-Max](/wiki/qwen3_max), and Zhipu framed GLM-4.6 as the strongest open-weight Chinese coding model on the market at the time. The team explicitly compared itself to Anthropic and to the rest of the domestic Chinese pack, calling the model competitive with "leading domestic and international models such as DeepSeek-V3.1-Terminus and Claude Sonnet 4" [1][6][7]. The release also coincided with broader investor attention on Zhipu, which had raised substantial funding from Saudi Arabian and other strategic backers earlier in 2025 and was preparing the Hong Kong listing that it completed on January 8, 2026, becoming the first Chinese large language model company to go public [16].

## Architecture

GLM-4.6 keeps the broad architectural template of GLM-4.5 but tunes several details and pushes context length to a new ceiling. The model is a decoder-only transformer with a sparse MoE feed-forward layer at every block. Of the 357 billion total parameters, only about 32 billion are activated per token, giving it roughly an 11 to 1 sparsity ratio. That choice mirrors the design space staked out by [DeepSeek V3](/wiki/deepseek_v3) and Qwen3-235B-A22B and reflects the field's growing consensus that very large MoE models with modest active parameter counts are the most cost-efficient way to push toward frontier quality on commodity inference hardware [1][8][9].

The attention stack uses [Grouped-Query Attention](/wiki/grouped_query_attention) with 96 query heads, which keeps the inference memory footprint manageable for long contexts. Position information is encoded with a partial [Rotary Position Embedding](/wiki/rotary_position_embedding) scheme, and attention logits are stabilized with QK-Norm. Routing inside the MoE layers uses loss-free balance routing with sigmoid gates, an approach designed to encourage roughly uniform expert utilization without introducing the explicit auxiliary balancing loss that earlier MoE designs depended on. Zhipu reports BF16 and F32 tensor types as native, and the published weights on Hugging Face are distributed in BF16 [1][8].

### Context window and output budget

The headline architectural change relative to GLM-4.5 is the expansion of the context window from 128K tokens to 200K tokens. The model can still emit up to 128,000 output tokens in a single response, which makes it usable for long-form code generation, end-to-end document drafting, and multi-turn agentic workflows that need to keep large amounts of state in context. Zhipu reports that the longer window was achieved primarily through changes to position embeddings and continued pre-training on long sequence data, rather than through purely inference-time tricks like YaRN extrapolation [1][2][5].

### Tool use during reasoning

A notable behavioral change is that GLM-4.6 is trained to invoke tools during its internal reasoning trace rather than only after producing a final chain of thought. In practice that means an agent built on GLM-4.6 can interleave search queries, code execution, and file operations with reasoning steps without breaking out of the model loop. Zhipu credits this design with much of the model's improvement on agentic and search-based benchmarks, and the same pattern shows up in third-party reviews of how the model behaves inside coding harnesses like Claude Code and Kilo Code [1][4][10].

## Training

Zhipu has not published a full standalone technical report for GLM-4.6. The model card on Hugging Face references the GLM-4.5 technical report on arXiv (2508.06471) as the primary architectural and training source, and notes that GLM-4.6 inherits most of the GLM-4.5 recipe with continued pre-training on additional code, agent, and long-context data [1][8]. Public material describes three high level stages: pre-training on a large multilingual corpus with heavy emphasis on English and Chinese, a long-context continued pre-training phase that extended the effective window to 200,000 tokens, and a post-training phase combining supervised fine-tuning with reinforcement learning aimed at coding, tool use, and reasoning quality [1][8].

Beyond those high-level claims, Zhipu has not disclosed the size of the GLM-4.6 training corpus, the exact mix of data sources, or the reinforcement learning algorithm used in post training. The release blog focuses instead on outcome measures, in particular the CC-Bench win rate against Claude Sonnet 4 and the 15 percent token efficiency improvement over GLM-4.5 on multi-turn coding tasks. The model is open weight under the MIT license, but it is not open data, and the training corpus is not published [1][2][4].

## How does GLM-4.6 differ from GLM-4.5?

GLM-4.6 is a refresh rather than a new architecture. The headline improvements are concentrated in long context, coding, agentic tool use, and writing quality. Zhipu reports that on CC-Bench the newer model completes the same tasks with about 15 percent fewer tokens than GLM-4.5, averaging roughly 651,000 tokens per task trajectory, a 14.6 percent reduction on the expanded test set [1][4][16]. The table below summarizes the most-cited differences between the two generations as reported in Zhipu's own materials and in independent reviews.

| Area | GLM-4.5 | GLM-4.6 | Notes |
|---|---|---|---|
| Total parameters | 355B | 357B | Slight increase, same broad architecture [1][8] |
| Active parameters | ~32B | ~32B | Sparsity ratio unchanged [8][9] |
| Context window | 128K tokens | 200K tokens | About 56 percent longer input window [1][2] |
| Maximum output | 96K tokens | 128K tokens | Allows longer single-turn generations [2][5] |
| CC-Bench vs Claude Sonnet 4 | Lower win rate | 48.6 percent win rate | Near parity in extended human-graded coding tests [1][4] |
| Token efficiency on CC-Bench | Baseline | About 15 percent fewer tokens | ~651K tokens per trajectory, a 14.6 percent reduction [1][4][16] |
| LiveCodeBench v6 | 63.3 percent | 82.8 percent | Large jump on competitive coding benchmark [10] |
| Tool use inside reasoning | Limited | Native | Tools can be called mid-reasoning [1][4] |
| Writing alignment | GLM-4.5 baseline | Better human preference scores | "More naturally in role-playing scenarios" per Zhipu [1][6] |

The model is otherwise drop-in compatible with GLM-4.5 deployments. Inference servers like [vLLM](/wiki/vllm) and [SGLang](/wiki/sglang) added day-one support for GLM-4.6, and the chat template, tokenizer, and tool-calling schema remain compatible with the GLM-4.5 specification [1][3].

## Benchmark performance

Zhipu reported GLM-4.6 results across eight public benchmarks at launch, covering math, science, coding, agentic reasoning, and general knowledge. Independent measurement by Artificial Analysis and other third party trackers followed within days. The table below collects the most widely cited numbers and identifies which are vendor reported and which are independent. Benchmarks where Zhipu has not published a number are omitted rather than estimated.

| Benchmark | Score | Source |
|---|---|---|
| [AIME](/wiki/aime) 2025 (math, standard) | 93.9 percent | Vendor reported [1][10] |
| AIME 2025 (with tools enabled) | 98.6 percent | Vendor reported [10] |
| [GPQA](/wiki/gpqa) Diamond | 81.0 percent | Vendor reported [3][10] |
| GPQA (with tools) | 82.9 percent | Vendor reported [10] |
| [LiveCodeBench](/wiki/livecodebench) v6 | 82.8 percent | Vendor reported [3][10] |
| [SWE-bench](/wiki/swe_bench) Verified | 68.0 percent | Vendor reported [10] |
| Humanity's Last Exam (with tools) | 30.4 percent | Vendor reported [10] |
| CC-Bench win rate vs Claude Sonnet 4 | 48.6 percent | Vendor reported [1][4] |
| CC-Bench win rate vs DeepSeek V3.1-Terminus | 64.9 percent | Vendor reported [10] |
| Artificial Analysis Intelligence Index | 33 | Independent [11] |
| Output speed (Artificial Analysis) | 42.2 tokens per second | Independent [11] |
| Time to first token (Artificial Analysis) | 1.16 seconds | Independent [11] |

The LiveCodeBench v6 result is a roughly 19 point jump over GLM-4.5, which scored 63.3 on the same benchmark. The SWE-bench Verified score of 68.0 is competitive with frontier open-weight models from the same generation but trails Claude Sonnet 4.5, which Anthropic reported in the high 70s on the same benchmark. AIME 2025 at 93.9 percent without tools, and 98.6 percent with a code interpreter, is comparable to top reasoning-focused models in the period. Humanity's Last Exam at 30.4 percent with tools is well below frontier reasoning models, which is consistent with GLM-4.6 being a non reasoning flagship rather than a dedicated chain of thought specialist [3][10].

Artificial Analysis placed GLM-4.6 at 33 on its Intelligence Index in October 2025, slightly above the median of 30 for comparable open weight models. The same harness measured output speed at 42.2 tokens per second, which the analysts described as below average for its class, and time to first token at 1.16 seconds, which they called very competitive. Total output volume across the full Intelligence Index run was about 57 million tokens, on the verbose end of the range [11].

## Is GLM-4.6 open source, and how is it accessed?

The weights are released under the MIT license, which permits commercial use, fine-tuning, redistribution, and derivative works without royalties. Zhipu publishes the safetensors files on Hugging Face under the `zai-org/GLM-4.6` repository and mirrors them on ModelScope. Local deployment is supported through vLLM, SGLang, and the standard Hugging Face Transformers library, and the recommended inference settings are a temperature of 1.0 for general evaluation and `top_p` 0.95 with `top_k` 40 for code-focused workloads [1][3].

Commercial access goes through the Z.ai API at `https://api.z.ai/api/paas/v4/chat/completions`, which uses an OpenAI-compatible request schema and accepts an explicit `thinking` parameter that can be set to enabled or disabled per request. The same endpoint is mirrored by external gateways. [OpenRouter](/wiki/openrouter) lists GLM-4.6 at $0.43 per million input tokens and $1.74 per million output tokens, while [Together AI](/wiki/together_ai) lists it at $0.60 input and $2.20 output. [Artificial Analysis](/wiki/artificial_analysis) reports a blended price of about $0.96 per million tokens at a 3 to 1 input output ratio, which they describe as slightly expensive relative to comparable open weight peers but cheaper than most closed Western alternatives [5][11][12].

| Provider | Input ($/M tokens) | Output ($/M tokens) | Context |
|---|---|---|---|
| Z.ai direct API | Tiered (varies by plan) | Tiered (varies by plan) | 200K |
| OpenRouter | 0.43 | 1.74 | 203K |
| Together AI | 0.60 | 2.20 | 200K |
| Artificial Analysis blended | 0.96 (3:1 mix) | n/a | 200K |

Zhipu also offers a coding-focused subscription product called GLM Coding Plan that bundles GLM-4.6 access with coding tool integrations at a fixed monthly rate, aimed at developers who use the model inside Claude Code, Cline, or similar harnesses for everyday work [1][5].

## Reception and comparisons

GLM-4.6 was received as a strong incremental release rather than a step change. The CC-Bench result against Claude Sonnet 4 attracted the most attention, since reaching a 48.6 percent win rate on a Zhipu-defined harness is close to parity with one of the leading Western coding models of the same period. Several reviewers noted the same harness still puts the model behind Claude Sonnet 4.5, which Zhipu itself acknowledged in the release post [1][4][7]. Inside the open weight community, the MIT license drew positive attention because it imposes fewer restrictions than the licenses used by some other large Chinese releases.

The table below compares GLM-4.6 with three of its closest 2025 peers based on the most widely cited publicly reported numbers. Where a benchmark was not reported by a given vendor, the cell is left blank rather than filled with an estimate.

| Feature | GLM-4.6 | [DeepSeek V3.1](/wiki/deepseek_v3_1) | Qwen3-Max-Instruct | [Kimi K2](/wiki/kimi_k2) |
|---|---|---|---|---|
| Total parameters | 357B | 671B | 1T+ | 1T |
| Active parameters | ~32B | ~37B | Not disclosed | 32B |
| Context window | 200K | 128K | 262K | 128K |
| Open weights | Yes (MIT) | Yes | No (closed) | Yes |
| LiveCodeBench v6 | 82.8 | Comparable | 57-75 reported range | Lower reported |
| SWE-bench Verified | 68.0 | Comparable | 69.6 | Lower reported |
| Released | September 30, 2025 | August 2025 | September 24, 2025 | July 2025 |
| Primary distribution | Z.ai API, Hugging Face | DeepSeek API, Hugging Face | Alibaba Model Studio API | Moonshot API, Hugging Face |

The most consistent praise for GLM-4.6 in third party reviews concerns its front-end coding output, which several reviewers said produces more visually polished web pages than its open weight peers, and its agentic behavior inside coding harnesses, where the mid-reasoning tool use design is described as producing fewer broken tool calls and tighter end to end traces. The most consistent criticism is that pure reasoning benchmarks like Humanity's Last Exam still leave a clear gap between GLM-4.6 and the strongest closed reasoning models, and that output speed is below average for the open weight tier [4][7][10][11].

## What came after GLM-4.6?

Zhipu continued to iterate on the line rapidly over the following months. On December 23, 2025, the company open-sourced GLM-4.7 under the Z.ai brand, positioned as the direct successor built for real development workflows [13][17]. That was followed by [GLM-5](/wiki/glm_5), a roughly 744B-parameter MoE with about 40B active parameters released on February 12, 2026, and by [GLM-5.1](/wiki/glm_5_1), open-sourced on April 8, 2026, with the line later extending the context window to one million tokens. Between the GLM-4.6 and GLM-5 releases, Zhipu (trading as Z.ai) listed on the Hong Kong Stock Exchange on January 8, 2026, in an offering that raised about 559 million US dollars and made it the first Chinese large language model company to go public [16]. GLM-4.6 nonetheless remains an important reference point as the model that first established 200K context on the open-weight Chinese frontier and that made coding parity with Claude Sonnet 4 a credible claim for an open weight model.

## See also

- [GLM-4.5](/wiki/glm_4_5)
- [GLM-5](/wiki/glm_5)
- [GLM-5.1](/wiki/glm_5_1)
- [Z.ai](/wiki/z_ai)
- [Zhipu AI](/wiki/zhipu_ai)
- [DeepSeek V3](/wiki/deepseek_v3)
- [DeepSeek V3.1](/wiki/deepseek_v3_1)
- [DeepSeek V3.2](/wiki/deepseek_v3_2)
- [Qwen3-Max](/wiki/qwen3_max)
- [Kimi K2](/wiki/kimi_k2)
- [Mixture of Experts](/wiki/mixture_of_experts)
- [Claude Sonnet 4](/wiki/claude_sonnet_4)
- [Claude Sonnet 4.5](/wiki/claude_sonnet_4_5)
- [Claude Code](/wiki/claude_code)
- [vLLM](/wiki/vllm)
- [SGLang](/wiki/sglang)
- [SWE-bench](/wiki/swe_bench)
- [LiveCodeBench](/wiki/livecodebench)
- [AIME](/wiki/aime)
- [GPQA](/wiki/gpqa)
- [MIT License](/wiki/mit_license)

## References

1. Zhipu AI (Z.ai). "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities." Z.ai blog, September 30, 2025. https://z.ai/blog/glm-4.6
2. Z.ai Developer Documentation. "GLM-4.6 Overview." https://docs.z.ai/guides/llm/glm-4.6
3. zai-org. "GLM-4.6 model card." Hugging Face. https://huggingface.co/zai-org/GLM-4.6
4. Asif Razzaq. "Zhipu AI Releases GLM-4.6: Achieving Enhancements in Real-World Coding, Long-Context Processing, Reasoning, Searching and Agentic AI." MarkTechPost, September 30, 2025. https://www.marktechpost.com/2025/09/30/zhipu-ai-releases-glm-4-6-achieving-enhancements-in-real-world-coding-long-context-processing-reasoning-searching-and-agentic-ai/
5. OpenRouter. "GLM 4.6: API Pricing and Benchmarks." https://openrouter.ai/z-ai/glm-4.6
6. zai-org. "GLM-4.5 / GLM-4.6 / GLM-4.7 README." GitHub. https://github.com/zai-org/GLM-4.5/blob/main/README.md
7. CodeGPT. "Zhipu GLM 4.6: The Open-Source Frontier AI Model Guide." CodeGPT blog. https://www.codegpt.co/blog/zhipu-glm-4-6-open-source-ai
8. Zhipu AI. "GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models." arXiv:2508.06471. https://arxiv.org/abs/2508.06471
9. APXML. "GLM-4.6: Specifications and GPU VRAM Requirements." https://apxml.com/models/glm-46
10. Barnacle Goose. "GLM-4.6 Review: Zhipu AI's Latest Leap." Medium, October 2025. https://medium.com/@leucopsis/glm-4-6-review-0600e9425c73
11. Artificial Analysis. "GLM-4.6 (Reasoning): Intelligence, Performance and Price Analysis." https://artificialanalysis.ai/models/glm-4-6-reasoning
12. Together AI. "GLM-4.6 API." https://www.together.ai/models/glm-4-6
13. Z.ai. "Z.ai Open-Sources GLM-4.7, a New Generation Large Language Model Built for Real Development Workflows." BusinessWire, December 23, 2025. https://www.businesswire.com/news/home/20251223393714/en/Z.ai-Open-Sources-GLM-4.7-a-New-Generation-Large-Language-Model-Built-for-Real-Development-Workflows
14. Cirra. "GLM-4.6 vs Claude Sonnet: A Performance and Cost Analysis." https://cirra.ai/articles/glm-4-6-vs-claude-sonnet-comparison
15. llm-stats. "GLM-4.6 Benchmarks, Pricing and Context Window." https://llm-stats.com/models/glm-4.6
16. Implicator.ai. "GLM-4.6: Open Weights, Real Coding Tests, Cheaper Tokens." October 2025. https://www.implicator.ai/glm-4-6-puts-receipts-on-the-table-open-weights-real-coding-runs-cheaper-tokens/
17. Barnacle Goose. "A Technical Analysis of GLM-4.7." Medium, December 2025. https://medium.com/@leucopsis/a-technical-analysis-of-glm-4-7-db7fcc54210a