# GLM-4

> Source: https://aiwiki.ai/wiki/glm_4
> Updated: 2026-06-24
> Categories: AI Models, Chinese AI, Large Language Models
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

GLM-4 is the fourth-generation foundation model family from [Zhipu AI](/wiki/zhipu_ai), a Beijing company spun out of the Knowledge Engineering Group at [Tsinghua University](/wiki/tsinghua_university) and now trading internationally as Z.ai. First announced on 16 January 2024, the flagship GLM-4 was pre-trained on roughly ten trillion tokens and, by its makers' measurements, "closely rivals or outperforms GPT-4" on general benchmarks such as MMLU, GSM8K and HumanEval while outperforming GPT-4 on Chinese alignment [1][2][3]. The family spans closed API models served through the bigmodel.cn platform, an agentic "All Tools" mode, the multimodal GLM-4V, the 2024 flagship [GLM-4-Plus](/wiki/glm_4_plus), and an open-weights GLM-4-9B series whose lineage of open models had drawn more than 10 million downloads on Hugging Face in 2023 alone [3]. GLM-4 was first announced at the company's inaugural Technology Open Day (Zhipu DevDay) and is documented in the technical report "ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 (All Tools)" (arXiv:2406.12793) [1][2][3].

GLM-4 continues the lineage that runs from the [GLM-130B](/wiki/glm_130b) base model through the [ChatGLM](/wiki/chatglm) chat series, all built on the General Language Model (GLM) autoregressive blank-infilling architecture co-developed by [Jie Tang](/wiki/tang_jie) and colleagues. According to the technical report, the GLM-4 models were pre-trained on roughly ten trillion tokens, mostly Chinese and English, plus a smaller corpus drawn from 24 languages, then aligned through supervised fine-tuning and reinforcement learning from human feedback [3]. It precedes the [GLM-4.5](/wiki/glm_4_5) and [GLM-4.6](/wiki/glm_4_6) models, which are covered separately.

## What is the GLM architecture and how does it relate to ChatGLM?

The "GLM" in GLM-4 stands for General Language Model, the autoregressive blank-infilling pretraining objective Zhipu researchers introduced in 2021 and later scaled to the 130-billion-parameter [GLM-130B](/wiki/glm_130b) base model. The consumer-facing chat line began with ChatGLM-6B, a 6.2-billion-parameter bilingual chat model open-sourced on 14 March 2023, trained on about 1 trillion tokens of Chinese and English and small enough to run locally on a consumer GPU using INT4 quantization [3]. GLM-4 is the direct successor to that lineage: the technical report states it represents "our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM" [3]. The report also notes that Zhipu's open models, including ChatGLM-6B (three generations), GLM-4-9B, GLM-4V-9B, WebGLM and CodeGeeX, attracted "over 10 million downloads on Hugging face in the year 2023 alone" [3].

## API model tiers

Zhipu serves several GLM-4 variants through its API. The first release, internally dated 0116, was followed by an improved GLM-4 (0520) and a set of smaller, cheaper tiers introduced through 2024. Zhipu has described the lineup as covering price points roughly 10x, 100x and 1000x below the flagship while keeping usable quality [4]. The main tiers are summarized below.

| Tier | Context length | Notes |
| --- | --- | --- |
| GLM-4 | 128K | Flagship; versions dated 0116 and 0520 [3][5] |
| GLM-4-Air | 128K | Cost-performance tier (version 0605), performance close to GLM-4 (0116) with lower latency and cost [4][5] |
| GLM-4-AirX | 8K | High-performance Air variant, about 2.6x faster inference than GLM-4-Air [4] |
| GLM-4-Flash | 128K | Lightweight tier, made free to the public on 27 August 2024; stable throughput around 72 tokens per second [6] |
| GLM-4-Long | 1M | Ultra-long-context tier launched 13 August 2024, priced as low as 0.001 yuan per 1,000 tokens [7] |
| GLM-4-Plus | 128K | 2024 flagship update, announced 29 August 2024; positioned on par with GPT-4o [10] |
| GLM-4V | N/A | Multimodal (vision plus language) tier; see below [3] |

The exact roster and version dates have shifted over time as Zhipu added and retired snapshots; later additions such as GLM-4-FlashX sit alongside the original tiers on the platform.

## What is GLM-4-Plus?

GLM-4-Plus is the 2024 flagship of the GLM-4 line, announced by Zhipu on 29 August 2024 at the KDD 2024 conference [10]. It was released alongside a refreshed model suite: the text-to-image model CogView-3-Plus, the image and video understanding model GLM-4V-Plus, and the video generation model CogVideoX [10]. Zhipu described GLM-4-Plus as delivering comprehensive improvements in language understanding, instruction following and long-text processing, positioning it as "on par" with first-tier models such as GPT-4o, and a bilingual Chinese and English model with a 128K context window and native tool use [10]. At the same event Zhipu launched what it described as the first video-call service for consumer users in China on its Qingyan app, spanning text, audio and video with real-time inference [10].

## GLM-4V multimodal

GLM-4V is the vision-language member of the family, accepting images alongside text. Zhipu positions it for document understanding, chart reading, optical character recognition, and general visual question answering in both Chinese and English. An open 9-billion-parameter version, GLM-4V-9B, was released with the rest of the open series in June 2024 and operates at a native input resolution of 1120 by 1120 pixels with an 8K context window [8].

On Zhipu's reported multimodal evaluations, GLM-4V-9B is competitive with much larger proprietary systems on several benchmarks while trailing on others, notably the knowledge-heavy MMMU. Selected scores from the model card are shown below [8].

| Benchmark | GLM-4V-9B | GPT-4o (0513) | GPT-4V (0409) | Gemini 1.0 Pro | Claude 3 Opus |
| --- | --- | --- | --- | --- | --- |
| MMBench-EN | 81.1 | 83.4 | 81.0 | 73.6 | 63.3 |
| MMBench-CN | 79.4 | 82.1 | 80.2 | 74.3 | 59.2 |
| SEEDBench-IMG | 76.8 | 77.1 | 73.0 | 70.7 | 64.0 |
| MMStar | 58.7 | 63.9 | 56.0 | 38.6 | 45.7 |
| MMMU | 47.2 | 69.2 | 61.7 | 49.0 | 54.9 |
| AI2D | 81.1 | 84.6 | 78.6 | 72.9 | 70.6 |
| OCRBench | 786 | 736 | 656 | 680 | 694 |

## Is GLM-4 open source?

Partly. The closed API tiers (GLM-4, GLM-4-Air, GLM-4-Flash, GLM-4-Plus and others) are commercial services billed per token on bigmodel.cn, with GLM-4-Flash offered free of charge. The open-weights branch is the GLM-4-9B series. On 5 June 2024 Zhipu open-sourced GLM-4-9B through the THUDM organization on GitHub and Hugging Face [9]. The release includes a base model and several chat and multimodal variants:

| Model | Type | Context |
| --- | --- | --- |
| GLM-4-9B | Base | 8K |
| GLM-4-9B-Chat | Chat | 128K |
| GLM-4-9B-Chat-1M | Chat | 1M |
| GLM-4V-9B | Multimodal chat | 8K |

GLM-4-9B-Chat adds web browsing, code execution, function calling (custom tool use) and long-text reasoning on top of multi-turn dialogue. The 1M-context variant is positioned for whole-document workloads; Zhipu reports lossless retrieval in a "Needle in a Haystack" test at the 1M length [7][9].

On Zhipu's published comparisons, GLM-4-9B-Chat outperforms Llama-3-8B-Instruct across most general benchmarks. Reported chat-model scores are shown below [9].

| Model | MMLU | C-Eval | GSM8K | MATH | HumanEval | IFEval | MT-Bench |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Llama-3-8B-Instruct | 68.4 | 51.3 | 79.6 | 30.0 | 62.2 | 68.6 | 8.00 |
| GLM-4-9B-Chat | 72.4 | 75.6 | 79.6 | 50.6 | 71.8 | 69.0 | 8.35 |

The repository also reports base-model figures (for example, MMLU 74.7 and C-Eval 77.1 for GLM-4-9B before instruction tuning) and multilingual results across six datasets where GLM-4-9B-Chat leads Llama-3-8B-Instruct [9].

## All-Tools agent capabilities

A central feature of the 2024 launch was "GLM-4 All Tools," an agentic mode in which the model decides on its own when and which tools to invoke rather than relying on a fixed pipeline [3]. The technical report describes four tool types: a web browser for retrieving and reading online information, a Python interpreter (code interpreter) for computation and data tasks, a text-to-image model, and user-defined functions [3]. The text-to-image component is CogView3, Zhipu's diffusion image generator [3]. The browser tool exposes operations such as searching and opening a page by URL, letting the model gather and synthesize current information across multiple steps.

The report states that in practical applications, GLM-4 All Tools "matches and even surpasses GPT-4 All Tools in tasks like accessing online information via web browsing and solving math problems using Python interpreter" [3]. The same All-Tools behavior, including web browsing, code execution and function calling, was carried into the open GLM-4-9B-Chat model [9].

## Long context

GLM-4 and the All Tools model both support a 128K-token context window, equivalent to roughly 300 pages of text [3][2]. Zhipu later pushed this further with two 1M-token offerings: the open GLM-4-9B-Chat-1M model and the GLM-4-Long API tier, the latter launched on 13 August 2024 and marketed for tasks such as translating long documents, analyzing financial reports and building chatbots with very long memory [7]. The company has emphasized near-perfect recall at these lengths, reporting lossless information retrieval at 1M tokens in Needle-in-a-Haystack evaluations [7]. The context budget grew across generations of the GLM line, from a 2K window in the earliest ChatGLM models up to the 1M maximum reached here.

## How does GLM-4 compare to GPT-4 on benchmarks?

The technical report benchmarks the GLM-4 (0520) snapshot against GPT-4 (0314) and GPT-4 Turbo (2024-04-09) on standard academic suites. GLM-4 closely rivals or beats the original GPT-4 on these general metrics while trailing the newer GPT-4 Turbo on several. Reported figures are below [3].

| Benchmark | GLM-4 (0520) | GPT-4 (0314) | GPT-4 Turbo (0409) |
| --- | --- | --- | --- |
| MMLU | 83.3 | 86.4 | 86.7 |
| GSM8K | 93.3 | 92.0 | 95.6 |
| MATH | 61.3 | 52.9 | 73.4 |
| BBH | 84.7 | 83.1 | 88.2 |
| GPQA | 39.9 | 35.7 | 49.3 |
| HumanEval | 78.5 | 67.0 | 88.2 |

On instruction following (IFEval, English strict-instruction), GLM-4 (0520) scores 85.0 against 85.9 for GPT-4 Turbo [3]. On the Chinese alignment benchmark AlignBench, GLM-4 (0520) and GPT-4 Turbo tie at 8.00 overall, and the report states GLM-4 "outperforms GPT-4 in Chinese alignments as measured by AlignBench" [3]. On the bilingual coding benchmark NaturalCodeBench, GLM-4 (0520) scores 47.1 overall versus 53.8 for GPT-4 Turbo [3]. These results are Zhipu's own measurements, so the precise figures depend on the prompts and evaluation settings used.

## Licensing

The open GLM-4-9B series is released under a custom Model License rather than a standard open-source license. The model weights are governed by the GLM-4 license posted on the Hugging Face repositories, while the accompanying code in the GitHub repository is released under Apache 2.0 [9]. The model license permits research and, subject to its terms, commercial use; users are directed to follow the license conditions, which historically have included a registration or notification requirement for commercial deployment. By contrast, the later GLM-4.5 and GLM-4.6 open weights moved to the permissive MIT license, removing those restrictions [11][13].

## What came after GLM-4? GLM-4.5, GLM-4.6 and the Z.ai IPO

Zhipu continued to iterate on the GLM-4 line after the 2024 launch. In April 2025 it open-sourced the GLM-4-0414 series, including a 32-billion-parameter GLM-4-32B-0414 and the reasoning-focused GLM-Z1 and GLM-Z1-Rumination variants, all supporting context extension to 128K [9]. A separate audio model, [GLM-4-Voice](/wiki/glm_4_voice), added end-to-end speech conversation.

The family was then superseded by a new generation of Mixture-of-Experts (MoE) flagships built for agentic and coding workloads. [GLM-4.5](/wiki/glm_4_5), released on 28 July 2025 under the MIT license, is a 355-billion-parameter MoE model with 32 billion active parameters (often written 355B-A32B), trained on 23 trillion tokens [11][12]. Z.ai reports that GLM-4.5 scores 63.2 averaged across 12 industry-standard benchmarks, third overall among the models tested, including 64.2 percent on SWE-bench Verified, 91.0 percent on AIME 2024 and 70.1 percent on TAU-bench [11][12]. The smaller GLM-4.5-Air variant scores 59.8 on the same composite [11]. [GLM-4.6](/wiki/glm_4_6), released in late September 2025, kept the 355B MoE design while extending the context window to 200K tokens and improving real-world coding and agentic search; it too is published under the MIT license [13][14].

The GLM line is the backbone of a company that became China's first publicly listed large-model developer. Zhipu AI rebranded its international identity as Z.ai in July 2025 (retaining the Zhipu AI / Knowledge Atlas name in China), and on 8 January 2026 it listed on the Hong Kong Stock Exchange, raising about HK$4.35 billion (around US$560 million) in what was described as the first IPO by one of China's "AI tiger" model startups [15][16].

## References

1. [Zhipu AI Unveils Next-Gen Foundation Model GLM-4, Claims Performance Comparable to GPT-4 (Maginative)](https://www.maginative.com/article/zhipu-ai-unveils-next-gen-foundation-model-glm-4-claims-performance-comparable-to-gpt-4/)
2. [Zhipu AI Introduces GLM-4 Model: Next-Generation Foundation Model Comparable with GPT-4 (MarkTechPost)](https://www.marktechpost.com/2024/01/23/zhipu-ai-introduces-glm-4-model-next-generation-foundation-model-comparable-with-gpt-4/)
3. [ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 (All Tools) (arXiv:2406.12793)](https://arxiv.org/abs/2406.12793)
4. [Chatbot Arena issue: Add GLM-4 variants AirX, Air and Flash (lm-sys/FastChat #3420)](https://github.com/lm-sys/FastChat/issues/3420)
5. [GLM-4 series repository README (zai-org/GLM-4 on GitHub)](https://github.com/zai-org/GLM-4)
6. [Zhipu AI: GLM-4-Flash Large Model API Interface Free to the Public (AIbase)](https://www.aibase.com/news/11312)
7. [Zhipu AI: GLM-4-Long API Launched with Input and Output Price of 0.001 Yuan / 1,000 Tokens (AIbase)](https://news.aibase.com/news/11031)
8. [THUDM/glm-4v-9b model card (Hugging Face)](https://huggingface.co/THUDM/glm-4v-9b/blob/main/README_en.md)
9. [GLM-4-9B open-source release notes, README_20240605 (zai-org/GLM-4 on GitHub)](https://github.com/zai-org/GLM-4/blob/main/README_20240605.md)
10. [Zhipu Releases Next-Generation Foundation Model GLM-4-Plus and Upgrades Video Call Feature of Qingyan APP (AIbase)](https://www.aibase.com/news/11387)
11. [Zhipu AI (Z.ai) Releases Open-Weights GLM-4.5 Models That Perform Comparably To the Latest from Claude and DeepSeek (DeepLearning.AI, The Batch)](https://www.deeplearning.ai/the-batch/zhipu-ai-z-ai-releases-open-weights-glm-4-5-models-that-perform-comparably-to-the-latest-from-claude-and-deepseek)
12. [GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models (arXiv:2508.06471)](https://arxiv.org/abs/2508.06471)
13. [zai-org/GLM-4.6 model card (Hugging Face)](https://huggingface.co/zai-org/GLM-4.6)
14. [Zhipu AI Releases GLM-4.6: Enhancements in Real-World Coding, Long-Context Processing, Reasoning, Searching and Agentic AI (MarkTechPost)](https://www.marktechpost.com/2025/09/30/zhipu-ai-releases-glm-4-6-achieving-enhancements-in-real-world-coding-long-context-processing-reasoning-searching-and-agentic-ai/)
15. [Z.ai (company overview) (Wikipedia)](https://en.wikipedia.org/wiki/Z.ai)
16. [The first of China's 'AI tigers' goes public as Zhipu climbs in Hong Kong debut (CNBC)](https://www.cnbc.com/2026/01/08/china-ai-tiger-goes-ipo-zhipu-hong-kong-debut-openai-knowledge-atlas-hsi-hang-seng-listing.html)