GLM-4
Last reviewed
Jun 3, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 1,635 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 1,635 words
Add missing citations, update stale details, or suggest a clearer explanation.
GLM-4 is the fourth-generation foundation model family from Zhipu AI, a Beijing company spun out of the Knowledge Engineering Group at Tsinghua University and now trading internationally as Z.ai. The flagship GLM-4 model was first announced on 16 January 2024 at the company's inaugural Technology Open Day (Zhipu DevDay) [1][2]. The family spans closed API models served through the bigmodel.cn platform, an agentic "All Tools" mode, the multimodal GLM-4V, and an open-weights GLM-4-9B series released in June 2024. It is documented in the technical report "ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 (All Tools)" (arXiv:2406.12793) [3].
GLM-4 continues the lineage that runs from the GLM-130B base model through the ChatGLM chat series, all built on the General Language Model (GLM) autoregressive blank-infilling architecture co-developed by Jie Tang and colleagues. According to the technical report, the GLM-4 models were pre-trained on roughly ten trillion tokens, mostly Chinese and English, plus a smaller corpus drawn from 24 languages, then aligned through supervised fine-tuning and reinforcement learning from human feedback [3]. It precedes the GLM-4.5 and GLM-4.6 models, which are covered separately.
Zhipu serves several GLM-4 variants through its API. The first release, internally dated 0116, was followed by an improved GLM-4 (0520) and a set of smaller, cheaper tiers introduced through 2024. Zhipu has described the lineup as covering price points roughly 10x, 100x and 1000x below the flagship while keeping usable quality [4]. The main tiers are summarized below.
| Tier | Context length | Notes |
|---|---|---|
| GLM-4 | 128K | Flagship; versions dated 0116 and 0520 [3][5] |
| GLM-4-Air | 128K | Cost-performance tier (version 0605), performance close to GLM-4 (0116) with lower latency and cost [4][5] |
| GLM-4-AirX | 8K | High-performance Air variant, about 2.6x faster inference than GLM-4-Air [4] |
| GLM-4-Flash | 128K | Lightweight tier, made free to the public on 27 August 2024; stable throughput around 72 tokens per second [6] |
| GLM-4-Long | 1M | Ultra-long-context tier launched 13 August 2024, priced as low as 0.001 yuan per 1,000 tokens [7] |
| GLM-4V | N/A | Multimodal (vision plus language) tier; see below [3] |
The exact roster and version dates have shifted over time as Zhipu added and retired snapshots; later additions such as GLM-4-FlashX and the GLM-4-Plus update sit alongside the original tiers on the platform.
GLM-4V is the vision-language member of the family, accepting images alongside text. Zhipu positions it for document understanding, chart reading, optical character recognition, and general visual question answering in both Chinese and English. An open 9-billion-parameter version, GLM-4V-9B, was released with the rest of the open series in June 2024 and operates at a native input resolution of 1120 by 1120 pixels with an 8K context window [8].
On Zhipu's reported multimodal evaluations, GLM-4V-9B is competitive with much larger proprietary systems on several benchmarks while trailing on others, notably the knowledge-heavy MMMU. Selected scores from the model card are shown below [8].
| Benchmark | GLM-4V-9B | GPT-4o (0513) | GPT-4V (0409) | Gemini 1.0 Pro | Claude 3 Opus |
|---|---|---|---|---|---|
| MMBench-EN | 81.1 | 83.4 | 81.0 | 73.6 | 63.3 |
| MMBench-CN | 79.4 | 82.1 | 80.2 | 74.3 | 59.2 |
| SEEDBench-IMG | 76.8 | 77.1 | 73.0 | 70.7 | 64.0 |
| MMStar | 58.7 | 63.9 | 56.0 | 38.6 | 45.7 |
| MMMU | 47.2 | 69.2 | 61.7 | 49.0 | 54.9 |
| AI2D | 81.1 | 84.6 | 78.6 | 72.9 | 70.6 |
| OCRBench | 786 | 736 | 656 | 680 | 694 |
On 5 June 2024 Zhipu open-sourced the GLM-4-9B series through the THUDM organization on GitHub and Hugging Face [9]. The release includes a base model and several chat and multimodal variants:
| Model | Type | Context |
|---|---|---|
| GLM-4-9B | Base | 8K |
| GLM-4-9B-Chat | Chat | 128K |
| GLM-4-9B-Chat-1M | Chat | 1M |
| GLM-4V-9B | Multimodal chat | 8K |
GLM-4-9B-Chat adds web browsing, code execution, function calling (custom tool use) and long-text reasoning on top of multi-turn dialogue. The 1M-context variant is positioned for whole-document workloads; Zhipu reports lossless retrieval in a "Needle in a Haystack" test at the 1M length [7][9].
On Zhipu's published comparisons, GLM-4-9B-Chat outperforms Llama-3-8B-Instruct across most general benchmarks. Reported chat-model scores are shown below [9].
| Model | MMLU | C-Eval | GSM8K | MATH | HumanEval | IFEval | MT-Bench |
|---|---|---|---|---|---|---|---|
| Llama-3-8B-Instruct | 68.4 | 51.3 | 79.6 | 30.0 | 62.2 | 68.6 | 8.00 |
| GLM-4-9B-Chat | 72.4 | 75.6 | 79.6 | 50.6 | 71.8 | 69.0 | 8.35 |
The repository also reports base-model figures (for example, MMLU 74.7 and C-Eval 77.1 for GLM-4-9B before instruction tuning) and multilingual results across six datasets where GLM-4-9B-Chat leads Llama-3-8B-Instruct [9].
A central feature of the 2024 launch was "GLM-4 All Tools," an agentic mode in which the model decides on its own when and which tools to invoke rather than relying on a fixed pipeline [3]. The technical report describes four tool types: a web browser for retrieving and reading online information, a Python interpreter (code interpreter) for computation and data tasks, a text-to-image model, and user-defined functions [3]. The text-to-image component is CogView3, Zhipu's diffusion image generator [3]. The browser tool exposes operations such as searching and opening a page by URL, letting the model gather and synthesize current information across multiple steps.
Zhipu reports that GLM-4 All Tools matches, and in some cases surpasses, GPT-4 All Tools on tasks like accessing online information through web browsing, and that it performs comparably on code-interpreter problem solving and image generation [3]. The same All-Tools behavior, including web browsing, code execution and function calling, was carried into the open GLM-4-9B-Chat model [9].
GLM-4 and the All Tools model both support a 128K-token context window, equivalent to roughly 300 pages of text [3][2]. Zhipu later pushed this further with two 1M-token offerings: the open GLM-4-9B-Chat-1M model and the GLM-4-Long API tier, the latter launched on 13 August 2024 and marketed for tasks such as translating long documents, analyzing financial reports and building chatbots with very long memory [7]. The company has emphasized near-perfect recall at these lengths, reporting lossless information retrieval at 1M tokens in Needle-in-a-Haystack evaluations [7]. The context budget grew across generations of the GLM line, from a 2K window in the earliest ChatGLM models up to the 1M maximum reached here.
The technical report benchmarks the GLM-4 (0520) snapshot against GPT-4 (0314) and GPT-4 Turbo (2024-04-09) on standard academic suites. GLM-4 closely rivals or beats the original GPT-4 on these general metrics while trailing the newer GPT-4 Turbo on several. Reported figures are below [3].
| Benchmark | GLM-4 (0520) | GPT-4 (0314) | GPT-4 Turbo (0409) |
|---|---|---|---|
| MMLU | 83.3 | 86.4 | 86.7 |
| GSM8K | 93.3 | 92.0 | 95.6 |
| MATH | 61.3 | 52.9 | 73.4 |
| BBH | 84.7 | 83.1 | 88.2 |
| GPQA | 39.9 | 35.7 | 49.3 |
| HumanEval | 78.5 | 67.0 | 88.2 |
On instruction following (IFEval, English strict-instruction), GLM-4 (0520) scores 85.0 against 85.9 for GPT-4 Turbo [3]. On the Chinese alignment benchmark AlignBench, GLM-4 (0520) and GPT-4 Turbo tie at 8.00 overall, and the report states GLM-4 outperforms GPT-4 in Chinese alignment tasks [3]. On the bilingual coding benchmark NaturalCodeBench, GLM-4 (0520) scores 47.1 overall versus 53.8 for GPT-4 Turbo [3]. These results are Zhipu's own measurements, so the precise figures depend on the prompts and evaluation settings used.
The open GLM-4-9B series is released under a custom Model License rather than a standard open-source license. The model weights are governed by the GLM-4 license posted on the Hugging Face repositories, while the accompanying code in the GitHub repository is released under Apache 2.0 [9]. The model license permits research and, subject to its terms, commercial use; users are directed to follow the license conditions, which historically have included a registration or notification requirement for commercial deployment. The closed API tiers (GLM-4, GLM-4-Air, GLM-4-Flash and others) are commercial services billed per token on bigmodel.cn, with GLM-4-Flash offered free of charge.
Zhipu continued to iterate on the GLM-4 line after the 2024 launch. In April 2025 it open-sourced the GLM-4-0414 series, including a 32-billion-parameter GLM-4-32B-0414 and the reasoning-focused GLM-Z1 and GLM-Z1-Rumination variants, all supporting context extension to 128K [9]. A separate audio model, GLM-4-Voice, added end-to-end speech conversation. The family was later superseded by the hybrid-reasoning GLM-4.5 and the long-context coding model GLM-4.6, which became Zhipu's main open flagships.