GLM-4

GLM-4 is the fourth-generation foundation model family from Zhipu AI, a Beijing company spun out of the Knowledge Engineering Group at Tsinghua University and now trading internationally as Z.ai. The flagship GLM-4 was announced on 16 January 2024 at the company's inaugural Technology Open Day (Zhipu DevDay). It was pre-trained on roughly ten trillion tokens and, by its makers' measurements, "closely rivals or outperforms GPT-4" on general benchmarks such as MMLU, GSM8K and HumanEval while outperforming GPT-4 on Chinese alignment ^[1]^[2]^[3]. The family spans closed API models served through the bigmodel.cn platform, an agentic "All Tools" mode, the multimodal GLM-4V, the 2024 flagship GLM-4-Plus, and an open-weights GLM-4-9B series. Zhipu's open models as a group drew more than 10 million downloads on Hugging Face in 2023 alone ^[3]. The family is documented in the technical report "ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 (All Tools)" (arXiv:2406.12793) ^[1]^[2]^[3].

GLM-4 continues the lineage that runs from the GLM-130B base model through the ChatGLM chat series, all built on the General Language Model (GLM) autoregressive blank-infilling architecture co-developed by Jie Tang and colleagues. According to the technical report, the roughly ten trillion pre-training tokens are mostly Chinese and English, with a smaller corpus drawn from 24 languages; the models were then aligned through supervised fine-tuning and reinforcement learning from human feedback ^[3]. GLM-4 precedes the GLM-4.5 and GLM-4.6 models, which are covered separately.

What is the GLM architecture and how does it relate to ChatGLM?

The "GLM" in GLM-4 stands for General Language Model, the autoregressive blank-infilling pretraining objective Zhipu researchers introduced in 2021 and later scaled to the 130-billion-parameter GLM-130B base model. The consumer-facing chat line began with ChatGLM-6B, a 6.2-billion-parameter bilingual chat model open-sourced on 14 March 2023, trained on about 1 trillion tokens of Chinese and English and small enough to run locally on a consumer GPU using INT4 quantization ^[3]. GLM-4 is the direct successor to that lineage: the technical report states it represents "our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM" ^[3]. The report also notes that Zhipu's open models, including ChatGLM-6B (three generations), GLM-4-9B, GLM-4V-9B, WebGLM and CodeGeeX, attracted "over 10 million downloads on Hugging face in the year 2023 alone" ^[3].
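The blank-infilling objective named above can be illustrated with a toy data-construction routine. This is a minimal sketch of the general idea only, not Zhipu's training code: spans are cut out of the input and replaced by a mask token (Part A), and the model then regenerates each span autoregressively in shuffled order (Part B). The token strings `[MASK]`, `[S]` and `[E]`, the function name and the fixed seed are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

# Illustrative special tokens (assumed names, not taken from the article).
MASK, START, END = "[MASK]", "[S]", "[E]"


def blank_infill_example(
    tokens: Sequence[str],
    spans: Sequence[Tuple[int, int]],
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Build one toy blank-infilling example.

    `spans` are non-overlapping half-open (start, end) index pairs.
    Returns (part_a, part_b_input, part_b_target).
    """
    ordered = sorted(spans)

    # Part A: the corrupted text, each span collapsed to a single mask token.
    part_a: List[str] = []
    cursor = 0
    for start, end in ordered:
        part_a.extend(tokens[cursor:start])
        part_a.append(MASK)
        cursor = end
    part_a.extend(tokens[cursor:])

    # Part B: spans in shuffled order; inputs are shifted one step
    # relative to targets, as in ordinary next-token prediction.
    order = list(range(len(ordered)))
    random.Random(seed).shuffle(order)
    part_b_input: List[str] = []
    part_b_target: List[str] = []
    for i in order:
        start, end = ordered[i]
        span = list(tokens[start:end])
        part_b_input.extend([START] + span)
        part_b_target.extend(span + [END])
    return part_a, part_b_input, part_b_target


if __name__ == "__main__":
    a, b_in, b_out = blank_infill_example("x1 x2 x3 x4 x5 x6".split(), [(2, 3), (4, 6)])
    print(a)      # corrupted context with two masks
    print(b_in)   # generation inputs
    print(b_out)  # generation targets
```

In this sketch the context in Part A would be read bidirectionally, while Part B is predicted left to right; the attention masking that enforces this is omitted for brevity.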

API model tiers

Zhipu serves several GLM-4 variants through its API. The first release, internally dated 0116, was followed by an improved GLM-4 (0520) and a set of smaller, cheaper tiers introduced through 2024. Zhipu has described the lineup as covering price points roughly 10x, 100x and 1000x below the flagship while keeping usable quality ^[4]. The main tiers are summarized below.

Tier	Context length	Notes
GLM-4	128K	Flagship; versions dated 0116 and 0520 ^[3]^[5]
GLM-4-Air	128K	Cost-performance tier (version 0605), performance close to GLM-4 (0116) with lower latency and cost ^[4]^[5]
GLM-4-AirX	8K	High-performance Air variant, about 2.6x faster inference than GLM-4-Air ^[4]
GLM-4-Flash	128K	Lightweight tier, made free to the public on 27 August 2024; stable throughput around 72 tokens per second ^[6]
GLM-4-Long	1M	Ultra-long-context tier launched 13 August 2024, priced as low as 0.001 yuan per 1,000 tokens ^[7]
GLM-4-Plus	128K	2024 flagship update, announced 29 August 2024; positioned on par with GPT-4o ^[10]
GLM-4V	N/A	Multimodal (vision plus language) tier; see below ^[3]

The exact roster and version dates have shifted over time as Zhipu added and retired snapshots; later additions such as GLM-4-FlashX sit alongside the original tiers on the platform.
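As a rough illustration of how the context column in the table constrains tier choice, the sketch below filters tiers by the number of tokens a request needs. The window sizes are copied from the table, with "K" and "M" read as 1,000 and 1,000,000 tokens (an assumption; the platform may count differently). The function name `tiers_that_fit` is hypothetical, and GLM-4V is left out because the table lists no context length for it.

```python
from typing import Dict, List

# Context windows in tokens, taken from the tier table above.
# "128K" is read as 128,000 here, which is an assumption.
TIER_CONTEXT: Dict[str, int] = {
    "GLM-4": 128_000,
    "GLM-4-Air": 128_000,
    "GLM-4-AirX": 8_000,
    "GLM-4-Flash": 128_000,
    "GLM-4-Long": 1_000_000,
    "GLM-4-Plus": 128_000,
}


def tiers_that_fit(prompt_tokens: int, reserved_output_tokens: int = 0) -> List[str]:
    """Return the tiers whose window covers the prompt plus reserved output."""
    needed = prompt_tokens + reserved_output_tokens
    return [name for name, window in TIER_CONTEXT.items() if window >= needed]


if __name__ == "__main__":
    print(tiers_that_fit(5_000))      # every listed tier
    print(tiers_that_fit(200_000))    # only the 1M tier
```

Price and latency would be the next filters in practice, but the article gives only relative figures for them, so they are not modelled here.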

What is GLM-4-Plus?

GLM-4-Plus is the 2024 flagship of the GLM-4 line, announced by Zhipu on 29 August 2024 at the KDD 2024 conference ^[10]. It was released alongside a refreshed model suite: the text-to-image model CogView-3-Plus, the image and video understanding model GLM-4V-Plus, and the video generation model CogVideoX ^[10]. Zhipu described GLM-4-Plus as delivering comprehensive improvements in language understanding, instruction following and long-text processing, and positioned it as "on par" with first-tier models such as GPT-4o. The model is bilingual in Chinese and English, with a 128K context window and native tool use ^[10]. At the same event Zhipu launched what it described as the first video-call service for consumer users in China on its Qingyan app, spanning text, audio and video with real-time inference ^[10].

GLM-4V multimodal

GLM-4V is the vision-language member of the family, accepting images alongside text. Zhipu positions it for document understanding, chart reading, optical character recognition, and general visual question answering in both Chinese and English. An open 9-billion-parameter version, GLM-4V-9B, was released with the rest of the open series in June 2024 and operates at a native input resolution of 1120 by 1120 pixels with an 8K context window ^[8].

On Zhipu's reported multimodal evaluations, GLM-4V-9B is competitive with much larger proprietary systems on several benchmarks while trailing on others, notably the knowledge-heavy MMMU. Selected scores from the model card are shown below ^[8].

Benchmark	GLM-4V-9B	GPT-4o (0513)	GPT-4V (0409)	Gemini 1.0 Pro	Claude 3 Opus
MMBench-EN	81.1	83.4	81.0	73.6	63.3
MMBench-CN	79.4	82.1	80.2	74.3	59.2
SEEDBench-IMG	76.8	77.1	73.0	70.7	64.0
MMStar	58.7	63.9	56.0	38.6	45.7
MMMU	47.2	69.2	61.7	49.0	54.9
AI2D	81.1	84.6	78.6	72.9	70.6
OCRBench	786	736	656	680	694

Is GLM-4 open source?

Partly. The closed API tiers (GLM-4, GLM-4-Air, GLM-4-Flash, GLM-4-Plus and others) are commercial services billed per token on bigmodel.cn, with GLM-4-Flash offered free of charge. The open-weights branch is the GLM-4-9B series. On 5 June 2024 Zhipu open-sourced GLM-4-9B through the THUDM organization on GitHub and Hugging Face ^[9]. The release includes a base model and several chat and multimodal variants:

Model	Type	Context
GLM-4-9B	Base	8K
GLM-4-9B-Chat	Chat	128K
GLM-4-9B-Chat-1M	Chat	1M
GLM-4V-9B	Multimodal chat	8K

GLM-4-9B-Chat adds web browsing, code execution, function calling (custom tool use) and long-text reasoning on top of multi-turn dialogue. The 1M-context variant is positioned for whole-document workloads; Zhipu reports lossless retrieval in a "Needle in a Haystack" test at the 1M length ^[7]^[9].

On Zhipu's published comparisons, GLM-4-9B-Chat outperforms Llama-3-8B-Instruct across most general benchmarks. Reported chat-model scores are shown below ^[9].

Model	MMLU	C-Eval	GSM8K	MATH	HumanEval	IFEval	MT-Bench
Llama-3-8B-Instruct	68.4	51.3	79.6	30.0	62.2	68.6	8.00
GLM-4-9B-Chat	72.4	75.6	79.6	50.6	71.8	69.0	8.35

The repository also reports base-model figures (for example, MMLU 74.7 and C-Eval 77.1 for GLM-4-9B before instruction tuning) and multilingual results across six datasets where GLM-4-9B-Chat leads Llama-3-8B-Instruct ^[9].

All-Tools agent capabilities

A central feature of the 2024 launch was "GLM-4 All Tools," an agentic mode in which the model decides on its own when and which tools to invoke rather than relying on a fixed pipeline ^[3]. The technical report describes four tool types: a web browser for retrieving and reading online information, a Python interpreter (code interpreter) for computation and data tasks, a text-to-image model, and user-defined functions ^[3]. The text-to-image component is CogView3, Zhipu's diffusion image generator ^[3]. The browser tool exposes operations such as searching and opening a page by URL, letting the model gather and synthesize current information across multiple steps.

The report states that in practical applications, GLM-4 All Tools "matches and even surpasses GPT-4 All Tools in tasks like accessing online information via web browsing and solving math problems using Python interpreter" ^[3]. The same All-Tools behavior, including web browsing, code execution and function calling, was carried into the open GLM-4-9B-Chat model ^[9].
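The control flow this section describes, where the model rather than a fixed pipeline decides whether and which tool to call, can be sketched as a small loop. Everything below is a hypothetical stand-in: the four tool functions only mimic the tool types named in the report, `toy_policy` replaces the model's decision with simple prefix rules, and none of the names correspond to Zhipu's API.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical stand-ins for the four tool types named in the report.
def web_browser(query: str) -> str:
    return f"[search results for: {query}]"


def python_interpreter(code: str) -> str:
    # Toy arithmetic evaluator with builtins removed; NOT a safe sandbox.
    return str(eval(code, {"__builtins__": {}}, {}))


def text_to_image(prompt: str) -> str:
    return f"[image generated for: {prompt}]"


def user_function(argument: str) -> str:
    return f"[user function called with: {argument}]"


TOOLS: Dict[str, Callable[[str], str]] = {
    "browser": web_browser,
    "python": python_interpreter,
    "text_to_image": text_to_image,
    "function": user_function,
}

PREFIX_TO_TOOL = {"lookup": "browser", "compute": "python", "draw": "text_to_image"}


def toy_policy(task: str, observations: List[str]) -> Optional[Tuple[str, str]]:
    """Stand-in for the model: pick (tool, argument), or None when finished."""
    if observations or ":" not in task:
        return None  # already have a result, or no tool is needed
    prefix, argument = task.split(":", 1)
    tool = PREFIX_TO_TOOL.get(prefix.strip())
    return (tool, argument.strip()) if tool else None


def run_agent(task: str, max_steps: int = 4) -> List[str]:
    """Call tools until the policy stops or the step budget runs out."""
    observations: List[str] = []
    for _ in range(max_steps):
        decision = toy_policy(task, observations)
        if decision is None:
            break
        tool_name, argument = decision
        observations.append(TOOLS[tool_name](argument))
    return observations


if __name__ == "__main__":
    print(run_agent("compute: 2 + 3 * 4"))
    print(run_agent("lookup: GLM-4 technical report"))
```

A real agent would feed each observation back into the model so it can plan the next step; the step budget here simply guards against an unbounded loop.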

Long context

GLM-4 and the All Tools model both support a 128K-token context window, equivalent to roughly 300 pages of text ^[2]^[3]. Zhipu later pushed this further with two 1M-token offerings: the open GLM-4-9B-Chat-1M model and the GLM-4-Long API tier, the latter launched on 13 August 2024 and marketed for tasks such as translating long documents, analyzing financial reports and building chatbots with very long memory ^[7]. The company has emphasized recall at these lengths, reporting lossless information retrieval at 1M tokens in Needle-in-a-Haystack evaluations ^[7]. The context budget grew across generations of the GLM line, from a 2K window in the earliest ChatGLM models up to the 1M maximum reached here.
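Two back-of-the-envelope figures follow from the numbers quoted in this article: the token density implied by "128K tokens is roughly 300 pages", and the cost of a full 1M-token prompt at the lowest GLM-4-Long price listed in the tier table (0.001 yuan per 1,000 tokens). The sketch below assumes 128K means 128,000 tokens and that the "as low as" price applies to the whole prompt; both are simplifications, and the helper names are hypothetical.

```python
def tokens_per_page(context_tokens: int, pages: int) -> float:
    """Average tokens per page implied by a context-to-pages equivalence."""
    return context_tokens / pages


def prompt_cost_yuan(tokens: int, yuan_per_1k_tokens: float) -> float:
    """Cost of a prompt at a flat per-1,000-token price."""
    return tokens / 1000 * yuan_per_1k_tokens


if __name__ == "__main__":
    # Assumes "128K" = 128,000 tokens.
    print(round(tokens_per_page(128_000, 300)))
    # Assumes the lowest quoted price applies to every token.
    print(prompt_cost_yuan(1_000_000, 0.001))
```

Under these assumptions a page works out to about 427 tokens, and filling the 1M window once costs about 1 yuan.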

How does GLM-4 compare to GPT-4 on benchmarks?

The technical report benchmarks the GLM-4 (0520) snapshot against GPT-4 (0314) and GPT-4 Turbo (2024-04-09) on standard academic suites. On the six benchmarks listed, GLM-4 beats the original GPT-4 on five (all but MMLU) while trailing the newer GPT-4 Turbo on every one. Reported figures are below ^[3].

Benchmark	GLM-4 (0520)	GPT-4 (0314)	GPT-4 Turbo (0409)
MMLU	83.3	86.4	86.7
GSM8K	93.3	92.0	95.6
MATH	61.3	52.9	73.4
BBH	84.7	83.1	88.2
GPQA	39.9	35.7	49.3
HumanEval	78.5	67.0	88.2

On instruction following (IFEval, English strict-instruction), GLM-4 (0520) scores 85.0 against 85.9 for GPT-4 Turbo ^[3]. On the Chinese alignment benchmark AlignBench, GLM-4 (0520) and GPT-4 Turbo tie at 8.00 overall, and the report states GLM-4 "outperforms GPT-4 in Chinese alignments as measured by AlignBench" ^[3]. On the bilingual coding benchmark NaturalCodeBench, GLM-4 (0520) scores 47.1 overall versus 53.8 for GPT-4 Turbo ^[3]. These results are Zhipu's own measurements, so the precise figures depend on the prompts and evaluation settings used.
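The six-benchmark table in this section can be tallied mechanically to see where GLM-4 (0520) leads or trails each GPT-4 snapshot. The sketch below copies the scores from that table; the `tally` helper and its index convention are illustrative choices, not part of the report.

```python
from typing import Dict, List, Tuple

# (GLM-4 0520, GPT-4 0314, GPT-4 Turbo 0409), copied from the table above.
SCORES: Dict[str, Tuple[float, float, float]] = {
    "MMLU": (83.3, 86.4, 86.7),
    "GSM8K": (93.3, 92.0, 95.6),
    "MATH": (61.3, 52.9, 73.4),
    "BBH": (84.7, 83.1, 88.2),
    "GPQA": (39.9, 35.7, 49.3),
    "HumanEval": (78.5, 67.0, 88.2),
}


def tally(rival_index: int) -> Tuple[List[str], List[str]]:
    """Return (benchmarks GLM-4 wins, benchmarks GLM-4 loses) against one rival.

    rival_index is 1 for GPT-4 (0314) and 2 for GPT-4 Turbo (0409).
    """
    wins = [name for name, row in SCORES.items() if row[0] > row[rival_index]]
    losses = [name for name, row in SCORES.items() if row[0] < row[rival_index]]
    return wins, losses


if __name__ == "__main__":
    for label, index in (("GPT-4 (0314)", 1), ("GPT-4 Turbo (0409)", 2)):
        wins, losses = tally(index)
        print(f"vs {label}: ahead on {wins}, behind on {losses}")
```

A win count like this ignores margin size and evaluation noise, so it summarizes the table rather than ranking the models.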

Licensing

The open GLM-4-9B series is released under a custom Model License rather than a standard open-source license. The model weights are governed by the GLM-4 license posted on the Hugging Face repositories, while the accompanying code in the GitHub repository is released under Apache 2.0 ^[9]. The model license permits research and, subject to its terms, commercial use; users are directed to follow the license conditions, which historically have included a registration or notification requirement for commercial deployment. By contrast, the later GLM-4.5 and GLM-4.6 open weights moved to the permissive MIT license, removing those restrictions ^[11]^[13].

What came after GLM-4? GLM-4.5, GLM-4.6 and the Z.ai IPO

Zhipu continued to iterate on the GLM-4 line after the 2024 launch. In April 2025 it open-sourced the GLM-4-0414 series, including a 32-billion-parameter GLM-4-32B-0414 and the reasoning-focused GLM-Z1 and GLM-Z1-Rumination variants, all supporting context extension to 128K ^[9]. A separate audio model, GLM-4-Voice, added end-to-end speech conversation.

The family was then superseded by a new generation of Mixture-of-Experts (MoE) flagships built for agentic and coding workloads. GLM-4.5, released on 28 July 2025 under the MIT license, is a 355-billion-parameter MoE model with 32 billion active parameters (often written 355B-A32B), trained on 23 trillion tokens ^[11]^[12]. Z.ai reports that GLM-4.5 averages 63.2 across 12 industry-standard benchmarks, placing third overall among the models tested; individual results include 64.2 percent on SWE-bench Verified, 91.0 percent on AIME 2024 and 70.1 percent on TAU-bench ^[11]^[12]. The smaller GLM-4.5-Air variant scores 59.8 on the same composite ^[11]. GLM-4.6, released in late September 2025, kept the 355B MoE design while extending the context window to 200K tokens and improving real-world coding and agentic search; it too is published under the MIT license ^[13]^[14].

The GLM line is the backbone of a company that became China's first publicly listed large-model developer. Zhipu AI rebranded its international identity as Z.ai in July 2025 (retaining the Zhipu AI / Knowledge Atlas name in China), and on 8 January 2026 it listed on the Hong Kong Stock Exchange, raising about HK$4.35 billion (around US$560 million) in what was described as the first IPO by one of China's "AI tiger" model startups ^[15]^[16].
