ChatGLM
Last reviewed
Jun 3, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,736 words
ChatGLM is a series of open, bilingual (Chinese and English) conversational large language models developed by Zhipu AI together with the Knowledge Engineering Group (KEG) lab at Tsinghua University. The models are built on GLM (General Language Model), an autoregressive blank-infilling pretraining architecture first described by KEG researchers in 2021. ChatGLM-6B, released on 14 March 2023, was one of the earliest widely used open Chinese chat models and could run on a single consumer GPU through INT4 quantization. It was followed by ChatGLM2-6B (June 2023) and ChatGLM3-6B (October 2023). The line bridges the earlier GLM-130B base model and the later GLM-4 family, a lineage documented in the 2024 technical report "ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 (All Tools)" [1].
GLM stands for General Language Model. It was introduced in the paper "GLM: General Language Model Pretraining with Autoregressive Blank Infilling," published at ACL 2022 by Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang and Jie Tang [2]. Rather than following the pure autoencoding objective of BERT or the left-to-right autoregressive objective of GPT, GLM masks spans of text and trains the model to regenerate those spans autoregressively, a procedure the authors call autoregressive blank infilling. The method adds 2D positional encodings and lets the model predict masked spans in an arbitrary order. The authors reported that, at equivalent model size and training data, GLM outperformed BERT, T5 and GPT across natural language understanding, conditional generation and unconditional generation tasks. This single architecture, which can behave like an encoder or a decoder depending on how blanks are configured, underpins every model in the ChatGLM and GLM family.
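The input layout implied by autoregressive blank infilling can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the token strings `[MASK]`, `[S]` and `[E]`, the function name `build_blank_infilling_example` and the 0-based position numbering are choices made here for readability. The corrupted text (each span replaced by one mask token) comes first, the masked spans follow in a chosen order, and two position ids are assigned per token: one locating the token (or its mask) in the corrupted text, one counting the offset inside a span.

```python
from typing import List, Optional, Tuple

MASK, START, END = "[MASK]", "[S]", "[E]"  # illustrative token names


def build_blank_infilling_example(
    tokens: List[str],
    spans: List[Tuple[int, int]],  # half-open (start, end), sorted, non-overlapping
    order: List[int],              # order in which the spans are generated
):
    # Part A: the corrupted text, each span collapsed into a single [MASK].
    part_a: List[str] = []
    mask_index = {}
    cursor = 0
    for i, (start, end) in enumerate(spans):
        part_a.extend(tokens[cursor:start])
        mask_index[i] = len(part_a)  # where this span's [MASK] sits in Part A
        part_a.append(MASK)
        cursor = end
    part_a.extend(tokens[cursor:])

    inputs: List[str] = list(part_a)
    targets: List[Optional[str]] = [None] * len(part_a)  # no loss on Part A
    pos_1 = list(range(len(part_a)))  # position in the corrupted text
    pos_2 = [0] * len(part_a)         # intra-span position (0 for Part A)

    # Part B: each span is fed as [S] + span and predicted as span + [E].
    for i in order:
        start, end = spans[i]
        span = tokens[start:end]
        inputs += [START] + span
        targets += span + [END]
        pos_1 += [mask_index[i]] * (len(span) + 1)
        pos_2 += list(range(1, len(span) + 2))
    return inputs, targets, pos_1, pos_2


example = build_blank_infilling_example(
    ["x1", "x2", "x3", "x4", "x5", "x6"], spans=[(2, 3), (4, 6)], order=[1, 0]
)
for name, row in zip(("inputs", "targets", "pos_1", "pos_2"), example):
    print(name, row)
```

Passing a different `order` changes only the arrangement of the second part, which is how the arbitrary span order described above enters the training data.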
The same architecture was scaled up to GLM-130B, a 130-billion-parameter bilingual model presented at ICLR 2023 [3]. GLM-130B was trained on more than 400 billion tokens (roughly 200 billion each of English and Chinese) and was notable for reaching INT4 quantization with almost no loss in quality, which let it run on a single server with four RTX 3090 (24 GB) cards. ChatGLM grew out of this base: the dialogue model was tuned for conversation, and the smaller open ChatGLM-6B made the approach reachable for individual researchers and hobbyists.
ChatGLM-6B was open-sourced on 14 March 2023 through the THUDM GitHub organization and Hugging Face. It has about 6.2 billion parameters and was trained on roughly 1 trillion tokens of Chinese and English text, then refined with supervised fine-tuning, feedback bootstrapping and reinforcement learning from human feedback [4]. The original model used a 2K-token context window. Its appeal was practical: with INT4 quantization it needed only about 6 GB of GPU memory, so it could run on a mainstream gaming card rather than a data-center accelerator.
The repository shipped three precision levels, summarized below [4]:
| Precision | Minimum GPU memory (inference) |
|---|---|
| FP16 (no quantization) | 13 GB |
| INT8 | 10 GB |
| INT4 | 6 GB |
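A rough way to see where figures of this size come from is to count the bytes needed for the weights alone. The sketch below assumes about 6.2 billion parameters (the count given above), decimal gigabytes, and that every weight is stored at the stated bit width; the helper name `weight_memory_gb` is made up for this example. The published requirements are higher than these estimates, presumably because activations, the attention cache and any tensors kept at higher precision add to the total.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory for the weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 6.2e9  # approximate ChatGLM-6B parameter count

for label, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4)):
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
# FP16: 12.4 GB
# INT8: 6.2 GB
# INT4: 3.1 GB
```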
The project also provided parameter-efficient fine-tuning based on P-Tuning v2, which let users adapt the model to a downstream task with as little as 7 GB of memory at INT4. A v1.1 update on 15 May 2023 added more English instruction data to balance the Chinese-heavy training mix. The repository code is released under the Apache 2.0 license, while the model weights carry a separate license that permits academic research freely and allows commercial use after the user completes a registration questionnaire [4]. ChatGLM-6B became one of the most downloaded open models from China during 2023; the GLM team later reported that its open models drew over 10 million downloads on Hugging Face in 2023 alone [1].
ChatGLM2-6B arrived on 25 June 2023 as the second generation [4]. It kept the roughly 6-billion-parameter footprint but was pre-trained on 1.4 trillion bilingual tokens and adopted a hybrid objective from the GLM line. Three engineering changes mattered most: FlashAttention extended the trainable context to 32K tokens (with dialogue alignment performed at 8K); Multi-Query Attention shrank the key-value cache and sped up inference; and a causal-mask training scheme let the model reuse the cache across dialogue turns. Zhipu reported inference roughly 42 percent faster than the first generation (44.62 versus 31.49 tokens per second on an A100-SXM-80G), and at INT4 with 6 GB of memory the supported dialogue length rose from about 1K to 8K tokens [4].
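The effect of Multi-Query Attention on the key-value cache can be estimated with simple arithmetic. The sketch below is not taken from the ChatGLM2 code: the shape values (28 layers, 32 attention heads, head dimension 128, 2 shared key-value heads, FP16 cache entries) are assumptions chosen for illustration, and `kv_cache_bytes` is a hypothetical helper. The cache stores one key and one value vector per layer, per key-value head, per token, so reducing the number of key-value heads shrinks it proportionally.

```python
def kv_cache_bytes(
    seq_len: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # FP16
) -> int:
    """Size of the key-value cache for one sequence, in bytes."""
    # Factor 2: one key vector and one value vector per position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


SEQ_LEN = 8192
# Assumed shapes, for illustration only.
multi_head = kv_cache_bytes(SEQ_LEN, n_layers=28, n_kv_heads=32, head_dim=128)
multi_query = kv_cache_bytes(SEQ_LEN, n_layers=28, n_kv_heads=2, head_dim=128)

print(f"multi-head : {multi_head / 2**30:.2f} GiB")
print(f"multi-query: {multi_query / 2**30:.2f} GiB")
print(f"reduction  : {multi_head // multi_query}x")
# multi-head : 3.50 GiB
# multi-query: 0.22 GiB
# reduction  : 16x
```

Under these assumptions, an 8K-token dialogue needs a small fraction of the cache memory it would need with one key-value head per attention head, which is consistent with the longer dialogue length reported at 6 GB.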
ChatGLM3 was announced on 27 October 2023 at the China National Computer Conference (CNCC) and was again a joint Zhipu AI and Tsinghua KEG release [5]. ChatGLM3-6B introduced a redesigned prompt format that natively supports tool invocation (function calling), a built-in code interpreter and multi-step agent tasks, in addition to ordinary multi-turn chat. The third generation shipped as a small family rather than a single checkpoint:
| Model | Context length | Notes |
|---|---|---|
| ChatGLM3-6B | 8K | Aligned chat model |
| ChatGLM3-6B-Base | 8K | Base (pretrained) checkpoint |
| ChatGLM3-6B-32K | 32K | Long-context chat variant |
| ChatGLM3-6B-128K | 128K | Extended long-context variant |
Zhipu also released edge-oriented checkpoints, ChatGLM3-1.5B and ChatGLM3-3B, aimed at phones and in-vehicle platforms, and stated that ChatGLM3 ranked first among domestic models of comparable size across 44 Chinese and English public benchmark datasets [5].
| Model | Release date | Parameters | Context length | Training tokens |
|---|---|---|---|---|
| ChatGLM-6B | 14 Mar 2023 | ~6.2B | 2K | ~1T |
| ChatGLM-6B v1.1 | 15 May 2023 | ~6.2B | 2K | ~1T |
| ChatGLM2-6B | 25 Jun 2023 | ~6B | 32K (8K aligned) | 1.4T |
| ChatGLM3-6B | 27 Oct 2023 | ~6B | 8K / 32K / 128K | N/A |
A defining feature of the ChatGLM-6B series is that it was engineered to run on hardware most people already own. The models run at full FP16/BF16 precision or with INT8 or INT4 weight quantization, and the INT4 path is the reason a 6-billion-parameter chat model could fit in 6 GB of video memory. This continued the INT4 work pioneered on GLM-130B, where the authors found a scaling property of the GLM weights that allowed 4-bit quantization without quantization-aware training and with negligible accuracy loss [3]. For ChatGLM2-6B the published memory profile at each of the three precision settings is shown below [4]:
| Format | Encoding 2048 tokens | Decoding 8192 tokens |
|---|---|---|
| FP16 / BF16 | 13.1 GB | 12.8 GB |
| INT8 | 8.2 GB | 8.1 GB |
| INT4 | 5.5 GB | 5.1 GB |
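The idea behind post-training weight quantization can be shown with a minimal absmax sketch. This is a simplified illustration, not the kernel shipped with ChatGLM: it quantizes one list of weights with a single scale, whereas practical implementations typically use one scale per row or group and pack two 4-bit values into each byte. The function names are invented for this example.

```python
from typing import List, Tuple


def quantize_absmax(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [1.75, -0.5, 0.25, 0.0]
q, scale = quantize_absmax(weights, bits=4)
print(q, scale)              # [7, -2, 1, 0] 0.25
print(dequantize(q, scale))  # [1.75, -0.5, 0.25, 0.0]
```

The example weights were chosen to land exactly on the 4-bit grid; for arbitrary values the round trip is lossy, with an error of at most half the scale per weight.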
Because the weights and reference code were openly available, the models spread quickly through community tooling and were packaged for CPU and quantized GPU inference well beyond the official repositories.
The first ChatGLM-6B was not a strong benchmark performer; its value was accessibility. Successive generations improved markedly on standard evaluations. The figures below are reported by Zhipu for ChatGLM2-6B and for the ChatGLM3-6B-Base checkpoint [4][5].
| Benchmark | ChatGLM2-6B | ChatGLM3-6B-Base |
|---|---|---|
| MMLU | 45.46 | 61.4 |
| C-Eval | 50.1 | 69.0 |
| GSM8K | 28.05 | 72.3 |
| BBH | 30.00 | 66.1 |
| MATH | N/A | 25.7 |
The jump from ChatGLM2-6B to ChatGLM3-6B-Base on math and reasoning (GSM8K rising from 28.05 to 72.3, BBH from 30.00 to 66.1) reflects changes to pretraining rather than alignment, because the Base checkpoint is not chat-tuned; Zhipu cited more diverse training data, more training steps and a revised training strategy. By Zhipu's account, these scores place the 6B-Base model at or near the top of its size class among open models released in 2023, though they sit well below the much larger flagship models of the same period.
ChatGLM follows a split licensing approach that became common for Chinese open models. The source code in the official repositories is released under the permissive Apache 2.0 license. The model weights are governed by a separate "Model License" that grants free use for academic research and permits commercial use after the user registers through a questionnaire, with additional clauses prohibiting uses that could harm national or social interests [4][5]. This arrangement is open enough for researchers and most product builders while keeping a registration step for commercial deployment. Later flagship models in the family moved toward more conventional open licenses: GLM-4-9B and several subsequent releases were published under named model licenses, and parts of the newer GLM-4.5 generation adopted the MIT license.
ChatGLM sits in the middle of a continuous research arc. It descends from GLM-130B, the 130-billion-parameter bilingual base model that proved a GPT-3-scale system could be trained and openly released outside the largest Western labs [3]. The conversational tuning that produced the unreleased full-size ChatGLM, and the openly released ChatGLM-6B, established the recipe of GLM pretraining plus supervised fine-tuning and human-feedback alignment.
That recipe fed forward into GLM-4. The 2024 report "ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 (All Tools)" was submitted to arXiv on 18 June 2024 by a large Team GLM, with first author Aohan Zeng and Jie Tang among the co-authors [1]. It traces the path from GLM-130B through the three ChatGLM generations to GLM-4, GLM-4-Air and the openly released GLM-4-9B, models pre-trained on roughly ten trillion tokens and aligned through multi-stage supervised fine-tuning and reinforcement learning from human feedback. The report states that GLM-4 closely rivals or surpasses GPT-4 on general benchmarks such as MMLU, GSM8K, MATH, BBH, GPQA and HumanEval, approaches GPT-4 Turbo on instruction following, and matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks. The "All Tools" variant is trained to decide autonomously when to call a web browser, a Python interpreter, a text-to-image model or user-defined functions. Development continued past GLM-4 into later releases such as GLM-4.5 and GLM-4.6, but ChatGLM remains the name attached specifically to the 2023-era 6B chat models that first brought the GLM architecture to a broad open-source audience.