# ChatGLM

> Source: https://aiwiki.ai/wiki/chatglm
> Updated: 2026-06-23
> Categories: Chinese AI, Conversational AI, Large Language Models
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

ChatGLM is a series of open, bilingual (Chinese and English) conversational [large language models](/wiki/large_language_model) developed by [Zhipu AI](/wiki/zhipu_ai) together with the Knowledge Engineering Group (KEG) lab at [Tsinghua University](/wiki/tsinghua_university). Its breakthrough model, ChatGLM-6B, was open-sourced on 14 March 2023 with about 6.2 billion parameters and could run on a single consumer GPU in roughly 6 GB of memory using INT4 quantization, which made it one of the earliest widely used open Chinese chat models [1][6]. The ChatGLM-6B repository drew more than 41,000 GitHub stars, and the GLM team reported that its open models attracted over 10 million downloads on Hugging Face in 2023 alone [1][6]. The models are built on [GLM (General Language Model)](/wiki/glm), an autoregressive blank-infilling pretraining architecture first described by KEG researchers in 2021 [2]. ChatGLM-6B was followed by ChatGLM2-6B (June 2023) and ChatGLM3-6B (October 2023), and the line bridges the earlier [GLM-130B](/wiki/glm_130b) base model and the later [GLM-4](/wiki/glm_4) family, a lineage documented in the 2024 technical report "ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 (All Tools)" [1].

## What is the GLM architecture?

GLM stands for General Language Model. It was introduced in the paper "GLM: General Language Model Pretraining with Autoregressive Blank Infilling," published at ACL 2022 by Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang and [Jie Tang](/wiki/tang_jie) [2]. Rather than following the pure autoencoding objective of BERT or the left-to-right autoregressive objective of GPT, GLM masks spans of text and trains the model to regenerate those spans autoregressively, a procedure the authors call autoregressive blank infilling. As the paper puts it, "GLM improves blank filling pretraining by adding 2D positional encodings and allowing an arbitrary order to predict spans" [2]. The authors reported that GLM "outperforms BERT, T5, and GPT given the same model sizes and data" across natural language understanding, conditional generation and unconditional generation tasks, and that it "achieves the best performance from a single pretrained model with 1.25x parameters of BERT Large" [2]. This single architecture, which can behave like an encoder or a decoder depending on how blanks are configured, underpins every model in the ChatGLM and GLM family.

The same architecture was scaled up to GLM-130B, a 130-billion-parameter bilingual model presented at ICLR 2023 [3]. GLM-130B was trained on more than 400 billion tokens (roughly 200 billion each of English and Chinese) and was notable for reaching INT4 quantization with almost no loss in quality, which let it run on a single server with four RTX 3090 (24 GB) cards. ChatGLM grew out of this base: the dialogue model was tuned for conversation, and the smaller open ChatGLM-6B made the approach reachable for individual researchers and hobbyists.

## When was ChatGLM-6B released, and why did it matter?

ChatGLM-6B was open-sourced on 14 March 2023 through the THUDM GitHub organization and Hugging Face [6]. It has about 6.2 billion parameters and was trained on roughly 1 trillion tokens of Chinese and English text, then refined with supervised fine-tuning, feedback bootstrapping and reinforcement learning from human feedback [6]. The original model used a 2K-token context window. Its appeal was practical: with INT4 quantization it needed only about 6 GB of GPU memory, so it could run on a mainstream gaming card rather than a data-center accelerator. The README described it as a model that "uses technology similar to ChatGPT, optimized for Chinese QA and dialogue" [6].

The repository shipped three precision levels, summarized below [6]:

| Precision | GPU memory required |
| --- | --- |
| FP16 (no quantization) | 13 GB |
| INT8 | 10 GB |
| INT4 | 6 GB |

The project also provided parameter-efficient fine-tuning based on P-Tuning v2, which let users adapt the model to a downstream task with as little as 7 GB of memory at INT4. A v1.1 update on 15 May 2023 added more English instruction data to balance the Chinese-heavy training mix [6]. The repository code is released under the Apache 2.0 license, while the model weights carry a separate license that permits academic research freely and allows commercial use after the user completes a registration questionnaire [6]. ChatGLM-6B became one of the most downloaded open models from China during 2023: the repository accumulated more than 41,000 GitHub stars, and the GLM team later reported that its open models drew over 10 million downloads on Hugging Face in 2023 alone [1][6].

## How do ChatGLM2 and ChatGLM3 differ from ChatGLM-6B?

ChatGLM2-6B arrived on 25 June 2023 as the second generation [4]. It kept the roughly 6-billion-parameter footprint but was pre-trained on 1.4 trillion bilingual tokens and adopted a hybrid objective from the GLM line. Three engineering changes mattered most: [FlashAttention](/wiki/flash_attention) extended the trainable context to 32K tokens (with dialogue alignment performed at 8K); Multi-Query Attention shrank the key-value cache and sped up inference; and a causal-mask training scheme let the model reuse the cache across dialogue turns. Zhipu reported inference roughly 42 percent faster than the first generation (44.62 versus 31.49 tokens per second on an A100-SXM-80G), and at INT4 with 6 GB of memory the supported dialogue length rose from about 1K to 8K tokens [4].

ChatGLM3 was announced on 27 October 2023 at the China National Computer Conference (CNCC) and was again a joint Zhipu AI and Tsinghua KEG release [5][7]. ChatGLM3-6B introduced a redesigned prompt format that, in the project's own words, "natively supports tool invocation (Function Call), code execution (Code Interpreter), and Agent tasks in complex scenarios," in addition to ordinary multi-turn chat [5]. The third generation shipped as a small family rather than a single checkpoint:

| Model | Context length | Notes |
| --- | --- | --- |
| ChatGLM3-6B | 8K | Aligned chat model |
| ChatGLM3-6B-Base | 8K | Base (pretrained) checkpoint |
| ChatGLM3-6B-32K | 32K | Long-context chat variant |
| ChatGLM3-6B-128K | 128K | Extended long-context variant |

Zhipu also released edge-oriented checkpoints, ChatGLM3-1.5B and ChatGLM3-3B, aimed at phones and in-vehicle platforms from vendors including vivo, Xiaomi and Samsung, reaching about 20 tokens per second on-device, and stated that ChatGLM3 ranked first among domestic models of comparable size across 44 Chinese and English public benchmark datasets [7]. The README adds that "ChatGLM3-6B-Base has the strongest performance among base models below 10B" [5].

## Version timeline

| Model | Release date | Parameters | Context length | Training tokens |
| --- | --- | --- | --- | --- |
| ChatGLM-6B | 14 Mar 2023 | ~6.2B | 2K | ~1T |
| ChatGLM-6B v1.1 | 15 May 2023 | ~6.2B | 2K | ~1T |
| ChatGLM2-6B | 25 Jun 2023 | ~6B | 32K (8K aligned) | 1.4T |
| ChatGLM3-6B | 27 Oct 2023 | ~6B | 8K / 32K / 128K | N/A |

## How can ChatGLM run on consumer hardware?

A defining feature of the ChatGLM-6B series is that it was engineered to run on hardware most people already own. The models support FP16/BF16, INT8 and INT4 weight quantization, and the INT4 path is the reason a 6-billion-parameter chat model could fit in 6 GB of video memory [6]. This continued the INT4 work pioneered on GLM-130B, where the authors found a scaling property of the GLM weights that allowed 4-bit quantization without quantization-aware training and with negligible accuracy loss [3]. For ChatGLM2-6B the published memory profile under the two quantized settings is shown below [4]:

| Format | Encoding 2048 tokens | Decoding 8192 tokens |
| --- | --- | --- |
| FP16 / BF16 | 13.1 GB | 12.8 GB |
| INT8 | 8.2 GB | 8.1 GB |
| INT4 | 5.5 GB | 5.1 GB |

Because the weights and reference code were openly available, the models spread quickly through community tooling and were packaged for CPU and quantized GPU inference well beyond the official repositories.

## How does ChatGLM perform on benchmarks?

The first ChatGLM-6B was not a strong benchmark performer; its value was accessibility. Successive generations improved markedly on standard evaluations. The figures below are reported by Zhipu for ChatGLM2-6B (versus the first generation) and for the ChatGLM3-6B-Base checkpoint [4][5].

| Benchmark | ChatGLM2-6B | ChatGLM3-6B-Base |
| --- | --- | --- |
| [MMLU](/wiki/mmlu) | 45.46 | 61.4 |
| C-Eval | 50.1 | 69.0 |
| GSM8K | 28.05 | 72.3 |
| BBH | 30.00 | 66.1 |
| MATH | N/A | 25.7 |

The jump from ChatGLM2-6B to ChatGLM3-6B-Base on math and reasoning (for example GSM8K rising from about 28 to about 72) reflects both a stronger pretraining corpus and the improved alignment recipe carried over from the GLM-4 work. These scores place the 6B-Base model at or near the top of its size class among open models released in 2023, though they sit well below the much larger flagship models of the same period.

## Is ChatGLM open source?

ChatGLM follows a split licensing approach that became common for Chinese open models. The source code in the official repositories is released under the permissive Apache 2.0 license. The model weights are governed by a separate "Model License" that grants free use for academic research and permits commercial use after the user registers through a questionnaire, with additional clauses prohibiting uses that could harm national or social interests [4][5][6]. This arrangement is open enough for researchers and most product builders while keeping a registration step for commercial deployment. Later flagship models in the family moved toward more conventional open licenses: GLM-4-9B and several subsequent releases were published under named model licenses, and parts of the newer GLM-4.5 generation adopted the MIT license.

## How does ChatGLM relate to GLM-130B and GLM-4?

ChatGLM sits in the middle of a continuous research arc. It descends from GLM-130B, the 130-billion-parameter bilingual base model that proved a GPT-3-scale system could be trained and openly released outside the largest Western labs [3]. The conversational tuning that produced the unreleased full-size ChatGLM, and the openly released ChatGLM-6B, established the recipe of GLM pretraining plus supervised fine-tuning and human-feedback alignment.

That recipe fed forward into GLM-4. The 2024 report "ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 (All Tools)" was submitted to arXiv on 18 June 2024 by a large Team GLM, with first author Aohan Zeng and Jie Tang among the co-authors [1]. It traces the path from GLM-130B through the three ChatGLM generations to GLM-4, GLM-4-Air and the openly released GLM-4-9B, models pre-trained on roughly ten trillion tokens and aligned through multi-stage supervised fine-tuning and reinforcement learning from human feedback. The report states that GLM-4 closely rivals or surpasses GPT-4 on general benchmarks such as MMLU, GSM8K, MATH, BBH, GPQA and HumanEval, approaches GPT-4 Turbo on instruction following, and matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks. The "All Tools" variant is trained to decide autonomously when to call a web browser, a Python interpreter, a text-to-image model or user-defined functions. Development continued past GLM-4 into later releases such as [GLM-4.5](/wiki/glm_4_5) and [GLM-4.6](/wiki/glm_4_6), but ChatGLM remains the name attached specifically to the 2023-era 6B chat models that first brought the GLM architecture to a broad open-source audience.

## References

1. [ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools (arXiv:2406.12793)](https://arxiv.org/abs/2406.12793)
2. [GLM: General Language Model Pretraining with Autoregressive Blank Infilling (arXiv:2103.10360)](https://arxiv.org/abs/2103.10360)
3. [GLM-130B: An Open Bilingual Pre-trained Model (arXiv:2210.02414)](https://arxiv.org/abs/2210.02414)
4. [THUDM/ChatGLM2-6B GitHub repository (README)](https://github.com/zai-org/ChatGLM2-6B/blob/main/README_EN.md)
5. [THUDM/ChatGLM3 GitHub repository (README)](https://github.com/zai-org/ChatGLM3/blob/main/README_en.md)
6. [THUDM/ChatGLM-6B GitHub repository (README)](https://github.com/zai-org/ChatGLM-6B/blob/main/README_en.md)
7. [Zhipu AI launches third-generation ChatGLM3 at CNCC 2023, October 2023](https://www.aibase.com/news/3091)