InternLM
Last reviewed
Jun 3, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,531 words
InternLM (Chinese name Shusheng Puyu, 书生·浦语) is a series of open large language models developed by Shanghai AI Laboratory together with partners that include SenseTime, the Chinese University of Hong Kong, and Fudan University. The project began in mid-2023 with 7B and 20B models and has since grown into a family spanning InternLM, InternLM2, InternLM2.5, and InternLM3, alongside related multimodal and domain-specific models such as InternLM-XComposer and InternLM-Math. The base and instruction-tuned weights are released openly, and from the second generation onward both code and weights have been published under the permissive Apache-2.0 license. [1][2]
The name Shusheng (书生, "scholar") is the umbrella brand Shanghai AI Laboratory uses for its Intern series of foundation models, and Puyu (浦语) is the language-model line within it. The first public artifact was an early technical report describing an internal 104B-parameter foundation model pre-trained on roughly 1.6T tokens of multilingual data. That 104B model was not released; instead, on 6 July 2023 the lab open-sourced a 7B derivative, InternLM-7B, providing a base model and a chat-tuned variant aimed at practical use. At launch the code was Apache-2.0 but commercial use of the weights required written permission, a restriction the lab relaxed in later updates so that the weights became free for commercial use after a registration step. [3][8]
InternLM-20B followed on 20 September 2023. It was a deliberately deeper network, 60 layers rather than the 32 to 40 layers typical of 7B to 13B models, pre-trained on over 2.3T tokens of English, Chinese, and code. Shanghai AI Laboratory positioned it against larger open models of the time, and its reported scores led the 13B-to-33B size band on several benchmarks. The 20B weights were published with terms stating that the code is Apache-2.0 while the weights are open for academic research and also allow free commercial usage. [4]
InternLM2 arrived in early 2024. The 7B and 20B models were released on 17 January 2024, with a 1.8B base and chat pair following on 31 January 2024. Each size shipped in several forms: a base model, an SFT-only chat model (often labeled internlm2-chat-*-sft), and a fully aligned chat model. A dedicated reward-model series, InternLM2-Reward in 1.8B, 7B, and 20B, was added on 19 July 2024. [1]
The accompanying InternLM2 Technical Report (arXiv:2403.17297) was submitted on 26 March 2024 with a large author list drawn from Shanghai AI Laboratory and collaborators. It reports that InternLM2 outperforms its predecessors across 6 capability dimensions and 30 benchmarks, and it is unusually detailed about the engineering behind the models. The report describes the data preparation pipeline for text, code, and long-context data; pre-training procedures; and the alignment stack. For alignment, InternLM2 uses supervised fine-tuning (SFT) followed by a reinforcement-learning method the authors call Conditional Online RLHF (COOL RLHF), designed to reconcile conflicting human preferences and to reduce reward hacking over multiple rounds of training. The architecture adopts Grouped-Query Attention (GQA) to keep the memory footprint manageable when serving long sequences. [5][6]
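The memory argument for GQA can be made concrete with a little arithmetic. The following is a minimal sketch rather than anything taken from the report: the layer count, head counts, head dimension, and 16-bit cache precision are illustrative assumptions for a 7B-class model, and `AttentionShape` and `kv_cache_bytes` are hypothetical names. It compares the key-value cache of one 32K-token sequence when every query head has its own KV head against the case where groups of query heads share one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Illustrative attention dimensions (assumed, not taken from the report)."""
    num_layers: int
    num_query_heads: int
    num_kv_heads: int
    head_dim: int


def kv_cache_bytes(shape: AttentionShape, seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for a single sequence.

    Every layer stores one key vector and one value vector per KV head per
    token, so the cache scales with num_kv_heads, not num_query_heads.
    """
    values_per_token = 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim
    return values_per_token * seq_len * bytes_per_value


# Hypothetical shapes that differ only in the number of KV heads.
mha = AttentionShape(num_layers=32, num_query_heads=32, num_kv_heads=32, head_dim=128)
gqa = AttentionShape(num_layers=32, num_query_heads=32, num_kv_heads=8, head_dim=128)

seq_len = 32 * 1024  # one 32K-token sequence
mha_gib = kv_cache_bytes(mha, seq_len) / 2**30
gqa_gib = kv_cache_bytes(gqa, seq_len) / 2**30

print(f"Multi-head cache:    {mha_gib:.1f} GiB")   # 16.0 GiB
print(f"Grouped-query cache: {gqa_gib:.1f} GiB")   # 4.0 GiB
print(f"Reduction:           {mha_gib / gqa_gib:.0f}x")  # 4x
```

Under these assumed dimensions the cache shrinks in proportion to the ratio of query heads to KV heads, which is why the saving matters most at long sequence lengths.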
A central theme of InternLM2 is long-context handling. During pre-training the model is first trained on 4K-token texts and then on high-quality 32K-token texts, and positional-encoding extrapolation extends usable context well beyond the training length. The report demonstrates the result with the "Needle-in-a-Haystack" retrieval test at 200K tokens, where the model reliably locates inserted facts. The lab also constructed 32K data during SFT and RLHF so that long-context ability survives alignment rather than degrading after instruction tuning. [5][6]
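A needle-in-a-haystack test of the kind described above can be sketched in a few lines. This is an illustration of the general technique, not the lab's evaluation harness: the filler text, the needle sentence, and the helper names are all hypothetical, word counts stand in for token counts, and `fake_model` is a stub that searches the prompt so the example runs without a real model.

```python
def build_haystack(filler: str, needle: str, total_words: int, depth: float) -> str:
    """Repeat `filler` up to `total_words` words and insert `needle` at a relative depth.

    depth=0.0 puts the needle at the very start, depth=1.0 at the very end.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    filler_words = filler.split()
    if not filler_words:
        raise ValueError("filler must contain at least one word")
    repeats = total_words // len(filler_words) + 1
    words = (filler_words * repeats)[:total_words]
    position = int(len(words) * depth)
    return " ".join(words[:position] + [needle] + words[position:])


def score_retrieval(answer: str, expected: str) -> bool:
    """Count a trial as a hit if the expected fact appears in the answer."""
    return expected.lower() in answer.lower()


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call: it simply searches the prompt text."""
    marker = "The secret code is "
    start = prompt.find(marker)
    if start == -1:
        return "I could not find it."
    code = prompt[start + len(marker):].split()[0].rstrip(".")
    return f"The code is {code}."


FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret code is 7421."

# Sweep the needle across several depths at one (small) context length.
results = {}
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    haystack = build_haystack(FILLER, NEEDLE, total_words=2000, depth=depth)
    prompt = f"{haystack}\n\nQuestion: what is the secret code?"
    results[depth] = score_retrieval(fake_model(prompt), "7421")

print(results)
```

A real run would replace `fake_model` with a call to the model under test and repeat the sweep over a grid of context lengths as well as depths, producing the familiar length-by-depth accuracy heatmap.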
The long-context line was pushed further in the next generation. InternLM2.5 introduced a 7B chat variant, InternLM2.5-7B-Chat-1M, trained to operate over a 1M-token context. Shanghai AI Laboratory reported near-full accuracy on needle-in-a-haystack retrieval at 1M tokens and competitive results on long-document suites such as LongBench and L-Eval, while keeping the 1M model's general performance close to the standard 7B chat model. [1][7]
InternLM2.5 was unveiled at the World Artificial Intelligence Conference (WAIC) in Shanghai in early July 2024. The 7B family (base, chat, and the 1M chat variant) was released on 3 July 2024, and the 1.8B and 20B models followed on 5 August 2024. The line kept the InternLM2 architecture and leaned on large volumes of synthetic data and an iterative "capability flywheel" to improve reasoning, with the lab citing roughly a 20 percent gain in reasoning over InternLM2 at the 20B scale, plus stronger tool use and the ability to gather and synthesize information from many web pages. [1][2][7]
InternLM3 narrowed the family to a single flagship size. InternLM3-8B-Instruct was released on 15 January 2025. Its headline claim is efficiency: it was trained on only 4 trillion high-quality tokens, which the lab says cuts training cost by more than 75 percent versus comparable models, while still beating Llama3.1-8B and Qwen2.5-7B on a range of reasoning and knowledge tasks. InternLM3 also adds a dual-mode interface: a normal response mode for ordinary conversation and a deep thinking mode that produces a long chain-of-thought (allocating up to 8192 tokens to the reasoning trace) for harder problems. Long-context behavior is reported on the RULER benchmark across a 4K-to-128K range. [1][2]
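The training-cost claim can be read as simple arithmetic. The sketch below assumes that training cost scales linearly with token count at a fixed model size, using the common rule of thumb of about 6 FLOPs per parameter per token; the article does not say how the lab computed its figure, so this is an interpretation, and the function names are hypothetical. Under that assumption, a saving of 75 percent with 4 trillion tokens implies a same-sized baseline trained on 16 trillion tokens, and "more than 75 percent" implies a baseline larger than that.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


def implied_baseline_tokens(tokens_used: float, cost_reduction: float) -> float:
    """Tokens a same-sized baseline would need for the stated relative saving.

    Assumes cost is proportional to tokens, so saving = 1 - used / baseline.
    """
    if not 0.0 <= cost_reduction < 1.0:
        raise ValueError("cost_reduction must be in [0, 1)")
    return tokens_used / (1.0 - cost_reduction)


params = 8e9             # 8B parameters, per the model name
internlm3_tokens = 4e12  # 4 trillion tokens, as stated in the article

baseline_tokens = implied_baseline_tokens(internlm3_tokens, 0.75)
saving = 1.0 - training_flops(params, internlm3_tokens) / training_flops(params, baseline_tokens)

print(f"Implied baseline: {baseline_tokens / 1e12:.0f}T tokens")  # 16T tokens
print(f"Compute saving:   {saving:.0%}")                          # 75%
```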
| Version | First release | Sizes | Notable context | License |
|---|---|---|---|---|
| InternLM (1st gen) | 6 Jul 2023 (7B); 20 Sep 2023 (20B) | 7B, 20B | 16K (20B, via extrapolation) | Apache-2.0 code; weights free for commercial use after registration |
| InternLM2 | 17 Jan 2024 | 1.8B, 7B, 20B | 200K (needle-in-a-haystack) | Apache-2.0 |
| InternLM2.5 | 3 Jul 2024 | 1.8B, 7B, 20B | 1M (7B-Chat-1M) | Apache-2.0 |
| InternLM3 | 15 Jan 2025 | 8B | 128K (RULER) | Apache-2.0 |
InternLM is the language-model core of a broader Intern ecosystem. The most prominent multimodal sibling is InternLM-XComposer, a vision-language model for text-image comprehension and composition first described in arXiv:2309.15112 (2023). InternLM-XComposer2 (7B) was released on 26 January 2024, a 1.8B variant followed on 9 April 2024, and InternLM-XComposer-2.5 arrived in July 2024 with a 24K interleaved image-text context that extends to 96K. A separate multimodal series, InternVL, is developed in the same orbit (with OpenGVLab) and is often used together with InternLM language backbones. On the domain side, InternLM-Math is a bilingual model specialized for mathematical reasoning. The training, fine-tuning, and serving toolchain (including XTuner for fine-tuning and LMDeploy for inference) is maintained under the same GitHub organization. [9][10]
The tables below collect figures reported by Shanghai AI Laboratory on the Hugging Face model cards and the project repository. Because the numbers come from different evaluation setups across generations, they are best compared within a single row (one benchmark under one setup) rather than directly across versions.
InternLM2.5-7B (base) compared with similarly sized open models:
| Benchmark | InternLM2.5-7B | Llama-3-8B | Yi-1.5-9B |
|---|---|---|---|
| MMLU (5-shot) | 71.6 | 66.4 | 71.6 |
| CMMLU (5-shot) | 79.1 | 51.0 | 74.1 |
| BBH (3-shot) | 70.1 | 59.7 | 71.1 |
| MATH (4-shot) | 34.0 | 16.4 | 31.9 |
| GSM8K (4-shot) | 74.8 | 54.3 | 74.5 |
| GPQA (0-shot) | 31.3 | 31.3 | 27.8 |
InternLM3-8B-Instruct compared with peer instruction models (MATH-500 score uses deep thinking mode):
| Benchmark | InternLM3-8B | Qwen2.5-7B | Llama3.1-8B | GPT-4o-mini |
|---|---|---|---|---|
| CMMLU | 83.1 | 75.8 | 53.9 | 66.0 |
| MMLU | 76.6 | 76.8 | 71.8 | 82.7 |
| MMLU-Pro | 57.6 | 56.2 | 48.1 | 64.1 |
| GPQA-Diamond | 37.4 | 33.3 | 24.2 | 42.9 |
| MATH-500 | 83.0 | 72.4 | 48.4 | 74.0 |
| HumanEval | 82.3 | 85.4 | 72.0 | 86.6 |
| AlpacaEval 2.0 | 51.1 | 30.3 | 25.0 | 50.7 |
For reference, the first-generation InternLM-20B reported MMLU 62.05, C-Eval 58.8, HumanEval 25.61, and MBPP 35.6, which the lab noted were the best results in the 13B-to-33B band at release. [1][2][4][7]
From InternLM2 onward, the project states plainly that "Code and model weights are licensed under Apache-2.0," which permits commercial use provided the license text is included and any modifications are noted. The first-generation models used a split arrangement common to Chinese open-weight releases of 2023: the code was Apache-2.0, while the weights were open for academic research and free for commercial use, at first only with written permission and later through a registration or application step, with no per-use fee. Commercial users are pointed to a contact address (internlm@pjlab.org.cn) for licensing questions. The practical effect across the family is that the weights can be downloaded, fine-tuned, and deployed commercially without royalties. [1][2][4]