# Qwen

> Source: https://aiwiki.ai/wiki/qwen
> Updated: 2026-06-20
> Categories: Chinese AI, Large Language Models, Natural Language Processing, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Qwen** is a family of open-source and proprietary [large language models](/wiki/large_language_model) (LLMs) and [multimodal models](/wiki/multimodal_model) developed by [Alibaba Cloud](/wiki/alibaba_cloud), the cloud computing division of Chinese technology company [Alibaba Group](/wiki/alibaba), and is the most-downloaded open-weight LLM family in the world.[1][3] As of January 2026, Qwen had surpassed approximately 700 million cumulative downloads on [Hugging Face](/wiki/hugging_face), overtook [Meta](/wiki/meta_ai)'s [LLaMA](/wiki/llama) as the most-downloaded open-source model family in 2025, and had spawned more than 180,000 derivative versions across nearly 400 released models in the Qwen lineup.[3][47] Qwen is also called **[Tongyi Qianwen](/wiki/tongyi_qianwen)** (Chinese: 通义千问; pinyin: Tongyì Qianwèn; literally "to comprehend the meaning, [and to answer] a thousand kinds of questions"), the Chinese brand name under which Alibaba Cloud's Qwen Team ships the models.[2] All top 10 open-source LLMs on Hugging Face's Open LLM Leaderboard were trained and developed on updated open-source versions of Qwen as of February 2025.[5]

A Qwen researcher summarized the team's strategy to Xinhua in January 2026: "Our core goal remains to keep pushing the performance frontier of LLMs while staying committed to open-source openness so that AI can truly help more people around the world."[47]

## What is Qwen?

Qwen spans dense and [Mixture of Experts](/wiki/mixture_of_experts) (MoE) language models from roughly 0.5 billion to over 1 trillion parameters, plus specialized variants for coding, mathematics, vision-language understanding, audio, image generation, translation, and safety. Most open-weight Qwen models since the Qwen3 generation are released under the permissive Apache 2.0 license, while the flagship "Max" and "Omni" hosted tiers are kept proprietary in some generations and served only through Alibaba Cloud's API. The models are widely cited as a foundation for the global open-source AI ecosystem: since January 2025, Chinese fine-tuned or derivative models accounted for 63% of all new fine-tuned or derivative models released on Hugging Face, with Qwen serving as the primary base.[4]

## History

### Early Development

Alibaba first launched a beta version of Qwen on April 11, 2023, during the Alibaba Cloud Summit under the name Tongyi Qianwen.[6] The initial architecture was based on the LLaMA framework developed by [Meta AI](/wiki/meta_ai).[1] Initially, it was integrated into various Alibaba business applications, including the workplace collaboration tool DingTalk and the voice assistant Tmall Genie.[7] The model received approval from the Chinese government and was publicly released in September 2023.[8]

### Open Source Release

In a significant move to foster a broader AI ecosystem, Alibaba Cloud began open-sourcing its models in August 2023. The first models released were **Qwen-7B** and its chat-fine-tuned variant, **Qwen-7B-Chat**.[9] This was followed by the release of **Qwen-1.8B** in November 2023, aimed at low-latency and resource-constrained environments.[10] In December 2023, Alibaba released the 72B parameter model, which demonstrated performance comparable to leading proprietary models like [GPT-3.5](/wiki/gpt-3.5) on several benchmarks.[11]

### Qwen1.5

Released on February 5, 2024, Qwen1.5 expanded the model lineup to include sizes ranging from 0.5B to 110B parameters, all supporting a 32K context window.[12] This generation introduced Group Query [Attention](/wiki/attention) (GQA) across all model sizes, improving inference speed and reducing memory usage. The release also included **CodeQwen1.5-7B**, a code-specialized variant trained on 3 trillion tokens of code data, supporting a 64K context window for long code comprehension and generation.[13]

### Qwen2

On June 6, 2024, Alibaba Cloud released the Qwen2 series, representing a substantial leap in model quality and multilingual support.[14] The Qwen2 family consists of five model sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B (a [Mixture of Experts](/wiki/mixture_of_experts) model), and Qwen2-72B. The MoE variant, Qwen2-57B-A14B, activates only 14 billion parameters per forward pass while maintaining the performance level of a 30-billion-parameter dense model, offering significant efficiency gains.[15]

Qwen2 was trained on 7 trillion tokens and introduced support for 27 additional languages beyond English and Chinese, including German, Italian, Arabic, Persian, and Hebrew.[14] The Qwen2-72B model topped the Hugging Face Open LLM Leaderboard for open-source models upon release.[16] Qwen2-7B-Instruct and Qwen2-72B-Instruct both support extended context lengths of up to 128K tokens. The Qwen2 technical report, published on July 15, 2024, detailed architectural improvements including reduced Key-Value (KV) cache sizes compared to Qwen1.5, translating to a smaller memory footprint during long-context inference.[15]

### Qwen2.5

On September 19, 2024, Alibaba released the Qwen2.5 family, a major update encompassing over 100 open-source models across the language, coding, and mathematics domains.[17] The base Qwen2.5 language models are available in seven sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, all supporting 128K token context windows and capable of generating up to 8K tokens in a single response.

Qwen2.5 was trained on 18 trillion tokens, a significant increase from Qwen2's 7 trillion tokens. This expanded training corpus resulted in substantial improvements across benchmarks. Qwen2.5-72B-Instruct achieved 86.1 on [MMLU](/wiki/mmlu) (up from Qwen2-72B's 84.2), 83.1 on [MATH](/wiki/math) (up from 69.0), and 55.5 on [LiveCodeBench](/wiki/livecodebench) (up from 32.2), even surpassing the much larger Llama-3.1-405B-Instruct on several critical benchmarks.[17][18]

Alongside the base models, Alibaba released two specialized model families:

- **Qwen2.5-Coder**: Available in six sizes (0.5B, 1.5B, 3B, 7B, 14B, and 32B), these models were trained on 5.5 trillion tokens of code-related data. Qwen2.5-Coder-32B-Instruct became the state-of-the-art open-source code LLM, with coding abilities matching those of [GPT-4o](/wiki/gpt_4o) on benchmarks including [HumanEval](/wiki/humaneval) (88.2) and [MBPP](/wiki/mbpp).[19]

- **Qwen2.5-Math**: Available in 1.5B, 7B, and 72B sizes, these models specialize in mathematical reasoning using Chain-of-Thought (CoT), Program-of-Thought (PoT), and Tool-Integrated [Reasoning](/wiki/reasoning) (TIR) approaches. Qwen2.5-Math-72B-Instruct surpassed both Qwen2-Math-72B-Instruct and [GPT-4o](/wiki/gpt_4o) on mathematical benchmarks, achieving 83.1 on MATH and 72.0 on MathVista.[20]

### Qwen2.5-Max

On January 28, 2025, Alibaba released Qwen2.5-Max, a large-scale Mixture of Experts model pre-trained on over 20 trillion tokens and further post-trained using Supervised Fine-Tuning (SFT) and [Reinforcement Learning from Human Feedback](/wiki/reinforcement_learning) ([RLHF](/wiki/rlhf)).[21] Qwen2.5-Max outperformed [DeepSeek](/wiki/deepseek) V3 on multiple benchmarks, including Arena-Hard (89.4 vs. 85.5), [LiveBench](/wiki/livebench), LiveCodeBench, and [GPQA](/wiki/gpqa)-Diamond, while also demonstrating competitive results against [Claude](/wiki/claude) 3.5 Sonnet and [GPT-4o](/wiki/gpt_4o). On the [GSM8K](/wiki/gsm8k) mathematics benchmark, Qwen2.5-Max achieved 94.5, well ahead of DeepSeek V3 (89.3). Unlike the open-weight Qwen2.5 models, Qwen2.5-Max is available only through Alibaba Cloud's API service.

### QwQ: Reasoning Models

Alibaba entered the reasoning model space, competing with [OpenAI](/wiki/openai)'s o1 series and [DeepSeek-R1](/wiki/deepseek_r1), through the [QwQ](/wiki/qwq) ("Qwen with Questions") line of models:

- **QwQ-32B-Preview** (November 28, 2024): The first reasoning-focused model in the Qwen family, released as an open-source preview under the Apache 2.0 license. It demonstrated multi-step reasoning capabilities for math, coding, and scientific tasks.[22]

- **QwQ-32B** (March 5, 2025): A refined 32.5-billion-parameter reasoning model with a 131,072-token context window, developed using reinforcement learning with two training phases focused on math/coding skills and general reasoning. QwQ-32B achieved 90.6% on MATH-500, 50.0% on [AIME 2024](/wiki/aime_2024), and 65.2% on GPQA, outperforming OpenAI's o1-preview in mathematical and scientific reasoning benchmarks while requiring significantly less computational power than comparable models like [DeepSeek-R1](/wiki/deepseek_r1) (671B parameters).[23] The release caused Alibaba's stock to jump more than 8%.[24]

### When was Qwen3 released?

Released on April 28-29, 2025, [Qwen3](/wiki/qwen_3) represents the third major generation of the Qwen model family.[25] The release includes both dense models (0.6B, 1.7B, 4B, 8B, 14B, and 32B) and Mixture of Experts models (Qwen3-30B-A3B with 30B total/3B active parameters, and Qwen3-235B-A22B with 235B total/22B active parameters). All Qwen3 models were released under the Apache 2.0 license. The Qwen3 technical report was published on arXiv on May 14, 2025 (arXiv:2505.09388).[39]

Qwen3 was trained on 36 trillion tokens across 119 languages and dialects, doubling the training data from Qwen2.5.[25][39] The flagship Qwen3-235B-A22B achieved 95.6 on Arena-Hard, 85.7 on AIME 2024, and 70.7 on LiveCodeBench, placing it among the top-performing models globally.[25] According to the Qwen3 technical report, the flagship "achieves competitive results in benchmark evaluations against other top-tier models" such as DeepSeek-R1, o1, o3-mini, and Gemini-2.5-Pro.[25][39]

A defining feature of Qwen3 is its hybrid thinking/non-thinking mode capability, allowing users to control the depth of reasoning per query. In thinking mode, the model performs step-by-step chain-of-thought reasoning before delivering a final answer, suitable for complex problems. In non-thinking mode, it provides quick, concise responses for simpler questions. Users can also set a "thinking budget" to balance response quality against latency. The Qwen3 technical report describes this as "the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework," alongside a thinking budget mechanism "allowing users to allocate computational resources adaptively during inference, thereby balancing latency and performance based on task complexity."[39]

### Qwen3-Coder

Released on July 22, 2025, [Qwen3-Coder](/wiki/qwen_3_coder) is the code-specialized variant of the Qwen3 line. The flagship **Qwen3-Coder-480B-A35B-Instruct** is a 480-billion-parameter Mixture-of-Experts model with 35 billion active parameters, employing 160 specialized expert networks of which 8 are activated per query.[30] It natively supports a 256K-token context window and can be extrapolated to 1 million tokens, targeting agentic coding, browser-use, and tool-use tasks. The model is released under the Apache 2.0 license and achieves performance comparable to [Claude](/wiki/claude) Sonnet 4 on agentic coding benchmarks.[30]

### Qwen3-Next

Announced on September 11, 2025, **Qwen3-Next-80B-A3B** introduced a new ultra-efficient architecture featuring a hybrid of Gated DeltaNet linear attention and standard gated full attention, paired with a highly sparse Mixture-of-Experts (only 3 billion of 80 billion parameters active per token).[40] Released in both Instruct (non-thinking) and Thinking variants, Qwen3-Next provides over 10x throughput improvement over Qwen3-32B at long-context inference, supports 256K context natively, and is open-sourced under Apache 2.0 on Hugging Face, ModelScope, and Kaggle.[40]

### Qwen3-Max

[Qwen3-Max](/wiki/qwen3_max) is Alibaba's flagship proprietary model in the Qwen3 generation. **Qwen3-Max-Preview** debuted on September 5, 2025, on Alibaba Cloud and OpenRouter, and the full Qwen3-Max model was released on September 23-24, 2025.[41] It is a trillion-parameter-class Mixture-of-Experts model with over 1 trillion parameters, trained on approximately 36 trillion tokens.[41][48] On agentic and coding evaluations, Qwen3-Max-Instruct scored 69.6 on [SWE-bench](/wiki/swe_bench) Verified and 74.8 on Tau2-Bench, with Alibaba reporting that the latter surpassed Claude Opus 4 and DeepSeek V3.1.[48] Unlike the open-weight Qwen3 models, Qwen3-Max is closed-source and accessible only through Alibaba Cloud's API and Qwen Chat. It ranked among the top models on the LMArena text leaderboard at release.[41]

### Qwen3-VL

The vision-language extension of Qwen3, **Qwen3-VL**, launched on September 23, 2025, with the flagship **Qwen3-VL-235B-A22B-Instruct** and **Qwen3-VL-235B-A22B-Thinking** variants.[35] Smaller variants followed in October 2025, including 30B-A3B (Instruct/Thinking) on October 4, 4B and 8B sizes on October 15, and 2B and 32B sizes on October 21. The Qwen3-VL technical report was released on November 27, 2025.[35]

### Qwen3-Omni

Released on September 22-23, 2025, **Qwen3-Omni** is a natively end-to-end omni-modal large language model that can understand text, audio, images, and video while generating real-time speech.[42] It uses a Thinker-Talker Mixture-of-Experts architecture in which the Thinker handles reasoning and multimodal understanding while the Talker produces audio tokens from the Thinker's hidden representations. Three open-source variants were released under the Apache 2.0 license: **Qwen3-Omni-30B-A3B-Instruct**, **Qwen3-Omni-30B-A3B-Thinking**, and **Qwen3-Omni-30B-A3B-Captioner**. The model achieves streaming latency as low as 234 ms for audio and 547 ms for video, and reaches state-of-the-art performance across 32 of 36 audio and audio-visual benchmarks.[42] The Qwen3-Omni technical report was published on arXiv on September 23, 2025 (arXiv:2509.17765).

### Qwen3.5

Announced on February 16, 2026, **Qwen3.5** represents a mid-cycle architectural refresh of the Qwen3 line.[43] The initial release was **Qwen3.5-397B-A17B**, a Mixture-of-Experts model with 397 billion total parameters and 17 billion active parameters per token, accompanied by a hosted **Qwen3.5-Plus** API offering a 1-million-token context window. Alibaba reported that the hybrid linear-attention plus sparse MoE design delivers decoding throughput 8.6 to 19 times faster than Qwen3-Max while retaining native multimodal capability.[43][49] Subsequent open-source releases followed on February 24, 2026 (Qwen3.5-122B-A10B, Qwen3.5-35B-A3B, and Qwen3.5-27B dense) and March 2, 2026 (Qwen3.5-9B, 4B, 2B, and 0.8B).

Qwen3.5 introduces several key architectural changes:[43]

- **Hybrid attention**: A 3:1 ratio combining Gated Delta Networks (linear attention) with standard full attention, with three of every four transformer blocks using linear attention for near-linear scaling on long contexts.
- **Unified vision-language foundation**: Natively multimodal from the ground up, with early-fusion training on trillions of multimodal tokens. This removed the need for a separate Qwen3.5-VL line.
- **Scalable reinforcement learning**: Training across million-agent environments with progressively complex task distributions.
- **Expanded language coverage**: Support for 201 languages and dialects, up from Qwen3's 119.[43][49]

### Qwen3.5-Omni

Released around March 30, 2026, **Qwen3.5-Omni** is the multimodal extension of Qwen3.5, supporting text, audio, image, and video understanding alongside real-time speech generation.[44] The model targets latency-sensitive multimodal interaction with a 256K context window. Unlike Qwen3-Omni, the initial Qwen3.5-Omni release was kept proprietary, with access restricted to the Qwen Chat interface and Alibaba Cloud's API.

### Qwen3.6

In April 2026, the **Qwen3.6** series began rolling out, emphasizing stability, agentic coding, and a new "thinking preservation" feature that maintains reasoning context across multi-turn conversations.[45] The initial open-source releases are **Qwen3.6-35B-A3B** (an MoE model with 3B active parameters, April 16, 2026) and **Qwen3.6-27B** (a dense model, April 22, 2026). The flagship proprietary **Qwen3.6-Max-Preview** debuted on April 20, 2026, and at release claimed the top score on six major coding benchmarks (SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode), supporting a 256K-token context window and adding a preserve_thinking feature for multi-turn agentic workflows.[46][50]

### Development Timeline

| Date (UTC) | Generation / Model | Key Details |
| --- | --- | --- |
| 2023-04-11 | Tongyi Qianwen (beta) | Initial corporate announcement by Alibaba Cloud for a company-scale LLM initiative.[6] |
| 2023-08-03 | Qwen-7B / Qwen-7B-Chat | First broadly distributed open weights via ModelScope and Hugging Face. Quantized INT4 chat variant followed on 2023-08-21.[9] |
| 2023-09-13 | Qwen (public release) | Model approved by Chinese government for public release.[8] |
| 2023-11-30 | Qwen-72B | 72-billion parameter model released, competitive with GPT-3.5.[11] |
| 2023-11-30 | Qwen-Audio | Multimodal audio-language model supporting 30+ tasks and multiple audio types.[26] |
| 2024-02-05 | Qwen1.5 | Models ranging from 0.5B to 110B parameters with 32K context window.[12] |
| 2024-04 | CodeQwen1.5-7B | Code-specialized model trained on 3 trillion tokens with 64K context window.[13] |
| 2024-06-06 | Qwen2 | Dense and MoE models (0.5B to 72B) trained on 7 trillion tokens with 128K context support.[14] |
| 2024-07-15 | Qwen2 (tech report) | Technical report details five model sizes and efficiency optimizations.[15] |
| 2024-08-09 | Qwen2-Audio | Updated audio-language model supporting 8+ languages, outperforming Gemini-1.5-pro on AIR-Bench.[27] |
| 2024-08-30 | Qwen2-VL | Vision-language series with dynamic resolution and M-RoPE; 2B, 7B, and 72B sizes.[28] |
| 2024-09-19 | Qwen2.5 | 18 trillion token training, 128K context, seven model sizes (0.5B to 72B).[17] |
| 2024-09-19 | Qwen2.5-Coder | Code-specialized models (0.5B to 32B) trained on 5.5T tokens; 32B matches GPT-4o.[19] |
| 2024-09-18 | Qwen2.5-Math | Math-specialized instruction models (1.5B/7B/72B) surpassing GPT-4o on MathVista.[20] |
| 2024-11-28 | QwQ-32B-Preview | Open-source reasoning model preview, competing with OpenAI's o1.[22] |
| 2025-01-28 | Qwen2.5-Max | Large-scale MoE model outperforming DeepSeek V3 on Arena-Hard, LiveBench, and GPQA.[21] |
| 2025-01-29 | Qwen2.5-VL | Enhanced vision-language model with 1-hour video comprehension and temporal reasoning.[29] |
| 2025-03-05 | QwQ-32B | Refined reasoning model; 90.6% MATH-500, 65.2% GPQA; Apache 2.0 license.[23] |
| 2025-03-26 | Qwen2.5-Omni | End-to-end omni-modal model with Thinker-Talker architecture; arXiv:2503.20215.[34] |
| 2025-04-28 | Qwen3 | Thinking/non-thinking modes, 36 trillion tokens, 119 languages, Apache 2.0.[25] |
| 2025-05-14 | Qwen3 (tech report) | arXiv:2505.09388 details hybrid thinking framework across 0.6B-235B scales.[39] |
| 2025-07-22 | Qwen3-Coder | 480B-A35B MoE for agentic coding, 256K context (1M with extrapolation).[30] |
| 2025-07-24 | Qwen-MT | Translation model supporting 92 languages.[31] |
| 2025-08-04 | Qwen-Image | 20B MMDiT image generation model with complex text rendering.[32] |
| 2025-08-19 | Qwen-Image-Edit | Image editing extension with precise text and appearance control.[32] |
| 2025-09-05 | Qwen3-Max-Preview | Trillion-parameter MoE preview on Alibaba Cloud and OpenRouter.[41] |
| 2025-09-11 | Qwen3-Next-80B-A3B | Hybrid Gated DeltaNet + sparse MoE, 10x throughput vs. Qwen3-32B.[40] |
| 2025-09-22 | Qwen3-Omni | Open-source omni-modal Thinker-Talker MoE; arXiv:2509.17765.[42] |
| 2025-09-23 | Qwen3-VL | 235B-A22B Instruct/Thinking vision-language models; smaller sizes followed in October.[35] |
| 2025-09-23 | Qwen3Guard | Safety guardrail model for real-time moderation.[33] |
| 2025-09-23 | Qwen3-Max | Flagship trillion-parameter MoE; closed-source via Alibaba Cloud.[41] |
| 2026-02-16 | Qwen3.5 | Qwen3.5-397B-A17B with hybrid Gated DeltaNet attention, 201 languages.[43] |
| 2026-03-30 | Qwen3.5-Omni | Proprietary multimodal extension of Qwen3.5 with 256K context.[44] |
| 2026-04-16 | Qwen3.6-35B-A3B | MoE focused on agentic coding and thinking preservation.[45] |
| 2026-04-20 | Qwen3.6-Max-Preview | Flagship proprietary preview topping six coding benchmarks.[46][50] |
| 2026-04-22 | Qwen3.6-27B | Dense model optimized for coding and stability.[45] |

## Architecture and Technical Features

### Core Architecture

Qwen models are based on the [Transformer](/wiki/transformer) architecture, the standard for modern LLMs. Key architectural features include:

- **Attention Mechanism**: Uses self-attention with Group Query Attention (GQA) introduced in Qwen1.5 and expanded across the Qwen2 and later series. GQA groups multiple query heads under shared key-value heads, improving inference speed and reducing memory usage compared to standard multi-head attention.[14] Beginning with Qwen3-Next (September 2025) and continuing in Qwen3.5 (February 2026), the architecture adopts a hybrid of Gated Delta Networks (linear attention) and standard gated full attention for efficient long-context scaling.[40][43]

- **Tokenizer**: Custom tokenizer with over 150,000 token vocabulary size, efficiently representing text from multiple languages and reducing token count for non-English text.[12]

- **Position [Embeddings](/wiki/embeddings)**: Evolution from Absolute Position Embeddings (ALiBi) in early models to Rotary Position Embeddings (RoPE) for better long-context performance. Multimodal variants use M-RoPE (Multimodal [Rotary Position Embedding](/wiki/rotary_position_embedding)) to decompose positional information into 1D textual, 2D visual, and 3D video components. Qwen2.5-Omni introduces TMRoPE (Time-aligned Multimodal RoPE) for synchronizing video and audio timestamps.[28][34]

- **Architecture Types**: Both dense and Mixture of Experts (MoE) variants exist. MoE models activate only a subset of parameters (called "experts") per token, allowing larger total parameter counts with lower computational cost. For example, Qwen3-235B-A22B has 235 billion total parameters but activates just 22 billion per token, and Qwen3-Coder-480B-A35B activates 35 billion of its 480 billion parameters via 8 of 160 experts.[25][30]

### Training Data Scale

The evolution of training data across generations demonstrates aggressive scaling:

| Generation | Training Tokens | Languages |
| --- | --- | --- |
| Qwen (2023) | Not disclosed | Chinese, English, multilingual |
| Qwen2 (June 2024) | 7 trillion | 29 languages |
| Qwen2.5 (September 2024) | 18 trillion | 29+ core languages |
| Qwen2.5-Max (January 2025) | 20+ trillion | 29+ core languages |
| Qwen3 (April 2025) | 36 trillion | 119 languages and dialects |
| Qwen3-Max (September 2025) | ~36 trillion | 119+ languages |
| Qwen3.5 (February 2026) | Undisclosed | 201 languages and dialects |

The pre-training data includes high-quality Chinese language data, multilingual text, code, mathematics, and multimodal data, with extensive filtering and deduplication pipelines to ensure data quality.

### Model Sizes and Variants

#### Qwen2 Model Sizes

| Model | Parameters | Type | Context Length | Key Feature |
| --- | --- | --- | --- | --- |
| Qwen2-0.5B | 0.5B | Dense | 32K | Ultra-lightweight for edge devices |
| Qwen2-1.5B | 1.5B | Dense | 32K | Low-resource deployment |
| Qwen2-7B | 7B | Dense | 128K | General-purpose, long context |
| Qwen2-57B-A14B | 57B (14B active) | MoE | 64K | Efficient MoE, performance of 30B dense |
| Qwen2-72B | 72B | Dense | 128K | Flagship, topped Open LLM Leaderboard |

#### Qwen2.5 Model Sizes

| Model | Parameters | Type | Context Length | Key Feature |
| --- | --- | --- | --- | --- |
| Qwen2.5-0.5B | 0.5B | Dense | 128K | Smallest, for mobile/edge |
| Qwen2.5-1.5B | 1.5B | Dense | 128K | Light deployment |
| Qwen2.5-3B | 3B | Dense | 128K | New size tier |
| Qwen2.5-7B | 7B | Dense | 128K | Balanced performance/cost |
| Qwen2.5-14B | 14B | Dense | 128K | Mid-range |
| Qwen2.5-32B | 32B | Dense | 128K | High performance |
| Qwen2.5-72B | 72B | Dense | 128K | Flagship open-weight model |

#### Qwen3 Model Sizes

| Model | Parameters | Type | Context Length | Key Feature |
| --- | --- | --- | --- | --- |
| Qwen3-0.6B | 0.6B | Dense | 32K | Ultra-lightweight |
| Qwen3-1.7B | 1.7B | Dense | 32K | Edge deployment |
| Qwen3-4B | 4B | Dense | 32K | Mobile-friendly |
| Qwen3-8B | 8B | Dense | 128K | General-purpose |
| Qwen3-14B | 14B | Dense | 128K | Mid-range performance |
| Qwen3-32B | 32B | Dense | 128K | High performance |
| Qwen3-30B-A3B | 30B (3B active) | MoE | 128K | Efficient MoE for constrained hardware |
| Qwen3-235B-A22B | 235B (22B active) | MoE | 128K | Flagship, competitive with top proprietary models |
| Qwen3-Next-80B-A3B | 80B (3B active) | Hybrid MoE | 256K | Hybrid Gated DeltaNet, 10x throughput[40] |
| Qwen3-Coder-480B-A35B | 480B (35B active) | MoE (160 experts/8 active) | 256K (1M extrap.) | Agentic coding flagship[30] |
| Qwen3-Max | ~1T (proprietary) | MoE | API only | Closed-source flagship[41] |

#### Qwen3.5 Model Sizes

| Model | Parameters | Type | Context Length | Release |
| --- | --- | --- | --- | --- |
| Qwen3.5-0.8B | 0.8B | Dense | Standard | March 2, 2026 |
| Qwen3.5-2B | 2B | Dense | Standard | March 2, 2026 |
| Qwen3.5-4B | 4B | Dense | Standard | March 2, 2026 |
| Qwen3.5-9B | 9B | Dense | Standard | March 2, 2026 |
| Qwen3.5-27B | 27B | Dense | Long | February 24, 2026 |
| Qwen3.5-35B-A3B | 35B (3B active) | Hybrid MoE | Long | February 24, 2026 |
| Qwen3.5-122B-A10B | 122B (10B active) | Hybrid MoE | Long | February 24, 2026 |
| Qwen3.5-397B-A17B | 397B (17B active) | Hybrid MoE | 1M (Plus tier) | February 16, 2026 (flagship)[43] |

### What is the difference between Qwen's thinking and non-thinking modes?

Qwen3 introduces a hybrid approach to problem-solving with two distinct inference modes:[25]

- **Thinking Mode**: The model takes time to reason step by step before delivering the final answer, similar to the approach used by OpenAI's o1 and DeepSeek-R1. This mode is suitable for complex problems in mathematics, coding, and scientific reasoning.

- **Non-Thinking Mode**: Provides quick, near-instant responses for simpler questions where speed is prioritized over depth of reasoning.

- **Thinking Budget**: Users can control how much "thinking" the model performs, allowing fine-grained trade-offs between response quality and latency depending on the task at hand. The Qwen3 technical report frames the thinking budget as a mechanism "allowing users to allocate computational resources adaptively during inference."[39]

This dual-mode capability is available across all Qwen3 models and can be toggled within a single conversation. Subsequent generations (Qwen3-Next, Qwen3-Omni, Qwen3.5) released distinct "Instruct" (non-thinking) and "Thinking" model variants rather than always toggling within one checkpoint. Qwen3.6 introduces a "thinking preservation" mechanism to retain reasoning context across multi-turn conversations.[45]

## Multimodal Capabilities

### Qwen-VL Series (Vision-Language)

The Qwen-VL series represents Qwen's multimodal models that process both text and images. Each generation has expanded the capabilities significantly.

#### Qwen2-VL

Released on August 30, 2024, Qwen2-VL introduced several architectural innovations for vision-language understanding:[28]

- **Naive Dynamic Resolution**: Processes images of varying resolutions by mapping them into a dynamic number of visual tokens, maintaining consistency between model input and the inherent information in the image.
- **Multimodal Rotary Position Embedding (M-RoPE)**: Decomposes positional embedding to capture 1D textual, 2D visual, and 3D video positional information for more effective multimodal fusion.
- **Video Understanding**: Supports videos over 20 minutes in length, enabling video-based question answering and content creation.
- **Multilingual OCR**: Understands text in images across most European languages, Japanese, Korean, Arabic, and Vietnamese.
- Available in 2B, 7B, and 72B parameter sizes, with the model achieving state-of-the-art results on MathVista, DocVQA, RealWorldQA, and MTVQA benchmarks.

#### Qwen2.5-VL

Released on January 29, 2025, this version brought significant enhancements:[29]

- **Extended Video Comprehension**: Can process videos over 1 hour in length, with the ability to pinpoint specific moments within videos down to the exact second.
- **Dynamic FPS Sampling**: Extends dynamic resolution to the temporal dimension, enabling comprehension of videos at various sampling rates.
- **Enhanced OCR**: Multi-scenario, multi-language, and multi-orientation text recognition capabilities.
- **Visual Localization**: Bounding box generation for object detection and spatial understanding.
- **Structured Output**: Generation of structured data from documents, forms, and tables.
- Available in 3B, 7B, and 72B sizes.

#### Qwen3-VL

Launched on September 23, 2025, the Qwen3-VL series initially shipped as **Qwen3-VL-235B-A22B-Instruct** and **Qwen3-VL-235B-A22B-Thinking**, with smaller 30B-A3B variants on October 4, 4B/8B on October 15, and 2B/32B on October 21.[35] The technical report appeared on November 27, 2025. Key features include:

- Visual [Agent](/wiki/agent) capabilities for PC and mobile GUI operation
- Advanced spatial perception and 3D grounding
- DeepStack architecture for fine-grained visual detail capture
- Text-Timestamp Alignment for precise video event localization
- Both Instruct and Thinking inference variants

### Qwen-Audio Series

The Qwen-Audio line provides audio understanding capabilities integrated with language models.

#### Qwen-Audio (November 2023)

The original Qwen-Audio model, released on November 30, 2023, was a fundamental multi-task audio-language model supporting over 30 tasks across multiple audio types (human speech, natural sounds, music, and song). It achieved state-of-the-art results on Aishell1, CochlScene, ClothoAQA, and VocalSound benchmarks.[26]

#### Qwen2-Audio (August 2024)

Released on August 9, 2024, Qwen2-Audio introduced two key improvements:[27]

- **Voice Chat Mode**: For the first time, users can give voice instructions directly to the audio-language model without separate Automatic Speech Recognition (ASR) modules.
- **Audio Analysis Mode**: Processes speech, sound, and music with text-based instructions.
- Supports 8+ languages and dialects including Chinese, English, Cantonese, French, Italian, Spanish, German, and Japanese.
- Qwen2-Audio outperformed previous state-of-the-art models, including [Gemini](/wiki/gemini)-1.5-pro, on the AIR-Bench evaluation for audio-centric instruction-following capabilities.
- Optimized for audio clips under 30 seconds.

### Qwen2.5-Omni

Released on March 26-27, 2025, with technical report on arXiv (2503.20215), Qwen2.5-Omni introduced a unique Thinker-Talker architecture:[34]

- Simultaneous text and speech generation
- Real-time voice and video chat support
- TMRoPE (Time-aligned Multimodal RoPE) for synchronizing video and audio timestamps
- Block-wise processing in audio and visual encoders for streaming
- Sliding-window DiT for low-latency streaming audio token decoding
- Processing of text, images, videos, and audio inputs
- Bilingual support (English/Chinese) with low-latency interaction

### Qwen3-Omni

Released on September 22-23, 2025, with technical report (arXiv:2509.17765), Qwen3-Omni extended the Thinker-Talker paradigm into a Mixture-of-Experts architecture.[42] Three Apache 2.0 variants were released:

- **Qwen3-Omni-30B-A3B-Instruct**: Standard non-thinking variant.
- **Qwen3-Omni-30B-A3B-Thinking**: Reasoning-focused variant.
- **Qwen3-Omni-30B-A3B-Captioner**: Specialized for visual captioning.

Streaming latency reaches 234 ms for audio and 547 ms for video. Qwen3-Omni reached state-of-the-art performance on 32 of 36 audio and audio-visual benchmarks, including outperforming Gemini-2.5-Pro, Seed-ASR, and GPT-4o-Transcribe on key tasks.[42]

### Qwen3.5-Omni

Released around March 30, 2026, Qwen3.5-Omni is a proprietary multimodal model supporting text, audio, image, and video understanding with real-time speech generation, providing a 256K-token context window.[44] Access is limited to Qwen Chat and Alibaba Cloud's API at launch.

### QVQ (Visual Reasoning Model)

QVQ-72B-Preview is an experimental research model for enhanced visual reasoning that scored 70.3% on [MMMU](/wiki/mmmu) (Multimodal Massive Multi-task Understanding), with superior performance on MathVision and OlympiadBench for advanced multidisciplinary understanding.[3]

## Specialized Models

### Coding Models

Qwen's coding model lineup has evolved through three generations:

| Model | Release | Parameters | Training Data | Context | Performance |
| --- | --- | --- | --- | --- | --- |
| CodeQwen1.5-7B | April 2024 | 7B | 3T tokens of code | 64K | Strong on text-to-SQL and bug fixing[13] |
| Qwen2.5-Coder | September 2024 | 0.5B to 32B | 5.5T tokens of code | 128K | 88.2 on HumanEval; 32B matches GPT-4o[19] |
| Qwen3-Coder | July 2025 | 480B-A35B (MoE) | Extended code corpus | 256K (1M extrap.) | Agentic coding flagship comparable to Claude Sonnet 4[30] |

Qwen2.5-Coder represented a major leap, with six model sizes covering everything from lightweight on-device code completion (0.5B) to full-featured code generation, reasoning, and fixing at the 32B scale. The Qwen2.5-Coder-32B-Instruct model became the state-of-the-art open-source code LLM upon release, matching GPT-4o's coding abilities. Qwen3-Coder-480B-A35B-Instruct subsequently set new state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use benchmarks.[30]

### Mathematics Models

| Model | Release | Parameters | Key Capabilities |
| --- | --- | --- | --- |
| Qwen2-Math | June 2024 | 1.5B, 7B, 72B | Chain-of-Thought mathematical reasoning |
| Qwen2.5-Math | September 2024 | 1.5B, 7B, 72B | CoT, PoT, and TIR; 72.0 on MathVista; surpasses GPT-4o[20] |

Qwen2.5-Math supports both Chinese and English and uses multiple reasoning approaches: Chain-of-Thought (natural language step-by-step), Program-of-Thought (generating code to solve problems), and Tool-Integrated Reasoning (combining language reasoning with computational tools). Even the small 1.5B variant achieves competitive performance against much larger general-purpose models.

### Other Specialized Models

| Variant | Release Date | Focus | Key Features |
| --- | --- | --- | --- |
| Qwen-MT | July 2025 | Translation | 92 languages covering 95% of global population; reinforcement learning for accuracy[31] |
| Qwen-Image | August 2025 | Image Generation | 20B MMDiT model; complex text rendering, multi-line layouts[32] |
| Qwen-Image-Edit | August 2025 | Image Editing | Precise text editing, semantic and appearance control[32] |
| Qwen3Guard | September 2025 | Safety | Real-time moderation, risk classification; state-of-the-art on multilingual safety benchmarks[33] |

## Performance and Benchmarks

### Cross-Generation Comparison

The following table summarizes benchmark improvements across Qwen generations for flagship models:

| Benchmark | Qwen2-72B | Qwen2.5-72B-Instruct | Qwen3-235B-A22B |
| --- | --- | --- | --- |
| MMLU | 84.2 | 86.1 | 88.5 (MMLU-Redux) |
| MATH | 69.0 | 83.1 | 85.7 (AIME 2024) |
| LiveCodeBench | 32.2 | 55.5 | 70.7 |
| Arena-Hard | N/A | N/A | 95.6 |

### Qwen2.5-Max Benchmarks (January 2025)

| Benchmark | Qwen2.5-Max | DeepSeek V3 | Claude 3.5 Sonnet | GPT-4o |
| --- | --- | --- | --- | --- |
| Arena-Hard | 89.4 | 85.5 | 85.2 | Comparable |
| MMLU-Pro | 76.1 | 75.9 | 78.0 | Comparable |
| GSM8K | 94.5 | 89.3 | N/A | N/A |
| LiveCodeBench | Leading | Below Qwen2.5-Max | N/A | N/A |
| GPQA-Diamond | Leading | Below Qwen2.5-Max | N/A | N/A |

### QwQ-32B Reasoning Benchmarks (March 2025)

| Benchmark | QwQ-32B | DeepSeek-R1 | OpenAI o1-preview |
| --- | --- | --- | --- |
| MATH-500 | 90.6% | Comparable | Below QwQ-32B |
| AIME 2024 | 50.0% | Comparable | Below QwQ-32B |
| GPQA | 65.2% | Comparable | Below QwQ-32B |
| LiveCodeBench | 50.0% | Comparable | N/A |
| Parameters | 32.5B | 671B | Proprietary |

QwQ-32B's strong performance at just 32.5 billion parameters, compared to DeepSeek-R1's 671 billion, highlights the efficiency gains achieved through Alibaba's reinforcement learning training methodology.

### Qwen3 Performance (April 2025)

| Benchmark | Qwen3-235B-A22B | Qwen3-30B-A3B | QwQ-32B |
| --- | --- | --- | --- |
| Arena-Hard | 95.6 | 91.0 | 89.5 |
| AIME 2024 | 85.7 | 80.4 | 50.0 |
| LiveCodeBench | 70.7 | N/A | 50.0 |
| CodeForces Elo | N/A | 1974 | N/A |

### Qwen3-Max Agentic Benchmarks (September 2025)

The trillion-parameter Qwen3-Max emphasized agentic coding and tool use. On the SWE-bench Verified benchmark of real GitHub issue resolution, Qwen3-Max-Instruct scored 69.6, and on Tau2-Bench (agentic tool use) it scored 74.8, which Alibaba reported as surpassing Claude Opus 4 and DeepSeek V3.1.[48]

| Benchmark | Qwen3-Max-Instruct | Comparison |
| --- | --- | --- |
| SWE-bench Verified | 69.6 | Strong agentic coding among frontier models[48] |
| Tau2-Bench | 74.8 | Reported to surpass Claude Opus 4 and DeepSeek V3.1[48] |

## Capabilities

Qwen models support a comprehensive array of tasks across multiple domains:

- **Multilingual Processing**: Core models handle 29 languages, with Qwen3 extending to 119 languages and dialects, Qwen3.5 expanding to 201 languages and dialects, and Qwen-MT covering 92 languages for translation (representing 95% of the global population).[25][31][43]

- **Long-Context Understanding**: 128K tokens across Qwen2/Qwen2.5/Qwen3, 256K tokens natively in Qwen3-Next and Qwen3-Coder (extrapolable to 1M for Qwen3-Coder), and a 1-million-token context window in the hosted Qwen3.5-Plus tier.[30][40][43]

- **Coding and Mathematics**: Specialized models achieve state-of-the-art results, with Qwen2.5-Coder scoring 88.2 on HumanEval, Qwen2.5-Math achieving 72.0 on MathVista, and Qwen3-Coder reaching parity with Claude Sonnet 4 on agentic coding benchmarks.

- **Multimodal Tasks**: Image understanding, video comprehension (up to 1+ hours), audio processing, image generation and editing, and cross-modal reasoning with low-latency speech generation in the Qwen-Omni line.

- **Safety and Moderation**: Qwen3Guard provides real-time detection with categorized risk levels for content filtering.

- **Agentic and Reasoning**: Models like QwQ-32B, Qwen3, and Qwen3-Coder support advanced chain-of-thought reasoning, tool use, multi-step tasks, and agentic workflows for autonomous task completion.

- **Structured Data Analysis**: Enhanced capabilities for processing tables, forms, and structured documents, with models able to generate structured JSON output from visual inputs.

- **Real-time Interaction**: Support for low-latency voice and video chat through Qwen2.5-Omni and Qwen3-Omni, with streaming latency as low as 234 ms for audio in Qwen3-Omni.[42]

## Open-Source Strategy and Licensing

Alibaba Cloud's approach to open-sourcing the Qwen family has evolved significantly over time, becoming increasingly permissive.

### Is Qwen open source?

Most open-weight Qwen models are open source. Since the Qwen3 generation, all open-weight Qwen3 models have been released under the permissive Apache 2.0 license, which allows any organization to use, modify, and distribute the models without restriction.[25] Alibaba retains closed weights only for its flagship "Max" and "Omni" hosted tiers in some generations (for example Qwen2.5-Max, Qwen3-Max, Qwen3.5-Plus, Qwen3.5-Omni, and Qwen3.6-Max), which are served exclusively through Alibaba Cloud's API. By January 2026, Alibaba had open-sourced nearly 400 models in the Qwen lineup, the open foundation behind more than 180,000 derivative versions on Hugging Face.[47]

### Licensing Timeline

| Period | License | Scope |
| --- | --- | --- |
| 2023 (Qwen) | Tongyi Qianwen LICENSE | Restricted; commercial use over 100M MAU requires approval |
| 2024 (Qwen1.5, Qwen2) | Mixed | Most models Apache 2.0; some larger models under Tongyi Qianwen license |
| 2024 (Qwen2.5) | Mostly Apache 2.0 | Most models Apache 2.0; select variants under Qianwen license |
| 2025 (Qwen3) | Apache 2.0 | All open-weight Qwen3 models released under Apache 2.0; Qwen3-Max closed-source |
| 2025 (Qwen3-Omni, Qwen3-Coder, Qwen3-Next, Qwen3-VL) | Apache 2.0 | All open-weight variants released for research and commercial use |
| 2026 (Qwen3.5) | Apache 2.0 (open variants) | Open-weight variants 0.8B-397B Apache 2.0; Qwen3.5-Plus hosted; Qwen3.5-Omni closed-source |
| 2026 (Qwen3.6) | Apache 2.0 (open variants) | Qwen3.6-27B and 35B-A3B open under Apache 2.0; Qwen3.6-Max closed-source |

The shift to Apache 2.0 licensing across most of the Qwen3 and Qwen3.5 lineups removed barriers for commercial adoption, allowing any organization to use, modify, and distribute the models without restrictions. This open approach has been a major driver of Qwen's rapid community adoption and the proliferation of derivative models, although Alibaba retains closed weights for its flagship "Max" and "Omni" hosted tiers in some generations.

### Distribution Channels

Qwen models are distributed through multiple platforms:

- [Hugging Face](/wiki/hugging_face): Primary distribution platform for the international community[2]
- [ModelScope](/wiki/modelscope): Alibaba's model hosting platform, popular in China[2]
- GitHub: Source code, training scripts, and documentation[36]
- Kaggle: Additional distribution for some Qwen3-Next and Qwen3-Coder variants[40]

## Deployment and Accessibility

### Commercial API Services

Alibaba Cloud provides commercial access to Qwen models through several channels:[37]

- **Alibaba Cloud Model Studio**: A managed platform for deploying and fine-tuning Qwen models, offering both OpenAI-compatible APIs and the native DashScope SDK.
- **DashScope API**: The native API interface providing the most complete set of features and parameters, with regional endpoints for China (Beijing), International, US, and Hong Kong.
- **Qwen Chat** (chat.qwen.ai): A free web-based chat interface for interacting with the latest Qwen models directly.
- **OpenAI-Compatible API**: Model Studio provides an OpenAI-compatible endpoint, allowing developers to switch from OpenAI to Qwen with minimal code changes.

The API provides access to models not available as open weights, including Qwen2.5-Max, Qwen3-Max, Qwen3.5-Plus, Qwen3.5-Omni, and Qwen3.6-Max-Preview.

### Deployment Frameworks

Qwen models support deployment through a variety of open-source inference frameworks:[36]

- **[vLLM](/wiki/vllm)**: High-throughput inference with PagedAttention
- **[SGLang](/wiki/sglang)**: Large-scale deployment with structured generation
- **[TensorRT](/wiki/tensorrt)-LLM**: NVIDIA GPU optimization for production workloads
- **[Ollama](/wiki/ollama)**: Local deployment with simple setup for individual developers
- **[llama.cpp](/wiki/llama_cpp)**: CPU and GPU inference using [GGUF](/wiki/gguf) quantized formats
- Integration with popular AI frameworks including [LangChain](/wiki/langchain), [LlamaIndex](/wiki/llamaindex), and Transformers

## Community Adoption and Impact

The Qwen model family has achieved remarkable adoption milestones since its initial open-source release in 2023. In January 2026, China's official Xinhua news agency reported that Qwen "leads global open-source AI community with 700 million downloads," having overtaken Meta's Llama in cumulative downloads by October 2025.[47]

### How many times has Qwen been downloaded?

| Metric | Value | Date |
| --- | --- | --- |
| Cumulative downloads | ~700 million | January 2026 |
| Derivative versions on Hugging Face | 180,000+ | January 2026[47] |
| Including all tagged models | 200,000+ | Early 2026 |
| Models open-sourced in the Qwen lineup | Nearly 400 | January 2026[47] |
| Most-downloaded LLM family on Hugging Face | Yes (surpassed LLaMA) | 2025 |
| Top 10 Open LLM Leaderboard models built on Qwen | 10 out of 10 | February 2025 |

In December 2025, Qwen's single-month downloads exceeded the combined total of the next eight most popular model families (Meta, DeepSeek, OpenAI, Mistral, [Nvidia](/wiki/nvidia), Zhipu.AI, Moonshot, and [MiniMax](/wiki/minimax)).[4][47] Alibaba as an organization now has more derivative models on Hugging Face than both Google and Meta combined. Since January 2025, Chinese fine-tuned or derivative models accounted for 63% of all new fine-tuned or derivative models released on Hugging Face, with Qwen serving as the primary base.[4]

### Ecosystem Influence

The development trajectory reflects Alibaba's ambition to position Qwen as a foundational "operating system" for AI, analogous to Android in mobile computing.[38] Fine-tuned versions created by the community, such as "Liberated Qwen" by Abacus AI, have removed content restrictions for specialized use cases.[1] The breadth of community-built models spans applications in healthcare, legal, finance, education, customer service, and creative industries.

## Applications

Qwen powers diverse applications across industries:

- **Enterprise AI Solutions**: Document analysis, customer service automation, and business intelligence integrated across Alibaba's ecosystem of products.

- **Software Development**: Code generation, debugging, code review, and agentic coding workflows through Qwen2.5-Coder and Qwen3-Coder.

- **Education**: Personalized tutoring, especially in mathematics (via Qwen2.5-Math) and programming.

- **Healthcare**: Medical document analysis, clinical note processing, and research assistance.

- **E-commerce**: Product descriptions, customer support, and recommendation systems within Alibaba's retail platforms.

- **Creative Content**: Story writing, article generation, image creation (Qwen-Image), and image editing.

- **Translation**: Professional-grade translation across 92 languages through Qwen-MT.

- **Research**: Academic paper analysis, scientific computing, and data analysis.

## How many languages does Qwen support?

Qwen models provide extensive multilingual support. Qwen2 supported 29 languages, Qwen3 supports 119 languages and dialects, and Qwen3.5 expands coverage to 201 languages and dialects, while the Qwen-MT translation model spans 92 languages representing roughly 95% of the global population.[25][31][43] Core language support includes:

- Chinese (Simplified and Traditional)
- English
- French
- Spanish
- Portuguese
- German
- Italian
- Russian
- Japanese
- Korean
- Vietnamese
- Thai
- Arabic
- Turkish
- Indonesian
- Dutch
- Polish
- Swedish
- Hindi
- Hebrew
- Finnish
- Danish
- Norwegian
- Czech
- Hungarian
- Romanian
- Greek
- Bulgarian
- Ukrainian

## Limitations and Considerations

While Qwen models demonstrate strong capabilities, they have known limitations:[39]

- **Language mixing**: Models may unexpectedly switch between languages during generation, particularly in multilingual prompts.

- **Circular reasoning**: Can get stuck in repetitive reasoning loops, particularly in complex multi-step problems when using thinking mode.

- **Safety concerns**: Despite Qwen3Guard, production deployments require additional safety layers and content filtering.

- **Performance gaps**: While strong in math and coding, improvements are still needed in common sense reasoning and nuanced cultural understanding.

- **Context limitations**: Although supporting 128K to 1M token contexts depending on the variant, performance may degrade with extremely long inputs, especially for tasks requiring precise recall from the middle of long documents.

- **Computational requirements**: Larger models (72B dense, 235B and 480B MoE, 397B MoE) require significant GPU resources. Even MoE models, while efficient at inference, still demand multi-GPU setups for self-hosted deployment.

- **API-only models**: Some of the most capable models (Qwen2.5-Max, Qwen3-Max, Qwen3.5-Plus, Qwen3.5-Omni, Qwen3.6-Max) are available only through Alibaba Cloud's API, limiting self-hosted deployment options for the highest-performing variants.

## See Also

- [Qwen3.6](/wiki/qwen3_6)
- [Qwen3.7-Max](/wiki/qwen3_7_max)
- [Qwen3](/wiki/qwen_3)
- [Qwen3-Max](/wiki/qwen3_max)
- [Qwen3-Coder](/wiki/qwen_3_coder)
- [QwQ](/wiki/qwq)
- [Tongyi Qianwen](/wiki/tongyi_qianwen)
- [Large language model](/wiki/large_language_model)
- [Mixture of Experts](/wiki/mixture_of_experts)
- [Alibaba Cloud](/wiki/alibaba_cloud)
- [DeepSeek](/wiki/deepseek)
- [DeepSeek-R1](/wiki/deepseek_r1)
- [LLaMA](/wiki/llama)
- [GPT-4o](/wiki/gpt_4o)
- [Reinforcement learning](/wiki/reinforcement_learning)
- [Hugging Face](/wiki/hugging_face)
- [Ollama](/wiki/ollama)
- [Transformer](/wiki/transformer)
- [ModelScope](/wiki/modelscope)

## References

[1] "Qwen." Wikipedia. https://en.wikipedia.org/wiki/Qwen

[2] "Qwen (Qwen)." Hugging Face. https://huggingface.co/Qwen

[3] "Alibaba Cloud Unveils New Research Model for Enhanced Visual Reasoning." Alibaba Cloud Community, 2024.

[4] "State of Open Source on Hugging Face: Spring 2026." Hugging Face Blog, 2026.

[5] "All top 10 open-source LLMs on Hugging Face's Open LLM Leaderboard." Hugging Face, February 2025.

[6] "Alibaba Cloud Summit 2023: Tongyi Qianwen Announcement." Alibaba Group, April 2023.

[7] "Alibaba integrates Tongyi Qianwen into DingTalk and Tmall Genie." Alizila, 2023.

[8] "Qwen approved by Chinese government for public release." September 2023.

[9] "Qwen-7B open-source release." ModelScope/Hugging Face, August 2023.

[10] "Qwen-1.8B release for low-latency environments." Alibaba Cloud, November 2023.

[11] "Qwen-72B: competitive with [GPT-3](/wiki/gpt-3).5." Alibaba Cloud, December 2023.

[12] "Qwen1.5 release: 0.5B to 110B with 32K context." Alibaba Cloud, February 2024.

[13] "CodeQwen1.5-7B." Hugging Face. https://huggingface.co/Qwen/CodeQwen1.5-7B

[14] "Hello Qwen2." Qwen Blog. https://qwenlm.github.io/blog/qwen2/

[15] "Qwen2 Technical Report." arXiv:2407.10671, July 2024. https://arxiv.org/abs/2407.10671

[16] "Alibaba Cloud's Qwen2 with Enhanced Capabilities Tops LLM Leaderboard." Alizila, June 2024.

[17] "Qwen2.5: A Party of Foundation Models!" Qwen Blog. https://qwenlm.github.io/blog/qwen2.5/

[18] "Qwen2.5-LLM: Extending the Boundary of LLMs." Alibaba Cloud Community, 2024.

[19] "Qwen2.5-Coder Series: Powerful, Diverse, Practical." Alibaba Cloud Community, 2024.

[20] "Qwen2.5-Math: The world's leading open-sourced mathematical LLMs." Qwen Blog. https://qwenlm.github.io/blog/qwen2.5-math/

[21] "Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model." Qwen Blog. https://qwenlm.github.io/blog/qwen2.5-max/

[22] "QwQ: Reflect Deeply on the Boundaries of the Unknown." Qwen Blog. https://qwenlm.github.io/blog/qwq-32b-preview/

[23] "Alibaba Cloud Unveils QwQ-32B: A Compact Reasoning Model with Cutting-Edge Performance." Alibaba Cloud Community, March 2025.

[24] "Alibaba shares jump on new open-source QwQ-32B reasoning model." SiliconANGLE, March 2025.

[25] "Qwen3: Think Deeper, Act Faster." Qwen Blog. https://qwenlm.github.io/blog/qwen3/

[26] "Qwen-Audio: A Versatile Audio Understanding Model." GitHub. https://github.com/QwenLM/Qwen-Audio

[27] "Qwen2-Audio Technical Report." arXiv:2407.10759, 2024.

[28] "Qwen2-VL: To See the World More Clearly." Qwen Blog. https://qwenlm.github.io/blog/qwen2-vl/

[29] "Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL!" Qwen Blog. https://qwenlm.github.io/blog/qwen2.5-vl/

[30] "Qwen3-Coder: Agentic Coding in the World." Qwen Blog. https://qwenlm.github.io/blog/qwen3-coder/ and GitHub. https://github.com/QwenLM/Qwen3-Coder

[31] "Qwen-MT: Where Speed Meets Smart Translation." Qwen Blog, July 24, 2025.

[32] "Qwen-Image: Crafting with Native Text Rendering." Qwen Blog, August 4, 2025. "Qwen-Image-Edit," August 19, 2025.

[33] "Qwen3Guard: Real-time Safety for Your Token Stream." Qwen Blog, September 23, 2025.

[34] "Qwen2.5-Omni Technical Report." arXiv:2503.20215. https://arxiv.org/abs/2503.20215

[35] "Qwen3-VL." GitHub. https://github.com/QwenLM/Qwen3-VL

[36] "Qwen GitHub organization." https://github.com/QwenLM

[37] "Alibaba Cloud Model Studio documentation." https://www.alibabacloud.com/help/en/model-studio/

[38] "Alibaba's ambition to position Qwen as a foundational OS for AI." Alizila, 2024.

[39] "Qwen3 Technical Report." arXiv:2505.09388, May 14, 2025. https://arxiv.org/abs/2505.09388

[40] "Qwen3-Next: A New Generation of Ultra-Efficient Model Architecture." Alibaba Cloud Community, September 11, 2025. https://www.alibabacloud.com/blog/602536

[41] "Alibaba Releases Qwen3-Max-Preview." Alibaba Cloud / Qwen Team, September 5, 2025; full Qwen3-Max release September 23-24, 2025.

[42] "Qwen3-Omni Technical Report." arXiv:2509.17765, September 23, 2025. https://arxiv.org/abs/2509.17765

[43] "Qwen3.5: Towards Native Multimodal Agents." Qwen Team, February 16, 2026. https://qwen.ai/blog?id=qwen3.5 and https://github.com/QwenLM/Qwen3.5

[44] "Alibaba Qwen Team Releases Qwen3.5-Omni." March 30, 2026.

[45] "Qwen3.6: Stability and Real-World Utility." GitHub. https://github.com/QwenLM/Qwen3.6

[46] "Alibaba releases Qwen3.6-Max preview with stronger instruction-following capabilities." April 20, 2026.

[47] "Alibaba's Qwen leads global open-source AI community with 700 million downloads." Xinhua, January 13, 2026. https://english.news.cn/20260113/004b0522f987475cbf83ffc3a8d009aa/c.html

[48] "Alibaba's Qwen3-Max Joins the Frontier of Trillion-Parameter AI Models." AIwire / HPCwire, September 24, 2025; SWE-bench Verified 69.6 and Tau2-Bench 74.8 reported by Alibaba Cloud / Qwen Team.

[49] "Qwen3.5: Nobody Agrees on Attention Anymore." Maxime Labonne, Hugging Face Blog, February 2026. https://huggingface.co/blog/mlabonne/qwen35

[50] "Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving." Qwen Team, April 20, 2026. https://qwen.ai/blog?id=qwen3.6-max-preview

## External Links

- [Official Qwen website](https://qwenlm.github.io/)
- [Qwen Chat interface](https://chat.qwen.ai/)
- [GitHub organization](https://github.com/QwenLM)
- [Hugging Face models](https://huggingface.co/Qwen)
- [Alibaba Cloud Qwen page](https://www.alibabacloud.com/en/solutions/generative-ai/qwen)
- [ModelScope models](https://modelscope.cn/organization/qwen)