Qwen
Last reviewed
Jun 1, 2026
Sources
46 citations
Review status
Source-backed
Revision
v6 · 6,769 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 1, 2026
Sources
46 citations
Review status
Source-backed
Revision
v6 · 6,769 words
Add missing citations, update stale details, or suggest a clearer explanation.
Qwen (also called Tongyi Qianwen, Chinese: 通义千问; pinyin: Tongyì Qianwèn; literally "to comprehend the meaning, [and to answer] a thousand kinds of questions") is a family of large language models (LLMs) and multimodal models developed by Alibaba Cloud, the cloud computing division of Chinese technology company Alibaba Group.[1] The name "Qwen" is derived from the Chinese brand Tongyi Qianwen and refers to the large language model family built by Alibaba Cloud's Qwen Team.[2] As of early 2026, Qwen models have become one of the most widely adopted open-source model families globally, surpassing Meta's LLaMA as the most-downloaded LLM family on Hugging Face in September 2025, with cumulative downloads reaching approximately 700 million and more than 100,000 derivative models built on top of the Qwen family.[3][4] All top 10 open-source LLMs on Hugging Face's Open LLM Leaderboard were trained and developed on updated open-source versions of Qwen as of February 2025.[5]
Alibaba first launched a beta version of Qwen on April 11, 2023, during the Alibaba Cloud Summit under the name Tongyi Qianwen.[6] The initial architecture was based on the LLaMA framework developed by Meta AI.[1] Initially, it was integrated into various Alibaba business applications, including the workplace collaboration tool DingTalk and the voice assistant Tmall Genie.[7] The model received approval from the Chinese government and was publicly released in September 2023.[8]
In a significant move to foster a broader AI ecosystem, Alibaba Cloud began open-sourcing its models in August 2023. The first models released were Qwen-7B and its chat-fine-tuned variant, Qwen-7B-Chat.[9] This was followed by the release of Qwen-1.8B in November 2023, aimed at low-latency and resource-constrained environments.[10] In December 2023, Alibaba released the 72B parameter model, which demonstrated performance comparable to leading proprietary models like GPT-3.5 on several benchmarks.[11]
Released on February 5, 2024, Qwen1.5 expanded the model lineup to include sizes ranging from 0.5B to 110B parameters, all supporting a 32K context window.[12] This generation introduced Group Query Attention (GQA) across all model sizes, improving inference speed and reducing memory usage. The release also included CodeQwen1.5-7B, a code-specialized variant trained on 3 trillion tokens of code data, supporting a 64K context window for long code comprehension and generation.[13]
On June 6, 2024, Alibaba Cloud released the Qwen2 series, representing a substantial leap in model quality and multilingual support.[14] The Qwen2 family consists of five model sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B (a Mixture of Experts model), and Qwen2-72B. The MoE variant, Qwen2-57B-A14B, activates only 14 billion parameters per forward pass while maintaining the performance level of a 30-billion-parameter dense model, offering significant efficiency gains.[15]
Qwen2 was trained on 7 trillion tokens and introduced support for 27 additional languages beyond English and Chinese, including German, Italian, Arabic, Persian, and Hebrew.[14] The Qwen2-72B model topped the Hugging Face Open LLM Leaderboard for open-source models upon release.[16] Qwen2-7B-Instruct and Qwen2-72B-Instruct both support extended context lengths of up to 128K tokens. The Qwen2 technical report, published on July 15, 2024, detailed architectural improvements including reduced Key-Value (KV) cache sizes compared to Qwen1.5, translating to a smaller memory footprint during long-context inference.[15]
On September 19, 2024, Alibaba released the Qwen2.5 family, a major update encompassing over 100 open-source models across the language, coding, and mathematics domains.[17] The base Qwen2.5 language models are available in seven sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, all supporting 128K token context windows and capable of generating up to 8K tokens in a single response.
Qwen2.5 was trained on 18 trillion tokens, a significant increase from Qwen2's 7 trillion tokens. This expanded training corpus resulted in substantial improvements across benchmarks. Qwen2.5-72B-Instruct achieved 86.1 on MMLU (up from Qwen2-72B's 84.2), 83.1 on MATH (up from 69.0), and 55.5 on LiveCodeBench (up from 32.2), even surpassing the much larger Llama-3.1-405B-Instruct on several critical benchmarks.[17][18]
Alongside the base models, Alibaba released two specialized model families:
Qwen2.5-Coder: Available in six sizes (0.5B, 1.5B, 3B, 7B, 14B, and 32B), these models were trained on 5.5 trillion tokens of code-related data. Qwen2.5-Coder-32B-Instruct became the state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o on benchmarks including HumanEval (88.2) and MBPP.[19]
Qwen2.5-Math: Available in 1.5B, 7B, and 72B sizes, these models specialize in mathematical reasoning using Chain-of-Thought (CoT), Program-of-Thought (PoT), and Tool-Integrated Reasoning (TIR) approaches. Qwen2.5-Math-72B-Instruct surpassed both Qwen2-Math-72B-Instruct and GPT-4o on mathematical benchmarks, achieving 83.1 on MATH and 72.0 on MathVista.[20]
On January 28, 2025, Alibaba released Qwen2.5-Max, a large-scale Mixture of Experts model pre-trained on over 20 trillion tokens and further post-trained using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).[21] Qwen2.5-Max outperformed DeepSeek V3 on multiple benchmarks, including Arena-Hard (89.4 vs. 85.5), LiveBench, LiveCodeBench, and GPQA-Diamond, while also demonstrating competitive results against Claude 3.5 Sonnet and GPT-4o. On the GSM8K mathematics benchmark, Qwen2.5-Max achieved 94.5, well ahead of DeepSeek V3 (89.3). Unlike the open-weight Qwen2.5 models, Qwen2.5-Max is available only through Alibaba Cloud's API service.
Alibaba entered the reasoning model space, competing with OpenAI's o1 series and DeepSeek-R1, through the QwQ ("Qwen with Questions") line of models:
QwQ-32B-Preview (November 28, 2024): The first reasoning-focused model in the Qwen family, released as an open-source preview under the Apache 2.0 license. It demonstrated multi-step reasoning capabilities for math, coding, and scientific tasks.[22]
QwQ-32B (March 5, 2025): A refined 32.5-billion-parameter reasoning model with a 131,072-token context window, developed using reinforcement learning with two training phases focused on math/coding skills and general reasoning. QwQ-32B achieved 90.6% on MATH-500, 50.0% on AIME 2024, and 65.2% on GPQA, outperforming OpenAI's o1-preview in mathematical and scientific reasoning benchmarks while requiring significantly less computational power than comparable models like DeepSeek-R1 (671B parameters).[23] The release caused Alibaba's stock to jump more than 8%.[24]
Released on April 28-29, 2025, Qwen3 represents the third major generation of the Qwen model family.[25] The release includes both dense models (0.6B, 1.7B, 4B, 8B, 14B, and 32B) and Mixture of Experts models (Qwen3-30B-A3B with 30B total/3B active parameters, and Qwen3-235B-A22B with 235B total/22B active parameters). All Qwen3 models were released under the Apache 2.0 license. The Qwen3 technical report was published on arXiv on May 14, 2025 (arXiv:2505.09388).[39]
Qwen3 was trained on 36 trillion tokens across 119 languages and dialects, doubling the training data from Qwen2.5.[25] The flagship Qwen3-235B-A22B achieved 95.6 on Arena-Hard, 85.7 on AIME 2024, and 70.7 on LiveCodeBench, placing it among the top-performing models globally.
A defining feature of Qwen3 is its hybrid thinking/non-thinking mode capability, allowing users to control the depth of reasoning per query. In thinking mode, the model performs step-by-step chain-of-thought reasoning before delivering a final answer, suitable for complex problems. In non-thinking mode, it provides quick, concise responses for simpler questions. Users can also set a "thinking budget" to balance response quality against latency.
Released on July 22, 2025, Qwen3-Coder is the code-specialized variant of the Qwen3 line. The flagship Qwen3-Coder-480B-A35B-Instruct is a 480-billion-parameter Mixture-of-Experts model with 35 billion active parameters, employing 160 specialized expert networks of which 8 are activated per query.[30] It natively supports a 256K-token context window and can be extrapolated to 1 million tokens, targeting agentic coding, browser-use, and tool-use tasks. The model is released under the Apache 2.0 license and achieves performance comparable to Claude Sonnet 4 on agentic coding benchmarks.[30]
Announced on September 11, 2025, Qwen3-Next-80B-A3B introduced a new ultra-efficient architecture featuring a hybrid of Gated DeltaNet linear attention and standard gated full attention, paired with a highly sparse Mixture-of-Experts (only 3 billion of 80 billion parameters active per token).[40] Released in both Instruct (non-thinking) and Thinking variants, Qwen3-Next provides over 10x throughput improvement over Qwen3-32B at long-context inference, supports 256K context natively, and is open-sourced under Apache 2.0 on Hugging Face, ModelScope, and Kaggle.[40]
Qwen3-Max is Alibaba's flagship proprietary model in the Qwen3 generation. Qwen3-Max-Preview debuted on September 5, 2025, on Alibaba Cloud and OpenRouter, and the full Qwen3-Max model was released on September 23-24, 2025.[41] It is a trillion-parameter-class Mixture-of-Experts model trained on approximately 36 trillion tokens. Unlike the open-weight Qwen3 models, Qwen3-Max is closed-source and accessible only through Alibaba Cloud's API and Qwen Chat. It ranked among the top models on the LMArena text leaderboard at release.[41]
The vision-language extension of Qwen3, Qwen3-VL, launched on September 23, 2025, with the flagship Qwen3-VL-235B-A22B-Instruct and Qwen3-VL-235B-A22B-Thinking variants.[35] Smaller variants followed in October 2025, including 30B-A3B (Instruct/Thinking) on October 4, 4B and 8B sizes on October 15, and 2B and 32B sizes on October 21. The Qwen3-VL technical report was released on November 27, 2025.[35]
Released on September 22-23, 2025, Qwen3-Omni is a natively end-to-end omni-modal large language model that can understand text, audio, images, and video while generating real-time speech.[42] It uses a Thinker-Talker Mixture-of-Experts architecture in which the Thinker handles reasoning and multimodal understanding while the Talker produces audio tokens from the Thinker's hidden representations. Three open-source variants were released under the Apache 2.0 license: Qwen3-Omni-30B-A3B-Instruct, Qwen3-Omni-30B-A3B-Thinking, and Qwen3-Omni-30B-A3B-Captioner. The model achieves streaming latency as low as 234 ms for audio and 547 ms for video, and reaches state-of-the-art performance across 32 of 36 audio and audio-visual benchmarks.[42] The Qwen3-Omni technical report was published on arXiv on September 23, 2025 (arXiv:2509.17765).
Announced on February 16, 2026, Qwen3.5 represents a mid-cycle architectural refresh of the Qwen3 line.[43] The initial release was Qwen3.5-397B-A17B, a Mixture-of-Experts model with 397 billion total parameters and 17 billion active parameters per token, accompanied by a hosted Qwen3.5-Plus API offering a 1-million-token context window. Subsequent open-source releases followed on February 24, 2026 (Qwen3.5-122B-A10B, Qwen3.5-35B-A3B, and Qwen3.5-27B dense) and March 2, 2026 (Qwen3.5-9B, 4B, 2B, and 0.8B).
Qwen3.5 introduces several key architectural changes:[43]
Released around March 30, 2026, Qwen3.5-Omni is the multimodal extension of Qwen3.5, supporting text, audio, image, and video understanding alongside real-time speech generation.[44] The model targets latency-sensitive multimodal interaction with a 256K context window. Unlike Qwen3-Omni, the initial Qwen3.5-Omni release was kept proprietary, with access restricted to the Qwen Chat interface and Alibaba Cloud's API.
In April 2026, the Qwen3.6 series began rolling out, emphasizing stability, agentic coding, and a new "thinking preservation" feature that maintains reasoning context across multi-turn conversations.[45] The initial open-source releases are Qwen3.6-35B-A3B (an MoE model with 3B active parameters, April 16, 2026) and Qwen3.6-27B (a dense model, April 22, 2026). The flagship proprietary Qwen3.6-Max-Preview debuted on April 20, 2026, demonstrating substantial improvements in instruction-following, programming, and knowledge benchmarks over Qwen3-Max.[46]
| Date (UTC) | Generation / Model | Key Details |
|---|---|---|
| 2023-04-11 | Tongyi Qianwen (beta) | Initial corporate announcement by Alibaba Cloud for a company-scale LLM initiative.[6] |
| 2023-08-03 | Qwen-7B / Qwen-7B-Chat | First broadly distributed open weights via ModelScope and Hugging Face. Quantized INT4 chat variant followed on 2023-08-21.[9] |
| 2023-09-13 | Qwen (public release) | Model approved by Chinese government for public release.[8] |
| 2023-11-30 | Qwen-72B | 72-billion parameter model released, competitive with GPT-3.5.[11] |
| 2023-11-30 | Qwen-Audio | Multimodal audio-language model supporting 30+ tasks and multiple audio types.[26] |
| 2024-02-05 | Qwen1.5 | Models ranging from 0.5B to 110B parameters with 32K context window.[12] |
| 2024-04 | CodeQwen1.5-7B | Code-specialized model trained on 3 trillion tokens with 64K context window.[13] |
| 2024-06-06 | Qwen2 | Dense and MoE models (0.5B to 72B) trained on 7 trillion tokens with 128K context support.[14] |
| 2024-07-15 | Qwen2 (tech report) | Technical report details five model sizes and efficiency optimizations.[15] |
| 2024-08-09 | Qwen2-Audio | Updated audio-language model supporting 8+ languages, outperforming Gemini-1.5-pro on AIR-Bench.[27] |
| 2024-08-30 | Qwen2-VL | Vision-language series with dynamic resolution and M-RoPE; 2B, 7B, and 72B sizes.[28] |
| 2024-09-19 | Qwen2.5 | 18 trillion token training, 128K context, seven model sizes (0.5B to 72B).[17] |
| 2024-09-19 | Qwen2.5-Coder | Code-specialized models (0.5B to 32B) trained on 5.5T tokens; 32B matches GPT-4o.[19] |
| 2024-09-18 | Qwen2.5-Math | Math-specialized instruction models (1.5B/7B/72B) surpassing GPT-4o on MathVista.[20] |
| 2024-11-28 | QwQ-32B-Preview | Open-source reasoning model preview, competing with OpenAI's o1.[22] |
| 2025-01-28 | Qwen2.5-Max | Large-scale MoE model outperforming DeepSeek V3 on Arena-Hard, LiveBench, and GPQA.[21] |
| 2025-01-29 | Qwen2.5-VL | Enhanced vision-language model with 1-hour video comprehension and temporal reasoning.[29] |
| 2025-03-05 | QwQ-32B | Refined reasoning model; 90.6% MATH-500, 65.2% GPQA; Apache 2.0 license.[23] |
| 2025-03-26 | Qwen2.5-Omni | End-to-end omni-modal model with Thinker-Talker architecture; arXiv:2503.20215.[34] |
| 2025-04-28 | Qwen3 | Thinking/non-thinking modes, 36 trillion tokens, 119 languages, Apache 2.0.[25] |
| 2025-05-14 | Qwen3 (tech report) | arXiv:2505.09388 details hybrid thinking framework across 0.6B–235B scales.[39] |
| 2025-07-22 | Qwen3-Coder | 480B-A35B MoE for agentic coding, 256K context (1M with extrapolation).[30] |
| 2025-07-24 | Qwen-MT | Translation model supporting 92 languages.[31] |
| 2025-08-04 | Qwen-Image | 20B MMDiT image generation model with complex text rendering.[32] |
| 2025-08-19 | Qwen-Image-Edit | Image editing extension with precise text and appearance control.[32] |
| 2025-09-05 | Qwen3-Max-Preview | Trillion-parameter MoE preview on Alibaba Cloud and OpenRouter.[41] |
| 2025-09-11 | Qwen3-Next-80B-A3B | Hybrid Gated DeltaNet + sparse MoE, 10x throughput vs. Qwen3-32B.[40] |
| 2025-09-22 | Qwen3-Omni | Open-source omni-modal Thinker-Talker MoE; arXiv:2509.17765.[42] |
| 2025-09-23 | Qwen3-VL | 235B-A22B Instruct/Thinking vision-language models; smaller sizes followed in October.[35] |
| 2025-09-23 | Qwen3Guard | Safety guardrail model for real-time moderation.[33] |
| 2025-09-23 | Qwen3-Max | Flagship trillion-parameter MoE; closed-source via Alibaba Cloud.[41] |
| 2026-02-16 | Qwen3.5 | Qwen3.5-397B-A17B with hybrid Gated DeltaNet attention, 201 languages.[43] |
| 2026-03-30 | Qwen3.5-Omni | Proprietary multimodal extension of Qwen3.5 with 256K context.[44] |
| 2026-04-16 | Qwen3.6-35B-A3B | MoE focused on agentic coding and thinking preservation.[45] |
| 2026-04-20 | Qwen3.6-Max-Preview | Flagship proprietary preview with improved programming/knowledge benchmarks.[46] |
| 2026-04-22 | Qwen3.6-27B | Dense model optimized for coding and stability.[45] |
Qwen models are based on the Transformer architecture, the standard for modern LLMs. Key architectural features include:
Attention Mechanism: Uses self-attention with Group Query Attention (GQA) introduced in Qwen1.5 and expanded across the Qwen2 and later series. GQA groups multiple query heads under shared key-value heads, improving inference speed and reducing memory usage compared to standard multi-head attention.[14] Beginning with Qwen3-Next (September 2025) and continuing in Qwen3.5 (February 2026), the architecture adopts a hybrid of Gated Delta Networks (linear attention) and standard gated full attention for efficient long-context scaling.[40][43]
Tokenizer: Custom tokenizer with over 150,000 token vocabulary size, efficiently representing text from multiple languages and reducing token count for non-English text.[12]
Position Embeddings: Evolution from Absolute Position Embeddings (ALiBi) in early models to Rotary Position Embeddings (RoPE) for better long-context performance. Multimodal variants use M-RoPE (Multimodal Rotary Position Embedding) to decompose positional information into 1D textual, 2D visual, and 3D video components. Qwen2.5-Omni introduces TMRoPE (Time-aligned Multimodal RoPE) for synchronizing video and audio timestamps.[28][34]
Architecture Types: Both dense and Mixture of Experts (MoE) variants exist. MoE models activate only a subset of parameters (called "experts") per token, allowing larger total parameter counts with lower computational cost. For example, Qwen3-235B-A22B has 235 billion total parameters but activates just 22 billion per token, and Qwen3-Coder-480B-A35B activates 35 billion of its 480 billion parameters via 8 of 160 experts.[25][30]
The evolution of training data across generations demonstrates aggressive scaling:
| Generation | Training Tokens | Languages |
|---|---|---|
| Qwen (2023) | Not disclosed | Chinese, English, multilingual |
| Qwen2 (June 2024) | 7 trillion | 29 languages |
| Qwen2.5 (September 2024) | 18 trillion | 29+ core languages |
| Qwen2.5-Max (January 2025) | 20+ trillion | 29+ core languages |
| Qwen3 (April 2025) | 36 trillion | 119 languages and dialects |
| Qwen3-Max (September 2025) | ~36 trillion | 119+ languages |
| Qwen3.5 (February 2026) | Undisclosed | 201 languages and dialects |
The pre-training data includes high-quality Chinese language data, multilingual text, code, mathematics, and multimodal data, with extensive filtering and deduplication pipelines to ensure data quality.
| Model | Parameters | Type | Context Length | Key Feature |
|---|---|---|---|---|
| Qwen2-0.5B | 0.5B | Dense | 32K | Ultra-lightweight for edge devices |
| Qwen2-1.5B | 1.5B | Dense | 32K | Low-resource deployment |
| Qwen2-7B | 7B | Dense | 128K | General-purpose, long context |
| Qwen2-57B-A14B | 57B (14B active) | MoE | 64K | Efficient MoE, performance of 30B dense |
| Qwen2-72B | 72B | Dense | 128K | Flagship, topped Open LLM Leaderboard |
| Model | Parameters | Type | Context Length | Key Feature |
|---|---|---|---|---|
| Qwen2.5-0.5B | 0.5B | Dense | 128K | Smallest, for mobile/edge |
| Qwen2.5-1.5B | 1.5B | Dense | 128K | Light deployment |
| Qwen2.5-3B | 3B | Dense | 128K | New size tier |
| Qwen2.5-7B | 7B | Dense | 128K | Balanced performance/cost |
| Qwen2.5-14B | 14B | Dense | 128K | Mid-range |
| Qwen2.5-32B | 32B | Dense | 128K | High performance |
| Qwen2.5-72B | 72B | Dense | 128K | Flagship open-weight model |
| Model | Parameters | Type | Context Length | Key Feature |
|---|---|---|---|---|
| Qwen3-0.6B | 0.6B | Dense | 32K | Ultra-lightweight |
| Qwen3-1.7B | 1.7B | Dense | 32K | Edge deployment |
| Qwen3-4B | 4B | Dense | 32K | Mobile-friendly |
| Qwen3-8B | 8B | Dense | 128K | General-purpose |
| Qwen3-14B | 14B | Dense | 128K | Mid-range performance |
| Qwen3-32B | 32B | Dense | 128K | High performance |
| Qwen3-30B-A3B | 30B (3B active) | MoE | 128K | Efficient MoE for constrained hardware |
| Qwen3-235B-A22B | 235B (22B active) | MoE | 128K | Flagship, competitive with top proprietary models |
| Qwen3-Next-80B-A3B | 80B (3B active) | Hybrid MoE | 256K | Hybrid Gated DeltaNet, 10x throughput[40] |
| Qwen3-Coder-480B-A35B | 480B (35B active) | MoE (160 experts/8 active) | 256K (1M extrap.) | Agentic coding flagship[30] |
| Qwen3-Max | ~1T (proprietary) | MoE | API only | Closed-source flagship[41] |
| Model | Parameters | Type | Context Length | Release |
|---|---|---|---|---|
| Qwen3.5-0.8B | 0.8B | Dense | Standard | March 2, 2026 |
| Qwen3.5-2B | 2B | Dense | Standard | March 2, 2026 |
| Qwen3.5-4B | 4B | Dense | Standard | March 2, 2026 |
| Qwen3.5-9B | 9B | Dense | Standard | March 2, 2026 |
| Qwen3.5-27B | 27B | Dense | Long | February 24, 2026 |
| Qwen3.5-35B-A3B | 35B (3B active) | Hybrid MoE | Long | February 24, 2026 |
| Qwen3.5-122B-A10B | 122B (10B active) | Hybrid MoE | Long | February 24, 2026 |
| Qwen3.5-397B-A17B | 397B (17B active) | Hybrid MoE | 1M (Plus tier) | February 16, 2026 (flagship)[43] |
Qwen3 introduces a hybrid approach to problem-solving with two distinct inference modes:[25]
Thinking Mode: The model takes time to reason step by step before delivering the final answer, similar to the approach used by OpenAI's o1 and DeepSeek-R1. This mode is suitable for complex problems in mathematics, coding, and scientific reasoning.
Non-Thinking Mode: Provides quick, near-instant responses for simpler questions where speed is prioritized over depth of reasoning.
Thinking Budget: Users can control how much "thinking" the model performs, allowing fine-grained trade-offs between response quality and latency depending on the task at hand.
This dual-mode capability is available across all Qwen3 models and can be toggled within a single conversation. Subsequent generations (Qwen3-Next, Qwen3-Omni, Qwen3.5) released distinct "Instruct" (non-thinking) and "Thinking" model variants rather than always toggling within one checkpoint. Qwen3.6 introduces a "thinking preservation" mechanism to retain reasoning context across multi-turn conversations.[45]
The Qwen-VL series represents Qwen's multimodal models that process both text and images. Each generation has expanded the capabilities significantly.
Released on August 30, 2024, Qwen2-VL introduced several architectural innovations for vision-language understanding:[28]
Released on January 29, 2025, this version brought significant enhancements:[29]
Launched on September 23, 2025, the Qwen3-VL series initially shipped as Qwen3-VL-235B-A22B-Instruct and Qwen3-VL-235B-A22B-Thinking, with smaller 30B-A3B variants on October 4, 4B/8B on October 15, and 2B/32B on October 21.[35] The technical report appeared on November 27, 2025. Key features include:
The Qwen-Audio line provides audio understanding capabilities integrated with language models.
The original Qwen-Audio model, released on November 30, 2023, was a fundamental multi-task audio-language model supporting over 30 tasks across multiple audio types (human speech, natural sounds, music, and song). It achieved state-of-the-art results on Aishell1, CochlScene, ClothoAQA, and VocalSound benchmarks.[26]
Released on August 9, 2024, Qwen2-Audio introduced two key improvements:[27]
Released on March 26-27, 2025, with technical report on arXiv (2503.20215), Qwen2.5-Omni introduced a unique Thinker-Talker architecture:[34]
Released on September 22-23, 2025, with technical report (arXiv:2509.17765), Qwen3-Omni extended the Thinker-Talker paradigm into a Mixture-of-Experts architecture.[42] Three Apache 2.0 variants were released:
Streaming latency reaches 234 ms for audio and 547 ms for video. Qwen3-Omni reached state-of-the-art performance on 32 of 36 audio and audio-visual benchmarks, including outperforming Gemini-2.5-Pro, Seed-ASR, and GPT-4o-Transcribe on key tasks.
Released around March 30, 2026, Qwen3.5-Omni is a proprietary multimodal model supporting text, audio, image, and video understanding with real-time speech generation, providing a 256K-token context window.[44] Access is limited to Qwen Chat and Alibaba Cloud's API at launch.
QVQ-72B-Preview is an experimental research model for enhanced visual reasoning that scored 70.3% on MMMU (Multimodal Massive Multi-task Understanding), with superior performance on MathVision and OlympiadBench for advanced multidisciplinary understanding.[3]
Qwen's coding model lineup has evolved through three generations:
| Model | Release | Parameters | Training Data | Context | Performance |
|---|---|---|---|---|---|
| CodeQwen1.5-7B | April 2024 | 7B | 3T tokens of code | 64K | Strong on text-to-SQL and bug fixing[13] |
| Qwen2.5-Coder | September 2024 | 0.5B to 32B | 5.5T tokens of code | 128K | 88.2 on HumanEval; 32B matches GPT-4o[19] |
| Qwen3-Coder | July 2025 | 480B-A35B (MoE) | Extended code corpus | 256K (1M extrap.) | Agentic coding flagship comparable to Claude Sonnet 4[30] |
Qwen2.5-Coder represented a major leap, with six model sizes covering everything from lightweight on-device code completion (0.5B) to full-featured code generation, reasoning, and fixing at the 32B scale. The Qwen2.5-Coder-32B-Instruct model became the state-of-the-art open-source code LLM upon release, matching GPT-4o's coding abilities. Qwen3-Coder-480B-A35B-Instruct subsequently set new state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use benchmarks.[30]
| Model | Release | Parameters | Key Capabilities |
|---|---|---|---|
| Qwen2-Math | June 2024 | 1.5B, 7B, 72B | Chain-of-Thought mathematical reasoning |
| Qwen2.5-Math | September 2024 | 1.5B, 7B, 72B | CoT, PoT, and TIR; 72.0 on MathVista; surpasses GPT-4o[20] |
Qwen2.5-Math supports both Chinese and English and uses multiple reasoning approaches: Chain-of-Thought (natural language step-by-step), Program-of-Thought (generating code to solve problems), and Tool-Integrated Reasoning (combining language reasoning with computational tools). Even the small 1.5B variant achieves competitive performance against much larger general-purpose models.
| Variant | Release Date | Focus | Key Features |
|---|---|---|---|
| Qwen-MT | July 2025 | Translation | 92 languages covering 95% of global population; reinforcement learning for accuracy[31] |
| Qwen-Image | August 2025 | Image Generation | 20B MMDiT model; complex text rendering, multi-line layouts[32] |
| Qwen-Image-Edit | August 2025 | Image Editing | Precise text editing, semantic and appearance control[32] |
| Qwen3Guard | September 2025 | Safety | Real-time moderation, risk classification; state-of-the-art on multilingual safety benchmarks[33] |
The following table summarizes benchmark improvements across Qwen generations for flagship models:
| Benchmark | Qwen2-72B | Qwen2.5-72B-Instruct | Qwen3-235B-A22B |
|---|---|---|---|
| MMLU | 84.2 | 86.1 | 88.5 (MMLU-Redux) |
| MATH | 69.0 | 83.1 | 85.7 (AIME 2024) |
| LiveCodeBench | 32.2 | 55.5 | 70.7 |
| Arena-Hard | N/A | N/A | 95.6 |
| Benchmark | Qwen2.5-Max | DeepSeek V3 | Claude 3.5 Sonnet | GPT-4o |
|---|---|---|---|---|
| Arena-Hard | 89.4 | 85.5 | 85.2 | Comparable |
| MMLU-Pro | 76.1 | 75.9 | 78.0 | Comparable |
| GSM8K | 94.5 | 89.3 | N/A | N/A |
| LiveCodeBench | Leading | Below Qwen2.5-Max | N/A | N/A |
| GPQA-Diamond | Leading | Below Qwen2.5-Max | N/A | N/A |
| Benchmark | QwQ-32B | DeepSeek-R1 | OpenAI o1-preview |
|---|---|---|---|
| MATH-500 | 90.6% | Comparable | Below QwQ-32B |
| AIME 2024 | 50.0% | Comparable | Below QwQ-32B |
| GPQA | 65.2% | Comparable | Below QwQ-32B |
| LiveCodeBench | 50.0% | Comparable | N/A |
| Parameters | 32.5B | 671B | Proprietary |
QwQ-32B's strong performance at just 32.5 billion parameters, compared to DeepSeek-R1's 671 billion, highlights the efficiency gains achieved through Alibaba's reinforcement learning training methodology.
| Benchmark | Qwen3-235B-A22B | Qwen3-30B-A3B | QwQ-32B |
|---|---|---|---|
| Arena-Hard | 95.6 | 91.0 | 89.5 |
| AIME 2024 | 85.7 | 80.4 | 50.0 |
| LiveCodeBench | 70.7 | N/A | 50.0 |
| CodeForces Elo | N/A | 1974 | N/A |
Qwen models support a comprehensive array of tasks across multiple domains:
Multilingual Processing: Core models handle 29 languages, with Qwen3 extending to 119 languages and dialects, Qwen3.5 expanding to 201 languages and dialects, and Qwen-MT covering 92 languages for translation (representing 95% of the global population).[25][31][43]
Long-Context Understanding: 128K tokens across Qwen2/Qwen2.5/Qwen3, 256K tokens natively in Qwen3-Next and Qwen3-Coder (extrapolable to 1M for Qwen3-Coder), and a 1-million-token context window in the hosted Qwen3.5-Plus tier.[30][40][43]
Coding and Mathematics: Specialized models achieve state-of-the-art results, with Qwen2.5-Coder scoring 88.2 on HumanEval, Qwen2.5-Math achieving 72.0 on MathVista, and Qwen3-Coder reaching parity with Claude Sonnet 4 on agentic coding benchmarks.
Multimodal Tasks: Image understanding, video comprehension (up to 1+ hours), audio processing, image generation and editing, and cross-modal reasoning with low-latency speech generation in the Qwen-Omni line.
Safety and Moderation: Qwen3Guard provides real-time detection with categorized risk levels for content filtering.
Agentic and Reasoning: Models like QwQ-32B, Qwen3, and Qwen3-Coder support advanced chain-of-thought reasoning, tool use, multi-step tasks, and agentic workflows for autonomous task completion.
Structured Data Analysis: Enhanced capabilities for processing tables, forms, and structured documents, with models able to generate structured JSON output from visual inputs.
Real-time Interaction: Support for low-latency voice and video chat through Qwen2.5-Omni and Qwen3-Omni, with streaming latency as low as 234 ms for audio in Qwen3-Omni.[42]
Alibaba Cloud's approach to open-sourcing the Qwen family has evolved significantly over time, becoming increasingly permissive.
| Period | License | Scope |
|---|---|---|
| 2023 (Qwen) | Tongyi Qianwen LICENSE | Restricted; commercial use over 100M MAU requires approval |
| 2024 (Qwen1.5, Qwen2) | Mixed | Most models Apache 2.0; some larger models under Tongyi Qianwen license |
| 2024 (Qwen2.5) | Mostly Apache 2.0 | Most models Apache 2.0; select variants under Qianwen license |
| 2025 (Qwen3) | Apache 2.0 | All open-weight Qwen3 models released under Apache 2.0; Qwen3-Max closed-source |
| 2025 (Qwen3-Omni, Qwen3-Coder, Qwen3-Next, Qwen3-VL) | Apache 2.0 | All open-weight variants released for research and commercial use |
| 2026 (Qwen3.5) | Apache 2.0 (open variants) | Open-weight variants 0.8B–397B Apache 2.0; Qwen3.5-Plus hosted; Qwen3.5-Omni closed-source |
| 2026 (Qwen3.6) | Apache 2.0 (open variants) | Qwen3.6-27B and 35B-A3B open under Apache 2.0; Qwen3.6-Max closed-source |
The shift to Apache 2.0 licensing across most of the Qwen3 and Qwen3.5 lineups removed barriers for commercial adoption, allowing any organization to use, modify, and distribute the models without restrictions. This open approach has been a major driver of Qwen's rapid community adoption and the proliferation of derivative models, although Alibaba retains closed weights for its flagship "Max" and "Omni" hosted tiers in some generations.
Qwen models are distributed through multiple platforms:
Alibaba Cloud provides commercial access to Qwen models through several channels:[37]
The API provides access to models not available as open weights, including Qwen2.5-Max, Qwen3-Max, Qwen3.5-Plus, Qwen3.5-Omni, and Qwen3.6-Max-Preview.
Qwen models support deployment through a variety of open-source inference frameworks:[36]
The Qwen model family has achieved remarkable adoption milestones since its initial open-source release in 2023:
| Metric | Value | Date |
|---|---|---|
| Cumulative downloads | ~700 million | January 2026 |
| Derivative models on Hugging Face | 100,000+ | Late 2025 |
| Including all tagged models | 200,000+ | Early 2026 |
| Most-downloaded LLM family on Hugging Face | Yes (surpassed LLaMA) | September 2025 |
| Top 10 Open LLM Leaderboard models built on Qwen | 10 out of 10 | February 2025 |
In December 2025, Qwen's single-month downloads exceeded the combined total of the next eight most popular model families (Meta, DeepSeek, OpenAI, Mistral, Nvidia, Zhipu.AI, Moonshot, and MiniMax).[4] Alibaba as an organization now has more derivative models on Hugging Face than both Google and Meta combined. Since January 2025, Chinese fine-tuned or derivative models accounted for 63% of all new fine-tuned or derivative models released on Hugging Face, with Qwen serving as the primary base.
The development trajectory reflects Alibaba's ambition to position Qwen as a foundational "operating system" for AI, analogous to Android in mobile computing.[38] Fine-tuned versions created by the community, such as "Liberated Qwen" by Abacus AI, have removed content restrictions for specialized use cases.[1] The breadth of community-built models spans applications in healthcare, legal, finance, education, customer service, and creative industries.
Qwen powers diverse applications across industries:
Enterprise AI Solutions: Document analysis, customer service automation, and business intelligence integrated across Alibaba's ecosystem of products.
Software Development: Code generation, debugging, code review, and agentic coding workflows through Qwen2.5-Coder and Qwen3-Coder.
Education: Personalized tutoring, especially in mathematics (via Qwen2.5-Math) and programming.
Healthcare: Medical document analysis, clinical note processing, and research assistance.
E-commerce: Product descriptions, customer support, and recommendation systems within Alibaba's retail platforms.
Creative Content: Story writing, article generation, image creation (Qwen-Image), and image editing.
Translation: Professional-grade translation across 92 languages through Qwen-MT.
Research: Academic paper analysis, scientific computing, and data analysis.
Qwen models provide extensive multilingual support, with Qwen3 supporting 119 languages and dialects and Qwen3.5 expanding to 201 languages and dialects.[25][43] Core language support includes:
While Qwen models demonstrate strong capabilities, they have known limitations:[39]
Language mixing: Models may unexpectedly switch between languages during generation, particularly in multilingual prompts.
Circular reasoning: Can get stuck in repetitive reasoning loops, particularly in complex multi-step problems when using thinking mode.
Safety concerns: Despite Qwen3Guard, production deployments require additional safety layers and content filtering.
Performance gaps: While strong in math and coding, improvements are still needed in common sense reasoning and nuanced cultural understanding.
Context limitations: Although supporting 128K to 1M token contexts depending on the variant, performance may degrade with extremely long inputs, especially for tasks requiring precise recall from the middle of long documents.
Computational requirements: Larger models (72B dense, 235B and 480B MoE, 397B MoE) require significant GPU resources. Even MoE models, while efficient at inference, still demand multi-GPU setups for self-hosted deployment.
API-only models: Some of the most capable models (Qwen2.5-Max, Qwen3-Max, Qwen3.5-Plus, Qwen3.5-Omni, Qwen3.6-Max) are available only through Alibaba Cloud's API, limiting self-hosted deployment options for the highest-performing variants.