Qwen

Chinese AI Large Language Models Natural Language Processing Open Source AI

37 min read

Updated Jun 20, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 20, 2026

Fact-checked

In review queue

Sources

50 citations

Revision

v9 · 7,444 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Qwen is a family of open-source and proprietary large language models (LLMs) and multimodal models developed by Alibaba Cloud, the cloud computing division of Chinese technology company Alibaba Group, and is the most-downloaded open-weight LLM family in the world.^[1]^[3] As of January 2026, Qwen had surpassed approximately 700 million cumulative downloads on Hugging Face, overtook Meta's LLaMA as the most-downloaded open-source model family in 2025, and had spawned more than 180,000 derivative versions across nearly 400 released models in the Qwen lineup.^[3]^[47] Qwen is also called Tongyi Qianwen (Chinese: 通义千问; pinyin: Tongyì Qianwèn; literally "to comprehend the meaning, [and to answer] a thousand kinds of questions"), the Chinese brand name under which Alibaba Cloud's Qwen Team ships the models.^[2] All top 10 open-source LLMs on Hugging Face's Open LLM Leaderboard were trained and developed on updated open-source versions of Qwen as of February 2025.^[5]

A Qwen researcher summarized the team's strategy to Xinhua in January 2026: "Our core goal remains to keep pushing the performance frontier of LLMs while staying committed to open-source openness so that AI can truly help more people around the world."^[47]

What is Qwen?

Qwen spans dense and Mixture of Experts (MoE) language models from roughly 0.5 billion to over 1 trillion parameters, plus specialized variants for coding, mathematics, vision-language understanding, audio, image generation, translation, and safety. Most open-weight Qwen models since the Qwen3 generation are released under the permissive Apache 2.0 license, while the flagship "Max" and "Omni" hosted tiers are kept proprietary in some generations and served only through Alibaba Cloud's API. The models are widely cited as a foundation for the global open-source AI ecosystem: since January 2025, Chinese fine-tuned or derivative models accounted for 63% of all new fine-tuned or derivative models released on Hugging Face, with Qwen serving as the primary base.^[4]

History

Early Development

Alibaba first launched a beta version of Qwen on April 11, 2023, during the Alibaba Cloud Summit under the name Tongyi Qianwen.^[6] The initial architecture was based on the LLaMA framework developed by Meta AI.^[1] Initially, it was integrated into various Alibaba business applications, including the workplace collaboration tool DingTalk and the voice assistant Tmall Genie.^[7] The model received approval from the Chinese government and was publicly released in September 2023.^[8]

Open Source Release

In a significant move to foster a broader AI ecosystem, Alibaba Cloud began open-sourcing its models in August 2023. The first models released were Qwen-7B and its chat-fine-tuned variant, Qwen-7B-Chat.^[9] This was followed by the release of Qwen-1.8B in November 2023, aimed at low-latency and resource-constrained environments.^[10] In December 2023, Alibaba released the 72B parameter model, which demonstrated performance comparable to leading proprietary models like GPT-3.5 on several benchmarks.^[11]

Qwen1.5

Released on February 5, 2024, Qwen1.5 expanded the model lineup to include sizes ranging from 0.5B to 110B parameters, all supporting a 32K context window.^[12] This generation introduced Group Query Attention (GQA) across all model sizes, improving inference speed and reducing memory usage. The release also included CodeQwen1.5-7B, a code-specialized variant trained on 3 trillion tokens of code data, supporting a 64K context window for long code comprehension and generation.^[13]

Qwen2

On June 6, 2024, Alibaba Cloud released the Qwen2 series, representing a substantial leap in model quality and multilingual support.^[14] The Qwen2 family consists of five model sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B (a Mixture of Experts model), and Qwen2-72B. The MoE variant, Qwen2-57B-A14B, activates only 14 billion parameters per forward pass while maintaining the performance level of a 30-billion-parameter dense model, offering significant efficiency gains.^[15]

Qwen2 was trained on 7 trillion tokens and introduced support for 27 additional languages beyond English and Chinese, including German, Italian, Arabic, Persian, and Hebrew.^[14] The Qwen2-72B model topped the Hugging Face Open LLM Leaderboard for open-source models upon release.^[16] Qwen2-7B-Instruct and Qwen2-72B-Instruct both support extended context lengths of up to 128K tokens. The Qwen2 technical report, published on July 15, 2024, detailed architectural improvements including reduced Key-Value (KV) cache sizes compared to Qwen1.5, translating to a smaller memory footprint during long-context inference.^[15]

Qwen2.5

On September 19, 2024, Alibaba released the Qwen2.5 family, a major update encompassing over 100 open-source models across the language, coding, and mathematics domains.^[17] The base Qwen2.5 language models are available in seven sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, all supporting 128K token context windows and capable of generating up to 8K tokens in a single response.

Qwen2.5 was trained on 18 trillion tokens, a significant increase from Qwen2's 7 trillion tokens. This expanded training corpus resulted in substantial improvements across benchmarks. Qwen2.5-72B-Instruct achieved 86.1 on MMLU (up from Qwen2-72B's 84.2), 83.1 on MATH (up from 69.0), and 55.5 on LiveCodeBench (up from 32.2), even surpassing the much larger Llama-3.1-405B-Instruct on several critical benchmarks.^[17]^[18]

Alongside the base models, Alibaba released two specialized model families:

Qwen2.5-Coder: Available in six sizes (0.5B, 1.5B, 3B, 7B, 14B, and 32B), these models were trained on 5.5 trillion tokens of code-related data. Qwen2.5-Coder-32B-Instruct became the state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o on benchmarks including HumanEval (88.2) and MBPP.^[19]
Qwen2.5-Math: Available in 1.5B, 7B, and 72B sizes, these models specialize in mathematical reasoning using Chain-of-Thought (CoT), Program-of-Thought (PoT), and Tool-Integrated Reasoning (TIR) approaches. Qwen2.5-Math-72B-Instruct surpassed both Qwen2-Math-72B-Instruct and GPT-4o on mathematical benchmarks, achieving 83.1 on MATH and 72.0 on MathVista.^[20]

Qwen2.5-Max

On January 28, 2025, Alibaba released Qwen2.5-Max, a large-scale Mixture of Experts model pre-trained on over 20 trillion tokens and further post-trained using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).^[21] Qwen2.5-Max outperformed DeepSeek V3 on multiple benchmarks, including Arena-Hard (89.4 vs. 85.5), LiveBench, LiveCodeBench, and GPQA-Diamond, while also demonstrating competitive results against Claude 3.5 Sonnet and GPT-4o. On the GSM8K mathematics benchmark, Qwen2.5-Max achieved 94.5, well ahead of DeepSeek V3 (89.3). Unlike the open-weight Qwen2.5 models, Qwen2.5-Max is available only through Alibaba Cloud's API service.

QwQ: Reasoning Models

Alibaba entered the reasoning model space, competing with OpenAI's o1 series and DeepSeek-R1, through the QwQ ("Qwen with Questions") line of models:

QwQ-32B-Preview (November 28, 2024): The first reasoning-focused model in the Qwen family, released as an open-source preview under the Apache 2.0 license. It demonstrated multi-step reasoning capabilities for math, coding, and scientific tasks.^[22]
QwQ-32B (March 5, 2025): A refined 32.5-billion-parameter reasoning model with a 131,072-token context window, developed using reinforcement learning with two training phases focused on math/coding skills and general reasoning. QwQ-32B achieved 90.6% on MATH-500, 50.0% on AIME 2024, and 65.2% on GPQA, outperforming OpenAI's o1-preview in mathematical and scientific reasoning benchmarks while requiring significantly less computational power than comparable models like DeepSeek-R1 (671B parameters).^[23] The release caused Alibaba's stock to jump more than 8%.^[24]

When was Qwen3 released?

Released on April 28-29, 2025, Qwen3 represents the third major generation of the Qwen model family.^[25] The release includes both dense models (0.6B, 1.7B, 4B, 8B, 14B, and 32B) and Mixture of Experts models (Qwen3-30B-A3B with 30B total/3B active parameters, and Qwen3-235B-A22B with 235B total/22B active parameters). All Qwen3 models were released under the Apache 2.0 license. The Qwen3 technical report was published on arXiv on May 14, 2025 (arXiv:2505.09388).^[39]

Qwen3 was trained on 36 trillion tokens across 119 languages and dialects, doubling the training data from Qwen2.5.^[25]^[39] The flagship Qwen3-235B-A22B achieved 95.6 on Arena-Hard, 85.7 on AIME 2024, and 70.7 on LiveCodeBench, placing it among the top-performing models globally.^[25] According to the Qwen3 technical report, the flagship "achieves competitive results in benchmark evaluations against other top-tier models" such as DeepSeek-R1, o1, o3-mini, and Gemini-2.5-Pro.^[25]^[39]

A defining feature of Qwen3 is its hybrid thinking/non-thinking mode capability, allowing users to control the depth of reasoning per query. In thinking mode, the model performs step-by-step chain-of-thought reasoning before delivering a final answer, suitable for complex problems. In non-thinking mode, it provides quick, concise responses for simpler questions. Users can also set a "thinking budget" to balance response quality against latency. The Qwen3 technical report describes this as "the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework," alongside a thinking budget mechanism "allowing users to allocate computational resources adaptively during inference, thereby balancing latency and performance based on task complexity."^[39]

Qwen3-Coder

Released on July 22, 2025, Qwen3-Coder is the code-specialized variant of the Qwen3 line. The flagship Qwen3-Coder-480B-A35B-Instruct is a 480-billion-parameter Mixture-of-Experts model with 35 billion active parameters, employing 160 specialized expert networks of which 8 are activated per query.^[30] It natively supports a 256K-token context window and can be extrapolated to 1 million tokens, targeting agentic coding, browser-use, and tool-use tasks. The model is released under the Apache 2.0 license and achieves performance comparable to Claude Sonnet 4 on agentic coding benchmarks.^[30]

Qwen3-Next

Announced on September 11, 2025, Qwen3-Next-80B-A3B introduced a new ultra-efficient architecture featuring a hybrid of Gated DeltaNet linear attention and standard gated full attention, paired with a highly sparse Mixture-of-Experts (only 3 billion of 80 billion parameters active per token).^[40] Released in both Instruct (non-thinking) and Thinking variants, Qwen3-Next provides over 10x throughput improvement over Qwen3-32B at long-context inference, supports 256K context natively, and is open-sourced under Apache 2.0 on Hugging Face, ModelScope, and Kaggle.^[40]

Qwen3-Max

Qwen3-Max is Alibaba's flagship proprietary model in the Qwen3 generation. Qwen3-Max-Preview debuted on September 5, 2025, on Alibaba Cloud and OpenRouter, and the full Qwen3-Max model was released on September 23-24, 2025.^[41] It is a trillion-parameter-class Mixture-of-Experts model with over 1 trillion parameters, trained on approximately 36 trillion tokens.^[41]^[48] On agentic and coding evaluations, Qwen3-Max-Instruct scored 69.6 on SWE-bench Verified and 74.8 on Tau2-Bench, with Alibaba reporting that the latter surpassed Claude Opus 4 and DeepSeek V3.1.^[48] Unlike the open-weight Qwen3 models, Qwen3-Max is closed-source and accessible only through Alibaba Cloud's API and Qwen Chat. It ranked among the top models on the LMArena text leaderboard at release.^[41]

Qwen3-VL

The vision-language extension of Qwen3, Qwen3-VL, launched on September 23, 2025, with the flagship Qwen3-VL-235B-A22B-Instruct and Qwen3-VL-235B-A22B-Thinking variants.^[35] Smaller variants followed in October 2025, including 30B-A3B (Instruct/Thinking) on October 4, 4B and 8B sizes on October 15, and 2B and 32B sizes on October 21. The Qwen3-VL technical report was released on November 27, 2025.^[35]

Qwen3-Omni

Released on September 22-23, 2025, Qwen3-Omni is a natively end-to-end omni-modal large language model that can understand text, audio, images, and video while generating real-time speech.^[42] It uses a Thinker-Talker Mixture-of-Experts architecture in which the Thinker handles reasoning and multimodal understanding while the Talker produces audio tokens from the Thinker's hidden representations. Three open-source variants were released under the Apache 2.0 license: Qwen3-Omni-30B-A3B-Instruct, Qwen3-Omni-30B-A3B-Thinking, and Qwen3-Omni-30B-A3B-Captioner. The model achieves streaming latency as low as 234 ms for audio and 547 ms for video, and reaches state-of-the-art performance across 32 of 36 audio and audio-visual benchmarks.^[42] The Qwen3-Omni technical report was published on arXiv on September 23, 2025 (arXiv:2509.17765).

Qwen3.5

Announced on February 16, 2026, Qwen3.5 represents a mid-cycle architectural refresh of the Qwen3 line.^[43] The initial release was Qwen3.5-397B-A17B, a Mixture-of-Experts model with 397 billion total parameters and 17 billion active parameters per token, accompanied by a hosted Qwen3.5-Plus API offering a 1-million-token context window. Alibaba reported that the hybrid linear-attention plus sparse MoE design delivers decoding throughput 8.6 to 19 times faster than Qwen3-Max while retaining native multimodal capability.^[43]^[49] Subsequent open-source releases followed on February 24, 2026 (Qwen3.5-122B-A10B, Qwen3.5-35B-A3B, and Qwen3.5-27B dense) and March 2, 2026 (Qwen3.5-9B, 4B, 2B, and 0.8B).

Qwen3.5 introduces several key architectural changes:^[43]

Hybrid attention: A 3:1 ratio combining Gated Delta Networks (linear attention) with standard full attention, with three of every four transformer blocks using linear attention for near-linear scaling on long contexts.
Unified vision-language foundation: Natively multimodal from the ground up, with early-fusion training on trillions of multimodal tokens. This removed the need for a separate Qwen3.5-VL line.
Scalable reinforcement learning: Training across million-agent environments with progressively complex task distributions.
Expanded language coverage: Support for 201 languages and dialects, up from Qwen3's 119.^[43]^[49]

Qwen3.5-Omni

Released around March 30, 2026, Qwen3.5-Omni is the multimodal extension of Qwen3.5, supporting text, audio, image, and video understanding alongside real-time speech generation.^[44] The model targets latency-sensitive multimodal interaction with a 256K context window. Unlike Qwen3-Omni, the initial Qwen3.5-Omni release was kept proprietary, with access restricted to the Qwen Chat interface and Alibaba Cloud's API.

Qwen3.6

In April 2026, the Qwen3.6 series began rolling out, emphasizing stability, agentic coding, and a new "thinking preservation" feature that maintains reasoning context across multi-turn conversations.^[45] The initial open-source releases are Qwen3.6-35B-A3B (an MoE model with 3B active parameters, April 16, 2026) and Qwen3.6-27B (a dense model, April 22, 2026). The flagship proprietary Qwen3.6-Max-Preview debuted on April 20, 2026, and at release claimed the top score on six major coding benchmarks (SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode), supporting a 256K-token context window and adding a preserve_thinking feature for multi-turn agentic workflows.^[46]^[50]

Development Timeline

Date (UTC)	Generation / Model	Key Details
2023-04-11	Tongyi Qianwen (beta)	Initial corporate announcement by Alibaba Cloud for a company-scale LLM initiative.^[6]
2023-08-03	Qwen-7B / Qwen-7B-Chat	First broadly distributed open weights via ModelScope and Hugging Face. Quantized INT4 chat variant followed on 2023-08-21.^[9]
2023-09-13	Qwen (public release)	Model approved by Chinese government for public release.^[8]
2023-11-30	Qwen-72B	72-billion parameter model released, competitive with GPT-3.5.^[11]
2023-11-30	Qwen-Audio	Multimodal audio-language model supporting 30+ tasks and multiple audio types.^[26]
2024-02-05	Qwen1.5	Models ranging from 0.5B to 110B parameters with 32K context window.^[12]
2024-04	CodeQwen1.5-7B	Code-specialized model trained on 3 trillion tokens with 64K context window.^[13]
2024-06-06	Qwen2	Dense and MoE models (0.5B to 72B) trained on 7 trillion tokens with 128K context support.^[14]
2024-07-15	Qwen2 (tech report)	Technical report details five model sizes and efficiency optimizations.^[15]
2024-08-09	Qwen2-Audio	Updated audio-language model supporting 8+ languages, outperforming Gemini-1.5-pro on AIR-Bench.^[27]
2024-08-30	Qwen2-VL	Vision-language series with dynamic resolution and M-RoPE; 2B, 7B, and 72B sizes.^[28]
2024-09-19	Qwen2.5	18 trillion token training, 128K context, seven model sizes (0.5B to 72B).^[17]
2024-09-19	Qwen2.5-Coder	Code-specialized models (0.5B to 32B) trained on 5.5T tokens; 32B matches GPT-4o.^[19]
2024-09-18	Qwen2.5-Math	Math-specialized instruction models (1.5B/7B/72B) surpassing GPT-4o on MathVista.^[20]
2024-11-28	QwQ-32B-Preview	Open-source reasoning model preview, competing with OpenAI's o1.^[22]
2025-01-28	Qwen2.5-Max	Large-scale MoE model outperforming DeepSeek V3 on Arena-Hard, LiveBench, and GPQA.^[21]
2025-01-29	Qwen2.5-VL	Enhanced vision-language model with 1-hour video comprehension and temporal reasoning.^[29]
2025-03-05	QwQ-32B	Refined reasoning model; 90.6% MATH-500, 65.2% GPQA; Apache 2.0 license.^[23]
2025-03-26	Qwen2.5-Omni	End-to-end omni-modal model with Thinker-Talker architecture; arXiv:2503.20215.^[34]
2025-04-28	Qwen3	Thinking/non-thinking modes, 36 trillion tokens, 119 languages, Apache 2.0.^[25]
2025-05-14	Qwen3 (tech report)	arXiv:2505.09388 details hybrid thinking framework across 0.6B-235B scales.^[39]
2025-07-22	Qwen3-Coder	480B-A35B MoE for agentic coding, 256K context (1M with extrapolation).^[30]
2025-07-24	Qwen-MT	Translation model supporting 92 languages.^[31]
2025-08-04	Qwen-Image	20B MMDiT image generation model with complex text rendering.^[32]
2025-08-19	Qwen-Image-Edit	Image editing extension with precise text and appearance control.^[32]
2025-09-05	Qwen3-Max-Preview	Trillion-parameter MoE preview on Alibaba Cloud and OpenRouter.^[41]
2025-09-11	Qwen3-Next-80B-A3B	Hybrid Gated DeltaNet + sparse MoE, 10x throughput vs. Qwen3-32B.^[40]
2025-09-22	Qwen3-Omni	Open-source omni-modal Thinker-Talker MoE; arXiv:2509.17765.^[42]
2025-09-23	Qwen3-VL	235B-A22B Instruct/Thinking vision-language models; smaller sizes followed in October.^[35]
2025-09-23	Qwen3Guard	Safety guardrail model for real-time moderation.^[33]
2025-09-23	Qwen3-Max	Flagship trillion-parameter MoE; closed-source via Alibaba Cloud.^[41]
2026-02-16	Qwen3.5	Qwen3.5-397B-A17B with hybrid Gated DeltaNet attention, 201 languages.^[43]
2026-03-30	Qwen3.5-Omni	Proprietary multimodal extension of Qwen3.5 with 256K context.^[44]
2026-04-16	Qwen3.6-35B-A3B	MoE focused on agentic coding and thinking preservation.^[45]
2026-04-20	Qwen3.6-Max-Preview	Flagship proprietary preview topping six coding benchmarks.^[46]^[50]
2026-04-22	Qwen3.6-27B	Dense model optimized for coding and stability.^[45]

Architecture and Technical Features

Core Architecture

Qwen models are based on the Transformer architecture, the standard for modern LLMs. Key architectural features include:

Attention Mechanism: Uses self-attention with Group Query Attention (GQA) introduced in Qwen1.5 and expanded across the Qwen2 and later series. GQA groups multiple query heads under shared key-value heads, improving inference speed and reducing memory usage compared to standard multi-head attention.^[14] Beginning with Qwen3-Next (September 2025) and continuing in Qwen3.5 (February 2026), the architecture adopts a hybrid of Gated Delta Networks (linear attention) and standard gated full attention for efficient long-context scaling.^[40]^[43]
Tokenizer: Custom tokenizer with over 150,000 token vocabulary size, efficiently representing text from multiple languages and reducing token count for non-English text.^[12]
Position Embeddings: Evolution from Absolute Position Embeddings (ALiBi) in early models to Rotary Position Embeddings (RoPE) for better long-context performance. Multimodal variants use M-RoPE (Multimodal Rotary Position Embedding) to decompose positional information into 1D textual, 2D visual, and 3D video components. Qwen2.5-Omni introduces TMRoPE (Time-aligned Multimodal RoPE) for synchronizing video and audio timestamps.^[28]^[34]
Architecture Types: Both dense and Mixture of Experts (MoE) variants exist. MoE models activate only a subset of parameters (called "experts") per token, allowing larger total parameter counts with lower computational cost. For example, Qwen3-235B-A22B has 235 billion total parameters but activates just 22 billion per token, and Qwen3-Coder-480B-A35B activates 35 billion of its 480 billion parameters via 8 of 160 experts.^[25]^[30]

Training Data Scale

The evolution of training data across generations demonstrates aggressive scaling:

Generation	Training Tokens	Languages
Qwen (2023)	Not disclosed	Chinese, English, multilingual
Qwen2 (June 2024)	7 trillion	29 languages
Qwen2.5 (September 2024)	18 trillion	29+ core languages
Qwen2.5-Max (January 2025)	20+ trillion	29+ core languages
Qwen3 (April 2025)	36 trillion	119 languages and dialects
Qwen3-Max (September 2025)	~36 trillion	119+ languages
Qwen3.5 (February 2026)	Undisclosed	201 languages and dialects

The pre-training data includes high-quality Chinese language data, multilingual text, code, mathematics, and multimodal data, with extensive filtering and deduplication pipelines to ensure data quality.

Model Sizes and Variants

Qwen2 Model Sizes

Model	Parameters	Type	Context Length	Key Feature
Qwen2-0.5B	0.5B	Dense	32K	Ultra-lightweight for edge devices
Qwen2-1.5B	1.5B	Dense	32K	Low-resource deployment
Qwen2-7B	7B	Dense	128K	General-purpose, long context
Qwen2-57B-A14B	57B (14B active)	MoE	64K	Efficient MoE, performance of 30B dense
Qwen2-72B	72B	Dense	128K	Flagship, topped Open LLM Leaderboard

Qwen2.5 Model Sizes

Model	Parameters	Type	Context Length	Key Feature
Qwen2.5-0.5B	0.5B	Dense	128K	Smallest, for mobile/edge
Qwen2.5-1.5B	1.5B	Dense	128K	Light deployment
Qwen2.5-3B	3B	Dense	128K	New size tier
Qwen2.5-7B	7B	Dense	128K	Balanced performance/cost
Qwen2.5-14B	14B	Dense	128K	Mid-range
Qwen2.5-32B	32B	Dense	128K	High performance
Qwen2.5-72B	72B	Dense	128K	Flagship open-weight model

Qwen3 Model Sizes

Model	Parameters	Type	Context Length	Key Feature
Qwen3-0.6B	0.6B	Dense	32K	Ultra-lightweight
Qwen3-1.7B	1.7B	Dense	32K	Edge deployment
Qwen3-4B	4B	Dense	32K	Mobile-friendly
Qwen3-8B	8B	Dense	128K	General-purpose
Qwen3-14B	14B	Dense	128K	Mid-range performance
Qwen3-32B	32B	Dense	128K	High performance
Qwen3-30B-A3B	30B (3B active)	MoE	128K	Efficient MoE for constrained hardware
Qwen3-235B-A22B	235B (22B active)	MoE	128K	Flagship, competitive with top proprietary models
Qwen3-Next-80B-A3B	80B (3B active)	Hybrid MoE	256K	Hybrid Gated DeltaNet, 10x throughput^[40]
Qwen3-Coder-480B-A35B	480B (35B active)	MoE (160 experts/8 active)	256K (1M extrap.)	Agentic coding flagship^[30]
Qwen3-Max	~1T (proprietary)	MoE	API only	Closed-source flagship^[41]

Qwen3.5 Model Sizes

Model	Parameters	Type	Context Length	Release
Qwen3.5-0.8B	0.8B	Dense	Standard	March 2, 2026
Qwen3.5-2B	2B	Dense	Standard	March 2, 2026
Qwen3.5-4B	4B	Dense	Standard	March 2, 2026
Qwen3.5-9B	9B	Dense	Standard	March 2, 2026
Qwen3.5-27B	27B	Dense	Long	February 24, 2026
Qwen3.5-35B-A3B	35B (3B active)	Hybrid MoE	Long	February 24, 2026
Qwen3.5-122B-A10B	122B (10B active)	Hybrid MoE	Long	February 24, 2026
Qwen3.5-397B-A17B	397B (17B active)	Hybrid MoE	1M (Plus tier)	February 16, 2026 (flagship)^[43]

What is the difference between Qwen's thinking and non-thinking modes?

Qwen3 introduces a hybrid approach to problem-solving with two distinct inference modes:^[25]

Thinking Mode: The model takes time to reason step by step before delivering the final answer, similar to the approach used by OpenAI's o1 and DeepSeek-R1. This mode is suitable for complex problems in mathematics, coding, and scientific reasoning.
Non-Thinking Mode: Provides quick, near-instant responses for simpler questions where speed is prioritized over depth of reasoning.
Thinking Budget: Users can control how much "thinking" the model performs, allowing fine-grained trade-offs between response quality and latency depending on the task at hand. The Qwen3 technical report frames the thinking budget as a mechanism "allowing users to allocate computational resources adaptively during inference."^[39]

This dual-mode capability is available across all Qwen3 models and can be toggled within a single conversation. Subsequent generations (Qwen3-Next, Qwen3-Omni, Qwen3.5) released distinct "Instruct" (non-thinking) and "Thinking" model variants rather than always toggling within one checkpoint. Qwen3.6 introduces a "thinking preservation" mechanism to retain reasoning context across multi-turn conversations.^[45]

Multimodal Capabilities

Qwen-VL Series (Vision-Language)

The Qwen-VL series represents Qwen's multimodal models that process both text and images. Each generation has expanded the capabilities significantly.

Qwen2-VL

Released on August 30, 2024, Qwen2-VL introduced several architectural innovations for vision-language understanding:^[28]

Naive Dynamic Resolution: Processes images of varying resolutions by mapping them into a dynamic number of visual tokens, maintaining consistency between model input and the inherent information in the image.
Multimodal Rotary Position Embedding (M-RoPE): Decomposes positional embedding to capture 1D textual, 2D visual, and 3D video positional information for more effective multimodal fusion.
Video Understanding: Supports videos over 20 minutes in length, enabling video-based question answering and content creation.
Multilingual OCR: Understands text in images across most European languages, Japanese, Korean, Arabic, and Vietnamese.
Available in 2B, 7B, and 72B parameter sizes, with the model achieving state-of-the-art results on MathVista, DocVQA, RealWorldQA, and MTVQA benchmarks.

Qwen2.5-VL

Released on January 29, 2025, this version brought significant enhancements:^[29]

Extended Video Comprehension: Can process videos over 1 hour in length, with the ability to pinpoint specific moments within videos down to the exact second.
Dynamic FPS Sampling: Extends dynamic resolution to the temporal dimension, enabling comprehension of videos at various sampling rates.
Enhanced OCR: Multi-scenario, multi-language, and multi-orientation text recognition capabilities.
Visual Localization: Bounding box generation for object detection and spatial understanding.
Structured Output: Generation of structured data from documents, forms, and tables.
Available in 3B, 7B, and 72B sizes.

Qwen3-VL

Launched on September 23, 2025, the Qwen3-VL series initially shipped as Qwen3-VL-235B-A22B-Instruct and Qwen3-VL-235B-A22B-Thinking, with smaller 30B-A3B variants on October 4, 4B/8B on October 15, and 2B/32B on October 21.^[35] The technical report appeared on November 27, 2025. Key features include:

Visual Agent capabilities for PC and mobile GUI operation
Advanced spatial perception and 3D grounding
DeepStack architecture for fine-grained visual detail capture
Text-Timestamp Alignment for precise video event localization
Both Instruct and Thinking inference variants

Qwen-Audio Series

The Qwen-Audio line provides audio understanding capabilities integrated with language models.

Qwen-Audio (November 2023)

The original Qwen-Audio model, released on November 30, 2023, was a fundamental multi-task audio-language model supporting over 30 tasks across multiple audio types (human speech, natural sounds, music, and song). It achieved state-of-the-art results on Aishell1, CochlScene, ClothoAQA, and VocalSound benchmarks.^[26]

Qwen2-Audio (August 2024)

Released on August 9, 2024, Qwen2-Audio introduced two key improvements:^[27]

Voice Chat Mode: For the first time, users can give voice instructions directly to the audio-language model without separate Automatic Speech Recognition (ASR) modules.
Audio Analysis Mode: Processes speech, sound, and music with text-based instructions.
Supports 8+ languages and dialects including Chinese, English, Cantonese, French, Italian, Spanish, German, and Japanese.
Qwen2-Audio outperformed previous state-of-the-art models, including Gemini-1.5-pro, on the AIR-Bench evaluation for audio-centric instruction-following capabilities.
Optimized for audio clips under 30 seconds.

Qwen2.5-Omni

Released on March 26-27, 2025, with technical report on arXiv (2503.20215), Qwen2.5-Omni introduced a unique Thinker-Talker architecture:^[34]

Simultaneous text and speech generation
Real-time voice and video chat support
TMRoPE (Time-aligned Multimodal RoPE) for synchronizing video and audio timestamps
Block-wise processing in audio and visual encoders for streaming
Sliding-window DiT for low-latency streaming audio token decoding
Processing of text, images, videos, and audio inputs
Bilingual support (English/Chinese) with low-latency interaction

Qwen3-Omni

Released on September 22-23, 2025, with technical report (arXiv:2509.17765), Qwen3-Omni extended the Thinker-Talker paradigm into a Mixture-of-Experts architecture.^[42] Three Apache 2.0 variants were released:

Qwen3-Omni-30B-A3B-Instruct: Standard non-thinking variant.
Qwen3-Omni-30B-A3B-Thinking: Reasoning-focused variant.
Qwen3-Omni-30B-A3B-Captioner: Specialized for visual captioning.

Streaming latency reaches 234 ms for audio and 547 ms for video. Qwen3-Omni reached state-of-the-art performance on 32 of 36 audio and audio-visual benchmarks, including outperforming Gemini-2.5-Pro, Seed-ASR, and GPT-4o-Transcribe on key tasks.^[42]

Qwen3.5-Omni

Released around March 30, 2026, Qwen3.5-Omni is a proprietary multimodal model supporting text, audio, image, and video understanding with real-time speech generation, providing a 256K-token context window.^[44] Access is limited to Qwen Chat and Alibaba Cloud's API at launch.

QVQ (Visual Reasoning Model)

QVQ-72B-Preview is an experimental research model for enhanced visual reasoning that scored 70.3% on MMMU (Multimodal Massive Multi-task Understanding), with superior performance on MathVision and OlympiadBench for advanced multidisciplinary understanding.^[3]

Specialized Models

Coding Models

Qwen's coding model lineup has evolved through three generations:

Model	Release	Parameters	Training Data	Context	Performance
CodeQwen1.5-7B	April 2024	7B	3T tokens of code	64K	Strong on text-to-SQL and bug fixing^[13]
Qwen2.5-Coder	September 2024	0.5B to 32B	5.5T tokens of code	128K	88.2 on HumanEval; 32B matches GPT-4o^[19]
Qwen3-Coder	July 2025	480B-A35B (MoE)	Extended code corpus	256K (1M extrap.)	Agentic coding flagship comparable to Claude Sonnet 4^[30]

Qwen2.5-Coder represented a major leap, with six model sizes covering everything from lightweight on-device code completion (0.5B) to full-featured code generation, reasoning, and fixing at the 32B scale. The Qwen2.5-Coder-32B-Instruct model became the state-of-the-art open-source code LLM upon release, matching GPT-4o's coding abilities. Qwen3-Coder-480B-A35B-Instruct subsequently set new state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use benchmarks.^[30]

Mathematics Models

Model	Release	Parameters	Key Capabilities
Qwen2-Math	June 2024	1.5B, 7B, 72B	Chain-of-Thought mathematical reasoning
Qwen2.5-Math	September 2024	1.5B, 7B, 72B	CoT, PoT, and TIR; 72.0 on MathVista; surpasses GPT-4o^[20]

Qwen2.5-Math supports both Chinese and English and uses multiple reasoning approaches: Chain-of-Thought (natural language step-by-step), Program-of-Thought (generating code to solve problems), and Tool-Integrated Reasoning (combining language reasoning with computational tools). Even the small 1.5B variant achieves competitive performance against much larger general-purpose models.

Other Specialized Models

Variant	Release Date	Focus	Key Features
Qwen-MT	July 2025	Translation	92 languages covering 95% of global population; reinforcement learning for accuracy^[31]
Qwen-Image	August 2025	Image Generation	20B MMDiT model; complex text rendering, multi-line layouts^[32]
Qwen-Image-Edit	August 2025	Image Editing	Precise text editing, semantic and appearance control^[32]
Qwen3Guard	September 2025	Safety	Real-time moderation, risk classification; state-of-the-art on multilingual safety benchmarks^[33]

Performance and Benchmarks

Cross-Generation Comparison

The following table summarizes benchmark improvements across Qwen generations for flagship models:

Benchmark	Qwen2-72B	Qwen2.5-72B-Instruct	Qwen3-235B-A22B
MMLU	84.2	86.1	88.5 (MMLU-Redux)
MATH	69.0	83.1	85.7 (AIME 2024)
LiveCodeBench	32.2	55.5	70.7
Arena-Hard	N/A	N/A	95.6

Qwen2.5-Max Benchmarks (January 2025)

Benchmark	Qwen2.5-Max	DeepSeek V3	Claude 3.5 Sonnet	GPT-4o
Arena-Hard	89.4	85.5	85.2	Comparable
MMLU-Pro	76.1	75.9	78.0	Comparable
GSM8K	94.5	89.3	N/A	N/A
LiveCodeBench	Leading	Below Qwen2.5-Max	N/A	N/A
GPQA-Diamond	Leading	Below Qwen2.5-Max	N/A	N/A

QwQ-32B Reasoning Benchmarks (March 2025)

Benchmark	QwQ-32B	DeepSeek-R1	OpenAI o1-preview
MATH-500	90.6%	Comparable	Below QwQ-32B
AIME 2024	50.0%	Comparable	Below QwQ-32B
GPQA	65.2%	Comparable	Below QwQ-32B
LiveCodeBench	50.0%	Comparable	N/A
Parameters	32.5B	671B	Proprietary

QwQ-32B's strong performance at just 32.5 billion parameters, compared to DeepSeek-R1's 671 billion, highlights the efficiency gains achieved through Alibaba's reinforcement learning training methodology.

Qwen3 Performance (April 2025)

Benchmark	Qwen3-235B-A22B	Qwen3-30B-A3B	QwQ-32B
Arena-Hard	95.6	91.0	89.5
AIME 2024	85.7	80.4	50.0
LiveCodeBench	70.7	N/A	50.0
CodeForces Elo	N/A	1974	N/A

Qwen3-Max Agentic Benchmarks (September 2025)

The trillion-parameter Qwen3-Max emphasized agentic coding and tool use. On the SWE-bench Verified benchmark of real GitHub issue resolution, Qwen3-Max-Instruct scored 69.6, and on Tau2-Bench (agentic tool use) it scored 74.8, which Alibaba reported as surpassing Claude Opus 4 and DeepSeek V3.1.^[48]

Benchmark	Qwen3-Max-Instruct	Comparison
SWE-bench Verified	69.6	Strong agentic coding among frontier models^[48]
Tau2-Bench	74.8	Reported to surpass Claude Opus 4 and DeepSeek V3.1^[48]

Capabilities

Qwen models support a comprehensive array of tasks across multiple domains:

Multilingual Processing: Core models handle 29 languages, with Qwen3 extending to 119 languages and dialects, Qwen3.5 expanding to 201 languages and dialects, and Qwen-MT covering 92 languages for translation (representing 95% of the global population).^[25]^[31]^[43]
Long-Context Understanding: 128K tokens across Qwen2/Qwen2.5/Qwen3, 256K tokens natively in Qwen3-Next and Qwen3-Coder (extrapolable to 1M for Qwen3-Coder), and a 1-million-token context window in the hosted Qwen3.5-Plus tier.^[30]^[40]^[43]
Coding and Mathematics: Specialized models achieve state-of-the-art results, with Qwen2.5-Coder scoring 88.2 on HumanEval, Qwen2.5-Math achieving 72.0 on MathVista, and Qwen3-Coder reaching parity with Claude Sonnet 4 on agentic coding benchmarks.
Multimodal Tasks: Image understanding, video comprehension (up to 1+ hours), audio processing, image generation and editing, and cross-modal reasoning with low-latency speech generation in the Qwen-Omni line.
Safety and Moderation: Qwen3Guard provides real-time detection with categorized risk levels for content filtering.
Agentic and Reasoning: Models like QwQ-32B, Qwen3, and Qwen3-Coder support advanced chain-of-thought reasoning, tool use, multi-step tasks, and agentic workflows for autonomous task completion.
Structured Data Analysis: Enhanced capabilities for processing tables, forms, and structured documents, with models able to generate structured JSON output from visual inputs.
Real-time Interaction: Support for low-latency voice and video chat through Qwen2.5-Omni and Qwen3-Omni, with streaming latency as low as 234 ms for audio in Qwen3-Omni.^[42]

Open-Source Strategy and Licensing

Alibaba Cloud's approach to open-sourcing the Qwen family has evolved significantly over time, becoming increasingly permissive.

Is Qwen open source?

Most open-weight Qwen models are open source. Since the Qwen3 generation, all open-weight Qwen3 models have been released under the permissive Apache 2.0 license, which allows any organization to use, modify, and distribute the models without restriction.^[25] Alibaba retains closed weights only for its flagship "Max" and "Omni" hosted tiers in some generations (for example Qwen2.5-Max, Qwen3-Max, Qwen3.5-Plus, Qwen3.5-Omni, and Qwen3.6-Max), which are served exclusively through Alibaba Cloud's API. By January 2026, Alibaba had open-sourced nearly 400 models in the Qwen lineup, the open foundation behind more than 180,000 derivative versions on Hugging Face.^[47]

Licensing Timeline

Period	License	Scope
2023 (Qwen)	Tongyi Qianwen LICENSE	Restricted; commercial use over 100M MAU requires approval
2024 (Qwen1.5, Qwen2)	Mixed	Most models Apache 2.0; some larger models under Tongyi Qianwen license
2024 (Qwen2.5)	Mostly Apache 2.0	Most models Apache 2.0; select variants under Qianwen license
2025 (Qwen3)	Apache 2.0	All open-weight Qwen3 models released under Apache 2.0; Qwen3-Max closed-source
2025 (Qwen3-Omni, Qwen3-Coder, Qwen3-Next, Qwen3-VL)	Apache 2.0	All open-weight variants released for research and commercial use
2026 (Qwen3.5)	Apache 2.0 (open variants)	Open-weight variants 0.8B-397B Apache 2.0; Qwen3.5-Plus hosted; Qwen3.5-Omni closed-source
2026 (Qwen3.6)	Apache 2.0 (open variants)	Qwen3.6-27B and 35B-A3B open under Apache 2.0; Qwen3.6-Max closed-source

The shift to Apache 2.0 licensing across most of the Qwen3 and Qwen3.5 lineups removed barriers for commercial adoption, allowing any organization to use, modify, and distribute the models without restrictions. This open approach has been a major driver of Qwen's rapid community adoption and the proliferation of derivative models, although Alibaba retains closed weights for its flagship "Max" and "Omni" hosted tiers in some generations.

Distribution Channels

Qwen models are distributed through multiple platforms:

Hugging Face: Primary distribution platform for the international community^[2]
ModelScope: Alibaba's model hosting platform, popular in China^[2]
GitHub: Source code, training scripts, and documentation^[36]
Kaggle: Additional distribution for some Qwen3-Next and Qwen3-Coder variants^[40]

Deployment and Accessibility

Commercial API Services

Alibaba Cloud provides commercial access to Qwen models through several channels:^[37]

Alibaba Cloud Model Studio: A managed platform for deploying and fine-tuning Qwen models, offering both OpenAI-compatible APIs and the native DashScope SDK.
DashScope API: The native API interface providing the most complete set of features and parameters, with regional endpoints for China (Beijing), International, US, and Hong Kong.
Qwen Chat (chat.qwen.ai): A free web-based chat interface for interacting with the latest Qwen models directly.
OpenAI-Compatible API: Model Studio provides an OpenAI-compatible endpoint, allowing developers to switch from OpenAI to Qwen with minimal code changes.

The API provides access to models not available as open weights, including Qwen2.5-Max, Qwen3-Max, Qwen3.5-Plus, Qwen3.5-Omni, and Qwen3.6-Max-Preview.

Deployment Frameworks

Qwen models support deployment through a variety of open-source inference frameworks:^[36]

vLLM: High-throughput inference with PagedAttention
SGLang: Large-scale deployment with structured generation
TensorRT-LLM: NVIDIA GPU optimization for production workloads
Ollama: Local deployment with simple setup for individual developers
llama.cpp: CPU and GPU inference using GGUF quantized formats
Integration with popular AI frameworks including LangChain, LlamaIndex, and Transformers

Community Adoption and Impact

The Qwen model family has achieved remarkable adoption milestones since its initial open-source release in 2023. In January 2026, China's official Xinhua news agency reported that Qwen "leads global open-source AI community with 700 million downloads," having overtaken Meta's Llama in cumulative downloads by October 2025.^[47]

How many times has Qwen been downloaded?

Metric	Value	Date
Cumulative downloads	~700 million	January 2026
Derivative versions on Hugging Face	180,000+	January 2026^[47]
Including all tagged models	200,000+	Early 2026
Models open-sourced in the Qwen lineup	Nearly 400	January 2026^[47]
Most-downloaded LLM family on Hugging Face	Yes (surpassed LLaMA)	2025
Top 10 Open LLM Leaderboard models built on Qwen	10 out of 10	February 2025

In December 2025, Qwen's single-month downloads exceeded the combined total of the next eight most popular model families (Meta, DeepSeek, OpenAI, Mistral, Nvidia, Zhipu.AI, Moonshot, and MiniMax).^[4]^[47] Alibaba as an organization now has more derivative models on Hugging Face than both Google and Meta combined. Since January 2025, Chinese fine-tuned or derivative models accounted for 63% of all new fine-tuned or derivative models released on Hugging Face, with Qwen serving as the primary base.^[4]

Ecosystem Influence

The development trajectory reflects Alibaba's ambition to position Qwen as a foundational "operating system" for AI, analogous to Android in mobile computing.^[38] Fine-tuned versions created by the community, such as "Liberated Qwen" by Abacus AI, have removed content restrictions for specialized use cases.^[1] The breadth of community-built models spans applications in healthcare, legal, finance, education, customer service, and creative industries.

Applications

Qwen powers diverse applications across industries:

Enterprise AI Solutions: Document analysis, customer service automation, and business intelligence integrated across Alibaba's ecosystem of products.
Software Development: Code generation, debugging, code review, and agentic coding workflows through Qwen2.5-Coder and Qwen3-Coder.
Education: Personalized tutoring, especially in mathematics (via Qwen2.5-Math) and programming.
Healthcare: Medical document analysis, clinical note processing, and research assistance.
E-commerce: Product descriptions, customer support, and recommendation systems within Alibaba's retail platforms.
Creative Content: Story writing, article generation, image creation (Qwen-Image), and image editing.
Translation: Professional-grade translation across 92 languages through Qwen-MT.
Research: Academic paper analysis, scientific computing, and data analysis.

How many languages does Qwen support?

Qwen models provide extensive multilingual support. Qwen2 supported 29 languages, Qwen3 supports 119 languages and dialects, and Qwen3.5 expands coverage to 201 languages and dialects, while the Qwen-MT translation model spans 92 languages representing roughly 95% of the global population.^[25]^[31]^[43] Core language support includes:

Chinese (Simplified and Traditional)
English
French
Spanish
Portuguese
German
Italian
Russian
Japanese
Korean
Vietnamese
Thai
Arabic
Turkish
Indonesian
Dutch
Polish
Swedish
Hindi
Hebrew
Finnish
Danish
Norwegian
Czech
Hungarian
Romanian
Greek
Bulgarian
Ukrainian

Limitations and Considerations

While Qwen models demonstrate strong capabilities, they have known limitations:^[39]

Language mixing: Models may unexpectedly switch between languages during generation, particularly in multilingual prompts.
Circular reasoning: Can get stuck in repetitive reasoning loops, particularly in complex multi-step problems when using thinking mode.
Safety concerns: Despite Qwen3Guard, production deployments require additional safety layers and content filtering.
Performance gaps: While strong in math and coding, improvements are still needed in common sense reasoning and nuanced cultural understanding.
Context limitations: Although supporting 128K to 1M token contexts depending on the variant, performance may degrade with extremely long inputs, especially for tasks requiring precise recall from the middle of long documents.
Computational requirements: Larger models (72B dense, 235B and 480B MoE, 397B MoE) require significant GPU resources. Even MoE models, while efficient at inference, still demand multi-GPU setups for self-hosted deployment.
API-only models: Some of the most capable models (Qwen2.5-Max, Qwen3-Max, Qwen3.5-Plus, Qwen3.5-Omni, Qwen3.6-Max) are available only through Alibaba Cloud's API, limiting self-hosted deployment options for the highest-performing variants.

References

"Qwen." Wikipedia. https://en.wikipedia.org/wiki/Qwen ↩
"Qwen (Qwen)." Hugging Face. https://huggingface.co/Qwen ↩
"Alibaba Cloud Unveils New Research Model for Enhanced Visual Reasoning." Alibaba Cloud Community, 2024. ↩
"State of Open Source on Hugging Face: Spring 2026." Hugging Face Blog, 2026. ↩
"All top 10 open-source LLMs on Hugging Face's Open LLM Leaderboard." Hugging Face, February 2025. ↩
"Alibaba Cloud Summit 2023: Tongyi Qianwen Announcement." Alibaba Group, April 2023. ↩
"Alibaba integrates Tongyi Qianwen into DingTalk and Tmall Genie." Alizila, 2023. ↩
"Qwen approved by Chinese government for public release." September 2023. ↩
"Qwen-7B open-source release." ModelScope/Hugging Face, August 2023. ↩
"Qwen-1.8B release for low-latency environments." Alibaba Cloud, November 2023. ↩
"Qwen-72B: competitive with GPT-3.5." Alibaba Cloud, December 2023. ↩
"Qwen1.5 release: 0.5B to 110B with 32K context." Alibaba Cloud, February 2024. ↩
"CodeQwen1.5-7B." Hugging Face. https://huggingface.co/Qwen/CodeQwen1.5-7B ↩
"Hello Qwen2." Qwen Blog. https://qwenlm.github.io/blog/qwen2/ ↩
"Qwen2 Technical Report." arXiv:2407.10671, July 2024. https://arxiv.org/abs/2407.10671 ↩
"Alibaba Cloud's Qwen2 with Enhanced Capabilities Tops LLM Leaderboard." Alizila, June 2024. ↩
"Qwen2.5: A Party of Foundation Models!" Qwen Blog. https://qwenlm.github.io/blog/qwen2.5/ ↩
"Qwen2.5-LLM: Extending the Boundary of LLMs." Alibaba Cloud Community, 2024. ↩
"Qwen2.5-Coder Series: Powerful, Diverse, Practical." Alibaba Cloud Community, 2024. ↩
"Qwen2.5-Math: The world's leading open-sourced mathematical LLMs." Qwen Blog. https://qwenlm.github.io/blog/qwen2.5-math/ ↩
"Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model." Qwen Blog. https://qwenlm.github.io/blog/qwen2.5-max/ ↩
"QwQ: Reflect Deeply on the Boundaries of the Unknown." Qwen Blog. https://qwenlm.github.io/blog/qwq-32b-preview/ ↩
"Alibaba Cloud Unveils QwQ-32B: A Compact Reasoning Model with Cutting-Edge Performance." Alibaba Cloud Community, March 2025. ↩
"Alibaba shares jump on new open-source QwQ-32B reasoning model." SiliconANGLE, March 2025. ↩
"Qwen3: Think Deeper, Act Faster." Qwen Blog. https://qwenlm.github.io/blog/qwen3/ ↩
"Qwen-Audio: A Versatile Audio Understanding Model." GitHub. https://github.com/QwenLM/Qwen-Audio ↩
"Qwen2-Audio Technical Report." arXiv:2407.10759, 2024. ↩
"Qwen2-VL: To See the World More Clearly." Qwen Blog. https://qwenlm.github.io/blog/qwen2-vl/ ↩
"Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL!" Qwen Blog. https://qwenlm.github.io/blog/qwen2.5-vl/ ↩
"Qwen3-Coder: Agentic Coding in the World." Qwen Blog. https://qwenlm.github.io/blog/qwen3-coder/ and GitHub. https://github.com/QwenLM/Qwen3-Coder ↩
"Qwen-MT: Where Speed Meets Smart Translation." Qwen Blog, July 24, 2025. ↩
"Qwen-Image: Crafting with Native Text Rendering." Qwen Blog, August 4, 2025. "Qwen-Image-Edit," August 19, 2025. ↩
"Qwen3Guard: Real-time Safety for Your Token Stream." Qwen Blog, September 23, 2025. ↩
"Qwen2.5-Omni Technical Report." arXiv:2503.20215. https://arxiv.org/abs/2503.20215 ↩
"Qwen3-VL." GitHub. https://github.com/QwenLM/Qwen3-VL ↩
"Qwen GitHub organization." https://github.com/QwenLM ↩
"Alibaba Cloud Model Studio documentation." https://www.alibabacloud.com/help/en/model-studio/ ↩
"Alibaba's ambition to position Qwen as a foundational OS for AI." Alizila, 2024. ↩
"Qwen3 Technical Report." arXiv:2505.09388, May 14, 2025. https://arxiv.org/abs/2505.09388 ↩
"Qwen3-Next: A New Generation of Ultra-Efficient Model Architecture." Alibaba Cloud Community, September 11, 2025. https://www.alibabacloud.com/blog/602536 ↩
"Alibaba Releases Qwen3-Max-Preview." Alibaba Cloud / Qwen Team, September 5, 2025; full Qwen3-Max release September 23-24, 2025. ↩
"Qwen3-Omni Technical Report." arXiv:2509.17765, September 23, 2025. https://arxiv.org/abs/2509.17765 ↩
"Qwen3.5: Towards Native Multimodal Agents." Qwen Team, February 16, 2026. https://qwen.ai/blog?id=qwen3.5 and https://github.com/QwenLM/Qwen3.5 ↩
"Alibaba Qwen Team Releases Qwen3.5-Omni." March 30, 2026. ↩
"Qwen3.6: Stability and Real-World Utility." GitHub. https://github.com/QwenLM/Qwen3.6 ↩
"Alibaba releases Qwen3.6-Max preview with stronger instruction-following capabilities." April 20, 2026. ↩
"Alibaba's Qwen leads global open-source AI community with 700 million downloads." Xinhua, January 13, 2026. https://english.news.cn/20260113/004b0522f987475cbf83ffc3a8d009aa/c.html ↩
"Alibaba's Qwen3-Max Joins the Frontier of Trillion-Parameter AI Models." AIwire / HPCwire, September 24, 2025; SWE-bench Verified 69.6 and Tau2-Bench 74.8 reported by Alibaba Cloud / Qwen Team. ↩
"Qwen3.5: Nobody Agrees on Attention Anymore." Maxime Labonne, Hugging Face Blog, February 2026. https://huggingface.co/blog/mlabonne/qwen35 ↩
"Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving." Qwen Team, April 20, 2026. https://qwen.ai/blog?id=qwen3.6-max-preview ↩

External Links

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

8 revisions by 1 contributors · full history

Suggest edit

What is Qwen?

History

Early Development

Open Source Release

Qwen1.5

Qwen2

Qwen2.5

Qwen2.5-Max

QwQ: Reasoning Models

When was Qwen3 released?

Qwen3-Coder

Qwen3-Next

Qwen3-Max

Qwen3-VL

Qwen3-Omni

Qwen3.5

Qwen3.5-Omni

Qwen3.6

Development Timeline

Architecture and Technical Features

Core Architecture

Training Data Scale

Model Sizes and Variants

Qwen2 Model Sizes

Qwen2.5 Model Sizes

Qwen3 Model Sizes

Qwen3.5 Model Sizes

What is the difference between Qwen's thinking and non-thinking modes?

Multimodal Capabilities

Qwen-VL Series (Vision-Language)

Qwen2-VL

Qwen2.5-VL

Qwen3-VL

Qwen-Audio Series

Qwen-Audio (November 2023)

Qwen2-Audio (August 2024)

Qwen2.5-Omni

Qwen3-Omni

Qwen3.5-Omni

QVQ (Visual Reasoning Model)

Specialized Models

Coding Models

Mathematics Models

Other Specialized Models

Performance and Benchmarks

Cross-Generation Comparison

Qwen2.5-Max Benchmarks (January 2025)

QwQ-32B Reasoning Benchmarks (March 2025)

Qwen3 Performance (April 2025)

Qwen3-Max Agentic Benchmarks (September 2025)

Capabilities

Open-Source Strategy and Licensing

Is Qwen open source?

Licensing Timeline

Distribution Channels

Deployment and Accessibility

Commercial API Services

Deployment Frameworks

Community Adoption and Impact

How many times has Qwen been downloaded?

Ecosystem Influence

Applications

How many languages does Qwen support?

Limitations and Considerations

See Also

References

External Links

Improve this article

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here (24 of 213)

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3