Qwen
Qwen (also called Tongyi Qianwen, Chinese: 通义千问; pinyin: Tōngyì Qiānwèn; literally "to comprehend the meaning, [and to answer] a thousand kinds of questions") is a family of large language models (LLMs) and multimodal models developed by Alibaba Cloud, the cloud computing division of the Chinese technology company Alibaba Group.[1] The name "Qwen" is derived from the Tongyi Qianwen brand and refers to the model family built by Alibaba Cloud's Qwen Team.[2] As of February 2025, Qwen had become one of the most widely adopted open-source model families globally: since its first open-source release in 2023, more than 78,000 derivative models had been built on the Qwen family on Hugging Face, with over 40 million downloads.[3] At that time, all of the top 10 open-source LLMs on Hugging Face's Open LLM Leaderboard were derived from open-source Qwen models.[4]
History
Early Development
Alibaba first launched a beta version of Qwen on April 11, 2023, during the Alibaba Cloud Summit, under the name Tongyi Qianwen.[5] The initial model architecture was based on LLaMA, developed by Meta AI.[1] Initially, the model was integrated into various Alibaba business applications, including the workplace collaboration tool DingTalk and the voice assistant Tmall Genie.[6] It received approval from the Chinese government and was publicly released in September 2023.[7]
Open Source Release
Alibaba Cloud began open-sourcing its models in August 2023 to foster a broader AI ecosystem. The first models released were Qwen-7B and its chat fine-tuned variant, Qwen-7B-Chat.[8] This was followed by Qwen-1.8B in November 2023, aimed at low-latency and resource-constrained environments.[9] In December 2023, Alibaba released the 72B-parameter Qwen-72B, which performed comparably to leading proprietary models such as GPT-3.5 on several benchmarks.[10]
Development Timeline
| Date (UTC) | Generation / model | Key details |
|---|---|---|
| 2023-04-11 | Tongyi Qianwen (beta) | Initial corporate announcement by Alibaba Cloud for a company-scale LLM initiative.[5] |
| 2023-08-03 | Qwen-7B / Qwen-7B-Chat | First broadly distributed open weights via ModelScope and Hugging Face. Quantized INT4 chat variant followed on 2023-08-21.[11] |
| 2023-09-13 | Qwen (public release) | Model approved by Chinese government for public release.[7] |
| 2023-11-30 | Qwen-72B | 72-billion parameter model released, competitive with GPT-3.5.[12] |
| 2024-02-05 | Qwen1.5 | Models ranging from 0.5B to 110B parameters with 32K context window.[13] |
| 2024-06-06 | Qwen2 | Enhanced capabilities with dense and sparse models, up to 72B parameters.[14] |
| 2024-07-15 | Qwen2 (tech report) | Technical report describes sizes from 0.5B to 72B and an MoE variant (57B-A14B) with efficiency optimizations and long-context support.[15] |
| 2024-09-18 | Qwen2.5-Math | Math-specialized instruction models (1.5B/7B/72B).[17] |
| 2024-09-19 | Qwen2.5 | Trained on 18 trillion tokens; multilingual support; 128K context.[16] |
| 2024-09 | Qwen2-VL | Vision-language series with dynamic resolution tokenization strategy.[18] |
| 2024-11-28 | QwQ-32B-Preview | Open-source reasoning model designed to compete with OpenAI's o1.[19] |
| 2025-01-28 | Qwen2.5-Max | Large-scale MoE model claiming superiority over DeepSeek V3.[20] |
| 2025-01-29 | Qwen2.5-VL | Enhanced video understanding with temporal reasoning.[21] |
| 2025-04-28 | Qwen3 | Thinking/non-thinking modes, enhanced reasoning, 36 trillion tokens.[22] |
| 2025-07-23 | Qwen3-Coder | Open-source coding-focused models emphasizing agentic coding workflows.[23] |
| 2025-07-24 | Qwen-MT | Translation model supporting 92 languages.[24] |
| 2025-08-04 | Qwen-Image | Image generation model with complex text rendering.[25] |
| 2025-09-05 | Qwen3-Max | Flagship hosted model with competitive performance.[22] |
| 2025-09-11 | Qwen3-Next | Optimized model with >10x throughput improvement.[22] |
| 2025-09-23 | Qwen3Guard | Safety guardrail model for real-time moderation.[26] |
Architecture and Technical Features
Core Architecture
Qwen models are based on the Transformer architecture, which is the standard for modern LLMs. Key architectural features include:
- Attention Mechanism: Standard self-attention; the Qwen2 series introduced Grouped Query Attention (GQA) in larger models to improve inference speed and reduce memory usage.[14]
- Tokenizer: Custom tokenizer with a vocabulary of over 150,000 tokens, efficiently representing text in multiple languages and reducing token counts for non-English text.[13]
- Position Embeddings: Rotary Position Embeddings (RoPE) for strong long-context performance, with multimodal variants using M-RoPE (Multimodal Rotary Position Embedding).[18]
- Architecture Types: Both dense and Mixture of Experts (MoE) variants, with MoE models activating only a subset of parameters per token for efficiency.[22]
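The GQA mechanism mentioned above lets several query heads share a single key/value head, shrinking the KV cache with little quality loss. A minimal NumPy sketch of the idea — all dimensions here are toy values chosen for illustration, not Qwen's actual configuration:

```python
import numpy as np

def gqa(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Single grouped-query attention pass over a (seq, d_model) input.
    Each group of query heads attends against one shared KV head."""
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads        # query heads per shared KV head

    q = (x @ wq).reshape(seq, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                    # map query head -> shared KV head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(head_dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h] = weights @ v[:, kv]
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
d_model, n_q, n_kv = 64, 8, 2              # 8 query heads share 2 KV heads
x = rng.standard_normal((5, d_model))
wq = rng.standard_normal((d_model, d_model))
wk = rng.standard_normal((d_model, n_kv * (d_model // n_q)))
wv = rng.standard_normal((d_model, n_kv * (d_model // n_q)))
print(gqa(x, wq, wk, wv, n_q, n_kv).shape)  # (5, 64)
```

With 8 query heads but only 2 KV heads, the cached K and V tensors are a quarter of the multi-head-attention size, which is the memory saving GQA is used for.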
Training Data Scale
The evolution of training data demonstrates significant scaling:
- Qwen: Initial training on multilingual datasets
- Qwen2: 7 trillion tokens[15]
- Qwen2.5: 18 trillion tokens[27]
- Qwen2.5-Max: Over 20 trillion tokens[20]
- Qwen3: 36 trillion tokens in 119 languages and dialects[1]
The pre-training data includes high-quality Chinese-language text, multilingual text, code, mathematics, and multimodal data; core models support 29 languages, and translation variants extend coverage to 92.
Model Sizes and Variants
Qwen3 Dense Models
- Qwen3-0.6B, Qwen3-1.7B, Qwen3-4B, Qwen3-8B, Qwen3-14B, and Qwen3-32B[22]
Qwen3 MoE Models
- Qwen3-30B-A3B (30 billion total parameters, 3 billion activated)
- Qwen3-235B-A22B (235 billion total parameters, 22 billion activated)
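The "activated" counts above come from learned routing: a gating network scores all experts for each token and only the top-k experts actually run (e.g. 3B of 30B parameters). A toy NumPy sketch of top-k routing — the sizes and the softmax gate here are illustrative, not Qwen's actual router design:

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Route one token vector through only its top-k experts."""
    logits = x @ router_w                        # one score per expert
    topk = np.argsort(logits)[-k:]               # indices of selected experts
    gates = np.exp(logits[topk] - logits[topk].max())
    gates /= gates.sum()                         # softmax over selected experts
    # Only the chosen experts' weight matrices are touched, so the other
    # experts' parameters stay inactive for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, topk))

rng = np.random.default_rng(1)
d, n_experts = 16, 8
router_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
token = rng.standard_normal(d)
print(moe_forward(token, router_w, experts).shape)  # (16,)
```

Because only 2 of the 8 toy experts run per token, compute per token scales with the activated parameter count rather than the total, which is the efficiency argument behind models such as Qwen3-30B-A3B.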
Thinking and Non-Thinking Modes
Qwen3 introduces a hybrid approach to problem-solving with two distinct modes:[22]
- Thinking Mode: The model takes time to reason step by step before delivering the final answer, ideal for complex problems requiring deeper thought
- Non-Thinking Mode: Provides quick, near-instant responses suitable for simpler questions where speed is prioritized
- Thinking Budget: Users can control how much "thinking" the model performs based on the task at hand
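In the open-weight Qwen3 models, thinking-mode output wraps the reasoning trace in `<think>...</think>` tags ahead of the final answer, so client code typically separates the two. A small illustrative helper — the tag convention follows Qwen3's chat format, but the function itself is hypothetical:

```python
import re

def split_thinking(response: str):
    """Separate a leading Qwen3-style '<think>...</think>' block from the
    final answer. Returns (reasoning, answer); reasoning is '' when the
    model replied in non-thinking mode."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", response, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", response.strip()

raw = "<think>2 + 2: add the units digits.</think>The answer is 4."
reasoning, answer = split_thinking(raw)
print(answer)  # The answer is 4.
```

A non-thinking-mode reply passes through unchanged, so the same handling code works for both modes.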
Multimodal Capabilities
Qwen-VL Series
The Qwen-VL (Vision-Language) series represents Qwen's multimodal models that can process both text and images:
Qwen2-VL
Released in August 2024, featuring:[28]
- Naive Dynamic Resolution mechanism for processing images of varying resolutions
- Multimodal Rotary Position Embedding (M-RoPE) for effective fusion of positional information
- Support for videos over 20 minutes
- Understanding of images at various resolutions and ratios
- Fine-grained resolution and document parsing
Qwen2.5-VL
Released in January 2025, featuring:[21]
- Enhanced OCR recognition capabilities
- Multi-scenario, multi-language, and multi-orientation text recognition
- Visual localization with bounding boxes
- Structured output generation for documents, forms, and tables
- Temporal reasoning for video understanding
- Dynamic resolution and multi-image analysis
Qwen3-VL
The latest vision-language model featuring:[29]
- Visual Agent capabilities for PC/mobile GUI operation
- Advanced spatial perception and 3D grounding
- DeepStack architecture for fine-grained visual detail capture
- Text-Timestamp Alignment for precise video event localization
Qwen2.5-Omni
End-to-end multimodal model with unique capabilities:[30]
- Thinker-Talker architecture for simultaneous text and speech generation
- Real-time voice and video chat support
- TMRoPE (Time-aligned Multimodal RoPE) for synchronizing video and audio timestamps
- Processing of text, images, videos, and audio inputs
- Bilingual support (English/Chinese) with low-latency interaction
QVQ (Visual Reasoning Model)
QVQ-72B-Preview is an experimental research model for enhanced visual reasoning:[3]
- 70.3% on the MMMU (Massive Multi-discipline Multimodal Understanding) benchmark
- Superior performance on MathVision and OlympiadBench
- Advanced multidisciplinary understanding and reasoning abilities
Qwen-Audio
Large language audio models supporting:[31]
- Cross-modal processing between text and audio
- Support for 30+ languages
- Speech recognition and sound interpretation
- Processing of audio up to 30 minutes
Specialized Models
| Variant | Release Date | Parameters | Focus | Key Features |
|---|---|---|---|---|
| Qwen2.5-Coder | September 2024 | 0.5B to 32B | Coding | Code generation, reasoning, fixing; 88.2 on HumanEval; state-of-the-art open-source code model[16] |
| Qwen2.5-Math | September 2024 | 1.5B to 72B | Mathematics | Mathematical reasoning with CoT, PoT, and TIR; 72.0 on MathVista; surpasses GPT-4o[17] |
| Qwen-MT | July 2025 | Based on Qwen3 | Translation | 92 languages covering 95% of global population; reinforcement learning for accuracy[24] |
| Qwen3Guard | September 2025 | Based on Qwen3 | Safety | Real-time moderation, risk classification; SOTA on multilingual safety benchmarks[26] |
| QwQ-32B-Preview | November 2024 | 32B | Reasoning | Enhanced multi-step reasoning; outperforms o1-preview on some benchmarks; Apache 2.0 license[32] |
| Qwen-Image | August 2025 | 20B | Image Generation | Complex text rendering, multi-line layouts, high visual quality[25] |
| Qwen-Image-Edit | August 2025 | Based on Qwen-Image | Image Editing | Precise text editing, semantic/appearance control[25] |
Performance and Benchmarks
Qwen3 Performance
Qwen3-235B-A22B achieves competitive results compared to other top-tier models:[22]
| Benchmark | Qwen3-235B-A22B | Qwen3-30B-A3B | QwQ-32B |
|---|---|---|---|
| Arena-Hard | 95.6 | 91.0 | 89.5 |
| AIME 2024 | 85.7 | 80.4 | Superior |
| LiveCodeBench | 70.7 | - | Strong |
| CodeForces Elo | - | 1974 | - |
Comparative Benchmarks
| Model | MMLU-Redux (EN) | HumanEval (EN) | MathVista (EN) | Notes |
|---|---|---|---|---|
| Qwen3-235B-A22B | 88.5 | N/A | N/A | Highest in series for general knowledge |
| Qwen2.5-72B-Instruct | 68.8 (MMLU-Pro) | N/A | N/A | Instruction-tuned |
| Qwen2.5-Coder-32B | N/A | 88.2 | N/A | Coding SOTA |
| Qwen2.5-Math-72B | N/A | N/A | 72.0 | Math-specialized |
| Qwen2.5-Max | Comparable to GPT-4o | Surpasses DeepSeek V3 | N/A | Overall competitive |
Additional benchmarks include CMMLU for Chinese language understanding, GSM8K for math word problems, and BFCL for tool and function-calling capabilities.[33]
Capabilities
Qwen models support a comprehensive array of tasks:
- Multilingual Processing: Core models handle 29 languages, with Qwen-MT extending to 92 languages covering 95% of the global population[24]
- Long-Context Understanding: Up to 128K tokens, enabling analysis of extensive documents, codebases, or conversations
- Coding and Mathematics: Specialized models achieve state-of-the-art results with Qwen2.5-Coder scoring 88.2 on HumanEval and Qwen2.5-Math achieving 72.0 on MathVista
- Multimodal Tasks: Image generation/editing, video grounding, audio transcription, and multi-modal reasoning
- Safety and Moderation: Qwen3Guard provides real-time detection with categorized risk levels
- Agentic and Reasoning: Models like QwQ-32B support advanced chain-of-thought reasoning, tool use, and multi-step tasks
- Structured Data Analysis: Enhanced capabilities for processing tables, forms, and structured documents
- Real-time Interaction: Support for low-latency voice and video chat through Omni models
Deployment and Accessibility
Open Source Availability
Most Qwen models are released under the Apache 2.0 license, making them available for both research and commercial use.[22] However, some larger models like the original Qwen-72B were released under a more restrictive "Tongyi Qianwen LICENSE AGREEMENT" that requires approval for commercial applications with over 100 million monthly active users.[34]
Models are distributed through:
- Hugging Face[2]
- ModelScope[2]
- GitHub repositories[35]
Commercial Services
Alibaba Cloud provides commercial APIs through:[36]
- Alibaba Cloud Model Studio
- DashScope API service
- Qwen Chat web interface (chat.qwen.ai)
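Alibaba Cloud's hosted service also offers an OpenAI-compatible chat-completions mode, so a request is ordinary chat JSON. A minimal sketch of building such a request offline — the endpoint URL and the `qwen-plus` model name are illustrative and should be checked against current Alibaba Cloud documentation:

```python
import json

# Illustrative endpoint for the OpenAI-compatible mode; verify against
# the current Alibaba Cloud Model Studio documentation before use.
BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

def build_chat_request(model: str, user_message: str) -> str:
    """Serialize a chat-completions request body for a Qwen model."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return json.dumps(body)

payload = build_chat_request("qwen-plus", "Summarize GQA in one sentence.")
print(json.loads(payload)["model"])  # qwen-plus
```

Because the wire format matches the OpenAI chat-completions schema, existing OpenAI client libraries can usually be pointed at the compatible endpoint with only a base-URL and API-key change.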
Deployment Frameworks
Qwen models support deployment through various frameworks:[35]
- vLLM for high-throughput inference
- SGLang for large-scale deployment
- TensorRT-LLM for NVIDIA GPU optimization
- Ollama for local deployment
- llama.cpp for CPU and GPU inference
- Integration with popular AI frameworks
Applications
Qwen powers diverse applications including:
- Enterprise AI Solutions: Document analysis, customer service automation, and business intelligence
- Software Development: Code generation, debugging, and agentic coding workflows through Qwen3-Coder
- Education: Personalized tutoring, especially in mathematics and programming
- Healthcare: Medical document analysis and research assistance
- E-commerce: Product descriptions, customer support, and recommendation systems
- Creative Content: Story writing, article generation, and image creation
- Research: Academic paper analysis and scientific computing
Fine-tuned versions, such as "Liberated Qwen" by Abacus AI, remove content restrictions for specialized use cases.[1] The models support real-time interactions and are used in agentic frameworks for autonomous tasks.
Languages Supported
Qwen models provide extensive multilingual support, with Qwen3 supporting 119 languages and dialects.[1] Core language support includes:
- Chinese (Simplified and Traditional)
- English
- French
- Spanish
- Portuguese
- German
- Italian
- Russian
- Japanese
- Korean
- Vietnamese
- Thai
- Arabic
- Turkish
- Indonesian
- Dutch
- Polish
- Swedish
- Hindi
- Hebrew
- Finnish
- Danish
- Norwegian
- Czech
- Hungarian
- Romanian
- Greek
- Bulgarian
- Ukrainian
Limitations and Considerations
While Qwen models demonstrate strong capabilities, they have known limitations:[37]
- Language mixing: Models may unexpectedly switch between languages during generation
- Circular reasoning: Can get stuck in repetitive reasoning loops, particularly in complex multi-step problems
- Safety concerns: Require stronger safety features and guardrails for production deployment
- Performance gaps: While strong in math and coding, improvements needed in common sense reasoning
- Context limitations: Although supporting long contexts, performance may degrade with extremely long inputs
- Computational requirements: Larger models require significant GPU resources for deployment
Impact and Adoption
Since its initial open-source release in 2023, the Qwen family has achieved significant adoption milestones:
- Over 40 million downloads across platforms[3]
- More than 78,000 derivative models developed on Hugging Face[3]
- Powers all top 10 open-source LLMs on Hugging Face's Open LLM Leaderboard as of February 2025[4]
- Integrated into numerous commercial applications across Alibaba's ecosystem
- Adopted by enterprises globally for various AI applications
The development reflects Alibaba's ambition to position Qwen as a foundational "operating system" for AI, akin to Android in mobile computing.[38]
See also
- Large language model
- Multimodal learning
- Transformer (machine learning model)
- Mixture of Experts
- Alibaba Cloud
- Alibaba Group
- DeepSeek
- LLaMA
- GPT-4
- Artificial intelligence in China
- Reinforcement learning
References
- ↑ 1.0 1.1 1.2 1.3 1.4 https://en.wikipedia.org/wiki/Qwen
- ↑ 2.0 2.1 2.2 https://huggingface.co/Qwen
- ↑ 3.0 3.1 3.2 3.3 https://www.alibabacloud.com/blog/alibaba-cloud-unveils-new-research-model-for-enhanced-visual-reasoning_601914
- ↑ 4.0 4.1 https://www.scmp.com/tech/big-tech/article/3298233/alibabas-qwen-powers-top-10-open-source-models-china-ai-know-how-goes-beyond-deepseek
- ↑ 5.0 5.1 https://www.cnbc.com/2023/04/11/alibaba-launches-tongyi-qianwen-ai-model-similar-to-gpt.html
- ↑ https://www.reuters.com/technology/alibaba-cloud-launches-its-chatgpt-rival-tongyi-qianwen-2023-04-11/
- ↑ 7.0 7.1 https://www.scmp.com/tech/big-tech/article/3234661/alibabas-ai-model-tongyi-qianwen-approved-public-release-china
- ↑ https://www.reuters.com/technology/alibaba-launches-open-source-ai-model-qwen-2023-08-03/
- ↑ https://huggingface.co/Qwen/Qwen-1_8B
- ↑ https://www.chinadaily.com.cn/a/202312/01/WS6569a8a7a31090682a5f5b5e.html
- ↑ https://github.com/QwenLM/Qwen
- ↑ https://venturebeat.com/ai/alibaba-clouds-new-qwen-72b-llm-tops-open-source-rankings-and-rivals-gpt-4/
- ↑ 13.0 13.1 https://qwenlm.github.io/blog/qwen1.5/
- ↑ 14.0 14.1 https://qwenlm.github.io/blog/qwen2/
- ↑ 15.0 15.1 https://arxiv.org/abs/2407.10671
- ↑ 16.0 16.1 https://qwenlm.github.io/blog/qwen2.5/
- ↑ 17.0 17.1 https://arxiv.org/html/2409.12122v1
- ↑ 18.0 18.1 https://arxiv.org/abs/2409.12191
- ↑ https://venturebeat.com/ai/alibaba-releases-qwen-with-questions-an-open-reasoning-model-that-beats-o1-preview/
- ↑ 20.0 20.1 https://qwenlm.github.io/blog/qwen2.5-max/
- ↑ 21.0 21.1 https://qwenlm.github.io/blog/qwen2.5-vl/
- ↑ 22.0 22.1 22.2 22.3 22.4 22.5 22.6 https://qwenlm.github.io/blog/qwen3/
- ↑ https://www.reuters.com/world/china/alibaba-launches-open-source-ai-coding-model-touted-its-most-advanced-date-2025-07-23/
- ↑ 24.0 24.1 24.2 https://qwenlm.github.io/blog/qwen-mt/
- ↑ 25.0 25.1 25.2 https://qwenlm.github.io/blog/qwen-image/
- ↑ 26.0 26.1 https://qwenlm.github.io/blog/qwen3guard/
- ↑ https://arxiv.org/abs/2412.15115
- ↑ https://arxiv.org/abs/2409.12191
- ↑ https://github.com/QwenLM/Qwen3-VL
- ↑ https://github.com/QwenLM/Qwen2.5-Omni
- ↑ https://www.alibabacloud.com/help/en/model-studio/what-is-qwen-llm
- ↑ https://www.alibabacloud.com/blog/alibaba-cloud-unveils-open-source-ai-reasoning-model-qwq-and-new-image-editing-tool_601813
- ↑ https://qwenlm.github.io/share/leaderboard/
- ↑ https://www.infoq.com/news/2023/12/alibaba-qwen-72b-llm/
- ↑ 35.0 35.1 https://github.com/QwenLM/Qwen3
- ↑ https://www.alibabacloud.com/help/en/model-studio/models
- ↑ https://www.datacamp.com/blog/qwq-32b-preview
- ↑ https://www.reddit.com/r/LocalLLaMA/comments/1nq182d/alibaba_just_unveiled_their_qwen_roadmap_the/