Qwen3.6
Last reviewed
Jun 2, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 ยท 1,968 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 2, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 ยท 1,968 words
Add missing citations, update stale details, or suggest a clearer explanation.
Qwen3.6 is a generation of large language models from the Qwen team at Alibaba, released in April 2026 as the successor to Qwen3.5. [1][2] The line keeps the hybrid architecture introduced with Qwen3.5, pairing Gated DeltaNet linear attention with a sparse mixture of experts and a native vision encoder, but reorients the release around agentic coding and real-world tool use. [2][3] It spans hosted proprietary endpoints (Qwen3.6-Plus, Qwen3.6-Flash, and the Qwen3.6-Max-Preview flagship) alongside two open-weight checkpoints, Qwen3.6-35B-A3B and Qwen3.6-27B, both published under the Apache 2.0 license. [1][3][4]
Qwen3.6 is an incremental but coding-focused update within the broader Qwen family. Where Qwen3.5 had launched under the banner "Towards Native Multimodal Agents," the Qwen team framed Qwen3.6-Plus as a step "Towards Real World Agents," emphasizing reliability on long-horizon software tasks rather than a new architecture. [3][5] The team describes the generation as prioritizing stability and practical utility, shaped by community feedback on the previous release. [2]
Two themes run through the launch. The first is agentic coding: the models are tuned to plan, write, test, and iteratively debug code, and to operate inside third-party coding agents such as Claude Code, Cline, and Alibaba's own Qwen Code. [2][5] The second is visual reasoning carried over from the native multimodal foundation, so the models can read screenshots, hand-drawn wireframes, and design mockups and turn them into front-end code. [5][6] A smaller but notable addition is "thinking preservation," a feature that retains a model's reasoning context across turns of a conversation rather than discarding it after each response. [2][7]
Qwen3.6 is best understood as a point update on the Qwen3.5 platform rather than a clean-sheet generation. Qwen3.5 had debuted on February 16, 2026 with a 397-billion-parameter flagship and established the family's defining engineering choices: a hybrid attention stack that interleaves Gated DeltaNet with periodic full attention, a sparse MoE feed-forward design, and early-fusion multimodal pretraining. [8][9] Qwen3.6 inherits all three. Its open-weight model cards describe the same interleave pattern, three Gated DeltaNet blocks for every one gated-attention block, and the same image-text-to-text framing as the Qwen3.5 checkpoints. [4][7]
The clearest contrasts are in emphasis and packaging. Qwen3.5 led with the largest possible open model and a sweep of eight sizes down to 0.8B; Qwen3.6 leads with hosted agentic endpoints and ships a narrower open-weight lineup aimed at coding. [2][8] The generation before both, Qwen3, used conventional dense and MoE transformers with full softmax attention, so the Gated DeltaNet hybrid and native vision encoder are the features that separate the 3.5 and 3.6 lines from the original Qwen3 series and its specialized spin-offs like Qwen3-VL. [8][9]
Alibaba announced Qwen3.6-Plus on April 2, 2026 from Hangzhou, positioning it as a proprietary, hosted upgrade over the Qwen3.5-Plus endpoint. [1][5] The hosted lineup then expanded with Qwen3.6-Max-Preview on April 20, 2026, billed by the team as the most capable model it had shipped to date. [10] A speed-tier Qwen3.6-Flash rounds out the hosted family. [10]
The open-weight checkpoints followed the hosted endpoints. According to the Qwen GitHub release notes, Qwen3.6-35B-A3B reached Hugging Face Hub and ModelScope on April 16, 2026, and the dense Qwen3.6-27B arrived on April 22, 2026. [1] Alibaba said it would continue to release selected Qwen3.6 models to the open-source community even as the top-tier Plus and Max-Preview endpoints stayed closed. [5][10]
The Qwen3.6 release is split between proprietary hosted endpoints and open-weight checkpoints. The hosted tier follows Alibaba's usual naming (Max for the flagship, Plus for the balanced workhorse, Flash for the low-latency option), while the open releases are named by parameter count, with an "A3B" suffix marking 3 billion active parameters on the mixture-of-experts model. [1][3][10]
| Model | Type | Total / active params | Availability | License | Released |
|---|---|---|---|---|---|
| Qwen3.6-Max-Preview | Hosted (closed) | Not disclosed | API only | Proprietary | Apr 20, 2026 |
| Qwen3.6-Plus | Hosted (closed) | Not disclosed | API only | Proprietary | Apr 2, 2026 |
| Qwen3.6-Flash | Hosted (closed) | Not disclosed | API only | Proprietary | Apr 2026 |
| Qwen3.6-35B-A3B | Open weight (MoE) | 35B / 3B | HF Hub, ModelScope | Apache 2.0 | Apr 16, 2026 |
| Qwen3.6-27B | Open weight (dense) | 27B | HF Hub, ModelScope | Apache 2.0 | Apr 22, 2026 |
Sources: Qwen GitHub release notes, the official Hugging Face model cards, and Alibaba Cloud's launch announcement. [1][4][5][7][10]
The two open-weight model cards spell out the architecture in detail; Alibaba has not published comparable internals for the hosted Plus, Flash, or Max-Preview endpoints. [4][7]
Qwen3.6-35B-A3B is a sparse mixture-of-experts model with 35 billion total parameters and roughly 3 billion active per token, built from 40 layers with a hidden dimension of 2,048. [4] Its MoE blocks hold 256 experts, of which a router activates 8 routed experts plus 1 always-on shared expert per token. [4] Attention follows the Qwen3.5 hybrid template: the layers are arranged as 10 repeated blocks, each laid out as three Gated DeltaNet sublayers followed by one gated-attention sublayer, with a MoE feed-forward network after every sublayer. [4][9] The Gated DeltaNet sublayers use 32 linear-attention heads for values and 16 for queries and keys (head dimension 128), while the gated-attention sublayers use 16 query heads and 2 key/value heads (head dimension 256) with a 64-dimensional rotary position embedding. [4] Gated DeltaNet is a linear-attention mechanism whose memory cost stays constant as a sequence grows, which is what lets the model serve long contexts cheaply, while the periodic full-attention layers preserve precise long-range recall. [4][9]
The dense Qwen3.6-27B uses the same hybrid pattern at a different scale: 64 layers organized as 16 blocks of three Gated DeltaNet sublayers plus one gated-attention sublayer, a hidden dimension of 5,120, and standard feed-forward networks in place of the MoE blocks. [7] Both checkpoints are described as causal language models with a vision encoder, exposed as image-text-to-text systems, and both are trained with multi-token prediction to support speculative decoding. [4][7]
Both open models carry a native context window of 262,144 tokens, which the model cards say can be extended toward roughly 1,010,000 tokens with YaRN rope scaling. [4][7] The hosted Qwen3.6-Plus endpoint is offered with a 1-million-token context window by default. [6]
The Qwen3.6 launch centers on agentic software work. Alibaba describes the models as optimized for a "capability loop" of perceiving, reasoning, and acting within a single workflow, and says they autonomously plan, test, and iterate on code toward production-ready solutions. [6] The open checkpoints are tuned for front-end workflows and repository-level reasoning, and they support tool use and function calling so they can drive an AI coding agent end to end. [2][4] Alibaba lists compatibility with third-party agents including OpenClaw, Claude Code, and Cline, in addition to its own Qwen Code terminal agent. [5][6]
Because the models inherit the early-fusion vision encoder from Qwen3.5, visual reasoning is a first-class capability rather than an add-on. Qwen3.6-Plus is described as handling high-density document parsing, physical-world visual analysis, and long-form video reasoning, and as able to interpret a user-interface screenshot, hand-drawn wireframe, or product prototype and generate working front-end code from it. [5][6] The open checkpoints are likewise multimodal, accepting text, image, and video input. [4][7]
The new "thinking preservation" feature lets a model keep its chain-of-thought context across the messages in a conversation instead of resetting it each turn, which the team presents as helping continuity on multi-step agentic sessions. [2][7]
Quantitative results are published for the open-weight checkpoints on the Hugging Face model cards; Alibaba reported relative gains for the hosted Max-Preview without a full table. Coding and agent scores are the headline numbers, and both open models also post strong knowledge and vision-language results.
| Benchmark | Qwen3.6-27B (dense) | Qwen3.6-35B-A3B (MoE) |
|---|---|---|
| SWE-bench Verified | 77.2 | 73.4 |
| SWE-bench Pro | 53.5 | not reported |
| Terminal-Bench 2.0 | 59.3 | 51.5 |
| MMLU-Pro | 86.2 | 85.2 |
| MMLU-Redux | 93.5 | 93.3 |
| GPQA Diamond | 87.8 | not reported |
| AIME 2026 | 94.1 | not reported |
| MMMU (vision) | 82.9 | not reported |
| RealWorldQA (vision) | not reported | 85.3 |
Sources: the Qwen3.6-27B and Qwen3.6-35B-A3B Hugging Face model cards. [4][7]
For the hosted flagship, Alibaba said Qwen3.6-Max-Preview topped six coding-oriented benchmarks (including SWE-bench Verified variants, Terminal-Bench 2.0, SkillsBench, and SciCode), reported a roughly 2.3 percent gain on the SuperGPQA reasoning test and about 5.3 percent on a Chinese-language benchmark over Qwen3.6-Plus, and claimed first place on a tool-calling instruction-following test ahead of Claude. [10] These figures come from Alibaba's own announcement and have not been independently audited. The company has compared Qwen3.6-Plus's coding ability to Claude Opus 4.5 on SWE-bench. [5]
The open-weight checkpoints are released under the permissive Apache 2.0 license, with license files in their respective Hugging Face repositories; the hosted endpoints are proprietary and reachable only through Alibaba's API. [2][5][7]
| Channel | Models | Access | License |
|---|---|---|---|
| Hugging Face Hub | Qwen3.6-35B-A3B, Qwen3.6-27B | Open download | Apache 2.0 |
| ModelScope | Qwen3.6-35B-A3B, Qwen3.6-27B | Open download | Apache 2.0 |
| Alibaba Cloud Model Studio | Qwen3.6-Plus, Flash, Max-Preview | Hosted API | Proprietary |
| Qwen Chat | Qwen3.6-Plus and others | Web chat | Proprietary |
The open checkpoints run under common serving stacks including vLLM, SGLang, Hugging Face Transformers, llama.cpp, and MLX. [1][4]
Coverage focused on Qwen3.6 as another entry in a fast-moving wave of Chinese open-weight releases, and on the contrast between Alibaba's open and closed offerings. Reporting noted that Alibaba shipped capable open checkpoints in the 27B and 35B-A3B range while keeping its strongest model, Qwen3.6-Max-Preview, behind an API. [10][11] Commentators highlighted the small active-parameter footprint of the 35B-A3B model, which activates only about 3 billion parameters per token yet posts agentic-coding scores competitive with much larger systems, as evidence of how far sparse MoE plus linear-attention designs had progressed. [11][12] Several outlets placed the release alongside other 2026 open-weight coders and read it as the open-weight field continuing to close the gap with frontier proprietary models. [12][13]
Most published internals and benchmarks cover only the two open-weight checkpoints; Alibaba has disclosed little about the parameter counts or architecture of the hosted Plus, Flash, and Max-Preview endpoints, and the Max-Preview figures come from the company's own announcement without independent verification. [4][7][10] The headline agentic-coding gains are concentrated in software and tool-use tasks, so they do not necessarily transfer to other domains. The long-context claim of roughly one million tokens depends on YaRN rope scaling rather than native training at that length, a method that can degrade quality on the longest inputs. [4][7] As an incremental update on the Qwen3.5 platform, Qwen3.6 introduces no new base architecture, and its open lineup is narrower than the eight-size Qwen3.5 sweep, with no very large or very small open checkpoints released. [2][8]