Qwen3.5

Updated Jun 9, 2026

Qwen3.5 is a family of open-weight large language models developed by the Qwen team at Alibaba, the successor to the Qwen3 series. ^[1]^[2] The line debuted on February 16, 2026 with a flagship model, Qwen3.5-397B-A17B, that pairs a sparse mixture of experts (397 billion total parameters, about 17 billion active per token) with linear attention via Gated DeltaNet, and ships as a natively multimodal system covering 201 languages and dialects. ^[1]^[3]^[4] Like earlier Qwen flagships, the open-weight checkpoints are released under the permissive Apache 2.0 license. ^[4]^[5]

Overview

Qwen3.5 is the fourth major generation of the Qwen model family. The Qwen team frames it under the tagline "Towards Native Multimodal Agents," signaling two priorities: folding vision into the model from the start rather than bolting it on afterward, and pushing agentic, tool-using behavior. ^[2]^[6] Every model in the family, from the 397-billion-parameter flagship down to a sub-billion-parameter checkpoint, is built on the same hybrid architecture and the same early-fusion vision-language foundation, so even the smallest variants accept image and video input alongside text. ^[2]^[7]

The generation's defining engineering choice is a hybrid attention stack. Instead of using full softmax attention in every layer, Qwen3.5 interleaves three layers of Gated DeltaNet, a form of linear attention whose recurrent state stays a fixed size no matter how long the sequence is, for every one layer of conventional gated attention. ^[3]^[8] Combined with a sparse MoE feed-forward design, this lets the larger models hold a native 262,144-token context and decode long sequences far faster than the previous Qwen3-Max flagship while activating only a small fraction of their weights on any given token. ^[3]^[4]
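The 3:1 interleave can be illustrated with a short sketch. It assumes a 60-layer stack (the depth the lineup table lists for the flagship) and that each group of four places the three Gated DeltaNet layers before the gated-attention layer; the function name and layer labels are hypothetical and do not come from the Qwen codebase.

```python
from collections import Counter


def hybrid_layout(num_layers: int, linear_per_block: int = 3) -> list[str]:
    """Label each layer: `linear_per_block` Gated DeltaNet layers, then one gated-attention layer."""
    block_size = linear_per_block + 1
    if num_layers % block_size != 0:
        raise ValueError(f"{num_layers} layers do not split into blocks of {block_size}")
    pattern = ["gated_deltanet"] * linear_per_block + ["gated_attention"]
    return pattern * (num_layers // block_size)


# Assumed depth: 60 layers, as listed for Qwen3.5-397B-A17B.
layers = hybrid_layout(60)
counts = Counter(layers)
print(counts["gated_deltanet"], counts["gated_attention"])  # 45 15
```

Under these assumptions only a quarter of the layers keep a cache that grows with the sequence.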

Development and release

Alibaba's Qwen team rolled out the Qwen3.5 family in three waves over roughly two weeks, leading with the largest model and filling in smaller sizes afterward. ^[1]^[2]^[6]

The flagship Qwen3.5-397B-A17B arrived first, on February 16, 2026, accompanied by the hosted Qwen3.5-Plus endpoint on Alibaba Cloud's Model Studio. ^[1]^[9] Mid-size checkpoints (Qwen3.5-122B-A10B, Qwen3.5-35B-A3B, and Qwen3.5-27B) followed on February 24, 2026, and a set of smaller models (Qwen3.5-9B, Qwen3.5-4B, Qwen3.5-2B, and Qwen3.5-0.8B) shipped on March 2, 2026 to Hugging Face and ModelScope. ^[2]^[6] The release landed in a crowded stretch of Chinese open-weight launches, and commentators read it as evidence that the open-weight race was not slowing down. ^[5]

Qwen3.5 was a relatively short-lived flagship. The Qwen team moved on to a Qwen3.6 generation in April 2026, but Qwen3.5 remains the reference point for the architecture, since Qwen3.6 inherited the same hybrid Gated DeltaNet plus MoE design. ^[6]

Model lineup and sizes

The Qwen3.5 open-weight release spans eight checkpoints. The three largest use a sparse MoE feed-forward network and carry an "A" suffix denoting active parameters (for example, "A17B" means 17 billion active per token); the other five, from Qwen3.5-27B down, replace the MoE blocks with dense feed-forward layers but keep the same 3:1 Gated DeltaNet to gated attention pattern. ^[3]^[4]^[7] All variants are natively multimodal and share a native 262,144-token context that can be extended toward roughly one million tokens using RoPE scaling. ^[4]^[7]

Model	Total params	Active params	Experts (total / routed + shared)	Layers	FFN type	Release
Qwen3.5-397B-A17B	397B	17B	512 / 10 + 1	60	MoE	Feb 16, 2026
Qwen3.5-122B-A10B	122B	10B	256 / 8 + 1	48	MoE	Feb 24, 2026
Qwen3.5-35B-A3B	35B	3B	256 / 8 + 1	40	MoE	Feb 24, 2026
Qwen3.5-27B	27B	27B (dense)	n/a	64	Dense	Feb 24, 2026
Qwen3.5-9B	9B	9B (dense)	n/a	32	Dense	Mar 2, 2026
Qwen3.5-4B	4B	4B (dense)	n/a	n/a	Dense	Mar 2, 2026
Qwen3.5-2B	2B	2B (dense)	n/a	n/a	Dense	Mar 2, 2026
Qwen3.5-0.8B	0.8B	0.8B (dense)	n/a	n/a	Dense	Mar 2, 2026

Sources: official Hugging Face model cards and the Qwen GitHub release notes. ^[4]^[6]^[7]^[10]^[11]
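The naming convention in the table can be read mechanically. The sketch below parses a checkpoint name into total and active parameter counts and derives the share of weights active per token; the regular expression and the `parse_checkpoint` helper are illustrative assumptions, not part of any Qwen tooling.

```python
import re

# Matches names such as "Qwen3.5-397B-A17B" (MoE) and "Qwen3.5-0.8B" (dense).
NAME_RE = re.compile(
    r"^Qwen3\.5-(?P<total>\d+(?:\.\d+)?)B(?:-A(?P<active>\d+(?:\.\d+)?)B)?$"
)


def parse_checkpoint(name: str) -> dict:
    """Split a checkpoint name into total/active parameters (in billions)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    total = float(match.group("total"))
    is_moe = match.group("active") is not None
    # Dense models have no "A" suffix: every parameter is active.
    active = float(match.group("active")) if is_moe else total
    return {
        "total_b": total,
        "active_b": active,
        "moe": is_moe,
        "active_fraction": active / total,
    }


for name in ("Qwen3.5-397B-A17B", "Qwen3.5-35B-A3B", "Qwen3.5-0.8B"):
    info = parse_checkpoint(name)
    print(f"{name}: {info['active_fraction']:.1%} of weights active per token")
```

For the flagship this works out to roughly 4% of the weights per token, which is the ratio the table's first row implies.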

Architecture

Qwen3.5 is built around a hybrid attention design that the model cards describe as an interleave of Gated DeltaNet and gated attention. ^[3]^[4] In the flagship, the 60 layers are organized as 15 repeated blocks, each laid out as three Gated DeltaNet sublayers followed by one gated-attention sublayer, with a MoE feed-forward network after every sublayer. ^[4]^[8] Gated DeltaNet is a linear-attention mechanism whose memory footprint does not grow with sequence length, which is what makes very long contexts cheap to serve; the periodic full-attention layers preserve the precise long-range recall that pure linear attention tends to lose. ^[3]^[8]
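Why the memory footprint stays flat can be seen in a toy version of the gated delta rule, the recurrence published for Gated DeltaNet: the layer keeps a single state matrix and rewrites it in place for each token instead of appending to a cache. The sketch below is a simplified single-head version in plain Python; the scalar gates `alpha` (decay) and `beta` (write strength) are passed in by hand, and the article does not describe how Qwen3.5 computes them or what normalization it applies, so those details are assumptions.

```python
def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step: S <- alpha * (S - beta * (S k) k^T) + beta * v k^T.

    S is a d_v x d_k state matrix; returns the new state and the output S q.
    """
    d_v, d_k = len(S), len(S[0])
    # What the memory currently returns for key k.
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    S_new = [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]
    out = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, out


d_k, d_v = 4, 2
S = [[0.0] * d_k for _ in range(d_v)]
key = [1.0, 0.0, 0.0, 0.0]  # unit-norm key

# Write a value under `key`, then overwrite it with a second value.
S, out1 = gated_delta_step(S, key, key, [2.0, 3.0], alpha=1.0, beta=1.0)
S, out2 = gated_delta_step(S, key, key, [5.0, 7.0], alpha=1.0, beta=1.0)
print(out1, out2)              # [2.0, 3.0] [5.0, 7.0]
print(len(S), len(S[0]))       # state is still 2 x 4
```

The second write replaces the first association rather than adding to it, and the state has the same shape after any number of tokens. A fixed-size state cannot hold every past token exactly, which is the recall gap the periodic full-attention layers are meant to cover.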

In the 397B flagship the Gated DeltaNet sublayers use 64 linear-attention heads for values and 16 for queries and keys (head dimension 128), while the gated-attention sublayers use 32 query heads and 2 key/value heads (head dimension 256) with a 64-dimensional rotary position embedding. ^[4] The hidden dimension is 4,096. The MoE layers hold 512 experts, of which a router activates 10 plus 1 always-on shared expert per token, so 11 of 512 experts fire on any forward pass, which is how a 397-billion-parameter network runs with only about 17 billion active parameters. ^[3]^[4] The mid-size MoE models reuse the same template at smaller scale: the 122B-A10B and 35B-A3B each use 256 experts with 8 routed plus 1 shared, differing in layer count, hidden size, and active-parameter budget. ^[10]^[11]
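The routing arithmetic can be sketched in a few lines. The example picks the 10 highest-scoring experts out of 512 from random router logits and adds the always-on shared expert, following the article's counting. Normalizing the selected scores with a softmax is an assumption made for illustration; the article does not specify how the router weights its choices.

```python
import math
import random

NUM_EXPERTS = 512  # flagship expert count, per the article
TOP_K = 10         # routed experts per token
SHARED = 1         # always-on shared expert


def route(router_logits, top_k):
    """Pick the top-k experts and softmax-normalize their scores (assumed scheme)."""
    top = sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)[:top_k]
    peak = max(router_logits[e] for e in top)  # subtract the max for numerical stability
    exps = {e: math.exp(router_logits[e] - peak) for e in top}
    total = sum(exps.values())
    return {e: w / total for e, w in exps.items()}


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route(logits, TOP_K)

active = len(weights) + SHARED
print(f"{active} of {NUM_EXPERTS} experts active ({active / NUM_EXPERTS:.1%})")
```

Only the selected experts' feed-forward weights are touched for a given token, which is why the active-parameter count is so much smaller than the total.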

A second pillar is native multimodality. Rather than attaching a vision adapter to a finished text model, Qwen3.5 was pretrained with early fusion on trillions of multimodal tokens, so a vision encoder feeds the language model from the outset; the model cards expose the checkpoints as image-text-to-text systems and note that a flag can run them in text-only mode. ^[2]^[7]^[10] The Qwen team reports that this approach reached near-parity in training efficiency with text-only pretraining, and that the resulting models match or beat the dedicated Qwen3-VL vision models on visual reasoning while still improving on text. ^[2]^[7] The flagship is trained with multi-token prediction, which supports speculative decoding for faster generation. ^[4]

Training data and context length

Alibaba has not published a full breakdown of the Qwen3.5 pretraining corpus, but the team states that the models were trained on trillions of multimodal tokens spanning text, images, and video, with a post-training stage that scales reinforcement learning for reasoning and agentic behavior. ^[2]^[7] The tokenizer vocabulary was enlarged from roughly 150,000 entries in Qwen3 to about 250,000, part of the work that widened language coverage from 119 languages to 201. ^[3]^[12]

Every Qwen3.5 model has a native context window of 262,144 tokens (256K). ^[4]^[7] Using RoPE-based length extrapolation such as YaRN, the open weights can be pushed toward roughly 1,010,000 tokens, and the hosted Qwen3.5-Plus endpoint exposes a one-million-token context window by default. ^[4]^[7]^[9] The hybrid attention stack is what makes contexts this large economical: three of every four layers use linear attention, whose per-token decoding cost and state size stay flat as the sequence grows, so only the remaining gated-attention layers pay a cost that rises with context length (linearly per generated token, quadratically over a full sequence).
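A back-of-the-envelope estimate shows the effect on key/value cache size. The sketch uses the flagship figures from the Architecture section (15 gated-attention layers out of 60, 2 key/value heads, head dimension 256) and assumes 2 bytes per cached value (16-bit storage) and a single sequence. It ignores the fixed-size Gated DeltaNet states and any serving-framework overhead, so the numbers are indicative only.

```python
def kv_cache_bytes(tokens, attn_layers, kv_heads=2, head_dim=256, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor of 2 accounts for storing both a key and a value per head.
    return tokens * attn_layers * 2 * kv_heads * head_dim * bytes_per_value


NATIVE_CONTEXT = 262_144
GIB = 2**30

hybrid = kv_cache_bytes(NATIVE_CONTEXT, attn_layers=15)         # 3:1 hybrid stack
all_attention = kv_cache_bytes(NATIVE_CONTEXT, attn_layers=60)  # hypothetical all-attention stack

print(f"hybrid: {hybrid / GIB:.1f} GiB, all-attention: {all_attention / GIB:.1f} GiB")
# hybrid: 7.5 GiB, all-attention: 30.0 GiB
```

Under these assumptions the hybrid layout needs a quarter of the cache that an all-attention stack of the same depth would need at full native context.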

Capabilities and multilingual coverage

Qwen3.5 is positioned as a generalist agentic model. The Qwen team highlights long-horizon reasoning, tool use, and multimodal understanding spanning images, documents, charts, and video, alongside strong coding and software-engineering performance. ^[2]^[7] The larger models support a "thinking" mode, enabled by default, in which the model produces an internal reasoning trace before its final answer; this can be switched off for latency-sensitive use. ^[10]

The headline change for breadth is language coverage. Qwen3.5 supports 201 languages and dialects, up from 119 in Qwen3, which the team presents as one of the generation's main selling points. ^[2]^[3] This shows up in benchmark results, where the flagship's multilingual scores are among its strongest relative to Western frontier models. ^[3]^[4]

The efficiency story is equally central. Because of the hybrid attention design and sparse activation, the Qwen team reports decoding-throughput gains of roughly 8.6x at 32K context and up to 19x at 256K context relative to Qwen3-Max, which independent coverage echoed alongside claims of materially lower serving cost. ^[3]^[9]

Benchmark performance

Qwen reports that the flagship Qwen3.5-397B-A17B is competitive with leading proprietary frontier models of the period, including GPT-5.2, Gemini 3 Pro, and Claude Opus 4.5, leading on multilingual and several agentic and vision tasks while trailing on some math-competition and long-context evaluations. ^[3]^[5] The figures below are drawn from the official model cards. As with any vendor-reported numbers, scores reflect the publisher's own evaluation setup and are best treated as indicative.

Benchmark	Qwen3.5-397B-A17B	Qwen3.5-27B
MMLU-Pro	87.8	86.1
GPQA	88.4	n/a
AIME 2026	91.3	n/a
LiveCodeBench v6	83.6	n/a
SWE-bench Verified	76.4	72.4
Tau2-bench	86.7	n/a
IFEval	92.6	95.0
LongBench v2	63.2	60.6
MMMLU (multilingual)	88.5	n/a

Reported vision-language results for the flagship include 85.0 on MMMU, 79.0 on MMMU-Pro, 88.6 on MathVision, 90.8 on OmniDocBench v1.5, and 87.5 on Video-MME with subtitles, which the Qwen team cites as evidence that the unified model surpasses the earlier dedicated Qwen3-VL line on visual reasoning. ^[4]^[7] Independent hands-on testing by analysts at the time described a model with balanced strength across reasoning, agentic execution, coding, and multimodal understanding rather than a single standout axis. ^[13]

Vision-language benchmark	Qwen3.5-397B-A17B
MMMU	85.0
MMMU-Pro	79.0
MathVision	88.6
OmniDocBench v1.5	90.8
Video-MME (with subtitles)	87.5
VideoMMMU	84.7

Sources: official Qwen3.5-397B-A17B and Qwen3.5-27B model cards. ^[4]^[7]

License and availability

The Qwen3.5 open-weight checkpoints are released under Apache 2.0, which permits commercial use, modification, and redistribution. ^[4]^[5] Weights are distributed through Hugging Face and ModelScope, and the family is also reachable as a hosted service. The Qwen3.5-Plus API on Alibaba Cloud's Model Studio serves the flagship-class model with a one-million-token default context and accepts text, image, and video input. ^[7]^[9]

Channel	What is offered	License / terms
Hugging Face	All eight open-weight checkpoints	Apache 2.0
ModelScope	All eight open-weight checkpoints	Apache 2.0
Qwen3.5-Plus (Alibaba Cloud Model Studio)	Hosted flagship-class API, 1M context, multimodal	Commercial API (usage-based)
Qwen Chat	Consumer chat interface	Service terms

Published API rates for the hosted endpoint vary by source and over time, and Alibaba's documentation describes tiered billing based on prompt size, so exact per-token prices are best taken from the current Model Studio pricing page rather than secondary write-ups. ^[9]^[14]

Reception

Coverage of Qwen3.5 centered on three points: the unusually high ratio of total to active parameters, the hybrid linear-attention architecture, and the decision to ship a strong multimodal flagship as free open weights. The Decoder framed the release as a sign that China's open-weight push was "far from slowing down," noting the jump to 201 languages, the larger 250,000-token vocabulary, and benchmark results that put the model close to GPT-5.2 on agentic tasks while leading on instruction following and multilingual work. ^[5] DeepLearning.AI's The Batch described the model as a hybrid of linear attention via Gated DeltaNet with sparse MoE that is competitive with GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on knowledge, coding, reasoning, and multimodal tasks, while trailing the leaders on math competitions and long-context evaluations. ^[3] Technical outlets emphasized the throughput gains over Qwen3-Max and the fact that even sub-billion-parameter members of the family are natively multimodal. ^[9]^[13]

Limitations

The published results and commentary point to a few consistent caveats. On the capability side, the flagship trails the strongest proprietary systems on some math-competition benchmarks and on certain long-context evaluations, even as it leads on multilingual and instruction-following tasks. ^[3]^[5] On the engineering side, the hybrid linear-attention design is comparatively new, and both aggressive sparsity (only 11 of 512 experts active per token in the flagship) and low-precision serving involve quality-versus-efficiency tradeoffs that are still being characterized in practice. ^[13] As with other open-weight releases, the reported benchmark figures are vendor-published and reflect Alibaba's own evaluation setup, so independent reproduction is needed to gauge real-world performance. Finally, Alibaba has not disclosed the full composition of the training data, which limits outside scrutiny of the corpus. ^[7]

References

1. MarkTechPost, "Alibaba Qwen Team Releases Qwen3.5-397B MoE Model with 17B Active Parameters and 1M Token Context for AI agents," February 16, 2026. https://www.marktechpost.com/2026/02/16/alibaba-qwen-team-releases-qwen3-5-397b-moe-model-with-17b-active-parameters-and-1m-token-context-for-ai-agents/
2. Qwen team (Alibaba), Qwen GitHub repository (README, Qwen3.5 release notes and introduction). https://github.com/QwenLM/Qwen3.5
3. DeepLearning.AI, The Batch, "Qwen announces new open-weights flagship update." https://www.deeplearning.ai/the-batch/qwen-announces-new-open-weights-flagship-update
4. Hugging Face, "Qwen/Qwen3.5-397B-A17B" model card. https://huggingface.co/Qwen/Qwen3.5-397B-A17B
5. The Decoder, "Alibaba's free Qwen3.5 signals that China's open-weight model race is far from slowing down." https://the-decoder.com/alibabas-free-qwen3-5-signals-that-chinas-open-weight-ai-race-is-far-from-slowing-down/
6. Qwen team (Alibaba), GitHub release notes listing Qwen3.5 model dates and lineup. https://github.com/QwenLM/Qwen3.5/blob/main/README.md
7. Hugging Face, "Qwen/Qwen3.5-9B" model card. https://huggingface.co/Qwen/Qwen3.5-9B
8. Modular, "Qwen3.5 397B Inference, Latest Alibaba Vision MoE." https://www.modular.com/models/qwen3-5-397b-a17b
9. Analytics Vidhya, "We Tested The New Qwen3.5 Open Weight, Qwen3.5-Plus AI Models in Real Hands-on Tests," February 2026. https://www.analyticsvidhya.com/blog/2026/02/qwen3-5-open-weight-qwen3-5-plus/
10. Hugging Face, "Qwen/Qwen3.5-122B-A10B" model card. https://huggingface.co/Qwen/Qwen3.5-122B-A10B
11. Hugging Face, "Qwen/Qwen3.5-35B-A3B" model card. https://huggingface.co/Qwen/Qwen3.5-35B-A3B
12. Hugging Face, "Qwen/Qwen3.5-27B" model card. https://huggingface.co/Qwen/Qwen3.5-27B
13. ChatForest, "Qwen 3.5 Review: Alibaba's Native Multimodal Agent Family." https://chatforest.com/reviews/alibaba-qwen-3-5-multimodal-agent-llm-review/
14. Alibaba Cloud, "Model Studio model pricing." https://www.alibabacloud.com/help/en/model-studio/model-pricing
