Qwen3-Coder-Next
Last reviewed
Jun 3, 2026
Sources
4 citations
Review status
Source-backed
Revision
v1 · 1,455 words
Qwen3-Coder-Next is an open-weight code generation model released by Alibaba's Qwen team on February 3, 2026, under the Apache 2.0 license. It is an ultra-sparse Mixture of Experts (MoE) model with roughly 80 billion total parameters but only about 3 billion activated per token, built on the Qwen3-Next-80B-A3B architecture and tuned for agentic, repository-level coding. Alibaba positions it as a model that delivers performance close to far larger systems while running cheaply enough for local and self-hosted deployment, and claims up to 10 times higher throughput on repository-scale tasks than dense models of comparable total size [1][2]. The model weights are distributed on Hugging Face in several quantization variants, alongside a technical report describing its training pipeline [3][4].
Qwen3-Coder-Next is the latest member of the Qwen3-Coder family, the coding-focused branch of Alibaba's Qwen3 model series. The line opened in July 2025 with Qwen3-Coder-480B-A35B-Instruct, a 480 billion parameter MoE model that activated 35 billion parameters per token and natively handled a 256K-token context [3]. Where that flagship chased absolute capability at a large active footprint, Qwen3-Coder-Next moves in the opposite direction. It keeps the 80 billion parameter total of its Qwen3-Next base but cuts the active parameter count to around 3 billion, trading some peak accuracy for much lower inference cost. The result is a model aimed squarely at the kind of high-volume, long-running agent loops that have become common in tools such as Claude Code, Qwen Code, and Cline, where compute cost per generated token accumulates over hundreds of turns.
The "Next" suffix refers to the underlying Qwen3-Next architecture rather than a simple version bump. It signals that this is the coding specialization of Alibaba's newer efficiency-oriented backbone, not an incremental update to the earlier 480B model.
The model inherits the hybrid design of Qwen3-Next-80B-A3B-Base. It has 48 layers and a hidden dimension of 2,048, and it interleaves two attention mechanisms in a 3-to-1 pattern: three blocks using Gated DeltaNet followed by one block using Gated Attention, repeated twelve times [4]. Gated DeltaNet is a linear-complexity alternative to standard softmax attention, which lets the model carry state across its long context without compute growing quadratically with sequence length. The periodic Gated Attention layers restore some of the expressiveness that pure linear attention gives up. This mix is what makes the 256K window practical to run rather than just nominal.
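The 3-to-1 interleaving can be made concrete with a minimal sketch. It only enumerates the block types implied by the figures above (48 layers, a four-block pattern repeated twelve times); the names `build_layer_schedule`, `gated_deltanet`, and `gated_attention` are illustrative labels, not identifiers from the released code.

```python
# Sketch of the 3:1 hybrid layout described above.
# All names here are illustrative, not taken from the model's source code.
NUM_LAYERS = 48
PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]


def build_layer_schedule(num_layers=NUM_LAYERS, pattern=PATTERN):
    """Repeat the four-block pattern until num_layers blocks are listed."""
    if num_layers % len(pattern) != 0:
        raise ValueError("num_layers must be a multiple of the pattern length")
    repeats = num_layers // len(pattern)
    return pattern * repeats


schedule = build_layer_schedule()
print(len(schedule))                       # 48
print(schedule.count("gated_deltanet"))    # 36
print(schedule.count("gated_attention"))   # 12
```

Under this layout, three quarters of the blocks use the linear-complexity mechanism and one quarter use Gated Attention.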
The MoE component is where the "ultra-sparse" label comes from. The feed-forward layers hold 512 routed experts plus one shared expert, and the router activates only 10 of the routed experts (plus the shared one) for any given token [4]. Because the bulk of the network sits idle on each forward pass, the model touches roughly 3 billion of its 79 billion non-embedding parameters at inference time. That sparsity is the basis for Alibaba's throughput claim: with so few parameters active per token, the model can be served at much higher token rates than a dense 80 billion parameter model, which Alibaba quantifies as up to a 10x improvement on repository-level workloads [1][2]. The model runs in a non-thinking mode and does not emit separate reasoning blocks, a choice the model card frames as appropriate for tool-driven coding agents [4].
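As a rough illustration of top-k expert routing, the sketch below picks 10 of 512 routed experts for one token from a vector of router scores. The random scores, the softmax normalization over the selected experts, and the function name `route_token` are assumptions made for the example; the actual router in the released model may score and normalize differently.

```python
import math
import random

NUM_ROUTED_EXPERTS = 512
TOP_K = 10  # routed experts activated per token; one shared expert is always on


def route_token(router_logits, top_k=TOP_K):
    """Return the indices of the top_k experts and their normalized weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Assumption: softmax over the selected logits only.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    weights = [e / total for e in exps]
    return chosen, weights


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]  # stand-in scores
experts, weights = route_token(logits)

# Share of expert blocks touched per token, counting the shared expert.
active_fraction = (TOP_K + 1) / (NUM_ROUTED_EXPERTS + 1)
print(len(experts), round(active_fraction, 4))
```

The fraction of experts used per token (about 2%) is not the same as the fraction of parameters used, since attention and other always-on components also count toward the roughly 3 billion active parameters.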
| Specification | Value |
|---|---|
| Developer | Alibaba (Qwen team) |
| Release date | February 3, 2026 |
| License | Apache 2.0 |
| Total parameters | ~80 billion |
| Active parameters per token | ~3 billion |
| Layers | 48 |
| Routed experts / activated | 512 / 10 (plus 1 shared) |
| Attention | Gated DeltaNet + Gated Attention (3:1 hybrid) |
| Native context | 262,144 tokens (256K) |
| Inference mode | Non-thinking |
Qwen3-Coder-Next has a native context window of 262,144 tokens, the same 256K figure carried by earlier Qwen3-Coder releases [4]. Several write-ups and the model documentation note that the context window can be pushed further toward one million tokens using length-extrapolation techniques such as YaRN, echoing how the 480B flagship reached 1M tokens [1][3]. The point of a window this large is to load whole repositories, including cross-file dependencies, into a single prompt rather than relying on chunking or retrieval workarounds. Alibaba reports that around 600 billion tokens of repository-level data were used during mid-training specifically to strengthen reasoning over cross-file dependencies [2][4].
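The arithmetic behind these figures is simple enough to check directly. The sketch below confirms that 262,144 tokens is 256 × 1,024 and computes the length-extension ratio implied by a one-million-token target. The helper names and the output-token reserve are hypothetical, and the ratio is plain division, not a documented YaRN setting for this model.

```python
NATIVE_CONTEXT = 262_144      # native window from the specification table
TARGET_CONTEXT = 1_000_000    # "toward one million tokens"


def extension_ratio(native, target):
    """Ratio by which the native window would need to be stretched."""
    return target / native


def fits_natively(prompt_tokens, reserved_for_output=8_192, native=NATIVE_CONTEXT):
    """Check whether a prompt leaves room for output inside the native window.

    reserved_for_output is an arbitrary illustrative budget.
    """
    return prompt_tokens + reserved_for_output <= native


assert NATIVE_CONTEXT == 256 * 1024
ratio = extension_ratio(NATIVE_CONTEXT, TARGET_CONTEXT)
print(round(ratio, 2))            # 3.81
print(fits_natively(200_000))     # True
print(fits_natively(260_000))     # False
```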
The model's training emphasizes executable, verifiable coding rather than static code-text prediction. According to the technical report, the team synthesized roughly 800,000 verifiable coding tasks, many of them real bug-fixing scenarios mined from GitHub pull requests and paired with fully executable environments [2][4]. Training combined continued pretraining on code and agent data, supervised fine-tuning on agent trajectories, training of several domain-specialized expert models, and reinforcement learning in live environments. The reported expert specializations cover areas such as web development, tool-calling user experience, single-turn question answering, and general software engineering, which were then distilled into the single released model [4].
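What "verifiable" means here can be sketched with a toy check: a candidate fix earns a reward only if it passes executable tests. The example below is a simplified stand-in, not the pipeline from the technical report. The function `run_verifier` and the sample bug are invented for illustration, and running code with `exec` in-process is only acceptable for a toy; a real system would use isolated environments.

```python
def run_verifier(candidate_source, test_source):
    """Return 1.0 if the candidate passes every test assertion, else 0.0.

    Toy stand-in for an executable environment: runs the code in-process,
    which is unsafe for untrusted code.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)   # define the candidate function
        exec(test_source, namespace)        # run assertions against it
    except Exception:
        return 0.0
    return 1.0


# Hypothetical bug-fixing task: the original divides by the wrong count.
buggy = "def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n"
fixed = "def mean(xs):\n    return sum(xs) / len(xs)\n"
tests = "assert mean([2, 4, 6]) == 4\nassert mean([5]) == 5\n"

print(run_verifier(buggy, tests))   # 0.0
print(run_verifier(fixed, tests))   # 1.0
```

A binary pass/fail signal of this kind is what distinguishes training on executable tasks from training on static code text.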
To run this at scale, Alibaba built an orchestration system called MegaFlow on top of Alibaba Cloud Kubernetes, expressing each agentic task as a three-stage workflow of rollout, evaluation, and post-processing [2][4]. One emphasis of the report is endurance over long agent loops: the model is designed to keep improving as the number of agent turns grows, with experiments scaling to as many as 300 turns on a single task [4]. That long-horizon behavior is the main reason the model is pitched at agentic AI coding workflows rather than one-shot completion.
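The three-stage shape of such a workflow can be sketched in a few lines. This is a hypothetical outline, not MegaFlow's API: every name below (`TaskResult`, `rollout`, `evaluate`, `post_process`, `run_workflow`) is invented for illustration, and the agent and checker are toy callables. Only the stage order and the 300-turn cap come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    task_id: str
    turns: list = field(default_factory=list)
    passed: bool = False
    record: dict = field(default_factory=dict)


def rollout(task_id, agent_step, max_turns=300):
    """Stage 1: let the agent act until it submits or hits the turn cap."""
    result = TaskResult(task_id)
    for turn in range(max_turns):
        action = agent_step(turn)
        result.turns.append(action)
        if action == "submit":
            break
    return result


def evaluate(result, checker):
    """Stage 2: score the finished trajectory."""
    result.passed = checker(result.turns)
    return result


def post_process(result):
    """Stage 3: condense the trajectory into a training record."""
    result.record = {
        "task_id": result.task_id,
        "num_turns": len(result.turns),
        "reward": 1.0 if result.passed else 0.0,
    }
    return result


def run_workflow(task_id, agent_step, checker):
    return post_process(evaluate(rollout(task_id, agent_step), checker))


def toy_agent(turn):
    # Toy policy: edit four times, then submit on the fifth turn.
    return "submit" if turn == 4 else "edit"


def toy_checker(turns):
    return turns[-1] == "submit"


final = run_workflow("demo-task", toy_agent, toy_checker)
print(final.record)  # {'task_id': 'demo-task', 'num_turns': 5, 'reward': 1.0}
```

Keeping the stages as separate functions mirrors the idea that each can be scheduled independently, for example on different workers.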
On SWE-bench Verified, a benchmark of real GitHub issue resolution, Qwen3-Coder-Next scores 70.6% using the SWE-Agent scaffold, and slightly higher with other harnesses, reaching 71.1% with mini-SWE-agent and 71.3% with OpenHands [4]. It posts 56.2% on SWE-bench Pro, 62.8% on SWE-bench Multilingual, and 36.2% on TerminalBench 2.0 with the Terminus2 setup [4].
Where the model stands out is security. On SecCodeBench, which measures vulnerability-aware code generation and repair, it scores 61.2%, ahead of Claude Opus 4.5 at 52.5%, and on the CWEval func-sec@1 metric it reaches 56.32%, above DeepSeek V3.2 and GLM-4.7 [2][4]. The model card notes these scores held up even without explicit security hints, which the team attributes to its agentic training on executable tasks [2].
| Benchmark | Qwen3-Coder-Next | Comparison models |
|---|---|---|
| SWE-bench Verified (SWE-Agent) | 70.6% | DeepSeek V3.2 70.2%; GLM-4.7 74.2%; Claude Opus 4.5 78.2% |
| SWE-bench Verified (mini-SWE-agent) | 71.1% | -- |
| SWE-bench Verified (OpenHands) | 71.3% | -- |
| SWE-bench Pro | 56.2% | -- |
| SWE-bench Multilingual | 62.8% | -- |
| TerminalBench 2.0 (Terminus2) | 36.2% | -- |
| SecCodeBench | 61.2% | Claude Opus 4.5 52.5% |
The headline of the comparison is efficiency rather than a clean win. On SWE-bench Verified, Qwen3-Coder-Next's 70.6% trails GLM-4.7 at 74.2% and Claude Opus 4.5 at 78.2%, and it sits essentially level with DeepSeek V3.2 at 70.2% [4]. What makes that competitive standing notable is the active parameter gap: DeepSeek V3.2 and GLM-4.7 activate on the order of tens of billions of parameters per token, while Qwen3-Coder-Next uses about 3 billion. Alibaba frames this as reaching SWE-bench Pro performance comparable to models that activate 10 to 20 times more parameters [2]. The practical payoff is cost and speed. Early community testing reported the native FP8 build running at roughly 43 tokens per second on a single NVIDIA DGX Spark, which puts capable agentic coding within reach of local hardware [2].
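The figures in this comparison can be turned into back-of-the-envelope numbers. The sketch below derives the active-parameter range implied by "10 to 20 times more" than about 3 billion, and the wall-clock time to generate a given number of tokens at the reported 43 tokens per second. The 10,000-token workload is an arbitrary example, and the estimate ignores prompt processing time.

```python
QWEN_ACTIVE_PARAMS_B = 3.0        # approx. active parameters, in billions
MULTIPLIER_RANGE = (10, 20)       # "10 to 20 times more parameters"
REPORTED_TOKENS_PER_SEC = 43.0    # community-reported FP8 rate on DGX Spark


def implied_active_range(base_b, multipliers):
    """Active-parameter range (billions) of the models being compared."""
    low, high = multipliers
    return base_b * low, base_b * high


def generation_seconds(num_tokens, tokens_per_sec=REPORTED_TOKENS_PER_SEC):
    """Decode time only; prompt processing is not included."""
    return num_tokens / tokens_per_sec


low_b, high_b = implied_active_range(QWEN_ACTIVE_PARAMS_B, MULTIPLIER_RANGE)
print(low_b, high_b)                          # 30.0 60.0
print(round(generation_seconds(10_000), 1))   # 232.6
```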
VentureBeat described the release as giving "vibe coders" a powerful open-source option with much higher throughput for repository tasks, and several outlets cast it as a direct challenge to proprietary coding models from Anthropic and OpenAI, since the Apache 2.0 weights allow self-hosting, commercial use, and full control over data [1][2]. For teams weighing the broader trade-off between closed and open systems, the model is a useful data point in the proprietary versus open-source LLM debate: it does not top the frontier leaderboards, but it narrows the gap enough that the cost and licensing advantages become hard to ignore. On the hardest issues, frontier proprietary models still lead, while for high-volume agent loops and security-sensitive code, an open-weight model running on self-hosted hardware starts to look like the better deal.