Qwen3-Coder
Last reviewed
May 17, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 · 2,993 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 17, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 · 2,993 words
Add missing citations, update stale details, or suggest a clearer explanation.
Qwen3-Coder is a family of large language models specialized for software engineering, developed by Alibaba's Tongyi Lab as the coding-focused branch of the Qwen3 series. The flagship release, Qwen3-Coder-480B-A35B-Instruct, was announced on July 22, 2025, and is a 480 billion parameter Mixture of Experts (MoE) model that activates 35 billion parameters per forward pass. It natively supports a 256,000 token context window and scales to one million tokens using YaRN extrapolation, which Alibaba positions as a key feature for whole-repository code understanding [1][2]. The model is released under the Apache 2.0 license on Hugging Face and GitHub, and ships alongside a companion command-line agent called Qwen Code that adapts the Gemini CLI codebase for Qwen-style tool calling [3][4].
Qwen3-Coder is the second major coding-specialized release from the Qwen team, following Qwen2.5-Coder in late 2024. Where the earlier model focused on raw code completion and infill performance, Qwen3-Coder centers its design on agentic coding: long-horizon tasks where the model invokes tools, runs commands, edits files across a repository, and recovers from execution errors. Alibaba reports that the 480B variant achieves state-of-the-art results among open-weight models on agentic coding benchmarks, with the company claiming parity with Claude Sonnet 4 on several internal evaluations [2][5]. A more compact Qwen3-Coder-30B-A3B-Instruct followed in August 2025, and a hybrid-attention successor, Qwen3-Coder-Next, was released in early 2026 with markedly higher SWE-Bench scores despite running only three billion active parameters [6][7].
The Qwen team at Alibaba Cloud has shipped coding-specialized models since CodeQwen-7B in early 2024. The Qwen2.5-Coder line, released across multiple sizes from 0.5B to 32B in November 2024, drew attention for matching closed-source coding APIs on HumanEval and MBPP at the 32B size, and the 7B and 14B variants became common defaults for self-hosted coding assistants. With the broader Qwen3 family launching in April 2025 under a unified hybrid reasoning architecture, a coding-specialized successor was widely expected.
Qwen3-Coder was announced on the official @Alibaba_Qwen account on X on July 22, 2025 with the line: "Qwen3-Coder is here! We're releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date." The post emphasized two design choices that distinguish the model from earlier Qwen coding releases: native long-context training rather than position-encoding tricks at inference time, and a heavy investment in long-horizon reinforcement learning rather than supervised fine-tuning alone [1][2]. The release coincided with a wave of strong open-weight coding models that summer, including Moonshot's Kimi K2, Z.AI's GLM-4.5, and DeepSeek-V3.1, which together pushed open-source coding benchmarks within striking distance of frontier closed APIs.
The model family expanded over the following months. Qwen3-Coder-30B-A3B-Instruct, a smaller MoE variant aimed at single-GPU and consumer-hardware deployment, was published on Hugging Face in August 2025, and quantized GGUF, FP8, AWQ, and NVFP4 distributions appeared from community and vendor partners shortly after [3]. Qwen3-Coder-Next, an 80B total / 3B active hybrid-attention model built on the Qwen3-Next architecture, followed in early 2026 and became the high-water mark for the family's efficiency frontier [6][7].
Qwen3-Coder is published as several open-weight checkpoints with shared training recipes but different scales and architectures.
| Variant | Total parameters | Active parameters | Architecture | Native context | Release |
|---|---|---|---|---|---|
| Qwen3-Coder-480B-A35B-Instruct | 480B | 35B | MoE, 160 experts, 8 active | 256K (1M with YaRN) | July 22, 2025 |
| Qwen3-Coder-30B-A3B-Instruct | 30B | 3.3B | MoE | 256K | August 2025 |
| Qwen3-Coder-480B-A35B-Instruct-FP8 | 480B | 35B | MoE, FP8 quantization | 256K | July 2025 |
| Qwen3-Coder-Next (Qwen3-Coder-80B-A3B) | 80B | 3B | Hybrid attention + sparse MoE, 512 experts, 10 active | 256K | Early 2026 |
The 480B model is the headline release and the only variant Alibaba evaluates against closed APIs in its launch materials. The 30B model is the practical default for local inference, since it fits in roughly 15 GB of RAM at 4-bit quantization and runs comfortably on a single 24 GB consumer GPU [3]. Qwen3-Coder-Next is a distinct architectural lineage: it inherits the linear-attention plus gated-attention hybrid stack from Qwen3-Next-80B-A3B-Base and was post-trained explicitly for coding agents and local development [6].
All Qwen3-Coder checkpoints ship as Instruct variants only. Unlike the open Qwen3 base models, which expose a single weight set that toggles between thinking and non-thinking modes through a chat-template flag, Qwen3-Coder does not generate <think></think> blocks. Alibaba describes it as a non-thinking model on the grounds that long internal monologues hurt latency and tool-use throughput in coding workflows [3].
The flagship Qwen3-Coder-480B-A35B-Instruct uses a sparse MoE transformer with 62 decoder layers and grouped-query attention. Each multi-head attention block has 96 query heads and 8 key-value heads, and the MoE feed-forward layers contain 160 experts of which 8 are routed per token, giving the 35 billion active parameter count [3]. The model uses BF16 precision in its reference distribution, with FP8 and 4-bit quantizations published by Alibaba and community packagers for memory-constrained inference.
| Specification | Qwen3-Coder-480B-A35B |
|---|---|
| Total parameters | 480B |
| Active parameters per token | 35B |
| Decoder layers | 62 |
| Attention heads (query / key-value) | 96 / 8 |
| MoE experts | 160 |
| Active experts per token | 8 |
| Native context length | 262,144 tokens |
| Extended context (YaRN) | up to 1,048,576 tokens |
| Tokenizer | Qwen3 BPE (same as Qwen3 base) |
| Precision | BF16 (FP8 variant also released) |
| License | Apache 2.0 |
The attention pattern is dense across all layers; Qwen3-Coder does not use sliding-window or interleaved local-global attention for its primary checkpoint, which is one reason the model is heavy at inference time relative to its active parameter count. The successor Qwen3-Coder-Next switches to a hybrid linear-plus-gated-attention design that trades some attention precision for much lower KV cache requirements and roughly 10x higher throughput on repository-scale prompts [6].
Alibaba describes the Qwen3-Coder training pipeline as a two-stage process: a coding-heavy pretraining phase and a multi-component post-training phase centered on reinforcement learning.
The pretraining corpus totals 7.5 trillion tokens, with code making up roughly 70 percent of the mix [2][5]. The remainder is general text and mathematics, included to preserve the natural-language reasoning and instruction-following capabilities the Qwen3 base models are known for. Alibaba reports using Qwen2.5-Coder as a quality filter and rewriter for noisy or low-quality code, an example of the synthetic data bootstrapping pattern that became common in 2025 frontier training. The training data spans dozens of programming languages, but Python, JavaScript and TypeScript, Java, Go, Rust, and C++ are the languages the team explicitly tunes for.
Long-context capability is baked in during pretraining rather than added later. The native 256K context is achieved with extended-RoPE position encodings during pretraining, and the one-million-token figure is obtained by applying YaRN at inference time without further weight updates [3].
The post-training stage is where Qwen3-Coder diverges most sharply from earlier coding models. Alibaba describes two parallel reinforcement learning tracks:
The first is what the team calls Code RL: execution-driven reinforcement learning on a broad set of real-world coding tasks, including issue-fixing benchmarks, build-system tasks, and refactoring problems. Reward signals come from running the produced code against unit tests and reference solutions rather than from human preference labels. Alibaba argues that unlike most domains, code has a natural, automatic reward signal, which makes large-scale RL practical [2].
The second is long-horizon RL, sometimes called Agent RL, where the model is trained to act as an autonomous coding agent across many turns. According to Alibaba, training was run in more than 20,000 parallel environments where the model used tools, browsed file systems, and interacted with mock browsers and shells to complete multi-step tasks [2][5]. The reward signal for these rollouts comes from final task completion, with shaping terms for tool-call validity and intermediate checkpoint success. This emphasis on agentic post-training is the main reason the model performs well on benchmarks like SWE-Bench Verified, where the underlying task requires editing several files in a real repository and then passing a hidden test suite.
Alibaba's launch announcement claims that Qwen3-Coder-480B-A35B-Instruct sets new state-of-the-art results among open models on agentic coding, agentic browser use, and agentic tool use, and is comparable to Claude Sonnet 4 on real-world software engineering benchmarks [2]. Independent reproductions and third-party leaderboards reported a range of scores for SWE-Bench Verified depending on the agent scaffold used.
| Benchmark | Qwen3-Coder-480B-A35B | Notes |
|---|---|---|
| SWE-Bench Verified | 67.0 to 70.6 | Varies by scaffold (OpenHands, SWE-Agent, MiniSWE-Agent) |
| SWE-Bench Pro | 38.7 | ScaleAI internal evaluation |
| Terminal-Bench 2.0 | 23.9 | Harbor framework |
| LiveCodeBench | Competitive with GPT-4o and Claude Sonnet 4 | Per Alibaba launch blog |
The SWE-Bench Verified score depends heavily on the agent harness used, since the benchmark measures end-to-end repository edits rather than single-shot code completion. Nebius's OpenHands deployment of Qwen3-Coder-480B reported roughly 67 percent at launch, while later runs with tuned scaffolds reached the low 70s [8]. By comparison, Claude Sonnet 4.5 reaches about 77 percent and GPT-5 Codex sits near 80 percent on the same benchmark with their respective official agents.
Qwen3-Coder-Next, despite running only 3 billion active parameters, is reported to reach 70.6 percent on SWE-Bench Verified with SWE-Agent, 71.1 percent with MiniSWE-Agent, and 71.3 percent with OpenHands, matching or exceeding the original 480B flagship on the same scaffolds [6]. This is the headline efficiency claim of the Qwen3-Coder line: that an 80B / 3B-active hybrid model can match or beat a 480B / 35B-active dense-attention MoE on the most-watched coding benchmark.
Alongside the model weights, Alibaba released Qwen Code, an open-source command-line agent designed to run Qwen3-Coder locally or against a remote API. The tool is adapted from Google's Gemini CLI codebase under its Apache 2.0 license, with the function-calling format reworked to match Qwen3-Coder's tool-call schema and the parser tightened around Qwen's structured output [4][9]. The CLI exposes the model as an interactive shell assistant that can read and modify files in the working directory, run commands, and chain tool calls into multi-step workflows.
Qwen Code targets the same developer niche as Claude Code, GitHub Copilot CLI, and OpenAI's Codex CLI, but its open-source license and compatibility with self-hosted Qwen3-Coder endpoints make it the default agent shell for users who want to run a coding assistant against on-premise hardware. Integrations exist for VS Code, Zed, and JetBrains IDEs through plugin layers maintained by the community, and the CLI can also be pointed at OpenRouter, Alibaba Cloud's Model Studio API, or any OpenAI-compatible endpoint serving Qwen3-Coder weights [4].
The Qwen team also published explicit support for popular third-party coding agents. CLINE, OpenHands, Aider, and Cursor's terminal mode all expose Qwen3-Coder as a first-class backend, and the model's tool-calling format was designed to be compatible with the JSON tool-call conventions those agents already used [3].
Qwen3-Coder is most often compared with Claude Sonnet 4, GPT-5 Codex, and other open coding models such as Kimi K2 and GLM-4.5. The picture that emerged across third-party evaluations during late 2025 and into 2026 is one where Qwen3-Coder consistently lands in the second tier of coding capability behind the leading closed APIs, but distinguishes itself on cost, openness, and local-inference viability.
| Model | SWE-Bench Verified | Context | License | Approx. input cost per million tokens |
|---|---|---|---|---|
| Claude Sonnet 4.5 | ~77% | 200K | Closed | $3.00 |
| GPT-5 Codex | ~80% | 400K | Closed | $1.75 |
| Qwen3-Coder-480B-A35B | ~67 to 71% | 256K (1M extrap) | Apache 2.0 | $0.18 (OpenRouter) |
| Qwen3-Coder-Next | ~70 to 74% | 256K | Apache 2.0 | $0.10 (OpenRouter) |
| DeepSeek-V3.2 | ~70% | 128K | MIT | $0.14 |
| GLM-4.7 | ~74% | 200K | MIT | $0.20 |
The cost gap is the easiest pitch for Qwen3-Coder. At roughly one-tenth the per-token cost of GPT-5 Codex on third-party hosts like OpenRouter, the model is the default choice for high-volume agentic workflows where each task may consume hundreds of thousands of tokens [10]. Practical reviews suggest that Claude Sonnet 4 and GPT-5 Codex still produce more complete and reliable implementations on complex refactors, while Qwen3-Coder is competitive on isolated function-level tasks and noticeably faster per token [11][12].
Within the open-weight coding tier, the comparisons are tighter. Kimi K2 from Moonshot AI, released in mid-2025, also targets agentic coding at a similar scale, and the two models trade wins on different SWE-Bench scaffolds. GLM-4.5 and the later GLM-4.7 from Z.AI tend to score slightly higher on SWE-Bench Verified, but Qwen3-Coder's tooling ecosystem, especially the Qwen Code CLI and the strong third-party agent integrations, has made it the most commonly deployed open coder in practice.
Qwen3-Coder-Next is a successor model released in early 2026 that pushes the family further toward efficient sparse architectures. It uses the Qwen3-Next-80B-A3B base, a hybrid model that combines linear attention with traditional softmax attention in alternating layers and replaces the 160-expert MoE of the 480B flagship with a 512-expert sparse network that routes 10 experts per token. The result is 80 billion total parameters with only 3 billion activated per forward pass [6][7].
Winbuzzer's coverage of the release framed it as an inflection point for local-first coding: the model fits within the memory budget of high-end consumer hardware while reportedly matching or exceeding Claude Sonnet 4 and GPT-4.1 on SWE-Bench Verified and SWE-Bench Pro [7]. Alibaba's benchmark table for the model claims SWE-Bench Verified scores between 70.6 and 71.3 percent depending on agent harness, SWE-Bench Pro of about 44.3 percent, and competitive numbers on SWE-Bench Multilingual, Terminal-Bench 2.0, and Aider edit benchmarks [6].
Qwen3-Coder-Next is also notable for its emphasis on adaptability to different agent scaffolds. The Qwen team's blog post on the release describes work to keep the model performant under a wide range of prompt templates rather than tuning it to a single agent, citing the practical observation that real users run the same model through Cursor, Cline, OpenHands, and custom scripts that all phrase tool calls slightly differently.
Qwen3-Coder is published under the Apache 2.0 license, which permits commercial use, modification, and redistribution without copyleft requirements [3]. Weights for the 480B, 30B, and Next variants are hosted on Hugging Face under the official Qwen organization, with quantized distributions in GGUF, FP8, AWQ, GPTQ Int4, and NVFP4 formats available from community and vendor accounts. The reference code repository is QwenLM/Qwen3-Coder on GitHub, and the Qwen Code CLI is hosted at QwenLM/qwen-code [1][4].
For users who do not want to run the weights themselves, Qwen3-Coder is accessible through Qwen Chat (Alibaba's web UI), Alibaba Cloud's Model Studio API, and a long list of third-party inference providers including OpenRouter, Together AI, Fireworks AI, DeepInfra, and Hyperbolic. OpenRouter exposes a free tier for the 480B variant subject to rate limits, and a paid tier priced at roughly $0.18 per million input tokens at launch [10]. Alibaba Cloud's own pricing for the model on Model Studio is slightly higher but includes enterprise SLAs and is the preferred channel for production deployment in the Asia-Pacific region.
Qwen3-Coder was generally well received in the open-source AI community on release. Hacker News commenters highlighted the model as the first open-weight coder to feel competitive with Claude on real agentic workflows, and the immediate availability of a usable CLI agent at launch was widely noted as a differentiator from earlier open releases that shipped weights without first-party tooling. Coverage in AI press tended to frame the release as an escalation in the open-versus-closed coding model competition, with VentureBeat and MarkTechPost both running prominent stories on the 480B flagship and the later Qwen3-Coder-Next [7][13].
Critical responses focused on three main issues. First, the 480B model is expensive to host: even with sparse activation, it requires several hundred gigabytes of memory in BF16 and is impractical for individual developers to self-host without aggressive quantization. Second, despite Alibaba's parity claims, side-by-side comparisons with Claude Sonnet 4 on complex multi-file refactors generally favored Claude for reliability, even when Qwen3-Coder produced acceptable code more often than baseline open models. Third, the Qwen Code CLI's adaptation from Gemini CLI inherited some of the rough edges of the original tool, and users reported the experience was less polished than Claude Code at launch.
The release of Qwen3-Coder-30B and later Qwen3-Coder-Next addressed the first issue directly. The 30B model fit on consumer hardware, and the 80B / 3B-active Next variant fit on workstation-class machines while approaching the benchmark performance of the flagship, which prompted a second wave of community deployment.