Qwen3-Coder

Chinese AI Developer Tools Large Language Models Open Source AI

16 min read

Updated Jun 24, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 24, 2026

Fact-checked

In review queue

Sources

14 citations

Revision

v2 · 3,207 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Qwen3-Coder is a family of open-weight large language models specialized for software engineering, developed by Alibaba's Qwen team (Tongyi Lab) and released under the Apache 2.0 license. The flagship release, Qwen3-Coder-480B-A35B-Instruct, was announced on July 22, 2025; it is a 480 billion parameter Mixture of Experts (MoE) model that activates 35 billion parameters per token, natively supports a 256,000 token context window (expandable to one million tokens with YaRN), and is positioned by Alibaba as its "most agentic code model to date" ^[1]. At launch it set the state of the art among open models on SWE-bench Verified with a score of 69.6 percent, performance Alibaba describes as "comparable to Claude Sonnet 4," making it the first open-weight coding agent to rival a frontier closed API on real-world repository tasks ^[1]^[2]. It ships alongside Qwen Code, an open-source command-line agent forked from Google's Gemini CLI, and supports 358 programming languages ^[2]^[3].

Qwen3-Coder is the second major coding-specialized release from the Qwen team, following Qwen2.5-Coder in late 2024. Where the earlier model focused on raw code completion and infill performance, Qwen3-Coder centers its design on agentic coding: long-horizon tasks where the model invokes tools, runs commands, edits files across a repository, and recovers from execution errors. Alibaba reports that the 480B variant achieves state-of-the-art results among open-weight models on agentic coding benchmarks, with the company claiming parity with Claude Sonnet 4 on several internal evaluations ^[1]^[5]. A more compact Qwen3-Coder-30B-A3B-Instruct followed in August 2025, and a hybrid-attention successor, Qwen3-Coder-Next, was released in early 2026 with markedly higher SWE-Bench scores despite running only three billion active parameters ^[6]^[7].

Background and development

The Qwen team at Alibaba Cloud has shipped coding-specialized models since CodeQwen-7B in early 2024. The Qwen2.5-Coder line, released across multiple sizes from 0.5B to 32B in November 2024, drew attention for matching closed-source coding APIs on HumanEval and MBPP at the 32B size, and the 7B and 14B variants became common defaults for self-hosted coding assistants. With the broader Qwen3 family launching in April 2025 under a unified hybrid reasoning architecture, a coding-specialized successor was widely expected.

Qwen3-Coder was announced on the official @Alibaba_Qwen account on X on July 22, 2025 with the line: "Qwen3-Coder is here! We're releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date." The release blog framed the model's purpose directly: "We're announcing Qwen3-Coder, our most agentic code model to date," available in multiple sizes but led by the 480B flagship ^[1]. The post emphasized two design choices that distinguish the model from earlier Qwen coding releases: native long-context training rather than position-encoding tricks at inference time, and a heavy investment in long-horizon reinforcement learning rather than supervised fine-tuning alone ^[1]^[5]. The release coincided with a wave of strong open-weight coding models that summer, including Moonshot's Kimi K2, Z.AI's GLM-4.5, and DeepSeek-V3.1, which together pushed open-source coding benchmarks within striking distance of frontier closed APIs.

The model family expanded over the following months. Qwen3-Coder-30B-A3B-Instruct, a smaller MoE variant aimed at single-GPU and consumer-hardware deployment, was published on Hugging Face in August 2025, and quantized GGUF, FP8, AWQ, and NVFP4 distributions appeared from community and vendor partners shortly after ^[3]. Qwen3-Coder-Next, an 80B total / 3B active hybrid-attention model built on the Qwen3-Next architecture, followed in early 2026 and became the high-water mark for the family's efficiency frontier ^[6]^[7].

What are the Qwen3-Coder variants?

Qwen3-Coder is published as several open-weight checkpoints with shared training recipes but different scales and architectures.

Variant	Total parameters	Active parameters	Architecture	Native context	Release
Qwen3-Coder-480B-A35B-Instruct	480B	35B	MoE, 160 experts, 8 active	256K (1M with YaRN)	July 22, 2025
Qwen3-Coder-30B-A3B-Instruct	30B	3.3B	MoE	256K	August 2025
Qwen3-Coder-480B-A35B-Instruct-FP8	480B	35B	MoE, FP8 quantization	256K	July 2025
Qwen3-Coder-Next (Qwen3-Coder-80B-A3B)	80B	3B	Hybrid attention + sparse MoE, 512 experts, 10 active	256K	Early 2026

The 480B model is the headline release and the only variant Alibaba evaluates against closed APIs in its launch materials. The 30B model is the practical default for local inference, since it fits in roughly 15 GB of RAM at 4-bit quantization and runs comfortably on a single 24 GB consumer GPU ^[3]. Qwen3-Coder-Next is a distinct architectural lineage: it inherits the linear-attention plus gated-attention hybrid stack from Qwen3-Next-80B-A3B-Base and was post-trained explicitly for coding agents and local development ^[6].

All Qwen3-Coder checkpoints ship as Instruct variants only. Unlike the open Qwen3 base models, which expose a single weight set that toggles between thinking and non-thinking modes through a chat-template flag, Qwen3-Coder does not generate <think></think> blocks. Alibaba describes it as a non-thinking model on the grounds that long internal monologues hurt latency and tool-use throughput in coding workflows ^[3].

Architecture

The flagship Qwen3-Coder-480B-A35B-Instruct uses a sparse MoE transformer with 62 decoder layers and grouped-query attention. Each multi-head attention block has 96 query heads and 8 key-value heads, and the MoE feed-forward layers contain 160 experts of which 8 are routed per token, giving the 35 billion active parameter count ^[3]. The model uses BF16 precision in its reference distribution, with FP8 and 4-bit quantizations published by Alibaba and community packagers for memory-constrained inference.

Specification	Qwen3-Coder-480B-A35B
Total parameters	480B
Active parameters per token	35B
Decoder layers	62
Attention heads (query / key-value)	96 / 8
MoE experts	160
Active experts per token	8
Native context length	262,144 tokens
Extended context (YaRN)	up to 1,048,576 tokens
Tokenizer	Qwen3 BPE (same as Qwen3 base)
Precision	BF16 (FP8 variant also released)
License	Apache 2.0

The attention pattern is dense across all layers; Qwen3-Coder does not use sliding-window or interleaved local-global attention for its primary checkpoint, which is one reason the model is heavy at inference time relative to its active parameter count. The successor Qwen3-Coder-Next switches to a hybrid linear-plus-gated-attention design that trades some attention precision for much lower KV cache requirements and roughly 10x higher throughput on repository-scale prompts ^[6].

How was Qwen3-Coder trained?

Alibaba describes the Qwen3-Coder training pipeline as a two-stage process: a coding-heavy pretraining phase and a multi-component post-training phase centered on reinforcement learning.

Pretraining

The pretraining corpus totals 7.5 trillion tokens, with code making up roughly 70 percent of the mix ^[1]^[5]. The remainder is general text and mathematics, included to preserve the natural-language reasoning and instruction-following capabilities the Qwen3 base models are known for. Alibaba reports using Qwen2.5-Coder as a quality filter and rewriter for noisy or low-quality code, an example of the synthetic data bootstrapping pattern that became common in 2025 frontier training. The training data spans 358 programming languages, but Python, JavaScript and TypeScript, Java, Go, Rust, and C++ are the languages the team explicitly tunes for ^[2].

Long-context capability is baked in during pretraining rather than added later. The native 256K context is achieved with extended-RoPE position encodings during pretraining, and the one-million-token figure is obtained by applying YaRN at inference time without further weight updates ^[3].

Post-training: Code RL and Agent RL

The post-training stage is where Qwen3-Coder diverges most sharply from earlier coding models. Alibaba describes two parallel reinforcement learning tracks:

The first is what the team calls Code RL: execution-driven reinforcement learning on a broad set of real-world coding tasks, including issue-fixing benchmarks, build-system tasks, and refactoring problems. Reward signals come from running the produced code against unit tests and reference solutions rather than from human preference labels. As Alibaba puts it, "unlike the common belief that RL works best on hard, verifiable problems, our focus was on scaling Code RL on a broader set of real-world coding tasks," because code has a natural, automatic reward signal that makes large-scale RL practical ^[1].

The second is long-horizon RL, which the team calls Agent RL, where the model is trained to act as an autonomous coding agent across many turns. According to Alibaba, training was run in "20,000 independent environments in parallel" on Alibaba Cloud infrastructure, where the model used tools, browsed file systems, and interacted with mock browsers and shells to complete multi-step tasks ^[1]^[5]. The reward signal for these rollouts comes from final task completion, with shaping terms for tool-call validity and intermediate checkpoint success. This emphasis on agentic post-training is the main reason the model performs well on benchmarks like SWE-bench Verified, where the underlying task requires editing several files in a real repository and then passing a hidden test suite.

How does Qwen3-Coder score on benchmarks?

Alibaba's launch announcement claims that Qwen3-Coder-480B-A35B-Instruct sets new state-of-the-art results among open models on agentic coding, agentic browser use, and agentic tool use, and is comparable to Claude Sonnet 4 on real-world software engineering benchmarks ^[1]. The headline figure at launch was 69.6 percent on SWE-bench Verified, the best result reported for any open-weight model at the time, alongside 49.9 percent on WebArena for agentic browser use ^[2]. Independent reproductions and third-party leaderboards reported a range of SWE-bench Verified scores depending on the agent scaffold used.

Benchmark	Qwen3-Coder-480B-A35B	Notes
SWE-bench Verified (official)	69.6%	Alibaba launch figure, best open model at release
SWE-bench Verified (third-party)	67.0 to 70.6	Varies by scaffold (OpenHands, SWE-Agent, MiniSWE-Agent)
WebArena (agentic browser use)	49.9%	Alibaba launch figure
SWE-Bench Pro	38.7	ScaleAI internal evaluation
Terminal-Bench 2.0	23.9	Harbor framework
LiveCodeBench	Competitive with GPT-4o and Claude Sonnet 4	Per Alibaba launch blog

The SWE-bench Verified score depends heavily on the agent harness used, since the benchmark measures end-to-end repository edits rather than single-shot code completion. Nebius's OpenHands deployment of Qwen3-Coder-480B reported roughly 67 percent at launch, while later runs with tuned scaffolds reached the low 70s ^[8]. By comparison, Claude Sonnet 4.5 reaches about 77 percent and GPT-5 Codex sits near 80 percent on the same benchmark with their respective official agents.

Qwen3-Coder-Next, despite running only 3 billion active parameters, is reported to reach 70.6 percent on SWE-bench Verified with SWE-Agent, 71.1 percent with MiniSWE-Agent, and 71.3 percent with OpenHands, matching or exceeding the original 480B flagship on the same scaffolds ^[6]. This is the headline efficiency claim of the Qwen3-Coder line: that an 80B / 3B-active hybrid model can match or beat a 480B / 35B-active dense-attention MoE on the most-watched coding benchmark.

What is the Qwen Code CLI?

Alongside the model weights, Alibaba released Qwen Code, an open-source command-line agent designed to run Qwen3-Coder locally or against a remote API. The tool is adapted from Google's Gemini CLI codebase under its Apache 2.0 license, with "customized prompts and function calling protocols" reworked to match Qwen3-Coder's tool-call schema and the parser tightened around Qwen's structured output ^[1]^[4]^[9]. The CLI exposes the model as an interactive shell assistant that can read and modify files in the working directory, run commands, and chain tool calls into multi-step workflows.

Qwen Code targets the same developer niche as Claude Code, GitHub Copilot CLI, and OpenAI's Codex CLI, but its open-source license and compatibility with self-hosted Qwen3-Coder endpoints make it the default agent shell for users who want to run a coding assistant against on-premise hardware. Integrations exist for VS Code, Zed, and JetBrains IDEs through plugin layers maintained by the community, and the CLI can also be pointed at OpenRouter, Alibaba Cloud's Model Studio API, or any OpenAI-compatible endpoint serving Qwen3-Coder weights ^[4].

The Qwen team also published explicit support for popular third-party coding agents. CLINE, OpenHands, Aider, and Cursor's terminal mode all expose Qwen3-Coder as a first-class backend, and the model's tool-calling format was designed to be compatible with the JSON tool-call conventions those agents already used ^[3].

How does Qwen3-Coder compare with frontier coding models?

Qwen3-Coder is most often compared with Claude Sonnet 4, GPT-5 Codex, and other open coding models such as Kimi K2 and GLM-4.5. The picture that emerged across third-party evaluations during late 2025 and into 2026 is one where Qwen3-Coder consistently lands in the second tier of coding capability behind the leading closed APIs, but distinguishes itself on cost, openness, and local-inference viability.

Model	SWE-bench Verified	Context	License	Approx. input cost per million tokens
Claude Sonnet 4.5	~77%	200K	Closed	$3.00
GPT-5 Codex	~80%	400K	Closed	$1.75
Qwen3-Coder-480B-A35B	~67 to 71%	256K (1M extrap)	Apache 2.0	$0.18 (OpenRouter)
Qwen3-Coder-Next	~70 to 74%	256K	Apache 2.0	$0.10 (OpenRouter)
DeepSeek-V3.2	~70%	128K	MIT	$0.14
GLM-4.7	~74%	200K	MIT	$0.20

The cost gap is the easiest pitch for Qwen3-Coder. At roughly one-tenth the per-token cost of GPT-5 Codex on third-party hosts like OpenRouter, the model is the default choice for high-volume agentic workflows where each task may consume hundreds of thousands of tokens ^[10]. Practical reviews suggest that Claude Sonnet 4 and GPT-5 Codex still produce more complete and reliable implementations on complex refactors, while Qwen3-Coder is competitive on isolated function-level tasks and noticeably faster per token ^[11]^[12].

Within the open-weight coding tier, the comparisons are tighter. Kimi K2 from Moonshot AI, released in mid-2025, also targets agentic coding at a similar scale, and the two models trade wins on different SWE-bench scaffolds. GLM-4.5 and the later GLM-4.7 from Z.AI tend to score slightly higher on SWE-bench Verified, but Qwen3-Coder's tooling ecosystem, especially the Qwen Code CLI and the strong third-party agent integrations, has made it the most commonly deployed open coder in practice.

Qwen3-Coder-Next

Qwen3-Coder-Next is a successor model released in early 2026 that pushes the family further toward efficient sparse architectures. It uses the Qwen3-Next-80B-A3B base, a hybrid model that combines linear attention with traditional softmax attention in alternating layers and replaces the 160-expert MoE of the 480B flagship with a 512-expert sparse network that routes 10 experts per token. The result is 80 billion total parameters with only 3 billion activated per forward pass ^[6]^[7].

Winbuzzer's coverage of the release framed it as an inflection point for local-first coding: the model fits within the memory budget of high-end consumer hardware while reportedly matching or exceeding Claude Sonnet 4 and GPT-4.1 on SWE-bench Verified and SWE-Bench Pro ^[7]. Alibaba's benchmark table for the model claims SWE-bench Verified scores between 70.6 and 71.3 percent depending on agent harness, SWE-Bench Pro of about 44.3 percent, and competitive numbers on SWE-Bench Multilingual, Terminal-Bench 2.0, and Aider edit benchmarks ^[6].

Qwen3-Coder-Next is also notable for its emphasis on adaptability to different agent scaffolds. The Qwen team's blog post on the release describes work to keep the model performant under a wide range of prompt templates rather than tuning it to a single agent, citing the practical observation that real users run the same model through Cursor, Cline, OpenHands, and custom scripts that all phrase tool calls slightly differently.

Is Qwen3-Coder open source?

Qwen3-Coder is published under the Apache 2.0 license, which permits commercial use, modification, and redistribution without copyleft requirements ^[3]. Weights for the 480B, 30B, and Next variants are hosted on Hugging Face under the official Qwen organization, with quantized distributions in GGUF, FP8, AWQ, GPTQ Int4, and NVFP4 formats available from community and vendor accounts. The reference code repository is QwenLM/Qwen3-Coder on GitHub, and the Qwen Code CLI is hosted at QwenLM/qwen-code ^[1]^[4].

For users who do not want to run the weights themselves, Qwen3-Coder is accessible through Qwen Chat (Alibaba's web UI), Alibaba Cloud's Model Studio API, and a long list of third-party inference providers including OpenRouter, Together AI, Fireworks AI, DeepInfra, and Hyperbolic. OpenRouter exposes a free tier for the 480B variant subject to rate limits, and a paid tier priced at roughly $0.18 per million input tokens at launch ^[10]. Alibaba Cloud's own pricing for the model on Model Studio is slightly higher but includes enterprise SLAs and is the preferred channel for production deployment in the Asia-Pacific region.

Reception

Qwen3-Coder was generally well received in the open-source AI community on release. Hacker News commenters highlighted the model as the first open-weight coder to feel competitive with Claude on real agentic workflows, and the immediate availability of a usable CLI agent at launch was widely noted as a differentiator from earlier open releases that shipped weights without first-party tooling. VentureBeat's launch coverage led with the framing that Qwen3-Coder "might be the best coding model yet," and AI press more broadly treated the release as an escalation in the open-versus-closed coding model competition, with VentureBeat and MarkTechPost both running prominent stories on the 480B flagship and the later Qwen3-Coder-Next ^[7]^[13]^[14].

Critical responses focused on three main issues. First, the 480B model is expensive to host: even with sparse activation, it requires several hundred gigabytes of memory in BF16 and is impractical for individual developers to self-host without aggressive quantization. Second, despite Alibaba's parity claims, side-by-side comparisons with Claude Sonnet 4 on complex multi-file refactors generally favored Claude for reliability, even when Qwen3-Coder produced acceptable code more often than baseline open models. Third, the Qwen Code CLI's adaptation from Gemini CLI inherited some of the rough edges of the original tool, and users reported the experience was less polished than Claude Code at launch.

The release of Qwen3-Coder-30B and later Qwen3-Coder-Next addressed the first issue directly. The 30B model fit on consumer hardware, and the 80B / 3B-active Next variant fit on workstation-class machines while approaching the benchmark performance of the flagship, which prompted a second wave of community deployment.

References

Qwen Team. "Qwen3-Coder: Agentic Coding in the World." Qwen Blog, July 22, 2025. https://qwenlm.github.io/blog/qwen3-coder/ ↩
Alibaba Cloud. "Alibaba Unveils Cutting-Edge AI Coding Model Qwen3-Coder." Alibaba Cloud Press Room, July 2025. https://www.alibabacloud.com/en/press-room/alibaba-unveils-cutting-edge-ai-coding-model-qwen3 ↩
Qwen Team. "Qwen3-Coder-480B-A35B-Instruct." Hugging Face model card. https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct ↩
QwenLM. "qwen-code: An open-source AI agent that lives in your terminal." GitHub repository. https://github.com/QwenLM/qwen-code ↩
Data Science Dojo. "Qwen3 Coder: The Open-Source AI Coding Model Redefining Code Generation." 2025. https://datasciencedojo.com/blog/qwen3-coder/ ↩
Qwen Team. "Qwen3-Coder-Next: Pushing Small Hybrid Models." Qwen Blog. https://qwen.ai/blog?id=qwen3-coder-next ↩
Winbuzzer. "Alibaba's Qwen3-Coder-Next Activates Just 3B of 80B Parameters For Improved Efficiency." February 4, 2026. https://winbuzzer.com/2026/02/04/alibaba-qwen3-coder-next-open-source-sparse-moe-coding-model-xcxwbn/ ↩
Nebius. "OpenHands trajectories with Qwen3-Coder-480B-A35B-Instruct." Nebius Blog. https://nebius.com/blog/posts/openhands-trajectories-with-qwen3-coder-480b ↩
DataCamp. "Qwen Code CLI: A Guide With Examples." https://www.datacamp.com/tutorial/qwen-code ↩
OpenRouter. "Qwen3 Coder 480B A35B - API Pricing & Benchmarks." https://openrouter.ai/qwen/qwen3-coder ↩
Daily Dose of DS. "Compare Qwen 3 Coder vs. Sonnet 4 for Code Generation." https://blog.dailydoseofds.com/p/compare-qwen-3-coder-vs-sonnet-4 ↩
LLM Stats. "GPT-5.3 Codex vs Qwen3-Coder Comparison." https://llm-stats.com/models/compare/gpt-5.3-codex-vs-qwen3-coder ↩
MarkTechPost. "Qwen Team Releases Qwen3-Coder-Next: An Open-Weight Language Model Designed Specifically for Coding Agents and Local Development." February 3, 2026. https://www.marktechpost.com/2026/02/03/qwen-team-releases-qwen3-coder-next-an-open-weight-language-model-designed-specifically-for-coding-agents-and-local-development/ ↩
VentureBeat. "Qwen3-Coder-480B-A35B-Instruct launches and it 'might be the best coding model yet'." July 2025. https://venturebeat.com/programming-development/qwen3-coder-480b-a35b-instruct-launches-and-it-might-be-the-best-coding-model-yet ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

Best Small Language Models DeepSeek-Coder Qwen Qwen2.5-Coder Qwen3 Embedding Qwen3-Coder-Next Qwen3-VL Qwen3.7-Max The Stack (BigCode dataset)

Background and development

What are the Qwen3-Coder variants?

Architecture

How was Qwen3-Coder trained?

Pretraining

Post-training: Code RL and Agent RL

How does Qwen3-Coder score on benchmarks?

What is the Qwen Code CLI?

How does Qwen3-Coder compare with frontier coding models?

Qwen3-Coder-Next

Is Qwen3-Coder open source?

Reception

See also

References

Improve this article

Related Articles

ModelScope

Qwen

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

What links here

Related Articles

ModelScope

Qwen

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

What links here