Qwen3-Max

AI Models Chinese AI Large Language Models Mixture of Experts

16 min read

Updated Jun 24, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 24, 2026

Fact-checked

In review queue

Sources

21 citations

Revision

v2 · 3,141 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Qwen3-Max is the flagship large language model in Alibaba's Qwen series and the first Qwen model to cross one trillion parameters, released in preview on September 5, 2025 and formally launched at the Apsara Conference on September 24, 2025. Alibaba positions it as its most capable proprietary LLM to date. The model uses a Mixture of Experts (MoE) architecture, was pre-trained on approximately 36 trillion tokens, and supports a 262,144 token context window ^[1]^[2]^[3]^[21]. It ships in two variants: Qwen3-Max-Instruct, a non-thinking model optimized for general chat, coding, and tool use; and Qwen3-Max-Thinking, a reasoning variant that emits explicit chain-of-thought and integrates code execution and search ^[1]^[2]^[3].

Qwen3-Max represents a notable strategic shift for the Qwen team. While earlier releases in the Qwen3 family were published under permissive open weights on Hugging Face, the Max tier is closed-weight and available only through Alibaba Cloud's Model Studio API, the Qwen Chat web interface, and a few partner platforms such as OpenRouter ^[4]^[5]. The preview version reached the top of several public leaderboards within days of release, and the production version unveiled at Apsara emphasized day-one performance on coding and agentic benchmarks, with Alibaba reporting 69.6 on SWE-bench Verified and 74.8 on Tau2-Bench at launch ^[2]^[6].

Background and development

The Qwen series is developed by the Tongyi Lab at Alibaba Cloud and has grown rapidly since the first Qwen-7B release in August 2023. By 2025 the family already included dense models from 0.6 billion to 32 billion parameters, MoE models like Qwen3-235B-A22B, and specialized variants for coding (Qwen2.5-Coder) and vision-language tasks. The "Max" tier had existed since Qwen-Max in early 2024 and Qwen2.5-Max in January 2025, but those earlier flagships were always closed-source and intended as the commercial counterpart to the open Qwen line ^[4]^[7].

Qwen3-Max-Preview was announced on X by the official @Alibaba_Qwen account on September 5, 2025 with the line: "Introducing Qwen3-Max-Preview (Instruct), our biggest model yet, with over 1 trillion parameters" ^[8]. The post claimed the model beat Qwen3-235B-A22B-2507 across internal evaluations. Within roughly three weeks, on September 24, 2025, Alibaba used its annual Apsara Conference in Hangzhou to formally graduate the model from preview status and to announce the existence of a separate Thinking variant. The launch was framed as part of a broader "artificial superintelligence" roadmap that Alibaba Cloud sketched out at the event, alongside related releases including Qwen3-VL-235B-A22B for vision-language and a trio of Qwen3Guard safety models ^[2]^[9].

The Thinking variant was originally released as an intermediate checkpoint while training continued. Alibaba Cloud's social channels described it as "an early preview of Qwen3-Max-Thinking, an intermediate checkpoint still in training," and the team said tool use and large-scale test-time compute were essential ingredients in the early demonstrations ^[10]. A more polished release of Qwen3-Max-Thinking followed on January 25, 2026 under the model ID qwen3-max-2026-01-23, with Alibaba claiming performance comparable to GPT-5.2-Thinking, Claude Opus 4.7, and Gemini 3 Pro across a suite of benchmarks ^[11]^[12]^[20].

When was Qwen3-Max released?

Qwen3-Max moved from preview to general availability over roughly five months across three milestones. The table below collects the dates Alibaba has confirmed for each stage.

Date	Milestone
September 5, 2025	Qwen3-Max-Preview (Instruct) announced on X; available via Qwen Chat and the Alibaba Cloud API ^[8]
September 24, 2025	Production Qwen3-Max launched at the Apsara Conference in Hangzhou; Thinking variant announced ^[2]^[9]
October 2025	Early intermediate checkpoint of Qwen3-Max-Thinking surfaced via Alibaba Cloud social channels ^[10]
January 25, 2026	Polished Qwen3-Max-Thinking (`qwen3-max-2026-01-23`) released in Qwen Chat and Model Studio ^[11]^[20]

What is the architecture of Qwen3-Max?

Qwen3-Max is a sparse Mixture of Experts model with more than one trillion total parameters. Alibaba has not published a full technical report for Qwen3-Max specifically, so several architectural details that are standard for smaller Qwen3 models, in particular the exact number of experts, the number of experts activated per token, and the precise active parameter count, have not been disclosed for the Max variant ^[1]^[13]. The Qwen3 technical report on arXiv covers the open models in the family (such as the 235B-A22B MoE and the dense 0.6B through 32B models) but does not include Max-specific architectural numbers ^[13].

Context window and modality

The Model Studio listing for qwen3-max reports a total context window of 262,144 tokens, with an input limit of 258,048 tokens and a maximum output of 32,768 tokens ^[1]^[3]. The Thinking variant adds a separate budget of up to 81,920 tokens for chain-of-thought reasoning ^[3]^[14]. Qwen3-Max is text-only at the Max tier; multimodal capabilities are handled by sibling models such as Qwen3-VL ^[9].

Instruct and Thinking modes

Unlike the smaller Qwen3 open models, which expose a single checkpoint that toggles between thinking and non-thinking behavior through a enable_thinking flag, Qwen3-Max ships the two modes as separate models: qwen3-max for the Instruct variant and qwen3-max-preview / qwen3-max-thinking for the reasoning variant. The Thinking endpoint only operates with streaming incremental output enabled, which means clients must set incremental_output=true on the API call to receive the reasoning trace ^[2]. In Thinking mode the model can be combined with tools such as a code interpreter and web search, and Alibaba uses parallel test-time computation to push scores on the hardest math benchmarks toward saturation ^[6]^[11]^[20].

The split between Instruct and Thinking is partly a product decision and partly a latency decision. Instruct returns a final answer with no visible chain-of-thought, so it is faster and cheaper per query. Thinking emits a long internal reasoning trace before the answer, often consuming tens of thousands of tokens for a single hard problem; Artificial Analysis reported that the Thinking checkpoint consumed roughly 86 million output tokens during its full Intelligence Index run, of which about 79 million were reasoning rather than final-answer tokens ^[16]. That makes Thinking dramatically more expensive to use at scale, even though the per-token rate is the same as Instruct.

Long context behavior

Alibaba advertises 262,144 tokens as the native context window, but the model card and Model Studio documentation note that a larger 1 million token configuration is technically possible with YaRN-style extension and dramatically more GPU memory. The 1M configuration is not exposed through the standard API and is referenced more as a research capability of the underlying base than as a product offering ^[3]^[13]. For most practical use Qwen3-Max is treated as a 256K-class long-context model, comparable to GPT-5 and Claude Opus 4.7 in the same range.

How was Qwen3-Max trained?

Alibaba says Qwen3-Max was pre-trained on approximately 36 trillion tokens, roughly double the corpus used for Qwen2.5, with an emphasis on multilingual data, code, and STEM reasoning ^[1]^[6]. The pre-training run reportedly used a "global-batch load balancing loss" for the MoE routing and achieved about a 30% improvement in model FLOPs utilization compared with the earlier Qwen2.5-Max-Base run, although Alibaba has not published independently verifiable numbers for either claim outside of blog posts and conference talks ^[1]. Post-training combined supervised fine-tuning with reinforcement learning, and the Thinking variant was further refined with techniques described by Alibaba as "adaptive tool use" and "test-time scaling" ^[11]^[20].

The corpus emphasizes Chinese, English, and a long tail of Southeast and East Asian languages, with the Qwen team claiming support for over 100 languages and dialects in total. Code and mathematical reasoning data were upsampled during the later stages of pre-training, which is consistent with the model's relatively strong performance on coding-focused benchmarks compared with general knowledge benchmarks ^[1]^[6]. Beyond those high-level claims, however, Alibaba has not published a Qwen3-Max-specific technical report. Researchers looking for architectural and training detail have to triangulate from the broader Qwen3 technical report on arXiv, which covers the open weights up to 235B-A22B but stops short of Max ^[13].

For the Thinking variant, the Qwen team describes a multi-stage process: a base Instruct model trained as above, then a separate reasoning fine-tune that learns to emit chain-of-thought, then a tool-use stage where the model is trained against environments containing code interpreters, search tools, and other agentic actions. Alibaba's blog summarizes this as combining "expanded capacity, large-scale computing resources, and reinforcement learning" but does not disclose the specific RL algorithm, the reward models used, or the size of the post-training datasets ^[11]^[20].

How good is Qwen3-Max on benchmarks?

The table below collects officially reported scores from Alibaba's launch materials and the most widely cited third-party evaluations. Where two scores are given for the same benchmark, the first is for Qwen3-Max-Instruct and the second is for Qwen3-Max-Thinking. Benchmarks that Alibaba has not published numbers for are omitted rather than inferred.

Benchmark	Qwen3-Max-Instruct	Qwen3-Max-Thinking	Notes
AIME 2025 (math)	80.6 / 81.6 (vendor-reported)	100 (with tools, parallel test-time compute)	^[1]^[6]^[11]^[20]
HMMT February 2025	not reported	98.0 (no tools); 100 (with tools)	^[11]^[12]^[20]
GPQA Diamond	not reported	92.8 (final release)	^[12]^[20]
SWE-bench Verified	69.6	75.3	^[2]^[6]^[12]
Tau2-Bench (agent tool use)	74.8	82.1	^[2]^[12]
LiveCodeBench v6	57.5 to 74.8 across reports	91.4 (final release)	^[1]^[6]^[12]^[20]
Arena-Hard v2	78.9 to 86.1 across reports	90.2	^[1]^[12]^[15]
SuperGPQA	64.6 to 65.1	not reported	^[1]^[6]
LiveBench (2024-11-25)	79.3	not reported	^[1]
MMLU-Pro	not reported	85.7	^[12]
MMLU-Redux	not reported	92.8	^[12]
Humanity's Last Exam	not reported	36.5 (no tools); 58.3 (with tools)	^[12]^[16]^[20]
Artificial Analysis Intelligence Index	26 (Preview) / 31 (final Instruct)	40	^[15]^[17]

The spread on LiveCodeBench v6, Arena-Hard v2, and AIME 2025 reflects the fact that different secondary reports captured the model at different points in its post-training cycle. The preview version that landed on LMArena in early September 2025 reached the third position on that public leaderboard and briefly led several open-source competitors and Claude Opus 4 across coding and agentic categories ^[1]^[8]. The Thinking variant later reported triple-digit accuracy on AIME 2025 and HMMT, although those scores depend on tool use (notably a Python interpreter) and large-scale parallel sampling, both of which Alibaba has been explicit about ^[10]^[11]^[20].

The AIME 2025 and HMMT perfect scores were the first 100-point results on those benchmarks from any Chinese AI lab. On Humanity's Last Exam, Alibaba's January 2026 release reported that Qwen3-Max-Thinking with adaptive tool use scored 58.3, ahead of GPT-5.2-Thinking (45.5), Gemini 3 Pro (45.8), and Claude Opus 4.5 (43.2) on that configuration; without tools the model scored 36.5 ^[12]^[20]. Independent analysts at Artificial Analysis verified an Intelligence Index of 40 for the same Thinking checkpoint, placing it close to but not clearly ahead of comparable Western reasoning models ^[12]^[15]^[16].

How do you access Qwen3-Max, and is it open source?

Qwen3-Max is closed-weight. There is no open release of the model on Hugging Face or anywhere else; only the smaller Qwen3 models (up to 235B-A22B) are available for download. Access is gated behind Alibaba's commercial channels.

Variant	Model ID	Primary use
Qwen3-Max-Instruct	`qwen3-max`	General chat, coding, tool use, structured outputs
Qwen3-Max-Preview	`qwen3-max-preview`	Early Instruct preview, deprecated after Apsara launch
Qwen3-Max-Thinking	`qwen3-max-thinking` / `qwen3-max-2026-01-23`	Reasoning, agentic workflows, math and science problems

Developers can reach the model through three official routes. The Qwen Chat web interface at chat.qwen.ai exposes Qwen3-Max for free use by individual users. The Alibaba Cloud Model Studio API (also branded as Bailian in mainland China) is the primary route for commercial integration and supports an OpenAI-compatible request schema. Third-party gateways including OpenRouter and Hugging Face's AnyCoder also surface the model under Alibaba's branding ^[4]^[5]^[2].

Pricing varies by deployment region. The international Singapore deployment lists $1.20 per million input tokens and $6.00 per million output tokens for the first 32K of context, rising to $3.00 input and $15.00 output once the context exceeds 128K. The Beijing and Frankfurt deployments are cheaper, at roughly $0.36 input and $1.43 output per million tokens in the lowest tier, reflecting Alibaba's regional pricing strategy ^[3]^[14].

Context range	Input ($/M tokens, Singapore)	Output ($/M tokens, Singapore)
0 to 32K	1.20	6.00
32K to 128K	2.40	12.00
128K to 252K	3.00	15.00

How does Qwen3-Max compare to GPT-5, Claude, and Gemini?

The reception of Qwen3-Max has split along familiar lines. Inside the open-source community, the move to closed weights for the Max tier drew complaints, since one of the things that made Qwen3 popular in the first place was the willingness to publish frontier weights under an Apache-style license. Several commentators pointed out that this puts Qwen3-Max closer in spirit to GPT-5 and Claude Opus 4 than to DeepSeek V3, which remains downloadable ^[4]^[5].

On benchmarks, the picture is genuinely competitive but not dominant. Qwen3-Max-Instruct sits at or near the frontier for non-reasoning models on coding and agentic tasks, and its 69.6 on SWE-bench Verified at launch placed it within a few points of the best closed models at the time ^[2]^[6]. The Thinking variant is where Alibaba has been most aggressive in framing comparisons, with the January 2026 release claiming the lead over GPT-5.2-Thinking, Claude Opus 4.5, and Gemini 3 Pro on Humanity's Last Exam with tools ^[11]^[12]^[20]. Independent measurements by Artificial Analysis put the Thinking Intelligence Index at 40, which is close to but not clearly ahead of comparable Western reasoning models, suggesting the picture is closer to "competitive with the frontier" than "new state of the art" ^[15]^[16].

Where Qwen3-Max has a clear edge is price and Chinese-language performance. The input pricing is roughly half of GPT-5's at comparable context lengths, and on Chinese-focused evaluations such as QwenChineseBench the Qwen models consistently lead the field, which matters for enterprises in mainland China and Southeast Asia that need to operate primarily in Mandarin, Japanese, Korean, or Malay ^[3]^[6]^[9]. Reporters at VentureBeat highlighted output speed (around 50 tokens per second for the Preview Instruct version) as another differentiator, although the Thinking variant is meaningfully slower per output token because of its long reasoning traces ^[15]^[17].

Alibaba's larger framing at Apsara 2025 was that Qwen3-Max is the first concrete step on what its Cloud unit called a multi-year roadmap toward artificial general intelligence and ultimately artificial superintelligence. That framing has been read both as ambition and as marketing. The model itself, with its closed weights, paid API, and incremental jumps over Qwen3-235B-A22B, is more conservative than the rhetoric around it ^[9]^[2].

Reporters also noted that Qwen3-Max sits inside a unique geopolitical position. It is a Chinese model with a credible claim to being competitive at the global frontier, which is rare; previous Chinese frontier candidates such as DeepSeek V3 either traded weights for the spotlight or accepted a slight benchmark deficit in exchange for openness. Qwen3-Max takes the opposite route and looks more like an American frontier lab in its product strategy: keep the weights, charge for the API, and run a large research lab on the revenue. Whether enterprises outside China are willing to standardize on a model hosted on Alibaba Cloud is a separate question that several InfoWorld analysts said depends on data isolation, system logs, model update policy, and cross-border data movement rules in each customer's home jurisdiction ^[11].

What is Qwen3-Max used for?

Alibaba positions Qwen3-Max-Instruct as a general-purpose model for chat, coding, structured data extraction, and tool use, and the Thinking variant for agentic workflows that need explicit reasoning. Early adopters reported on it most often for three things. The first is enterprise coding assistance, where the SWE-bench Verified score and the long context window make it a credible alternative to closed Western models for repository-scale tasks. The second is Chinese-language and multilingual workloads, where the model's Mandarin and Asian-language capabilities outperform models built primarily on English data. The third is mathematical and scientific problem solving in research settings where Thinking mode plus a code interpreter can be allowed to spend tens of minutes on a single problem ^[6]^[11]^[18].

The main limitations are familiar for a closed proprietary model: there are no open weights, so private deployment is impossible; the model is not multimodal, so vision and audio tasks have to be routed to sibling Qwen models; and outside of Alibaba's own infrastructure, latency and reliability depend on what each gateway provider has provisioned. The Thinking variant in particular is meaningfully slower than Instruct, which limits how easily it can be dropped into latency-sensitive applications ^[16]^[17].

References

Cogni Down Under. "Alibaba's Trillion-Parameter Giant: Why Qwen 3 Max Feels Like the Future." Medium, September 2025. https://medium.com/@cognidownunder/alibabas-trillion-parameter-giant-why-qwen-3-max-feels-like-the-future-picture-a-model-so-a4b1d961a95b ↩
Asif Razzaq. "Alibaba's Qwen3-Max: Production-Ready Thinking Mode, 1T+ Parameters, and Day-One Coding/Agentic Bench Signals." MarkTechPost, September 24, 2025. https://www.marktechpost.com/2025/09/24/alibabas-qwen3-max-production-ready-thinking-mode-1t-parameters-and-day-one-coding-agentic-bench-signals/ ↩
Alibaba Cloud Model Studio. "Supported Models and Capabilities Overview." Alibaba Cloud Documentation. https://www.alibabacloud.com/help/en/model-studio/models ↩
Skywork.ai. "Qwen3-Max-Preview: A Deep Dive into Alibaba's Trillion-Parameter Powerhouse." Skywork.ai blog. https://skywork.ai/blog/qwen3-max-preview-a-deep-dive-into-alibabas-trillion-parameter-powerhouse/ ↩
Barnacle Goose. "Qwen-3-Max-Preview: Alibaba's Trillion-Parameter LLM." Medium, September 2025. https://medium.com/@leucopsis/qwen-3-max-preview-alibabas-trillion-parameter-llm-e9cb6f982042 ↩
Patrick McGuinness. "Qwen Goes to the Max." Substack, September 2025. https://patmcguinness.substack.com/p/qwen-goes-to-the-max ↩
Asif Razzaq. "Alibaba AI Unveils Qwen3-Max Preview: A Trillion-Parameter Qwen Model with Super Fast Speed and Quality." MarkTechPost, September 6, 2025. https://www.marktechpost.com/2025/09/06/alibaba-ai-unveils-qwen3-max-preview-a-trillion-parameter-qwen-model-with-super-fast-speed-and-quality/ ↩
@Alibaba_Qwen. "Big news: Introducing Qwen3-Max-Preview (Instruct)." X (formerly Twitter), September 5, 2025. https://x.com/Alibaba_Qwen/status/1963991502440562976 ↩
Alibaba Cloud. "Alibaba Cloud Unveils Strategic Roadmaps for the Next Generation AI Innovations." Alibaba Cloud Community blog, September 24, 2025. https://www.alibabacloud.com/blog/alibaba-cloud-unveils-strategic-roadmaps-for-the-next-generation-ai-innovations_602560 ↩
Alibaba Cloud SEA. "Early preview of Qwen3-Max-Thinking." Facebook post, October 2025. https://www.facebook.com/AlibabaCloudSEA/posts/848784724382762 ↩
HowAIWorks. "Qwen3-Max-Thinking: A New Era for Reasoning Models." January 2026. https://howaiworks.ai/blog/qwen3-max-thinking-announcement-2026 ↩
Sharon Goldman. "Qwen3-Max Thinking beats Gemini 3 Pro and GPT-5.2 on Humanity's Last Exam (with search)." VentureBeat, January 2026. https://venturebeat.com/technology/qwen3-max-thinking-beats-gemini-3-pro-and-gpt-5-2-on-humanitys-last-exam ↩
Qwen Team. "Qwen3 Technical Report." arXiv:2505.09388, May 15, 2025. https://arxiv.org/pdf/2505.09388 ↩
czmilo. "Qwen3-Max 2025 Complete Release Analysis: In-Depth Review of Alibaba's Most Powerful AI Model." DEV Community. https://dev.to/czmilo/qwen3-max-2025-complete-release-analysis-in-depth-review-of-alibabas-most-powerful-ai-model-3j7l ↩
Artificial Analysis. "Qwen3 Max (Preview): Intelligence, Performance & Price Analysis." https://artificialanalysis.ai/models/qwen3-max-preview ↩
Artificial Analysis. "Qwen3 Max Thinking Benchmarks and Analysis." https://artificialanalysis.ai/articles/qwen3-max-thinking-everything-you-need-to-know ↩
Carl Franzen. "Qwen3-Max arrives in preview with 1 trillion parameters, blazing fast response speed, and API availability." VentureBeat. https://venturebeat.com/ai/qwen3-max-arrives-in-preview-with-1-trillion-parameters-blazing-fast ↩
DataCamp. "Qwen3-Max-Thinking: Hands-On With the Largest LLM in the World." DataCamp Tutorial. https://www.datacamp.com/tutorial/qwen3-max-thinking ↩
apidog. "Can Qwen3-Max Outperform Leading AI Models in Coding and Reasoning?" apidog blog. https://apidog.com/blog/qwen3-max/
Qwen Team. "Pushing Qwen3-Max-Thinking Beyond its Limits." Qwen blog (qwen.ai), January 25, 2026. https://qwen.ai/blog?id=qwen3-max-thinking ↩
Qwen Team. "Qwen3-Max: Just Scale it." Qwen blog (qwen.ai), September 2025. https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2777d ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)Best Open-Source LLMs DeepSeek vs Llama vs Qwen GLM-4.6 LLM API Pricing Comparison LLM Size and Parameter Comparison Mistral Large 3 Qwen Qwen3 Qwen3-Coder Qwen3-VL Qwen3.5 Qwen3.7-Max

Background and development

When was Qwen3-Max released?

What is the architecture of Qwen3-Max?

Context window and modality

Instruct and Thinking modes

Long context behavior

How was Qwen3-Max trained?

How good is Qwen3-Max on benchmarks?

How do you access Qwen3-Max, and is it open source?

How does Qwen3-Max compare to GPT-5, Claude, and Gemini?

What is Qwen3-Max used for?

See also

References

Improve this article

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here