Qwen3.7-Max
Last reviewed
Jun 2, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 ยท 1,837 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 2, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 ยท 1,837 words
Add missing citations, update stale details, or suggest a clearer explanation.
Qwen3.7-Max is a closed-weight frontier large language model developed by Alibaba's Qwen team, announced in May 2026 as the flagship of the Qwen3.7 generation. Alibaba positions it as a proprietary model "designed for the agent era," built less as a conversational chatbot and more as a foundation for autonomous agents that write code, call tools, and run long-horizon tasks with limited human supervision. [1][2] The model supports extended thinking and a context window of up to one million tokens, and it is offered through Alibaba Cloud's Model Studio platform rather than as a public weight release, a notable departure from the open-weight lineage that made the Qwen brand widely known. [2][3][4]
Qwen3.7-Max sits at the top of Alibaba's 2026 model lineup as the company's most capable proprietary system. Where most of the Qwen family has shipped with downloadable weights under permissive licenses, the "Max" tier is the line Alibaba keeps closed and serves only through its cloud API. [3][5] The Qwen3.7-Max release continued that pattern and pushed it further: the model is text-only, reasoning-first, and tuned for agentic workflows such as software engineering, tool use, and office automation, rather than for one-shot chat. [1][2]
Independent evaluation placed it among the strongest models available at launch. On the Artificial Analysis Intelligence Index, Qwen3.7-Max scored 56.6, ranking fifth overall and making it the highest-placed Chinese model on that leaderboard, ahead of Google's Gemini 3.5 Flash. [3][6] That score trailed only a handful of Western frontier systems, including OpenAI's GPT-5.5, Anthropic's Claude Opus 4.7, and Google's Gemini 3.1 Pro Preview. [6]
Understanding Qwen3.7-Max requires separating it from the broader Qwen catalog, which is split between open and closed tiers. Alibaba Cloud has released a large number of open-weight Qwen models, including dense and mixture-of-experts variants, multimodal systems, and coding-focused models such as Qwen3-Coder. The "Max" models are different. They are the company's proprietary flagships, never published as weights and accessible only via Model Studio. The earlier Qwen3-Max followed this closed pattern, and Qwen3.7-Max is its successor. [3][5]
Qwen3.7-Max is therefore distinct from the open-weight Qwen3.x releases and from the intermediate closed flagships that preceded it. The Qwen3.7 series advanced beyond Qwen3.6 and Qwen3.5: Alibaba reported that the Intelligence Index rose by 4.8 points relative to the Qwen3.6 Max preview, from 51.8 to 56.6, and that the context window expanded from the 256K tokens of the Qwen3.6 generation to one million. [3][4] As of late May 2026, no Qwen3.7 weights had been published on Hugging Face, confirming the model's closed status at launch. [3]
The table below summarizes how the Max tier differs from the rest of the family.
| Attribute | Open Qwen3.x models | Qwen3.7-Max |
|---|---|---|
| Weights | Published (Hugging Face, ModelScope) | Closed, not released [3] |
| Access | Self-host or API | Alibaba Cloud Model Studio API only [4] |
| Positioning | General models, coding, multimodal | Proprietary agent-first flagship [1][2] |
| Example members | Qwen3, Qwen3-Coder, Qwen3-VL | Qwen3-Max, Qwen3.7-Max [3][5] |
Alibaba's Qwen team published the technical announcement, titled "Qwen3.7: The Agent Frontier," on the Alibaba Cloud blog on May 21, 2026, describing Qwen3.7-Max as "our latest proprietary model designed for the agent era." [2] The model was presented publicly at the 2026 Alibaba Cloud Summit, held in Hangzhou around May 20, 2026, with the commercial API going live on Model Studio shortly before the event, on May 19. [3][4]
A preview phase preceded the formal launch. Two preview entries, reported as Qwen3.7-Max-Preview and a Qwen3.7-Plus-Preview, appeared on a public model arena leaderboard in mid-May, and the production release dropped the "-Preview" suffix once the model became generally available. [4] Several third-party hosts, including OpenRouter and Together AI, cross-listed the model at or near launch. [3][5]
Alibaba disclosed very little about the internal design of Qwen3.7-Max. The official announcement does not state a parameter count, an expert configuration for any mixture-of-experts layout, an activation size, or attention details, and several reviewers noted the absence of a full technical report. [2][7] What is documented is operational rather than structural:
preserve_thinking API parameter that carries the model's internal thinking content across turns in a conversation, which Alibaba recommends for agentic tasks. [2]Because Alibaba has not released weights or a detailed model card, claims about the underlying architecture remain unverifiable, and this article does not assert specifics that the company has not confirmed.
Qwen3.7-Max is a reasoning model: it produces an internal chain-of-thought before emitting a final answer, a mode Alibaba describes as extended thinking. [3][2] The design goal stated in the announcement is sustained, long-horizon autonomy rather than short conversational turns. [1][2]
Long context is a central feature. The one-million-token window lets the model hold large codebases, long document sets, or extended agent trajectories in a single session, and it roughly quadruples the 256K limit of the prior Qwen3.6 generation. [3][4] Vendors describe the model as built for agentic workloads: coding and debugging, tool use, office and productivity automation, and tasks that span hundreds or thousands of steps. [1][5]
Alibaba's headline demonstration of long-horizon behavior was a kernel-optimization run. According to the announcement, the model executed roughly 35 hours of continuous autonomous work, performing 432 kernel evaluations across 1,158 tool calls, and achieved a 10.0x geometric-mean speedup over a reference implementation on previously unseen hardware. [2] The company framed this as evidence that the model maintains a coherent strategy across more than a thousand tool calls without losing context. [2] These figures come from Alibaba's own internal testing and have not been independently reproduced.
For integration, Qwen3.7-Max is served through an OpenAI-compatible chat-completions endpoint and an Anthropic-compatible protocol, and Alibaba states it works with agent frameworks including Claude Code and Qwen Code, with native support for the Model Context Protocol. [2][8]
Two categories of benchmark data circulated at launch: Alibaba's own self-reported scores from the announcement, and the independent Intelligence Index from Artificial Analysis. They are presented separately below because they come from different methodologies and should not be conflated.
The independent composite score is the cleanest single number. Artificial Analysis aggregates ten evaluations, including GDPval-AA, Terminal-Bench Hard, SciCode, AA-Omniscience, Humanity's Last Exam, and GPQA Diamond, into one index. [6]
| Metric (independent) | Qwen3.7-Max | Source |
|---|---|---|
| Artificial Analysis Intelligence Index | 56.6 | [6] |
| Rank at launch | #5 overall, highest Chinese model | [3][6] |
| Comparison point | Ahead of Gemini 3.5 Flash (55.3) | [6] |
| Output verbosity (eval) | ~97M tokens generated vs ~24M median | [3][6] |
Alibaba's own reported figures, drawn from the launch post, emphasize coding, agents, and reasoning. These are vendor-reported and were footnoted at a 256K context setting. [2]
| Benchmark (Alibaba-reported) | Qwen3.7-Max | Source |
|---|---|---|
| SWE-bench Verified | 80.4 | [2] |
| SWE-Pro | 60.6 | [2] |
| Terminal-Bench 2.0 | 69.7 | [2] |
| MCP-Mark | 60.8 | [2] |
| GPQA Diamond | 92.4 | [2] |
| HMMT 2026 | 97.1 | [2] |
| SpreadSheetBench-v1 | 87.0 | [2] |
Reviewers flagged a few caveats in the independent numbers. On AA-Omniscience, a knowledge-and-abstention test, the model's raw accuracy dropped relative to the prior Max preview while its abstention rate rose, and Artificial Analysis observed unusually high token generation during evaluation, marking the model as verbose. [7][6]
Qwen3.7-Max is API-only, delivered through Alibaba Cloud Model Studio, with no consumer weight download. [4][8] The official launch post stated that pricing would follow, and Alibaba published the rate card on Model Studio in the days after the summit; the same rates were mirrored by third-party hosts such as OpenRouter and Together AI. [2][3][5]
| Item | Detail | Source |
|---|---|---|
| Access | Alibaba Cloud Model Studio (API only) | [4][8] |
| API compatibility | OpenAI-compatible and Anthropic-compatible endpoints | [2][8] |
| Input price | $2.50 per 1M tokens | [3][5] |
| Output price | $7.50 per 1M tokens | [3][5] |
| Cached input | $0.25 per 1M tokens (about 90% off input) | [3][9] |
| Launch promotion | 50% off, listed near $1.25 input / $3.75 output | [9] |
| Third-party hosts | OpenRouter, Together AI | [3][5][9] |
Commentators noted that the per-token rate landed at roughly half the price of comparable Western flagships, which several framed as aggressive pricing for a model near the top of the leaderboard. [3][5]
Coverage treated Qwen3.7-Max as a significant entry from China's most active frontier-model shipper of 2026, and the closed-weight decision drew particular attention given Alibaba's open-source reputation. [3][7] The reasoning-agent framing was widely echoed: outlets described the model as an "agent frontier" system aimed at autonomous, long-running work rather than chat, and highlighted the one-million-token context and the multi-hour kernel-optimization demo as the standout claims. [1][7][5] The fifth-place finish on the independent Intelligence Index, and the status as the top-ranked Chinese model, were the most cited results. [3][6]
Several limitations were noted at and after launch. The model is text-only, so it cannot natively process images or other modalities. [1][8] Alibaba did not publish a full technical report or the model weights, leaving architecture and training details undisclosed and many claims dependent on the company's own statements. [2][7] Independent evaluators flagged high verbosity, which can raise effective costs in long agentic sessions despite the favorable per-token rate, and a drop in raw knowledge accuracy paired with higher abstention on at least one benchmark. [6][7] As a proprietary, API-gated model, it also cannot be self-hosted, audited at the weight level, or run offline, in contrast to the open Qwen3.x releases. [3][5]