MiniMax M3
Last reviewed
Jun 2, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 · 2,231 words
MiniMax M3 is a large language model developed by the Shanghai-based AI company MiniMax, released on June 1, 2026 as the next flagship in the company's M-series. [1][2] MiniMax positions it as a single model combining frontier-level coding and agentic performance, a context window of up to one million tokens, and native multimodality, built on a new sparse attention design the company calls MiniMax Sparse Attention (MSA). [1][3] At launch the model went live through MiniMax's API and the company's coding product, while the open weights and a technical report were pledged for release on Hugging Face and GitHub within roughly ten days rather than at launch. [1][4][5]
M3 is the successor to MiniMax M2 and is marketed primarily as a coding and agent model. [1][2] Its headline claim is that it is the first model to bring together three capabilities that had usually appeared separately: frontier coding and agentic behavior, a one-million-token context window, and native understanding of images alongside text, all in one system. [1][3] The defining engineering change versus M2 is the attention mechanism. Where M2 used full attention, M3 introduces MSA, a sub-quadratic sparse-attention scheme that MiniMax says cuts per-token compute at one-million-token context to about one twentieth of the previous generation while preserving quality. [1][6]
On its own benchmarks MiniMax reports that M3 surpasses GPT-5.5 and Gemini 3.1 Pro on several coding and agentic tasks and approaches, though does not match, the strongest proprietary models from Anthropic. [2][4] The model is also priced aggressively: MiniMax and outside reporting put its API cost at roughly 8 to 20 percent of comparable leading U.S. proprietary models. [2][4] As with prior MiniMax releases, the company did not disclose the model's parameter count at launch, and at the time of release the benchmark figures were vendor-reported and had not been independently reproduced. [5][7]
MiniMax is a Chinese AI company headquartered in Shanghai, known for a line of open-weight foundation models and consumer AI products. The M-series is its family of large reasoning and agentic models. Earlier releases include MiniMax M1, released in mid-2025 with a one-million-token training context and the company's "Lightning Attention" mechanism, and MiniMax M2, released in late 2025 as a mixture-of-experts model that the company published with open weights. [6][8] M3 continues this lineage, and MiniMax framed it explicitly as the next flagship after M2. [1][2]
A notable thread across the series is MiniMax's stated philosophy on attention. The company previously published a piece titled "Why Did M2 End Up as a Full Attention Model?", arguing for stabilizing a model with mainstream methods first and only swapping in a more exotic attention design once that design is mature. [6] M3 represents the point where MiniMax judged its sparse-attention work ready to ship in a flagship.
MiniMax announced M3 on June 1, 2026, with the company describing it as "officially released today." [1] VentureBeat reported the launch as taking place on the evening of Sunday, June 1, U.S. Eastern time. [2] At release the model was reachable in three ways: through MiniMax Code, the company's agentic coding product, which was updated to run on M3; through MiniMax's token-plan subscriptions; and through the standard API on the MiniMax platform, under the model identifier MiniMax-M3. [1][9]
Crucially, the open weights and the accompanying technical report were not part of the launch. MiniMax stated that "over the next 10 days" it would release the technical report and open-source the corresponding model weights on Hugging Face and GitHub, a window that pointed to roughly June 11, 2026. [1][4][5] As of launch day, no MiniMax-M3 repository had appeared on the company's Hugging Face organization, whose newest published models remained in the M2 series. [10] Several outlets stressed this gap, noting that until the weights ship, the "open-weight" label is a company commitment rather than a verified fact. [5][7]
The central technical feature of M3 is MiniMax Sparse Attention (MSA), described by MiniMax as a clean and easily extensible new sparse-attention architecture that makes long context "another dimension that can be scaled." [1] Rather than computing full attention over every token, MSA restricts each query to a small subset of the KV cache, which is what allows the model to reach one-million-token context at sharply reduced cost. [3][6] Because MiniMax had not released its technical report at launch, the most detailed public accounts of MSA were reconstructions by outside analysts working from MiniMax's published diagram, so the specifics below should be read as informed reconstruction rather than confirmed disclosure. [6]
As reconstructed, MSA is a two-stage, block-level scheme built on a grouped-query attention substrate. [6] An inexpensive index branch first scores blocks of the KV cache using a single-head key projection and a block-max-pooling step, then selects the top-k blocks; a sparse branch then runs ordinary grouped-query attention over only the selected blocks. [6] Query heads within the same grouped-query group share one block selection, which keeps memory access contiguous and lets the design reuse standard FlashAttention-style kernels with little modification. [6] Analysts characterized MSA as a streamlined variant of DeepSeek's Native Sparse Attention (NSA), keeping only the selection branch and dropping NSA's compression and sliding-window branches, and contrasted its real-key-value, grouped-query approach with the latent-attention substrates used in DeepSeek V3.2 and later sparse schemes. [6] The implied sparsity, around six to seven percent of blocks attended at one-million-token context, gives an effective receptive field on the order of tens of thousands of tokens (roughly 60,000 to 70,000). [6]
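The two-stage flow in this reconstruction can be illustrated with a toy sketch in plain Python. It is not MiniMax's implementation: the function names, block size, and top-k value are hypothetical, and applying the max-pooling to per-token index scores is an assumption, since the public reconstruction does not pin down exactly what is pooled.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(index_query, index_keys, block_size, top_k):
    """Index branch (assumed form): score every KV block, keep the top_k.

    index_keys stands in for the single-head key projection, one vector
    per cached token. Each block's score is the maximum query-key score
    inside it (block-max pooling, applied here to scores as an assumption).
    """
    n_blocks = math.ceil(len(index_keys) / block_size)
    block_scores = []
    for b in range(n_blocks):
        block = index_keys[b * block_size:(b + 1) * block_size]
        block_scores.append(max(dot(index_query, k) for k in block))
    ranked = sorted(range(n_blocks), key=lambda b: block_scores[b], reverse=True)
    return sorted(ranked[:top_k])  # ascending order keeps memory access contiguous


def sparse_group_attention(group_queries, keys, values, block_ids, block_size):
    """Sparse branch: ordinary attention restricted to the selected blocks.

    All query heads in one grouped-query group share the same block_ids
    and the same keys and values.
    """
    positions = [
        p
        for b in block_ids
        for p in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in group_queries:
        weights = softmax([dot(q, keys[p]) * scale for p in positions])
        outputs.append(
            [sum(w * values[p][d] for w, p in zip(weights, positions)) for d in range(dim)]
        )
    return outputs


def attended_tokens(context_len, fraction):
    """Tokens visible to one query if the given fraction of blocks is kept."""
    return int(context_len * fraction)
```

In this sketch the cost of the sparse branch depends only on `top_k * block_size`, not on the full cache length, which is the property the reconstruction attributes to MSA. The index branch still touches every cached token, but with a single small projection rather than full multi-head attention.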
MiniMax reports concrete efficiency gains from MSA at one-million-token context: per-token compute about one twentieth that of M2, more than nine times faster prefill (reported as 9.7 times), and more than fifteen times faster decoding (reported as 15.6 times). [1][6] The company also compared MSA favorably to the open-source Flash-Sparse-Attention, claiming it runs more than four times faster. [1] These figures are MiniMax's own and had not been independently reproduced at launch. [7]
MiniMax did not disclose M3's total or active parameter counts at launch, consistent with the company holding back full technical details until the planned technical-report release. [1][3] M3 is widely described as a mixture-of-experts model in the lineage of M2, whose published weights totaled in the low hundreds of billions of parameters, but precise figures for M3 were not available from MiniMax at release. [6][10]
M3 supports a context window of up to one million tokens, and MiniMax guarantees that at least 512K tokens of context are available through the API. [3][9] This long context is the capability MSA is built to serve, and MiniMax's pricing reflects it: API calls with up to 512K input tokens are billed at a standard rate, while calls above 512K use a higher long-context rate. [1][9]
M3 is a natively multimodal model. MiniMax describes it as supporting image input, and some coverage and the company's launch materials describe video understanding and the ability to operate a desktop computer as well. [1][3][9] Reporting on the official model page emphasized text and image input for M3 specifically, while the launch blog and several outlets cited image and video. [3][9] The model targets long-horizon, multi-step agentic work: writing and modifying code across large codebases, autonomous web browsing and information retrieval, and tool use via interfaces such as the Model Context Protocol. [1][4] A thinking mode can be toggled on the API at the same price as non-thinking calls, and the service exposes standard and priority tiers. [1]
MiniMax published a set of benchmark results positioning M3 against leading proprietary models, principally GPT-5.5, Gemini 3.1 Pro, and Anthropic's Claude Opus line. The figures below are as reported by MiniMax and reproduced in launch coverage. They are vendor-run on MiniMax's own infrastructure with baselines chosen by the company, and independent evaluators had not published comparable scores at launch. [4][5][7]
| Benchmark | MiniMax M3 (reported) | Comparison as stated by MiniMax | Source |
|---|---|---|---|
| SWE-Bench Pro | 59.0% | Surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Claude Opus 4.7 | [1][2] |
| Terminal-Bench 2.1 | 66.0% | Reported figure | [1][4] |
| SWE-fficiency | 34.8% | Reported figure | [1][4] |
| KernelBench (Hard) | 28.8% | Reported figure | [1][4] |
| MCP Atlas | 74.2% | Reported figure | [1][4] |
| BrowseComp | 83.5 | Exceeds Claude Opus 4.7 (79.3) | [3][4] |
| OSWorld-Verified | 70.06% | Reported figure | [4] |
| SVG-Bench | Not stated | MiniMax says it surpasses Claude Opus 4.7 | [1][3] |
| OmniDocBench | Not stated | MiniMax says it scores above Gemini 3.1 Pro | [1][3] |
| PostTrainBench | 0.37 | Below Claude Opus 4.7 (0.42) and GPT-5.5 (0.39) | [3] |
The most-discussed result was SWE-Bench Pro, a benchmark for software-engineering code modification. M3's 59.0% beats the GPT-5.5 and Gemini 3.1 Pro scores cited by MiniMax, but it sits below the strongest Anthropic model: by launch day, Claude Opus 4.8 led that benchmark at 69.2%, and reporters noted MiniMax's announcement compared against the older Claude Opus 4.7 rather than the newer model. [2][5]
At launch M3 was available as a hosted service while its open-weight release remained pending. The table summarizes the position as of June 1, 2026.
| Aspect | Status at launch (June 1, 2026) | Source |
|---|---|---|
| API access | Live on the MiniMax platform, model id MiniMax-M3 | [1][5][9] |
| Coding product | MiniMax Code updated to run on M3 | [1] |
| Open weights | Not released at launch; pledged on Hugging Face and GitHub within about 10 days (around June 11, 2026) | [1][5][10] |
| Technical report | Not released at launch; pledged within the same 10-day window | [1][5] |
| License name | Not specified at launch | [4][5] |
| API price | About $0.60 per million input tokens and $2.40 per million output tokens | [2] |
| Long-context pricing | Standard rate up to 512K input tokens; higher rate above 512K | [1][9] |
| Token plans | Plus ~$20/mo (~1.7B tokens), Max ~$50/mo (~5.1B tokens), Ultra ~$120/mo (~9.8B tokens) | [1] |
MiniMax and VentureBeat framed the pricing as a major selling point, with M3 running at roughly 8 to 20 percent of the cost of leading proprietary U.S. models. [2] Third-party API marketplaces listed the model shortly after launch, in some cases under a dated identifier and with temporary promotional discounts. [11]
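As a rough illustration of how the reported prices combine, the sketch below estimates the cost of a single API call and the effective per-million-token rate of a subscription plan. The function names are hypothetical, reading "512K" as 512,000 tokens is an assumption, and the long-context rates are left as caller-supplied parameters because the figures for the higher tier are not stated here.

```python
STANDARD_INPUT_USD_PER_M = 0.60   # reported price per million input tokens
STANDARD_OUTPUT_USD_PER_M = 2.40  # reported price per million output tokens
LONG_CONTEXT_THRESHOLD = 512_000  # assumption: "512K" read as 512,000 tokens


def estimate_cost_usd(input_tokens, output_tokens,
                      long_input_usd_per_m=None, long_output_usd_per_m=None):
    """Estimate the cost of one call under the two-tier scheme.

    Calls above the threshold need explicit long-context rates, since
    those rates are not given in the article.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        if long_input_usd_per_m is None or long_output_usd_per_m is None:
            raise ValueError("long-context rates must be supplied by the caller")
        in_rate, out_rate = long_input_usd_per_m, long_output_usd_per_m
    else:
        in_rate, out_rate = STANDARD_INPUT_USD_PER_M, STANDARD_OUTPUT_USD_PER_M
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate


def plan_usd_per_million(monthly_usd, monthly_tokens):
    """Effective price per million tokens for a subscription plan."""
    return monthly_usd / (monthly_tokens / 1_000_000)


if __name__ == "__main__":
    # A 100K-token prompt with a 10K-token reply at the standard tier.
    print(round(estimate_cost_usd(100_000, 10_000), 4))
    # Approximate plan figures from the table: (price per month, tokens per month).
    for name, usd, tokens in [("Plus", 20, 1.7e9), ("Max", 50, 5.1e9), ("Ultra", 120, 9.8e9)]:
        print(name, round(plan_usd_per_million(usd, tokens), 4))
```

Because the plan prices and token allowances are approximate, the per-million figures this produces are indicative only.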
Early coverage treated M3 as a serious entrant in the open-weight frontier race while flagging that key pieces of the release were still outstanding. The Decoder described it as an open-weight model with a million-token context challenging proprietary leaders, and MarkTechPost and Pandaily highlighted the MSA architecture, the long context, and the agentic and multimodal positioning. [3][4][12] VentureBeat's framing was that M3 eclipsed GPT-5.5 and Gemini 3.1 Pro on key benchmarks "for just 5 to 10 percent of the cost," a notable claim given the model's open-weight ambitions. [2] VentureBeat had also previously reported on MiniMax "teasing" the M3 series and its sparse-attention approach ahead of the full launch, underscoring that the architecture had been previewed before the model shipped. [13]
Several caveats accompanied the launch. The most important is that the open weights and the technical report were not available at release; MiniMax committed to publishing both within about ten days, but until they ship the open-weight designation is a pledge rather than a verifiable fact, and the precise architecture remained partly a matter of outside reconstruction. [5][6][7] The benchmark numbers were all vendor-run on MiniMax's own infrastructure with self-selected baselines, and independent evaluation services had not published comparable scores at launch. [5][7] The headline SWE-Bench Pro figure, while ahead of some rivals, trailed the strongest contemporary proprietary model, and the launch comparison used an older Anthropic model as its reference point. [2][5] MiniMax also did not disclose the model's parameter count. [3] Finally, as a Chinese API service, M3 falls under the jurisdiction of China's 2017 National Intelligence Law, a consideration some coverage raised for users processing sensitive data through the hosted API. [7]