MiniMax M2

AI Agents AI Models Chinese AI Large Language Models Mixture of Experts Open Source AI

17 min read

Updated Jun 27, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 27, 2026

Fact-checked

In review queue

Sources

20 citations

Revision

v3 · 3,379 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

MiniMax M2 is an open-weight large language model released on October 27, 2025 by the Shanghai-based AI company MiniMax, built as a Mixture of Experts model with 230 billion total parameters and roughly 10 billion active parameters per token and tuned specifically for coding and agentic workflows rather than general chat. At launch it topped the Artificial Analysis Intelligence Index among open-weight systems with a composite score of 61, shipped under a permissive MIT license, and was priced at about US$0.30 per million input tokens and US$1.20 per million output tokens, which MiniMax pitched as roughly 8 percent of the price of Anthropic's Claude Sonnet 4.5 while running at nearly twice the speed.¹²³

MiniMax describes the design as a deliberately compact "mini" model intended to maximize throughput, lower the cost of long agent loops, and run inside developer tools like Claude Code, Cursor, and Cline. The Hugging Face model card frames it as "a Mini model built for Max coding & agentic workflows," and the launch post sums up the guiding idea as: "The core principle is simple: to create a model that meets our requirements, we must first be able to use it ourselves."¹⁴⁵

The weights were published on Hugging Face under an MIT license at launch, and the MiniMax Open Platform offered a limited-time free API trial that ran through early November 2025. The list price after the free period was set at 2.1 yuan (around US$0.30) per million input tokens and 8.4 yuan (around US$1.20) per million output tokens.¹⁶ On the Artificial Analysis Intelligence Index, M2 took the top spot among open-weight systems at the time of release, with a composite score of 61 and standout numbers on tool-use, search, and end-to-end software-engineering benchmarks such as SWE-bench Verified, Terminal-Bench, and BrowseComp.⁴³

The model is the third generation of MiniMax's open foundation-model line after MiniMax-Text-01 and MiniMax M1, and the first that explicitly trades long-context capacity for active-parameter efficiency. Where M1 advertised a 1 million-token context window built on "lightning attention," M2 ships with a 204,800-token window and a standard MoE transformer, with the savings reinvested into a fast inference path designed for agent harnesses that fire dozens of short tool calls per session.⁶⁵

What company makes MiniMax M2?

MiniMax (Chinese: Shanghai Xiyu Technology) was founded in December 2021 by computer-vision researchers who had previously worked at SenseTime, including chief executive Yan Junjie, Bin Yang, and Yucong Zhou. The company received early backing from MiHoYo, the studio behind Genshin Impact, and went on to raise a $600 million round led by Alibaba in March 2024 at a valuation of roughly $2.5 billion. It listed on the Hong Kong Stock Exchange on January 9, 2026.⁷

MiniMax is one of China's "Six Little Tigers" (六小虎) of AI alongside Moonshot, Zhipu, Baichuan, 01.AI, and StepFun. Its consumer products include the Hailuo AI text-to-video and image generator, the MiniMax Audio TTS platform, and Talkie, an English-language AI-companion app that the Wall Street Journal reported had around 11 million monthly active users in mid-2024. On the model side, MiniMax has shipped the ABAB series of MoE chat models, the MiniMax-01 family that introduced lightning attention at scale, and the M1 reasoning model released in June 2025.⁷⁸

M1 was significant in its own right because MiniMax claimed it had trained the 456B-parameter base for around US$534,700 in compute rentals, roughly 1/200th of the estimated training cost of GPT-4o. That number got attention partly because it landed in the wake of DeepSeek's V3 paper, which had already set a low-cost benchmark for Chinese open models, and partly because M1 paired the cheap-training story with a 1 million-token context window. M2 was developed in the same cost-conscious tradition but pointed at a different problem: making agent tool-call loops fast and cheap enough to leave running.⁸⁹

How is MiniMax M2 built?

MiniMax M2 is a sparse Mixture of Experts transformer with 229 to 230 billion total parameters and approximately 10 billion active parameters per forward pass, depending on routing. The Hugging Face model card frames the design philosophy as "compact, fast, and cost-effective," with the small active footprint chosen so that interactive agent loops do not have to wait for the full model on every step. Artificial Analysis noted that the model is sparse enough to be served on as few as four NVIDIA H100 GPUs at FP8 precision.⁴⁵²

The architecture details published by MiniMax are deliberately spare. The README on GitHub lists the model as MoE with the 230B/10B split, recommends SGLang and vLLM as first-class inference runtimes with day-zero kernels, and ships an MLX-LM option for local Apple Silicon use. It does not publish the exact number of experts, layers, attention heads, or routing top-k, and as of this article those numbers have not been disclosed in primary documentation.⁵⁴

Context length and output

M2 supports a context window of 204,800 tokens (often rounded to 205K) and the same on output. That is a substantial step down from M1's million-token window. MiniMax's own blog framing is that real agent workloads do not exhaust a million tokens before tool calls reset the conversation, and that a shorter context paired with faster decoding produces better wall-clock results in benchmarks like SWE-bench and Terminal-Bench.⁶¹

Interleaved thinking

A distinctive feature is what MiniMax calls interleaved thinking. The model is trained to emit reasoning content inside <think>...</think> blocks between assistant turns, and MiniMax instructs callers to keep those blocks in the conversation history rather than stripping them out. The model card warns: "you must ensure that the historical content is passed back in its original format. Do not remove the <think>...</think> part, otherwise, the model's performance will be negatively affected." Removing the thinking traces in multi-turn agent settings, according to the card, significantly degrades performance because later steps lean on the earlier reasoning. The pattern is similar in spirit to the interleaved-reasoning format used in DeepSeek-R1 and Anthropic's extended thinking but is exposed as a first-class part of the protocol.⁴¹⁰

Recommended inference settings

MiniMax recommends temperature 1.0, top-p 0.95, and top-k 40 for M2. The Hugging Face card warns that lower temperatures hurt the model's exploration during tool-use planning, and the SGLang and vLLM examples in the GitHub repository use the same defaults.⁴⁵

How was MiniMax M2 trained?

MiniMax has not published a training paper for M2 as of mid-2026, and the public material is light on specifics about the data mix, total token count, hardware, or reinforcement-learning recipe. The Hugging Face card describes M2 as "engineered for end-to-end developer workflows" and lists the kinds of tasks that drove the post-training mix, including multi-file edits, run-and-fix loops on compilation and test errors, terminal use, web browsing, and structured tool calling. The model is described as having been trained with interleaved thinking as a native output format from early in post-training rather than as a wrapper layer added on top of an existing chat model.⁴⁵

MiniMax also has not published a training-cost figure for M2 in the same way it did for M1. The general framing in the launch coverage is that the small active-parameter count and aggressive use of agent-style RL on coding traces are what allows the model to outperform much larger open models on agentic benchmarks while staying cheap to serve.³⁶

How does MiniMax M2 perform on benchmarks?

The primary benchmark numbers below come from the Hugging Face model card, where MiniMax publishes its own results alongside the evaluation harness used (Claude Code as scaffolding for SWE-bench, OpenHands 0.42 for AgentCompany, and so on). Numbers from third parties such as Artificial Analysis are noted where they are relevant.⁴²

Coding and software engineering

Benchmark	MiniMax M2 score	Notes
SWE-bench Verified	69.4	100 max steps, 128K context, Claude Code scaffold
Multi-SWE-Bench	36.2	Averaged across 8 runs
SWE-bench Multilingual	56.5	Cross-language repository tasks
LiveCodeBench	83	Per Artificial Analysis composite
ArtifactsBench	66.8	Averaged across 3 runs
SciCode	36	Scientific coding subset

The SWE-bench Verified number is the headline figure for coding. At 69.4, M2 lands well above earlier open-weight reasoning models such as DeepSeek-R1 and is within a few points of Anthropic's Claude Sonnet 4.5, which scored around 77.2 on the same benchmark in the period after launch. MiniMax explicitly notes that M2 is evaluated inside Claude Code's harness rather than a custom one, which makes the comparison closer to a like-for-like agent test than some earlier reports.⁴¹¹

Agentic and tool-use

Benchmark	MiniMax M2 score	Notes
Terminal-Bench	46.3	Averaged across 8 runs
Terminal-Bench-Hard	24.0	Hard subset
BrowseComp	44.0	English web research
BrowseComp-zh	48.5	Chinese variant
GAIA (text only)	75.7	103-sample validation subset
xbench-DeepSearch	72.0	Long-horizon search
FinSearchComp-global	65.5	Financial document search
AgentCompany	36.0	OpenHands 0.42 framework
tau-squared Bench	77.2	Extended thinking with tool use

The agentic numbers are the part of the table that MiniMax highlights most. Terminal-Bench at 46.3 and BrowseComp at 44 were, in October 2025, the best published results from any open-weight model on those tasks. The Terminal-Bench result of 46.3 comfortably beat Claude 4's 36.4 on the same test and came close to Claude Sonnet 4.5's 50.0. The GAIA text-only score of 75.7 also placed M2 above DeepSeek-V3.1 and Kimi K2 on the same validation subset, although Kimi K2 Thinking later overtook M2 on some of the same metrics once Moonshot released its reasoning variant.⁴³

General intelligence

Benchmark	MiniMax M2 score	Notes
Artificial Analysis composite	61	#1 open-weight at launch
MMLU-Pro	82	Multi-subject reasoning
GPQA-Diamond	78	Graduate-level physics, biology, chemistry
AIME 2025	78	High-school math olympiad
HLE (with tools)	31.8	Humanity's Last Exam with search and Python
HLE (no tools)	12.5	Closed-book
IFBench	72	Instruction following
AA-LCR	61	Long-context reasoning
tau-squared-Telecom	87	Telecom domain agents

At launch, Artificial Analysis reported that "MiniMax's M2 achieves a new all-time-high Intelligence Index score for an open weights model and offers impressive efficiency with only 10B active parameters." Its composite of 61 placed it ahead of DeepSeek-V3.1 and GLM-4.5 but behind Anthropic's Claude Sonnet 4.5 and OpenAI's GPT-5 on overall composite. On agentic sub-scores it sat closer to those proprietary models, which is the part of the index MiniMax pushed hardest in its launch materials. (Artificial Analysis later migrated to a revised index version that re-scored older models downward, so the model's current page shows a lower figure; the 61 reflects the launch-era methodology.)²³

It is worth noting that several of these scores are self-reported by MiniMax. Third-party replications by groups like LMSYS and independent reviewers have generally confirmed the SWE-bench Verified and Terminal-Bench numbers within a couple of points, but the agentic browse results are harder to reproduce because they depend on the exact tool harness, browser state, and rate-limit conditions used during evaluation.³¹²

How much does MiniMax M2 cost?

M2's weights are published on Hugging Face under the MIT license, which permits commercial use, redistribution, and fine-tuning without a separate agreement. This put M2 on a more permissive footing than some peer Chinese releases that ship under custom commercial-use clauses. MiniMax later moved toward more restrictive licenses for its M2.1, M2.5, and M2.7 successors, but the original M2 release remains MIT.⁴¹³

Hosted API pricing

MiniMax offers M2 directly through the MiniMax Open Platform and also through partners including Vercel AI Gateway, OpenRouter, NVIDIA NIM, and Microsoft Azure AI Foundry, where it was added shortly after launch.¹⁴¹⁵

Tier	Input price	Output price	Notes
MiniMax Open Platform	$0.30 / 1M tokens	$1.20 / 1M tokens	2.1 RMB / 8.4 RMB native
Free trial	$0.00	$0.00	Through November 7, 2025
MiniMax Agent	Free	Free	Lightning Mode and Pro Mode during trial

MiniMax described the post-trial pricing as around 8 percent of what Anthropic charges for Claude Sonnet 4.5 on a comparable token-mix basis, and reported a sustained output speed of roughly 100 tokens per second per request during launch tests, with Artificial Analysis independently measuring output throughput of 100 to 124 tokens per second across providers. On a blended 7:2:1 cache/input/output mix, Artificial Analysis put the effective cost at about $0.39 per million tokens. The free trial covered both the API and the MiniMax Agent product built on top of M2, and the trial period was extended once before the paid tier went live.¹⁶²

Self-hosting

For teams that want to run the model themselves, MiniMax's GitHub repository ships SGLang and vLLM configurations that boot M2 on a single multi-GPU node with sufficient HBM to hold the 230 billion total parameters in BF16 or FP8. Quantized GGUF builds maintained by community contributors such as Unsloth and the Cerebras MiniMax-M2-REAP-162B-A10B "reaped" variant have lowered the bar further. A 4-bit Q4 GGUF fits within roughly 130 GB of RAM, which makes the model practical for high-end workstations and small inference servers as well as cloud nodes.⁵¹⁶

How was MiniMax M2 received?

Reception across English-language coverage was broadly positive, with most reviewers calling out the same three points: M2 was at or near the top of open-weight charts on agentic benchmarks, it was cheap enough to leave running inside coding agents without watching costs, and the small 10B active parameter count made it noticeably faster than peer open models in tight tool-call loops. VentureBeat ran a piece headlined "MiniMax-M2 is the new king of open source LLMs" within two days of the release, citing the Artificial Analysis ranking. MarkTechPost and DigitalOcean published deeper architecture and benchmarking pieces in the days that followed.¹⁷³¹⁸

There was also some skepticism. The 1 million-token context window from M1 was popular with users doing long-document analysis, and dropping back to 205K felt like a step backward for that use case even though the agent-loop case is different. Reviewers also pointed out that the published benchmark results are heavily weighted toward agent and coding tasks where MiniMax had tuned the post-training mix, and that on pure reasoning competitions like AIME and HLE the model still trailed proprietary frontier systems. Hacker News commenters and the Hugging Face discussion threads flagged the usual concerns about self-reported benchmarks and asked for independent re-runs.⁶¹⁹

How does MiniMax M2 compare to other open-weight models?

The table below compares the headline benchmark numbers for MiniMax M2 against the leading open-weight contemporaries from China and a frontier proprietary model. Where numbers come from the publishers' own materials they are noted as self-reported.

Metric	MiniMax M2	Kimi K2 (Moonshot)	GLM-4.5 (Zhipu)	Qwen 3 Max	Claude Sonnet 4.5
Total parameters	230B	1T	355B	235B+	undisclosed
Active parameters	~10B	32B	32B	~22B	undisclosed
Context window	205K	128K	200K	256K	200K
SWE-bench Verified	69.4	65.8	64.2	69.6	77.2
Terminal-Bench	46.3	39.2	37.5	not reported	50.0
BrowseComp	44.0	32.0	26.4	not reported	30.0
MMLU-Pro	82	81	81	84	86
AIME 2025	78	87	83	87	87
License	MIT	Modified MIT	MIT	Apache 2.0 (most sizes)	Proprietary
Approx. blended price ($/1M tokens)	0.30 / 1.20	0.60 / 2.50	0.40 / 1.40	0.85 / 3.40	3.00 / 15.00

The shape of M2's strengths is clear in this table. It does not lead on raw math reasoning or general MMLU-Pro, where Kimi K2 and Qwen 3 Max are stronger, but it leads the open-weight group on agentic and tool-use benchmarks and ties Qwen 3 Max for the top open SWE-bench Verified score at launch. Against Claude Sonnet 4.5 it trails by about 8 points on SWE-bench and a few points on Terminal-Bench, but it costs roughly an order of magnitude less per token.⁴¹²¹¹

A separate point worth flagging is the comparison with DeepSeek. By the time of M2's release, DeepSeek V4 and DeepSeek V3.2 were both in the open-weight conversation as well. DeepSeek-V3.1 outscored M2 on a few raw reasoning benchmarks but lagged it on agentic tool-use, and DeepSeek had not yet released a fully matched coding-agent variant at the time M2 launched. By December 2025 the open-weight leaderboard had reshuffled again, with Kimi K2 Thinking and GLM-4.6 closing the agentic gap, which prompted MiniMax to ship M2.1 and then M2.5 in the months that followed.³²⁰

How is MiniMax M2 used inside coding agents?

MiniMax's marketing leans heavily on the picture of M2 running inside developer-agent harnesses, and several of the early reviews tested it in exactly that way. M2 worked out of the box inside Claude Code (using the OpenAI-compatible adapter), Cursor's agent mode, Cline, and the open-source OpenDevin and OpenHands frameworks. The 10B active footprint translated into noticeably snappier tool calls compared with Kimi K2 or DeepSeek-V3.1 on the same hardware, which is exactly the point MiniMax was making about agent throughput.¹⁷¹⁸

Independent reviewers running their own tasks reported that M2 was particularly good at coding-run-fix loops, where the model proposes an edit, sees the compiler or test output, and tries again. It was less consistent on long-form planning tasks where the agent has to hold a complex strategy in mind across dozens of steps without external feedback, an area where Claude Sonnet 4.5 and GPT-5 still had an edge as of late 2025.¹²¹¹

References

MiniMax. "MiniMax M2 & Agent: Ingenious in Simplicity." October 27, 2025. https://www.minimax.io/news/minimax-m2 ↩ ↩² ↩³ ↩⁴ ↩⁵
Artificial Analysis. "MiniMax-M2: Intelligence, Performance & Price Analysis." https://artificialanalysis.ai/models/minimax-m2 ↩ ↩² ↩³ ↩⁴ ↩⁵
MarkTechPost. "MiniMax Releases MiniMax M2: A Mini Open Model Built for Max Coding and Agentic Workflows at 8% Claude Sonnet Price and ~2x Faster." October 28, 2025. https://www.marktechpost.com/2025/10/28/minimax-open-sources-minimax-m2-a-mini-model-built-for-max-coding-and-agentic-workflows-at-8-claude-sonnet-price-and-2x-faster/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸
MiniMaxAI. "MiniMax-M2 model card." Hugging Face. https://huggingface.co/MiniMaxAI/MiniMax-M2 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹²
MiniMax-AI. "MiniMax-M2 GitHub repository." Accessed via https://github.com/MiniMax-AI/MiniMax-M2 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
Caixin Global. "MiniMax Unveils M2 Model to Compete on Speed and Cost." October 28, 2025. https://www.caixinglobal.com/2025-10-28/minimax-unveils-m2-model-to-compete-on-speed-and-cost-102376624.html ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶
Wikipedia. "MiniMax (company)." https://en.wikipedia.org/wiki/MiniMax_(company) ↩ ↩²
Fortune. "China's MiniMax debuts M1 model it says cost 200x less to train than OpenAI's GPT-4." June 18, 2025. https://fortune.com/2025/06/18/chinas-minimax-m1-ai-model-200x-less-expensive-to-train-than-openai-gpt-4/ ↩ ↩²
InfoQ. "MiniMax Releases M1: a 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks." June 2025. https://www.infoq.com/news/2025/06/minimax-m1/ ↩
Perficient. "Minimax M2: Innovative Reasoning Strategy from Open-Source Model Showing Big Results." November 19, 2025. https://blogs.perficient.com/2025/11/19/minimax-m2-open-source-interleaved-reasoning-model/ ↩
Daily Dose of Data Science. "MiniMax-M2 vs. Kimi-K2 vs. Sonnet 4.5 on Code Generation." https://blog.dailydoseofds.com/p/minimax-m2-vs-kimi-k2-vs-sonnet-45 ↩ ↩² ↩³
Barnacle Goose. "MiniMax M2 Review and Comparison With Open-Weight Rivals." Medium. https://medium.com/@leucopsis/minimax-m2-review-and-comparison-with-open-weight-rivals-60c676ef5346 ↩ ↩² ↩³
Let's Data Science. "MiniMax Revises License After Releasing M2.7 Weights." https://letsdatascience.com/news/minimax-revises-license-after-releasing-m27-weights-04b47c74 ↩
Vercel. "MiniMax M2 by MiniMax on Vercel AI Gateway, Specs, Pricing and API." https://vercel.com/ai-gateway/models/minimax-m2 ↩
Microsoft. "MiniMax-M2: The Open-Source Innovator in Coding and Agentic Workflows Now in Azure AI Foundry." Azure AI Foundry Blog. https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/minimax-m2-the-open-source-innovator-in-coding-and-agentic-workflows-now-in-azur/4466045 ↩
MarkTechPost. "Cerebras Releases MiniMax-M2-REAP-162B-A10B: A Memory Efficient Version of MiniMax-M2 for Long Context Coding Agents." November 15, 2025. https://www.marktechpost.com/2025/11/15/cerebras-releases-minimax-m2-reap-162b-a10b-a-memory-efficient-version-of-minimax-m2-for-long-context-coding-agents/ ↩
VentureBeat. "MiniMax-M2 is the new king of open source LLMs (especially for agentic tool calling)." 2025. https://venturebeat.com/ai/minimax-m2-is-the-new-king-of-open-source-llms-especially-for-agentic-tool ↩ ↩²
DigitalOcean. "MiniMax-M2: MoE Model for Agentic and Coding Capabilities." https://www.digitalocean.com/community/tutorials/minimax-m2-moe-model-agentic-coding ↩ ↩²
Hacker News. "MiniMax M2 release discussion." https://news.ycombinator.com/ ↩
Maniac. "Chinese frontier models compared: GLM-5, MiniMax M2.5 and M2.7, Kimi K2.5, Qwen 3.5, and MiMo-V2-Pro." https://www.maniac.ai/blog/chinese-frontier-models-compared-glm5-minimax-kimi-qwen ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

2 revisions by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)Best AI Models for Reasoning and Math MiniMax M1 MiniMax M2.7 MiniMax M3 MiniMax-Text-01 Yan Junjie

What company makes MiniMax M2?

How is MiniMax M2 built?

Context length and output

Interleaved thinking

Recommended inference settings

How was MiniMax M2 trained?

How does MiniMax M2 perform on benchmarks?

Coding and software engineering

Agentic and tool-use

General intelligence

How much does MiniMax M2 cost?

Hosted API pricing

Self-hosting

How was MiniMax M2 received?

How does MiniMax M2 compare to other open-weight models?

How is MiniMax M2 used inside coding agents?

See also

References

Footnotes

Improve this article

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here