MiniMax M2.7
Last reviewed
Jun 3, 2026
Sources
7 citations
Review status
Source-backed
Revision
v1 · 1,366 words
MiniMax M2.7 is a large language model released by the Chinese AI company MiniMax on March 18, 2026. It is an agentic coding and reasoning model built on a sparse mixture of experts architecture, with about 230 billion total parameters and roughly 10 billion active per token. M2.7 is best known for one claim in particular: MiniMax described it as the company's first model to take part in its own development, with an internal version autonomously handling a meaningful share of the reinforcement learning research workflow that produced it. The model sits directly before MiniMax M3 in the company's lineup and was its most capable release until M3 arrived in June 2026.
M2.7 is the last model in MiniMax's M2 generation. That generation began with MiniMax M2 in late October 2025, a model positioned for what the company called the "agentic era," and it moved quickly through several point releases. MiniMax M2.5 shipped on February 12, 2026, followed by M2.7 about five weeks later. All of these share the same underlying design: a mixture-of-experts transformer with 230 billion total parameters and 10 billion active. The point releases were refinements of that architecture rather than ground-up rebuilds.
M2.7 is the bridge to the next generation. MiniMax M3, released on June 1, 2026, was a larger architectural change, with a new attention scheme, a much longer context window, native multimodal input, and computer use. So M2.7 marks the high point of the M2 line and is the model the company says it leaned on to help build what came after. Reading M2.7 alongside M3 gives a sense of how fast MiniMax was iterating in the first half of 2026.
M2.7 uses a sparse mixture-of-experts design. The full network holds about 230 billion parameters, but only a fraction run on any given token. NVIDIA's technical documentation for the model lists 256 local experts with 8 activated per token, which works out to roughly 10 billion active parameters, or about 4.3% of the total. The idea is familiar from other MoE systems: keep the capacity of a very large model while paying inference costs closer to those of a 10-billion-parameter dense model. A top-k router decides which experts handle each token.
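The routing step can be illustrated with a minimal sketch. It uses the expert counts cited above (256 experts, 8 per token), but the gating function is an assumption: the softmax over the selected logits, the random logits, and the name `top_k_route` are illustrative and are not taken from MiniMax's or NVIDIA's documentation.

```python
import math
import random

NUM_EXPERTS = 256  # local experts, per the NVIDIA figures cited above
TOP_K = 8          # experts activated per token


def top_k_route(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only,
    so they sum to 1. The real router's gating function may differ.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Random logits stand in for the router's per-token scores.
rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(router_logits)

expert_fraction = TOP_K / NUM_EXPERTS  # share of experts used per token
param_fraction = 10e9 / 230e9          # share of parameters used per token

print(f"experts active: {expert_fraction:.1%}, parameters active: {param_fraction:.1%}")
```

The two fractions differ: 8 of 256 experts is about 3.1%, while 10 billion of 230 billion parameters is about 4.3%. The gap presumably comes from components that run for every token, such as attention layers and embeddings, though the sources summarized here do not give that breakdown.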
The model is text-only. It does not take images or other modalities, which is one of the clearer dividing lines between M2.7 and the multimodal M3 that followed. The context window is about 205,000 tokens. NVIDIA cites a 200K input length, OpenRouter lists 204,800 tokens for input and output combined, and the maximum output is 131,072 tokens.
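As a rough illustration of how those limits interact, the sketch below computes the prompt budget left after reserving output tokens. It assumes that prompt and output share the 204,800-token window, as the OpenRouter listing implies; the function name `max_prompt_tokens` is hypothetical and not part of any API.

```python
CONTEXT_WINDOW = 204_800  # input and output combined, per the OpenRouter listing
MAX_OUTPUT = 131_072      # maximum output tokens


def max_prompt_tokens(reserved_output: int) -> int:
    """Return how many prompt tokens fit once `reserved_output` tokens are set aside."""
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 1 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


print(max_prompt_tokens(MAX_OUTPUT))  # 73728
print(max_prompt_tokens(8_192))       # 196608
```

Under that assumption, a request that reserves the full 131,072 output tokens leaves 73,728 tokens for the prompt, so long inputs and long outputs trade off against each other.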
| Specification | Detail |
|---|---|
| Developer | MiniMax |
| Release date | March 18, 2026 |
| Architecture | Sparse mixture of experts (text only) |
| Total parameters | About 230 billion |
| Active parameters | About 10 billion per token |
| Experts | 256 local experts, 8 activated per token |
| Context window | About 205,000 tokens (131,072 max output) |
| Predecessor | MiniMax M2.5 |
| Successor | MiniMax M3 |
The headline feature, and the thing worth treating carefully, is what MiniMax called "self-evolution." The company titled its announcement "Early Echoes of Self-Evolution," and the framing is deliberately modest. M2.7 did not train itself end to end. What MiniMax described is narrower: an internal version of the model acted as an autonomous research assistant inside the team's own reinforcement learning pipeline, and handled part of the iterative engineering work that usually falls to human researchers.
MiniMax reported two concrete examples. In the first, an internal M2.7 ran an iterative loop over its own training scaffold: analyze failed runs, plan a change, modify the scaffold code, run an evaluation, compare results, then decide to keep or revert the change. The company says this loop ran autonomously for more than 100 rounds and produced about a 30% improvement on internal evaluation sets, in part by searching for better sampling settings such as temperature, frequency penalty, and presence penalty. In the second, the model competed on machine-learning engineering tasks, iterating over multiple 24-hour trials.
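The keep-or-revert pattern in the first example can be sketched as a simple search loop. Everything here is a stand-in: `evaluate` is a toy scoring function with an arbitrary optimum, and `propose_change` nudges one setting at random, whereas the loop MiniMax describes had the model analyze failures and edit scaffold code. Only the overall shape (propose, evaluate, compare, keep or revert) follows the description above.

```python
import random


def evaluate(settings):
    """Toy stand-in for an evaluation run (higher is better).

    The optimum at temperature 0.7 and penalties 0.2 is arbitrary.
    """
    return -(
        (settings["temperature"] - 0.7) ** 2
        + (settings["frequency_penalty"] - 0.2) ** 2
        + (settings["presence_penalty"] - 0.2) ** 2
    )


def propose_change(settings, rng):
    """Return a copy of the settings with one value nudged at random."""
    candidate = dict(settings)
    key = rng.choice(sorted(candidate))
    candidate[key] = round(candidate[key] + rng.uniform(-0.1, 0.1), 3)
    return candidate


def run_loop(initial, rounds=100, seed=0):
    """Propose, evaluate, compare, then keep the change or revert it."""
    rng = random.Random(seed)
    current, best_score = dict(initial), evaluate(initial)
    history = []
    for round_no in range(1, rounds + 1):
        candidate = propose_change(current, rng)
        score = evaluate(candidate)
        kept = score > best_score
        if kept:
            current, best_score = candidate, score
        # When not kept, `current` is left unchanged, which is the revert.
        history.append((round_no, score, kept))
    return current, best_score, history


start = {"temperature": 1.0, "frequency_penalty": 0.0, "presence_penalty": 0.0}
final_settings, final_score, history = run_loop(start)
print(final_settings, sum(kept for _, _, kept in history), "changes kept")
```

Because a change is kept only when it scores strictly higher, the final score can never fall below the starting score. That property is what makes a long unattended run safe to leave alone.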
To run that loop, the team built a deliberately simple harness with three parts: a short-term memory written to a markdown file after each round, a self-feedback step where the model criticizes its own latest results, and a self-optimization step where it proposes what to try next. MiniMax says this "Agentic Researcher" setup let M2.7 cover an estimated 30% to 50% of the reinforcement learning workflow inside its team, with human researchers stepping in for the important decisions. VentureBeat, which covered the launch, described the model in those terms: self-evolving, and able to perform 30% to 50% of the RL research workflow.
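A minimal sketch of a three-part harness along those lines is shown below. The function names, the markdown layout, and the stubbed `self_feedback` and `self_optimize` steps are all hypothetical; in a real setup those two steps would be model calls, and MiniMax has not published the harness in this form.

```python
import tempfile
from pathlib import Path


def self_feedback(result: str) -> str:
    """Stub for the step where the model critiques its latest result."""
    return f"Reviewed '{result}'; check whether the change generalizes."


def self_optimize(critique: str, memory: str) -> str:
    """Stub for the step where the model proposes what to try next."""
    rounds_seen = memory.count("## Round")
    return f"Try variation {rounds_seen + 1} based on the latest critique."


def append_memory(path: Path, round_no: int, result: str, critique: str, plan: str) -> None:
    """Append one round's notes to the markdown memory file."""
    with path.open("a", encoding="utf-8") as f:
        f.write(
            f"## Round {round_no}\n\n"
            f"- Result: {result}\n"
            f"- Critique: {critique}\n"
            f"- Next: {plan}\n\n"
        )


def run_round(path: Path, round_no: int, result: str) -> str:
    """Read memory, critique the result, plan the next step, then record all three."""
    memory = path.read_text(encoding="utf-8") if path.exists() else ""
    critique = self_feedback(result)
    plan = self_optimize(critique, memory)
    append_memory(path, round_no, result, critique, plan)
    return plan


with tempfile.TemporaryDirectory() as tmp:
    memory_path = Path(tmp) / "memory.md"
    for n, result in enumerate(["score 0.41", "score 0.44", "score 0.43"], start=1):
        run_round(memory_path, n, result)
    notes = memory_path.read_text(encoding="utf-8")

print(notes)
```

Keeping the memory in a plain markdown file means each round starts from a readable record of earlier rounds, and a human researcher can inspect or edit that record between rounds.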
A few caveats are worth keeping in mind. These figures come from MiniMax's own internal evaluations and have not been independently reproduced. "Self-evolution" here means an agent automating chunks of an existing engineering pipeline, not a model autonomously rewriting its own weights or goals. And the 30% to 50% range is an estimate of workflow coverage from the company's RL team, not a precisely measured benchmark. The capability is real and notable, but the marketing term suggests more than the underlying mechanism delivers.
MiniMax positioned M2.7 mainly as a coding and agentic model, and its reported scores cluster around software engineering and tool-use tasks. The numbers below come from MiniMax's announcement and were repeated across coverage of the release.
| Benchmark | Score | What it measures |
|---|---|---|
| SWE-Pro | 56.22% | Real-world software engineering tasks |
| Terminal Bench 2 | 57.0% | Command-line and terminal tasks |
| SWE Multilingual | 76.5% | Software fixes across multiple languages |
| Multi SWE Bench | 52.7% | Multi-repository software engineering |
| VIBE-Pro | 55.6% | Coding and product tasks |
| NL2Repo | 39.8% | Building a repository from a natural-language spec |
| MLE Bench Lite | 66.6% medal rate | 22 machine-learning engineering competitions |
| GDPval-AA | 1495 Elo | Economically valuable knowledge work |
| Toolathon | 46.3% | Tool-use accuracy |
| MM Claw | 62.7% | End-to-end agent task accuracy |
On the MLE Bench Lite set of 22 competitions, MiniMax reported a best run of 9 gold, 5 silver, and 1 bronze medal (15 of 22, or about 68%) and an average medal rate of 66.6% across its trials. The company framed the SWE-Pro result as approaching the level of much larger frontier systems while running at a fraction of their cost. As with the self-development numbers, these are vendor-reported scores; readers comparing models should check independent leaderboards where available.
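The medal figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported above; the variable names are illustrative.

```python
COMPETITIONS = 22
best_run = {"gold": 9, "silver": 5, "bronze": 1}

medals = sum(best_run.values())     # 15 medals in the best run
best_rate = medals / COMPETITIONS   # about 0.682

AVERAGE_RATE = 0.666                # reported average medal rate
average_medals = AVERAGE_RATE * COMPETITIONS  # about 14.65 medals per run

print(f"best run: {medals}/{COMPETITIONS} = {best_rate:.1%}")
print(f"average: about {average_medals:.2f} medals per run")
```

The best run corresponds to a medal rate of roughly 68.2%, slightly above the 66.6% average. The two figures are therefore consistent only if 66.6% is read as a mean over several runs and not as the rate of the best run.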
At launch on March 18, 2026, M2.7 was offered through the MiniMax Agent product and the MiniMax API platform, including a dedicated coding plan, and through OpenRouter. NVIDIA also made it available as an optimized NIM microservice in its catalog and reported throughput gains on Blackwell Ultra hardware over the following month.
Open weights followed in April 2026, when MiniMax published the model to Hugging Face under the repository MiniMaxAI/MiniMax-M2.7. The license is a modified-MIT variant that restricts commercial use without written authorization from MiniMax, which makes M2.7 open weights rather than fully open source in the usual sense. Quantized community builds, including NVFP4 and GGUF formats, appeared shortly after for local use. On hosted APIs the model was priced cheaply for its capability, around $0.30 per million input tokens and $1.20 per million output tokens, with a discounted rate for cache hits.
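To make the pricing concrete, the sketch below estimates the cost of a single request at the approximate list rates quoted above. The rates are rounded figures from this article, the function name `estimate_cost` is hypothetical, and the cache-hit discount is left out because its rate is not stated here.

```python
INPUT_USD_PER_M = 0.30   # approximate price per million input tokens
OUTPUT_USD_PER_M = 1.20  # approximate price per million output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD, ignoring any cache-hit discount."""
    return (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


# Example: a long agentic turn with a 150,000-token prompt and an 8,000-token reply.
print(f"${estimate_cost(150_000, 8_000):.4f}")
```

At these rates, output tokens cost four times as much as input tokens, so for agentic workloads with long prompts and short replies the input side usually dominates the bill.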