Kimi K2.6
Last reviewed
Jun 2, 2026
Sources
11 citations
Review status
Source-backed
Revision
v1 · 1,777 words
Kimi K2.6 is an open-weight large language model released by Moonshot AI on 20 April 2026. It is the company's flagship model in the Kimi K2 line, a trillion-parameter mixture of experts (MoE) system positioned for agentic coding, long-horizon autonomous execution, and "swarm" task orchestration. At launch it was widely described as the strongest publicly available open-weights model, ranking first among open models on the Artificial Analysis Intelligence Index while trailing the leading proprietary systems by a narrow margin.[1][2][3]
Kimi K2.6 follows three earlier releases in the line: the original Kimi K2 (the July 2025 base release), the reasoning-focused Kimi K2 Thinking, and the intermediate Kimi K2.5 update. Unlike the earlier text-only K2 models, K2.6 is natively multimodal, accepting image and video input alongside text, and it ships with a configurable reasoning mode inherited from the Thinking line.[1][4]
Kimi K2.6 combines a very large MoE backbone with native vision support and an agent-orchestration layer aimed at production software work. Moonshot frames it as a model "built to put to work," emphasizing end-to-end coding, code-driven interface design, and the ability to run autonomously for extended sessions. The model's headline feature is its Agent Swarm capability, which lets a single request fan out across as many as 300 parallel sub-agents executing up to 4,000 coordinated steps.[1][5][6]
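Moonshot's actual Agent Swarm implementation is not described in this article. The general pattern of fanning one request out to many sub-agents under a concurrency cap and a shared step budget can still be sketched in a few lines. In the sketch below, only the two limits (300 sub-agents, 4,000 steps) come from the text above; `StepBudget`, `run_sub_agent`, and `fan_out` are hypothetical names, and `asyncio.sleep(0)` stands in for a real model or tool call.

```python
import asyncio

MAX_PARALLEL_AGENTS = 300  # concurrency cap cited in the text above
MAX_TOTAL_STEPS = 4_000    # shared step budget cited in the text above


class StepBudget:
    """Shared counter that hands out at most `limit` steps in total."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def try_take(self) -> bool:
        # No await between the check and the increment, so this is safe
        # within a single asyncio event loop.
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


async def run_sub_agent(task: str, steps_wanted: int, budget: StepBudget,
                        slots: asyncio.Semaphore) -> tuple[str, int]:
    """Hypothetical sub-agent: runs steps until done or the budget is spent."""
    async with slots:  # limits how many sub-agents are active at once
        done = 0
        for _ in range(steps_wanted):
            if not budget.try_take():
                break
            await asyncio.sleep(0)  # placeholder for a model or tool call
            done += 1
        return task, done


async def fan_out(tasks: list[str], steps_per_task: int) -> list[tuple[str, int]]:
    """Run one sub-agent per task and collect (task, steps_done) pairs in order."""
    budget = StepBudget(MAX_TOTAL_STEPS)
    slots = asyncio.Semaphore(MAX_PARALLEL_AGENTS)
    jobs = [run_sub_agent(t, steps_per_task, budget, slots) for t in tasks]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    results = asyncio.run(fan_out([f"task-{i}" for i in range(500)], steps_per_task=10))
    print(len(results), sum(steps for _, steps in results))
```

The semaphore and the budget are kept separate on purpose: one bounds how wide the swarm gets at any moment, the other bounds total work regardless of width.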
The model is distributed under a permissive license with weights published on Hugging Face, and it is served through Moonshot's own API as well as third-party platforms including Cloudflare Workers AI and Microsoft Foundry. As with prior releases in the series, Moonshot priced inference well below comparable proprietary frontier models, a positioning that featured prominently in launch coverage.[2][3][6][7]
Moonshot AI is a Beijing-based artificial intelligence company that develops the Kimi family of models and the Kimi assistant. Its K2 series established Moonshot as a leading developer of open-weight frontier models. The original Kimi K2 shipped in mid-2025 as a 1-trillion-parameter MoE model with 32 billion active parameters, released in base and instruction-tuned variants. Kimi K2 Thinking extended that backbone with an explicit chain-of-thought reasoning mode and interleaved tool calling, and it introduced native INT4 quantization for the series. The intermediate Kimi K2.5 release advanced agentic and coding performance and brought an earlier version of the agent-swarm system.[1][4]
Kimi K2.6 keeps the same overall MoE configuration of roughly 1T total and 32B active parameters as its predecessors, while adding native multimodality, a substantially larger context window than the original K2, and a scaled-up swarm system.[1][2][4]
Moonshot released Kimi K2.6 on 20 April 2026, publishing the open weights on Hugging Face and making the model available through its API at the same time. The release date is corroborated by Cloudflare, which listed the model on Workers AI on the same day, and by independent launch coverage. Artificial Analysis dated its own analysis to 21 April 2026.[2][3][8]
The launch was accompanied by integrations across multiple inference providers and developer tools, and the model accumulated millions of downloads on Hugging Face in its first month.[1][6]
Kimi K2.6 is a sparse mixture-of-experts transformer. Moonshot's published model card discloses the following configuration.[1]
| Component | Value |
|---|---|
| Total parameters | ~1 trillion |
| Active parameters per token | ~32 billion |
| Number of layers | 61 (including 1 dense layer) |
| Number of experts | 384 |
| Selected experts per token | 8 |
| Shared experts | 1 |
| Attention heads | 64 |
| Attention mechanism | Multi-head latent attention (MLA) |
| Attention hidden dimension | 7,168 |
| MoE hidden dimension (per expert) | 2,048 |
| Activation function | SwiGLU |
| Vision encoder | MoonViT (~400M parameters) |
| Vocabulary size | ~160,000 |
| Context length | 256K (262,144 tokens) |
| Native quantization | INT4 |
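The expert counts in the table (384 experts, 8 selected per token) describe a top-k routing step. The sketch below illustrates that idea with a generic softmax gate. It is an assumption for illustration, not Moonshot's router: the article does not state the gating function, and `route_token` is a hypothetical name. The single shared expert is not routed at all, since it runs for every token.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts, from the table above
TOP_K = 8          # experts selected per token, from the table above


def route_token(router_logits: list[float], top_k: int = TOP_K) -> list[tuple[int, float]]:
    """Return (expert_index, weight) for the top_k experts, weights summing to 1."""
    # Indices of the top_k largest logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only (shifted by the max for stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
    for expert, weight in route_token(logits):
        print(expert, round(weight, 3))
```

Only the selected experts' feed-forward blocks run for a given token, which is why the active parameter count (~32 billion) is far smaller than the total (~1 trillion).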
The model accepts text, image, and video inputs in a single unified architecture rather than bolting a vision module onto a text core, with the MoonViT encoder handling visual tokens. Native INT4 quantization, carried over from Kimi K2 Thinking, is intended to reduce memory footprint and improve inference throughput without a separate post-training compression step. Moonshot recommends serving the model with inference engines such as vLLM, SGLang, and KTransformers.[1]
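A rough calculation shows why INT4 matters at this scale. The sketch below estimates weight storage only, under simplifying assumptions that are not from the source: every parameter is stored at the stated bit width, 16-bit weights are the comparison point, and quantization scales, the KV cache, and activations are ignored. Real checkpoints will differ.

```python
TOTAL_PARAMS = 1.0e12  # ~1 trillion total parameters, from the table above


def weight_gigabytes(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params * bits_per_param / 8 / 1e9


int4_gb = weight_gigabytes(TOTAL_PARAMS, 4)    # 4 bits = half a byte per parameter
bf16_gb = weight_gigabytes(TOTAL_PARAMS, 16)   # assumed 16-bit comparison point

if __name__ == "__main__":
    print(f"INT4:   ~{int4_gb:.0f} GB")
    print(f"16-bit: ~{bf16_gb:.0f} GB")
```

Under these assumptions the 4-bit weights need about a quarter of the 16-bit storage, roughly 500 GB against roughly 2,000 GB, which is still well beyond a single consumer GPU.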
Like K2 Thinking, K2.6 supports an interleaved "thinking" mode and can preserve its reasoning content across multi-turn interactions, alongside an "instant" mode for lower-latency responses. Moonshot's recommended sampling settings differ by mode, with a higher temperature suggested for thinking mode and a lower one for instant mode.[1]
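Moonshot positions Kimi K2.6 primarily as an agentic and coding model. Its documented strengths include end-to-end coding, code-driven interface design, extended autonomous execution, and Agent Swarm orchestration.[1][5][6]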
Launch coverage also reported extended autonomous operation, with some sources citing autonomous runs lasting well beyond ten hours, though such figures come largely from Moonshot's own materials and early integrations rather than independent measurement.[5]
The scores below are drawn from Moonshot's published model card unless otherwise noted. As is typical for a launch, several figures are self-reported, and some agentic benchmarks are sensitive to the evaluation harness used; independent third-party replication was still in progress at release.[1][9]
| Benchmark | Kimi K2.6 | Notes / source |
|---|---|---|
| Artificial Analysis Intelligence Index (v4.0) | 54 | #1 open-weights; #4 overall, behind models scoring 57 [2] |
| AIME 2026 | 96.4 | Math reasoning [1] |
| GPQA Diamond | 90.5 | Graduate-level science [1] |
| LiveCodeBench (v6) | 89.6 | Competitive coding [1] |
| SWE-bench Verified | 80.2 | Software engineering [1] |
| SWE-Bench Pro | 58.6 | Real-world agentic SWE [1] |
| Terminal-Bench 2.0 | 66.7 | Terminal/agent coding [1] |
| BrowseComp | 86.3 (Agent Swarm) | Web browsing/research [1] |
| DeepSearchQA (F1) | 92.5 | Deep research [1] |
| MMMU-Pro | 79.4 | Multimodal understanding [1] |
| MathVision (with Python) | 93.2 | Visual math [1] |
On SWE-Bench Pro, a benchmark designed to track real agentic production work, multiple sources reported K2.6 ahead of leading proprietary models at the time of launch.[9][10]
| Model | SWE-Bench Pro |
|---|---|
| Kimi K2.6 | 58.6 |
| GPT-5.4 | 57.7 |
| Claude Opus 4.6 | 53.4 |
On pure reasoning benchmarks the picture was closer, with some proprietary models still ahead. For example, GPT-5.4 was reported at 99.2 on AIME 2026 versus K2.6's 96.4, and at 92.8 on GPQA Diamond versus K2.6's 90.5.[9] Artificial Analysis's own composite placed K2.6 at index 54, the top open-weights score but a few points behind the strongest proprietary models at 57.[2]
Kimi K2.6 is released under a Modified MIT License, the same permissive license Moonshot has used across the K2 series. The weights are openly downloadable, and the model is offered both through Moonshot's first-party API and via several third-party inference platforms.[1][7]
| Aspect | Detail |
|---|---|
| License | Modified MIT License [1] |
| Weights | moonshotai/Kimi-K2.6 on Hugging Face [1] |
| Official API | platform.moonshot.ai / platform.kimi.ai (OpenAI- and Anthropic-compatible) [1][11] |
| Chat interface | kimi.com [1] |
| Third-party hosting | Cloudflare Workers AI, Microsoft Foundry, and others [3][7] |
| API price, input (cache miss) | $0.95 per 1M tokens [11] |
| API price, input (cache hit) | $0.16 per 1M tokens [11] |
| API price, output | $4.00 per 1M tokens [11] |
The official input and output prices place K2.6 well below comparable proprietary frontier models, a point emphasized across launch coverage. Prices reported by resellers and aggregators vary; Artificial Analysis listed a blended input figure of about $0.95 per 1M tokens, with output at $4.00 per 1M tokens.[2][6][11]
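The per-token prices in the table translate into a per-request cost with simple arithmetic. The sketch below uses the three listed prices; the function name `request_cost` and the token counts in the example are hypothetical, and actual billing rules (rounding, minimums, how cache hits are counted) are not covered by the source.

```python
# USD per 1M tokens, from the pricing table above
PRICE_INPUT_CACHE_MISS = 0.95
PRICE_INPUT_CACHE_HIT = 0.16
PRICE_OUTPUT = 4.00


def request_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request; cached tokens are a subset of input tokens."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached = input_tokens - cached_input_tokens
    total = (uncached * PRICE_INPUT_CACHE_MISS
             + cached_input_tokens * PRICE_INPUT_CACHE_HIT
             + output_tokens * PRICE_OUTPUT)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent turn: 200K input tokens (150K served from cache), 20K output.
    print(f"${request_cost(200_000, 150_000, 20_000):.4f}")
```

For long agentic sessions that resend a large, mostly unchanged context, the cache-hit rate dominates the input side of the bill, since cached input is priced at roughly a sixth of uncached input.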
Early reception focused on Kimi K2.6's standing as the leading open-weights model and on its aggressive price-to-performance ratio for coding and agentic work. Artificial Analysis described it as "the new leading open weights model," noting it outperformed other publicly available models while landing just behind the top proprietary systems on its Intelligence Index.[2]
Commentators highlighted the Agent Swarm system and the model's coding results, particularly the SWE-Bench Pro figure, as evidence that open-weights models had reached parity with or surpassed proprietary frontier models on some agentic software tasks. Several writers framed the release as a notable moment for open models in the coding-agent space.[5][6][10]
Independent observers cautioned that many of the launch benchmark numbers were self-reported and that full third-party validation would take time. Agentic benchmarks such as Terminal-Bench in particular are harness-dependent, and reported scores for competing models varied substantially depending on the evaluation setup, complicating direct comparisons.[9]
Artificial Analysis also noted that, despite a markedly lower hallucination rate than Kimi K2.5 on its knowledge benchmark, K2.6 still produced incorrect answers on a substantial fraction of factual questions, and its overall intelligence score remained a few points below the leading proprietary models. As with any very large MoE model, practical deployment of the full-precision or even INT4 weights demands significant hardware, which constrains fully local use to well-resourced setups.[2]