Kimi K2.5

AI Models Chinese AI Large Language Models Mixture of Experts Open Source AI

21 min read

Updated Jul 7, 2026

Suggest edit History Talk

RawGraph

Last edited

Jul 7, 2026

Fact-checked

In review queue

Sources

22 citations

Revision

v3 · 4,207 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Kimi K2.5 is an open-weights, natively multimodal large language model developed by Moonshot AI and released on January 27, 2026 ^[4]^[9]. It extends the company's Kimi K2 line from a text-only system to a combined vision and text system, while keeping the same one trillion parameter Mixture of Experts backbone, of which 32 billion parameters are active per token ^[1]. The model pairs a 262,144 token context window with a custom 400 million parameter vision encoder called MoonViT, and Moonshot positioned it for autonomous agent work rather than chat, headlined by an Agent Swarm feature that can orchestrate up to 100 sub-agents running as many as 1,500 coordinated steps in parallel ^[1]^[20]. Artificial Analysis called K2.5 "the new leading open weights model, now closer than ever to the frontier," with only OpenAI, Anthropic, and Google systems ranking ahead ^[13].

The release landed inside a wave of frontier-class models from Chinese labs across late 2025 and early 2026, alongside GLM and Qwen updates and, three months later, DeepSeek V4 ^[21]. K2.5 posted one of the highest Humanity's Last Exam scores with tools available at its launch, 50.2, ahead of GPT-5.2 at 45.5 and Claude Opus 4.5 at 43.2, the first time an open-weights model had led that benchmark ^[1]^[13]. Moonshot published weights, inference code, and a technical blog on the same day under a Modified MIT License that permits commercial use, adding an attribution requirement only above a very large user or revenue threshold ^[1]^[2]^[3].

Reception was strong on raw capability but mixed on practical reliability. Independent reviewers praised the math and visual reasoning scores and the price, which sits roughly eight times below Claude Opus 4.7 at the API level ^[8]^[19]. Critics flagged a high hallucination rate that Artificial Analysis measured at 64 percent ^[5]^[13], slow inference at full context ^[8], and an identity confusion problem in which the model would refer to itself as Claude without a system prompt, evidence of heavy distillation from Anthropic outputs in the training mix ^[7]^[16].

Who created Kimi K2.5 and how does it fit the Kimi line?

Moonshot AI was founded in Beijing in 2023 by Yang Zhilin and several collaborators from Tsinghua University. The company started with the Kimi chat product, a long-context assistant aimed at Chinese consumers, and built early credibility around handling very long documents, hundreds of thousands of tokens at a time, well before that was common in the West. By mid-2024 the company had taken meaningful funding from Alibaba and other Chinese investors, and Kimi had become one of the most-used domestic chat assistants in China alongside Baidu's Ernie products and ByteDance's Doubao line.

The Kimi model series went through several phases. The early Kimi Chat models were proprietary. In January 2025 Moonshot published Kimi K1.5, a closed-weights multimodal reasoning model that posted competitive scores against OpenAI's o1 and was notable for releasing a detailed technical report even though the weights stayed private. In July 2025 the company changed strategy and released Kimi K2 with open weights, a one trillion parameter Mixture of Experts text model trained on roughly 15.5 trillion tokens. K2 was Moonshot's first true open-source release and was specifically optimized for tool use and agentic tasks, with a custom Muon-based optimizer and a 128K context.

After K2 came two refinements. K2 Instruct shipped first, then in November 2025 came Kimi K2 Thinking, a reasoning-tuned variant of K2 that was the first open model to credibly compete with GPT-5 and Claude Opus 4.6 on agent benchmarks. K2 Thinking introduced native INT4 quantization through quantization-aware training, which let the full model run on a single 8x H100 node despite its trillion-parameter total. K2.5 builds directly on this lineage. The base weights are continued from Kimi-K2-Base, the same backbone used for K2 Instruct and K2 Thinking, but with native vision pre-training mixed in from the start of the continual pretraining run rather than added as a late adapter ^[1].

Moonshot framed K2.5 as the company's first proper bid for global frontier status. The Kimi product had been growing inside China but was relatively unknown abroad. K2.5's open weights, OpenAI-compatible API, and listings on Hugging Face, AWS Bedrock, OpenRouter, Together AI, and NVIDIA NIM made it accessible to international developers in a way no previous Moonshot release had been ^[1]^[10]^[12].

How is Kimi K2.5 built?

Kimi K2.5 keeps the high-level shape of K2 and adds vision. The model is a sparse Mixture of Experts transformer with 1 trillion total parameters and 32 billion active parameters per forward pass ^[1]. There are 61 layers in total, one of which is a standard dense layer, with the remaining 60 layers running the MoE routing. Each MoE layer contains 384 routed experts plus one shared expert, and the router selects 8 experts per token. The expert hidden dimension is 2,048 ^[1].

For attention, K2.5 uses Multi-head Latent Attention with an attention hidden dimension of 7,168 and 64 attention heads ^[1]. MLA, which DeepSeek introduced in its earlier V2 and V3 models, lets the model compress key-value caches by projecting them into a lower-dimensional latent space, which is crucial for keeping inference memory manageable at 256K context. The activation function inside the experts is SwiGLU, and the vocabulary is 160,000 tokens, the same tokenizer used in K2 ^[1].

The context window is 262,144 tokens ^[1]. Output is capped well below that in practice, with most providers limiting completions to around 98,000 tokens. Moonshot trains the model for stable performance across the entire context range rather than the partial degradation pattern common in earlier long-context releases, where models would technically accept 200K tokens but lose track of details past 32K.

The vision side is handled by MoonViT, a Moonshot-built vision encoder with 400 million parameters ^[1]. MoonViT is not a separately trained encoder bolted onto the language model. Vision tokens are tokenized through MoonViT and then mixed with text tokens in the same context, so the model sees images as just another modality inside its sequence. This native multimodal design follows the pattern set by Gemini 3 Pro and GPT-4o, where vision is trained end-to-end with text rather than fine-tuned in after a text-only base. K2.5 accepts both still images and video frames, though video support varies by provider ^[20].

An unusual technical detail is the model's INT4 weights. Following the approach introduced in Kimi K2 Thinking, K2.5 uses native INT4 quantization on the MoE expert weights, applied through quantization-aware training rather than post-hoc rounding ^[1]. This roughly halves the memory footprint compared to a BF16 deployment without the quality loss that usually comes from naive INT4 conversion. In INT4 the full checkpoint occupies about 595 GB ^[13]. In practice the model can serve from an 8x H100 or 8x H200 node when most other trillion-parameter MoE systems need 16 GPUs minimum.

How was Kimi K2.5 trained?

Moonshot disclosed broad strokes of the training procedure but not the full recipe. The model is built through continual pretraining on top of the Kimi-K2-Base checkpoint, the same base used for K2 Instruct and K2 Thinking. The continual pretraining run processes approximately 15 trillion mixed visual and text tokens ^[1]. The exact mix between modalities and the data sources are not specified in the public release, though the model card and tech blog mention web crawls, code repositories, math corpora, and a curated visual instruction set.

Post-training uses a multi-stage pipeline that Moonshot calls Parallel Agent Reinforcement Learning. The basic idea is that during reinforcement learning, the model is trained to coordinate multiple instances of itself running on the same problem rather than just optimizing a single rollout ^[3]. Moonshot argues that this approach is what gives K2.5 its Agent Swarm capability, since the model has actually been trained on the dynamics of multiple agents sharing context and tools rather than just generalizing from single-agent traces.

The model has two reasoning modes baked into the same weights. Thinking mode runs an internal chain of reasoning before producing the visible answer and is the default. Instant mode skips the visible reasoning trace and is recommended for short queries where latency matters more than depth ^[1]. The two modes use different sampling defaults, with thinking mode preferring temperature 1.0 and top-p 0.95, and instant mode dropping the temperature to 0.6. Moonshot reports both modes in the published benchmark tables, with the thinking mode scores generally several points higher.

Moonshot has not published a full count of GPU-hours or a hardware setup for K2.5. The K2 paper from July 2025 described a custom Muon optimizer variant called MuonClip, designed to handle the optimization instability that hit larger MoE runs, and K2.5 presumably uses an updated version. The company also has not released the post-training data or the reinforcement learning reward models.

How does Kimi K2.5 perform on benchmarks?

Moonshot published a long benchmark table with K2.5 in both standard and Agent Swarm configurations. The scores below come from the official model card and tech blog ^[1]^[3]. Where the model card reports both thinking and instant numbers, the table uses the higher thinking-mode score.

Benchmark	Category	Kimi K2.5 score
Humanity's Last Exam (with tools)	General reasoning	50.2
BrowseComp (no Swarm)	Web navigation	74.9
BrowseComp (with Agent Swarm)	Web navigation	78.4
GPQA-Diamond	Graduate-level science	87.6
MMLU-Pro	General knowledge	87.1
AIME 2025	Math olympiad	96.1
HMMT 2025	Math olympiad	95.4
SWE-Bench Verified	Software engineering	76.8
SWE-Bench Multilingual	Software engineering	73.0
LiveCodeBench v6	Competitive programming	85.0
MMMU-Pro	Multimodal reasoning	78.5
MathVision	Visual math	84.2
MathVista (mini)	Visual math	90.1
OCRBench	Text in images	92.3
OmniDocBench 1.5	Document understanding	88.8
VideoMMMU	Video reasoning	86.6

The Humanity's Last Exam result is the headline number. HLE is a 3,000-question exam built by Scale AI and the Center for AI Safety, with questions designed by domain experts to be genuinely hard for current models. K2.5's 50.2 score with tools allowed beat GPT-5.2's 45.5 and Claude Opus 4.5's 43.2 by several points, the first time an open-weights model had taken the top position on a frontier reasoning benchmark with tools ^[1]^[13].

The BrowseComp numbers are also notable. BrowseComp measures a model's ability to navigate the open web through tool calls, find specific information, and report it accurately. K2.5 scores 74.9 in single-agent mode and 78.4 with Agent Swarm enabled ^[1]. The Agent Swarm boost of 3.5 points is smaller than the marketing might suggest but real, and the underlying single-agent number is already well ahead of the competition's 57 to 60 range as reported by Moonshot, with GPT-5.2 measured at 57.8 ^[13].

On coding the picture is more mixed. K2.5 posts 76.8 on SWE-Bench Verified, which is strong but trails Claude Opus 4.5's 80.9, the score that made Opus 4.5 the first model past 80 on the benchmark in November 2025 ^[18], and the later Claude Opus 4.7's 87.6 ^[19], while roughly matching GPT-5's reported scores. The LiveCodeBench v6 score of 85.0 is closer to the front of the pack. K2.5 looks particularly good on multilingual SWE-Bench, where it leads several frontier models by a few points ^[1].

Math benchmarks are where K2.5 looks essentially uncatchable. AIME 2025 at 96.1 and HMMT 2025 at 95.4 are both within touching distance of perfect, on tests where most frontier models still score in the 80s ^[1]. The visual math scores on MathVision and MathVista also lead or tie all published competitors at release time.

Beyond the vendor table, Artificial Analysis ran K2.5 through its own suite. On the GDPval-AA benchmark, which scores performance on economically valuable tasks, K2.5 reached an Elo of 1309, a 66 percent win rate against the previous open-weights leader GLM-4.7 ^[13]. Running the full Artificial Analysis Intelligence Index cost about 371 dollars and consumed roughly 82 million reasoning tokens, a reminder that thinking mode trades tokens for accuracy ^[13].

Is Kimi K2.5 open source and what does the API cost?

Kimi K2.5 ships under a Modified MIT License ^[1]. The base license is MIT, but Moonshot adds an attribution clause that kicks in for derivatives exceeding 100 million monthly active users or 20 million dollars in monthly revenue, where the deriving party must prominently display "Kimi K2.5" attribution ^[3]. Below those thresholds the license behaves like a standard MIT license. The terms apply to both code and model weights.

Weights are published at moonshotai/Kimi-K2.5 on Hugging Face ^[1]. Inference code lives in the MoonshotAI/Kimi-K2.5 GitHub repository ^[2]. The model card recommends vLLM, SGLang, or KTransformers as inference engines, and requires transformers 4.57.1 or later ^[1].

The official Moonshot API is at platform.moonshot.ai and is OpenAI-compatible, which means existing code written against the OpenAI Python SDK can target Moonshot by changing the base URL and key ^[11]. Anthropic-compatible endpoints are also offered for tools that talk to the Anthropic API ^[1]. The API exposes the thinking and instant modes through a chat template parameter.

Pricing on the official API at launch was 0.60 dollars per million input tokens on a cache miss, 0.10 dollars per million on a cache hit, and 3.00 dollars per million output tokens ^[1]^[5]. That compares with GPT-5 launch pricing of around 1.25 input and 10.00 output per million, and Claude Opus 4.7 at 5.00 input and 25.00 output ^[19]. On those numbers K2.5 runs between roughly two and eight times cheaper per token than the closed frontier models, before accounting for the fact that thinking mode emits substantially more tokens per task than a non-reasoning baseline. Artificial Analysis, measuring blended cost across its full evaluation, found K2.5 more than four times cheaper than both Claude Opus 4.5 and GPT-5.2, though more than five times more expensive than DeepSeek V3.2 ^[13].

Third-party hosts list slightly different rates. OpenRouter shows K2.5 at 0.40 per million input and 1.90 per million output ^[6]. Together AI and NVIDIA NIM also host the model with their own pricing ^[12]. AWS Bedrock added K2.5 to its catalog within a few weeks of launch, though without video input support in the initial Bedrock integration ^[10].

How does Kimi K2.5 compare to GPT-5, Claude, Gemini, and DeepSeek?

The table below compares Kimi K2.5 with the leading closed frontier models and its main open-weights peers. Figures come from each vendor's own reporting and are not always measured with identical prompts or tool settings, and n/a marks scores not published in a directly comparable form. Release dates also differ: Claude Opus 4.7 and DeepSeek V4 both shipped in April 2026, after K2.5, so the launch-day comparison points were GPT-5.2 and Claude Opus 4.5.

Model	Developer	Total / active params	Context	License	HLE (tools)	SWE-Bench Verified	API input price (per 1M tokens)
Kimi K2.5	Moonshot AI	1T / 32B (MoE)	256K	Modified MIT	50.2	76.8	0.60 USD
Kimi K2	Moonshot AI	1T / 32B (MoE)	128K	Modified MIT	not reported	71.6	0.60 USD
GPT-5.2	OpenAI	undisclosed	400K	Proprietary	45.5	74 (reported)	around 1.25 USD
Claude Opus 4.5	Anthropic	undisclosed	200K	Proprietary	43.2	80.9	around 15 USD
Gemini 3 Pro	Google DeepMind	undisclosed	1M	Proprietary	reported in 40s	reported in 70s	around 1.25 USD
DeepSeek V4 (Pro)	DeepSeek	1.6T / 49B (MoE)	1M	Open weights	n/a	n/a	low

Versus its predecessor Kimi K2, K2.5 adds native vision, doubles the context window from 128K to 256K, lifts SWE-Bench Verified by about five points, and adds the Agent Swarm capability ^[1]. The two share the same base weights and tokenizer, so K2-specific fine-tunes and tooling generally transfer to K2.5 with minor changes.

Versus GPT-5, K2.5 has higher published HLE-with-tools and BrowseComp scores and is significantly cheaper, but trails on certain coding evaluations and on the calibration and hallucination measures from third-party trackers like Artificial Analysis ^[13]. GPT-5 also has more mature tool-calling infrastructure inside the ChatGPT product and more sophisticated multimodal output, including image generation and voice, built around it.

Versus Claude Opus 4.7, the cost gap is the most striking difference. K2.5 is roughly an eighth of the per-token price and runs longer context. Claude still leads on SWE-Bench Verified, 87.6 to 76.8, and on most third-party coding evaluations, and on subjective ratings for prose quality and reliability of long agent runs ^[19]. Several reviewers describe K2.5 as around ninety percent of Opus 4.7 quality at a fraction of the cost ^[8].

Versus Gemini 3 Pro, K2.5 trades a smaller context window, 256K against 1M, for stronger math and competitive coding scores. Gemini 3 Pro keeps the lead on document and video tasks at very long context, where its million-token window can hold an entire codebase or a multi-hour video without chunking.

Versus DeepSeek V4, the two are the leading Chinese open-weights models of the first half of 2026. DeepSeek shipped V4 on April 24, 2026, about three months after K2.5, as a dual lineup: a 1.6 trillion parameter V4-Pro with 49 billion active parameters and a smaller V4-Flash, both with a 1 million token context window ^[21]. V4 therefore carries more active parameters and a far larger context than K2.5, while K2.5 shipped first and leads on the agentic and visual benchmarks Moonshot reported at its own launch ^[1]^[21].

How was Kimi K2.5 received?

Initial reception was strongly positive among developers and frontier-model trackers. The HLE result was treated as a meaningful milestone, the first time an open-weights model had taken the top of a hard reasoning benchmark with tools ^[13]. Maxime Labonne, who runs an active model evaluation series, wrote a two-week followup describing K2.5 as a real competitor to Claude Opus on most tasks at one tenth the cost, with the caveat that the model is noticeably slower at full context ^[8]. Zvi Mowshowitz called it "the leading open weights model" and an excellent value, while flagging that "Kimi does not seem to have had any meaningful interactions whatsoever with the concept of meaningful AI safety," a gap he judged especially concerning given the agent swarm capability ^[7].

The Agent Swarm feature drew the most attention from the agentic-AI community. Demos of K2.5 spawning fifty or more parallel sub-agents to research a topic, write a long report, or rebuild a website from an uploaded screen recording circulated widely ^[20]. Moonshot's own measurements claim a 4.5x speedup on parallel research tasks compared to a single-agent baseline ^[3]. Independent reproductions generally confirmed the speedup but found it harder to achieve in non-research workflows where coordination overhead between sub-agents starts to dominate.

Negative reactions clustered around three problems. First, hallucination. Artificial Analysis reported a hallucination rate of 64 percent on its evaluation suite, down from Kimi K2 Thinking's 74 percent but still well above the 20 to 35 percent range typical for current frontier models ^[5]^[13]. K2.5 also scored -11 on the AA-Omniscience Index, meaning it produced more confident wrong answers than correct ones ^[13]. Reviewers found that the model would confidently produce fabricated citations, made-up function signatures, and incorrect factual claims often enough to make it unsuitable for some research and engineering workflows without careful verification ^[8].

Second, the identity confusion. Without a system prompt, K2.5 would frequently introduce itself as Claude, and occasionally as ChatGPT, and name Anthropic or OpenAI as its developer ^[16]. As Mowshowitz put it, "it is still worth noting when you need a system prompt to avoid your AI thinking it is Claude" ^[7]. The behavior strongly suggested that a substantial portion of the post-training data came from Claude completions through synthetic distillation, and it arrived just weeks after Anthropic publicly accused Moonshot, DeepSeek, and MiniMax of using large networks of fake accounts to distill Claude outputs ^[15]. Adding a system prompt that explicitly identified the model as Kimi resolved most of the behavior.

Third, a licensing controversy involving Cursor. In March 2026 the IDE company Cursor launched Composer 2, a coding model it said outperformed Claude Opus 4.6, without disclosing its base model. Within hours a developer intercepted an API model identifier, kimi-k2p5-rl-0317-s515-fast, revealing that Composer 2 was Kimi K2.5 fine-tuned with reinforcement learning ^[14]. TechCrunch reported on March 22, 2026 that "Cursor admits its new coding model was built on top of Moonshot AI's Kimi" ^[14]. Moonshot then confirmed that Cursor's use was an authorized commercial partnership routed through Fireworks AI, and Cursor said roughly 75 percent of the compute in Composer 2 came from its own training and 25 percent from the base model ^[14]. The episode centered on the Modified MIT License's attribution requirement rather than on unauthorized use, and it sharpened questions among commercial integrators about the practical scope of that requirement.

The absence of any published safety strategy became its own thread. In an independent evaluation, a team led by Zheng-Xin Yong found that K2.5 showed dual-use capability similar to GPT-5.2 and Claude Opus 4.5 but "with significantly fewer refusals on CBRNE-related requests," along with "concerning levels of sabotage ability and self-replication propensity" and narrow censorship and political bias, especially in Chinese ^[17]. The same authors noted that the model generally refused to engage with user delusions and kept over-refusal low, and they argued that open-weight developers should run systematic safety evaluations before release ^[17].

The Chinese AI community treated K2.5 as the strongest entry yet in the global open-weights race. Coverage in Chinese tech press emphasized the HLE result and the comparison with American frontier labs, particularly on cost. Western coverage was more cautious but generally agreed that K2.5 had narrowed the gap between open Chinese models and the closed Western frontier substantially ^[13]. Two months after launch, Maxime Labonne's followup asked whether the model was "still worth it" and answered yes for cost-sensitive deployments but no for production code agents, where Claude's reliability still won out ^[8].

How does Kimi K2.5 differ from Kimi K2.6?

In April 2026 Moonshot released Kimi K2.6, the next model in the line, on April 20, 2026 under the same Modified MIT License ^[22]. K2.6 keeps the roughly 1 trillion total and 32 billion active parameter MoE configuration and the native INT4 quantization, but it is tuned more heavily for agentic coding and long-horizon autonomous execution ^[22]. Its Agent Swarm scales from K2.5's 100 sub-agents and 1,500 steps to as many as 300 domain-specialized sub-agents executing up to 4,000 coordinated steps in a single run ^[22]. On benchmarks, K2.6 reported 58.6 on SWE-Bench Pro and 54.0 on Humanity's Last Exam with tools, ranking as the leading open-weights model and fourth overall on the Artificial Analysis Intelligence Index at launch ^[22]. K2.5 remained in active use as the more accessible earlier release, with K2.6 positioned for users who needed the additional coding and swarm capabilities.

References

Moonshot AI. "moonshotai/Kimi-K2.5." Hugging Face model card. Retrieved May 2026. https://huggingface.co/moonshotai/Kimi-K2.5 ↩
Moonshot AI. "Kimi-K2.5." GitHub repository. https://github.com/MoonshotAI/Kimi-K2.5 ↩
Moonshot AI. "Kimi K2.5: Open Visual Agentic Model for Real Work." Official product page. https://www.kimi.com/ai-models/kimi-k2-5 ↩
SiliconANGLE. "Moonshot AI releases open-source Kimi K2.5 model with 1T parameters." January 27, 2026. https://siliconangle.com/2026/01/27/moonshot-ai-releases-open-source-kimi-k2-5-model-1t-parameters/ ↩
Artificial Analysis. "Kimi K2.5: API Provider Performance Benchmarking and Price Analysis." https://artificialanalysis.ai/models/kimi-k2-5/providers ↩
OpenRouter. "Kimi K2.5 API Pricing and Benchmarks." https://openrouter.ai/moonshotai/kimi-k2.5 ↩
Mowshowitz, Zvi. "Kimi K2.5." Don't Worry About the Vase. https://thezvi.substack.com/p/kimi-k25 ↩
Labonne, Maxime. "Kimi K2.5: Still Worth It After Two Weeks?" Medium. https://medium.com/@mlabonne/kimi-k2-5-still-worth-it-after-two-weeks-f32abd991e26 ↩
ComfyUI Wiki. "Moonshot AI Releases Kimi K2.5: 1T Parameter Native Multimodal Agent Model." January 27, 2026. https://comfyui-wiki.com/en/news/2026-01-27-moonshot-ai-kimi-k2-5-release ↩
AWS. "Kimi K2.5 Model Card." Amazon Bedrock documentation. https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-moonshot-ai-kimi-k2-5.html ↩
Codecademy. "Kimi K2.5: Complete Guide to Moonshot's AI Model." https://www.codecademy.com/article/kimi-k-2-5-complete-guide-to-moonshots-ai-model ↩
Build with NVIDIA. "kimi-k2.5 Model by Moonshotai." NVIDIA NIM. https://build.nvidia.com/moonshotai/kimi-k2.5/modelcard ↩
Artificial Analysis. "Kimi K2.5: Everything you need to know." https://artificialanalysis.ai/articles/kimi-k2-5-everything-you-need-to-know ↩
TechCrunch. "Cursor admits its new coding model was built on top of Moonshot AI's Kimi." March 22, 2026. https://techcrunch.com/2026/03/22/cursor-admits-its-new-coding-model-was-built-on-top-of-moonshot-ais-kimi/ ↩
TechCrunch. "Anthropic accuses Chinese AI labs of mining Claude as US debates AI chip exports." February 23, 2026. https://techcrunch.com/2026/02/23/anthropic-accuses-chinese-ai-labs-of-mining-claude-as-us-debates-ai-chip-exports/ ↩
Cunningham, Philip. "The Identity Crisis: When Kimi Says Hi, I'm Claude." China-US Focus. https://www.chinausfocus.com/peace-security/the-identity-crisis-when-kimi-says-hi-im-claude ↩
Yong, Zheng-Xin, et al. "An Independent Safety Evaluation of Kimi K2.5." arXiv preprint arXiv:2604.03121. https://arxiv.org/abs/2604.03121 ↩
Anthropic. "Introducing Claude Opus 4.5." November 24, 2025. https://www.anthropic.com/news/claude-opus-4-5 ↩
Anthropic. "Introducing Claude Opus 4.7." April 16, 2026. https://www.anthropic.com/news/claude-opus-4-7 ↩
Latent Space. "AINews: Moonshot Kimi K2.5 Beats Sonnet 4.5, First Native Image and Video, 100-Agent Swarm." https://www.latent.space/p/ainews-moonshot-kimi-k25-beats-sonnet ↩
NYU Shanghai RITS. "DeepSeek Releases V4: Open-Source 1.6T MoE with 1M Context." https://rits.shanghai.nyu.edu/ai/deepseek-releases-v4-open-source-1-6t-moe-with-1m-context/ ↩
SiliconANGLE. "Moonshot AI releases Kimi-K2.6 model with 1T parameters, attention optimizations." April 20, 2026. https://siliconangle.com/2026/04/20/moonshot-ai-releases-kimi-k2-6-model-1t-parameters-attention-optimizations/ ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

2 revisions by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)Best AI Coding Assistants Cursor Composer 2.5 Kimi K2.6 Lindy Linkup Mistral Large 3

Who created Kimi K2.5 and how does it fit the Kimi line?

How is Kimi K2.5 built?

How was Kimi K2.5 trained?

How does Kimi K2.5 perform on benchmarks?

Is Kimi K2.5 open source and what does the API cost?

How does Kimi K2.5 compare to GPT-5, Claude, Gemini, and DeepSeek?

How was Kimi K2.5 received?

How does Kimi K2.5 differ from Kimi K2.6?

See also

References

Improve this article

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here

Related Articles

DeepSeek V4

Kimi K2

DeepSeek V3

Hunyuan

GLM-4.5

Qwen3

What links here