LongCat-Flash

AI Models Large Language Models Open Source AI

8 min read

Updated Jun 8, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 8, 2026

Fact-checked

In review queue

Sources

11 citations

Revision

v1 · 1,635 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

LongCat-Flash is an open-weight large language model developed by the LongCat team at Meituan, the Chinese on-demand local-services and food-delivery company. Released in late August 2025, it is a mixture-of-experts (MoE) model with roughly 560 billion total parameters that activates only about 18.6 billion to 31.3 billion parameters per token, averaging around 27 billion. ^[1]^[2] Its signature innovation is a "zero-computation experts" mechanism that lets the MoE router send unimportant tokens to no-op experts, so the amount of compute spent varies dynamically per token. ^[1] LongCat-Flash marked Meituan's entry into the frontier of open large language models, and it became the foundation for a fast-moving lineage that includes the reasoning-focused LongCat-Flash-Thinking, the multimodal LongCat-Flash-Omni, and the later trillion-parameter LongCat-2.0-Preview. ^[2]^[3]^[4]

Overview

LongCat-Flash was conceived as an efficient, high-throughput foundation model with a particular emphasis on agentic and tool-use tasks. ^[1] Rather than competing purely on raw scale, its design tries to maximize useful capability per unit of activated compute. The model's MoE architecture holds about 560 billion total parameters but only activates a small, variable fraction of them for each token, which keeps inference cheap relative to the model's total size. ^[1]^[2] Meituan reported inference throughput of more than 100 tokens per second and an output cost on the order of 0.70 US dollars per million tokens. ^[1]^[2]

The first public release, LongCat-Flash-Chat, is a non-reasoning ("non-thinking") instruction-tuned model. ^[5] It was followed within weeks by LongCat-Flash-Thinking, an explicit reasoning variant, and later by omni-modal and trillion-parameter siblings. ^[3]^[4] The release placed Meituan, a company best known for food delivery and local commerce rather than artificial intelligence research, alongside DeepSeek, Alibaba's Qwen family, and Moonshot AI's Kimi as a contributor to China's open-weight model ecosystem. ^[6]^[7]

Development (Meituan)

LongCat-Flash was built by Meituan's LongCat team. ^[1] Meituan is one of China's largest internet platforms, centered on food delivery, restaurant reviews, travel booking, and other local services; the move into frontier-scale language models represented a significant diversification for the company. ^[6]^[7] Caixin and other outlets framed the launch as Meituan formally entering the open-source AI race, joining incumbents such as DeepSeek and Alibaba. ^[6]

The LongCat-Flash Technical Report, posted to arXiv on September 1, 2025, is credited to the "Meituan LongCat Team" along with a large list of contributing authors. ^[1] Coverage of the release clustered around the same period, with the Chinese-state outlet China Youth International and the model's Hugging Face card both dating the open-source release to early September 2025, and some English-language summaries citing late August 2025 for the initial announcement. ^[5]^[8] Meituan released the model openly with weights, code, and a technical report, positioning LongCat as a continuing line rather than a one-off project. ^[1]^[5]

One widely noted detail is training efficiency: Meituan reported that LongCat-Flash was trained on more than 20 trillion tokens and that this pre-training run completed within roughly 30 days, supported by a scaling framework built for stability at large cluster sizes. ^[1]

Architecture (zero-computation experts, MoE)

LongCat-Flash uses a mixture-of-experts transformer with approximately 560 billion total parameters. ^[1] In a standard MoE, a router selects a fixed number of expert sub-networks to process each token, so the activated parameter count is constant. LongCat-Flash departs from this in two main ways. ^[1]

The first is zero-computation experts. The model adds expert "slots" that perform no computation (an identity or no-op transformation). Because not all tokens require the same amount of processing, the router can assign these no-op experts to less important tokens, effectively spending little or no extra compute on them, while routing significant tokens to real experts. ^[1] The result is that the number of activated parameters varies per token, ranging from about 18.6 billion to 31.3 billion and averaging roughly 27 billion, far below the 560 billion total. ^[1]^[2] An auxiliary control keeps the average activation budget stable during training so the dynamic routing does not destabilize. ^[1]

The second is Shortcut-connected MoE (ScMoE). This design rewires the network so that the heavy all-to-all communication required by MoE expert routing can be overlapped with computation, widening the "computation-communication overlap window." ^[1] This overlap is a major reason the model can sustain high inference throughput (over 100 tokens per second) despite its size. ^[1]^[2]

To train a model of this scale stably, Meituan reported a comprehensive framework combining hyperparameter transfer, model-growth initialization (growing a larger model from a smaller trained one), a "multi-pronged stability suite," and deterministic computation for reproducibility. ^[1] The released LongCat-Flash-Chat supports a 128,000-token context window. ^[5]

Specifications

Attribute	Value
Developer	Meituan (LongCat team) ^[1]
Initial release	Late August to September 1, 2025 (LongCat-Flash-Chat) ^[1]^[5]
Model type	Mixture-of-experts large language model ^[1]
Total parameters	~560 billion ^[1]
Activated parameters per token	~18.6B to 31.3B (avg ~27B) ^[1]
Key architecture features	Zero-computation experts; Shortcut-connected MoE (ScMoE) ^[1]
Training data	>20 trillion tokens (pre-training ~30 days) ^[1]
Context length	128,000 tokens (Chat) ^[5]
Reported inference speed	>100 tokens per second ^[1]^[2]
Reported output cost	~$0.70 per million tokens ^[1]
License	MIT ^[5]
Availability	Hugging Face, GitHub ^[5]

Capabilities and benchmarks

Meituan positioned LongCat-Flash as competitive with leading open and closed models while standing out on agentic and tool-use tasks. ^[1]^[2] Reported benchmark figures should be attributed to Meituan's own technical report and model card unless independently confirmed.

On the LongCat-Flash-Chat model card, Meituan reported scores including 89.71 on MMLU, 86.50 on ArenaHard-V2, 89.65 on IFEval (instruction following), and 48.02 on LiveCodeBench (coding). ^[5] On agentic and tool-use evaluations it reported 73.68 on the telecom split of the tau-squared benchmark, 39.51 on TerminalBench, and 24.30 on VitaBench. ^[5] Independent commentary noted that LongCat-Flash-Chat tended to outperform several mainstream models on agentic tasks while lagging somewhat on coding benchmarks. ^[2]^[7]

Multiple summaries described LongCat-Flash as performing on roughly the same tier as models from DeepSeek, Alibaba's Qwen3 family, and Moonshot AI, as well as some prominent US models, with its efficiency (low activated-parameter count and high throughput) cited as the main differentiator rather than top-of-the-leaderboard accuracy. ^[2]^[6]^[7] As with all vendor-reported benchmarks, these numbers reflect Meituan's evaluation conditions and should be read with that caveat.

Availability and licensing

LongCat-Flash-Chat was released as an open-weight model under the permissive MIT license, with weights distributed on Hugging Face and supporting code on GitHub. ^[5] The permissive license allows commercial use, redistribution, and modification, consistent with the broader trend of Chinese labs releasing capable models under permissive terms. Inference support was added by community runtimes; for example, the LMSYS team published guidance on serving LongCat-Flash with SGLang shortly after release. ^[9]

Not every model in the lineage is open. While LongCat-Flash-Chat, LongCat-Flash-Thinking, and LongCat-Flash-Omni were released openly, Meituan opened the later LongCat-2.0-Preview only for free testing rather than as downloadable weights. ^[4]

Significance and the LongCat lineage

LongCat-Flash is significant on two fronts. First, it represented a major new and somewhat unexpected entrant to the frontier open-weight landscape: a food-delivery and local-services company building and openly releasing a 560-billion-parameter model placed Meituan in the same conversation as dedicated AI labs. ^[6]^[7] Second, its zero-computation experts and ScMoE designs offered concrete architectural ideas for making very large MoE models cheaper to run, contributing to the broader open-MoE trend pioneered by models such as DeepSeek-V3 and Qwen's MoE releases. ^[1]^[2]

The model anchored a rapidly expanding family released over late 2025 and into 2026:

LongCat-Flash-Chat (released around late August to September 1, 2025): the non-thinking foundation chat model. ^[1]^[5]
LongCat-Flash-Thinking (technical report posted September 19, 2025): an explicit reasoning variant built on the same 560B / ~27B-activated backbone. Meituan reported strong results on competition mathematics and reasoning benchmarks (including a perfect score on AIME-25 in its reporting) and described it as reaching state-of-the-art among open-source models on logic, math, code, and agentic tasks; press coverage compared its reasoning ability favorably against leading proprietary models. ^[3]
LongCat-Video (open-sourced October 25, 2025): a roughly 13.6-billion-parameter video-generation model based on a diffusion transformer architecture, supporting text-to-video, image-to-video, and video continuation. ^[10]
LongCat-Flash-Omni (released October 31, 2025): a 560B / ~27B-activated omni-modal model that accepts arbitrary combinations of text, image, audio, and video input and produces text or speech output, with a 128K context and low-latency real-time audio-visual interaction. ^[11]
LongCat-2.0-Preview (opened for testing April 24, 2026): a trillion-parameter successor with a context window extended toward one million tokens. Unlike the Flash generation it was offered as a hosted preview (with a daily free-token quota) rather than as open weights, and Meituan reported training it largely on domestic Chinese accelerators. ^[4]

Taken together, the LongCat releases established Meituan as a persistent open-model contributor and a notable example of a Chinese consumer-internet company scaling frontier AI, including on domestic hardware, alongside the better-known efforts of DeepSeek, Qwen, and Kimi. ^[4]^[6]^[7]

References

Meituan LongCat Team. "LongCat-Flash Technical Report." arXiv:2509.01322, September 1, 2025. https://arxiv.org/abs/2509.01322 ↩
Skywork AI. "Meituan's open-source LongCat-Flash: A new MoE platform that can't be ignored." https://skywork.ai/blog/meituans-open-source-longcat-flash-a-new-moe-platform-that-cant-be-ignored/ ↩
"LongCat-Flash-Thinking Technical Report." arXiv:2509.18883, September 2025. https://arxiv.org/html/2509.18883v1 ↩
KuCoin News. "Meituan's trillion-parameter AI model, LongCat-2.0-Preview, is now open for testing." https://www.kucoin.com/news/flash/meituan-s-trillion-parameter-ai-model-longcat-2-0-preview-opens-for-testing ↩
"meituan-longcat/LongCat-Flash-Chat." Hugging Face. https://huggingface.co/meituan-longcat/LongCat-Flash-Chat ↩
Caixin Global. "Meituan Enters Open-Source AI Race With LongCat Model." September 2, 2025. https://www.caixinglobal.com/2025-09-02/meituan-enters-open-source-ai-race-with-longcat-model-102357875.html ↩
IndexBox. "Meituan Open-Sources 560B Parameter AI Model LongCat-Flash-Chat." https://www.indexbox.io/blog/meituan-open-sources-longcat-flash-chat-llm-to-rival-alibaba-deepseek/ ↩
China Youth International. "Meituan Releases and Open-Sources LongCat-Flash-Chat." September 1, 2025. https://en.youth.cn/RightNow/202509/t20250901_16211196.htm ↩
LMSYS Org. "LongCat-Flash: Deploying Meituan's Agentic Model with SGLang." September 1, 2025. https://www.lmsys.org/blog/2025-09-01-sglang-longcat-flash/ ↩
AIbase. "Meituan open-sources LongCat-Video." https://www.aibase.com/news/ ↩
MarkTechPost. "LongCat-Flash-Omni: A SOTA Open-Source Omni-Modal Model with 560B Parameters with 27B activated." November 2, 2025. https://www.marktechpost.com/2025/11/02/longcat-flash-omni-a-sota-open-source-omni-modal-model-with-560b-parameters-with-27b-activated-excelling-at-real-time-audio-visual-interaction/ ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

Suggest edit

What links here

ZebraLogic