Muse Spark
Last reviewed
Jun 2, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 · 2,054 words
Muse Spark is a proprietary multimodal reasoning model developed by Meta Superintelligence Labs (MSL), the artificial intelligence division Meta reorganized in 2025. Announced on April 8, 2026, it is the first model in Meta's "Muse" series and the lab's debut release [1][2]. Meta presents it as a natively multimodal model "with support for tool-use, visual chain of thought, and multi-agent orchestration," small and fast enough to run interactively yet able to reason through hard problems in science, math, and health [1]. It is notable for two reasons beyond its capabilities: it is widely described as Meta's first frontier model, and it is the company's first flagship model released without open weights, a sharp break from the open Llama lineage that defined Meta's earlier AI strategy [3][4].
The model should not be confused with Microsoft's "Muse," an unrelated 2025 generative model for video-game world simulation. Meta's Muse Spark is a general-purpose large language model and reasoning system aimed at consumer assistants and developer applications.
Muse Spark is positioned as the first rung on what Meta calls its scaling ladder toward "personal superintelligence" [1]. Rather than leading with a maximally large model, Meta describes a "deliberate and scientific approach to model scaling where each generation validates and builds on the last before we go bigger" [2]. Muse Spark is therefore characterized by Meta as "small and fast by design," with the next generation already in development [2].
The model accepts text, image, and voice input and can produce text alongside interactive outputs such as small websites, mini-games, and dashboards [2][5]. It ships with two everyday modes, branded Instant and Thinking, plus a heavier "Contemplating" mode that orchestrates multiple agents reasoning in parallel for the hardest queries [2][1]. Meta says it built the model with input from more than 1,000 physicians to strengthen health reasoning, a domain where it claims unusually strong results [1].
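Meta has not published how Contemplating mode coordinates its agents, so the following is only a generic illustration of the idea of several agents reasoning in parallel on one query. Everything in it is an assumption made for the sketch: the `Agent` type, the `orchestrate` function, the toy agents, and the majority-vote aggregation are hypothetical and are not drawn from Meta's materials.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical agent interface: takes a question, returns an answer string.
Agent = Callable[[str], str]


def orchestrate(question: str, agents: Sequence[Agent]) -> str:
    """Run every agent on the question in parallel and return the most common answer.

    Majority voting is an assumed aggregation rule chosen for illustration only.
    """
    if not agents:
        raise ValueError("at least one agent is required")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves the order of the input agents.
        answers = list(pool.map(lambda agent: agent(question), agents))
    # On a tie, Counter.most_common returns the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy stand-ins for independent reasoning agents.
    toy_agents = [lambda q: "4", lambda q: "5", lambda q: "4"]
    print(orchestrate("What is 2 + 2?", toy_agents))
```

In this toy setup two of the three agents agree, so the vote returns their shared answer. A production system would likely use a more elaborate aggregation step than a simple vote.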
| Attribute | Detail |
|---|---|
| Developer | Meta Superintelligence Labs |
| Series | Muse (Muse Spark is the first model) |
| Announced | April 8, 2026 [1][2] |
| Internal code name | Avocado [5][6] |
| Modality | Natively multimodal (text, image, voice) [1][2] |
| Reasoning modes | Instant, Thinking, Contemplating [2] |
| Weights | Proprietary / closed (no open weights) [3] |
| Access | meta.ai, Meta AI app, private API preview [1] |
Muse Spark is the first publicly shipped model from Meta Superintelligence Labs, the unit Meta stood up after a leadership overhaul of its AI organization. The lab is led by Alexandr Wang, the former Scale AI chief executive who became Meta's first chief AI officer roughly nine months before the launch [5][7]. To bring Wang in, Meta paid about $14.3 billion for a 49% nonvoting stake in Scale AI, a deal that anchored the broader reset of the company's AI ambitions under Mark Zuckerberg [7].
The release marked Meta's first major model in over a year and was read by the press as the company's attempt to rejoin the frontier alongside OpenAI, Google, and Anthropic after a period of reduced visibility [7][8]. Meta says the work involved rebuilding its AI stack "from the ground up" over the preceding nine months, spanning model architecture, optimization, and data curation [1][2].
Meta unveiled Muse Spark on April 8, 2026, through a Meta Superintelligence Labs blog post on ai.meta.com and a companion newsroom announcement on about.fb.com [1][2]. Reporting from CNBC, TechCrunch, and Bloomberg framed it as the first model out of the new lab and the start of a "ground-up overhaul" of Meta's AI efforts [7][5][3]. The model had been developed under the internal code name Avocado before launch [5][6].
At announcement the model began rolling out inside Meta's consumer products. A May 12, 2026 update to Meta's newsroom post described continued rollout to WhatsApp, Instagram, Facebook, Messenger, and Meta's Ray-Ban and Oakley smart glasses [2].
Meta has disclosed the broad shape of how Muse Spark was built without releasing weights or a full technical report. The headline claim is efficiency: Meta states the model reaches its capabilities "with over an order of magnitude less compute than our previous model, Llama 4 Maverick," which it credits to an overhauled pretraining stack covering architecture, optimization, and data curation [1][3]. (For reference, Llama 4 Maverick was a mixture-of-experts model from the earlier open-weight Llama 4 generation, released alongside Llama 4 Scout.)
On the post-training side, Meta describes reinforcement learning that shows log-linear scaling with smooth, predictable gains [1]. A distinctive element is what third-party coverage labeled "thought compression": during RL the model first improves by reasoning for longer, after which a penalty on thinking time pushes it to condense its chain of thought while holding onto the accuracy gains [9]. At inference, this thinking-time penalty combines with multi-agent orchestration to manage test-time compute [1].
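The thinking-time penalty described above can be pictured as a shaped reward. The sketch below is a minimal illustration under stated assumptions, not Meta's training objective: the function name `shaped_reward`, the linear per-token penalty, the `free_budget` allowance, and all numeric values are hypothetical.

```python
def shaped_reward(
    correct: bool,
    thinking_tokens: int,
    penalty_per_token: float = 0.0,
    free_budget: int = 0,
) -> float:
    """Task reward minus an assumed linear penalty on thinking tokens.

    Tokens up to `free_budget` are not penalized; each token beyond it
    costs `penalty_per_token`. With the default penalty of 0.0 the
    reward depends on correctness alone.
    """
    if thinking_tokens < 0 or free_budget < 0 or penalty_per_token < 0:
        raise ValueError("arguments must be non-negative")
    task_reward = 1.0 if correct else 0.0
    excess_tokens = max(0, thinking_tokens - free_budget)
    return task_reward - penalty_per_token * excess_tokens


if __name__ == "__main__":
    # Early phase (no penalty): a long correct trace earns the full reward.
    print(shaped_reward(correct=True, thinking_tokens=8000))

    # Later phase (penalty on): a shorter correct trace now scores higher
    # than a longer one, which is the pressure toward compression.
    for tokens in (8000, 2000):
        print(tokens, shaped_reward(True, tokens, penalty_per_token=1e-4, free_budget=1000))
```

With the penalty switched off, long and short correct traces score the same, so nothing discourages longer reasoning. Once the penalty is on, two equally correct answers are ranked by length, which rewards condensing the chain of thought without giving up accuracy.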
Meta calls the model "natively multimodal," meaning vision and language were trained together from the start rather than a vision encoder being attached to a finished text model [9]. The company highlights performance on visual STEM questions, entity recognition, and localization as evidence of that integrated training [1].
Muse Spark powers a refreshed Meta AI assistant and is meant to be used conversationally across Meta's apps and glasses [2]. Beyond plain question answering, Meta and reviewers point to several specific abilities:
Meta and independent benchmarker Artificial Analysis published a range of scores. The figures below combine Meta's own disclosures with Artificial Analysis's evaluation and a benchmark table compiled by MarkTechPost from Meta's materials. Where a comparison comes from a single source, that source is cited directly; cross-domain comparisons of this kind should be read with the usual caution about vendor-influenced framing [10][4][9].
Meta's headline results, in Contemplating mode where noted:
| Benchmark | Muse Spark | Notes / comparison |
|---|---|---|
| Humanity's Last Exam (Contemplating) | 58% | Meta-reported [1][4] |
| Humanity's Last Exam (with tools) | 58.4 | vs GPT-5.4 Pro 58.7, Gemini 3.1 Deep Think 53.4 [9] |
| FrontierScience Research (Contemplating) | 38.3 | vs GPT-5.4 Pro 36.7, Gemini 3.1 Deep Think 23.3 [9][1] |
| HealthBench Hard | 42.8 | vs GPT-5.4 Xhigh 40.1, Gemini 3.1 Pro High 20.6, Claude Opus 4.6 Max 14.8 [9] |
| ARC-AGI-2 | 42.5 | weak spot vs Gemini 3.1 Pro High 76.5, GPT-5.4 Xhigh 76.1 [9] |
Artificial Analysis's independent evaluation placed Muse Spark fourth on its Artificial Analysis Intelligence Index with a score of 52, inside the top five and behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6 [4][10]:
| Benchmark (Artificial Analysis) | Muse Spark | Comparison |
|---|---|---|
| Intelligence Index (overall) | 52 (rank 4) | Gemini 3.1 Pro, GPT-5.4, Claude Opus 4.6 above [10] |
| Humanity's Last Exam | 39.9% | Gemini 3.1 Pro Preview 44.7%, GPT-5.4 41.6% [10] |
| MMMU-Pro (vision) | 80.5% | Gemini 3.1 Pro Preview 82.4% [10] |
| CritPT (physics research) | 11% | fifth-highest; above Gemini 3 Flash 9% [10] |
| GDPval-AA (real-world tasks) | 1427 | Claude Sonnet 4.6 1648, GPT-5.4 1676 [10] |
Artificial Analysis also reported that Muse Spark used about 58 million output tokens to complete the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and well below Claude Opus 4.6 (157M) and GPT-5.4 (120M), which the firm read as evidence of the model's efficiency-oriented design [10]. Its summary was that "Muse Spark essentially closes the gap to the frontier in a single release" [10].
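To make the token comparison concrete, the short calculation below expresses each model's reported output-token count as a multiple of Muse Spark's. The figures are the ones quoted above from Artificial Analysis [10]; the dictionary and the helper name `relative_token_use` are illustrative choices for this sketch.

```python
# Output tokens (in millions) used to complete the Intelligence Index,
# as quoted above from Artificial Analysis [10].
OUTPUT_TOKENS_MILLIONS = {
    "Muse Spark": 58,
    "Gemini 3.1 Pro Preview": 57,
    "Claude Opus 4.6": 157,
    "GPT-5.4": 120,
}


def relative_token_use(tokens: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each model's token count divided by the baseline model's count."""
    base = tokens[baseline]
    return {name: count / base for name, count in tokens.items()}


ratios = relative_token_use(OUTPUT_TOKENS_MILLIONS, "Muse Spark")
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x")
```

On these figures, GPT-5.4 used roughly 2.1 times and Claude Opus 4.6 roughly 2.7 times as many output tokens as Muse Spark, while Gemini 3.1 Pro Preview used slightly fewer.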
At launch Muse Spark was available to general users for free on the meta.ai website and in the Meta AI app, and through a private API preview offered to select partners [1]. Using the consumer surface requires signing in with a Meta account via Facebook or Instagram [5]. Contemplating mode was described as rolling out gradually rather than being on by default [1].
Unlike the open-weight Llama models, Muse Spark is closed: its weights are not downloadable and, per Bloomberg, its design and code are not being made public [3][6]. Meta has said it hopes to open-source future versions of the model, and Zuckerberg reiterated that Meta still plans to release open-source models, but the flagship itself is proprietary [2][5].
Coverage was mixed. Market-focused outlets treated the launch as a credible re-entry: Artificial Analysis called it a model that closes the gap to the frontier in one release, and some analysts framed it as evidence of "significant" monetization potential for Meta's assistant [10][8]. CNBC noted that while the model showed promise, Wall Street remained focused on the broader question of Zuckerberg's AI strategy and return on tens of billions of dollars of spending [11].
Reviewers were more skeptical on capability. Gizmodo's coverage, whose headline said the model "doesn't exactly spark joy," argued that Muse Spark was clearly stronger than Meta's prior offerings but "typically fell short" of leading rivals. It singled out coding and agentic tasks as continued weaknesses and cautioned readers to treat Meta's benchmark figures with skepticism given the company's history [12]. Several outlets characterized the release as moving Meta from out of the race to competing for "also-ran" status rather than for the top spot [12]. Wang himself acknowledged the model was a starting point, saying it was MSL's first model and that there were "certainly rough edges we will polish over time" [4].
The disclosed limitations fall into three buckets. First, capability gaps: independent and editorial reviews placed Muse Spark below the strongest frontier models on overall intelligence, coding, and agentic benchmarks, and Meta's own materials show a notable weakness on abstract-reasoning tests such as ARC-AGI-2 [12][9][10]. Second, transparency: as a closed model with no released weights and no full technical paper at launch, many architectural details remain undisclosed, and benchmark comparisons rely partly on Meta-supplied numbers [3][9]. Third, privacy and trust: because the consumer experience requires a Meta login and Meta has a contested history with user data, commentators raised concerns about how personal data might intersect with the model, especially given its health-assistant ambitions [5][12].