Inception Labs

AI Companies Diffusion Models Large Language Models

21 min read

Updated Jul 23, 2026

Suggest edit History Talk

RawGraph

Last edited

Jul 23, 2026

Fact-checked

In review queue

Sources

30 citations

Revision

v3 · 4,279 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Inception Labs (often referred to simply as Inception) is a Palo Alto, California-based artificial intelligence startup that commercializes diffusion language models (dLLMs) for text and code generation. Founded in 2024 by Stanford University computer science professor Stefano Ermon together with two of his former Ph.D. students, UCLA professor Aditya Grover and Cornell University professor Volodymyr Kuleshov, the company released Mercury Coder in February 2025, which it describes as the first commercial-scale diffusion-based large language model.^[1]^[2] Inception positions Mercury as a faster, more compute-efficient alternative to mainstream autoregressive large language models such as those produced by OpenAI, Anthropic, and Google DeepMind.^[3]

Rather than generating text token-by-token from left to right as autoregressive transformer models do, Inception's models begin with a noisy or masked draft of the full response and iteratively refine multiple tokens in parallel through a denoising process inspired by image diffusion models.^[4] The company reports throughput in excess of 1,000 tokens per second on a single NVIDIA H100 GPU for its coding models, several times higher than the throughput of speed-optimized frontier models of comparable quality; its 2026 flagship, Mercury 2, reaches 1,009 tokens per second on NVIDIA Blackwell GPUs.^[4]^[5]^[27] Inception raised a $50 million seed financing round, announced in November 2025, led by Menlo Ventures with participation from Mayfield, Innovation Endeavors, Microsoft's M12 fund, Snowflake Ventures, Databricks Investment and Nvidia's NVentures, along with angel investments from Andrew Ng and Andrej Karpathy.^[2]^[6]

Who founded Inception Labs?

The company was co-founded by three computer science researchers who collaborated for nearly a decade on diffusion modeling and generative AI prior to launching Inception. Stefano Ermon, the company's chief executive officer, is an associate professor of computer science at Stanford University where he leads work on probabilistic generative modeling. He is widely credited as a co-inventor of modern diffusion models, the same family of techniques that underpin image and video generators such as Sora and Midjourney.^[7]^[2] According to Menlo Ventures' partnership announcement, Ermon and his collaborators "pursued a contrarian hypothesis that diffusion models could match autoregressive approaches for language generation," eventually demonstrating that "a diffusion model matched the quality of autoregressive models on text generation and ran 10x faster."^[7]

Aditya Grover, a co-founder, is an assistant professor of computer science at the University of California, Los Angeles (UCLA). He completed his Ph.D. at Stanford under Ermon's supervision and is a co-author on several of the foundational research papers that Inception commercializes.^[1]^[7] Volodymyr Kuleshov, the third co-founder, is an assistant professor of computer science at Cornell Tech, the Cornell University campus in New York City. He likewise completed his Ph.D. at Stanford under Ermon. Kuleshov leads an academic research group that has been particularly active in publishing the modern theory of masked and block diffusion language models.^[8]^[9]

Beyond diffusion modeling, the founding team is associated with several broadly used techniques in modern generative AI. Members of the team are credited as co-inventors of Direct Preference Optimization (DPO), a popular alternative to reinforcement learning from human feedback for aligning language models, as well as work that contributed to Flash Attention.^[2]^[7] Inception's broader research and engineering team draws from Stanford, UCLA, Cornell, Google DeepMind, Meta AI, Microsoft AI and OpenAI according to the company's website.^[10]

What research is Inception's technology based on?

Inception's commercial models build on several years of academic work on discrete diffusion models for language. Three lines of research are particularly relevant to the company's products.

The first is SEDD (Score Entropy Discrete Diffusion), introduced in the 2024 paper "Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution" by Aaron Lou, Chenlin Meng, and Stefano Ermon. The paper proposed score entropy, a loss that extends score matching to discrete spaces, and demonstrated that the resulting diffusion language model could substantially close the perplexity gap with autoregressive models, in some settings beating GPT-2 on generative perplexity while requiring far fewer function evaluations.^[11] On X (Twitter), Ermon described the work as "diffusion models finally bridging the gap with autoregressive models on language."^[12]

The second is MDLM (Masked Diffusion Language Models), introduced in "Simple and Effective Masked Diffusion Language Models" by Subham Sekhar Sahoo, Marianne Arriola, Aaron Gokaslan, Edgar Mariano Marroquin, Alexander M. Rush, Yair Schiff, Justin T. Chiu and Volodymyr Kuleshov, published at NeurIPS 2024. MDLM introduced a substitution-based parameterization that reduces the absorbing-state diffusion training objective to a mixture of classical masked language modeling losses, achieving state-of-the-art likelihoods among diffusion models and approaching autoregressive perplexity.^[9]^[13]

The third is BD3-LMs (Block Discrete Denoising Diffusion Language Models), introduced in "Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models" by Marianne Arriola, Aaron Gokaslan, Justin T. Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo and Volodymyr Kuleshov, accepted as an oral presentation at ICLR 2025. BD3-LMs decompose a token sequence into blocks and perform discrete diffusion within each block, interpolating between autoregressive and diffusion language models by tuning the block size and providing variance-reduced training and data-driven noise schedules.^[14]^[15]

These three works, all closely tied to Ermon's Stanford group and Kuleshov's Cornell group, established that masked and block-wise discrete diffusion can match or approach autoregressive language model perplexity while supporting parallel inference and infilling. The Mercury technical report by Inception cites these foundations and notes that its production models adopt a Transformer denoiser trained with a masked diffusion objective at large scale.^[4]

How much has Inception Labs raised?

Inception Labs was incorporated in Palo Alto, California in 2024.^[1]^[16] According to Mayfield, the firm led an initial investment in the company in July 2024.^[16] At that time the company operated in stealth mode, and the precise size of the early-stage round was not publicly disclosed.^[1]

The company emerged from stealth on February 26, 2025, when it publicly unveiled Mercury Coder. TechCrunch's coverage that day noted that "the Mayfield Fund invested in the company," that Inception had secured "several customers, including unnamed Fortune 100 companies," and that Ermon declined to disclose specifics of the early funding.^[1] In April 2025, the company was selected to participate in the AWS Generative AI Accelerator cohort, according to publicly available announcements about that program.^[17]

On November 6, 2025, Inception announced a $50 million seed financing round led by Menlo Ventures. Participating investors included Mayfield (which had backed the company earlier), Innovation Endeavors, M12 (Microsoft's venture fund), Snowflake Ventures, Databricks Investment, and NVentures (Nvidia's venture capital arm), as well as angel investments from Andrew Ng and Andrej Karpathy.^[2]^[6]^[18] TechCrunch reported the figure as "$50 million in seed funding" and noted that the funding would support continued development of diffusion models for code and text and the scaling of the Mercury family.^[2] The company has not publicly disclosed its post-money valuation as part of the announcement.^[2]

What is Mercury Coder?

Inception's first publicly available product was Mercury Coder, announced on February 26, 2025 as part of the company's emergence from stealth.^[3]^[1] The model was released in two variants: Mercury Coder Mini and Mercury Coder Small. According to the company, Mercury Coder is "the first publicly available dLLM" and a coding-focused diffusion language model designed to deliver both high throughput and competitive code quality.^[3]

In Inception's launch blog post, Mercury Coder Mini was reported to achieve approximately 1,109 tokens per second of output throughput on an NVIDIA H100 GPU, while Mercury Coder Small was reported at approximately 737 tokens per second on the same hardware.^[3]^[19] The company stated that these speeds are roughly 5-10x faster than speed-optimized autoregressive frontier models running in the 50-200 tokens per second range, and up to 20x faster than larger frontier models that produce fewer than 50 tokens per second.^[3]

On standard coding benchmarks reported in the company's launch materials and the subsequent Mercury technical report, Mercury Coder Small achieved 90.0 pass@1 on HumanEval and 76.2 on MultiPL-E, while Mercury Coder Mini achieved 88.0 on HumanEval and 74.1 on MultiPL-E. On fill-in-the-middle code tasks, Mercury Coder Small reached 84.8 and Mini reached 82.2, exceeding Codestral 2501's reported 82.5.^[19]^[4] These results place Mercury Coder broadly in line with speed-optimized competitors such as GPT-4o Mini, Claude 3.5 Haiku, Gemini 2.0 Flash-Lite and Codestral on quality, while substantially exceeding them on throughput.^[4]^[19]

In Copilot Arena, a human-evaluation platform that pits coding models against each other inside an IDE, Inception reported that Mercury Coder Mini ranked second overall while posting the lowest average latency of any participating model, at roughly 25 milliseconds.^[4]^[19]

Mercury technical report

In June 2025 Inception's research team published a technical report titled "Mercury: Ultra-Fast Language Models Based on Diffusion" on arXiv, authored by Samar Khanna, Siddhant Kharbanda, Shufan Li, Harshit Varma, Eric Wang, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha and the three founders Stefano Ermon, Aditya Grover and Volodymyr Kuleshov.^[4] The report describes Mercury Coder Mini and Small as Transformer-based denoisers trained on trillions of tokens on clusters of NVIDIA H100 GPUs, using a masked diffusion training objective and an iterative parallel decoding procedure at inference time.^[4]

According to the technical report, the models support a standard context length of 32,768 tokens, extensible up to 128,000 tokens.^[4] The "coarse-to-fine" decoding procedure starts from a fully masked or noised draft of the response and refines all positions in parallel across a small number of denoising steps, in contrast to autoregressive decoding which advances one token at a time.^[4]

Subsequent products

Mercury (general chat model)

Following Mercury Coder, Inception expanded the Mercury family to general-purpose chat. The company released a general chat-oriented version of Mercury, marketed simply as Mercury, in 2025.^[20] In its blog post introducing the general chat model, Inception reported that Mercury supports a 128,000-token context window, is OpenAI API-compatible, and is positioned for use cases including chat, summarization and general text generation at high concurrency.^[20]

On third-party benchmarking by Artificial Analysis, the company reported that Mercury matches the quality of speed-optimized frontier models such as GPT-4.1 Nano and Claude 3.5 Haiku while running over 7x faster, with throughput around 708 tokens per second versus roughly 96 tokens per second for GPT-4.1 Nano on the same evaluation.^[20] The model achieved 69 percent on MMLU-Pro, 85 percent on HumanEval and 83 percent on MATH-500 in Inception's reported numbers.^[20]

Inception also announced that Mercury would serve as the founding LLM partner for Microsoft's NLWeb project, a Microsoft Research initiative that allows publishers to add natural-language interfaces to their websites.^[20]

Cloud and API availability

In late August 2025 Inception announced that Mercury and Mercury Coder were available through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. AWS's announcement, dated August 27, 2025, highlighted that the models deliver "ultra-fast generation speeds of up to 1,100 tokens per second on NVIDIA H100 GPUs, up to 10 times faster than comparable models," support for fill-in-the-middle code generation and context lengths up to 128,000 tokens.^[21]

In addition to AWS Bedrock and SageMaker JumpStart, Inception lists availability via its own API platform, Azure AI Foundry, OpenRouter, Models.dev and Poe.^[22]^[23] The company offers OpenAI-compatible API endpoints, enabling drop-in replacement of OpenAI client calls.^[23]

Mercury 2 and Mercury Edit 2 (2026)

In early 2026 Inception introduced a refreshed and significantly upgraded generation of the Mercury family. The prior generation of Mercury models is now designated Mercury 1, which remains supported for existing customers.^[28] Mercury 2, branded by the company as "the fastest reasoning LLM" and described as the first reasoning-capable diffusion LLM, was announced on February 24, 2026 and discussed further in subsequent company blog posts.^[24]^[5]^[27] Mercury Edit 2, a compact coding- and editing-focused diffusion LLM, was introduced on March 30, 2026.^[5]^[29]

According to Inception's announcement, Mercury 2 includes larger parameter counts, higher-quality training data, an enhanced denoiser architecture, new training objectives and faster inference algorithms, with improvements across coding, math, instruction following and knowledge recall.^[5] The model ships with a 128,000-token context window, tunable reasoning depth, native tool use and schema-aligned JSON (structured) output, and it exposes an OpenAI-compatible API for drop-in adoption.^[27]^[28] In Inception's reported numbers, Mercury 2 delivers approximately 5x faster generation than leading speed-optimized reasoning models while reaching 1,009 tokens per second of output throughput on NVIDIA Blackwell GPUs, compared with around 89 tokens per second for Claude Haiku 4.5 in reasoning mode and around 71 tokens per second for GPT-5 Mini in the same evaluation.^[24]^[27] Skyvern chief technology officer Suchintan Singh, an early customer, said that "Mercury 2 is at least twice as fast as GPT-5.2, which is a game changer for us," while Zed co-founder Max Brunsfeld said the model's suggestions "land fast enough to feel like part of your own thinking, not something you have to wait for."^[27]

On quality, Inception reported that Mercury 2 scored 91.1 on AIME 2025, 73.6 on GPQA, 71.3 on IFBench, 67.3 on LiveCodeBench, 38.4 on SciCode and 52.9 on Tau2. According to the company, these scores place Mercury 2 in competitive range with Claude Haiku 4.5 and GPT-5.2 Mini on quality while delivering roughly an order of magnitude more throughput.^[24]

Pricing for Mercury 2 was set at $0.25 per million input tokens and $0.75 per million output tokens at launch, with cached input priced at $0.025 per million tokens, undercutting Gemini 3 Flash's $0.50 / $3.00 by half on input and four times on output, according to the company.^[24]^[28] Mercury Edit 2 launched at the same per-token pricing and is optimized for autocomplete and next-edit prediction in latency-sensitive coding workflows, with a 32,000-token context window; Inception reported that it delivers a 48 percent higher suggestion acceptance rate than the previous editing model while being 27 percent more selective about when to suggest an edit.^[28]^[29] Mercury Edit 2 is integrated into the Zed code editor, whose co-founder Max Brunsfeld described diffusion as "a meaningfully different way to generate edit predictions."^[29] Inception's web site as of May 2026 advertises 99.5 percent uptime and enterprise service-level agreements for both models.^[10]

The current Mercury product line, per Inception's models page and launch materials, is summarized below.

Model	Role	Reported speed	Context window	Pricing per 1M tokens (input / cached / output)
Mercury 2	Flagship reasoning dLLM	1,009 tok/s (Blackwell)	128K	$0.25 / $0.025 / $0.75
Mercury Edit 2	Coding and editing dLLM	1,000+ tok/s	32K	$0.25 / $0.025 / $0.75
Mercury 1 (Mercury, Mercury Coder)	Legacy, still supported	737 to 1,109 tok/s (H100)	up to 128K	legacy pricing

Sources for the table: Inception launch materials and models page.^[24]^[27]^[28]^[29]

How do diffusion language models work?

The technical premise of Inception's products is that text generation can be reformulated as iterative parallel denoising rather than sequential next-token prediction. In the company's framing, traditional autoregressive transformer decoders such as GPT and LLaMA generate a response token by token, conditioning each step on the entire previously generated prefix. This makes inference fundamentally sequential and bounded by the latency of the single autoregressive step, regardless of how much GPU parallelism is available.^[4]

Diffusion language models invert this picture. Inception's models begin with a sequence of mask or noise tokens of a chosen length and apply a Transformer-based denoiser that, in each step, predicts a refined version of all tokens in parallel. The denoiser is iterated for a small number of steps until it converges on a fluent response. Because each step processes every position in parallel, the wall-clock latency is dominated by the number of denoising steps rather than the length of the output, and GPU throughput is much more effectively used.^[4]

The technical lineage is rooted in masked language modeling (which has long allowed bidirectional context) and in continuous-state diffusion models for images and video (which proved that iterative denoising can produce high-quality samples). The masked diffusion language model formulations developed in MDLM and BD3-LMs provide the theoretical grounding, while SEDD provides an alternative score-based perspective.^[9]^[14]^[11] Inception's Mercury technical report describes a production-scale implementation that combines these ideas with a Transformer denoiser, masked diffusion training and an inference-time parallel decoding procedure.^[4]

How does Mercury compare to autoregressive LLMs?

Inception's marketing and technical materials draw a direct contrast between Mercury and mainstream autoregressive large language models.^[3]^[4]

On speed, the company reports that on standard NVIDIA H100 hardware, Mercury Coder Small and Mini deliver roughly 737 and 1,109 tokens per second respectively, that the general-purpose Mercury chat model delivers roughly 708 tokens per second, and that Mercury 2 reaches 1,009 tokens per second on NVIDIA Blackwell GPUs even in reasoning mode. By comparison, speed-optimized autoregressive frontier models are typically reported in the 50-200 tokens per second range on similar hardware, and the largest autoregressive frontier models often produce fewer than 50 tokens per second.^[3]^[4]^[20]^[24]^[27]

On cost, the company has launched Mercury 2 and Mercury Edit 2 at $0.25 per million input tokens and $0.75 per million output tokens, while earlier Mercury family members carried pricing around $0.25 per million input tokens and $1.00 per million output tokens. Inception positions these prices as competitive with or below the lowest-cost autoregressive APIs of similar quality.^[24]^[5]

On quality, Inception reports that Mercury Coder is broadly competitive with GPT-4o Mini, Claude 3.5 Haiku, Gemini 2.0 Flash-Lite and Codestral on coding benchmarks, that Mercury matches GPT-4.1 Nano and Claude 3.5 Haiku on general chat benchmarks, and that Mercury 2 sits in competitive range of Claude Haiku 4.5 and GPT-5.2 Mini on reasoning benchmarks.^[4]^[20]^[24] These claims are based primarily on Inception's own reporting and benchmark partner Artificial Analysis; longer-running independent evaluations remain limited at the time of writing.^[20]

On deployment flexibility, diffusion language models offer some properties not native to autoregressive decoders. Because the denoiser conditions on the entire output context (including future positions), Mercury supports infilling and fill-in-the-middle code completion as a first-class operation rather than via specialized tokens.^[4] Because the number of denoising steps can be tuned, latency can be traded against quality in ways that left-to-right autoregressive decoding does not allow.^[4]

There are trade-offs as well. Diffusion language models historically have lagged the very best autoregressive systems on tasks that require long, free-form, open-ended generation, and the field's longest-running benchmarks are designed around autoregressive baselines. Independent academic evaluation of how well masked-diffusion language modeling scales relative to the best autoregressive models at trillion-token training budgets is still an open research area. Inception's published benchmarks indicate the gap on common benchmarks has been substantially narrowed, but the relative behavior of dLLMs on long-context reasoning, agentic tool use and emergent behaviors at frontier scale remains a subject of ongoing study.^[4]^[11]

Industry reception and adoption

Press coverage of Inception's emergence from stealth in February 2025 emphasized the contrarian nature of the company's bet on diffusion for text. TechCrunch's coverage described Inception's approach as "a new type of AI model" and quoted Ermon noting that with traditional LLMs "you cannot generate the second word until you've generated the first one," whereas diffusion models can generate text in parallel.^[1] The New Stack characterized Mercury 2 as "10x faster than Claude, ChatGPT, Gemini" in its reasoning category.^[25] PYMNTS noted Inception's positioning as a Silicon Valley startup building faster LLMs.^[26]

Mayfield's portfolio announcement in 2025 framed Mercury as the first commercial-scale diffusion LLM that "runs 10x faster and at 1/10th the cost of traditional LLMs," and drew parallels between the platform shift Inception is pursuing and earlier technology transitions.^[16] Menlo Ventures, the lead investor in the November 2025 financing, described the founders' bet as "a contrarian hypothesis that diffusion models could match autoregressive approaches for language generation" and characterized Inception as commercializing diffusion-based reasoning at production scale.^[7]

Adoption signals include availability of Mercury and Mercury Coder on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart from August 2025, on Azure AI Foundry, on OpenRouter and Poe, and as the founding LLM partner for Microsoft's NLWeb project.^[20]^[21]^[22]^[23] Named product integrations include the Zed code editor and coding tools such as ProxyAI and Kilo Code.^[27]^[29] The technical report describes Fortune 100 deployments and high-throughput enterprise workloads, although Inception has generally not disclosed specific customer names.^[1]^[4]

Inception's research output has also been cited by other diffusion language model efforts. The MDLM line of work, developed in Kuleshov's group, has been adopted as the basis for ByteDance's Seed Diffusion (described in third-party reporting as an industry-grade diffusion LLM) and for Nvidia's Genmol effort on molecular generation, according to Inception-associated public materials.^[8]

Recent developments

As of mid-2026, Inception's publicly described product line consists of:

Mercury 2, the reasoning-capable diffusion LLM, with throughput of 1,009 tokens per second on NVIDIA Blackwell GPUs and pricing of $0.25 per million input tokens and $0.75 per million output tokens.^[24]^[5]^[27]
Mercury Edit 2, a coding- and editing-focused diffusion LLM with the same pricing and a 32,000-token context window, optimized for latency-sensitive coding workflows.^[5]^[29]
Mercury 1, the prior generation (the original Mercury general chat model and Mercury Coder Mini and Small), which remains supported for existing customers and documented in the Mercury technical report.^[4]^[20]^[3]^[28]

The diffusion-LLM approach that Inception pioneered has drawn competition from large incumbents. On June 10, 2026, Google DeepMind released DiffusionGemma, an experimental open-weight diffusion language model built on a 26-billion-parameter Gemma 4 mixture-of-experts architecture that claims up to four times faster inference than comparable autoregressive models.^[30] In a June 2026 head-to-head reported by Decrypt, Mercury 2 reached comparable throughput of roughly 1,000 tokens per second while scoring about 90 percent on AIME 2026 to DiffusionGemma's 69.1 percent, and 77 percent on GPQA to DiffusionGemma's 73.2 percent, indicating that the open-weight model matched the speed of parallel generation but not its reasoning quality.^[30] Inception characterized the result by stating that "Mercury 2 continues to lead the Pareto frontier for quality, speed, and cost among publicly available diffusion LLMs."^[30]

Inception's website lists availability through its own API, AWS Bedrock and Azure Foundry, an OpenAI-compatible interface, enterprise service-level agreements at 99.5 percent or higher uptime, and a research and engineering organization drawn from Stanford, UCLA, Cornell, Google DeepMind, Meta AI, Microsoft AI and OpenAI.^[10] The company continues to position diffusion language modeling as a platform shift on the order of the transition from recurrent neural networks to transformer models, with the founders publicly arguing that diffusion will become the default architecture for foundation-model inference where latency and cost matter.^[7]^[10]

References

Russell Brandom and Marina Temkin, "Inception emerges from stealth with a new type of AI model," TechCrunch, February 26, 2025. https://techcrunch.com/2025/02/26/inception-emerges-from-stealth-with-a-new-type-of-ai-model/ . Accessed 2026-05-19. ↩
Marina Temkin, "Inception raises $50 million to build diffusion models for code and text," TechCrunch, November 6, 2025. https://techcrunch.com/2025/11/06/inception-raises-50-million-to-build-diffusion-models-for-code-and-text/ . Accessed 2026-05-19. ↩
"Introducing Mercury, the World's First Commercial-Scale Diffusion Large Language Model," Inception Labs blog, 2025. https://www.inceptionlabs.ai/blog/introducing-mercury . Accessed 2026-05-19. ↩
Samar Khanna, Siddhant Kharbanda, Shufan Li, Harshit Varma, Eric Wang, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, Aditya Grover and Volodymyr Kuleshov, "Mercury: Ultra-Fast Language Models Based on Diffusion," arXiv:2506.17298, 2025. https://arxiv.org/html/2506.17298v1 . Accessed 2026-05-19. ↩
"The Next Step for dLLMs: Scaling up Mercury," Inception Labs blog, May 14, 2026. https://www.inceptionlabs.ai/blog/mercury-refreshed . Accessed 2026-05-19. ↩
"Inception Raises $50M to Power Diffusion LLMs, Increasing LLM Speed and Efficiency by up to 10X and Unlocking Real-Time, Accessible AI Applications," Business Wire, November 6, 2025. https://www.businesswire.com/news/home/20251106570339/en/Inception-Raises-$50M-to-Power-Diffusion-LLMs-Increasing-LLM-Speed-and-Efficiency-by-up-to-10X-and-Unlocking-Real-Time-Accessible-AI-Applications . Accessed 2026-05-19. ↩
"From the Lab to the Frontier: The Story Behind Inception," Menlo Ventures, March 25, 2026. https://menlovc.com/perspective/from-the-lab-to-the-frontier-the-story-behind-inception/ . Accessed 2026-05-19. ↩
Subham Sekhar Sahoo et al., MDLM project page. https://s-sahoo.com/mdlm/ . Accessed 2026-05-19. ↩
Subham Sekhar Sahoo, Marianne Arriola, Aaron Gokaslan, Edgar Mariano Marroquin, Alexander M. Rush, Yair Schiff, Justin T. Chiu and Volodymyr Kuleshov, "Simple and Effective Masked Diffusion Language Models," arXiv:2406.07524, NeurIPS 2024. https://arxiv.org/abs/2406.07524 . Accessed 2026-05-19. ↩
Inception Labs, official website. https://www.inceptionlabs.ai/ . Accessed 2026-05-19. ↩
Aaron Lou, Chenlin Meng and Stefano Ermon, "Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution," arXiv:2310.16834, ICML 2024. https://arxiv.org/abs/2310.16834 . Accessed 2026-05-19. ↩
Stefano Ermon, post on X about SEDD. https://x.com/StefanoErmon/status/1763454106767880310 . Accessed 2026-05-19. ↩
MDLM repository (kuleshov-group/mdlm). https://github.com/kuleshov-group/mdlm . Accessed 2026-05-19. ↩
Marianne Arriola, Aaron Gokaslan, Justin T. Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo and Volodymyr Kuleshov, "Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models," arXiv:2503.09573, ICLR 2025 (Oral). https://arxiv.org/pdf/2503.09573v1 . Accessed 2026-05-19. ↩
BD3-LMs project page and repository. https://github.com/kuleshov-group/bd3lms . Accessed 2026-05-19. ↩
Mayfield Fund, "Introducing Inception Labs." https://www.mayfield.com/introducing-inception-labs/ . Accessed 2026-05-19. ↩
"Inception Selected to Participate in the 2025 AWS Generative AI Accelerator," WebWire. https://www.webwire.com/ViewPressRel.asp?aId=344856 . Accessed 2026-05-19. ↩
Maria Deutscher, "Low-latency AI model pioneer Inception nabs $50M led by Menlo Ventures," SiliconAngle, November 6, 2025. https://siliconangle.com/2025/11/06/low-latency-llm-pioneer-inception-nabs-50m-led-menlo-ventures/ . Accessed 2026-05-19. ↩
"Inception Labs Introduces Mercury: A Diffusion-Based Language Model for Ultra-Fast Code Generation," MarkTechPost, June 26, 2025. https://www.marktechpost.com/2025/06/26/inception-labs-introduces-mercury-a-diffusion-based-language-model-for-ultra-fast-code-generation/ . Accessed 2026-05-19. ↩
"Introducing Mercury, our General Chat Diffusion Large Language Model," Inception Labs blog. https://www.inceptionlabs.ai/blog/introducing-mercury-our-general-chat-model . Accessed 2026-05-19. ↩
"Mercury foundation models from Inception Labs are now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart," AWS Machine Learning Blog, August 27, 2025. https://aws.amazon.com/blogs/machine-learning/mercury-foundation-models-from-inception-labs-are-now-available-in-amazon-bedrock-marketplace-and-amazon-sagemaker-jumpstart/ . Accessed 2026-05-19. ↩
"Mercury Diffusion LLM Now Available on Azure AI Foundry," Inception Labs blog. https://www.inceptionlabs.ai/blog/mercury-azure-foundry . Accessed 2026-05-19. ↩
Inception Labs Enterprise page. https://www.inceptionlabs.ai/enterprise . Accessed 2026-05-19. ↩
"Inception Launches Mercury 2, the Fastest Reasoning LLM - 5x Faster Than Leading Speed-Optimized LLMs, with Dramatically Lower Inference Cost," Business Wire, February 24, 2026. https://www.businesswire.com/news/home/20260224034496/en/Inception-Launches-Mercury-2-the-Fastest-Reasoning-LLM-5x-Faster-Than-Leading-Speed-Optimized-LLMs-with-Dramatically-Lower-Inference-Cost . Accessed 2026-05-19. ↩
"Inception says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini," The New Stack. https://thenewstack.io/inception-labs-mercury-2-diffusion/ . Accessed 2026-05-19. ↩
"Silicon Valley Startup Inception Labs Creates Faster LLM," PYMNTS, 2025. https://www.pymnts.com/artificial-intelligence-2/2025/silicon-valley-startup-inception-labs-creates-faster-llm/ . Accessed 2026-05-19. ↩
"Introducing Mercury 2," Inception Labs blog, 2026. https://www.inceptionlabs.ai/blog/introducing-mercury-2 . Accessed 2026-07-08. ↩
"Our Models," Inception Labs. https://www.inceptionlabs.ai/models . Accessed 2026-07-08. ↩
"Introducing Mercury Edit 2," Inception Labs blog, 2026. https://www.inceptionlabs.ai/blog/introducing-mercury-edit-2 . Accessed 2026-07-08. ↩
"Inception Labs' Mercury 2 AI Beats Google's DiffusionGemma at Its Own Game," Decrypt, June 21, 2026. https://decrypt.co/371722/inception-labs-mercury-2-ai-beats-googles-diffusiongemma . Accessed 2026-07-08. ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

2 revisions by 1 contributor · full history

Suggest edit

What links here

Autoregressive Model Diffusion Language Models Discrete diffusion language model Gemini Diffusion LLaDA (Large Language Diffusion)Mercury (Inception Labs)SEDD (Score Entropy Discrete Diffusion)Stefano Ermon

Who founded Inception Labs?

What research is Inception's technology based on?

How much has Inception Labs raised?

What is Mercury Coder?

Mercury technical report

Subsequent products

Mercury (general chat model)

Cloud and API availability

Mercury 2 and Mercury Edit 2 (2026)

How do diffusion language models work?

How does Mercury compare to autoregressive LLMs?

Industry reception and adoption

Recent developments

References

Improve this article

Related Articles

Diffusion Language Models

LLaDA (Large Language Diffusion)

Mercury (Inception Labs)

Gemini Diffusion

Midjourney

Black Forest Labs

What links here

Related Articles

Diffusion Language Models

LLaDA (Large Language Diffusion)

Mercury (Inception Labs)

Gemini Diffusion

Midjourney

Black Forest Labs

What links here