Essential AI
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,347 words
Essential AI is a San Francisco-based artificial intelligence research company founded in 2023 by Ashish Vaswani and Niki Parmar, two co-authors of the 2017 paper "Attention Is All You Need" that introduced the transformer architecture.[1][2] The company emerged from stealth in December 2023 with $56.5 million in Series A funding led by March Capital, alongside backers including Google, Nvidia, AMD, Thrive Capital, Franklin Venture Partners, and KB Investment, bringing total disclosed funding to roughly $65 million.[1][3] Essential AI initially framed its mission around an "enterprise brain" that would automate knowledge work, and the firm has since shifted its public profile toward openly released artifacts: the Essential-Web v1.0 web data corpus (June 2025) and the Rnj-1 family of open weight language models (December 2025).[4][5][6] The company is led by Ashish Vaswani as chief executive officer and operates a research stack that combines TPU and AMD Instinct GPU compute, the Muon optimizer, and a custom data taxonomy pipeline.[4][5]
| Attribute | Detail |
|---|---|
| Legal name | Essential AI Labs, Inc. |
| Headquarters | San Francisco, California |
| Founded | 2023 (seed announced May 4, 2023) |
| Founders | Ashish Vaswani, Niki Parmar |
| Chief executive | Ashish Vaswani |
| Seed round | $8.3 million, led by Thrive Capital |
| Series A | $56.5 million, led by March Capital (announced December 11, 2023) |
| Total disclosed funding | approximately $65 million |
| Notable investors | Thrive Capital, March Capital, Google, Nvidia, AMD, Franklin Venture Partners, KB Investment, Conviction, Elad Gil |
| Public artifacts | Essential-Web v1.0 (June 2025), EAI-Distill-0.5b, Muon pretraining paper (May 2025), Rnj-1 base/instruct (December 2025) |
| Initial product framing | "Enterprise Brain", full-stack enterprise automation |
| Later public framing (2025) | "Open platform" for STEM and code, agentic foundation models |
Sources: [1][2][3][4][5][6][7]
Essential AI was founded by two researchers whose careers had run in parallel from Google Brain through the founding cohort of Adept AI Labs. Ashish Vaswani is the lead author of "Attention Is All You Need", the 2017 paper that introduced the transformer architecture and that has become one of the most cited works in modern machine learning.[2][8] Vaswani holds a PhD in computer science from the University of Southern California and worked at Google Brain between roughly 2016 and 2021, where he led research that produced the seminal paper.[2][8] Niki Parmar, a co-author of the same paper, holds a bachelor's degree from the Pune Institute of Computer Technology and a master's degree in computer science from the University of Southern California, and was one of the researchers on the original "Attention Is All You Need" author list who did not hold a PhD at the time of publication.[9]
In late 2021 Vaswani, Parmar, and former Google director David Luan left Google to co-found Adept AI, which emerged from stealth in April 2022 with a $65 million Series A round.[10][11] Adept positioned itself as building "Action Transformer" models that could control software through natural language. In September 2022 the company published its first model, ACT-1, but by late 2022 the founder group had begun to fracture.[10] According to multiple reports, Vaswani and Parmar departed Adept in November 2022, after about one year at the company, to begin work on what would become Essential AI.[10][11] David Luan and the remainder of the Adept leadership later joined Amazon in mid 2024 under a licensing arrangement with Amazon's AGI organisation, marking the effective end of Adept as an independent foundation model effort.[11]
Essential AI's first publicly reported funding round was an $8 million seed round that closed in early 2023 and was reported by Reuters on May 4, 2023.[12] Thrive Capital led the round; Conviction and Elad Gil were also disclosed participants.[12] At that point the company remained in stealth and its only public description was that it would "build software for enterprises to use large language models".[12]
Essential AI announced its $56.5 million Series A on December 11, 2023, formally exiting stealth.[1][3] The round was led by March Capital and included AMD, Franklin Venture Partners, Google, KB Investment, Nvidia, and existing seed investor Thrive Capital.[1][3] Combined with the prior seed financing, total disclosed funding stood at approximately $65 million at exit from stealth.[1] The Series A press materials disclosed an expanded seed amount of $8.3 million (compared with the $8 million figure reported earlier in 2023) and listed additional individual seed investors including Brad Gerstner, Amjad Masad, David H. Petraeus, Francis D'Souza, Gustavo Sapoznik, Jamie Montgomery, and Mei Zuo.[1][3]
The investor base is notable for combining hyperscaler interests (Google), GPU and accelerator vendors (Nvidia, AMD), and conventional venture capital. The AMD investment is consistent with Essential AI's later disclosure that it ran substantial training and inference workloads on AMD Instinct MI300X accelerators (see Technical infrastructure below).[4][5] As of the company's December 2025 Rnj-1 announcement, no additional public funding rounds beyond the Series A had been disclosed, and reporting continued to cite the cumulative figure of approximately $65 million.[6][13]
| Round | Date | Lead | Amount | Notable participants |
|---|---|---|---|---|
| Seed | reported May 4, 2023 (closed earlier) | Thrive Capital | $8.0M (later disclosed as $8.3M) | Conviction, Elad Gil, Brad Gerstner, Amjad Masad, David H. Petraeus, others[1][12] |
| Series A | December 11, 2023 | March Capital | $56.5M | Google, Nvidia, AMD, Thrive Capital, Franklin Venture Partners, KB Investment[1][3] |
At launch, Essential AI articulated a mission centered on enterprise productivity. The Series A press release described the company's goal as building "the Enterprise Brain" and "full-stack AI products that quickly learn to increase productivity by automating time-consuming and monotonous workflows".[1] Founder statements emphasised a partnership model in which large language models would assist knowledge workers rather than displace them, with a particular emphasis on data analytics, financial analysis, and other corporate workflows.[1][7] Coverage in late 2023 by VentureBeat and Computerworld described products in development to help business users make data-driven decisions independently and to expand the productivity of data science teams.[14][7]
Over the course of 2024 and 2025 the public framing of Essential AI evolved. Although the company has never publicly disclosed a specific enterprise product, its blog and research outputs increasingly emphasised open infrastructure and frontier capabilities in code and STEM. By the time of the Rnj-1 release in December 2025 the company described itself as building "an open platform to accelerate the science and engineering of deep learning", with declared focus areas in long context code automation, whole repository refactoring, and kernel optimisation on next generation accelerators.[5][6] Bloomberg reporting at the Rnj-1 launch characterised the strategy as a pivot from a closed enterprise focus to open weight research, while preserving the original investor base.[13] In this framing Essential AI sits alongside other relatively small, well capitalised foundation model labs founded by ex-Google or ex-OpenAI researchers, including Cohere, Inflection AI, and Imbue.
Essential AI has published a small, technically dense set of research artifacts that together describe the company's training stack. The three principal public outputs, each carrying the Essential AI institutional affiliation, are described below.
In May 2025 Essential AI published "Practical Efficiency of Muon for Pretraining" on arXiv (2505.02222).[15] The paper, with 23 listed authors at Essential AI and Vaswani as the senior author, presents the company's empirical case for adopting the Muon optimizer in place of AdamW for large language model pretraining.[15] The headline claim is that "Muon, the simplest instantiation of a second order optimizer, explicitly expands the Pareto frontier over AdamW on the compute time tradeoff."[15] Specifically, the authors report that Muon retains data efficiency at large batch sizes well beyond the critical batch size at which AdamW degrades, while remaining computationally cheap enough to be practical at the scale of foundation model training.[15]
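For readers unfamiliar with Muon, the core update can be sketched in a few lines. The following is a minimal NumPy illustration of the publicly described Muon recipe (momentum followed by Newton-Schulz orthogonalisation of the update matrix), not Essential AI's training code. The quintic coefficients, the function names `newton_schulz_orthogonalize` and `muon_step`, and the learning rate and momentum values are assumptions chosen for illustration; production implementations typically add Nesterov momentum and per-shape scaling, and apply AdamW to embeddings and scalar parameters.

```python
import numpy as np


def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately map a 2D matrix to the nearest semi-orthogonal matrix.

    Uses a quintic Newton-Schulz iteration. The coefficients are the ones
    commonly published with open Muon implementations (an assumption here).
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalise by the Frobenius norm so all singular values are at most 1.
    x = g / (np.linalg.norm(g) + eps)
    # Work on the wide orientation so the Gram matrix is the smaller one.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon update for a single 2D weight matrix."""
    momentum_buf = beta * momentum_buf + grad            # plain momentum
    update = newton_schulz_orthogonalize(momentum_buf)   # orthogonalised direction
    return weight - lr * update, momentum_buf
```

The iteration pushes every singular value of the update toward roughly 1 rather than computing an exact orthogonal factor, which is why it is cheap enough to run on every step.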
A secondary contribution of the paper is the first empirical demonstration of maximal update parametrization (muP) being used to calibrate hyperparameters for an LLM trained with Muon.[15] The authors introduce a telescoping algorithm for handling error in muP transfer and run ablations across models up to four billion parameters.[15] In a companion blog post Essential AI also published a critical observation, titled "Muon Doesn't Clearly Grok faster", which examines the relationship between Muon and grokking dynamics.[5]
Essential AI's most cited release is Essential-Web v1.0, a 24 trillion token web corpus published on arXiv (2506.14111) on June 17, 2025 and released on Hugging Face under the Open Data Commons Attribution licence.[4][16] The dataset contains 23.6 billion deduplicated documents drawn from 101 Common Crawl snapshots between CC-MAIN-2013-20 and CC-MAIN-2024-38, with total storage on the order of 75 TB.[16] Each document carries a 12 field metadata schema that the paper refers to as EAI-Taxonomy, organised into five groups:
The labels are produced by EAI-Distill-0.5b, a 0.5 billion parameter classifier distilled from Qwen2.5-32B-Instruct (selected as the teacher after a comparison that included DeepSeek-V3 and Qwen2.5-72B-Instruct).[4][16] The student model's annotator agreement is within roughly 0.03 of the teacher's (Cohen's kappa 0.71 versus 0.74) while running about 50 times faster, and the full corpus was annotated using approximately 90,000 AMD MI300X GPU hours over about a week.[16]
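Cohen's kappa, the agreement statistic quoted above, corrects raw agreement for the agreement expected by chance. The sketch below shows the standard two-rater formula on toy labels; the function name `cohens_kappa` and the example labels are illustrative assumptions, and the paper's exact evaluation protocol is not reproduced here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labelled independently
    # according to their own marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single label
    return (observed - expected) / (1.0 - expected)
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, so a gap of 0.03 between student and teacher is small on that scale.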
The processing pipeline removes documents through global exact deduplication using xxhash, locality sensitive hashing based deduplication with a Jaccard threshold of 0.7 (14 bands of 9 rows each), and quality filters drawn from the RedPajama V2 pipeline and the DCLM baseline fastText classifier.[16] The pipeline reduces the corpus from 248.4 billion raw documents to 23.6 billion documents, with about 45.6% removed at exact deduplication and a further 50.8% by language filtering.[16]
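The banding parameters quoted above determine how sharply the near-duplicate detector switches on around the target similarity. The sketch below assumes the textbook MinHash banding scheme, in which two documents with Jaccard similarity s become a candidate pair with probability 1 - (1 - s^r)^b for b bands of r rows. The function names are illustrative, and the source does not state that the pipeline uses exactly this formulation.

```python
def lsh_candidate_probability(similarity, bands=14, rows=9):
    """Probability that two documents collide in at least one LSH band."""
    return 1.0 - (1.0 - similarity ** rows) ** bands


def lsh_approx_threshold(bands=14, rows=9):
    """Common rule-of-thumb location of the S-curve's steep section."""
    return (1.0 / bands) ** (1.0 / rows)


if __name__ == "__main__":
    print(f"approximate threshold: {lsh_approx_threshold():.3f}")
    for s in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(f"similarity {s:.1f} -> candidate probability "
              f"{lsh_candidate_probability(s):.3f}")
```

Under this assumption the curve rises steeply between similarities of about 0.6 and 0.9, which is consistent with a nominal threshold near 0.7.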
The paper's headline empirical result is that simple SQL style filters over the EAI-Taxonomy metadata produce competitive domain specific subsets without requiring a dedicated classifier. Relative to the strongest published baselines, taxonomy filtered subsets score roughly -8.0% on mathematics, +14.3% on web code, +24.5% on STEM (MMLU STEM), and +8.6% on medical evaluation suites.[4][16] Specific reported numbers include 28.7% HumanEval+ for the taxonomy filtered code mixture (versus 28.0% for DCLM baseline) and 47.0% MMLU CS (versus 32.0% for DCLM baseline).[16]
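To make the idea of an SQL style filter concrete, the sketch below builds a tiny in-memory table and selects a domain subset with a single query. The column names (`topic`, `doc_type`, `reasoning_depth`), the label values, and the function `select_subset` are all hypothetical stand-ins; they are not the actual EAI-Taxonomy field names, which are not listed in this article.

```python
import sqlite3

# Toy corpus: every column and label value here is hypothetical.
ROWS = [
    (1, "mathematics", "tutorial", 4, "Proof of the binomial theorem ..."),
    (2, "mathematics", "spam", 5, "Buy cheap calculators ..."),
    (3, "cooking", "tutorial", 2, "How to poach an egg ..."),
    (4, "mathematics", "reference", 3, "Table of standard integrals ..."),
    (5, "mathematics", "tutorial", 1, "Two plus two is four ..."),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, topic TEXT, "
    "doc_type TEXT, reasoning_depth INTEGER, text TEXT)"
)
conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?, ?)", ROWS)


def select_subset(connection, topic, min_reasoning_depth):
    """Return ids of documents matching a simple metadata predicate."""
    query = (
        "SELECT id FROM docs "
        "WHERE topic = ? AND doc_type != 'spam' AND reasoning_depth >= ? "
        "ORDER BY id"
    )
    return [row[0] for row in connection.execute(query, (topic, min_reasoning_depth))]


math_ids = select_subset(conn, "mathematics", 3)
```

The point of the design is that changing the subset only requires editing the predicate, rather than training and running a new classifier over the corpus.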
Between December 5 and 8, 2025, Essential AI released Rnj-1, the company's first publicly available pair of open weight language models, alongside an announcement blog post and a research note.[5][6][17] The release was reported by Bloomberg and was widely covered as the first language model directly attributed to Vaswani's new lab.[13]
Rnj-1 (pronounced "range one" and named in homage to the mathematician Srinivasa Ramanujan) consists of two checkpoints, a base model and an instruction tuned model.[5][17] Both contain approximately 8.3 billion parameters and follow an architecture broadly similar to Gemma 3, with 32 transformer layers, a model dimension of 4096, an MLP dimension of 16384, 32 attention heads, 8 key value heads, a head dimension of 128, GeGLU activations, and a vocabulary of 128,000 tokens.[17] The pretraining context is 8,192 tokens, extended to 32,768 by mid training and to 128,000 through YaRN RoPE scaling.[17]
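The stated dimensions can be checked against the quoted parameter count with a short calculation. The sketch below counts only the large weight matrices and assumes tied input and output embeddings, no bias terms, and a gated MLP with three projection matrices; normalisation parameters are ignored. Those assumptions are not confirmed by the source, and the function name `estimate_parameters` is illustrative.

```python
def estimate_parameters(
    layers=32,
    d_model=4096,
    d_mlp=16384,
    n_heads=32,
    n_kv_heads=8,
    head_dim=128,
    vocab=128_000,
    tied_embeddings=True,  # assumption, not stated in the article
):
    """Rough parameter count from weight-matrix shapes only."""
    q_proj = d_model * n_heads * head_dim
    kv_proj = 2 * d_model * n_kv_heads * head_dim   # key and value projections
    o_proj = n_heads * head_dim * d_model
    attention = q_proj + kv_proj + o_proj
    mlp = 3 * d_model * d_mlp                       # gate, up, and down projections
    embeddings = vocab * d_model * (1 if tied_embeddings else 2)
    return layers * (attention + mlp) + embeddings


total = estimate_parameters()
print(f"estimated parameters: {total / 1e9:.2f} billion")
```

Under these assumptions the estimate comes to about 8.31 billion, in line with the approximately 8.3 billion figure quoted above; with untied embeddings the same arithmetic would give a noticeably larger total.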
Training consisted of three phases:
| Phase | Tokens | Global batch | Learning rate schedule |
|---|---|---|---|
| Pretraining (8K context) | 8.4 trillion | 18M tokens | WSD (warmup, stable, decay, final stable)[17] |
| Mid training (context extension to 32K) | 380 billion | 24M tokens | Fixed 2e-5[17] |
| Supervised fine tuning | 150 billion | 16M tokens | Fixed 2e-5[17] |
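The table implies an approximate number of optimizer steps per phase, obtained by dividing the token budget by the global batch size. The calculation below assumes that "M" means 10^6 tokens and that the batch size is constant within each phase; neither detail is confirmed by the source, and the names used are illustrative.

```python
# (phase name, total tokens, global batch size in tokens), taken from the table.
PHASES = [
    ("Pretraining", 8_400_000_000_000, 18_000_000),
    ("Mid training", 380_000_000_000, 24_000_000),
    ("Supervised fine tuning", 150_000_000_000, 16_000_000),
]


def approximate_steps(phases=PHASES):
    """Map each phase name to its rounded optimizer step count."""
    return {name: round(tokens / batch) for name, tokens, batch in phases}


steps = approximate_steps()
total_tokens = sum(tokens for _, tokens, _ in PHASES)

for name, count in steps.items():
    print(f"{name}: about {count:,} steps")
print(f"total tokens across phases: {total_tokens / 1e12:.2f} trillion")
```

On these assumptions pretraining accounts for the overwhelming majority of both tokens and steps, with the two later phases together adding well under a tenth of the pretraining token budget.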
The Muon optimizer is used throughout training, providing the practical validation of the company's earlier Muon paper at the 8 billion parameter scale.[5][17] Training is distributed across both Google TPU v5p hardware and AMD Instinct MI300X GPUs, reflecting the involvement of both Google and AMD as investors.[5][17] The model is released under the Apache 2.0 licence as EssentialAI/rnj-1 and EssentialAI/rnj-1-instruct, with quantised GGUF builds also distributed.[17]
Essential AI positions Rnj-1 as a code and STEM focused model rather than a generalist chatbot. On reported benchmarks the model achieves strong results on HumanEval+, MBPP+, BigCodeBench, and LiveCodeBench v6, with the instruct variant reaching 86.21% on HumanEval fill in the middle (HE-FIM-Python).[17] On SWE-bench Verified (in a bash only harness) Rnj-1 instruct scores about 20.8%, which Essential AI describes as "an order of magnitude" higher than comparably sized 8B class open models, and the company specifically calls out cases where Rnj-1 outperforms much larger models such as gpt-oss 20B on agentic tasks.[5][17] Mathematics performance includes about 92.6% on GSM8K and competitive performance on AIME 2025, while science performance is reported as competitive on GPQA Diamond and SuperGPQA.[17] Function calling performance on the Berkeley Function Calling Leaderboard is reported as high.[5][17] The model card explicitly notes regressions under 128K extrapolation, identity confusion between providers, and weakness on factual recall.[17] Bloomberg reported that Rnj-1's coding scores approached those of GPT-4o despite the much smaller parameter count, which the company attributed to the combination of disciplined pretraining data curation (drawn from Essential-Web v1.0) and the Muon optimizer.[13]
Essential AI has also published an arXiv paper titled "Rethinking Reflection in Pre-Training" (2504.04022) examining how reasoning behaviour emerges during pretraining as opposed to post training.[18] The paper shares authorship overlap with the Muon and Essential-Web papers, including Andrew Hojel, Michael Pust, Mohit Parmar, and Vaswani.[18] The blog also publishes notes on related topics such as Muon and grokking dynamics.[5]
Public Essential AI papers and model cards describe an infrastructure stack that mixes multiple accelerator architectures. Essential-Web v1.0 was annotated using "approximately 90,000 AMD MI300X GPU hours" over about one week on a cluster of 512 AMD Instinct MI300X GPUs.[16] The Rnj-1 model card states that training was "distributed across TPU v5p and AMD MI300X infrastructure" and that the model is robust to inference time quantisation in BF16, FP8, and NVFP4 numerical formats.[17] These disclosures are consistent with the structure of the Series A, which combined investments from Google (a major TPU operator) and AMD.
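The annotation figures can be cross-checked with simple arithmetic: GPU hours equal the number of GPUs multiplied by wall-clock hours. The snippet below assumes full utilisation of all 512 GPUs for the whole run, which is an idealisation, and its names are illustrative.

```python
GPUS = 512
HOURS_PER_DAY = 24
REPORTED_GPU_HOURS = 90_000  # "approximately", per the paper


def gpu_hours(gpus, days):
    """GPU hours consumed by a cluster running continuously."""
    return gpus * HOURS_PER_DAY * days


def implied_days(total_gpu_hours, gpus):
    """Wall-clock days implied by a GPU-hour budget at full utilisation."""
    return total_gpu_hours / (gpus * HOURS_PER_DAY)


one_week = gpu_hours(GPUS, 7)
days = implied_days(REPORTED_GPU_HOURS, GPUS)
print(f"512 GPUs for 7 days: {one_week:,} GPU hours")
print(f"90,000 GPU hours on 512 GPUs: {days:.1f} days")
```

Seven full days on 512 GPUs gives 86,016 GPU hours, so the reported figure of roughly 90,000 corresponds to a little over a week, consistent with the description in the text.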
Training and inference tooling described in public artifacts includes the Muon optimizer (used in place of AdamW across pretraining and post training), the muP parametrisation for hyperparameter transfer, a WSD learning rate schedule, and inference support through vLLM and the standard Hugging Face Transformers library.[15][17]
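A WSD schedule holds the learning rate constant for most of training instead of decaying it continuously. The sketch below implements a four-segment variant (linear warmup, stable plateau, linear decay, final stable floor) matching the segment names in the training table above. Every numeric default, including the segment boundaries and the floor ratio, is a hypothetical value chosen for illustration, and the linear decay shape is also an assumption.

```python
def wsd_learning_rate(
    step,
    total_steps,
    peak_lr,
    warmup_frac=0.01,       # hypothetical
    decay_start_frac=0.80,  # hypothetical
    decay_end_frac=0.95,    # hypothetical
    floor_ratio=0.10,       # hypothetical
):
    """Learning rate at a given step under a warmup-stable-decay schedule."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    decay_start = round(total_steps * decay_start_frac)
    decay_end = round(total_steps * decay_end_frac)
    floor_lr = peak_lr * floor_ratio

    if step < warmup_steps:
        # Linear warmup that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        return peak_lr                      # stable plateau
    if step < decay_end:
        progress = (step - decay_start) / (decay_end - decay_start)
        return peak_lr + (floor_lr - peak_lr) * progress   # linear decay
    return floor_lr                         # final stable segment
```

One practical appeal of this family of schedules is that checkpoints taken during the stable plateau can be branched into separate decay runs without restarting training from scratch.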
Public information about the Essential AI team is concentrated in arXiv paper author lists, the company website, and LinkedIn coverage. Ashish Vaswani serves as chief executive officer and appears as an author on the company's principal papers.[4][5][15][17] Niki Parmar was the co-founder named alongside Vaswani in funding announcements and the original 2023 press materials.[1][9] As of 2026, public reporting indicates that Parmar moved on from a full time Essential AI role to a Member of Technical Staff position at Anthropic, working on frontier capabilities and reinforcement learning research.[9]
The author lists on the Muon, Essential-Web, and Rnj-1 papers identify a recurring research engineering team, including Andrew Hojel, Michael Pust, Mohit Parmar, Tim Romanski, Yash Vanjani, Ritvik Kapila, Anthony M. Polloreno, Karl Stratos, Philip Monk, Adarsh Chaluvaraju, Andrew Ma, Anil Thomas, Ashish Tanwer, Darsh J. Shah, Khoi Nguyen, Kurt Smith, Michael Callahan, Platon Mazarakis, Peter Rushton, Saurabh Srivastava, Somanshu Singla, and Ishaan Shah.[15][16] Reporting at the time of the Series A indicated that Essential AI was building a multidisciplinary team spanning engineering, research, design, and product, and aiexpert.network's coverage describes a team explicitly drawn from machine learning research backgrounds.[7]
Essential AI is one of a cluster of foundation model startups founded between 2019 and 2023, many of them by Google or Google Brain veterans, several of which share investors, technical lineage, or strategic positioning. The table below compares Essential AI with five companies often discussed alongside it: Adept AI, Inflection AI, Cohere, Imbue, and Magic. Numbers are drawn from public funding disclosures and product announcements.
| Company | Founded | Founders (selected) | Total disclosed funding | Original focus | Current status / pivot |
|---|---|---|---|---|---|
| Essential AI | 2023 | Ashish Vaswani, Niki Parmar | approx. $65M[1] | Enterprise automation ("Enterprise Brain")[1] | Open code/STEM models (Rnj-1), open datasets (Essential-Web)[5][6][13] |
| Adept AI | 2022 | David Luan, Ashish Vaswani, Niki Parmar | approx. $415M[11] | Action Transformer browser/desktop agents (ACT-1)[10] | Founders and core team joined Amazon's AGI org in mid 2024; IP licensed to Amazon[11] |
| Inflection AI | 2022 | Mustafa Suleyman, Reid Hoffman, Karen Simonyan | over $1 billion (public reporting) | Personal AI ("Pi" assistant) | Most staff and key founders joined Microsoft in early 2024; remaining business pivoted to enterprise |
| Cohere | 2019 | Aidan Gomez, Ivan Zhang, Nick Frosst | over $1 billion (public reporting) | Enterprise LLM API platform | Continued independent enterprise focus, multiple model generations (Command series) |
| Imbue | 2021 (renamed from Generally Intelligent) | Kanjun Qiu, Josh Albrecht | over $200 million | Reasoning agents and code agents | Continued independent operation |
| Magic | 2022 | Eric Steinberger, Sebastian De Ro | over $400 million | Long context code generation models | Continued independent operation |
Essential AI is distinguished within this group by the strength of its founders' technical lineage as transformer co-authors, its relatively modest funding level (under $100 million, in contrast with peers that have raised hundreds of millions to over a billion dollars), and its more recent shift from a closed enterprise framing to open weight releases.[1][6][13]
Essential AI's significance derives less from its initial commercial product (which remains unannounced) than from the public artifacts the company has released since 2025. Three points are commonly highlighted in coverage of the firm:
A secondary form of significance lies in Essential AI's role as an example of a now common pattern in foundation model development: small, well capitalised laboratories led by alumni of the early transformer era, releasing open weight models that are competitive with proprietary systems several times their size on specific benchmark categories. Coverage by trade publications such as Gigazine described Rnj-1 as one of the first 8 billion parameter open weight models from a U.S. based laboratory to claim near GPT-4o code generation results on multiple coding suites, although third party reproduction of the company's numbers, particularly on SWE-bench Verified at the level Essential AI reports, was still in progress at the time of release.[13][17]
The Essential-Web release in particular has been cited in subsequent academic and industry discussions of pretraining data composition. Its principal methodological contribution, the use of an explicit taxonomy with metadata fields that can be queried directly using SQL style filters, contrasts with two earlier dominant approaches: heuristic n-gram and perplexity filtering as used in RedPajama and quality scoring through fastText classifiers as used in DCLM and FineWeb.[4][16] By exposing the underlying classifier signals as document level metadata, Essential AI lowers the cost of constructing domain specific subsets and makes ablation studies easier to design.
Essential AI's external communications have been concentrated in three channels: the company blog at essential.ai/blog, paper releases on arXiv, and dataset and model releases on Hugging Face under the EssentialAI organisation. The company also operates accounts on the X social network (@essential_ai) and LinkedIn, and maintains a public Discord community for users of the Rnj-1 model family.[5][17] In contrast with many similarly funded foundation model startups, Essential AI has not pursued sustained press outreach: most of its mainstream coverage has been driven by the two named launch moments (the December 2023 Series A and the December 2025 Rnj-1 release) and by the secondary attention generated by reaching the top of trending model and dataset lists on Hugging Face.[4][6][13][17]
The company's blog posts typically take a technical rather than promotional register. The accompanying "Muon Doesn't Clearly Grok faster" post, for example, is a relatively narrow methodological commentary on the relationship between the Muon optimizer and grokking dynamics in small models, and is consistent with the more general norms of contemporary research lab blogs at organisations such as Anthropic and Cohere.[5] The Rnj-1 launch blog post is structured around model architecture, training recipe, benchmark results, and a discussion of intended use cases (long context code automation, kernel optimisation on new accelerators, and whole repository refactoring), with limited marketing material.[6]
A number of caveats apply to Essential AI's public record. The company has not, as of early 2026, publicly launched any of the enterprise products discussed at its Series A in December 2023, and Bloomberg's coverage of the Rnj-1 release explicitly framed the transition as a strategic pivot rather than a planned next phase of the original roadmap.[13] Reporting on Adept's mid 2024 licensing deal with Amazon, which included the departure of the original CEO David Luan, also raised questions about durability across the Vaswani-Parmar founder cohort, although Vaswani and Parmar had already left Adept for Essential AI more than a year earlier.[11]
The Essential-Web paper's benchmark gains are domain specific and rely on relatively small training token budgets (typically 80 billion tokens at the comparison scale) where ranking effects can be sensitive to choice of evaluation suite.[16] On at least one domain, mathematics, EAI-Taxonomy filtered data underperformed the best published baseline (FineMath 3+) by roughly 8 percentage points on the GSM8K benchmark at matched scale.[16] The Rnj-1 model card additionally notes that the model is "not optimised for factual recall", that it exhibits a strong tendency to emit code even in non coding contexts, and that some quality regressions occur under 128K extrapolation, particularly on MMLU STEM.[17] Identity confusion with other model providers, in which the model claims to be an unrelated commercial system, is also documented in the model card.[17] These behaviours are consistent with limitations seen across the broader category of 8B class instruction tuned open weight models.
The company has not published explicit revenue, customer, or commercial deployment figures, and the most detailed publicly available descriptions of staffing and infrastructure are drawn from paper author lists rather than direct disclosure.[4][5][16][17]
See Adept AI, Inflection AI, Cohere, and Mistral AI for adjacent foundation model startups. See Muon and AdamW for the optimizers discussed in Essential AI's pretraining research. See Common Crawl, RedPajama, DCLM, and FineWeb for related web scale pretraining datasets. See Gemma 3 for the open architecture template most closely followed by Rnj-1.