Imbue
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,264 words
Imbue is a San Francisco-based artificial intelligence research lab focused on training foundation models and building agent systems oriented toward reasoning and software engineering. The company was co-founded by Kanjun Qiu (chief executive officer) and Josh Albrecht (chief technology officer) under the name Generally Intelligent and rebranded to Imbue in September 2023 in connection with a $200 million Series B round that valued the lab at more than $1 billion.[^1][^2][^3] Imbue trains custom large language models, with publicly disclosed work on a 70B-parameter model trained on a cluster of 4,088 NVIDIA H100 GPUs, and has released a body of open-source infrastructure, evaluation datasets, and CARBS, a hyperparameter optimizer based on Bayesian optimization.[^4][^5][^6] Beyond model training, Imbue has shipped developer tools including Sculptor, an interface for running parallel coding agents in isolated containers, and continues to publish research on the theoretical underpinnings of deep learning.[^7][^8]
| Item | Detail |
|---|---|
| Legal name | Imbue, Inc. |
| Prior name | Generally Intelligent (2021 to September 2023) |
| Co-founders | Kanjun Qiu (CEO), Josh Albrecht (CTO) |
| Headquarters | San Francisco, California |
| Founded | 2021 |
| Renamed | September 2023 |
| Series B (initial) | $200 million, announced September 7, 2023 |
| Series B (extension) | Additional $12 million, October 23, 2023 |
| Reported valuation | Over $1 billion (Series B) |
| Lead investor | Astera Institute |
| Notable Series B investors | Nvidia, Kyle Vogt, Simon Last, Drew Houston, Notion's Akshay Kothari, Amazon Alexa Fund, Eric Schmidt, Tom Brown |
| Compute partners | Voltage Park, Dell, Nvidia (4,088 H100 GPU cluster) |
| Mission focus | Foundation models for reasoning, code, and agents |
| Open-source releases | cluster-health, carbs, sculptor, evaluation datasets |
The company was founded in 2021 by Kanjun Qiu and Josh Albrecht, who had previously collaborated on Sourceress, an AI-agent-adjacent machine-learning recruiting startup that participated in Y Combinator and raised approximately $13 million in venture capital before winding down operations.[^9] Earlier still, Qiu and Albrecht co-founded Ember Hardware, which worked on laser-based projection displays intended for virtual reality.[^10] Qiu's professional background also included a stint as the first chief of staff at Dropbox during a period of rapid scaling from roughly 200 to 1,200 employees, an experience she has described as formative for thinking about institutional design and operational coordination. She studied computer science at MIT, where she also worked as a graduate researcher at the MIT Media Lab and supported her tuition by writing high-frequency trading algorithms. Albrecht had previously co-founded BitBlinder, a privacy-focused peer-to-peer tool, and CloudFab, a 3D printing services company. The two founders have also co-hosted the Generally Intelligent research podcast since late 2020 and jointly run Outset Capital, a small early-stage investment vehicle for technical founders.[^10][^11]
Generally Intelligent emerged from stealth on October 20, 2022 with an initial financing of approximately $20 million plus more than $100 million in optioned drawdown commitments structured as a multi-year capital agreement. The company described its mission as developing "generally capable agents with human-like intelligence" that could be safely deployed in the real world. Investors disclosed at the time included Tom Brown, the former GPT-3 engineering lead at OpenAI; Jonas Schneider, the former OpenAI robotics lead; Dropbox co-founders Drew Houston and Arash Ferdowsi; the Astera Institute, a nonprofit research funder associated with Jed McCaleb; and Tim Hanson, a founding member of the Neuralink team who joined the Generally Intelligent board.[^10] Initial public coverage emphasized the lab's deliberate avoidance of immediate productization in favor of long-horizon research on what the founders described as the cognitive components missing from contemporary models.[^10]
Before the rebrand, Generally Intelligent was best known publicly for two assets. The first was the Generally Intelligent podcast, which Qiu and Albrecht launched in December 2020 as a long-form interview show "by deep learning researchers, for deep learning researchers," featuring guests including Percy Liang of Stanford, Chelsea Finn of Stanford and Google, Tri Dao of Princeton, Jamie Simon of UC Berkeley, and Rylan Schaeffer of Stanford; the show has been widely cited within the machine learning community as one of the more technically detailed interview programs covering training dynamics and theory.[^11] The second was Avalon, a procedurally generated 3D reinforcement learning benchmark presented at NeurIPS 2022 in the Datasets and Benchmarks track, where embodied agents face survival tasks such as climbing, jumping, hunting, and foraging in worlds that share a common reward function and action space across 20 task families.[^12][^13] The Avalon paper, with Albrecht and Qiu as senior authors, framed the benchmark as a way to study sample efficiency and generalization for reinforcement learning in environments resembling the kinds of physical and ecological challenges that shaped the evolution of biological intelligence.[^12]
On September 7, 2023, Generally Intelligent announced a $200 million Series B funding round and a corresponding rebrand to Imbue.[^1][^2][^3] The round was led by the Astera Institute, with participation from Nvidia, Cruise co-founder and former chief executive Kyle Vogt, and Notion co-founder Simon Last; additional Series B participants disclosed in later coverage included Notion co-founder and chief operating officer Akshay Kothari and previously involved angel investors.[^1][^3] The round took total funding to approximately $220 million and assigned a valuation in excess of $1 billion, placing Imbue among the AI "unicorns" of 2023 alongside Adept AI, Inflection AI, and other applied agent companies.[^2][^3]
In a blog post accompanying the announcement, Qiu wrote that the new name reflected a desire to focus on "imbuing computers with the ability to reason," and reiterated that the company would not seek to compete with frontier model providers on raw scale but would instead train smaller specialized models around reasoning capability for agents.[^1] TechCrunch coverage described the rebrand as a deliberate departure from the more research-oriented Generally Intelligent identity, with Imbue framing its commercial vision around AI systems that could write, debug, and operate software autonomously on behalf of end users.[^1]
On October 23, 2023, Imbue disclosed an extension to the Series B in which the Amazon Alexa Fund and former Google chief executive Eric Schmidt contributed an additional $12 million, bringing the total Series B above $210 million.[^14] The extension did not change the headline valuation but signaled additional industry interest from cloud and former big-tech principals.
In parallel with the funding announcement, Imbue disclosed a multi-year partnership with Dell Technologies to deploy a high-performance computing system valued at approximately $150 million, built around NVIDIA H100 GPU accelerators.[^4] Reporting at the time and Imbue's later technical writeups confirmed that the cluster contained approximately 10,000 H100 GPUs in aggregate capacity, with the lab's primary training partition consisting of 4,088 H100 GPUs spread across 511 hosts (eight GPUs per node) connected by a three-tier InfiniBand fabric.[^4][^5] Hosting and physical infrastructure were provided by Voltage Park, a bare-metal H100 provider associated with Astera Institute, with additional cooperation from Dell and Nvidia on hardware and firmware.[^5][^15]
Across late 2023 and 2024, Imbue used the cluster to pre-train a 70B-parameter large language model from scratch on roughly two trillion tokens, with the goal of producing a base model that could be fine-tuned for multi-step reasoning and code understanding.[^6] On June 25, 2024, the lab released a four-part technical series describing the project, alongside several open-source repositories, and Imbue CTO Josh Albrecht discussed the work in a Latent Space podcast episode co-hosted by Swyx and Databricks chief AI scientist Jonathan Frankle.[^15][^16]
Following the 70B project, Imbue's public output shifted toward developer tooling. In April 2025 the company released Sculptor, a research preview of a desktop application that orchestrates multiple parallel coding agents, each running in its own container with an isolated copy of the working repository.[^7][^17] Sculptor was re-released as a more polished application on September 26, 2025 after a ground-up rebuild that integrated Claude Code for the underlying agent reasoning and introduced a "Pairing Mode" that syncs container state to the developer's local environment.[^7][^17] By 2026 Imbue's product lineup also included mngr, a parallel agent runner; Bouncer, a Twitter (X) feed filtering tool; and several smaller open-source utilities (Blueprint, Vet, Latchkey) published on the company's GitHub organizations.[^8] In early 2026 the lab also published research blog posts titled "Beating ARC-AGI-2 with Code Evolution" and "There Will Be a Scientific Theory of Deep Learning," signaling a continued investment in fundamental research alongside applied work on agents.[^8]
Throughout 2025 and 2026, Imbue's company-facing communications also emphasized policy positions, including the role of individual developers in maintaining bargaining power against centralized AI providers; the company's stated philosophy that "your tools should work for you, not the company that built them" was reflected in choices such as bring-your-own Claude API keys for Sculptor and a preference for MIT-licensed tooling.[^7][^8]
Imbue describes its mission as building "AI that works for humans, not the company that built them," with a stated emphasis on amplifying individual human agency rather than wholesale automation.[^8] The technical thesis behind the company is that reasoning, not raw scale or knowledge retrieval, is the dominant bottleneck preventing today's AI agents from completing long-horizon tasks reliably.
In a 2023 conversation with the Latent Space podcast, Qiu argued that contemporary large language models lack explicit reasoning data during pre-training and that capable agents will require optimization of reasoning skills as a first-class objective rather than a fine-tuning afterthought.[^18] The Imbue team has framed this as a "full-stack" research agenda spanning four threads:[^8]
This thesis has shaped a deliberate decision not to pursue a frontier generalist model along the lines of GPT-4 or Claude, but instead to train smaller, specialized models that can be combined with explicit reasoning scaffolds, evaluation harnesses, and product surfaces tailored to specific workflows such as code editing.[^1][^18] In several public talks, Qiu has contrasted this approach with the "more data plus more parameters" strategy of frontier laboratories, arguing that scaling alone is unlikely to close the gap between current chatbots and reliable long-horizon agents without changes to how reasoning is trained and evaluated.[^18]
The most substantial public artifact of Imbue's work is a series of four technical posts published in mid-2024 describing the engineering required to train a 70B-parameter model from bare metal on a fresh H100 cluster.[^5][^6][^19] The series, written by "about a dozen engineers and researchers" at the lab, covered hardware bring-up, evaluations, hyperparameter optimization, and an overview of the resulting model.[^6]
The primary training cluster consisted of 4,088 NVIDIA H100 GPU accelerators distributed across 511 host machines, with eight H100 cards per host.[^5] Each GPU was connected to a Mellanox ConnectX-7 InfiniBand network interface card capable of 400 gigabits per second of simultaneous send and receive throughput, with the interconnect organized as a three-tier "fully non-blocking" fat-tree fabric using roughly 320 InfiniBand switches and approximately 12,000 cables.[^5] A separate 100 gigabit per second Ethernet network was used for dataset and checkpoint traffic, and a third management Ethernet network exposed BIOS, baseboard management controller (BMC), and power supply controls for cluster-wide automation.[^5]
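The figures in this description can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above (511 hosts, eight GPUs per host, one 400 Gb/s NIC per GPU). The function name and the "injection bandwidth" framing are illustrative assumptions, not terms from Imbue's writeups, and the total is an upper bound that assumes every NIC runs at line rate.

```python
# Figures quoted in the article; everything derived from them is plain arithmetic.
HOSTS = 511
GPUS_PER_HOST = 8
NIC_GBPS = 400  # per-GPU InfiniBand NIC rate in gigabits per second, per direction


def cluster_totals(hosts: int, gpus_per_host: int, nic_gbps: int) -> dict:
    """Return the GPU count and the aggregate per-direction NIC bandwidth."""
    gpus = hosts * gpus_per_host
    injection_gbps = gpus * nic_gbps  # assumes one NIC per GPU, all at line rate
    return {"gpus": gpus, "injection_tbps": injection_gbps / 1000}


totals = cluster_totals(HOSTS, GPUS_PER_HOST, NIC_GBPS)
print(totals)
```

Under these assumptions the host and GPU counts agree (511 × 8 = 4,088), and the NICs together could inject about 1,635 Tb/s into the fabric in each direction.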
The cluster was hosted at Voltage Park facilities, with Dell supplying server and rack hardware and Nvidia providing optimization assistance on NCCL (NVIDIA Collective Communications Library) and InfiniBand UFM management.[^5][^15]
The bare-metal bring-up phase was the focus of one of the most widely circulated entries in the series, which Imbue paired with the open-source release of the imbue-ai/cluster-health repository under the MIT License.[^5][^20] The repository collects scripts and utilities developed during commissioning, including:[^20]
- … (ib_burn) that exercises every link in the fabric simultaneously.
The blog post accompanying the release reported that the team observed roughly 3% of hosts failing per week during initial training, and that garbage-collection desynchronization across distributed workers was a persistent source of throughput regression.[^15] Imbue argued that direct partnerships with hardware vendors (rather than running on standard cloud images) had been load-bearing for reducing mean time to repair.[^15][^16]
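To put the reported failure rate in perspective, the following back-of-the-envelope sketch applies "roughly 3% of hosts failing per week" to the 511-host partition. It assumes failures are independent and occur at a constant rate, which is a simplification introduced here and not a claim from the source. The function names are hypothetical.

```python
HOSTS = 511
WEEKLY_HOST_FAILURE_RATE = 0.03  # "roughly 3% of hosts failing per week"


def expected_failed_hosts(hosts: int, weekly_rate: float, weeks: float) -> float:
    """Expected number of host failures over the given number of weeks."""
    return hosts * weekly_rate * weeks


def prob_all_hosts_healthy(hosts: int, weekly_rate: float, weeks: int) -> float:
    """Chance that no host fails at all, assuming independent failures."""
    return (1.0 - weekly_rate) ** (hosts * weeks)


per_week = expected_failed_hosts(HOSTS, WEEKLY_HOST_FAILURE_RATE, weeks=1)
clean_week = prob_all_hosts_healthy(HOSTS, WEEKLY_HOST_FAILURE_RATE, weeks=1)
print(f"expected failed hosts per week: {per_week:.1f}")
print(f"probability of a failure-free week: {clean_week:.2e}")
```

Under these assumptions about 15 hosts would fail in a typical week and a failure-free week would be vanishingly unlikely, which is consistent with the emphasis the series places on automated health checks and fast repair.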
A second open-source release was the cost-aware hyperparameter tuning algorithm CARBS (Cost-Aware Pareto-Region Bayesian Search), available at imbue-ai/carbs under the MIT License and described in a research paper.[^21][^22] CARBS models both task performance and a cost metric (typically GPU-hours) as Gaussian processes and performs Bayesian optimization in the neighborhood of the empirically observed performance-cost Pareto frontier, rather than over a fixed global search space.[^22] As a side effect of this local search, the algorithm produces a learned scaling relationship for every hyperparameter as a function of compute, which can be extrapolated to predict tuned values at larger model sizes.[^22]
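Two of the ideas in this description, the performance-cost Pareto frontier and the extrapolated scaling relationship, can be illustrated without any Gaussian process machinery. The sketch below is not the CARBS algorithm and does not use the imbue-ai/carbs API. It filters a set of hypothetical runs down to the Pareto frontier and then fits a power law to a hyperparameter along that frontier by least squares in log-log space. All run data, names, and the choice of a power-law form are assumptions made for illustration.

```python
import math


def pareto_front(runs):
    """Keep runs that no cheaper-or-equal run beats on loss.

    Each run is (cost, loss, hyperparameter); lower cost and lower loss are better.
    """
    front = []
    best_loss = math.inf
    for run in sorted(runs, key=lambda r: (r[0], r[1])):
        if run[1] < best_loss:  # strictly better than everything cheaper
            front.append(run)
            best_loss = run[1]
    return front


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
    return math.exp(my - b * mx), b


# Hypothetical runs: (cost in GPU-hours, validation loss, learning rate).
runs = [
    (1.0, 3.20, 4e-3), (1.0, 3.50, 1e-3),
    (4.0, 2.90, 2e-3), (4.0, 3.00, 8e-3),
    (16.0, 2.65, 1e-3),
    (64.0, 2.45, 5e-4), (64.0, 2.70, 4e-3),
]

front = pareto_front(runs)
a, b = fit_power_law([r[0] for r in front], [r[2] for r in front])
predicted_lr = a * 256.0 ** b  # extrapolate to a budget no run has used yet
print(front)
print(a, b, predicted_lr)
```

In this synthetic data the frontier learning rates halve each time cost quadruples, so the fit recovers an exponent of about -0.5 and extrapolates to a learning rate of about 2.5e-4 at a cost of 256 GPU-hours. CARBS itself learns such relationships from Gaussian process models rather than from a direct regression like this one.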
Reported results include:
CARBS was the optimizer Imbue used to extrapolate hyperparameter choices from smaller proxy models up to the 70B scale, allowing the team to proceed to full-scale training without an exhaustive search at the target size.[^19][^22]
Imbue pre-trained its custom 70B-parameter base model on approximately two trillion tokens, then fine-tuned it on a suite of reasoning and code-comprehension tasks.[^6] The reported architecture uses grouped-query attention, SwiGLU activations, RMS normalization, and a custom tokenizer, with the base model design borrowing structural choices from the Llama 3 family.[^6] On a battery of multiple-choice reasoning benchmarks (drawn from cleaned versions of 11 public NLP evaluations plus an internal code-comprehension set), Imbue reported that its fine-tuned 70B model outperformed zero-shot GPT-4o on the multiple-choice sub-tasks and approached the performance of a fine-tuned Llama 3 70B model trained on more than seven times as much pre-training data.[^6]
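For readers unfamiliar with the architectural components named above, the following toy sketch shows the standard textbook forms of RMS normalization, a SwiGLU feed-forward block, and the head-sharing rule behind grouped-query attention. It is written in plain Python for readability. The dimensions, weights, and function names are assumptions chosen for illustration, since the source does not describe Imbue's actual layer sizes or implementation.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    return q_head // (n_q_heads // n_kv_heads)


# Toy example: model width 2, hidden width 3 (hypothetical sizes and weights).
x = rms_norm([3.0, 4.0], weight=[1.0, 1.0], eps=0.0)
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
w_down = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = swiglu_ffn(x, w_gate, w_up, w_down)
print(x, y, kv_head_for_query_head(5, n_q_heads=8, n_kv_heads=2))
```

After normalization the vector has a mean square of one, and with eight query heads sharing two key/value heads, query head 5 reads from key/value head 1.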
Crucially, Imbue described the training run as proceeding with "minimal training instability and no loss spikes," a result the team attributed to CARBS-derived hyperparameter selection rather than reactive curve fixes during training.[^6] In the public discussion of the project, Albrecht highlighted that this combination of cluster-health automation and CARBS-derived hyperparameters meant the team could run an unusually high-stakes single training job (a multi-week run at full cluster utilization) without the kind of multi-restart history that has been common in other open accounts of training pipelines at the same scale.[^15][^16]
Alongside the model results, the lab released:[^6]
The weights of the base 70B model itself were not openly released; Imbue framed the released datasets, scripts, and CARBS as the publicly useful artifacts of the project.[^6][^15]
Sculptor is Imbue's primary developer-facing product as of 2026, described as "the missing UI for parallel coding agents."[^7][^17] The application is a desktop client (initially available on macOS Apple Silicon and Linux) that allows a developer to spawn multiple AI coding agents in parallel, each running inside its own container with a copy-on-write clone of the working repository and an isolated dependency environment.[^7][^17] Each agent can install packages, run tests, and execute arbitrary code without affecting the developer's machine, which mitigates a class of safety issues associated with the permission-skipping modes of long-running coding agents.
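The isolation pattern described above can be sketched with the Python standard library. This is not Sculptor's implementation: it uses full directory copies in place of copy-on-write clones, plain directories in place of containers, and a stand-in function in place of a real coding agent. The names `run_agent` and `run_agents_in_parallel` are hypothetical.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_agent(repo: Path, sandbox_root: Path, name: str) -> Path:
    """Copy the repo into a private workspace and let a stand-in agent edit it."""
    workspace = sandbox_root / name
    shutil.copytree(repo, workspace)
    # Stand-in for real agent work (editing code, installing packages, running tests).
    (workspace / "NOTES.txt").write_text(f"changes proposed by {name}\n")
    return workspace


def run_agents_in_parallel(repo: Path, names: list[str]) -> dict[str, Path]:
    """Run one agent per name, each in its own copy of the repository."""
    sandbox_root = Path(tempfile.mkdtemp(prefix="agents-"))
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {n: pool.submit(run_agent, repo, sandbox_root, n) for n in names}
        return {n: f.result() for n, f in futures.items()}


# Demo with a throwaway repository containing a single file.
repo = Path(tempfile.mkdtemp(prefix="repo-"))
(repo / "main.py").write_text("print('hello')\n")
workspaces = run_agents_in_parallel(repo, ["agent-a", "agent-b"])
print(workspaces)
```

Each agent writes only inside its own workspace, so the original repository is left untouched and the developer can review or discard each workspace independently. A container adds process, network, and dependency isolation that a directory copy does not provide.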
Notable Sculptor capabilities include:
Sculptor was released as a research preview in April 2025 and re-released in a rebuilt form on September 26, 2025; through 2026 it has remained free during beta.[^7][^17] The source code is published on GitHub under imbue-ai/sculptor.[^17]
Imbue also publishes several smaller tools through its GitHub organizations, including mngr (a parallel coding-agent runner positioned as a lighter-weight headless version of Sculptor), Bouncer (an X / Twitter feed filtering and "healing" utility), and the experimental Blueprint, Vet, and Latchkey utilities.[^8] Earlier company experiments, including internal demo agents shown in talks, focused on browser-based task execution and natural-language code editing prior to the consolidation around Sculptor as the primary product surface. The lab's product strategy in 2025 and 2026 has been to keep these auxiliary tools open source and small, with Sculptor as the only commercial-scale offering, an approach Qiu has framed as appropriate for a research lab that wants to maintain a long-term reasoning research agenda while still shipping software developers can adopt.[^8]
Within the wider 2022 to 2026 wave of agent-focused AI startups, Imbue occupies a distinctive position. Unlike most application-layer agent companies, the lab has invested heavily in the underlying training stack: a 4,088 H100 cluster, in-house dataset cleaning, a from-scratch 70B pre-training run, and a custom hyperparameter optimizer. At the same time, unlike frontier general-purpose laboratories such as OpenAI, Anthropic, and Google DeepMind, it has chosen not to release a flagship generalist chat model, instead publishing infrastructure, datasets, and product tooling.[^1][^6][^15]
The cluster-health and CARBS releases, in particular, have been cited within the LLM systems community as among the more detailed openly available descriptions of how to bring a 4,000-GPU class H100 cluster from racks to a working training job, including the failure modes and remediation tooling that vendor documentation often omits.[^15][^16][^20] The Avalon benchmark, although less widely used than OpenAI Gym derivatives, remains one of the few open 3D procedurally generated environments designed explicitly for reinforcement learning generalization studies.[^12]
Imbue is frequently grouped with a cluster of mid-2020s AI startups oriented around agents and code, though each peer has taken a different positioning strategy.
| Company | Founded | Focus | Series B-or-later valuation | Public model artifacts |
|---|---|---|---|---|
| Imbue | 2021 | Reasoning foundation models and agent tools | Over $1 billion (Sept 2023)[^2] | 70B base (unreleased); CARBS, cluster-health, evaluation datasets[^6][^15] |
| Adept AI | 2022 | Action transformers for web and desktop UIs; later partial team move to Amazon | Reported at over $1 billion (2023) | Fuyu and ACT-1 model families |
| Inflection AI | 2022 | Consumer personal AI (Pi assistant); pivoted to enterprise after 2024 Microsoft deal | $4 billion (mid 2023) | Inflection-1, -2, -2.5, -3 |
| Magic.dev | 2022 | Long-context code generation models | Reported above $1 billion | Internal proprietary models |
| Cognition AI | 2023 | Devin software engineer agent and Windsurf-derived IDE | Multi-billion-dollar valuation | Devin agent platform |
What differentiates Imbue within this group is the deliberate combination of foundation-model training with developer-tool product work, and a relatively heavy emphasis on releasing infrastructure rather than benchmark-leading model weights.[^1][^6]
Several aspects of Imbue's public posture have drawn scrutiny.
First, although the company has disclosed a 70B reasoning-tuned model and shared its evaluation methodology, the model weights themselves have not been released, and independent third-party reproductions of the reported benchmark gains over GPT-4o are limited.[^6] As Imbue does not publish on common public leaderboards, comparison with peer foundation models depends on benchmarks chosen by the lab.
Second, Imbue's headline claim that its 70B fine-tuned model "outperforms GPT-4o zero-shot" applies specifically to a multiple-choice reasoning subset rather than to free-form generation, instruction following, or agentic task completion; this scope has been noted in technical write-ups summarizing the 70B series.[^6][^15]
Third, the company's compute strategy depends on a single bare-metal cluster operated by Voltage Park, an arrangement that combines tight hardware co-design with concentrated vendor risk relative to multi-cloud peers.[^5][^15]
Finally, like other private AI laboratories working on reasoning agents, Imbue has not yet published the kind of detailed system cards or safety evaluations that frontier labs such as Anthropic and OpenAI now produce alongside model releases, in part because most of its model artifacts are internal.[^6][^15]