Positron AI
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 1,638 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 1,638 words
Add missing citations, update stale details, or suggest a clearer explanation.
Positron AI is an American semiconductor startup headquartered in Reno, Nevada, that designs and manufactures purpose-built hardware for transformer inference. Founded in 2023 by chief executive Mitesh Agrawal and chief technology officer Thomas Sohmers, the company markets a memory-optimized appliance called Atlas that it positions as a more power-efficient alternative to general-purpose Nvidia GPUs for serving large language models.[^1][^2] Positron raised a $230 million Series B in February 2026 at a valuation above $1 billion, bringing total disclosed funding to roughly $305 million and making the three-year-old company an AI-hardware "unicorn."[^3][^4]
Positron was founded in the spring of 2023 with a thesis that mainstream GPUs are over-provisioned for transformer inference workloads, where the ratio of compute to memory operations is close to 1:1 and overall throughput is bottlenecked by memory bandwidth rather than arithmetic.[^5] Rather than chase a general-purpose accelerator, the company designed hardware exclusively around the matrix and attention operations of transformer models, with the explicit goal of running the largest open and proprietary LLMs at the lowest cost-per-token and within power envelopes compatible with conventional air-cooled data centers.[^5][^6]
The company is headquartered in Reno, Nevada, and emphasizes that its hardware is "designed, fabricated, and assembled in the U.S.," with the first-generation silicon fabricated in Arizona.[^4][^7] In its own published timeline, Positron operated with fewer than ten employees through the first eight months of its existence, when it produced its initial Llama-2 7B prototype on an FPGA platform; by Month 15 it had shipped the production Atlas system, and by early 2026 it had completed two major funding rounds.[^6]
Thomas Sohmers, co-founder and chief technology officer, is a serial processor architect who received a Thiel Fellowship in 2013 at age 17 to start REX Computing, a fabless semiconductor company that designed energy-efficient processors for high-performance computing and DSP workloads. He also co-founded a cryptocurrency-ASIC venture called Terapute Technologies and later served as Director of Technology Strategy and Head of Distributed Systems at Groq, where he worked on data-center-scale machine-learning systems before founding Positron.[^8]
Mitesh Agrawal, chief executive officer, joined Positron as CEO in early 2025 after spending roughly eight and a half years at Lambda (formerly Lambda Labs), most recently as chief operating officer. At Lambda he is credited with helping scale the GPU-cloud business from approximately $500,000 to about $500 million in annualized revenue run-rate and with leading several of the company's later funding rounds.[^9][^10] His move from a Nvidia-aligned neocloud to a Nvidia competitor was widely interpreted as a signal that demand for non-Nvidia inference silicon had matured beyond a niche.[^10]
Edward Kmett serves as Chief Scientist. He is identified in Positron's Series A announcement alongside Sohmers and Agrawal as part of the founding leadership team.[^11]
Positron's own materials describe the founding group as "a visionary, an applied mathematician, and an engineer," and disclose that the company recruited Agrawal as its new chief executive at Month 21 of operation, replacing earlier leadership.[^6]
Atlas is Positron's first-generation product, marketed as a complete "Transformer Inference Server" rather than a bare chip.[^12] The system is built around eight Positron Archer transformer accelerators, each paired with 32 GB of HBM for a total of 256 GB of accelerator memory per server.[^12] The Archer accelerators are coupled to dual AMD EPYC Genoa 9374F host processors (64 cores total) and a 24-channel DDR5 memory subsystem standard at 384 GB and expandable to 2 TB, supporting models that spill outside HBM into host memory.[^12]
The first-generation Atlas chips use a memory-optimized, FPGA-based architecture, according to coverage of Positron's Series A round. Positron reports that this design achieves roughly 93% HBM bandwidth utilization, compared with a typical 10–30% utilization on GPU-based systems, and that a single 2-kilowatt Atlas server can host models with up to half a trillion parameters.[^11][^13] The company's stated design principle is that transformer inference is fundamentally a memory-bandwidth problem, so silicon area and power should be spent moving weights into matrix engines rather than on the general-purpose flexibility of a GPU.[^5]
Atlas exposes an OpenAI-compatible API endpoint and is binary-compatible with Hugging Face Transformers model checkpoints, which Positron presents as a drop-in path from existing GPU deployments.[^11]
Positron publishes head-to-head comparisons against Nvidia H100 and H200 systems. On Llama 3.1 8B in BF16 (with no speculative decoding or paged attention), Positron reports approximately 280 tokens per second per user on a 2,000-watt Atlas server, compared with roughly 180 tokens per second per user from an eight-GPU Nvidia DGX H200 drawing about 5,900 watts.[^1][^12] Translated into the company's headline metrics, Atlas claims roughly 3.08× performance-per-dollar and 4.54× performance-per-watt versus the DGX H200, with "3× lower latency in production workloads."[^12] Against the prior-generation H100, Positron has cited 3.5× better performance-per-dollar and up to 66% lower power consumption.[^11][^13]
A single Atlas chassis is 7"H × 19"W × 29.25"D, weighs roughly 100 pounds (45.4 kg), and is powered by dual 2,000-watt redundant Titanium-rated power supplies. It is air-cooled and does not require the liquid cooling or extreme rack power densities associated with current high-end GPU clusters, which Positron presents as enabling deployment in "hundreds of existing data centers" without facility upgrades.[^12][^13]
Positron has disclosed three priced equity rounds, totaling roughly $305 million.[^3]
Positron's first publicly named production customers are:
Positron has also stated that Atlas is "already deployed in production environments" at additional unnamed enterprise customers and "leading neocloud providers."[^13]
Positron's announced second-generation silicon is named Asimov, with custom ASICs (rather than the FPGA-based design of the first Atlas chips) targeted for production in early 2027. Asimov is designed to support up to 2 TB of high-speed memory per accelerator, and the next-generation system that houses it, Titan, is designed to provide 8 TB of memory per system at memory bandwidth comparable to Nvidia's Rubin-generation GPUs. Positron claims Titan will be able to hold models on the order of 16 trillion parameters in a single chassis.[^11][^14] The Series B funding is explicitly earmarked to accelerate the Asimov/Titan roadmap.[^4]
Positron is one of a cohort of post-2020 startups attempting to take share from Nvidia in inference rather than training. Its closest analogues by architectural philosophy include:
Compared with these rivals, Positron's distinguishing bets are (1) attaching the largest practical pool of HBM and host DDR5 to a transformer-specific datapath so that very large models fit in a single 2 kW chassis, and (2) pursuing a U.S.-domestic supply chain — with fabrication in Arizona and assembly in the United States — at a moment when both customers and the U.S. government are paying closer attention to onshore AI infrastructure.[^4][^11]