Cerebras Systems
Last reviewed
May 17, 2026
Sources
39 citations
Review status
Source-backed
Revision
v4 · 5,881 words
Cerebras Systems is an American artificial intelligence hardware company that designs and manufactures wafer-scale processors for AI training and inference. Founded in March 2016 and headquartered in Sunnyvale, California, Cerebras is best known for producing the Wafer-Scale Engine (WSE), the largest chip ever made, which integrates an entire silicon wafer into a single processor. The company has positioned itself as a leading challenger to NVIDIA in the AI accelerator market, with a particular focus on high-speed inference for large language models. Cerebras completed an initial public offering on the Nasdaq under the ticker symbol CBRS on May 14, 2026, raising $5.55 billion in what became the largest global IPO of the year [18][19].
Cerebras Systems was incorporated in March 2016 by Andrew Feldman, Gary Lauterbach, Michael James, Sean Lie, and Jean-Philippe Fricker. The five-person founding team had worked together at the energy-efficient microserver company SeaMicro, which Feldman and Lauterbach had previously co-founded. SeaMicro was acquired by AMD in 2012 for $334 million, after which the core engineering group remained inside AMD until breaking away to start Cerebras. The founders brought decades of combined experience in networking, server design, and microprocessor architecture to the new venture [1][20].
The core insight behind Cerebras was that AI workloads, particularly deep learning training, could benefit from a processor built at a radically larger scale than conventional chips. Rather than designing a small chip and connecting many of them together (the approach used by GPUs), Cerebras decided to build a single chip spanning an entire 300mm silicon wafer. This approach eliminates the inter-chip communication bottlenecks that slow down distributed AI systems and allows the entire model to reside on-chip during computation.
The company operated in stealth mode for its first three years before unveiling the original Wafer-Scale Engine (WSE) at the Hot Chips conference in August 2019. During the stealth period the team focused on solving five interlocking engineering problems that had blocked wafer-scale integration for decades: yield, power delivery, thermal management, cross-reticle wiring, and packaging. By the time Cerebras revealed its first chip, the company had already shipped early CS-1 systems to research customers and was working with the U.S. Department of Energy on scientific computing workloads [11].
The founders of Cerebras have remained closely involved with the company since 2016, an unusual pattern for a hardware startup that is now a decade old.
| Founder | Role at Cerebras | Background |
|---|---|---|
| Andrew Feldman | Co-founder, CEO | BA Stanford 1991, MBA Stanford GSB 1997; previously CEO of SeaMicro, executive roles at Force10 Networks and RiverStone Networks |
| Gary Lauterbach | Co-founder, CTO | Former chief architect at Sun Microsystems and CTO of SeaMicro |
| Sean Lie | Co-founder, Chief Hardware Architect | Former lead architect at SeaMicro and AMD |
| Michael James | Co-founder, Chief Software Architect | Veteran compiler and runtime engineer with prior work at SeaMicro |
| Jean-Philippe Fricker | Co-founder, Chief System Architect | Systems and packaging engineer with prior work at SeaMicro |
Feldman has been a serial entrepreneur in Silicon Valley, having sold three companies and taken a fourth public before founding Cerebras. He grew up on the campus of Stanford University, where his father was a faculty member, and he completed both undergraduate and graduate degrees there. Following the company's May 2026 IPO, Feldman's stake was valued at roughly $3.2 billion and Sean Lie's at roughly $1.7 billion, making both billionaires on paper [21].
The defining innovation of Cerebras is the Wafer-Scale Engine, a processor that occupies an entire silicon wafer rather than being cut into individual dies like a traditional chip. This approach required Cerebras to solve several engineering challenges that had prevented wafer-scale integration for decades, including managing defects, distributing power evenly, and handling thermal dissipation across such a large area.
Building a chip the size of an entire wafer presented five major technical challenges that Cerebras had to overcome:
Defect tolerance. In traditional chip manufacturing, defective dies are simply discarded. On a wafer-scale chip, defects cannot be avoided and must instead be tolerated. Cerebras designed the WSE with approximately 100x the defect tolerance of a conventional GPU. Each AI core on the WSE-3 occupies roughly 0.05 mm², about 1% of the area of an NVIDIA H100 streaming multiprocessor (SM). When a defect hits a WSE core, it disables only 0.05 mm² of silicon, whereas the same defect on an H100 would disable approximately 6 mm². The WSE-3 contains 970,000 physical cores, of which 900,000 are active in the shipping product, giving Cerebras roughly 93% silicon utilization, which is higher than leading GPUs despite the WSE-3 being the world's largest chip [10].
Cerebras also developed a sophisticated routing architecture that allows dynamic reconfiguration of connections between cores. When a defect is detected during manufacturing test, the system automatically routes around the disabled core using redundant communication pathways, maintaining the fabric's full bandwidth and connectivity.
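The arithmetic behind the defect-tolerance figures can be checked in a few lines. The sketch below uses only the numbers quoted above (0.05 mm² per WSE-3 core, roughly 6 mm² per H100 SM, 970,000 physical and 900,000 active cores); the variable names are illustrative and do not come from any Cerebras tool.

```python
# Figures quoted in the text above; names are illustrative only.
WSE3_CORE_AREA_MM2 = 0.05   # silicon disabled when one WSE-3 core fails
H100_SM_AREA_MM2 = 6.0      # approximate silicon disabled when one H100 SM fails
PHYSICAL_CORES = 970_000
ACTIVE_CORES = 900_000

# How much more silicon a single defect costs on the GPU than on the wafer.
area_ratio = H100_SM_AREA_MM2 / WSE3_CORE_AREA_MM2

# Share of physical cores that are enabled in the shipping product.
utilization = ACTIVE_CORES / PHYSICAL_CORES

# Physical cores that are not enabled in the shipping product.
inactive_cores = PHYSICAL_CORES - ACTIVE_CORES

print(f"area lost per defect, H100 vs WSE-3: about {area_ratio:.0f}x")
print(f"core utilization: {utilization:.1%}")
print(f"inactive cores: {inactive_cores:,}")
```

The area ratio comes out near 120x, in line with the "approximately 100x" claim, and the utilization near 92.8%, which the text rounds to 93%.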
Power delivery. Delivering tens of kilowatts of power uniformly across a wafer-sized chip cannot be accomplished with traditional edge-of-die power connections. Cerebras designed a custom engine block that delivers power directly into the face of the wafer, achieving the power density required for hundreds of thousands of active cores. The custom connector and PCB design ensures uniform voltage across the entire wafer surface [11].
Thermal management. With overall power delivery in the mid-teen kilowatt range, the WSE generates substantial heat that must be removed uniformly. Traditional heat sink attachment techniques could not be used because of the thermal expansion mismatch between silicon and copper. Cerebras invented a new material and connector design that allows the wafer to expand and contract while remaining in thermal contact with a copper heat exchanger. Water flows through micro-fins on the backside of the heat exchanger, and the wafer slides against the polished front surface, maintaining thermal coupling despite differing coefficients of thermal expansion [11].
Cross-reticle connectivity. A standard photolithographic reticle can expose only a portion of a wafer at a time. To create a chip that spans the entire wafer, Cerebras had to connect circuits across reticle boundaries with high bandwidth and low latency, a problem unique to wafer-scale integration.
Die-to-die communication. The fabric interconnect that links all 900,000 cores must provide enormous aggregate bandwidth while consuming minimal power and area. The WSE-3's on-chip fabric provides 214 Pbit/s of bandwidth, enabling data to flow between any two cores on the wafer with predictable, low latency.
The first-generation Wafer-Scale Engine was announced in August 2019. It featured 400,000 AI-optimized processing cores, 1.2 trillion transistors, and 18 gigabytes of on-chip SRAM. The CS-1 system, which housed the WSE-1, included twelve 100 Gigabit Ethernet connections for data transfer. At the time, it was by far the largest chip ever fabricated, with a die area of 46,225 square millimeters, roughly 56 times larger than the biggest GPU available. Argonne National Laboratory deployed the first CS-1 in 2019, becoming the inaugural national-lab customer for the technology [22].
In April 2021, Cerebras announced the second-generation Wafer-Scale Engine (WSE-2), manufactured using TSMC's 7nm process. The WSE-2 represented a major leap in specifications:
| Specification | WSE-1 | WSE-2 |
|---|---|---|
| Transistors | 1.2 trillion | 2.6 trillion |
| AI cores | 400,000 | 850,000 |
| On-chip SRAM | 18 GB | 40 GB |
| Memory bandwidth | 9.6 PB/s | 20 PB/s |
| Fabric bandwidth | 100 Pb/s | 220 Pb/s |
| Process node | 16nm | 7nm |
The CS-2 system built around the WSE-2 became the primary commercial product for Cerebras, used by research institutions and enterprises for AI training. Notably, the 40 GB of on-chip SRAM eliminated the need for external high-bandwidth memory (HBM), allowing the entire working set of many AI models to reside directly on the processor.
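As a rough illustration of what "residing on the processor" means in capacity terms, the sketch below estimates how many parameters fit in the on-chip SRAM of the WSE-1 and WSE-2 at a given numeric precision. It assumes decimal gigabytes and counts weights only (no activations, optimizer state, or attention caches), so practical limits are lower. The function name is hypothetical.

```python
def params_that_fit(sram_gb: float, bytes_per_param: int) -> float:
    """Upper bound on the parameter count that fits in SRAM, weights only."""
    sram_bytes = sram_gb * 1e9  # assumes decimal gigabytes
    return sram_bytes / bytes_per_param


# SRAM capacities from the specification table above.
for name, sram_gb in [("WSE-1", 18), ("WSE-2", 40)]:
    for label, width in [("16-bit", 2), ("32-bit", 4)]:
        billions = params_that_fit(sram_gb, width) / 1e9
        print(f"{name} at {label} weights: about {billions:.1f}B parameters")
```

Under these assumptions the WSE-2 holds on the order of 20 billion 16-bit weights, which is why larger models rely on external weight streaming.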
The third-generation Wafer-Scale Engine (WSE-3) was announced in March 2024 at a dedicated launch event and later presented in detail at the Hot Chips 2024 conference. Manufactured on TSMC's 5nm process, the WSE-3 is the most powerful AI chip ever built, containing approximately 4 trillion transistors across its 46,225 mm² die area. Detailed coverage of the chip lives in the sibling article on the Cerebras WSE-3.
| Specification | WSE-2 | WSE-3 |
|---|---|---|
| Transistors | 2.6 trillion | 4 trillion |
| AI cores (active) | 850,000 | 900,000 |
| Physical cores | ~900,000 | 970,000 |
| On-chip SRAM | 40 GB | 44 GB |
| Memory bandwidth | 20 PB/s | 21 PB/s |
| Fabric bandwidth | 220 Pb/s | 214 Pb/s |
| Peak AI performance | N/A | 125 PFLOPS |
| Process node | 7nm | 5nm |
| Manufacturer | TSMC | TSMC |
The WSE-3 delivers 125 petaflops of peak AI performance, which Cerebras claims is roughly double the performance of the WSE-2 at the same power consumption and cost. The memory bandwidth of 21 PB/s is approximately 7,000 times greater than the NVIDIA H100's off-chip HBM bandwidth, which is the fundamental advantage that enables Cerebras's inference speed records. Because the WSE stores model weights in on-chip SRAM distributed alongside the compute cores, data travels only fractions of a millimeter from memory to compute, rather than crossing package boundaries as it does in GPU architectures [3].
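The bandwidth ratio can be sanity-checked by working backwards. The sketch below uses only the two figures in the paragraph above (21 PB/s and roughly 7,000x) to recover the per-GPU HBM bandwidth that the comparison implies. The exact H100 figure depends on the product variant and is not stated in this article, so the result is an implied baseline rather than a datasheet value.

```python
WSE3_MEMORY_BW_PB_S = 21   # on-chip SRAM bandwidth quoted above
CLAIMED_RATIO = 7_000      # "approximately 7,000 times" from the text
TB_PER_PB = 1_000          # decimal units assumed


def bandwidth_ratio(wse_pb_s: float, gpu_tb_s: float) -> float:
    """Ratio of wafer SRAM bandwidth to a GPU's HBM bandwidth."""
    return wse_pb_s * TB_PER_PB / gpu_tb_s


# HBM bandwidth that the claimed ratio implies, in TB/s.
implied_hbm_tb_s = WSE3_MEMORY_BW_PB_S * TB_PER_PB / CLAIMED_RATIO

print(f"implied HBM baseline: {implied_hbm_tb_s:.1f} TB/s")
print(f"ratio at that baseline: "
      f"{bandwidth_ratio(WSE3_MEMORY_BW_PB_S, implied_hbm_tb_s):,.0f}x")
```

The claimed ratio corresponds to a baseline of about 3 TB/s per GPU. Passing a different baseline to `bandwidth_ratio` shows how sensitive the headline number is to the GPU variant chosen.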
The Cerebras CS-2 and CS-3 are complete computing systems designed to house the WSE-2 and WSE-3 chips, respectively. Each system integrates the wafer-scale processor with all necessary power delivery, cooling, and I/O connectivity in a single rack-mountable unit. The systems are designed to be straightforward to deploy, requiring only standard datacenter power and cooling.
The CS-3, powered by the WSE-3, delivers 125 petaflops of AI compute while consuming 23 kW of power per system. Key system-level specifications include:
| Specification | CS-3 |
|---|---|
| Processor | WSE-3 (4T transistors, 900K cores) |
| Peak AI compute | 125 PFLOPS |
| On-chip memory | 44 GB SRAM |
| System memory (with MemoryX) | Up to 1.2 PB |
| Power consumption | 23 kW |
| Cooling | Direct liquid cooling |
| Max systems in cluster | 2,048 (via SwarmX) |
| Max cluster compute | ~0.25 zettaFLOPS |
One of the key advantages of the CS systems is that they eliminate the need for complex multi-node distributed computing setups. A single CS-3, for instance, can train models that would otherwise require clusters of hundreds of GPUs, dramatically simplifying the software stack and reducing the engineering effort required to scale AI training.
Cerebras developed two complementary technologies to extend the CS systems beyond single-chip limitations:
MemoryX is an external memory system that extends the available memory for the WSE beyond the on-chip SRAM. With up to 1.2 petabytes of capacity, MemoryX enables the CS-3 to handle models with trillions of parameters by streaming weight data to the processor at high bandwidth. This is large enough to store models with 24 trillion parameters in a single logical memory space without partitioning or refactoring, enabling training of next-generation frontier models 10x larger than GPT-4 and Gemini [3].
SwarmX is a high-bandwidth fabric that connects multiple CS systems together. Up to 2,048 CS-3 systems can be linked via SwarmX to build hyperscale AI supercomputers delivering up to a quarter of a zettaFLOP. SwarmX maintains near-linear scaling efficiency, meaning that doubling the number of CS-3 systems nearly doubles aggregate performance.
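The headline MemoryX and SwarmX numbers follow from simple multiplication and division. The sketch below reproduces them from the figures quoted above, assuming decimal units and ideal (perfectly linear) scaling; the variable names are illustrative.

```python
CS3_PEAK_PFLOPS = 125   # peak AI compute per CS-3
MAX_SYSTEMS = 2_048     # SwarmX cluster limit
MEMORYX_PB = 1.2        # maximum MemoryX capacity
MAX_PARAMS = 24e12      # 24 trillion parameters

# 1 zettaFLOPS = 1,000 exaFLOPS = 1,000,000 petaFLOPS.
cluster_pflops = CS3_PEAK_PFLOPS * MAX_SYSTEMS
cluster_zflops = cluster_pflops / 1e6

# MemoryX capacity available per parameter (decimal petabytes assumed).
bytes_per_param = MEMORYX_PB * 1e15 / MAX_PARAMS

print(f"peak cluster compute: {cluster_zflops:.3f} zettaFLOPS")
print(f"MemoryX capacity per parameter: {bytes_per_param:.0f} bytes")
```

A full cluster peaks at about 0.256 zettaFLOPS, matching the "quarter of a zettaFLOP" figure, and 1.2 PB spread over 24 trillion parameters leaves roughly 50 bytes per parameter, more than 16-bit weights alone would need.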
Cerebras Inference launched in August 2024 as a cloud-based inference service that leverages the unique architecture of the WSE to deliver what the company describes as the fastest LLM inference available. Because the WSE keeps the entire model on-chip (or streams it efficiently via MemoryX), it avoids the memory bandwidth bottleneck that limits GPU-based inference. At launch Cerebras claimed inference 10x to 20x faster than systems built using NVIDIA H100 Hopper GPUs, and the service has set successive speed records on each major open-source model since [23].
The CS-3 has achieved remarkable inference speeds across a range of models, consistently setting records for tokens-per-second output:
| Model | Tokens/second | Notes |
|---|---|---|
| Llama 3.1 8B | 1,800 | Single-user latency |
| Llama 3.1 70B | 2,100 | Per-user, roughly 8x faster than H200 |
| Llama 3.1 405B | 969 | Largest open-source model at launch |
| Llama 4 Scout | 2,600+ | 19x faster than fastest GPU at the time |
| Llama 4 Maverick | 2,500+ | Beat NVIDIA Blackwell on independent benchmarks |
| gpt-oss-120B | 2,700+ | Via Core42 partnership |
| K2 Think | 2,000 | Reasoning-optimized model |
In head-to-head comparisons, Cerebras claims the CS-3 achieves over 21x faster inference than NVIDIA's flagship Blackwell B200 GPU on the Llama 3 70B model in a reasoning scenario with 1,024 input tokens and 4,096 output tokens. On the gpt-oss-120B model, the CS-3 delivered 2,700+ tokens/second compared to approximately 900 tokens/second on Blackwell B200 [4]. Independent benchmarks from third-party site Artificial Analysis have repeatedly placed Cerebras at the top of token-per-second rankings, with inference speeds up to 75 times faster than the same models served by hyperscalers such as Amazon, Microsoft, and Google [24].
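To make the throughput gap concrete, the sketch below converts tokens per second into wall-clock generation time. It borrows the 4,096-token output length from the reasoning scenario and the gpt-oss-120B rates quoted above, combining them purely for illustration. It assumes a steady decode rate and ignores prompt processing and network latency; the helper name is hypothetical.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream output_tokens at a steady decode rate (prefill ignored)."""
    return output_tokens / tokens_per_second


OUTPUT_TOKENS = 4_096  # output length from the reasoning scenario above

# Approximate gpt-oss-120B decode rates quoted in the text.
rates = {"CS-3": 2_700, "Blackwell B200": 900}

times = {name: generation_seconds(OUTPUT_TOKENS, rate)
         for name, rate in rates.items()}
speedup = rates["CS-3"] / rates["Blackwell B200"]

for name, seconds in times.items():
    print(f"{name}: {seconds:.2f} s for {OUTPUT_TOKENS:,} tokens")
print(f"speedup: {speedup:.1f}x")
```

At these rates a long reasoning trace takes about 1.5 seconds instead of about 4.6 seconds, a difference that compounds in agentic workflows that chain many model calls.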
The service supports popular open-source models including Llama 3.1 (8B, 70B, and 405B variants), Llama 4 Scout, Llama 4 Maverick, Mistral models, DeepSeek variants, and the gpt-oss family. In a flagship partnership with Mistral, Cerebras powered the Le Chat AI assistant at over 1,100 tokens per second, roughly ten times faster than ChatGPT at the time. Cerebras and Core42 (a subsidiary of G42) also launched global access to OpenAI's gpt-oss-120B model, serving it at approximately 3,000 tokens per second [9].
The speed advantage of Cerebras Inference has made it particularly attractive for real-time applications, agentic AI workflows, and scenarios where low latency directly impacts user experience. Cerebras planned to increase its inference capacity from 2 million to over 40 million tokens per second by Q4 2025, distributed across eight data centers spanning North America and Europe.
In April 2025, Meta announced that Cerebras would power its new Llama API, the first cloud inference service offered directly by Meta for the Llama family of models. Developers selecting the Cerebras backend in the Llama API see generation speeds up to 18 times faster than traditional GPU-based offerings, with Llama 4 Scout running at 2,600 tokens per second compared to roughly 130 tokens per second for ChatGPT and 25 tokens per second for DeepSeek on equivalent endpoints [25]. The partnership represented a meaningful endorsement from Meta, the originator of the Llama model family, and channeled significant developer traffic to Cerebras hardware through Meta's own API surface.
Throughout 2025 and 2026 Cerebras expanded its roster of paying enterprise customers across multiple industries:
| Customer | Use case | Notes |
|---|---|---|
| Meta | Llama API backend | Powers Meta's official Llama inference cloud |
| Mistral | Le Chat assistant | 1,100+ tokens/s, claimed consumer chat speed record |
| Notion | Enterprise search | Sub-300 ms results across 100M+ users |
| AlphaSense | Market intelligence | 10x faster insights via Cerebras Inference |
| Cognition | Agentic coding | Powers Devin-style autonomous coding workflows |
| IBM | Enterprise AI | Integrates Cerebras into broader AI offerings |
| Mayo Clinic | Genomic foundation model | Rheumatoid arthritis treatment prediction |
| GlaxoSmithKline | Drug discovery | Protein and epigenomic transformer models |
| AstraZeneca | Drug discovery | Large biomedical training jobs |
| U.S. Department of Defense | Classified workloads | Disclosed in S-1 customer list |
This breadth of customers, from frontier labs to regulated pharmaceutical firms to U.S. government agencies, became a central pillar of Cerebras's IPO marketing in 2026 [17][26].
Cerebras has contributed several open-source models and research efforts to the AI community:
One of Cerebras's most significant strategic partnerships has been with G42, the Abu Dhabi-based AI technology holding company. The two companies announced a multi-billion-dollar deal in July 2023 to co-build a constellation of AI supercomputers and share infrastructure for training and inference. Together, they have built the Condor Galaxy series of AI supercomputers:
| System | Location | Compute | AI Cores | WSE Generation |
|---|---|---|---|---|
| CG-1 | Santa Clara, CA | 4 exaFLOPS | 54 million | WSE-2 |
| CG-2 | Undisclosed (US) | 4 exaFLOPS | 54 million | WSE-2 |
| CG-3 | Dallas, TX | 8 exaFLOPS | 58 million | WSE-3 |
| CG India | India | 8 exaFLOPS | TBD | WSE-3 |
| Full constellation (planned) | Global | 36 exaFLOPS | - | Mixed |
The full Condor Galaxy constellation of nine interconnected AI supercomputers is designed to deliver 36 exaFLOPS of AI compute by the end of 2026, making it one of the largest collections of interconnected AI supercomputers in the world.
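The table's core counts and compute figures can be cross-checked against the per-chip specifications given earlier. The sketch below estimates how many CS systems each Condor Galaxy site implies, using the active core counts (850,000 for WSE-2, 900,000 for WSE-3) and the CS-3's 125 PFLOPS peak. The system counts are derived here and are not stated in the article.

```python
CORES_PER_SYSTEM = {"WSE-2": 850_000, "WSE-3": 900_000}
CS3_PEAK_PFLOPS = 125

# (name, exaFLOPS, AI cores, WSE generation) taken from the table above.
condor_galaxy = [
    ("CG-1", 4, 54e6, "WSE-2"),
    ("CG-2", 4, 54e6, "WSE-2"),
    ("CG-3", 8, 58e6, "WSE-3"),
]


def implied_systems(ai_cores: float, generation: str) -> float:
    """Number of CS systems implied by a site's total AI core count."""
    return ai_cores / CORES_PER_SYSTEM[generation]


for name, exaflops, cores, generation in condor_galaxy:
    print(f"{name}: about {implied_systems(cores, generation):.1f} systems")

# Independent check for CG-3 from its compute figure (1 exaFLOPS = 1,000 PFLOPS).
cg3_systems_from_compute = 8 * 1_000 / CS3_PEAK_PFLOPS
print(f"CG-3 from compute: {cg3_systems_from_compute:.0f} systems")
```

Both routes land near 64 systems per site, which suggests the rounded core counts and exaFLOPS figures in the table are mutually consistent.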
The G42 partnership provided Cerebras with both a major customer and a development platform for demonstrating the capabilities of its wafer-scale technology at datacenter scale. In 2024, G42 accounted for roughly 85% of Cerebras's $290 million in revenue, a concentration that drew scrutiny from U.S. regulators during the IPO process. By the May 2026 amended S-1 filing, G42's direct share had fallen to 24% of 2025 revenue, with the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), a related party of G42, accounting for an additional 62%, leaving total UAE-linked revenue at roughly 86% of 2025 sales [17][26].
The relationship with G42 created significant complications for Cerebras's IPO plans because of U.S. regulatory scrutiny of G42's prior business ties to China, including investments related to Huawei. After Cerebras filed its first S-1 in September 2024, the Committee on Foreign Investment in the United States (CFIUS) opened a review of G42's minority stake in Cerebras, focused on the risk that advanced AI chip technology could be transferred to China through UAE intermediaries. Cerebras paused its IPO process until the review concluded.
G42 divested its Chinese investments and signed a parallel partnership with Microsoft during 2024, both of which were designed to address national security concerns. CFIUS cleared the restructured arrangement on March 31, 2025, after G42 agreed to convert its Cerebras stake into non-voting shares, removing the company from Cerebras's voting investor base. This regulatory clearance unlocked Cerebras's ability to refile for an IPO and continue selling chips to G42 and its subsidiaries [16][28].
In March 2026, Amazon Web Services signed a multiyear deal with Cerebras to make the WSE-3 wafer-scale chip available to cloud customers through Amazon Bedrock. AWS became the first major cloud provider to offer Cerebras's disaggregated inference solution, with plans to launch Cerebras hardware on Amazon Bedrock in the coming months and add open-source LLM support later in 2026. This partnership represented a significant validation of Cerebras's technology by one of the world's largest cloud providers and is described in the S-1 as a binding term sheet for ongoing capacity commitments [5][17].
In January 2026, Cerebras agreed to provide 750 megawatts of computing power to OpenAI through 2028. The transaction was structured as a Master Relationship Agreement (MRA) valued at more than $20 billion, roughly twice the $10 billion headline figure that initially circulated in press coverage. The MRA includes explicit expansion provisions that allow OpenAI to scale the commitment up to 2 gigawatts of capacity by 2030, which would make it among the largest single-customer AI infrastructure deals ever signed.
The deployment will roll out in multiple tranches starting in 2026 and make Cerebras the largest dedicated low-latency AI inference provider to OpenAI, supplementing OpenAI's existing capacity from NVIDIA GPUs and custom chips. Cerebras CEO Andrew Feldman described the deal as a turning point that validated wafer-scale silicon for production-scale frontier AI deployment, and analysts treated it as the centerpiece of the 2026 IPO narrative [29][30].
Cerebras has had an active relationship with the U.S. Department of Energy (DOE) since 2019. Argonne National Laboratory deployed the first CS-1 system that year, and Lawrence Livermore National Laboratory followed shortly after. In 2022, a collaboration between Cerebras and Argonne National Laboratory used WSE-based models to study SARS-CoV-2 genomic dynamics, work that won a Gordon Bell Special Prize.
In December 2025, Cerebras signed a memorandum of understanding (MOU) with DOE to accelerate the White House Genesis Mission, a national AI initiative to apply AI to scientific discovery and national security. The Pittsburgh Supercomputing Center and the National Energy Technology Laboratory have also operated CS systems for HPC and AI workloads [31][32].
Cerebras shifted from selling individual CS systems to operating its own inference clusters during 2024 and 2025, building out a North American and European footprint to support the Cerebras Inference cloud. Most facilities are operated jointly with regional data center partners.
| Location | Partner | Status (May 2026) |
|---|---|---|
| Santa Clara, California | Cerebras | Operational (CG-1, inference) |
| Stockton, California | Nautilus Data Technologies | Operational (floating barge facility) |
| Dallas, Texas | G42 / Crusoe | Operational (CG-3) |
| Minneapolis, Minnesota | Undisclosed | Operational |
| Montreal, Canada | Bit Digital | Operational |
| Oklahoma City, Oklahoma | Scale Datacenters | Operational (10 MW) |
| Undisclosed Europe site | Undisclosed | Coming online 2026 |
| Condor Galaxy India | G42 | Announced May 2026 |
The Oklahoma City facility opened in late 2025 as a 10 MW site providing more than 44 exaFLOPS of AI compute. The Stockton barge facility, hosted at Nautilus's water-cooled platform on the San Joaquin River, became Cerebras's first west coast inference cluster outside Santa Clara. The expansion was specifically designed to push aggregate token-generation capacity past 40 million tokens per second by the end of 2025 [33].
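The Oklahoma City figures can be checked against the CS-3 specifications quoted earlier. The sketch below assumes the site is populated with CS-3 systems at 125 PFLOPS and 23 kW each, which the article does not state explicitly, and estimates how much of the 10 MW envelope the compute itself would draw.

```python
SITE_POWER_MW = 10
SITE_COMPUTE_EXAFLOPS = 44   # "more than 44 exaFLOPS" from the text
CS3_POWER_KW = 23            # per-system power quoted earlier
CS3_PEAK_PFLOPS = 125        # per-system peak compute quoted earlier

# Systems needed to reach the quoted compute (1 exaFLOPS = 1,000 PFLOPS).
systems_for_compute = SITE_COMPUTE_EXAFLOPS * 1_000 / CS3_PEAK_PFLOPS

# Power those systems would draw, and what remains of the site budget.
power_for_systems_mw = systems_for_compute * CS3_POWER_KW / 1_000
headroom_mw = SITE_POWER_MW - power_for_systems_mw

print(f"systems implied: {systems_for_compute:.0f}")
print(f"system power: {power_for_systems_mw:.2f} MW")
print(f"headroom for cooling and other load: {headroom_mw:.2f} MW")
```

Under these assumptions about 352 systems would draw roughly 8.1 MW, leaving close to 2 MW of the site budget for cooling, networking, and other facility load.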
The architectural differences between Cerebras's wafer-scale approach and traditional GPU clusters lead to fundamentally different performance profiles:
| Metric | Cerebras CS-3 | NVIDIA DGX B200 (8x B200) | Advantage |
|---|---|---|---|
| Inference speed (Llama 70B) | ~2,100 tok/s per user | ~250 tok/s per user | CS-3 ~8x faster |
| Memory bandwidth | 21 PB/s (on-chip SRAM) | ~64 TB/s (HBM, 8 GPUs combined) | CS-3 ~300x higher |
| Programming model | Single device | Distributed (tensor/pipeline parallelism) | CS-3 simpler |
| Power consumption | 23 kW (system) | ~10 kW (8 GPUs + system) | GPU system draws less power |
| Training flexibility | Optimized via MemoryX/SwarmX | Industry-standard frameworks | GPUs more flexible |
| Software ecosystem | Cerebras SDK | CUDA/PyTorch/TensorFlow | GPUs much broader |
The CS-3's primary advantage is inference latency, driven by the massive on-chip memory bandwidth that eliminates the memory wall problem. For training, GPU clusters retain advantages in software ecosystem maturity and flexibility, though Cerebras has made significant progress in training support through its compiler and MemoryX/SwarmX infrastructure.
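The ratios in the comparison table can be recomputed directly from its cells. The sketch below does so for the three numeric rows, with bandwidth converted to a common unit (TB/s, decimal). The dictionary keys are illustrative names, and the inputs are the approximate values from the table rather than measured data.

```python
# Approximate values from the comparison table above.
cs3 = {"tok_s_per_user": 2_100, "mem_bw_tb_s": 21_000, "power_kw": 23}
dgx_b200 = {"tok_s_per_user": 250, "mem_bw_tb_s": 64, "power_kw": 10}

# CS-3 value divided by DGX B200 value for each metric.
ratios = {metric: cs3[metric] / dgx_b200[metric] for metric in cs3}

for metric, ratio in ratios.items():
    print(f"{metric}: CS-3 / DGX B200 = {ratio:.1f}x")
```

The per-user speed ratio is about 8.4x and the bandwidth ratio about 328x, consistent with the rounded "~8x" and "~300x" entries, while the CS-3 draws about 2.3 times the power of the eight-GPU system.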
Cerebras has raised substantial capital through multiple funding rounds, including a final pre-IPO Series H in February 2026 that took the valuation to $23 billion:
| Round | Date | Amount | Post-money valuation | Lead investors |
|---|---|---|---|---|
| Series A | May 2016 | $27M | n/a | Benchmark, Foundation Capital, Eclipse Ventures |
| Series B | 2017 | $60M | n/a | Benchmark Capital |
| Series C | 2018 | $112M | n/a | Benchmark Capital |
| Series D | 2019 | $272M | $1.6B (unicorn) | Koch Disruptive Technologies |
| Series E | 2021 | $250M | n/a | Alpha Wave Ventures |
| Series F | Nov 2021 | $720M | $4B | Alpha Wave Ventures, Abu Dhabi Growth Fund |
| Series G | Sept 2025 | $1.1B | $8.1B | Fidelity, Atreides Management |
| Series H | Feb 2026 | $1.0B | $23B | Undisclosed crossover investors |
| IPO | May 2026 | $5.55B | ~$48B at listing | Public markets |
Total private funding before the IPO exceeded $4 billion. The Series H round, completed shortly after the OpenAI MRA was signed, served as the final crossover round before Cerebras refiled its S-1 in April 2026 [14][26][34].
Cerebras first filed for an IPO on the Nasdaq in September 2024 under the reserved ticker symbol CBRS. The original filing faced delays after CFIUS opened its review of the G42 stake, which forced Cerebras to withdraw the prospectus and pause the listing process. After restructuring its investor base and completing the Series G round, Cerebras refiled its S-1 on April 17, 2026, and amended it on May 4, 2026, with concrete pricing terms [15][35].
The IPO priced on May 13, 2026 at $185 per share, more than double the original target range, selling 30 million shares for total proceeds of $5.55 billion. Trading opened the following day at $350 and closed at roughly $311, up 68%, valuing Cerebras at around $95 billion on a fully diluted basis. The offering was the largest global IPO of 2026 at the time of pricing and one of the largest pure-play semiconductor listings in history [18][36].
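The pricing figures above are linked by straightforward arithmetic. The sketch below recomputes the proceeds and the first-day gain, and derives the fully diluted share count implied by the roughly $95 billion valuation. That share count is an inference from the rounded figures here, not a number from the filing.

```python
IPO_PRICE = 185.0            # dollars per share
SHARES_SOLD = 30_000_000
FIRST_DAY_CLOSE = 311.0      # approximate closing price
DILUTED_VALUE = 95e9         # approximate fully diluted valuation at the close

proceeds = IPO_PRICE * SHARES_SOLD
first_day_gain = FIRST_DAY_CLOSE / IPO_PRICE - 1

# Fully diluted share count implied by the valuation and closing price.
implied_diluted_shares = DILUTED_VALUE / FIRST_DAY_CLOSE

print(f"gross proceeds: ${proceeds / 1e9:.2f}B")
print(f"first-day gain: {first_day_gain:.0%}")
print(f"implied fully diluted shares: {implied_diluted_shares / 1e6:.0f}M")
```

The proceeds and the 68% gain match the text, and the valuation implies on the order of 305 million fully diluted shares.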
The S-1 filings disclosed rapid revenue growth alongside continued operating losses:
| Year | Revenue | GAAP net income (loss) | Notes |
|---|---|---|---|
| 2022 | $24.6M | n/a | Early commercial period |
| 2023 | $78.7M | n/a | First full year of G42 ramp |
| 2024 | $290.3M | $(481.6M) | Loss reflects share-based comp |
| 2025 | $510.0M | $237.8M | Operating loss of $145.9M |
In the 2025 fiscal year, Cerebras reported $510 million in revenue, up 76% from 2024, and a GAAP net income of $237.8 million, though the company recorded a GAAP operating loss of $145.9 million and a non-GAAP net loss of roughly $75.7 million. Customer concentration remained the dominant risk factor in the S-1, with UAE-linked entities (G42 and MBZUAI combined) accounting for roughly 86% of 2025 revenue [17][26].
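The growth and concentration percentages can be recomputed from the revenue table. The sketch below derives year-over-year growth for each reported year and sums the two UAE-linked revenue shares quoted in the S-1 discussion; the function name is hypothetical.

```python
# Annual revenue in millions of US dollars, from the table above.
revenue_musd = {2022: 24.6, 2023: 78.7, 2024: 290.3, 2025: 510.0}


def yoy_growth(series: dict) -> dict:
    """Year-over-year growth rate for each year after the first."""
    years = sorted(series)
    return {year: series[year] / series[prev] - 1
            for prev, year in zip(years, years[1:])}


growth = yoy_growth(revenue_musd)
for year, rate in growth.items():
    print(f"{year}: {rate:+.0%} vs prior year")

# Combined share of 2025 revenue from G42 (24%) and MBZUAI (62%).
uae_linked_share = 0.24 + 0.62
print(f"UAE-linked share of 2025 revenue: {uae_linked_share:.0%}")
```

The 2025 growth rate comes out at about 76%, as stated, after growth of roughly 220% and 269% in the two preceding years.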
Cerebras competes in the AI accelerator market against several established and emerging players:
| Competitor | Approach | Key products |
|---|---|---|
| NVIDIA | GPU-based accelerators | H100, H200, B200, B300 |
| AMD | GPU-based accelerators | MI300X, MI325X, MI350 |
| Google | Custom ASICs | TPU v6 (Trillium), TPU v7 (Ironwood) |
| Groq | LPU inference accelerators | GroqChip1, LPU v2 |
| SambaNova | Reconfigurable dataflow units (RDUs) | SN40L, SN50 |
| Intel | Gaudi accelerators (discontinued) | Gaudi 3 |
| Amazon | Custom ASICs | Trainium2, Trainium3 |
Cerebras differentiates itself through the sheer scale of its wafer-scale approach, the simplicity of programming a single massive chip versus a distributed cluster, and its focus on both training and inference performance. The company's inference speed records have been a particularly effective marketing tool, demonstrating tangible performance advantages over GPU-based solutions.
The pure-play inference startups all chase a similar value proposition (orders-of-magnitude lower latency than NVIDIA GPUs on open-source LLMs), but they take very different architectural paths:
| Vendor | Architecture | Llama 4 Maverick (tok/s) | Notable model |
|---|---|---|---|
| Cerebras | Wafer-scale SRAM | 2,500+ | gpt-oss-120B at 2,700+ tok/s |
| SambaNova | Reconfigurable Dataflow Units | 794 | DeepSeek-V3 at 198 tok/s |
| Groq | LPU streaming architecture | 549 | Llama 70B steady at ~275 tok/s |
| NVIDIA Blackwell | GPU clusters | ~290 (cloud) | General-purpose flexibility |
Cerebras has consistently topped Artificial Analysis token-per-second leaderboards for the largest open-source models, while Groq tends to win on consistency across context lengths and SambaNova on cost-per-token in dense reasoning workloads. NVIDIA continues to dominate training and the broader ecosystem, while Cerebras and its peers compete primarily on inference performance and total cost of ownership [37][38].
Cerebras's wafer-scale approach offers several distinct advantages:
Limitations include:
As of May 2026, Cerebras is in a strong position. The company completed the largest global IPO of 2026 on May 14, 2026, raising $5.55 billion at a $185 price and trading up roughly 68% on the first day to imply a fully diluted market value around $95 billion. Cerebras has secured anchor partnerships with AWS and OpenAI, continued to set inference speed records on every major open-source model, and is broadening its data center footprint to handle Meta's Llama API traffic and the OpenAI Master Relationship Agreement.
The Condor Galaxy constellation continues to grow, with CG-3 fully operational in Dallas and the Condor Galaxy India site announced in May 2026. The company is on track to deliver more than 40 million tokens per second of aggregate inference capacity by the end of 2026 and is rolling out additional facilities in Oklahoma City, Minneapolis, Montreal, and undisclosed European sites. Cerebras remains the most prominent pure-play challenger to NVIDIA in AI inference and one of only a handful of independent companies capable of supplying frontier-scale AI compute to hyperscalers and frontier labs at the gigawatt level.