SambaNova SN50
Last reviewed: May 31, 2026
Sources: 12 citations
Review status: Source-backed
Revision: v3 · 1,879 words
The SambaNova SN50 is a reconfigurable dataflow accelerator that SambaNova Systems announced on February 24, 2026 as the successor to its SN40L chip. The company positions the part for large-model and agentic inference, and it announced the SN50 alongside a partnership with Intel and a funding round of more than $350 million. SambaNova calls the SN50 a Reconfigurable Dataflow Unit, or RDU, the same architecture family it has used since its first generation, and says the chip will start shipping to customers in the second half of 2026. [1][2][3]
Much of what is public about the SN50 comes from SambaNova's own announcement and from trade press that covered the launch. The headline performance numbers are vendor claims rather than independently benchmarked results, so this article attributes specifications to their sources and flags what remains unverified.
SambaNova Systems was founded in 2017 by Rodrigo Liang, who is chief executive, along with Stanford computer scientists Kunle Olukotun and Christopher Ré. The company sells AI systems built around its own silicon rather than around general-purpose GPUs, and it has raised well over $1 billion to date from investors including SoftBank Vision Fund 2, GV, BlackRock, and Intel Capital. Its core idea is a Reconfigurable Dataflow Architecture, which the company shortens to RDA. The processor at the center of that architecture is the Reconfigurable Dataflow Unit. [4][5][8]
A dataflow approach differs from how a conventional GPU runs work. A GPU executes a long stream of kernels, typically writing intermediate results back to memory between steps. An RDU instead maps a whole model, or large parts of one, onto the chip as a spatial dataflow graph, so that the output of one operation can stream directly into the next without a round trip through external memory. SambaNova's compiler stack, SambaFlow, takes a model defined in a framework like PyTorch and lays it out across the reconfigurable fabric. The pitch is that keeping data on chip cuts the memory traffic that often limits inference, which in turn helps with latency and energy use. [4][5]
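The contrast between kernel-by-kernel execution and streaming dataflow can be illustrated with a toy analogy in plain Python. This is only a sketch of the idea, not SambaFlow code and not how an RDU is actually programmed. The function names and the three example operations are hypothetical, and "intermediate buffers" here stands in loosely for round trips through external memory.

```python
from typing import Callable, Iterable, List, Tuple

Op = Callable[[float], float]


def run_kernel_by_kernel(ops: List[Op], data: Iterable[float]) -> Tuple[List[float], int]:
    """Apply each op to the whole input before starting the next op.

    Every op except the last leaves behind a full intermediate buffer,
    loosely analogous to writing results back to memory between kernels.
    """
    current = list(data)
    intermediates = 0
    for i, op in enumerate(ops):
        current = [op(x) for x in current]  # full buffer materialized here
        if i < len(ops) - 1:
            intermediates += 1
    return current, intermediates


def run_streaming(ops: List[Op], data: Iterable[float]) -> Tuple[List[float], int]:
    """Chain the ops lazily so each element flows through all of them in turn.

    No full intermediate buffer is built; only the final output is stored.
    """
    stream = iter(data)
    for op in ops:
        stream = map(op, stream)  # lazy: nothing is computed yet
    return list(stream), 0


# Hypothetical three-stage pipeline: scale, bias, ReLU.
ops = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: max(x, 0.0)]
data = [-2.0, 0.5, 3.0]

print(run_kernel_by_kernel(ops, data))  # ([0.0, 2.0, 7.0], 2)
print(run_streaming(ops, data))         # ([0.0, 2.0, 7.0], 0)
```

Both functions return the same values. The difference is only in how many full intermediate buffers are created along the way, which is the cost the dataflow pitch is aimed at.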
The other piece SambaNova emphasizes is a tiered memory system. Rather than relying on a single pool of high-bandwidth memory the way many GPUs do, its RDUs combine a small, fast tier of on-chip SRAM, a middle tier of HBM, and a large tier of DDR. The large DDR tier is meant to hold the weights of very big models, or the cached state of many models at once, so that a single node can serve workloads that would otherwise need many accelerators. [1][5]
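A rough way to picture a tiered memory system is as a placement problem: the hottest data goes into the fastest tier that still has room, and everything else spills to a larger, slower tier. The sketch below uses the SN50 capacities reported in launch coverage (432 MB of SRAM, 64 GB of HBM, up to 2 TB of DDR5), but the greedy policy, the item names, and the item sizes are illustrative assumptions. SambaNova has not published its actual placement logic in the sources cited here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Tier:
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


def place(items: List[Tuple[str, float]], tiers: List[Tier]) -> Dict[str, Optional[str]]:
    """Greedy placement (an assumed policy, not SambaNova's).

    `items` are (name, size_gb) pairs ordered hottest first.
    `tiers` are ordered fastest first. Each item lands in the first
    tier with enough free space, or None if it fits nowhere.
    """
    placement: Dict[str, Optional[str]] = {}
    for name, size_gb in items:
        for tier in tiers:
            if size_gb <= tier.free_gb():
                tier.used_gb += size_gb
                placement[name] = tier.name
                break
        else:
            placement[name] = None  # no tier had room
    return placement


# Reported SN50 tier sizes; 2 TB is taken as 2000 GB (decimal, an assumption).
tiers = [Tier("SRAM", 0.432), Tier("HBM", 64.0), Tier("DDR5", 2000.0)]

# Hypothetical working set, hottest first.
items = [
    ("hot_activations", 0.25),
    ("active_model_weights", 40.0),
    ("kv_cache", 20.0),
    ("standby_model_a", 70.0),
    ("standby_model_b", 140.0),
]

for name, tier_name in place(items, tiers).items():
    print(f"{name:>22} -> {tier_name}")
```

In this made-up working set the active model and its cache stay in HBM while the two standby models sit in DDR5, which mirrors the article's point about keeping many models resident on one node.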
The SN50's direct predecessor is the SN40L, which SambaNova introduced in September 2023. The SN40L is built on TSMC's 5 nanometer process, packs about 102 billion transistors across two compute chiplets, and reaches roughly 638 BF16 TFLOPS per socket. It brought the three-tier memory design to market with 520 MB of on-chip SRAM, 64 GB of HBM, and up to 1.5 TB of DDR, and SambaNova used that memory to keep many model weights resident close to the processor. The company marketed the SN40L around running very large models and hosting many expert models on a single system, a setup it branded as a Composition of Experts. The SN50 keeps the same dataflow lineage and the same tiered-memory philosophy, and SambaNova frames it as a generational step up in compute, memory bandwidth, and interconnect rather than a change of architecture. [4][10]
According to SambaNova and the outlets that covered the launch, the SN50 is built on TSMC's 3 nanometer process and uses a dual-chiplet design. Reporting puts its FP8 compute at about 3.2 PFLOPS. The memory system follows the familiar three tiers: 64 GB of HBM, 432 MB of on-chip SRAM, and a DDR5 tier that scales from 256 GB up to 2 TB. [1][2][6][7]
On networking, SambaNova says the SN50 offers about four times the network bandwidth of the previous generation, and that up to 256 accelerators can be linked over a multi-terabyte-per-second interconnect. The company describes a system product called SambaRack that integrates 16 SN50 accelerators and draws about 20 kW, a figure it says is low enough for air-cooled data centers rather than requiring liquid cooling. [2][6][7]
SambaNova frames the SN50 around scale. It says the three-tier memory architecture is designed to support models with more than 10 trillion parameters and context lengths beyond 10 million tokens, which the company ties to longer reasoning chains and more autonomous agent workflows. These figures describe an addressable ceiling for the memory design rather than a model SambaNova ships, so they are best read as design targets. [3][6][8]
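To give a sense of scale for the 10-trillion-parameter figure, the arithmetic below estimates how much memory the weights alone would occupy and how many accelerators' worth of DDR5 that implies. It is a back-of-the-envelope sketch under stated assumptions: weights only (no KV cache or activations), decimal terabytes, weights held entirely in the DDR5 tier, and the reported maximum of 2 TB of DDR5 per accelerator. None of these assumptions come from SambaNova, and the helper names are hypothetical.

```python
import math


def weight_footprint_tb(params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal terabytes."""
    return params * bytes_per_param / 1e12


def min_accelerators(footprint_tb: float, ddr_tb_per_accel: float) -> int:
    """Smallest accelerator count whose combined DDR5 holds the weights."""
    return math.ceil(footprint_tb / ddr_tb_per_accel)


PARAMS = 10_000_000_000_000  # 10 trillion, the design ceiling quoted above
DDR5_MAX_TB = 2.0            # reported upper end of the DDR5 tier

for label, nbytes in [("FP8", 1), ("BF16", 2)]:
    tb = weight_footprint_tb(PARAMS, nbytes)
    n = min_accelerators(tb, DDR5_MAX_TB)
    print(f"{label}: {tb:.0f} TB of weights, at least {n} accelerators")
# FP8: 10 TB of weights, at least 5 accelerators
# BF16: 20 TB of weights, at least 10 accelerators
```

The result says only that the capacity arithmetic is plausible on a handful of fully populated accelerators. It says nothing about the throughput such a configuration would achieve.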
The table below collects the specifications as reported. Every figure is a vendor claim or a number drawn from launch coverage, not an independent measurement.
| Attribute | Reported value | Source |
|---|---|---|
| Vendor | SambaNova Systems | [1] |
| Announcement | February 24, 2026 | [3][8] |
| Process | TSMC 3 nm (N3) | [6][7] |
| Packaging | Dual-chiplet | [6][7] |
| FP8 compute | About 3.2 PFLOPS | [6][7] |
| HBM | 64 GB | [6][7] |
| On-chip SRAM | 432 MB | [6][7] |
| DDR5 | 256 GB to 2 TB | [6][7] |
| Interconnect | Multi-TB/s, up to 256 accelerators linked | [6][7] |
| Network bandwidth | About 4x the prior generation | [6] |
| SambaRack | 16 accelerators, about 20 kW, air-cooled | [6][7] |
| Design ceiling | More than 10T parameters, more than 10M token context | [3][6] |
| Shipping | Second half of 2026 | [1][2] |
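The per-accelerator figures in the table can be multiplied out to rack level as a sanity check. The sketch below assumes simple linear scaling across the 16 accelerators in a SambaRack, treats the 20 kW figure as the draw of the whole rack, and takes 2 TB as 2000 GB. These are assumptions for illustration, and the totals are derived arithmetic rather than numbers SambaNova has published.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sn50Reported:
    """Per-accelerator values as reported in launch coverage."""
    fp8_pflops: float = 3.2
    hbm_gb: int = 64
    sram_mb: int = 432
    ddr5_min_gb: int = 256
    ddr5_max_gb: int = 2000  # 2 TB, decimal units assumed


ACCELS_PER_RACK = 16   # SambaRack size
RACK_POWER_KW = 20.0   # assumed to be whole-rack draw
MAX_LINKED = 256       # largest linked configuration reported


def rack_totals(spec: Sn50Reported, n: int = ACCELS_PER_RACK) -> dict:
    """Multiply per-accelerator figures by n (assumes linear scaling)."""
    return {
        "fp8_pflops": spec.fp8_pflops * n,
        "hbm_gb": spec.hbm_gb * n,
        "sram_mb": spec.sram_mb * n,
        "ddr5_min_gb": spec.ddr5_min_gb * n,
        "ddr5_max_gb": spec.ddr5_max_gb * n,
    }


totals = rack_totals(Sn50Reported())
racks_for_max = math.ceil(MAX_LINKED / ACCELS_PER_RACK)

print(totals)
print("kW per accelerator slot:", RACK_POWER_KW / ACCELS_PER_RACK)  # 1.25
print("racks for 256 accelerators:", racks_for_max)                 # 16
print("kW for 256 accelerators:", racks_for_max * RACK_POWER_KW)    # 320.0
```

On these assumptions one rack comes to about 51.2 PFLOPS of FP8 compute, 1,024 GB of HBM, and between roughly 4 TB and 32 TB of DDR5. The per-slot power figure includes whatever host and networking hardware shares the rack, so it is not a chip TDP.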
SambaNova has aimed the SN50 squarely at inference for large models and for agentic systems, meaning workloads where a model plans, calls tools, and runs many steps in a loop rather than answering a single prompt. The argument the company makes is that agentic work multiplies the amount of context a system has to hold and the number of model calls it has to serve, which strains both memory capacity and interconnect. The large DDR tier is meant to keep more weights and more cached state on hand, and the wider interconnect is meant to let a cluster behave more like a single large machine. [1][3]
The launch also tied the SN50 to real deployments and partners. SambaNova named SoftBank as the first customer to deploy the SN50, in next-generation AI data centers in Japan, and it announced a collaboration with Intel to pair the accelerator with Intel Xeon CPUs in a heterogeneous inference platform, where different parts of a workload run on the hardware best suited to them. In the design SambaNova later described, GPUs handle the prefill stage, Intel Xeon 6 CPUs run agent actions and host the system, and SN50 RDUs handle token generation in the decode stage. The accompanying Series E raise of more than $350 million was led by Vista Equity Partners and Cambium Capital with participation from Intel Capital, and SambaNova said the money would go toward scaling manufacturing and its cloud. [2][3][8][9]
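The division of labor in that heterogeneous design can be written down as a small lookup from inference stage to hardware. The mapping follows the description above (prefill on GPUs, decode on SN50 RDUs, agent actions on Xeon 6 CPUs), while the `Stage` enum, the `plan_turn` helper, and the notion of a single agent "turn" are hypothetical names introduced only for illustration. They do not correspond to any published SambaNova or Intel API.

```python
from enum import Enum
from typing import List, Tuple


class Stage(Enum):
    PREFILL = "prefill"            # process the prompt and build the cache
    DECODE = "decode"              # generate output tokens one at a time
    AGENT_ACTION = "agent_action"  # tool calls and orchestration logic


# Stage-to-hardware mapping as described in the article.
PLACEMENT = {
    Stage.PREFILL: "GPU",
    Stage.DECODE: "SN50 RDU",
    Stage.AGENT_ACTION: "Intel Xeon 6 CPU",
}


def plan_turn(calls_tool: bool) -> List[Tuple[str, str]]:
    """List the (stage, hardware) steps for one hypothetical agent turn."""
    steps = [Stage.PREFILL, Stage.DECODE]
    if calls_tool:
        steps.append(Stage.AGENT_ACTION)
    return [(s.value, PLACEMENT[s]) for s in steps]


for stage, hardware in plan_turn(calls_tool=True):
    print(f"{stage:>12} -> {hardware}")
```

A real scheduler would also have to move cached state between devices after prefill, which this sketch leaves out entirely.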
The SN50 enters a market that NVIDIA dominates, and SambaNova's framing is explicitly comparative. The company says the SN50 delivers up to five times the compute performance of competing accelerators and roughly three times the throughput. It also claims the chip is about three times more efficient than NVIDIA's B200 on the workloads it cares about, and that total cost of ownership can be up to three times lower than for GPU-based systems. The most concrete figure it offered at launch was a per-user throughput comparison on Llama 3.3 70B at FP8 with 1K-token input and output, where SambaNova reported 895 tokens per second per user on the SN50 against 184 on a B200. These are SambaNova's own comparisons, made at launch, and they had not been independently verified at the time of the announcement. [3][6][7]
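The two per-user throughput numbers can be turned into a ratio and into approximate generation times. The short calculation below does that, taking "1K" output as 1,000 tokens (an assumption) and counting decode time only, so it ignores prefill and time to first token. The inputs are SambaNova's reported figures, not independent measurements.

```python
SN50_TOKENS_PER_S = 895.0  # vendor-reported, per user
B200_TOKENS_PER_S = 184.0  # vendor-reported, per user
OUTPUT_TOKENS = 1000       # "1K" taken as 1,000 tokens (assumption)


def speedup(fast: float, slow: float) -> float:
    """Ratio of two throughput figures."""
    return fast / slow


def seconds_to_generate(tokens: int, tokens_per_s: float) -> float:
    """Decode-only time for a response of the given length."""
    return tokens / tokens_per_s


ratio = speedup(SN50_TOKENS_PER_S, B200_TOKENS_PER_S)
sn50_s = seconds_to_generate(OUTPUT_TOKENS, SN50_TOKENS_PER_S)
b200_s = seconds_to_generate(OUTPUT_TOKENS, B200_TOKENS_PER_S)

print(f"ratio: {ratio:.2f}x")           # ratio: 4.86x
print(f"SN50: {sn50_s:.2f} s per 1K")   # SN50: 1.12 s per 1K
print(f"B200: {b200_s:.2f} s per 1K")   # B200: 5.43 s per 1K
```

The ratio of about 4.86 is a per-user latency-style figure on one selected workload, so it should not be read as the same thing as the separate "three times the throughput" claim.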
Beyond NVIDIA, the SN50 competes with other inference-focused silicon. Cerebras takes a wafer-scale approach with its WSE-3, keeping an enormous amount of SRAM on a single piece of silicon. Groq builds a deterministic LPU tuned for low-latency token generation. SambaNova's bet is different from both: rather than maximizing on-chip SRAM or minimizing latency through determinism, it leans on the tiered-memory design to fit very large models and many models on comparatively few chips. Whether that translates into a durable advantage depends on benchmarks the public does not yet have.
The most important caveat is that the SN50's performance claims come from SambaNova. The five-times and three-times figures, the efficiency comparison against the B200, and the cost-of-ownership claims are vendor numbers tied to workloads the company selected, and no independent benchmarks were available at announcement. The 10-trillion-parameter and 10-million-token figures describe what the memory architecture is meant to address, not a configuration SambaNova has demonstrated end to end.
Some specifics also vary between sources or remain thin. The detailed silicon figures (the 3 nanometer node, the dual-chiplet layout, the 3.2 PFLOPS FP8 number, and the per-tier memory capacities) come mostly from Tom's Hardware and DataCenterDynamics rather than from SambaNova's own headline materials, which lean on relative claims like five times the compute. Those figures should be treated as preliminary until SambaNova publishes a datasheet or third parties run the part. Pricing, exact clock speeds, real power under sustained load, and per-customer availability beyond the stated second-half-2026 window were not disclosed in the launch coverage, and SambaNova had made no MLPerf submission for the SN50 at announcement. There is also a governance wrinkle worth noting: Intel both invested in the round and partnered on the platform, and Intel chief executive Lip-Bu Tan has been reported as chairman of SambaNova's board, which several outlets flagged when covering the deal. Until the SN50 ships and is measured outside the company, the chip is best understood through its announced design and SambaNova's claims rather than through proven results. [3][6][7][11]