SambaNova SN40L

AI Hardware AI Infrastructure

25 min read

Updated Jul 12, 2026

Suggest edit History Talk

RawGraph

Last edited

Jul 12, 2026

Fact-checked

In review queue

Sources

50 citations

Revision

v5 · 5,013 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

The SambaNova SN40L is a reconfigurable dataflow AI accelerator designed by SambaNova Systems and unveiled on September 19, 2023.^[1]^[2] Marketed as a fourth-generation Reconfigurable Dataflow Unit (RDU) and the first member of SambaNova's "Cerulean" architectural family, the SN40L combines two TSMC 5 nm logic dies, 64 GB of co-packaged HBM3, and up to 1.5 TB of direct-attached DDR5 in a three-tier memory hierarchy intended to run trillion-parameter generative models on a single 8-socket node.^[1]^[3]^[4] The "40" in the product name denotes SambaNova's fourth chip generation, while the "L" indicates that the silicon is specifically tuned for large language model workloads.^[5]

Where contemporary GPU accelerators such as the NVIDIA H100 rely on a thread-and-kernel programming model, the SN40L instead exposes a spatial mesh of 1,040 Pattern Compute Units (PCUs) and 1,040 Pattern Memory Units (PMUs) that the SambaFlow compiler configures into long, statically-scheduled pipelines.^[4]^[6] Each socket reaches a peak of 638 BF16 TFLOPS and contains 102 billion transistors fabricated on TSMC's N5 node and integrated using 2.5D Chip-on-Wafer-on-Substrate (CoWoS-S) packaging.^[7]^[4]^[47] At the system level, a SambaRack SN40L-16 chassis aggregates 16 RDU sockets to support models up to roughly five trillion parameters and a sequence length above 256k tokens, while drawing an average of just 10 kW during inference, low enough to be air-cooled in conventional 19-inch racks.^[3]^[8]^[9]

The SN40L has become the workhorse of SambaNova's pivot away from training: it underlies the SambaCloud public inference service launched on September 10, 2024,^[10] the SambaNova Suite enterprise stack, and on-premises deployments at customers including SoftBank, Saudi Aramco, Analog Devices, Stanford University research groups, OTP Bank, the RIKEN Center for Computational Science, and U.S. Department of Energy national laboratories such as Argonne, Oak Ridge, and Lawrence Livermore.^[11]^[12]^[13] The chip's most-cited public benchmark is its September 2024 world record of 132 output tokens per second on Meta's 405-billion-parameter Llama 3.1 model at full 16-bit precision, achieved on a single 16-socket SN40L node.^[14]^[10]^[48]

Background: SambaNova Systems and the road to the SN40L

SambaNova Systems was founded in Palo Alto, California, in 2017 by three Stanford-affiliated technologists: Rodrigo Liang, a former senior vice president for SPARC and other server processors at Sun Microsystems and Oracle; Stanford Cadence Design Professor of Electrical Engineering and Computer Science Kunle Olukotun, widely credited as the academic father of the multicore microprocessor through his 1990s work on the Hydra chip-multiprocessor at Stanford; and Christopher Ré, a Stanford computer-science associate professor and MacArthur Fellow whose work in the Stanford DAWN lab on weak supervision and learned systems shaped the company's software direction.^[15]^[16]^[17] All three remain involved with the company, with Liang serving as chief executive, Olukotun as chief technologist, and Ré as a technical adviser.^[15]^[17]

The company emerged from stealth in March 2018 with a $56 million Series A round led by Walden International and Google Ventures, raised a $150 million Series B led by Intel Capital in April 2019, a $250 million Series C led by BlackRock in February 2020, and a $676 million Series D led by SoftBank Vision Fund 2 in April 2021 that valued the firm at approximately $5.1 billion.^[18] Total disclosed venture funding through the SN40L's launch was about $1.1 billion, making SambaNova one of the best-funded AI-silicon startups of its era alongside Cerebras and Graphcore.^[18]

The SN40L is the fourth chip in SambaNova's RDU lineage. The first-generation SN10, introduced in 2020, established the dataflow architecture and was followed by the SN20 and the die-to-die SN30, which together carried the "Cardinal" architectural codename, a reference to Stanford's school colour.^[5]^[19] The SN40L, by contrast, is the first member of a new family that SambaNova internally calls "Cerulean," and it is the first SambaNova chip to integrate on-package High Bandwidth Memory.^[5]^[19]

How does the SN40L's dataflow architecture work?

The SN40L is a dataflow accelerator: instead of executing a stream of instructions through a fixed pipeline, it spatially maps each layer of a deep-learning graph onto a physical region of the chip and streams activations through that mapping in a producer-consumer fashion.^[20]^[21] This is the same principle SambaNova's prior RDUs used, but the SN40L expands both the size of the spatial fabric and the reachable working set.

Tile structure and the RDU array

Each SN40L socket exposes a two-dimensional, packet-switched mesh of reconfigurable tiles built from two main element types.^[4]^[6] Pattern Compute Units (PCUs) are configurable matrix engines that can be programmed at compile time to behave as either a systolic array (for general matrix multiplication) or as a pipelined SIMD lane with cross-lane reduction, supporting BF16, FP32 and INT32 numerics together with a tail-section that implements transcendental functions and stochastic rounding.^[4] Pattern Memory Units (PMUs) are banked SRAM scratchpads of programmable depth, paired with their own address-generation ALU, banking-and-predication logic and a data-alignment unit that performs transpose and other tensor reshaping in-place without round-tripping through DRAM.^[4]

The SN40L socket aggregates 1,040 PCUs and 1,040 PMUs distributed across two identical accelerator dies that are joined edge-to-edge via the 2.5D CoWoS interposer, yielding the published peak of 638 BF16 TFLOPS per socket.^[4]^[7] Compared with the prior-generation SN30, the SN40L contains about 18.8 percent fewer functional units but achieves comparable raw throughput thanks to the 7 nm-to-5 nm shrink and an approximately 12 percent clock-speed lift.^[22]

Reconfigurable Dataflow Network

PCUs and PMUs are stitched together by a Reconfigurable Dataflow Network (RDN): a mesh-based packet-switched interconnect with three physically separate fabrics carrying vector data, scalar data, and control packets respectively.^[4] The RDN supports multi-cast routing, dynamic packet ordering using sequence identifiers, and is what allows the compiler to fuse hundreds of operators into a single dataflow kernel without committing intermediate tensors to off-chip memory.^[4]^[21]

Address Generation and Coalescing Unit (AGCU)

To bridge the on-chip fabric to the rest of the system, the SN40L incorporates Address Generation and Coalescing Units (AGCUs) that mediate accesses to local HBM and DDR, host memory across PCIe, and to other RDUs in a node.^[4] The AGCU implements a peer-to-peer protocol enabling direct, point-to-point reads and writes between sockets without traversing host memory or the DDR tier, which SambaNova reports is essential for sharding very large models across the 16 sockets of a SambaRack.^[4]

Spatial kernel fusion and Composition of Experts

The architectural payoff of the dataflow model is a programming style SambaNova calls streaming spatial kernel fusion. Because the compiler can lay an entire transformer decoder block (and often whole sequences of blocks) onto the fabric as a single dataflow graph, the chip avoids the per-kernel launch overhead, register-file traffic, and on-chip-cache pressure that limits GPU kernel fusion. SambaNova's own measurements report 2x to 13x speed-ups on individual operator fusion microbenchmarks versus a hand-tuned GPU baseline.^[4]

The SN40L's three-tier memory was, in turn, designed to make a model-serving pattern that SambaNova calls Composition of Experts (CoE) practical: large numbers of independently-trained dense models (each typically 7 billion to 70 billion parameters) are stored in DDR, paged into HBM on demand, and routed per-prompt by a lightweight gating model that decides which "expert" model to invoke.^[4]^[23] The SN40L paper describes Composition of Experts as "a modular approach that lowers the cost and complexity of training and serving."^[4] This is distinct from the more familiar Mixture of Experts (MoE) approach, in which a single trained network contains sparsely-activated expert sublayers; in CoE, the experts are entirely separate, separately-trained models, and SambaNova reports its Samba-1 product packaged more than fifty such experts into a 1.3-trillion-parameter logical model.^[23]^[24]

What is the SN40L's three-tier memory architecture?

The SN40L's defining hardware feature is a three-tier memory hierarchy that combines large on-chip SRAM, on-package HBM3, and direct-attached DDR5 within a single socket.^[4]^[25] Earlier SambaNova RDUs had only SRAM and DDR; the addition of HBM was the principal motivation for the "L" variant's existence.^[25]

Tier	Capacity per socket	Implementation	Aggregate bandwidth
On-chip SRAM	520 MiB	Distributed across 1,040 PMUs	Hundreds of TB/s
HBM3	64 GiB	Co-packaged via CoWoS-S interposer	~2 TB/s
DDR5 DRAM	up to 1.5 TiB	Pluggable DIMMs, direct-attached	>200 GB/s per socket

Source: SambaNova arXiv whitepaper (May 2024).^[4]

The paper states that "the HBM tier has 64 GiB of capacity with a peak bandwidth of about 2 TB/s per socket," that "the DDR tier can have a peak memory capacity of 1.5 TiB at a peak bandwidth of over 200 GB/s," and that "models are loaded from DDR to HBM at over 1 TB/s in a single SN40L Node."^[4]

Tier 1: 520 MiB of on-chip SRAM

The 520 MiB of on-die SRAM in each SN40L socket is, at the time of the chip's launch, the largest on-chip memory in any commercial dataflow or AI ASIC other than wafer-scale designs such as the Cerebras WSE-3.^[25]^[4] It is not arranged as a single L1 or L2 cache; instead, each PMU contains a programmable scratchpad bank that the SambaFlow compiler explicitly partitions among intermediate tensors and weights, delivering hundreds of TB/s of aggregate on-chip bandwidth.^[4] This high-density, banked SRAM is what makes streaming kernel fusion viable: rather than spilling intermediate activations to HBM between operators (as a GPU must), the SN40L can keep them resident in the local PMU bank that produced them, then forward them via the RDN to the next consuming PCU on the next clock-domain.^[4]^[21]

Tier 2: 64 GiB of co-packaged HBM3

Above the SRAM sits 64 GiB of HBM3 integrated onto the package via the 2.5D CoWoS-S interposer, providing about 2 TB/s of bandwidth per socket.^[4]^[7] In SambaNova's own framing, HBM acts as a large L3 cache rather than as primary model storage; the company architects deployments so that HBM holds the activations and currently active expert weights for a workload, while persistent storage of model weights happens in DDR.^[22]^[25] HBM3 was a deliberate choice for the SN40L: SambaNova's earlier chips used larger but slower DDR-only configurations, and the addition of HBM was driven by the bandwidth requirements of large-language-model decode steps in which the entire context tensor must be re-read every token.^[25]

Tier 3: up to 1.5 TiB of DDR5

The third tier is up to 1.5 TiB of direct-attached DDR5 DRAM per socket in pluggable DIMM form factor, with a peak bandwidth of over 200 GB/s.^[4]^[3] This is an unusually large attached-DRAM capacity for any AI accelerator, and it is the architectural pivot point of SambaNova's CoE pitch: at 1.5 TB per socket and eight sockets per DataScale node, a single node holds up to about 12 TB of DDR, enough to keep hundreds of full-precision dense models resident without paging from disk.^[4]^[25] SambaNova reports that the DDR-to-HBM transfer rate exceeds 1 TB/s in aggregate within a single SN40L node, which it cites as the key enabler for sub-millisecond model swap latency in CoE serving.^[4]

The hierarchy as a whole is what SambaNova has marketed as a solution to the "AI memory wall": by trading some peak FLOPS density for more on-package and direct-attached capacity, the SN40L can keep the entire weight set of even multi-hundred-billion-parameter dense LLMs within a single node, eliminating the cross-node communication that bottlenecks GPU deployments at similar parameter counts.^[4]

How is the SN40L built, and what are its specifications?

The SN40L is fabricated on TSMC's N5 (5 nm) process and integrated using TSMC's 2.5D CoWoS-S advanced packaging.^[7]^[25] Each socket contains two identical 600 mm² accelerator dies plus the HBM3 stacks on a common silicon interposer.^[7] SambaNova reports a per-socket transistor count of approximately 102 billion, comparable in transistor density to NVIDIA's contemporaneous data-center GPUs, with 520 MB of on-chip SRAM implemented in high-density memory cells.^[7]

The published headline performance of each socket is 638 BF16 TFLOPS, with the same units also citing 640 BF16 TFLOPS and an FP16 figure of approximately 688 TFLOPS depending on operating mode.^[4]^[7]^[26] The chip notably lacks dedicated low-precision accelerators for INT8 and FP8, an architectural choice consistent with SambaNova's emphasis on inference at "no-quantization" BF16 / FP16 / mixed precision for accuracy-sensitive enterprise workloads.^[26]^[14] Estimated thermal-design power for the accelerator socket is reported at roughly 600 W in third-party coverage, though SambaNova itself emphasises rack-level power: a 16-socket SambaRack averages about 10 kW under inference load, an order of magnitude lower than typical 140-kW NVIDIA HGX racks of similar parameter capacity.^[8]^[26]

SambaNova detailed the SN40L publicly at two flagship chip conferences. At Hot Chips 2024 the design was presented as "SambaNova SN40L RDU: Breaking the Barrier of Trillion+ Parameter Scale," and at the 2025 International Solid-State Circuits Conference (ISSCC) it appeared as "SambaNova SN40L: A 5nm 2.5D Dataflow Accelerator with Three Memory Tiers for Trillion Parameter AI."^[47]^[7] Both presentations, along with the company's arXiv whitepaper, are the primary technical sources for the transistor count, memory-tier capacities and peak-throughput figures cited here.^[4]^[7]^[47]

What systems is the SN40L sold in?

The SN40L is not sold as a discrete chip but as part of integrated rack systems originally branded SambaNova DataScale and, since 2024, increasingly branded SambaRack.^[27]^[9]

The canonical building block is the DataScale SN40L-2 module, which integrates two RDUs together with their host components.^[27] Four such modules combine into the eight-RDU SN40L node, the company's reference unit for serving a five-trillion-parameter Composition-of-Experts workload.^[3]^[9] The largest standard configuration is the SambaRack SN40L-16, a single 19-inch rack containing 16 RDU sockets across eight SN40L-2 modules, which is the platform SambaNova quotes for its world-record Llama 3.1 405B and DeepSeek-R1 671B inference benchmarks.^[9]^[14]^[28]

A SambaRack SN40L-16 therefore aggregates approximately 8 GB of on-chip SRAM, 1 TB of HBM3, and up to 24 TB of attached DDR5 across its 16 sockets, sufficient to host hundreds of distinct foundation models simultaneously.^[27]^[4] Air cooling at ~10 kW per rack allows the systems to be deployed in standard enterprise data-center facilities without specialised liquid-cooling retrofits.^[8]^[29]

How is the SN40L programmed?

The SN40L would be unusable without its compiler stack, because there is no public ISA for the RDU; users do not write CUDA-style kernels but rather submit standard model graphs that SambaFlow lowers to RDU configuration bitstreams.^[30]^[31]

SambaFlow is the core compiler and runtime. It ingests model definitions from PyTorch or TensorFlow, traces them into a dataflow graph, and produces a binary "PEF" file that contains the spatial mapping of every operator onto specific PCUs and PMUs as well as the routing configuration for the RDN.^[30] Within SambaFlow, the compiler performs operator fusion, tiling, weight partitioning and inter-socket sharding automatically; SambaNova emphasises that an entire transformer decoder layer is typically compiled as a single kernel call, eliminating the per-kernel launch overhead that limits GPU performance on long generation sequences.^[30]^[4]

SambaStudio is a graphical, browser-based platform layered on top of SambaFlow that gives data scientists a model-management workflow: dataset upload, fine-tuning, deployment, and inspection of running model endpoints without leaving the GUI.^[30]^[32]

SambaCloud (originally announced as "SambaNova Cloud") is the public-internet inference service launched on September 10, 2024.^[10] It exposes Meta's Llama-family open-source models, DeepSeek-R1, and others via an OpenAI-compatible REST API hosted on SambaRack SN40L-16 nodes.^[10]^[14] SambaCloud comes in Free, Developer, and Enterprise tiers, with the Enterprise tier offering a SambaNova-managed instance of the cloud stack deployable inside a customer's own data centre under what SambaNova calls SambaManaged, which was launched in July 2025.^[33] An adjacent product, the original SambaNova Suite announced on February 28, 2023, is the full-stack offering combining DataScale hardware, SambaFlow and a curated set of pre-trained open-source models for enterprise on-premises deployment.^[34]

How fast is the SN40L?

The SN40L's most heavily-publicised benchmarks have been generative-model token-throughput records measured at full 16-bit precision on standard open-source LLMs.

On Llama 3.1 405B, a 16-socket SambaRack SN40L-16 first reached 132 output tokens/second/user at SambaCloud's September 2024 launch, then 129 output tokens/second/user in a later SambaNova publication, and is reported in the company's blog and arXiv papers at peak rates exceeding 100 tokens/s/user across batch sizes up to four concurrent requests.^[14]^[10]^[4] A separate Artificial Analysis-verified test reported 114 tokens/s on the same model.^[14] At the time these results were published, the fastest comparable GPU-based service was measured by Artificial Analysis at approximately 72 tokens/s on the same model.^[14] SambaNova chief executive Rodrigo Liang framed the achievement around precision as much as speed, saying the platform delivered "tokens at a rate of 132 per second and at the full 16-bit precision it was trained at."^[10] Contemporaneous coverage noted one caveat: the September 2024 405B record was measured with the model's context window reduced to roughly 8,000 tokens rather than its full 128k window.^[10]

On Llama 3.1 70B, the same 16-socket node reached 457 to 461 output tokens/second, while on the smaller Llama 3.1 8B SambaNova has cited rates above 1,042 tokens/second.^[14]^[10]^[35] All three measurements are reported at native BF16 precision without quantisation.^[14]

For the Composition of Experts workload pattern that the SN40L was specifically architected to accelerate, SambaNova's arXiv whitepaper reports CoE serving speed-ups of 3.7x over an NVIDIA DGX H100 and 6.6x over an NVIDIA DGX A100 in their own measurements, together with 15x to 31x faster model-switching latency versus the same DGX systems and up to 19x lower machine footprint for the same aggregate parameter count.^[4]

The SN40L has not, as of the time of writing, posted official entries to the MLPerf Inference suite; this is in line with SambaNova's stated preference for end-to-end token-rate benchmarks on real LLMs over the historically training-centric MLPerf metrics.

Who uses the SN40L?

Public customer references for the SN40L span enterprise, sovereign, financial and scientific deployments.

SoftBank has been a multi-generation customer and was, separately, the lead investor in SambaNova's Series D in 2021.^[18]^[11] SoftBank has used SambaNova DataScale systems as part of its Japan-based AI computing platform and was named as the first announced deployment partner for SambaNova's next-generation SN50 RDU.^[11]

Saudi Aramco, the Saudi national oil company, signed a memorandum of understanding with SambaNova to deploy on-site SN40L systems for Metabrain, an internal large language model trained on roughly 90 years of Aramco's operational and exploration data and used for industrial AI applications.^[12]

Analog Devices announced on January 10, 2024 that it would deploy the SambaNova Suite enterprise-wide to support generative AI applications across the global semiconductor company, making ADI one of the first publicly-named industrial enterprise users of the SN40L platform.^[36]

JPMorgan Chase was disclosed in July 2026 as an SN40L inference customer. SambaNova chief executive Rodrigo Liang described the bank's decision to run its inference workloads on SambaNova hardware as "a big deal" that "sends a message to the banking industry," positioning the SN40L as an on-premises alternative for regulated financial institutions that prefer not to depend entirely on public cloud GPUs.^[49]

In the U.S. national-laboratory complex, Argonne National Laboratory announced on November 18, 2024 (at the SC24 supercomputing conference in Atlanta) that the Argonne Leadership Computing Facility had deployed a new 16-RDU SambaNova DataScale SN40L cluster as part of its AI Testbed, available to the U.S. scientific community via project allocations and the National AI Research Resource Pilot.^[13] Argonne uses the system to support inference for projects including the AuroraGPT foundation model and applications in drug discovery, climate science and brain mapping.^[13] Oak Ridge and Lawrence Livermore National Laboratories are also listed by SambaNova as customers, along with Japan's RIKEN Center for Computational Science, OTP Bank, Accenture and NetApp.^[12]

OVHcloud, the French sovereign-cloud provider, announced in late 2025 that it had selected SambaNova RDUs to power its AI Endpoints inference service.^[37]

How much does the SN40L cost?

SambaNova has consistently declined to publish unit pricing for either the SN40L chip or the DataScale and SambaRack systems built around it; per-system prices are quoted privately and have been described in trade-press reporting as ranging from approximately $500,000 to several million dollars depending on the configuration.^[22]^[38]

The company offers three main commercial constructs:

CapEx purchase of DataScale or SambaRack hardware for on-premises deployment, supplemented by SambaFlow software licences.^[38]
A subscription model under which customers pay an annual fee for use of DataScale hardware together with the Dataflow-as-a-Service software stack and pre-trained foundation models, removing the capital-acquisition step.^[38]
SambaCloud consumption-based access via OpenAI-compatible APIs in Free, Developer and Enterprise tiers, billed per token for paid tiers.^[10]^[33]

Beginning in July 2025, SambaManaged added a turnkey hybrid model in which SambaNova operates a private SambaCloud instance inside the customer's own facility, blending the security profile of on-premises hardware with the operational economics of cloud consumption pricing.^[33]

Company status and strategic shift (2024 to 2026)

The SN40L was launched into a market that subsequently shifted decisively in favour of inference-only workloads dominated by NVIDIA, and SambaNova's business model was restructured in response.

In April 2025, SambaNova laid off 77 employees, about 15 percent of its global workforce, in conjunction with a strategic pivot away from training workloads and toward an inference-cloud business model. The company filed two WARN notices in California and Washington.^[39]^[40] Chief executive Rodrigo Liang described the move in subsequent interviews as a response to the realisation that "the industry has shifted to primarily inference" and that the AI inference market would dwarf the training market in dollar volume.^[41]

Subsequent reporting from October and December 2025 indicated that SambaNova had retained an investment bank and entered exclusive acquisition negotiations with Intel at an enterprise value of approximately $1.6 billion including debt, a steep mark-down from the $5.1 billion 2021 valuation.^[42]^[43] The deal ultimately did not close. Instead, on February 24, 2026, SambaNova announced a $350 million Series E funding round co-led by Vista Equity Partners and Cambium Capital, with participation from Intel Capital, Battery Ventures and other new and existing investors.^[44]^[45] The round implied a post-money valuation of approximately $2.2 billion, materially below the 2021 peak.^[44]

The same February 2026 announcement introduced the SN50, SambaNova's next-generation RDU and the successor to the SN40L. The SN50 retains the same three-tier memory philosophy with 64 GB of HBM, 432 MB of on-chip SRAM (slightly smaller than the SN40L's 520 MB) and 256 GB to 2 TB of DDR5 per socket, while delivering an advertised 5x peak performance and 4x network bandwidth relative to the SN40L, and allowing up to 256 accelerators to be linked over a multi-terabyte-per-second interconnect to target models above 10 trillion parameters.^[45]^[46] SambaNova has stated that the SN50 will begin shipping in the second half of 2026, with SoftBank named as the first customer for SN50-powered low-latency inference services in Asia-Pacific.^[45]^[46] An Intel collaboration combining Xeon CPUs with SambaNova RDUs was announced simultaneously.^[44]

The financial picture then reversed sharply. On July 8, 2026, SambaNova announced the first close of a $1 billion Series F round at an $11 billion post-money valuation, led by General Atlantic with participation from Intel, Vista Equity Partners, BlackRock, the Qatar Investment Authority, T. Rowe Price, Capital Group and more than a dozen other investors.^[49]^[50] The round valued the company at roughly five times its February 2026 Series E mark and above its 2021 peak, a swing Liang attributed to enterprise demand for alternatives to NVIDIA and to marquee customer wins such as JPMorgan Chase; he said the new capital would be used to "secure the supply chain."^[49]^[50]

For at least the next product cycle, the SN40L will remain the chip running both SambaCloud's public endpoints and the great majority of installed customer systems, while the SN50 ramps to volume production.

How was the SN40L received?

Trade-press reception of the SN40L at launch was broadly positive, with reviewers focusing on three architectural choices that distinguish it from contemporary GPU and ASIC competitors.

First, the three-tier memory hierarchy was widely highlighted as the chip's defining contribution. ServeTheHome described the system as the first commercial accelerator to combine large on-chip SRAM, on-package HBM and direct-attached pluggable DDR in a single coherent address space.^[25] The Next Platform noted that this architecture amounts to an explicit rejection of the GPU industry's assumption that AI weight sets must fit in HBM, and observed that the SN40L's eight-socket node can hold approximately 71 separate Llama-2-70B-class models simultaneously without paging.^[22]

Second, the streaming dataflow programming model was characterised by analysts as both the chip's central strength and its central commercial risk, since the SambaFlow compiler is the only path to using the silicon and the chip cannot run hand-written CUDA-style kernels.^[21]^[31] SambaNova has argued in response that the compiler-only model is precisely what allows automatic spatial fusion of arbitrarily complex graphs and that customers benefit from being insulated from low-level optimisation work.^[4]^[21]

Third, analysts noted the chip's emphasis on full-precision inference. By focusing the compute units on BF16 and FP32 (with no first-class INT8 or FP8 path), the SN40L positions itself for enterprise and scientific customers who want to deploy open-source models without the accuracy regressions associated with aggressive quantisation, a positioning that has been validated by its choice of Llama 3.1 405B "at full 16-bit precision" as its headline benchmark.^[14]^[26]

Among the SN40L's most direct architectural competitors, the Groq LPU targets the same low-latency inference market with a different trade-off, prioritising deterministic single-stream latency over high on-package memory capacity; the Cerebras WSE-3 takes the opposite extreme to SambaNova by integrating an entire wafer of compute and SRAM with no on-package HBM; and the Etched Sohu takes the most extreme position by hard-wiring transformer inference into the silicon. The SN40L sits between these in design space: a programmable spatial dataflow with a balanced multi-tier memory rather than either fully model-specific silicon or a single-tier memory architecture.

References

Business Wire (SambaNova press release), "SambaNova Unveils New AI Chip, the SN40L, Powering its Full Stack AI Platform," September 19, 2023. https://www.businesswire.com/news/home/20230919534495/en/SambaNova-Unveils-New-AI-Chip-the-SN40L-Powering-its-Full-Stack-AI-Platform ↩
HPCwire / AIwire, "SambaNova Unveils New AI Chip, the SN40L, Powering Its Full Stack AI Platform," September 19, 2023. https://www.hpcwire.com/aiwire/2023/09/19/sambanova-unveils-new-ai-chip-the-sn40l-powering-its-full-stack-ai-platform/ ↩
SiliconANGLE, "SambaNova debuts self-configuring AI chip with 1,040 cores and high-speed memory," September 19, 2023. https://siliconangle.com/2023/09/19/sambanova-debuts-self-configuring-ai-chip-140-cores-high-speed-memory/ ↩
Prabhakar, R., et al., "SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts," arXiv:2405.07518, May 13, 2024. https://arxiv.org/html/2405.07518v1 ↩
The Next Platform, "SambaNova Tackles Generative AI With New Chip And New Approach," September 20, 2023. https://www.nextplatform.com/2023/09/20/sambanova-tackles-generative-ai-with-new-chip-and-new-approach/ ↩
ServeTheHome, "SambaNova SN40L RDU for Trillion Parameter AI Models," 2023. https://www.servethehome.com/sambanova-sn40l-rdu-for-trillion-parameter-ai-models/ ↩
IEEE Xplore (ISSCC 2025), "16.4: SambaNova SN40L: A 5nm 2.5D Dataflow Accelerator with Three Memory Tiers for Trillion Parameter AI," February 16, 2025. https://ieeexplore.ieee.org/document/10904578 ↩
SambaNova, "SN40L RDU | Next-Gen AI Chip for Inference at Scale" (product page). https://sambanova.ai/products/rdu-ai-chips ↩
SambaNova, "SambaRack SN40L-16" datasheet (2025). https://sambanova.ai/hubfs/SambaRack%20data%20sheet%20template%2007%2009%2025.pdf ↩
The Register, "SambaNova Cloud serves up Llama 3.1 405B at 100+ token/s," September 10, 2024. https://www.theregister.com/2024/09/10/sambanovas_inference_cloud/ ↩
Data Center Dynamics, "SambaNova unveils SN50 AI chip, Intel partnership, and $350m fundraise," February 24, 2026. https://www.datacenterdynamics.com/en/news/sambanova-unveils-sn50-ai-chip-intel-partnership-and-350m-fundraise/ ↩
Arab News, "Startup of the Week - US-based SambaNova sets sights on Saudi market." https://www.arabnews.com/node/2571362/business-economy ↩
SambaNova / Argonne Leadership Computing Facility, "Argonne National Laboratory Deploys a New SambaNova Inference-Optimized Cluster to Support AI-Driven Science," November 18, 2024. https://www.alcf.anl.gov/news/argonne-national-laboratory-deploys-new-sambanova-inference-optimized-cluster-support-ai ↩
SambaNova, "Llama 3.1 405B 4X faster on SambaNova | World Record" (blog). https://sambanova.ai/blog/speed-record-on-llama-3.1-405b ↩
SambaNova, "Meet Our Talented AI Team" (corporate team page). https://sambanova.ai/about/ai-team ↩
University of Michigan CSE, "SambaNova, founded by alumnus Kunle Olukotun, emerges from stealth mode with AI-accelerated HPC system," 2018. https://cse.engin.umich.edu/stories/sambanova-founded-by-alumnus-kunle-olukotun-emerges-from-stealth-mode-with-ai-accelerated-hpc-system ↩
Stanford HAI / Stanford Data Science, "Christopher Re" faculty profile. https://datascience.stanford.edu/people/chris-re ↩
Sacra, "SambaNova Systems valuation, funding & news." https://sacra.com/c/sambanova-systems/ ↩
SambaNova, "Why SambaNova's SN40L Chip Is the Best for Inference" (blog). https://sambanova.ai/blog/sn40l-chip-best-inference-solution ↩
SambaNova, "Accelerated Computing with a Reconfigurable Dataflow Architecture" (whitepaper). https://sambanova.ai/hubfs/23945802/SambaNova_Accelerated-Computing-with-a-Reconfigurable-Dataflow-Architecture_Whitepaper_English-1.pdf ↩
TechTarget, "SambaNova AI launches new chip: the SN40L." https://www.techtarget.com/searchenterpriseai/news/366552594/SambaNova-AI-launches-new-chip-the-SN40L ↩
The Next Platform, "SambaNova Tackles Generative AI With New Chip And New Approach" (process and transistor comparison versus SN30). https://www.nextplatform.com/2023/09/20/sambanova-tackles-generative-ai-with-new-chip-and-new-approach/ ↩
SambaNova, "Samba-1 datasheet." https://sambanova.ai/hubfs/White%20papers/Samba-1-Datasheet.pdf ↩
VentureBeat, "SambaNova debuts 1 trillion parameter Composition of Experts model for enterprise gen AI." https://venturebeat.com/ai/sambanova-debuts-1-trillion-parameter-composition-of-experts-model-for-enterprise-gen-ai ↩
EE Times, "SambaNova Adds HBM for LLM Inference Chip." https://www.eetimes.com/sambanova-adds-hbm-for-llm-inference-chip/ ↩
TechInsights, "SambaNova Releases Fourth-Gen Chip." https://www.techinsights.com/blog/sambanova-releases-fourth-gen-chip ↩
SambaNova, "SambaNova DataScale" datasheet. https://sambanova.ai/hubfs/23945802/downloads/Product%20Collateral/SambaNova_DataScale_Data-Sheet_10.08.21EN.pdf ↩
SambaNova, "Why SambaNova's SN40L Chip Is the Best for Inference" (DeepSeek R1 671B and Llama 4 Maverick). https://sambanova.ai/blog/sn40l-chip-best-inference-solution ↩
SiliconANGLE, "Robust AI infrastructure powered by SambaNova SN40L," June 23, 2025. https://siliconangle.com/2025/06/23/sambanova-sn40l-advances-robust-ai-infrastructure-efficiency-roboticsaiinfrastructure/ ↩
SambaNova Documentation, "SambaNova resources." https://docs.sambanova.ai/resources/latest/index.html ↩
SambaNova Documentation (legacy), "Compiler optimization modes." https://docs-legacy.sambanova.ai/developer/latest/compiler-o1.html ↩
SambaNova Documentation, "SambaNova glossary." https://docs.sambanova.ai/resources/latest/glossary.html ↩
SambaNova, "Introducing SambaManaged: A Turnkey Path to AI for Data Centers" (July 2025). https://sambanova.ai/blog/introducing-sambamanaged-a-turnkey-path-to-ai-for-data-centers ↩
SambaNova, "SambaNova Delivers Generative AI Capabilities to the Enterprise: the New SambaNova Suite," February 28, 2023. https://sambanova.ai/press/sambanova-delivers-generative-ai-capabilities-to-the-enterprise ↩
Analytics India Magazine, "SambaNova's Llama 3.1 405B Model Hits 114 Tokens Per Second, Setting Speed Record." https://analyticsindiamag.com/ai-news-updates/sambanovas-llama-3-1-405b-model-hits-114-tokens-per-second-setting-speed-record/ ↩
Analog Devices, "Analog Devices Deploys SambaNova Suite to Facilitate Breakthrough Generative AI Capabilities at Enterprise Scale," January 10, 2024. https://www.analog.com/en/about-adi/news-room/press-releases/2024/1-10-2024-adi-deploys-sambanova-suite.html ↩
OVHcloud, "OVHcloud chooses SambaNova to optimize its AI Endpoints inference service." https://corporate.ovhcloud.com/en/newsroom/news/sambanova/ ↩
SambaNova, "SambaNova offers flexibility and choice with subscription pricing." https://sambanova.ai/blog/subscription-pricing/ ↩
Data Center Dynamics, "SambaNova lays off 77 employees as company pivots focus from training to inference," April 2025. https://www.datacenterdynamics.com/en/news/sambanova-lays-off-77-employees-as-company-pivots-focus-from-training-to-inference/ ↩
EE Times, "SambaNova Lays Off 15% of Workforce To Refocus on Inference," April 2025. https://www.eetimes.com/sambanova-lays-off-15-of-workforce-to-refocus-on-inference/ ↩
EE Times, "SambaNova Shifts To Inference, Courts Cloud Customers." https://www.eetimes.com/sambanova-shifts-to-inference-but-leaves-cloud-to-customers/ ↩
Bloomberg, "Intel Is Said to Near $1.6 Billion Deal for Chip Firm SambaNova," December 12, 2025. https://www.bloomberg.com/news/articles/2025-12-12/intel-nears-1-6-billion-deal-for-ai-chip-startup-sambanova ↩
SiliconANGLE, "Report: Intel could acquire inference chip startup SambaNova for $1.6B," December 12, 2025. https://siliconangle.com/2025/12/12/report-intel-acquire-inference-chip-startup-sambanova-1-6b/ ↩
The Register, "SambaNova raises $350M with Intel backing," February 24, 2026. https://www.theregister.com/2026/02/24/sambanova_intel_funding/ ↩
Intel Capital, "SambaNova Unveils Fastest Chip for Agentic AI, Collaborates with Intel, and Raises $350M+." https://www.intelcapital.com/sambanova-unveils-fastest-chip-for-agentic-ai-collaborates-with-intel-and-raises-350m/ ↩
HPCwire, "SambaNova Eyes 10-Trillion Parameter Models for Agentic AI with New Chip," February 24, 2026. https://www.hpcwire.com/2026/02/24/sambanova-eyes-10-trillion-parameter-models-for-agentic-ai-with-new-chip/ ↩
Prabhakar, R., et al. (SambaNova), "SambaNova SN40L RDU: Breaking the Barrier of Trillion+ Parameter Scale," Hot Chips 2024 (HC2024) presentation. https://hc2024.hotchips.org/assets/program/conference/day1/48_HC2024.Sambanova.Prabhakar.final-withoutvideo.pdf ↩
SambaNova, "SambaNova Launches Fastest AI Platform with Llama 3.1 405B at 132 Tokens per Second," September 10, 2024. https://sambanova.ai/news/fastest-ai-platform-with-llama-3.1-405b-at-132-tokens-per-second ↩
TechCrunch, "AI chip maker SambaNova raises $1B at $11B valuation, 5 months after last mega round," July 8, 2026. https://techcrunch.com/2026/07/08/sambanova-draws-1b-at-11b-valuation-in-series-f-first-close/ ↩
SiliconANGLE, "Inference chip startup SambaNova valued at $11B in $1B funding round," July 8, 2026. https://siliconangle.com/2026/07/08/inference-chip-startup-sambanova-valued-11b-1b-funding-round/ ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

4 revisions by 1 contributor · full history

Suggest edit

What links here

Etched Sohu SambaNova SN50 Untether AI

Background: SambaNova Systems and the road to the SN40L

How does the SN40L's dataflow architecture work?

Tile structure and the RDU array

Reconfigurable Dataflow Network

Address Generation and Coalescing Unit (AGCU)

Spatial kernel fusion and Composition of Experts

What is the SN40L's three-tier memory architecture?

Tier 1: 520 MiB of on-chip SRAM

Tier 2: 64 GiB of co-packaged HBM3

Tier 3: up to 1.5 TiB of DDR5

How is the SN40L built, and what are its specifications?

What systems is the SN40L sold in?

How is the SN40L programmed?

How fast is the SN40L?

Who uses the SN40L?

How much does the SN40L cost?

Company status and strategic shift (2024 to 2026)

How was the SN40L received?

See also

References

Improve this article

Related Articles

Cloud TPU

NVIDIA Picasso

Tensor Processing Unit (TPU)

TPU Pod

TPU Node

TPU Worker

What links here

Related Articles

Cloud TPU

NVIDIA Picasso

Tensor Processing Unit (TPU)

TPU Pod

TPU Node

TPU Worker

What links here