# SambaNova SN40L

> Source: https://aiwiki.ai/wiki/sambanova_sn40l
> Updated: 2026-06-09
> Categories: AI Hardware, AI Infrastructure
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

The **SambaNova SN40L** is a reconfigurable dataflow AI accelerator designed by [SambaNova Systems](/wiki/sambanova) and unveiled on September 19, 2023.[^1][^2] Marketed as a fourth-generation Reconfigurable Dataflow Unit (RDU) and the first member of SambaNova's "Cerulean" architectural family, the SN40L combines two TSMC 5 nm logic dies, 64 GB of co-packaged HBM3, and up to 1.5 TB of direct-attached DDR5 in a three-tier memory hierarchy intended to run trillion-parameter generative models on a single 8-socket node.[^1][^3][^4] The "40" in the product name denotes SambaNova's fourth chip generation, while the "L" indicates that the silicon is specifically tuned for large language model workloads.[^5]

Where contemporary GPU accelerators such as the [NVIDIA H100](/wiki/nvidia_h100) rely on a thread-and-kernel programming model, the SN40L instead exposes a spatial mesh of 1,040 Pattern Compute Units (PCUs) and 1,040 Pattern Memory Units (PMUs) that the SambaFlow compiler configures into long, statically-scheduled pipelines.[^4][^6] Each socket reaches a peak of 638 BF16 TFLOPS and contains 102 billion transistors fabricated on TSMC's N5 node and integrated using 2.5D Chip-on-Wafer-on-Substrate (CoWoS-S) packaging.[^7][^4] At the system level, a SambaRack SN40L-16 chassis aggregates 16 RDU sockets to support models up to roughly five trillion parameters and a sequence length above 256k tokens, while drawing an average of just 10 kWh during inference, low enough to be air-cooled in conventional 19-inch racks.[^3][^8][^9]

The SN40L has become the workhorse of SambaNova's pivot away from training: it underlies the SambaCloud public inference service launched on September 10, 2024,[^10] the SambaNova Suite enterprise stack, and on-premises deployments at customers including [SoftBank](/wiki/softbank_group), Saudi Aramco, Analog Devices, [Stanford University](/wiki/stanford_university) research groups, OTP Bank, the RIKEN Center for Computational Science, and U.S. Department of Energy national laboratories such as Argonne, Oak Ridge, and Lawrence Livermore.[^11][^12][^13] The chip's most-cited public benchmark is its September 2024 world record of 132 output tokens per second on Meta's 405-billion-parameter Llama 3.1 model at full 16-bit precision, achieved on a single 16-socket SN40L node.[^14][^10]

## Background: SambaNova Systems and the road to the SN40L

SambaNova Systems was founded in Palo Alto, California, in 2017 by three Stanford-affiliated technologists: Rodrigo Liang, a former senior vice president for SPARC and other server processors at Sun Microsystems and Oracle; [Stanford](/wiki/stanford_university) Cadence Design Professor of Electrical Engineering and Computer Science Kunle Olukotun, widely credited as the academic father of the multicore microprocessor through his 1990s work on the Hydra chip-multiprocessor at Stanford; and Christopher Ré, a Stanford computer-science associate professor and MacArthur Fellow whose work in the Stanford DAWN lab on weak supervision and learned systems shaped the company's software direction.[^15][^16][^17] All three remain involved with the company, with Liang serving as chief executive, Olukotun as chief technologist, and Ré as a technical adviser.[^15][^17]

The company emerged from stealth in March 2018 with a $56 million Series A round led by Walden International and Google Ventures, raised a $150 million Series B led by Intel Capital in April 2019, a $250 million Series C led by BlackRock in February 2020, and a $676 million Series D led by SoftBank Vision Fund 2 in April 2021 that valued the firm at approximately $5.1 billion.[^18] Total disclosed venture funding through the SN40L's launch was about $1.1 billion, making SambaNova one of the best-funded AI-silicon startups of its era alongside Cerebras and Graphcore.[^18]

The SN40L is the fourth chip in SambaNova's RDU lineage. The first-generation SN10, introduced in 2020, established the dataflow architecture and was followed by the SN20 and the die-to-die SN30, which together carried the "Cardinal" architectural codename, a reference to Stanford's school colour.[^5][^19] The SN40L, by contrast, is the first member of a new family that SambaNova internally calls "Cerulean," and it is the first SambaNova chip to integrate on-package High Bandwidth Memory.[^5][^19]

## Reconfigurable dataflow architecture

The SN40L is a **dataflow** accelerator: instead of executing a stream of instructions through a fixed pipeline, it spatially maps each layer of a deep-learning graph onto a physical region of the chip and streams activations through that mapping in a producer-consumer fashion.[^20][^21] This is the same principle SambaNova's prior RDUs used, but the SN40L expands both the size of the spatial fabric and the reachable working set.

### Tile structure and the RDU array

Each SN40L socket exposes a two-dimensional, packet-switched mesh of reconfigurable tiles built from two main element types.[^4][^6] **Pattern Compute Units (PCUs)** are configurable matrix engines that can be programmed at compile time to behave as either a systolic array (for general matrix multiplication) or as a pipelined SIMD lane with cross-lane reduction, supporting BF16, FP32 and INT32 numerics together with a tail-section that implements transcendental functions and stochastic rounding.[^4] **Pattern Memory Units (PMUs)** are banked SRAM scratchpads of programmable depth, paired with their own address-generation ALU, banking-and-predication logic and a data-alignment unit that performs transpose and other tensor reshaping in-place without round-tripping through DRAM.[^4]

The SN40L socket aggregates **1,040 PCUs and 1,040 PMUs** distributed across two identical accelerator dies that are joined edge-to-edge via the 2.5D CoWoS interposer, yielding the published peak of 638 BF16 TFLOPS per socket.[^4][^7] Compared with the prior-generation SN30, the SN40L contains about 18.8 percent fewer functional units but achieves comparable raw throughput thanks to the 7 nm-to-5 nm shrink and an approximately 12 percent clock-speed lift.[^22]

### Reconfigurable Dataflow Network

PCUs and PMUs are stitched together by a **Reconfigurable Dataflow Network (RDN)**: a mesh-based packet-switched interconnect with three physically separate fabrics carrying vector data, scalar data, and control packets respectively.[^4] The RDN supports multi-cast routing, dynamic packet ordering using sequence identifiers, and is what allows the compiler to fuse hundreds of operators into a single dataflow kernel without committing intermediate tensors to off-chip memory.[^4][^21]

### Address Generation and Coalescing Unit (AGCU)

To bridge the on-chip fabric to the rest of the system, the SN40L incorporates **Address Generation and Coalescing Units (AGCUs)** that mediate accesses to local HBM and DDR, host memory across PCIe, and to other RDUs in a node.[^4] The AGCU implements a peer-to-peer protocol enabling direct, point-to-point reads and writes between sockets without traversing host memory or the DDR tier, which SambaNova reports is essential for sharding very large models across the 16 sockets of a SambaRack.[^4]

### Spatial kernel fusion and Composition of Experts

The architectural payoff of the dataflow model is a programming style SambaNova calls **streaming spatial kernel fusion**. Because the compiler can lay an entire transformer decoder block (and often whole sequences of blocks) onto the fabric as a single dataflow graph, the chip avoids the per-kernel launch overhead, register-file traffic, and on-chip-cache pressure that limits GPU kernel fusion. SambaNova's own measurements report 2× to 13× speed-ups on individual operator fusion microbenchmarks versus a hand-tuned GPU baseline.[^4]

The SN40L's three-tier memory was, in turn, designed to make a model-serving pattern that SambaNova calls **Composition of Experts (CoE)** practical: large numbers of independently-trained dense models (each typically 7 billion to 70 billion parameters) are stored in DDR, paged into HBM on demand, and routed per-prompt by a lightweight gating model that decides which "expert" model to invoke.[^4][^23] This is distinct from the more familiar [Mixture of Experts (MoE)](/wiki/mixture_of_experts) approach, in which a single trained network contains sparsely-activated expert sublayers; in CoE, the experts are entirely separate, separately-trained models, and SambaNova reports its Samba-1 product packaged more than fifty such experts into a 1.3-trillion-parameter logical model.[^23][^24]

## Three-tier memory architecture

The SN40L's defining hardware feature is a **three-tier memory hierarchy** that combines large on-chip SRAM, on-package HBM3, and direct-attached DDR5 within a single socket.[^4][^25] Earlier SambaNova RDUs had only SRAM and DDR; the addition of HBM was the principal motivation for the "L" variant's existence.[^25]

| Tier | Capacity per socket | Implementation | Aggregate bandwidth |
|---|---|---|---|
| On-chip SRAM | 520 MiB | Distributed across 1,040 PMUs | Hundreds of TB/s |
| HBM3 | 64 GiB | Co-packaged via CoWoS-S interposer | ~2 TB/s |
| DDR5 DRAM | up to 1.5 TiB | Pluggable DIMMs, direct-attached | >1 TB/s aggregate |

Source: SambaNova arXiv whitepaper (May 2024).[^4]

### Tier 1: 520 MiB of on-chip SRAM

The 520 MiB of on-die SRAM in each SN40L socket is, at the time of the chip's launch, the largest on-chip memory in any commercial dataflow or AI ASIC other than wafer-scale designs such as the [Cerebras WSE-3](/wiki/cerebras_wse_3).[^25][^4] It is not arranged as a single L1 or L2 cache; instead, each PMU contains a programmable scratchpad bank that the SambaFlow compiler explicitly partitions among intermediate tensors and weights.[^4] This high-density, banked SRAM is what makes streaming kernel fusion viable: rather than spilling intermediate activations to HBM between operators (as a GPU must), the SN40L can keep them resident in the local PMU bank that produced them, then forward them via the RDN to the next consuming PCU on the next clock-domain.[^4][^21]

### Tier 2: 64 GiB of co-packaged HBM3

Above the SRAM sits **64 GiB of HBM3** integrated onto the package via the 2.5D CoWoS-S interposer.[^4][^7] In SambaNova's own framing, HBM acts as a **large L3 cache** rather than as primary model storage; the company architects deployments so that HBM holds the activations and currently active expert weights for a workload, while persistent storage of model weights happens in DDR.[^22][^25] HBM3 was a deliberate choice for the SN40L: SambaNova's earlier chips used larger but slower DDR-only configurations, and the addition of HBM was driven by the bandwidth requirements of large-language-model decode steps in which the entire context tensor must be re-read every token.[^25]

### Tier 3: up to 1.5 TiB of DDR5

The third tier is up to **1.5 TiB of direct-attached DDR5 DRAM per socket** in pluggable DIMM form factor.[^4][^3] This is an unusually large attached-DRAM capacity for any AI accelerator, and it is the architectural pivot point of SambaNova's CoE pitch: at 1.5 TB per socket and twelve sockets per typical eight-RDU DataScale node plus host, a single rack can hold hundreds of full-precision dense models without paging from disk.[^4][^25] SambaNova reports that the DDR-to-HBM transfer rate exceeds 1 TB/s in aggregate within a single SN40L node, which it cites as the key enabler for sub-millisecond model swap latency in CoE serving.[^4]

The hierarchy as a whole is what SambaNova has marketed as a solution to the **"AI memory wall"**: by trading some peak FLOPS density for more on-package and direct-attached capacity, the SN40L can keep the entire weight set of even multi-hundred-billion-parameter dense LLMs within a single node, eliminating the cross-node communication that bottlenecks GPU deployments at similar parameter counts.[^4]

## Manufacturing and physical specifications

The SN40L is **fabricated on TSMC's N5 (5 nm) process** and integrated using TSMC's 2.5D **CoWoS-S** advanced packaging.[^7][^25] Each socket contains two identical 600 mm² accelerator dies plus the HBM3 stacks on a common silicon interposer.[^7] SambaNova reports a per-socket transistor count of approximately **102 billion**, comparable in transistor density to NVIDIA's contemporaneous data-center GPUs, with **520 MB of on-chip SRAM** implemented in high-density memory cells.[^7]

The published headline performance of each socket is **638 BF16 TFLOPS**, with the same units also citing 640 BF16 TFLOPS and an FP16 figure of approximately 688 TFLOPS depending on operating mode.[^4][^7][^26] The chip notably lacks dedicated low-precision accelerators for INT8 and FP8, an architectural choice consistent with SambaNova's emphasis on inference at "no-quantization" BF16 / FP16 / mixed precision for accuracy-sensitive enterprise workloads.[^26][^14] Estimated thermal-design power for the accelerator socket is reported at roughly 600 W in third-party coverage, though SambaNova itself emphasises rack-level power: a 16-socket SambaRack averages about 10 kW under inference load, an order of magnitude lower than typical 140-kW NVIDIA HGX racks of similar parameter capacity.[^8][^26]

## DataScale and SambaRack system configurations

The SN40L is not sold as a discrete chip but as part of integrated rack systems originally branded **SambaNova DataScale** and, since 2024, increasingly branded **SambaRack**.[^27][^9]

The canonical building block is the **DataScale SN40L-2** module, which integrates two RDUs together with their host components.[^27] Four such modules combine into the eight-RDU **SN40L node**, the company's reference unit for serving a five-trillion-parameter Composition-of-Experts workload.[^3][^9] The largest standard configuration is the **SambaRack SN40L-16**, a single 19-inch rack containing 16 RDU sockets across eight SN40L-2 modules, which is the platform SambaNova quotes for its world-record Llama 3.1 405B and DeepSeek-R1 671B inference benchmarks.[^9][^14][^28]

A SambaRack SN40L-16 therefore aggregates approximately **8 GB of on-chip SRAM, 1 TB of HBM3, and up to 24 TB of attached DDR5** across its 16 sockets, sufficient to host hundreds of distinct foundation models simultaneously.[^27][^4] Air cooling at ~10 kW per rack allows the systems to be deployed in standard enterprise data-center facilities without specialised liquid-cooling retrofits.[^8][^29]

## Software stack: SambaFlow, SambaStudio, SambaCloud

The SN40L would be unusable without its compiler stack, because there is no public ISA for the RDU; users do not write CUDA-style kernels but rather submit standard model graphs that SambaFlow lowers to RDU configuration bitstreams.[^30][^31]

**SambaFlow** is the core compiler and runtime. It ingests model definitions from [PyTorch](/wiki/pytorch) or [TensorFlow](/wiki/tensorflow), traces them into a dataflow graph, and produces a binary "PEF" file that contains the spatial mapping of every operator onto specific PCUs and PMUs as well as the routing configuration for the RDN.[^30] Within SambaFlow, the compiler performs operator fusion, tiling, weight partitioning and inter-socket sharding automatically; SambaNova emphasises that an entire transformer decoder layer is typically compiled as a single kernel call, eliminating the per-kernel launch overhead that limits GPU performance on long generation sequences.[^30][^4]

**SambaStudio** is a graphical, browser-based platform layered on top of SambaFlow that gives data scientists a model-management workflow: dataset upload, fine-tuning, deployment, and inspection of running model endpoints without leaving the GUI.[^30][^32]

**SambaCloud** (originally announced as "SambaNova Cloud") is the public-internet inference service launched on September 10, 2024.[^10] It exposes Meta's Llama-family open-source models, DeepSeek-R1, and others via an OpenAI-compatible REST API hosted on SambaRack SN40L-16 nodes.[^10][^14] SambaCloud comes in Free, Developer, and Enterprise tiers, with the Enterprise tier offering a SambaNova-managed instance of the cloud stack deployable inside a customer's own data centre under what SambaNova calls **SambaManaged**, which was launched in July 2025.[^33] An adjacent product, the original **SambaNova Suite** announced on February 28, 2023, is the full-stack offering combining DataScale hardware, SambaFlow and a curated set of pre-trained open-source models for enterprise on-premises deployment.[^34]

## Performance and benchmarks

The SN40L's most heavily-publicised benchmarks have been generative-model token-throughput records measured at full 16-bit precision on standard open-source LLMs.

On **Llama 3.1 405B**, a 16-socket SambaRack SN40L-16 first reached **132 output tokens/second/user** at SambaCloud's September 2024 launch, then **129 output tokens/second/user** in a later SambaNova publication, and is reported in the company's blog and arXiv papers at peak rates exceeding 100 tokens/s/user across batch sizes up to four concurrent requests.[^14][^10][^4] A separate Artificial Analysis-verified test reported **114 tokens/s** on the same model.[^14] At the time these results were published, the fastest comparable GPU-based service was measured by Artificial Analysis at approximately 72 tokens/s on the same model.[^14]

On **Llama 3.1 70B**, the same 16-socket node reached **457 to 461 output tokens/second**, while on the smaller **Llama 3.1 8B** SambaNova has cited rates above **1,042 tokens/second**.[^14][^10][^35] All three measurements are reported at native BF16 precision without quantisation.[^14]

For the **Composition of Experts** workload pattern that the SN40L was specifically architected to accelerate, SambaNova's arXiv whitepaper reports CoE serving speed-ups of **3.7× over an NVIDIA DGX H100** and **6.6× over an NVIDIA DGX A100** in their own measurements, together with **15× to 31× faster model-switching latency** versus the same DGX systems and **up to 19× lower machine footprint** for the same aggregate parameter count.[^4]

The SN40L has not, as of the time of writing, posted official entries to the [MLPerf](/wiki/mlperf) Inference suite; this is in line with SambaNova's stated preference for end-to-end token-rate benchmarks on real LLMs over the historically training-centric MLPerf metrics.

## Customers and deployments

Public customer references for the SN40L span enterprise, sovereign and scientific deployments.

**SoftBank** has been a multi-generation customer and was, separately, the lead investor in SambaNova's Series D in 2021.[^18][^11] SoftBank has used SambaNova DataScale systems as part of its Japan-based AI computing platform and was named as the first announced deployment partner for SambaNova's next-generation SN50 RDU.[^11]

**Saudi Aramco**, the Saudi national oil company, signed a memorandum of understanding with SambaNova to deploy on-site SN40L systems for **Metabrain**, an internal large language model trained on roughly 90 years of Aramco's operational and exploration data and used for industrial AI applications.[^12]

**Analog Devices** announced on January 10, 2024 that it would deploy the SambaNova Suite enterprise-wide to support generative AI applications across the global semiconductor company, making ADI one of the first publicly-named industrial enterprise users of the SN40L platform.[^36]

In the U.S. national-laboratory complex, **Argonne National Laboratory** announced on November 18, 2024 (at the SC24 supercomputing conference in Atlanta) that the Argonne Leadership Computing Facility had deployed a new **16-RDU SambaNova DataScale SN40L** cluster as part of its AI Testbed, available to the U.S. scientific community via project allocations and the National AI Research Resource Pilot.[^13] Argonne uses the system to support inference for projects including the AuroraGPT foundation model and applications in drug discovery, climate science and brain mapping.[^13] **Oak Ridge** and **Lawrence Livermore National Laboratories** are also listed by SambaNova as customers, along with **Japan's RIKEN Center for Computational Science**, **OTP Bank**, **Accenture** and **NetApp**.[^12]

OVHcloud, the French sovereign-cloud provider, announced in late 2025 that it had selected SambaNova RDUs to power its **AI Endpoints** inference service.[^37]

## Pricing and business model

SambaNova has consistently declined to publish unit pricing for either the SN40L chip or the DataScale and SambaRack systems built around it; per-system prices are quoted privately and have been described in trade-press reporting as ranging from approximately **$500,000 to several million dollars** depending on the configuration.[^22][^38]

The company offers three main commercial constructs:

1. **CapEx purchase** of DataScale or SambaRack hardware for on-premises deployment, supplemented by SambaFlow software licences.[^38]
2. A **subscription model** under which customers pay an annual fee for use of DataScale hardware together with the **Dataflow-as-a-Service** software stack and pre-trained foundation models, removing the capital-acquisition step.[^38]
3. **SambaCloud** consumption-based access via OpenAI-compatible APIs in Free, Developer and Enterprise tiers, billed per token for paid tiers.[^10][^33]

Beginning in July 2025, **SambaManaged** added a turnkey hybrid model in which SambaNova operates a private SambaCloud instance inside the customer's own facility, blending the security profile of on-premises hardware with the operational economics of cloud consumption pricing.[^33]

## Company status and strategic shift (2024-2026)

The SN40L was launched into a market that subsequently shifted decisively in favour of inference-only workloads dominated by NVIDIA, and SambaNova's business model was restructured in response.

In **April 2025**, SambaNova laid off **77 employees, about 15 percent of its global workforce**, in conjunction with a strategic pivot away from training workloads and toward an inference-cloud business model. The company filed two WARN notices in California and Washington.[^39][^40] Chief executive Rodrigo Liang described the move in subsequent interviews as a response to the realisation that "the industry has shifted to primarily inference" and that the AI inference market would dwarf the training market in dollar volume.[^41]

Subsequent reporting from October and December 2025 indicated that SambaNova had retained an investment bank and entered exclusive acquisition negotiations with Intel at an enterprise value of approximately **$1.6 billion** including debt - a steep mark-down from the $5.1 billion 2021 valuation.[^42][^43] The deal ultimately did not close. Instead, on February 24, 2026, SambaNova announced a **$350 million Series E** funding round co-led by Vista Equity Partners and Cambium Capital with continued participation from Intel Capital, the Qatar Investment Authority, Saudi Arabia's sovereign wealth fund, GV, Battery Ventures, T. Rowe Price and BlackRock.[^44][^45] The round implies a post-money valuation of approximately **$2.2 billion**, materially below the 2021 peak.[^44]

The same February 2026 announcement introduced the **SN50**, SambaNova's next-generation RDU and the successor to the SN40L. The SN50 retains the same three-tier memory philosophy with 64 GB of HBM, 432 MB of on-chip SRAM (slightly smaller than the SN40L's 520 MB) and **256 GB to 2 TB** of DDR5 per socket, while delivering an advertised 5× peak performance and 4× network bandwidth relative to the SN40L.[^45][^46] SambaNova has stated that the SN50 will begin shipping in the second half of 2026, with **SoftBank** named as the first customer for SN50-powered low-latency inference services in Asia-Pacific.[^45][^46] An Intel collaboration combining Xeon CPUs with SambaNova RDUs was announced simultaneously.[^44]

For at least the next product cycle, the SN40L will remain the chip running both SambaCloud's public endpoints and the great majority of installed customer systems, while the SN50 ramps to volume production.

## Reception and analysis

Trade-press reception of the SN40L at launch was broadly positive, with reviewers focusing on three architectural choices that distinguish it from contemporary GPU and ASIC competitors.

First, the **three-tier memory hierarchy** was widely highlighted as the chip's defining contribution. ServeTheHome described the system as the first commercial accelerator to combine large on-chip SRAM, on-package HBM and direct-attached pluggable DDR in a single coherent address space.[^25] The Next Platform noted that this architecture amounts to an explicit rejection of the GPU industry's assumption that AI weight sets must fit in HBM, and observed that the SN40L's eight-socket node can hold approximately 71 separate Llama-2-70B-class models simultaneously without paging.[^22]

Second, the **streaming dataflow programming model** was characterised by analysts as both the chip's central strength and its central commercial risk, since the SambaFlow compiler is the only path to using the silicon and the chip cannot run hand-written CUDA-style kernels.[^21][^31] SambaNova has argued in response that the compiler-only model is precisely what allows automatic spatial fusion of arbitrarily complex graphs and that customers benefit from being insulated from low-level optimisation work.[^4][^21]

Third, analysts noted the chip's **emphasis on full-precision inference**. By focusing the compute units on BF16 and FP32 (with no first-class INT8 or FP8 path), the SN40L positions itself for enterprise and scientific customers who want to deploy open-source models without the accuracy regressions associated with aggressive quantisation, a positioning that has been validated by its choice of Llama 3.1 405B "at full 16-bit precision" as its headline benchmark.[^14][^26]

Among the SN40L's most direct architectural competitors, the [Groq LPU](/wiki/groq_lpu) targets the same low-latency inference market with a different trade-off, prioritising deterministic single-stream latency over high on-package memory capacity; the [Cerebras WSE-3](/wiki/cerebras_wse_3) takes the opposite extreme to SambaNova by integrating an entire wafer of compute and SRAM with no on-package HBM; and the [Etched Sohu](/wiki/etched_sohu) takes the most extreme position by hard-wiring transformer inference into the silicon. The SN40L sits between these in design space: a programmable spatial dataflow with a balanced multi-tier memory rather than either fully model-specific silicon or a single-tier memory architecture.

## See also

- [SambaNova Systems](/wiki/sambanova)
- [NVIDIA H100](/wiki/nvidia_h100)
- [NVIDIA Blackwell B200](/wiki/nvidia_blackwell_b200)
- [Cerebras WSE-3](/wiki/cerebras_wse_3)
- [Groq LPU](/wiki/groq_lpu)
- [Etched Sohu](/wiki/etched_sohu)
- [Tenstorrent](/wiki/tenstorrent)
- [High Bandwidth Memory](/wiki/high_bandwidth_memory)
- [TSMC](/wiki/tsmc)
- [Mixture of Experts](/wiki/mixture_of_experts)
- [Llama 3.1](/wiki/llama_3_1)
- [MLPerf](/wiki/mlperf)

## References

[^1]: Business Wire (SambaNova press release), "SambaNova Unveils New AI Chip, the SN40L, Powering its Full Stack AI Platform," September 19, 2023. https://www.businesswire.com/news/home/20230919534495/en/SambaNova-Unveils-New-AI-Chip-the-SN40L-Powering-its-Full-Stack-AI-Platform
[^2]: HPCwire / AIwire, "SambaNova Unveils New AI Chip, the SN40L, Powering Its Full Stack AI Platform," September 19, 2023. https://www.hpcwire.com/aiwire/2023/09/19/sambanova-unveils-new-ai-chip-the-sn40l-powering-its-full-stack-ai-platform/
[^3]: SiliconANGLE, "SambaNova debuts self-configuring AI chip with 1,040 cores and high-speed memory," September 19, 2023. https://siliconangle.com/2023/09/19/sambanova-debuts-self-configuring-ai-chip-140-cores-high-speed-memory/
[^4]: Prabhakar, R., et al., "SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts," arXiv:2405.07518, May 13, 2024. https://arxiv.org/html/2405.07518v1
[^5]: The Next Platform, "SambaNova Tackles Generative AI With New Chip And New Approach," September 20, 2023. https://www.nextplatform.com/2023/09/20/sambanova-tackles-generative-ai-with-new-chip-and-new-approach/
[^6]: ServeTheHome, "SambaNova SN40L RDU for Trillion Parameter AI Models," 2023. https://www.servethehome.com/sambanova-sn40l-rdu-for-trillion-parameter-ai-models/
[^7]: IEEE Xplore (ISSCC 2025), "16.4: SambaNova SN40L: A 5nm 2.5D Dataflow Accelerator with Three Memory Tiers for Trillion Parameter AI," February 16, 2025. https://ieeexplore.ieee.org/document/10904578
[^8]: SambaNova, "SN40L RDU | Next-Gen AI Chip for Inference at Scale" (product page). https://sambanova.ai/products/rdu-ai-chips
[^9]: SambaNova, "SambaRack SN40L-16" datasheet (2025). https://sambanova.ai/hubfs/SambaRack%20data%20sheet%20template%2007%2009%2025.pdf
[^10]: The Register, "SambaNova Cloud serves up Llama 3.1 405B at 100+ token/s," September 10, 2024. https://www.theregister.com/2024/09/10/sambanovas_inference_cloud/
[^11]: Data Center Dynamics, "SambaNova unveils SN50 AI chip, Intel partnership, and $350m fundraise," February 24, 2026. https://www.datacenterdynamics.com/en/news/sambanova-unveils-sn50-ai-chip-intel-partnership-and-350m-fundraise/
[^12]: Arab News, "Startup of the Week - US-based SambaNova sets sights on Saudi market." https://www.arabnews.com/node/2571362/business-economy
[^13]: SambaNova / Argonne Leadership Computing Facility, "Argonne National Laboratory Deploys a New SambaNova Inference-Optimized Cluster to Support AI-Driven Science," November 18, 2024. https://www.alcf.anl.gov/news/argonne-national-laboratory-deploys-new-sambanova-inference-optimized-cluster-support-ai
[^14]: SambaNova, "Llama 3.1 405B 4X faster on SambaNova | World Record" (blog). https://sambanova.ai/blog/speed-record-on-llama-3.1-405b
[^15]: SambaNova, "Meet Our Talented AI Team" (corporate team page). https://sambanova.ai/about/ai-team
[^16]: University of Michigan CSE, "SambaNova, founded by alumnus Kunle Olukotun, emerges from stealth mode with AI-accelerated HPC system," 2018. https://cse.engin.umich.edu/stories/sambanova-founded-by-alumnus-kunle-olukotun-emerges-from-stealth-mode-with-ai-accelerated-hpc-system
[^17]: Stanford HAI / Stanford Data Science, "Christopher Re" faculty profile. https://datascience.stanford.edu/people/chris-re
[^18]: Sacra, "SambaNova Systems valuation, funding & news." https://sacra.com/c/sambanova-systems/
[^19]: SambaNova, "Why SambaNova's SN40L Chip Is the Best for Inference" (blog). https://sambanova.ai/blog/sn40l-chip-best-inference-solution
[^20]: SambaNova, "Accelerated Computing with a Reconfigurable Dataflow Architecture" (whitepaper). https://sambanova.ai/hubfs/23945802/SambaNova_Accelerated-Computing-with-a-Reconfigurable-Dataflow-Architecture_Whitepaper_English-1.pdf
[^21]: TechTarget, "SambaNova AI launches new chip: the SN40L." https://www.techtarget.com/searchenterpriseai/news/366552594/SambaNova-AI-launches-new-chip-the-SN40L
[^22]: The Next Platform, "SambaNova Tackles Generative AI With New Chip And New Approach" (process and transistor comparison versus SN30). https://www.nextplatform.com/2023/09/20/sambanova-tackles-generative-ai-with-new-chip-and-new-approach/
[^23]: SambaNova, "Samba-1 datasheet." https://sambanova.ai/hubfs/White%20papers/Samba-1-Datasheet.pdf
[^24]: VentureBeat, "SambaNova debuts 1 trillion parameter Composition of Experts model for enterprise gen AI." https://venturebeat.com/ai/sambanova-debuts-1-trillion-parameter-composition-of-experts-model-for-enterprise-gen-ai
[^25]: EE Times, "SambaNova Adds HBM for LLM Inference Chip." https://www.eetimes.com/sambanova-adds-hbm-for-llm-inference-chip/
[^26]: TechInsights, "SambaNova Releases Fourth-Gen Chip." https://www.techinsights.com/blog/sambanova-releases-fourth-gen-chip
[^27]: SambaNova, "SambaNova DataScale" datasheet. https://sambanova.ai/hubfs/23945802/downloads/Product%20Collateral/SambaNova_DataScale_Data-Sheet_10.08.21EN.pdf
[^28]: SambaNova, "Why SambaNova's SN40L Chip Is the Best for Inference" (DeepSeek R1 671B and Llama 4 Maverick). https://sambanova.ai/blog/sn40l-chip-best-inference-solution
[^29]: SiliconANGLE, "Robust AI infrastructure powered by SambaNova SN40L," June 23, 2025. https://siliconangle.com/2025/06/23/sambanova-sn40l-advances-robust-ai-infrastructure-efficiency-roboticsaiinfrastructure/
[^30]: SambaNova Documentation, "SambaNova resources." https://docs.sambanova.ai/resources/latest/index.html
[^31]: SambaNova Documentation (legacy), "Compiler optimization modes." https://docs-legacy.sambanova.ai/developer/latest/compiler-o1.html
[^32]: SambaNova Documentation, "SambaNova glossary." https://docs.sambanova.ai/resources/latest/glossary.html
[^33]: SambaNova, "Introducing SambaManaged: A Turnkey Path to AI for Data Centers" (July 2025). https://sambanova.ai/blog/introducing-sambamanaged-a-turnkey-path-to-ai-for-data-centers
[^34]: SambaNova, "SambaNova Delivers Generative AI Capabilities to the Enterprise: the New SambaNova Suite," February 28, 2023. https://sambanova.ai/press/sambanova-delivers-generative-ai-capabilities-to-the-enterprise
[^35]: Analytics India Magazine, "SambaNova's Llama 3.1 405B Model Hits 114 Tokens Per Second, Setting Speed Record." https://analyticsindiamag.com/ai-news-updates/sambanovas-llama-3-1-405b-model-hits-114-tokens-per-second-setting-speed-record/
[^36]: Analog Devices, "Analog Devices Deploys SambaNova Suite to Facilitate Breakthrough Generative AI Capabilities at Enterprise Scale," January 10, 2024. https://www.analog.com/en/about-adi/news-room/press-releases/2024/1-10-2024-adi-deploys-sambanova-suite.html
[^37]: OVHcloud, "OVHcloud chooses SambaNova to optimize its AI Endpoints inference service." https://corporate.ovhcloud.com/en/newsroom/news/sambanova/
[^38]: SambaNova, "SambaNova offers flexibility and choice with subscription pricing." https://sambanova.ai/blog/subscription-pricing/
[^39]: Data Center Dynamics, "SambaNova lays off 77 employees as company pivots focus from training to inference," April 2025. https://www.datacenterdynamics.com/en/news/sambanova-lays-off-77-employees-as-company-pivots-focus-from-training-to-inference/
[^40]: EE Times, "SambaNova Lays Off 15% of Workforce To Refocus on Inference," April 2025. https://www.eetimes.com/sambanova-lays-off-15-of-workforce-to-refocus-on-inference/
[^41]: EE Times, "SambaNova Shifts To Inference, Courts Cloud Customers." https://www.eetimes.com/sambanova-shifts-to-inference-but-leaves-cloud-to-customers/
[^42]: Bloomberg, "Intel Is Said to Near $1.6 Billion Deal for Chip Firm SambaNova," December 12, 2025. https://www.bloomberg.com/news/articles/2025-12-12/intel-nears-1-6-billion-deal-for-ai-chip-startup-sambanova
[^43]: SiliconANGLE, "Report: Intel could acquire inference chip startup SambaNova for $1.6B," December 12, 2025. https://siliconangle.com/2025/12/12/report-intel-acquire-inference-chip-startup-sambanova-1-6b/
[^44]: The Register, "SambaNova raises $350M with Intel backing," February 24, 2026. https://www.theregister.com/2026/02/24/sambanova_intel_funding/
[^45]: Intel Capital, "SambaNova Unveils Fastest Chip for Agentic AI, Collaborates with Intel, and Raises $350M+." https://www.intelcapital.com/sambanova-unveils-fastest-chip-for-agentic-ai-collaborates-with-intel-and-raises-350m/
[^46]: HPCwire, "SambaNova Eyes 10-Trillion Parameter Models for Agentic AI with New Chip," February 24, 2026. https://www.hpcwire.com/2026/02/24/sambanova-eyes-10-trillion-parameter-models-for-agentic-ai-with-new-chip/

