d-Matrix Corsair


Updated Jul 16, 2026


Corsair is the first commercial AI accelerator product from d-Matrix, a Silicon Valley AI inference hardware startup based in Santa Clara, California. Announced on November 19, 2024 at the SC24 supercomputing conference in Atlanta, Corsair is a PCIe Gen5 accelerator card built around the company's Digital In-Memory Computing (DIMC) architecture, a design that tightly couples SRAM memory with multiply-accumulate logic to address the so-called memory wall that limits large language model inference on conventional GPU hardware.^[1] The product targets generative AI inference workloads in datacenters, with d-Matrix claiming up to 10 times faster interactive token-generation speed, 3 times better performance per total cost of ownership, and 3 times greater energy efficiency compared with leading GPU-based inference platforms.^[1]

The Corsair platform combines eight 6 nm chiplets per card with 2 GB of on-chip SRAM operating at roughly 150 TB/s of aggregate bandwidth, alongside up to 256 GB of off-chip LPDDR5X capacity memory.^[4] The architecture natively supports block floating point numerical formats, now standardized under the Open Compute Project as MX (Microscaling), which enable efficient 4-bit and 8-bit inference without significant accuracy loss. Corsair is delivered with d-Matrix's Aviator software stack, which is built on open-source components such as MLIR, PyTorch, Triton DSL, and OpenBMC.^[2]

Background and company history

d-Matrix was founded in 2019 by Sid Sheth and Sudeep Bhoja, who had previously worked together at Inphi (acquired by Marvell) and Broadcom.^[13] Sheth serves as President and CEO, and Bhoja as Chief Technology Officer. The pair set out to build an AI inference platform optimized specifically for transformer-based models, betting early that generative AI workloads would be dominated by inference economics rather than training throughput.

The company's silicon roadmap evolved across multiple generations of test chips before Corsair. Nighthawk, taped out in 2020, was d-Matrix's first chiplet prototype demonstrating SRAM-based in-memory compute. Jayhawk followed in 2021 as a companion chiplet that added high-speed chiplet-to-chiplet interconnect, allowing multiple Nighthawk-class compute tiles to scale up and scale out. Jayhawk II, which followed in 2023, extended these capabilities and demonstrated a working transformer inference pipeline. Corsair represents the commercialization of this multi-generation chiplet research.^[13]

d-Matrix raised an initial $44 million Series A in 2022, followed by a $110 million Series B in 2023 that was led by Temasek with participation from Microsoft's M12 venture fund, Playground Global, and other investors.^[14]^[19] On November 12, 2025, d-Matrix closed an oversubscribed $275 million Series C round at a $2 billion post-money valuation, bringing total capital raised to roughly $450 million.^[5]^[7] The Series C was co-led by Bullhound Capital, Triatomic Capital, and Temasek, with participation from the Qatar Investment Authority, EDBI, M12, Nautilus Venture Partners, Industry Ventures, and Mirae Asset.^[6]

Round	Date	Amount	Lead investors
Series A	April 2022	$44 million	M12 (Microsoft's venture fund)
Series B	September 2023	$110 million	Temasek
Series C	November 2025	$275 million	Bullhound Capital, Triatomic Capital, Temasek

Architecture

Digital In-Memory Computing

The core innovation in Corsair is d-Matrix's Digital In-Memory Computing (DIMC) approach. Unlike conventional accelerators that fetch weights and activations from external HBM or DDR memory through a relatively narrow memory interface, DIMC embeds compute directly within the SRAM macros. Multiply-accumulate operations execute next to the bit cells that store model weights, eliminating most of the data movement that dominates power consumption and latency in GPU inference.^[3]

The DIMC implementation in Corsair is digital rather than analog, distinguishing it from prior analog in-memory compute startups such as Mythic. d-Matrix argues that digital DIMC retains the deterministic precision needed for production deployment of transformer models, while still delivering the bandwidth and energy advantages of integrating compute into the memory array. Each DIMC tile inside Corsair can perform a 64 by 64 INT8 matrix multiply, or a 64 by 128 INT4 matrix multiply, in a single cycle.^[3]
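As a rough illustration of the arithmetic involved, the sketch below multiplies a 64 by 64 INT8 weight matrix by a 64-element INT8 activation vector and accumulates the products in wider integers. It models the numerics only. The helper name `int8_tile_matvec`, the matrix-vector shape, and the software loop are assumptions made for illustration; the source does not describe how a DIMC tile schedules the operation internally.

```python
import random

TILE = 64  # tile edge quoted in the text for INT8 operands
INT8_MIN, INT8_MAX = -128, 127


def int8_tile_matvec(weights, activations):
    """Multiply a TILE x TILE INT8 matrix by a TILE-long INT8 vector.

    Hypothetical helper: it models the arithmetic only, not the hardware.
    """
    assert len(weights) == TILE and len(activations) == TILE
    assert all(len(row) == TILE for row in weights)
    assert all(INT8_MIN <= v <= INT8_MAX for row in weights for v in row)
    assert all(INT8_MIN <= v <= INT8_MAX for v in activations)
    # Each output sums 64 products of 8-bit values, so it needs a wider
    # accumulator than the 8-bit operands.
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


# Largest possible magnitude of one output: 64 * (-128) * (-128) = 2**20.
MAX_ABS_OUTPUT = TILE * INT8_MIN * INT8_MIN

rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [[rng.randint(INT8_MIN, INT8_MAX) for _ in range(TILE)] for _ in range(TILE)]
activations = [rng.randint(INT8_MIN, INT8_MAX) for _ in range(TILE)]
outputs = int8_tile_matvec(weights, activations)
print(len(outputs), max(abs(v) for v in outputs) <= MAX_ABS_OUTPUT)
```

The worst-case output magnitude of 2**20 fits comfortably in a 32-bit accumulator, which is one reason integer multiply-accumulate arrays can stay exact and deterministic.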

Chiplet design

Corsair is built from chiplets fabricated on TSMC's 6 nm process. Each Corsair card carries two compute packages with four chiplets each, for a total of eight chiplets per card. The chiplets are connected in an all-to-all topology that supports roughly 1 TB/s of die-to-die bandwidth.^[3]

Within each chiplet, the compute hierarchy is organized as follows:

Level	Component	Role
Chiplet	4 Quads	Top-level compute clusters
Quad	4 Slices, 1 RISC-V control core, 1 Dispatch Engine	Coordinates execution within a quad
Slice	DIMC cores, SIMD cores, Data Reshape Engine	Executes matrix and vector operations

A full Corsair card contains 2,048 DIMC cores across its eight chiplets, which works out to 16 DIMC cores in each of the 128 slices implied by the hierarchy above.^[3] The RISC-V control cores, together with the Dispatch Engines, schedule operations onto the DIMC arrays and manage data layout transformations through the Data Reshape Engines.
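The counts in the hierarchy can be checked with a few lines of arithmetic. The chiplet, quad, and slice counts and the 2,048-core total come from the figures above; the number of DIMC cores per slice is not stated in the source and is inferred here by division, so it should be read as a derived value rather than a published specification.

```python
CHIPLETS_PER_CARD = 8        # two packages of four chiplets each
QUADS_PER_CHIPLET = 4
SLICES_PER_QUAD = 4
RISCV_CORES_PER_QUAD = 1
DIMC_CORES_PER_CARD = 2048   # card-level total quoted in the text

quads_per_card = CHIPLETS_PER_CARD * QUADS_PER_CHIPLET
slices_per_card = quads_per_card * SLICES_PER_QUAD
riscv_cores_per_card = quads_per_card * RISCV_CORES_PER_QUAD

# Inferred, not sourced: how many DIMC cores each slice must hold.
dimc_cores_per_slice, remainder = divmod(DIMC_CORES_PER_CARD, slices_per_card)

print(f"quads per card:       {quads_per_card}")
print(f"slices per card:      {slices_per_card}")
print(f"RISC-V cores/card:    {riscv_cores_per_card}")
print(f"DIMC cores per slice: {dimc_cores_per_slice} (remainder {remainder})")
```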

Memory hierarchy

Corsair uses a two-tier memory model designed to keep frequently-accessed model state on-chip while still supporting large capacity for full model weights:

Tier	Capacity	Bandwidth	Purpose
Performance Memory (SRAM)	2 GB per card	~150 TB/s	Active weights, KV cache hot data
Capacity Memory (LPDDR5X)	Up to 256 GB per card	Per-chiplet LPDDR5X interface	Full model weights, large KV caches

The choice of LPDDR5X over HBM is significant. HBM has been the standard external memory for high-end AI accelerators for years, but the chronic supply constraints, cost, and packaging complexity of HBM have made it a bottleneck for new entrants.^[8] d-Matrix uses commodity mobile-class LPDDR5X soldered around each chiplet, trading raw external bandwidth for the much higher on-chip SRAM bandwidth delivered by DIMC. The bet is that for transformer inference, the right answer is to keep more state in SRAM and use external DRAM only for capacity, rather than streaming through HBM at every layer.^[4]
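The reasoning behind this bet can be sketched with a bandwidth-only lower bound. In single-batch decoding, every weight in the active working set is read once per generated token, so the time per token cannot be lower than the working-set size divided by memory bandwidth. The 150 TB/s figure is the SRAM bandwidth quoted above. The 3.35 TB/s HBM3 figure is an outside assumption (the published bandwidth of an H100 SXM module), and the 1 GB working set is a hypothetical value chosen to fit in Corsair's SRAM. The estimate ignores compute time, KV cache traffic, and interconnect overhead.

```python
TB = 1e12
GB = 1e9

SRAM_BANDWIDTH = 150 * TB   # aggregate Corsair SRAM bandwidth from the text
HBM_BANDWIDTH = 3.35 * TB   # assumption: H100 SXM HBM3 bandwidth
WORKING_SET = 1 * GB        # hypothetical weight bytes read per token


def min_time_per_token_ms(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on decode latency if memory bandwidth is the only limit."""
    return weight_bytes / bandwidth_bytes_per_s * 1e3


sram_ms = min_time_per_token_ms(WORKING_SET, SRAM_BANDWIDTH)
hbm_ms = min_time_per_token_ms(WORKING_SET, HBM_BANDWIDTH)
ratio = hbm_ms / sram_ms

print(f"SRAM-resident: {sram_ms:.4f} ms per token")
print(f"HBM-streamed:  {hbm_ms:.4f} ms per token")
print(f"ratio:         {ratio:.1f}x")
```

The ratio depends only on the two bandwidths, which is why the advantage applies to whatever fits in SRAM and shrinks once weights must be fetched from LPDDR5X.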

Numerical formats

Corsair was among the earliest commercial accelerators to natively support block floating point formats. These formats share a single exponent across a small group of mantissas, providing most of the dynamic range of floating point at a fraction of the storage and compute cost. The Open Compute Project standardized this family of formats as MX (Microscaling) in 2023, with d-Matrix among the founding contributors.^[3]
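The shared-exponent idea can be shown with a simplified quantizer. The sketch below picks one power-of-two scale for a whole block and stores each value as a small signed integer mantissa. It is a minimal illustration of block floating point in general, not the bit layout of the OCP MX specification or of d-Matrix's hardware; the function names and the example block are hypothetical.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to signed integer mantissas sharing one exponent."""
    qmax = 2 ** (mantissa_bits - 1) - 1           # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # Smallest power-of-two scale that keeps the largest value within qmax.
    exponent = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** exponent
    mantissas = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return exponent, mantissas


def bfp_dequantize(exponent, mantissas):
    """Recover approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]


block = [0.5, -1.25, 3.0, 0.01]                   # hypothetical weights
exp8, man8 = bfp_quantize(block, mantissa_bits=8)
restored8 = bfp_dequantize(exp8, man8)
exp4, man4 = bfp_quantize(block, mantissa_bits=4)
restored4 = bfp_dequantize(exp4, man4)

print("8-bit:", exp8, man8, restored8)
print("4-bit:", exp4, man4, restored4)
```

Because the exponent is stored once per block, each element costs only its mantissa bits, and the rounding error per element is bounded by half the shared scale. Small values in a block dominated by a large one lose the most precision, which is why block sizes are kept small.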

Corsair supports the following native formats:

Format	Description	Use
MXINT8	8-bit block floating point	High-accuracy inference
MXINT4	4-bit block floating point	Maximum-throughput inference
INT8	Standard 8-bit integer	Legacy quantized models
INT4	Standard 4-bit integer	Aggressive quantization

Using MXINT4, Corsair achieves a peak compute throughput of 9,600 TFLOPS per card, which d-Matrix sometimes calls 9.6 PFLOPS. At MXINT8 the peak is 2,400 TFLOPS per card. These figures double when two cards are bridged into a single 16-chiplet domain.^[3]

Card-level integration

A Corsair card occupies a standard PCIe Gen5 x16 full-height full-length slot. The card draws between 275 W at 800 MHz and 550 W at the maximum 1.2 GHz boost frequency, depending on workload and configuration. Two cards can be combined into a single logical device using a passive DMX Bridge card, which physically connects adjacent PCIe slots and lets the two cards present 4 GB of SRAM, 4,096 DIMC cores, and 16 chiplets in an all-to-all configuration to the host.^[3]
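The bridged-pair figures follow from scaling the per-card numbers linearly. The sketch below encodes the per-card values quoted in this article and multiplies them by the number of cards. The `DeviceSpec` type and `combine` helper are hypothetical names used only for this illustration, and the linear scaling of LPDDR5X capacity is an inference from the "up to 256 GB per card" figure rather than a sourced pair-level number.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeviceSpec:
    chiplets: int
    dimc_cores: int
    sram_gb: int
    lpddr5x_gb: int
    mxint4_tflops: int
    mxint8_tflops: int


# Per-card figures as quoted in the article.
CARD = DeviceSpec(
    chiplets=8,
    dimc_cores=2048,
    sram_gb=2,
    lpddr5x_gb=256,
    mxint4_tflops=9600,
    mxint8_tflops=2400,
)


def combine(card: DeviceSpec, n_cards: int) -> DeviceSpec:
    """Scale every per-card resource linearly by the number of cards."""
    return DeviceSpec(**{name: value * n_cards for name, value in asdict(card).items()})


pair = combine(CARD, 2)  # two cards joined by a DMX Bridge
print(pair)
```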

The practical effect is that a standard 8-slot PCIe server can hold four bridged Corsair pairs, giving roughly 16 GB of SRAM, 2 TB of LPDDR5X, and 76,800 TFLOPS of peak MXINT4 compute (eight cards at 9,600 TFLOPS each) in a single host. This is the configuration that d-Matrix and partners such as Supermicro and GigaIO use for the company's reference inference servers.^[3]

Aviator software stack

Corsair ships with the Aviator software stack, a co-designed runtime and toolchain that exposes the underlying DIMC hardware to standard machine learning frameworks. Aviator is built on widely-used open-source components rather than a proprietary stack, which d-Matrix has emphasized as a deliberate strategy to lower the adoption barrier for customers comparing Corsair against the CUDA ecosystem.^[2]

The major components of Aviator are:

Component	Function
Compiler	MLIR-based code generator producing DIMC-optimized binaries
Model Factory	PyTorch template library of common generative models, pre-tuned for Corsair
Compressor	Quantization and pruning tools that convert FP16 or BF16 checkpoints to MX formats
Inference Engine	Distributed runtime that schedules requests across multiple cards, servers, and racks
BMC	OpenBMC-based platform management for the card and chassis

The Model Factory includes prebuilt configurations for popular open models including Llama 2 and 3, Mistral, Mixtral, and OpenAI-style transformer architectures. Customers can also bring their own PyTorch models, with Aviator handling the conversion to MX numerical formats and DIMC-friendly layouts through the Compressor and Compiler.

A distinguishing feature of Aviator is its native support for distributed inference across multiple Corsair cards. The Inference Engine treats a cluster of Corsair cards as a single logical accelerator, handling tensor parallelism, pipeline parallelism, and expert parallelism for mixture-of-experts models without requiring the developer to write custom collective communication code. This is particularly important for serving large models that exceed the capacity of a single card or a single bridged pair.^[2]
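To make the idea of tensor parallelism concrete, the sketch below splits the columns of a weight matrix across several shards, computes each partial result independently, and concatenates the pieces. It is a generic illustration in plain Python and does not use or represent the Aviator API; every function name here is hypothetical, and in a real deployment each shard would run on its own card with the final concatenation performed by a collective operation.

```python
def matvec(x, w):
    """Compute x @ w for a vector x (length k) and a matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, n_shards):
    """Partition the columns of w into n_shards equal, contiguous shards."""
    n_cols = len(w[0])
    assert n_cols % n_shards == 0, "columns must divide evenly across shards"
    width = n_cols // n_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(n_shards)]


def tensor_parallel_matvec(x, w, n_shards):
    """Column-parallel x @ w: each shard yields a slice of the output."""
    partials = [matvec(x, shard) for shard in split_columns(w, n_shards)]
    # Stand-in for an all-gather: join the per-shard output slices in order.
    return [value for part in partials for value in part]


x = [1, 0, -1]
w = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
single = matvec(x, w)
sharded = tensor_parallel_matvec(x, w, n_shards=2)
print(single, sharded, single == sharded)
```

Column-wise sharding needs no reduction across shards, only a gather, which is why it is a common first step when a layer's weights do not fit on one device.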

Performance

d-Matrix has published a series of performance claims for Corsair, primarily framed against NVIDIA H100 GPUs as the reference inference platform at the time of Corsair's launch.^[1]

Headline figures include:

Metric	Claimed result	Reference
Time per output token, Llama 3 70B	~2 ms	At single-batch interactive serving
Interactive speed advantage	Up to 10x	Versus H100 baseline
TCO per query	3x better	Versus H100-based inference
Energy per token	3x better	Versus H100-based inference
Peak MXINT4 throughput	9,600 TFLOPS	Per card
Peak MXINT8 throughput	2,400 TFLOPS	Per card

In Hot Chips 2025 presentations and white papers, d-Matrix has demonstrated Corsair sustaining interactive token-generation rates substantially above the ~30 to 60 tokens per second typical of well-tuned GPU inference for Llama 3 70B. Independent reviews from outlets including ServeTheHome, Chips and Cheese, and EE Times have generally validated the architectural advantages while noting that production validation on a wider variety of models is still underway.^[3]
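The claimed time per output token can be converted into a per-user token rate for comparison with the GPU range mentioned above. The 2 ms figure is d-Matrix's claim from the table, and the 30 to 60 tokens per second range is the article's characterization of tuned GPU serving; the calculation below simply relates the two and is not an independent measurement.

```python
def tokens_per_second(time_per_output_token_ms: float) -> float:
    """Convert per-token latency into a single-stream generation rate."""
    return 1000.0 / time_per_output_token_ms


CLAIMED_TPOT_MS = 2.0              # claimed for Llama 3 70B, single batch
GPU_TPS_LOW, GPU_TPS_HIGH = 30.0, 60.0

corsair_tps = tokens_per_second(CLAIMED_TPOT_MS)
speedup_min = corsair_tps / GPU_TPS_HIGH
speedup_max = corsair_tps / GPU_TPS_LOW

print(f"claimed rate: {corsair_tps:.0f} tokens/s")
print(f"implied speedup: {speedup_min:.1f}x to {speedup_max:.1f}x")
```

The implied range of roughly 8x to 17x is in the same ballpark as the "up to 10x" headline claim, with the exact multiple depending on which GPU baseline is assumed.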

NotebookCheck and other outlets reported Corsair benchmarking at roughly 9 times faster than an H100 on certain generative AI workloads, although direct comparisons depend heavily on batch size, sequence length, and quantization level.^[17] d-Matrix's emphasis on interactive single-batch latency rather than maximum batch-mode throughput is consistent with the company's positioning as an inference specialist optimized for chatbot, agent, and other real-time use cases.

Rack-scale and system integration

While Corsair began as a card-level product, d-Matrix has rapidly built out a rack-scale story to compete with vertically integrated systems from NVIDIA and other vendors. Two announcements in 2025 expanded the Corsair platform from a single card to a full datacenter solution.

JetStream

In September 2025, d-Matrix announced JetStream, a companion I/O accelerator card that provides ultra-low-latency device-initiated communication between Corsair accelerators across standard Ethernet networks.^[10]^[20] JetStream sits in the same server as Corsair and offloads collective operations from the host CPU, supporting accelerator-to-accelerator communication patterns directly over the network without going through host memory. This allows large models that exceed a single server's Corsair capacity to be sharded across many servers while preserving the interactive latency targets that Corsair is designed for.

SquadRack

On October 14, 2025, d-Matrix unveiled SquadRack, described as the industry's first standards-based rack-scale reference architecture purpose-built for AI inference.^[9] SquadRack is a collaboration with Arista, Broadcom, and Supermicro, combining:

Component	Vendor	Role
Corsair AI accelerators	d-Matrix	Inference compute
JetStream NICs	d-Matrix	Accelerator-to-accelerator networking
X14 AI server platforms	Supermicro	Host servers, 8 Corsair cards each
PCIe switches	Broadcom	Intra-node scale-up
Leaf Ethernet switches	Arista	Multi-node scale-out fabric

A single 8-node SquadRack configuration is rated to serve generative models up to roughly 100 billion parameters with interactive token-generation latency. SquadRack configurations are scheduled to be available for purchase through Supermicro starting in Q1 2026.^[9]
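One plausible reading of the 100-billion-parameter rating is that it roughly matches what fits in the rack's aggregate SRAM at 8-bit precision. The sketch below works through that arithmetic using the per-card figures from this article. The source does not state how the rating was derived, so this interpretation is an assumption, and the estimate ignores KV cache, activations, and the per-block exponent overhead of MX formats.

```python
NODES = 8                  # 8-node SquadRack configuration
CARDS_PER_NODE = 8         # Corsair cards per Supermicro server
SRAM_GB_PER_CARD = 2
LPDDR5X_GB_PER_CARD = 256


def weight_footprint_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


cards = NODES * CARDS_PER_NODE
rack_sram_gb = cards * SRAM_GB_PER_CARD
rack_lpddr5x_gb = cards * LPDDR5X_GB_PER_CARD

footprint_8bit = weight_footprint_gb(100e9, 8)
footprint_4bit = weight_footprint_gb(100e9, 4)

print(f"cards: {cards}, SRAM: {rack_sram_gb} GB, LPDDR5X: {rack_lpddr5x_gb} GB")
print(f"100B params at 8-bit: {footprint_8bit:.0f} GB, fits in SRAM: {footprint_8bit <= rack_sram_gb}")
print(f"100B params at 4-bit: {footprint_4bit:.0f} GB, fits in SRAM: {footprint_4bit <= rack_sram_gb}")
```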

GigaIO acquisition

In early 2026 d-Matrix acquired the data center business of GigaIO, a composable infrastructure company. The acquisition added GigaIO's FabreX-based PCIe fabric technology to d-Matrix's portfolio, complementing JetStream for high-bandwidth scale-up scenarios within a rack. GigaIO and d-Matrix had previously collaborated on a reference SuperNODE system that combined Corsair with GigaIO's fabric to demonstrate GPU-free inference at rack scale.^[15]

Comparison with competing inference platforms

The market for dedicated AI inference accelerators has grown crowded since 2023, with several startups taking distinct architectural approaches. Corsair occupies a particular niche emphasizing in-memory compute, chiplet-based scaling, and commodity LPDDR5X capacity memory.

Vendor	Architecture	On-chip memory	Off-chip memory	Form factor
d-Matrix Corsair	Digital In-Memory Compute, 8 chiplets	2 GB SRAM @ 150 TB/s	256 GB LPDDR5X	PCIe Gen5 card
Groq LPU	Deterministic dataflow, single core	230 MB SRAM @ 80 TB/s (GroqChip1)	None	Card and rack
Groq 3 LPU	Deterministic dataflow	500 MB SRAM @ 150 TB/s	None	Card and rack
Cerebras WSE-3	Wafer-scale	44 GB SRAM @ 21 PB/s	External MemoryX system	Full wafer system
SambaNova SN40L	Reconfigurable dataflow	520 MB SRAM	HBM3 + DDR	Rack-scale system
NVIDIA H100	GPU	~50 MB L2	80 GB HBM3	SXM5 / PCIe

Compared with Groq, d-Matrix offers an order of magnitude more on-chip SRAM per card, although Groq's deterministic single-core architecture has been validated at production scale across hundreds of thousands of deployed LPUs. Compared with Cerebras and SambaNova, Corsair is delivered as a PCIe card rather than a full rack-scale system, which lowers the entry cost and lets customers slot Corsair into existing server fleets. Compared with conventional GPUs, Corsair sacrifices training capability and ecosystem breadth in exchange for inference-specific efficiency.

A practical differentiator for d-Matrix is its avoidance of HBM. The HBM supply chain has been a chronic bottleneck for AI hardware vendors competing for SK Hynix, Samsung, and Micron capacity. By building around LPDDR5X plus large on-chip SRAM, d-Matrix can scale production without being gated on HBM allocation, although this comes at the cost of off-chip bandwidth.^[8]

Customers, partners, and adoption

d-Matrix began sampling Corsair to early-access customers shortly after the November 2024 announcement, with broader availability beginning in Q2 2025. The company has not disclosed all of its customers publicly, but several partners and prospective deployments have been named.

Microsoft, via its M12 venture fund, has been an investor since the Series A and continues to participate in subsequent rounds. M12 Managing Partner Michael Stewart has publicly described d-Matrix as offering differentiated memory-compute unit economics for LLM inference.^[5] While Microsoft has not announced a production Corsair deployment, the long-running investment relationship and the partnership ecosystem suggest internal evaluation.

Named partners in the broader Corsair ecosystem include:

Partner	Role
Supermicro	Server OEM for X14 AI platform and SquadRack
Arista Networks	Ethernet leaf switches for SquadRack scale-out
Broadcom	PCIe switching, ecosystem
GigaIO	PCIe fabric (acquired by d-Matrix in 2026)
Andes Technology	RISC-V control core IP^[16]
TSMC	6 nm foundry process
Microsoft (M12)	Strategic investor

The participation of the Qatar Investment Authority and EDBI in the Series C signals interest from sovereign wealth and government-linked investors in Asia and the Middle East.^[6] This mirrors a broader pattern in AI infrastructure financing, where Saudi Arabia's Public Investment Fund and similar entities have backed Groq, G42, and other inference specialists as part of national AI strategies.

Roadmap

d-Matrix's published roadmap extends beyond Corsair to next-generation chips that push the in-memory compute paradigm further. The next major platform, codenamed Raptor, is positioned for reasoning-heavy workloads where memory capacity per chip becomes a binding constraint on serving frontier reasoning models.^[11]^[12]

Raptor is designed around 3D in-memory computing, with DRAM stacked directly on top of compute chiplets rather than placed alongside them in 2.5D packaging. d-Matrix has called this approach 3DIMC. Stacking DRAM vertically allows much higher memory capacity per chiplet while preserving the short, wide interconnect that gives DIMC its bandwidth advantage. Raptor is expected to substantially extend the model sizes that Corsair-class hardware can serve in a single node, particularly for chain-of-thought and tree-of-thought reasoning workloads with large activation footprints.

At Hot Chips 2025, d-Matrix said it had spent the year developing Corsair DIMC and Raptor 3DIMC in parallel, with Raptor silicon expected to follow Corsair as the next product generation.^[11]

Significance and reception

Corsair has been widely covered as one of the more architecturally interesting AI inference platforms to emerge in 2024 and 2025. Coverage in EE Times, ServeTheHome, HPCwire, Chips and Cheese, The Register, EE News Europe, and SiliconANGLE has generally focused on the DIMC architecture, the avoidance of HBM, and the chiplet-based scaling approach as the most distinctive elements.

The broader industry context is a market shifting from training-dominated to inference-dominated economics. Industry analysts estimate that inference will consume substantially more compute cycles than training as deployed AI models serve billions of users, which has attracted significant venture and strategic capital to inference-specialized hardware. d-Matrix's $2 billion valuation, while a fraction of Groq's, Cerebras's, or SambaNova's, reflects the early stage of Corsair's commercial deployment.^[5]

A recurring theme in coverage is whether Corsair's compelling architectural story can translate into the scale of deployment necessary to compete with NVIDIA's CUDA ecosystem. d-Matrix's response has been to ship with an open-source-based software stack, deliver Corsair in a standard PCIe form factor, and partner with mainstream server OEMs and networking vendors rather than build a closed appliance. The SquadRack reference design is the clearest expression of this strategy, presenting Corsair as a drop-in inference fabric for existing datacenters rather than a standalone system that requires new operations.

Key uncertainties for Corsair as of 2026 include the breadth of the supported model catalog beyond well-tuned open transformer models, the maturity of Aviator for customer-developed architectures, and the company's ability to scale manufacturing through TSMC's 6 nm process. The Series C funding is intended in part to address all three concerns through expanded engineering hiring and customer support.

References

"d-Matrix Unveils Corsair, the World's Most Efficient AI Computing Platform for Inference in Datacenters." d-Matrix. November 19, 2024. https://www.d-matrix.ai/announcements/d-matrix-unveils-corsair-the-worlds-most-efficient-ai-computing-platform-for-inference-in-datacenters/
"d-Matrix Corsair AI Platform." d-Matrix product page. https://www.d-matrix.ai/product/
"d-Matrix Corsair In-Memory Computing For AI Inference at Hot Chips 2025." ServeTheHome. August 2025. https://www.servethehome.com/d-matrix-corsair-in-memory-computing-for-ai-inference-at-hot-chips-2025/
"d-Matrix Corsair: 256GB of LPDDR for AI Models." Chips and Cheese. December 31, 2024. https://old.chipsandcheese.com/2024/12/31/d-matrix-corsair-256gb-of-lpddr-for-ai-models/
"d-Matrix Raises $275 Million to Power the Age of AI Inference." d-Matrix Announcements. November 12, 2025. https://www.d-matrix.ai/announcements/d-matrix-raises-275-million-to-power-the-age-of-ai-inference/
"Bullhound Capital leads $275m investment into AI inference leader d-Matrix." Bullhound Capital. November 2025. https://bullhoundcapital.com/articles/bullhoundcapital-leads-275m-investment-into-ai-inference-leader-d-matrix/
"Chip startup d-Matrix raises $275M to speed up inference with in-memory compute." SiliconANGLE. November 12, 2025. https://siliconangle.com/2025/11/12/chip-startup-d-matrix-raises-275m-speed-inference-memory-compute/
"d-Matrix launches Corsair for AI inference without GPUs, HBM." EE News Europe. https://www.eenewseurope.com/en/d-matrix-launches-corsair-for-ai-inference-without-gpus-hbm/
"d-Matrix Announces SquadRack, Industry's First Rack-Scale Solution Purpose-Built for AI Inference at Datacenter Scale." d-Matrix Announcements. October 14, 2025. https://www.d-matrix.ai/announcements/squadrack/
"d-Matrix Announces JetStream I/O Accelerators Enabling Ultra-Low Latency for AI Inference at Scale." d-Matrix Announcements. September 2025. https://www.d-matrix.ai/announcements/jetstream/
"D-Matrix reveals plan to break through AI's memory wall with 3D DRAM-based chip architecture." SiliconANGLE. August 25, 2025. https://siliconangle.com/2025/08/25/d-matrix-reveals-plan-scale-ais-memory-wall-3d-dram-based-chip-architecture/
"d-Matrix Takes On AI Memory Wall with 3D Stacked In-Memory Compute." HPCwire. September 2, 2025. https://www.hpcwire.com/2025/09/02/d-matrix-takes-on-ai-memory-wall-with-3d-stacked-in-memory-compute/
"D-Matrix Targets Fast LLM Inference for Real World Scenarios." EE Times. https://www.eetimes.com/d-matrix-targets-fast-llm-inference-for-real-world-scenarios/
"d-Matrix Announces $110 Million in Series B Funding." d-Matrix. September 2023. https://www.d-matrix.ai/announcements/d-matrix-announces-110-million-in-series-b-funding-to-make-generative-ai-commercially-viable-with-first-of-its-kind-inference-compute-platform/
"d-Matrix Boosts Rack-scale AI Capabilities With Acquisition of GigaIO Data Center Business." d-Matrix Announcements. https://www.d-matrix.ai/announcements/acquisition-of-gigaio/
"d-Matrix and Andes Team on World's Highest Performing, Most Efficient Accelerator for AI Inference at Scale." d-Matrix Announcements. https://www.d-matrix.ai/announcements/d-matrix-and-andes-team-on-worlds-highest-performing-most-efficient-accelerator-for-ai-inference-at-scale/
"d-Matrix presents Corsair C8 card 9x faster than Nvidia's H100 GPU in generative AI workloads." NotebookCheck. https://www.notebookcheck.net/d-Matrix-presents-Corsair-C8-card-9x-faster-than-Nvidia-s-H100-GPU-in-generative-AI-workloads.747930.0.html
"d-Matrix Technical White Paper." d-Matrix. https://d-matrix.ai/pdf/d-Matrix-WhitePaper-Technical-FINAL.pdf
"d-Matrix scores $110M to undercut Nvidia in AI." The Register. September 7, 2023. https://www.theregister.com/2023/09/07/dmatrix_in_memory/
"d-Matrix aspires to rack scale AI with JetStream I/O cards." The Register. September 8, 2025. https://www.theregister.com/2025/09/08/dmatrix_jetstream_nic/
