d-Matrix Corsair
Last reviewed
May 17, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 · 3,450 words
Corsair is the first commercial AI accelerator product from d-Matrix, a Silicon Valley AI inference hardware startup based in Santa Clara, California. Announced on November 19, 2024 at the SC24 supercomputing conference in Atlanta, Corsair is a PCIe Gen5 accelerator card built around the company's Digital In-Memory Computing (DIMC) architecture, a design that tightly couples SRAM memory with multiply-accumulate logic to address the so-called memory wall that limits large language model inference on conventional GPU hardware. The product targets generative AI inference workloads in datacenters, with d-Matrix claiming up to 10 times faster interactive token-generation speed, 3 times better performance per total cost of ownership, and 3 times greater energy efficiency compared with leading GPU-based inference platforms.
The Corsair platform combines eight 6 nm chiplets per card with 2 GB of on-chip SRAM operating at roughly 150 TB/s of aggregate bandwidth, alongside up to 256 GB of off-chip LPDDR5X capacity memory. The architecture natively supports block floating point numerical formats now standardized under the Open Compute Project as MX (Microscaling), enabling efficient INT4 and INT8 inference at reduced precision without significant accuracy loss. Corsair is delivered with d-Matrix's Aviator software stack, which is built on open-source components such as MLIR, PyTorch, Triton DSL, and OpenBMC.
d-Matrix was founded in 2019 by Sid Sheth and Sudeep Bhoja, who had previously worked together at Inphi (acquired by Marvell) and Broadcom. Sheth serves as President and CEO, and Bhoja as Chief Technology Officer. The pair set out to build an AI inference platform optimized specifically for transformer-based models, betting early that generative AI workloads would be dominated by inference economics rather than training throughput.
The company's silicon roadmap evolved across multiple generations of test chips before Corsair. Nighthawk, taped out in 2020, was d-Matrix's first chiplet prototype demonstrating SRAM-based in-memory compute. Jayhawk followed in 2021 as a companion chiplet that added high-speed chiplet-to-chiplet interconnect, allowing multiple Nighthawk-class compute tiles to scale up and scale out. Jayhawk II, which followed in 2023, extended these capabilities and demonstrated a working transformer inference pipeline. Corsair represents the commercialization of this multi-generation chiplet research.
d-Matrix raised an initial $44 million Series A in 2022, followed by a $110 million Series B in 2023 that was led by Temasek with participation from Microsoft's M12 venture fund, Playground Global, and other investors. On November 12, 2025, d-Matrix closed an oversubscribed $275 million Series C round at a $2 billion post-money valuation, bringing total capital raised to roughly $450 million. The Series C was co-led by Bullhound Capital, Triatomic Capital, and Temasek, with participation from the Qatar Investment Authority, EDBI, M12, Nautilus Venture Partners, Industry Ventures, and Mirae Asset.
| Round | Date | Amount | Lead investors |
|---|---|---|---|
| Series A | April 2022 | $44 million | M12 (Microsoft's venture fund) |
| Series B | September 2023 | $110 million | Temasek |
| Series C | November 2025 | $275 million | Bullhound Capital, Triatomic Capital, Temasek |
The core innovation in Corsair is d-Matrix's Digital In-Memory Computing (DIMC) approach. Unlike conventional accelerators that fetch weights and activations from external HBM or DDR memory through a relatively narrow memory interface, DIMC embeds compute directly within the SRAM macros. Multiply-accumulate operations execute next to the bit cells that store model weights, eliminating most of the data movement that dominates power consumption and latency in GPU inference.
The DIMC implementation in Corsair is digital rather than analog, distinguishing it from prior analog in-memory compute startups such as Mythic. d-Matrix argues that digital DIMC retains the deterministic precision needed for production deployment of transformer models, while still delivering the bandwidth and energy advantages of integrating compute into the memory array. Each DIMC tile inside Corsair can perform a 64 by 64 INT8 matrix multiply, or a 64 by 128 INT4 matrix multiply, in a single cycle.
Corsair is built from chiplets fabricated on TSMC's 6 nm process. Each Corsair card carries two compute packages with four chiplets each, for a total of eight chiplets per card. The chiplets are connected in an all-to-all topology that supports roughly 1 TB/s of die-to-die bandwidth.
Within each chiplet, the compute hierarchy is organized as follows:
| Level | Component | Role |
|---|---|---|
| Chiplet | 4 Quads | Top-level compute clusters |
| Quad | 4 Slices, 1 RISC-V control core, 1 Dispatch Engine | Coordinates execution within a quad |
| Slice | DIMC cores, SIMD cores, Data Reshape Engine | Executes matrix and vector operations |
A full Corsair card contains 2,048 DIMC cores across its eight chiplets. The RISC-V cores act as control processors; together with the dispatch engines, they schedule operations onto the DIMC arrays and manage data layout transformations through the Data Reshape Engines.
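The unit counts implied by this hierarchy can be worked out with a short calculation. The sketch below uses only the figures quoted above; the DIMC-cores-per-slice value it prints is derived by division and is not a published specification.

```python
# Figures quoted in the article; nothing here is measured.
CHIPLETS_PER_CARD = 8
QUADS_PER_CHIPLET = 4
SLICES_PER_QUAD = 4
DIMC_CORES_PER_CARD = 2048


def card_hierarchy():
    """Return per-card unit counts derived from the quoted hierarchy."""
    quads = CHIPLETS_PER_CARD * QUADS_PER_CHIPLET
    slices = quads * SLICES_PER_QUAD
    cores_per_slice, remainder = divmod(DIMC_CORES_PER_CARD, slices)
    if remainder:
        raise ValueError("DIMC cores do not divide evenly across slices")
    return {
        "quads": quads,
        "riscv_control_cores": quads,  # one control core per quad
        "slices": slices,
        "implied_dimc_cores_per_slice": cores_per_slice,  # derived, not published
    }


for name, count in card_hierarchy().items():
    print(f"{name}: {count}")
```

Under these figures a card has 32 quads and 128 slices, which would put 16 DIMC cores in each slice if the cores are spread evenly.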
Corsair uses a two-tier memory model designed to keep frequently accessed model state on-chip while still supporting large capacity for full model weights:
| Tier | Capacity | Bandwidth | Purpose |
|---|---|---|---|
| Performance Memory (SRAM) | 2 GB per card | ~150 TB/s | Active weights, KV cache hot data |
| Capacity Memory (LPDDR5X) | Up to 256 GB per card | Per-chiplet LPDDR5X interface | Full model weights, large KV caches |
The choice of LPDDR5X over HBM is significant. HBM has been the standard external memory for high-end AI accelerators for years, but the chronic supply constraints, cost, and packaging complexity of HBM have made it a bottleneck for new entrants. d-Matrix uses commodity mobile-class LPDDR5X soldered around each chiplet, trading raw external bandwidth for the much higher on-chip SRAM bandwidth delivered by DIMC. The bet is that for transformer inference, the right answer is to keep more state in SRAM and use external DRAM only for capacity, rather than streaming through HBM at every layer.
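A rough way to see why on-chip bandwidth matters is a memory-bound lower limit: if a given number of bytes must be read once for every generated token, latency per token cannot fall below bytes divided by bandwidth. The sketch below applies that bound to the 150 TB/s SRAM figure. The 2 GB working set is purely illustrative, and the 3.35 TB/s value is the published HBM3 bandwidth of the H100 SXM part, brought in here as an outside assumption. The model ignores compute time, interconnect, and KV cache traffic.

```python
def min_seconds_per_token(bytes_per_token, bandwidth_bytes_per_s):
    """Lower bound on latency when bytes_per_token must be read once per token."""
    if bytes_per_token < 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("bytes must be non-negative and bandwidth positive")
    return bytes_per_token / bandwidth_bytes_per_s


GB = 1e9
TB = 1e12

WORKING_SET_BYTES = 2 * GB   # illustrative: one card's worth of SRAM-resident state
SRAM_BANDWIDTH = 150 * TB    # Corsair on-chip SRAM figure quoted above
HBM3_BANDWIDTH = 3.35 * TB   # assumption: H100 SXM HBM3 bandwidth, not from this article

for name, bandwidth in (("Corsair SRAM", SRAM_BANDWIDTH), ("H100 HBM3", HBM3_BANDWIDTH)):
    bound_us = min_seconds_per_token(WORKING_SET_BYTES, bandwidth) * 1e6
    print(f"{name}: at least {bound_us:.1f} microseconds per token")
```

With these inputs the bound is about 13 microseconds for SRAM and about 600 microseconds for HBM3. Real systems sit well above either bound, so the result indicates only the direction of the trade-off, not achievable latency.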
Corsair was among the earliest commercial accelerators to natively support block floating point formats. These formats share a single exponent across a small group of mantissas, providing most of the dynamic range of floating point at a fraction of the storage and compute cost. The Open Compute Project standardized this family of formats as MX (Microscaling) in 2023, with d-Matrix among the founding contributors.
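The shared-exponent idea can be illustrated in a few lines of Python. The following is a simplified sketch, not the bit-exact MX encoding and not d-Matrix's Compressor: it picks one power-of-two scale for a block and stores each value as a signed integer mantissa. The function names and the block contents are hypothetical.

```python
import math


def quantize_block(values, mantissa_bits=8):
    """Quantize one block to a shared power-of-two exponent plus integer mantissas."""
    qmax = 2 ** (mantissa_bits - 1) - 1  # largest mantissa magnitude, e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Smallest exponent for which the largest value still fits in the mantissa range.
    exponent = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** exponent
    # Round to nearest and clamp to guard against floating-point edge cases.
    mantissas = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return exponent, mantissas


def dequantize_block(exponent, mantissas):
    """Reconstruct approximate values from the shared exponent and mantissas."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]


block = [0.5, -1.0, 0.25, 3.0]
for bits in (8, 4):
    exponent, mantissas = quantize_block(block, mantissa_bits=bits)
    print(bits, exponent, mantissas, dequantize_block(exponent, mantissas))
```

For this block the 8-bit version reconstructs every value exactly, while the 4-bit version rounds 0.25 to zero because the shared scale is set by the largest element. That sensitivity to outliers within a block is why block size and mantissa width are the main accuracy levers in formats of this kind.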
Corsair supports the following native formats:
| Format | Description | Use |
|---|---|---|
| MXINT8 | 8-bit block floating point | High-accuracy inference |
| MXINT4 | 4-bit block floating point | Maximum-throughput inference |
| INT8 | Standard 8-bit integer | Legacy quantized models |
| INT4 | Standard 4-bit integer | Aggressive quantization |
Using MXINT4, Corsair achieves a peak compute throughput of 9,600 TFLOPS per card, which d-Matrix sometimes calls 9.6 PFLOPS. At MXINT8 the peak is 2,400 TFLOPS per card. These figures double when two cards are bridged into a single 16-chiplet domain.
A Corsair card occupies a standard PCIe Gen5 x16 full-height full-length slot. The card draws between 275 W at 800 MHz and 550 W at the maximum 1.2 GHz boost frequency, depending on workload and configuration. Two cards can be combined into a single logical device using a passive DMX Bridge card, which physically connects adjacent PCIe slots and lets the two cards present 4 GB of SRAM, 4,096 DIMC cores, and 16 chiplets in an all-to-all configuration to the host.
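The bridged-pair figures follow from multiplying the per-card specifications by the number of cards. The sketch below encodes the per-card numbers quoted in this article and scales them linearly; treating peak throughput as additive across cards is an assumption, and the class and function names are illustrative only.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CorsairCard:
    """Per-card figures as quoted in the article."""
    chiplets: int = 8
    dimc_cores: int = 2048
    sram_gb: int = 2
    lpddr5x_gb: int = 256      # maximum configuration
    mxint8_tflops: int = 2400  # peak
    mxint4_tflops: int = 9600  # peak


def aggregate(card, n_cards):
    """Scale every per-card figure linearly by the number of cards."""
    if n_cards < 1:
        raise ValueError("n_cards must be at least 1")
    return {name: value * n_cards for name, value in asdict(card).items()}


bridged_pair = aggregate(CorsairCard(), 2)
for name, value in bridged_pair.items():
    print(f"{name}: {value}")
```

For two cards this reproduces the 16 chiplets, 4,096 DIMC cores, and 4 GB of SRAM cited for a bridged pair. The same function can be called with a larger card count to estimate a fully populated server.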
The practical effect is that a standard 8-slot PCIe server can hold four bridged Corsair pairs, giving roughly 16 GB of SRAM, up to 2 TB of LPDDR5X, and 76,800 TFLOPS of peak MXINT4 compute (8 × 9,600 TFLOPS) in a single host. This is the configuration that d-Matrix and partners such as Supermicro and GigaIO use for the company's reference inference servers.
Corsair ships with the Aviator software stack, a co-designed runtime and toolchain that exposes the underlying DIMC hardware to standard machine learning frameworks. Aviator is built on widely used open-source components rather than a proprietary stack, which d-Matrix has emphasized as a deliberate strategy to lower the adoption barrier for customers comparing Corsair against the CUDA ecosystem.
The major components of Aviator are:
| Component | Function |
|---|---|
| Compiler | MLIR-based code generator producing DIMC-optimized binaries |
| Model Factory | PyTorch template library of common generative models, pre-tuned for Corsair |
| Compressor | Quantization and pruning tools that convert FP16 or BF16 checkpoints to MX formats |
| Inference Engine | Distributed runtime that schedules requests across multiple cards, servers, and racks |
| BMC | OpenBMC-based platform management for the card and chassis |
The Model Factory includes prebuilt configurations for popular open models including Llama 2 and 3, Mistral, Mixtral, and OpenAI-style transformer architectures. Customers can also bring their own PyTorch models, with Aviator handling the conversion to MX numerical formats and DIMC-friendly layouts through the Compressor and Compiler.
A distinguishing feature of Aviator is its native support for distributed inference across multiple Corsair cards. The Inference Engine treats a cluster of Corsair cards as a single logical accelerator, handling tensor parallelism, pipeline parallelism, and expert parallelism for mixture-of-experts models without requiring the developer to write custom collective communication code. This is particularly important for serving large models that exceed the capacity of a single card or a single bridged pair.
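Tensor parallelism, one of the strategies named above, can be illustrated without any accelerator. The sketch below is a generic, pure-Python example of column-parallel matrix multiplication and is not Aviator's API: the weight matrix is split by columns into shards, each shard is multiplied independently (as it would be on a separate device), and the partial outputs are concatenated. All function names are hypothetical.

```python
def matmul(x, w):
    """Plain matrix multiply on nested lists: (m x k) times (k x n)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, shards):
    """Split a weight matrix into equal-width column shards."""
    n_cols = len(w[0])
    if shards < 1 or n_cols % shards:
        raise ValueError("column count must be divisible by the number of shards")
    width = n_cols // shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(shards)]


def column_parallel_matmul(x, w, shards):
    """Multiply x by each column shard separately, then concatenate the results."""
    # Each partial product would run on its own device in a real deployment.
    partials = [matmul(x, shard) for shard in split_columns(w, shards)]
    # Stand-in for an all-gather: join the partial outputs row by row.
    return [sum((partial[r] for partial in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
assert column_parallel_matmul(x, w, shards=2) == matmul(x, w)
print(column_parallel_matmul(x, w, shards=2))
```

Because each shard needs the full input but only its own slice of the weights, this layout divides weight storage across devices at the cost of one gather step per layer. That gather is the kind of collective communication the Inference Engine is described as handling on the developer's behalf.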
d-Matrix has published a series of performance claims for Corsair, primarily framed against NVIDIA H100 GPUs as the reference inference platform at the time of Corsair's launch.
Headline figures include:
| Metric | Claimed result | Reference |
|---|---|---|
| Time per output token, Llama 3 70B | ~2 ms | At single-batch interactive serving |
| Interactive speed advantage | Up to 10x | Versus H100 baseline |
| TCO per query | 3x better | Versus H100-based inference |
| Energy per token | 3x better | Versus H100-based inference |
| Peak MXINT4 throughput | 9,600 TFLOPS | Per card |
| Peak MXINT8 throughput | 2,400 TFLOPS | Per card |
In Hot Chips 2025 presentations and white papers, d-Matrix has demonstrated Corsair sustaining interactive token-generation rates substantially above the ~30 to 60 tokens per second typical of well-tuned GPU inference for Llama 3 70B. Independent reviews from outlets including ServeTheHome, Chips and Cheese, and EE Times have generally validated the architectural advantages while noting that production validation on a wider variety of models is still underway.
NotebookCheck and other outlets reported Corsair benchmarking at roughly 9 times faster than an H100 on certain generative AI workloads, although direct comparisons depend heavily on batch size, sequence length, and quantization level. d-Matrix's emphasis on interactive single-batch latency rather than maximum batch-mode throughput is consistent with the company's positioning as an inference specialist optimized for chatbot, agent, and other real-time use cases.
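The claimed time per output token can be converted into a token rate for comparison with the GPU range quoted above. This is a unit conversion only: the ~2 ms figure and the 30 to 60 tokens per second range come from separate statements and may not share the same batch size, sequence length, or quantization settings.

```python
def tokens_per_second(ms_per_token):
    """Convert a per-token latency in milliseconds to a token rate."""
    if ms_per_token <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / ms_per_token


CLAIMED_MS_PER_TOKEN = 2.0     # d-Matrix claim for Llama 3 70B
GPU_TOKENS_PER_S = (30.0, 60.0)  # range quoted in the article for tuned GPU inference

corsair_rate = tokens_per_second(CLAIMED_MS_PER_TOKEN)
print(f"Claimed Corsair rate: {corsair_rate:.0f} tokens/s")
for gpu_rate in GPU_TOKENS_PER_S:
    print(f"Ratio versus {gpu_rate:.0f} tokens/s: {corsair_rate / gpu_rate:.1f}x")
```

A 2 ms token time corresponds to 500 tokens per second, or roughly 8 to 17 times the quoted GPU range, which is broadly in line with the "up to 10x" headline claim.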
While Corsair began as a card-level product, d-Matrix has rapidly built out a rack-scale story to compete with vertically integrated systems from NVIDIA and other vendors. Two announcements in 2025 expanded the Corsair platform from a single card to a full datacenter solution.
In September 2025, d-Matrix announced JetStream, a companion I/O accelerator card that provides ultra-low-latency device-initiated communication between Corsair accelerators across standard Ethernet networks. JetStream sits in the same server as Corsair and offloads collective operations from the host CPU, supporting accelerator-to-accelerator communication patterns directly over the network without going through host memory. This allows large models that exceed a single server's Corsair capacity to be sharded across many servers while preserving the interactive latency targets that Corsair is designed for.
On October 14, 2025, d-Matrix unveiled SquadRack, described as the industry's first standards-based rack-scale reference architecture purpose-built for AI inference. SquadRack is a collaboration with Arista, Broadcom, and Supermicro, combining:
| Component | Vendor | Role |
|---|---|---|
| Corsair AI accelerators | d-Matrix | Inference compute |
| JetStream NICs | d-Matrix | Accelerator-to-accelerator networking |
| X14 AI server platforms | Supermicro | Host servers, 8 Corsair cards each |
| PCIe switches | Broadcom | Intra-node scale-up |
| Leaf Ethernet switches | Arista | Multi-node scale-out fabric |
A single 8-node SquadRack configuration is rated to serve generative models up to roughly 100 billion parameters with interactive token-generation latency. SquadRack configurations are scheduled to be available for purchase through Supermicro starting in Q1 2026.
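One way to sanity-check the 100-billion-parameter rating is to compare a model's weight footprint with the rack's aggregate SRAM. The source does not say that the rating was derived this way, so the sketch below is only a plausibility check. It assumes a shared 8-bit scale per block of 32 elements, as in the OCP MX specification's standard block layout; whether d-Matrix's MXINT4 uses the same block size is an assumption. KV cache and activations are ignored.

```python
def weight_footprint_gb(params, element_bits, block_size=32, scale_bits=8):
    """Approximate weight storage for a block-scaled format, in gigabytes."""
    if params < 0 or block_size < 1:
        raise ValueError("params must be non-negative and block_size positive")
    bits_per_element = element_bits + scale_bits / block_size
    return params * bits_per_element / 8 / 1e9


NODES = 8               # 8-node SquadRack
CARDS_PER_NODE = 8      # Corsair cards per Supermicro server
SRAM_GB_PER_CARD = 2

rack_sram_gb = NODES * CARDS_PER_NODE * SRAM_GB_PER_CARD
print(f"Aggregate SRAM: {rack_sram_gb} GB")
for bits in (8, 4):
    needed = weight_footprint_gb(100e9, bits)
    print(f"100B parameters at {bits}-bit: {needed:.1f} GB, fits in SRAM: {needed <= rack_sram_gb}")
```

Under these assumptions a 100-billion-parameter model needs about 103 GB at 8 bits and about 53 GB at 4 bits, both within the 128 GB of SRAM that 64 cards provide.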
In early 2026 d-Matrix acquired the data center business of GigaIO, a composable infrastructure company. The acquisition added GigaIO's FabreX-based PCIe fabric technology to d-Matrix's portfolio, complementing JetStream for high-bandwidth scale-up scenarios within a rack. GigaIO and d-Matrix had previously collaborated on a reference SuperNODE system that combined Corsair with GigaIO's fabric to demonstrate GPU-free inference at rack scale.
The market for dedicated AI inference accelerators has grown crowded since 2023, with several startups taking distinct architectural approaches. Corsair occupies a particular niche emphasizing in-memory compute, chiplet-based scaling, and commodity LPDDR5X capacity memory.
| Vendor | Architecture | On-chip memory | Off-chip memory | Form factor |
|---|---|---|---|---|
| d-Matrix Corsair | Digital In-Memory Compute, 8 chiplets | 2 GB SRAM @ 150 TB/s | 256 GB LPDDR5X | PCIe Gen5 card |
| Groq LPU | Deterministic dataflow, single core | 230 MB SRAM @ 80 TB/s (GroqChip1) | None | Card and rack |
| Groq 3 LPU | Deterministic dataflow | 500 MB SRAM @ 150 TB/s | None | Card and rack |
| Cerebras WSE-3 | Wafer-scale | 44 GB SRAM @ 21 PB/s | External MemoryX system | Full wafer system |
| SambaNova SN40L | Reconfigurable dataflow | 520 MB SRAM | HBM3 + DDR | Rack-scale system |
| NVIDIA H100 | GPU | ~50 MB L2 | 80 GB HBM3 | SXM5 / PCIe |
Compared with Groq, d-Matrix offers an order of magnitude more on-chip SRAM per card, although Groq's deterministic single-core architecture has been validated at production scale across hundreds of thousands of deployed LPUs. Compared with Cerebras and SambaNova, Corsair is delivered as a PCIe card rather than a full rack-scale system, which lowers the entry cost and lets customers slot Corsair into existing server fleets. Compared with conventional GPUs, Corsair sacrifices training capability and ecosystem breadth in exchange for inference-specific efficiency.
A practical differentiator for d-Matrix is its avoidance of HBM. The HBM supply chain has been a chronic bottleneck for AI hardware vendors competing for SK Hynix, Samsung, and Micron capacity. By building around LPDDR5X plus large on-chip SRAM, d-Matrix can scale production without being gated on HBM allocation, although this comes at the cost of off-chip bandwidth.
d-Matrix began sampling Corsair to early-access customers shortly after the November 2024 announcement, with broader availability beginning in Q2 2025. The company has not disclosed all of its customers publicly, but several partners and prospective deployments have been named.
Microsoft, via its M12 venture fund, has been an investor since the Series A and has continued to participate in subsequent rounds. M12 Managing Partner Michael Stewart has publicly described d-Matrix as offering differentiated memory-compute unit economics for LLM inference. Microsoft has not announced a production Corsair deployment; the long-running investment relationship and the partnership ecosystem suggest, but do not confirm, internal evaluation.
Named partners in the broader Corsair ecosystem include:
| Partner | Role |
|---|---|
| Supermicro | Server OEM for X14 AI platform and SquadRack |
| Arista Networks | Ethernet leaf switches for SquadRack scale-out |
| Broadcom | PCIe switching, ecosystem |
| GigaIO | PCIe fabric (acquired by d-Matrix in 2026) |
| Andes Technology | RISC-V control core IP |
| TSMC | 6 nm foundry process |
| Microsoft (M12) | Strategic investor |
Qatar Investment Authority and EDBI's participation in the Series C signals interest from sovereign wealth and government-linked investors in Asia and the Middle East. This mirrors a broader pattern in AI infrastructure financing, where Saudi Arabia's Public Investment Fund and similar entities have backed Groq, G42, and other inference specialists as part of national AI strategies.
d-Matrix's published roadmap extends beyond Corsair to next-generation chips that push the in-memory compute paradigm further. The next major platform, codenamed Raptor, is positioned for reasoning-heavy workloads where memory capacity per chip becomes a binding constraint on serving frontier reasoning models.
Raptor is designed around 3D in-memory computing, with DRAM stacked directly on top of compute chiplets rather than placed alongside them in 2.5D packaging. d-Matrix has called this approach 3DIMC. Stacking DRAM vertically allows much higher memory capacity per chiplet while preserving the short, wide interconnect that gives DIMC its bandwidth advantage. Raptor is expected to substantially extend the model sizes that Corsair-class hardware can serve in a single node, particularly for chain-of-thought and tree-of-thought reasoning workloads with large activation footprints.
d-Matrix said at Hot Chips 2025 that it had spent the year developing Corsair's DIMC and Raptor's 3DIMC in parallel, with Raptor silicon expected to follow in a later product generation.
Corsair has been widely covered as one of the more architecturally interesting AI inference platforms to emerge in 2024 and 2025. Coverage in EE Times, ServeTheHome, HPCwire, Chips and Cheese, The Register, EE News Europe, and SiliconANGLE has generally focused on the DIMC architecture, the avoidance of HBM, and the chiplet-based scaling approach as the most distinctive elements.
The broader industry context is a market shifting from training-dominated to inference-dominated economics. Industry analysts estimate that inference will consume substantially more compute cycles than training as deployed AI models serve billions of users, which has attracted significant venture and strategic capital to inference-specialized hardware. d-Matrix's $2 billion valuation, while a fraction of Groq's, Cerebras's, or SambaNova's, reflects the early stage of Corsair's commercial deployment.
A recurring theme in coverage is whether Corsair's compelling architectural story can translate into the scale of deployment necessary to compete with NVIDIA's CUDA ecosystem. d-Matrix's response has been to ship with an open-source-based software stack, deliver Corsair in a standard PCIe form factor, and partner with mainstream server OEMs and networking vendors rather than build a closed appliance. The SquadRack reference design is the clearest expression of this strategy, presenting Corsair as a drop-in inference fabric for existing datacenters rather than a standalone system that requires new operations.
Key uncertainties for Corsair as of 2026 include the breadth of the supported model catalog beyond well-tuned open transformer models, the maturity of Aviator for customer-developed architectures, and the company's ability to scale manufacturing through TSMC's 6 nm process. The Series C funding is intended in part to address all three concerns through expanded engineering hiring and customer support.