Rebellions REBEL-Quad
Last reviewed
Jun 2, 2026
Sources
14 citations
Review status
Source-backed
Revision
v1 · 1,676 words
REBEL-Quad is a chiplet-based AI inference accelerator developed by Rebellions, a South Korean AI-chip company. It was presented at the Hot Chips 2025 symposium in August 2025, where Rebellions described it as the first neural processing unit (NPU) to use the UCIe-Advanced standard for die-to-die interconnect between its compute chiplets [1][2][3]. The design fuses four identical NPU dies into a single package and pairs them with 144 GB of HBM3E memory, targeting large-scale language-model and Mixture-of-Experts inference in the data center [2][4]. Rebellions later detailed the architecture in a peer-reviewed paper at ISSCC 2026, where the silicon is identified as Rebel100 [5].
REBEL-Quad is a single-package accelerator built from four homogeneous compute chiplets linked by UCIe-Advanced, an "advanced-package" profile of the Universal Chiplet Interconnect Express standard [1][3]. The stated goal is to deliver throughput in the class of leading data-center GPUs while drawing less power, which Rebellions frames as avoiding the "energy tax" of scaling on GPUs alone [1][2]. The company demonstrated a working board at Hot Chips 2025 running Llama 3.3 70B and the Qwen3 235B Mixture-of-Experts model [2][4][6].
The product sits at the top of Rebellions' chiplet roadmap and follows two earlier chips, ATOM and REBEL. It is fabricated on a Samsung 4nm process and uses Samsung's I-Cube S advanced packaging to integrate the chiplets and memory [2][5]. As of mid-2026 the part had been shown publicly and described in a conference paper, but Rebellions had not announced general commercial availability and was positioning it for proof-of-concept deployments with prospective customers [2][5].
Rebellions is a fabless semiconductor company founded in September 2020 by Sunghyun Park and several co-founders; Park, the chief executive, holds a Ph.D. from MIT's Computer Science and Artificial Intelligence Laboratory [7][8]. The company concentrates on inference accelerators rather than training hardware and works with Samsung Foundry as its manufacturing partner [9][10].
Its first data-center NPU, ATOM, launched in 2023 and targeted models of up to roughly 7 billion parameters, covering vision, recommendation, and smaller language workloads [9][10]. In MLPerf Inference v3.0 results published in April 2023, Rebellions reported that ATOM led its class on single-stream latency for language and vision tasks [11]. The follow-on chip, REBEL, was announced as a co-development with Samsung and introduced a scalable chiplet architecture with 144 GB of HBM3E aimed at large language models and benchmarked against NVIDIA's H200 [9][12].
Rebellions' corporate profile expanded alongside its hardware. In January 2024 it raised about US$124 million in a Series B round led by the telecom operator KT to co-develop REBEL with Samsung [12][13]. In late 2024 it merged with SAPEON Korea to form what the companies described as Korea's first AI-chip unicorn, valued at roughly 1.3 trillion won [7][8]. Later rounds drew investment from Arm and Samsung [14].
REBEL-Quad assembles four identical NPU dies, each around 320 mm², into one system-in-package; the "Quad" name refers to this four-chiplet layout [5]. Rather than build one large monolithic die, Rebellions partitions the compute across smaller chiplets and stitches them together so the package behaves as a single accelerator, an approach intended to improve yield and allow the design to scale across product variants [2][4].
The chiplets communicate over UCIe-Advanced die-to-die links. UCIe is an open interconnect standard for connecting dies within a package, and the "Advanced" profile is intended for fine-pitch advanced-packaging interposers rather than standard organic substrates. Rebellions and its IP partners describe REBEL-Quad as the first NPU to use UCIe-Advanced [1][3][6]. According to the ISSCC 2026 disclosure, the die-to-die links run at 16 Gbps and provide an aggregate bandwidth of about 4 TB/s, with a flit-aware die-to-die latency of roughly 11 ns [5]. The interconnect and physical-layer IP were sourced from third parties: Alphawave Semi and Synopsys are both credited as IP and validation partners [2][3].
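As a rough sanity check on the interconnect figures, the sketch below converts the quoted aggregate bandwidth into an implied number of data lanes. It assumes that 16 Gbps is a per-lane rate and that the roughly 4 TB/s figure is a simple sum over all lanes with no protocol overhead. Neither assumption is stated in the text above, and the function name is illustrative.

```python
# Figures quoted from the ISSCC 2026 disclosure [5].
LINK_RATE_GBPS = 16           # assumed to be the per-lane signalling rate
AGGREGATE_TBYTES_PER_S = 4    # approximate aggregate die-to-die bandwidth


def implied_lane_count(aggregate_tbytes_per_s: float, lane_rate_gbps: float) -> float:
    """Lanes needed to reach the aggregate figure, ignoring protocol overhead."""
    # Convert terabytes per second to gigabits per second (decimal units).
    aggregate_gbps = aggregate_tbytes_per_s * 1000 * 8
    return aggregate_gbps / lane_rate_gbps


lanes = implied_lane_count(AGGREGATE_TBYTES_PER_S, LINK_RATE_GBPS)
print(f"implied data lanes across the package: {lanes:.0f}")
```

Under these assumptions the quoted numbers imply on the order of two thousand data lanes in total. The actual lane count and topology are not given in the text above.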
Each chiplet carries two neural-core clusters of eight cores apiece, with 32 MB of shared memory per cluster, so the package totals 256 MB of on-die scratchpad memory across its eight clusters [5]. Rebellions emphasizes a unified mixed-precision compute pipeline that handles FP8 and FP16 without switching instruction paths, which it says raises compute density [4]. The four chiplets are paired with four HBM3E stacks (one 12-high 36 GB stack per die) for 144 GB total, integrated using Samsung's I-Cube S 2.5D packaging on a Samsung SF4X process node [2][5].
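The package-level totals follow directly from the per-chiplet figures. The minimal sketch below makes that arithmetic explicit. The class and field names are illustrative and do not come from any Rebellions software; the numbers are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageConfig:
    """Per-chiplet figures as quoted in the text; names are illustrative."""
    chiplets: int = 4
    clusters_per_chiplet: int = 2
    cores_per_cluster: int = 8
    shared_mb_per_cluster: int = 32
    hbm_stacks_per_chiplet: int = 1
    hbm_gb_per_stack: int = 36

    @property
    def total_clusters(self) -> int:
        return self.chiplets * self.clusters_per_chiplet

    @property
    def total_cores(self) -> int:
        return self.total_clusters * self.cores_per_cluster

    @property
    def total_scratchpad_mb(self) -> int:
        return self.total_clusters * self.shared_mb_per_cluster

    @property
    def total_hbm_gb(self) -> int:
        return self.chiplets * self.hbm_stacks_per_chiplet * self.hbm_gb_per_stack


cfg = PackageConfig()
print(cfg.total_clusters, cfg.total_cores, cfg.total_scratchpad_mb, cfg.total_hbm_gb)
```

With the quoted inputs this gives 8 clusters, 64 neural cores, 256 MB of scratchpad, and 144 GB of HBM3E, matching the package totals stated above.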
The table below lists figures as disclosed by Rebellions at Hot Chips 2025 and in the ISSCC 2026 paper. Performance claims are vendor-reported and, where noted, drawn from internal testing.
| Attribute | Disclosed value |
|---|---|
| Configuration | 4 homogeneous NPU chiplets in one package [2][5] |
| Die size (per chiplet) | ~320 mm² [5] |
| Process node | Samsung SF4X (4nm class) [2][5] |
| Packaging | Samsung I-Cube S (2.5D) [2][5] |
| Die-to-die interconnect | UCIe-Advanced, 16 Gbps, ~4 TB/s aggregate, ~11 ns latency [5] |
| Peak compute (FP8) | 2,048 TFLOPS (no sparsity) [1][4][5] |
| Peak compute (FP16) | 1,024 TFLOPS (no sparsity) [4][5] |
| Memory | 144 GB HBM3E (4 x 36 GB, 12-high stacks) [2][5] |
| Memory bandwidth | ~4.8 TB/s [4] |
| On-die scratchpad | 256 MB total (32 MB per cluster) [5] |
| Host interface | 2 x PCIe Gen5 x16 [1][5] |
| Power envelope | ~600 W [5] |
| Reported Llama 3.3 70B latency | ~35.5 ms/token (average) [1] |
| Reported Llama 3.3 70B throughput | 56.8 tokens/s, single batch, 2k/2k input/output [5] |
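A few derived ratios help put the table in context. The sketch below computes peak compute per watt and the roofline "ridge point", meaning the arithmetic intensity in FLOP per byte above which a kernel would be limited by compute rather than memory bandwidth. The ratios are not vendor-reported. They are computed from the peak values in the table and assume decimal units.

```python
# Peak figures from the table above (vendor-reported).
PEAK_TFLOPS = {"fp8": 2048.0, "fp16": 1024.0}
HBM_BANDWIDTH_TBYTES_PER_S = 4.8
POWER_W = 600.0


def tflops_per_watt(peak_tflops: float, power_w: float) -> float:
    """Peak compute divided by the quoted power envelope."""
    return peak_tflops / power_w


def ridge_point_flop_per_byte(peak_tflops: float, bandwidth_tbytes_per_s: float) -> float:
    """Roofline ridge point: FLOP per byte of memory traffic needed to be compute-bound."""
    # The tera prefixes cancel, so the ratio is already in FLOP per byte.
    return peak_tflops / bandwidth_tbytes_per_s


for precision, tflops in PEAK_TFLOPS.items():
    eff = tflops_per_watt(tflops, POWER_W)
    ridge = ridge_point_flop_per_byte(tflops, HBM_BANDWIDTH_TBYTES_PER_S)
    print(f"{precision}: {eff:.2f} TFLOPS/W peak, ridge point {ridge:.0f} FLOP/byte")
```

On these inputs the FP8 ridge point is a little over 400 FLOP per byte. This suggests that low-batch token generation, which reuses each weight only a few times, would sit well on the memory-bound side of the roofline.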
REBEL-Quad is aimed at inference for frontier-scale language models in the data center, with particular emphasis on Mixture-of-Experts architectures whose large parameter counts strain memory capacity and bandwidth [2][4]. The 144 GB of HBM3E is described as enough to hold tens of billions of parameters on a single accelerator, reducing the need to shard a model across many devices [2]. The Hot Chips demonstrations used dense and sparse models, including Llama 3.3 70B and Qwen3 235B, to illustrate both compute-bound and memory-bound serving [2][4][6].
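The capacity and bandwidth argument can be made concrete with a back-of-the-envelope estimate. The sketch below computes a weights-only footprint and a simple bandwidth ceiling for single-batch decoding of a dense model. It makes several assumptions not stated in the text above: 1 byte per parameter for FP8 and 2 for FP16, every weight read once per generated token, and no allowance for the KV cache, activations, or runtime overhead. It also assumes the reported 56.8 tokens/s run used FP8.

```python
HBM_CAPACITY_GB = 144.0
HBM_BANDWIDTH_GB_PER_S = 4800.0   # ~4.8 TB/s from the specification table
REPORTED_TOKENS_PER_S = 56.8      # Llama 3.3 70B, single batch [5]

BYTES_PER_PARAM = {"fp8": 1, "fp16": 2}


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Weights-only footprint; 1e9 parameters at N bytes each is N GB."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_in_hbm(params_billion: float, precision: str) -> bool:
    """True if the weights alone fit in the package's HBM capacity."""
    return weight_footprint_gb(params_billion, precision) <= HBM_CAPACITY_GB


def decode_ceiling_tokens_per_s(params_billion: float, precision: str) -> float:
    """Upper bound for a dense model at batch 1 if all weights are read per token."""
    return HBM_BANDWIDTH_GB_PER_S / weight_footprint_gb(params_billion, precision)


for name, size in [("Llama 3.3 70B", 70), ("Qwen3 235B", 235)]:
    for precision in ("fp8", "fp16"):
        gb = weight_footprint_gb(size, precision)
        print(f"{name} {precision}: {gb:.0f} GB weights, fits: {fits_in_hbm(size, precision)}")

ceiling = decode_ceiling_tokens_per_s(70, "fp8")
print(f"70B fp8 ceiling: {ceiling:.1f} tokens/s, "
      f"reported is {REPORTED_TOKENS_PER_S / ceiling:.0%} of that")
```

Under these assumptions a 70B model in FP8 needs about 70 GB for weights and has a ceiling near 68.6 tokens/s, so the reported 56.8 tokens/s would be roughly 83 percent of that bound. The 235B total parameter count exceeds 144 GB at FP8 in this estimate. The text above does not state the precision or device count used for the Qwen3 demonstration, and the dense-model ceiling does not apply directly to a Mixture-of-Experts model, which reads only its active experts per token.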
Rebellions showed REBEL-Quad as working silicon at Hot Chips 2025 in August 2025, running live model inference on a development board [1][6]. Synopsys, one of the validation partners, reported that emulation predicted silicon performance with about 98 percent accuracy and that Rebellions went from first silicon to a live demonstration in roughly five weeks [4]. At ISSCC 2026 the company published the full architecture under the name Rebel100 and positioned its single-package performance against NVIDIA's H200 at a lower power envelope, citing 600 W versus the H200's roughly 700 W [5].
Rebellions framed REBEL-Quad as the start of a broader chiplet family, naming follow-on products REBEL-IO, oriented toward trillion-parameter models, and REBEL-CPU for multi-node systems [2]. The company said proof-of-concept engagements with global customers were expected to follow the announcement [2]. As of mid-2026, no general-availability date or pricing had been published.
REBEL-Quad is notable mainly for two reasons. First, it is among the earliest commercial accelerators to adopt UCIe-Advanced for its internal interconnect, making it a reference point for how open chiplet standards might be used in production AI silicon rather than proprietary die-to-die links [1][3][5]. Second, it is one of the more ambitious products from a non-incumbent vendor outside the United States, reflecting a wider effort by Samsung-aligned Korean suppliers to field GPU-class inference hardware [2][9].
The competitive field for data-center inference silicon includes NVIDIA's Blackwell GPUs as well as alternative architectures from companies such as Cerebras, SambaNova, Groq, and Tenstorrent [2]. Rebellions positions REBEL-Quad on energy efficiency, claiming that in internal tests on Llama 3.3 70B in FP8 it delivered about 1.6 times the throughput at 50 percent lower power, and therefore roughly 3.2 times the transactions per second per watt, of unspecified top-tier GPUs [4].
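The 3.2 times figure is the quotient of the other two claims. The short sketch below shows the arithmetic. It reads "1.6 times" as a throughput ratio of 1.6 and "50 percent lower power" as a power ratio of 0.5, both relative to the unnamed comparison GPU. The function name is illustrative.

```python
def relative_perf_per_watt(throughput_ratio: float, power_ratio: float) -> float:
    """Efficiency relative to a baseline: (throughput / power) ratios divide."""
    return throughput_ratio / power_ratio


THROUGHPUT_RATIO = 1.6   # claimed throughput relative to the comparison GPU [4]
POWER_RATIO = 0.5        # 50 percent lower power means half the baseline power [4]

ratio = relative_perf_per_watt(THROUGHPUT_RATIO, POWER_RATIO)
print(f"relative throughput per watt: {ratio:.1f}x")
```

Because the result is a product of two vendor-reported ratios, any uncertainty in either the throughput or the power measurement carries through to the efficiency claim.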
Most of the performance evidence for REBEL-Quad is vendor-supplied and based on internal benchmarks rather than independent or standardized results such as MLPerf, so the throughput and efficiency comparisons should be read with that in mind. The power and efficiency claims reference "top-tier GPUs" without always naming the exact comparison part or test conditions [4]. Coverage from independent hardware press also noted that the card uses PCIe Gen5 rather than the Gen6 host interface arriving on contemporaneous NVIDIA platforms, a possible bandwidth limitation for some deployments [1]. Finally, as of mid-2026 the product had been demonstrated and published in detail but not announced for general sale, so real-world software maturity, ecosystem support, and volume availability remained unproven [2][5].