Tenstorrent Galaxy Blackhole

Updated Jun 3, 2026

Tenstorrent Galaxy Blackhole is an AI inference server built by Tenstorrent, the fabless semiconductor company led by chief executive Jim Keller. The system packs 32 Blackhole accelerators into a 6U air-cooled chassis and reached general availability on April 28, 2026, with a public launch event on May 1. Tenstorrent positions the Galaxy as a lower-cost alternative to Nvidia for running large language models and generative video, leaning on three deliberate departures from the GPU mainstream: GDDR6 memory instead of HBM, standard Ethernet instead of a proprietary interconnect, and a fully open-source software stack running on RISC-V based silicon. A single server lists at $110,000, and four of them form a base "supercluster" priced from $440,000.^[1]^[2]^[3]

The machine is the productized version of a reference design that Tenstorrent had been showing for more than a year. Earlier descriptions of a 32-chip Blackhole "Galaxy" mesh delivering roughly 23.8 petaFLOPS appeared alongside the chip's Hot Chips 2024 debut, but the shipping product announced in April 2026 is the first time the company offered the system for sale with published specifications, named launch partners, and benchmark claims. The Register summarized the arrival bluntly, noting that Tenstorrent's "Galaxy Blackhole AI servers are finally out."^[1]^[4]

What it is

Each Galaxy is a self-contained inference appliance rather than a host plus a tray of coprocessors. The 32 Blackhole ASICs sit in a single 6U air-cooled box and talk to one another over an on-board Ethernet mesh, so a buyer does not need a separate head node or a proprietary switch fabric to make the chips cooperate. That design choice matters because Blackhole itself was built to act as a standalone computer: every chip carries sixteen large SiFive Intelligence X280 RISC-V cores that can boot Linux directly, removing the host CPU from the critical path for orchestration.^[1]^[5]

Tenstorrent's framing for the product is that inference does not need to be carved into the elaborate, multi-tier hardware stacks that have become common. Keller put it this way in the launch announcement: "Every company in the industry is pairing up to build the accelerator accelerator accelerator. CPUs run code. GPUs accelerate CPUs. TPUs accelerate GPUs. LPUs accelerate TPUs. And so on. This leads to complex solutions which are unlikely to be compatible with changes in AI models and uses. At Tenstorrent, we thought something more general and simpler would work."^[2]

A related selling point is that the Galaxy runs both phases of LLM inference, the compute-heavy prefill and the memory-bound decode, on the same hardware. Much of the industry has moved toward disaggregated inference, where separate pools of machines handle prefill and decode. Tenstorrent argues that its on-chip data flow lets one system do both well, which simplifies deployment for teams that would rather not operate two specialized fleets.^[2]^[6]

Building on the Blackhole architecture

The Galaxy inherits everything that defines the Blackhole generation. A single Blackhole die is fundamentally a RISC-V multiprocessor with matrix engines attached. It contains 140 Tensix++ compute cores plus a large CPU island; counting every embedded controller core, Tenstorrent reports 752 small "baby" RISC-V cores alongside the 16 big SiFive cores. The silicon is manufactured on a 6-nanometer TSMC process, pairs each chip with 32 GB of GDDR6, and exposes ten 400 Gbps Ethernet links for roughly 1 TB/s of off-chip bandwidth. At the chip level, Blackhole is rated at 745 teraFLOPS of FP8 throughput.^[5]

Stacking 32 of those chips is what produces the Galaxy's headline numbers. The per-chip Ethernet becomes a dense in-chassis fabric, and the per-chip GDDR6 pools into a terabyte of memory addressable across the system. Because the scale-out fabric is ordinary Ethernet rather than NVLink or Infinity Fabric, the same cabling that links chips inside one box also links boxes together, which is how Tenstorrent extends a single server into a multi-rack cluster without specialized switch silicon.^[1]^[3]^[5]
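The step from one chip to a full chassis is plain multiplication, and a short Python sketch makes the arithmetic explicit. The per-chip inputs are the figures quoted in this article; the function and variable names are illustrative only, and the results are theoretical peaks, not measurements.

```python
# Per-chip Blackhole figures as quoted in this article.
CHIPS_PER_GALAXY = 32
FP8_TFLOPS_PER_CHIP = 745
GDDR6_GB_PER_CHIP = 32
ETHERNET_LINKS_PER_CHIP = 10
LINK_GBPS = 400  # gigabits per second, per link


def galaxy_totals(chips: int = CHIPS_PER_GALAXY) -> dict:
    """Multiply per-chip figures up to a whole chassis (theoretical peaks)."""
    # Convert 10 x 400 Gb/s to gigabytes per second in ONE direction.
    one_way_gbytes_per_chip = ETHERNET_LINKS_PER_CHIP * LINK_GBPS / 8
    return {
        "fp8_pflops": chips * FP8_TFLOPS_PER_CHIP / 1000,
        "gddr6_gb": chips * GDDR6_GB_PER_CHIP,
        "ethernet_one_way_tbytes_per_s": chips * one_way_gbytes_per_chip / 1000,
    }


if __name__ == "__main__":
    for name, value in galaxy_totals().items():
        print(f"{name}: {value}")
```

The compute and memory totals land on the roughly 23.8 petaFLOPS and one terabyte cited for the system. The Ethernet total comes out at 16 TB/s in one direction, so the quoted 1 TB/s per chip and 32 TB/s aggregate appear to count both directions of each link; that reading is an assumption here, not something the sources state outright.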

Specifications

The figures below come from Tenstorrent's published Galaxy product page and the company's general availability announcement.^[2]^[3]

Specification	Tenstorrent Galaxy Blackhole
Form factor	6U rackmount, air-cooled
Accelerators	32 Blackhole ASICs
Compute	23 PFLOPS Block FP8
On-chip SRAM	6.2 GB at 2.9 PB/s
DRAM	1 TB GDDR6 at 16 TB/s
On-chip fabric	10 x 400 GbE per ASIC, 32 TB/s aggregate
Scale-out networking	Up to 56 x 800 GbE QSFP-DD ports, 11.2 TB/s
Power	8 to 10 kW average, 12 kW maximum
List price	$110,000
Base supercluster	4 Galaxy systems, from $440,000
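The table can also be read in the other direction, dividing system totals back down to a single accelerator. The Python sketch below does that and converts the scale-out port count into bytes per second. The names are illustrative; the per-chip power and price shares are simple averages over the whole chassis (fans, power supplies, and networking included), not official per-chip ratings; and the doubling for bidirectional traffic is an assumption made to match the quoted 11.2 TB/s.

```python
CHIPS = 32

# System-level rows taken from the specification table.
SRAM_GB = 6.2
DRAM_TBYTES_PER_S = 16
MAX_POWER_KW = 12
LIST_PRICE_USD = 110_000
SCALE_OUT_PORTS = 56
PORT_GBPS = 800  # gigabits per second, per port


def per_chip_share() -> dict:
    """Average share of each system-level figure per accelerator."""
    return {
        "sram_mb": SRAM_GB * 1000 / CHIPS,
        "dram_gbytes_per_s": DRAM_TBYTES_PER_S * 1000 / CHIPS,
        "max_power_w": MAX_POWER_KW * 1000 / CHIPS,
        "list_price_usd": LIST_PRICE_USD / CHIPS,
    }


def scale_out_tbytes_per_s(bidirectional: bool = True) -> float:
    """Convert 56 x 800 GbE into terabytes per second."""
    one_way = SCALE_OUT_PORTS * PORT_GBPS / 8 / 1000
    # Assumption: the quoted figure counts traffic in both directions.
    return one_way * 2 if bidirectional else one_way
```

Under those assumptions each accelerator accounts for about 194 MB of SRAM, 500 GB/s of GDDR6 bandwidth, at most 375 W of chassis power, and roughly $3,440 of the list price.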

The use of GDDR6 rather than the high-bandwidth memory found in Nvidia Blackwell parts is intentional. Analysts at Moor Insights & Strategy noted that Tenstorrent deliberately chose GDDR6 over HBM, standard Ethernet over proprietary fabrics, and air cooling over liquid cooling, all in service of lowering the cost and complexity of inference at scale rather than chasing peak training throughput.^[6]

Open source and RISC-V

Tenstorrent ships the whole software stack as open source, from the compiler down to the kernel level. The toolchain centers on TT-Forge, the company's MLIR-based compiler, layered over the lower-level TT-Metalium runtime and the TT-NN neural network library. Tenstorrent claims that "ninety percent of models from Hugging Face just run on Tenstorrent," and the programming model exposes a Python interface for writing optimized kernels rather than hiding the hardware behind an opaque tensor compiler. Frontier models the company lists as in progress include Moonshot AI's Kimi K2.^[1]^[2]

The openness extends to the hardware design itself. Blackhole boards ship with full schematics and kernel-level access, and the choice to build the entire compute fabric on the open RISC-V instruction set is the foundation of Tenstorrent's pitch that customers can avoid vendor lock-in. For buyers worried about being tied to a single proprietary software ecosystem, the open stack is arguably as important as the price tag.^[2]^[5]

Performance and cost claims

The performance numbers Tenstorrent cites for the Galaxy are vendor figures and should be read as such. On DeepSeek's DeepSeek-R1 0528 671B model, the company reports decode throughput of up to 350 tokens per second per user across batch sizes of 8 to 64 while supporting a 128k token context. Independent reporting put the figure measured at launch closer to 308 tokens per second per user, with a software roadmap toward 500. On the prefill side, Tenstorrent says a four-node supercluster reaches a sub-four-second time to first token on a 100,000 token prompt, which it describes as roughly 166 pages of text processed in under four seconds. For generative video, the company claims it can produce 720p output faster than real time, citing an 81-frame 720p clip in about 2.4 seconds and describing the result as roughly ten times faster than leading GPU systems.^[1]^[2]^[7]
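A few derived quantities help put these vendor figures in context. The Python sketch below is a back-of-the-envelope calculation, not a benchmark: the function names are illustrative, and the aggregate decode figure assumes every user in a batch receives the full per-user rate, which the sources do not confirm for each batch size.

```python
def aggregate_decode_tps(per_user_tps: float, batch_size: int) -> float:
    """Total decode tokens/s if each user in the batch gets per_user_tps."""
    return per_user_tps * batch_size


def implied_prefill_tps(prompt_tokens: int, time_to_first_token_s: float) -> float:
    """Prompt tokens processed per second implied by a time to first token."""
    return prompt_tokens / time_to_first_token_s


def generation_fps(frames: int, seconds: float) -> float:
    """Frames produced per second of wall-clock time."""
    return frames / seconds


if __name__ == "__main__":
    for batch in (8, 64):
        claimed = aggregate_decode_tps(350, batch)   # vendor "up to" figure
        measured = aggregate_decode_tps(308, batch)  # independently reported figure
        print(f"batch {batch}: {claimed:.0f} claimed, {measured:.0f} measured tokens/s")
    # Four seconds is an upper bound, so this rate is a lower bound.
    print(f"prefill: at least {implied_prefill_tps(100_000, 4.0):.0f} tokens/s")
    print(f"video: {generation_fps(81, 2.4):.2f} frames/s")
```

A sub-four-second time to first token on a 100,000 token prompt implies at least 25,000 prompt tokens per second across the four-node supercluster. The video claim works out to 33.75 generated frames per second, which is faster than real time for any playback rate below that; the sources summarized here do not state the clip's frame rate.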

The economic argument is where Tenstorrent is most aggressive. The company says the Galaxy delivers output at about $6 per million tokens against roughly $30 for a comparable Nvidia GB300 setup, which is the basis for its claim of a fivefold total cost of ownership advantage. WCCFtech reported the company vowing to "crush everyone" on inference economics. These are Tenstorrent's own comparisons, and they apply to inference specifically rather than to the full training-plus-inference lifecycle.^[7]
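The fivefold claim follows directly from the two per-token prices. The Python sketch below reproduces it and scales it to a workload; the function names and the one-billion-token workload are illustrative assumptions, and both prices are Tenstorrent's own figures.

```python
GALAXY_USD_PER_MILLION = 6.0   # Tenstorrent's figure for the Galaxy
GB300_USD_PER_MILLION = 30.0   # Tenstorrent's figure for a comparable GB300 setup


def output_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of generating `tokens` output tokens at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


def cost_ratio(baseline: float, candidate: float) -> float:
    """How many times more the baseline costs than the candidate."""
    return baseline / candidate


if __name__ == "__main__":
    workload = 1_000_000_000  # hypothetical: one billion output tokens
    print(output_cost_usd(workload, GALAXY_USD_PER_MILLION))
    print(output_cost_usd(workload, GB300_USD_PER_MILLION))
    print(cost_ratio(GB300_USD_PER_MILLION, GALAXY_USD_PER_MILLION))
```

For that hypothetical workload the difference is $6,000 against $30,000, so the ratio only matters to the extent that both inputs hold up under independent measurement.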

Positioning against Nvidia

Tenstorrent and most reviewers are careful to frame the Galaxy as an inference machine first. The Register noted that Nvidia's eight-way DGX boxes are "faster and higher capacity" than a single Galaxy but cost roughly three to five times as much, which is the gap Tenstorrent is trying to exploit. For organizations that need one platform for both large-scale training and inference, Nvidia remains the default. For teams whose problem is serving models cheaply and predictably, the open stack and the lower sticker price are the draw.^[1]^[6]

That positioning showed up in the launch partners. Tenstorrent named cloud and infrastructure providers including Cirrascale and Equinix, Japan's ai&, and customers such as OrionVM, BetterBrain, Virtu Financial, Turiyam, and Prodia Labs as early adopters. Dave Driggers, chief executive of Cirrascale, said the company evaluates a lot of hardware and that "most of it is incremental," adding that "Tenstorrent Galaxy Blackhole is not" and that Tenstorrent "has taken a clean-sheet approach to AI infrastructure." Equinix's Justen Aguillon said the system lets enterprises "stay focused on building differentiated products, not managing infrastructure complexity."^[2]

Availability and scaling

The Galaxy reached general availability on April 28, 2026 and lists at $110,000 for a single 6U server. The base supercluster bundles four Galaxy systems from $440,000, and Tenstorrent says customers can deploy configurations ranging from 4 to 36 or more systems. Because the chips federate over standard Ethernet, the architecture scales to multi-rack deployments, and Tenstorrent highlights a "supercluster 36" configuration that links 36 boxes, or 1,152 Blackhole chips, into a single system. Reporting around the launch likewise noted that the design can support on the order of 32 or more nodes and well over a thousand chips in total. The workloads the company targets include large-scale LLM inference, AI video generation, and private AI infrastructure.^[1]^[2]^[3]
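Cluster sizes can be sanity-checked with the per-system figures from the specification table. The Python sketch below is illustrative only: the names are made up for this example, and the list-price total assumes pricing stays linear at $110,000 per system, which the sources confirm only for the single server and the four-system base supercluster.

```python
from dataclasses import dataclass

CHIPS_PER_SYSTEM = 32
GDDR6_TB_PER_SYSTEM = 1
MAX_POWER_KW_PER_SYSTEM = 12
LIST_PRICE_USD_PER_SYSTEM = 110_000


@dataclass(frozen=True)
class ClusterEstimate:
    systems: int
    chips: int
    gddr6_tb: int
    max_power_kw: int
    list_price_usd: int  # assumption: linear pricing, no volume terms


def estimate_cluster(systems: int) -> ClusterEstimate:
    """Scale per-system figures to a cluster of `systems` Galaxy servers."""
    if systems < 1:
        raise ValueError("a cluster needs at least one system")
    return ClusterEstimate(
        systems=systems,
        chips=systems * CHIPS_PER_SYSTEM,
        gddr6_tb=systems * GDDR6_TB_PER_SYSTEM,
        max_power_kw=systems * MAX_POWER_KW_PER_SYSTEM,
        list_price_usd=systems * LIST_PRICE_USD_PER_SYSTEM,
    )


if __name__ == "__main__":
    for n in (4, 36):
        print(estimate_cluster(n))
```

Four systems give 128 chips at the published $440,000, while 36 systems give 1,152 chips, 36 TB of GDDR6, and a worst-case draw of 432 kW.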

References

1. The Register. "Tenstorrent's Galaxy Blackhole AI servers are finally out." April 28, 2026. https://www.theregister.com/2026/04/28/tenstorrent_galaxy_blackhole_ai_servers_ga/
2. Tenstorrent. "Tenstorrent Enables AI at Scale with Industry-Leading Performance." Newsroom, April 28, 2026. https://tenstorrent.com/newsroom/tenstorrent-enables-ai-at-scale-with-industry-leading-performance
3. Tenstorrent. "Tenstorrent Galaxy." Hardware product page. https://tenstorrent.com/en/hardware/galaxy
4. HPCwire. "Tenstorrent Announces General Availability of Galaxy Blackhole AI System." May 1, 2026. https://www.hpcwire.com/off-the-wire/tenstorrent-announces-general-availability-of-galaxy-blackhole-ai-system/
5. The Register. "Tenstorrent details its RISC-V packed Blackhole chips." August 27, 2024. https://www.theregister.com/2024/08/27/tenstorrent_ai_blackhole/
6. Moor Insights & Strategy. "Analyst Insight: Tenstorrent Is Disrupting the Inference Market." 2026. https://moorinsightsstrategy.com/analyst-insight-tenstorrent-is-disrupting-the-inference-market/
7. WCCFtech. "Tenstorrent Vows to 'Crush Everyone' as Galaxy Blackhole Hits 350 Tokens/s on DeepSeek R1, Undercutting NVIDIA's GB300 5x AI TCO." 2026. https://wccftech.com/tenstorrent-vows-to-crush-everyone-galaxy-blackhole-hits-350-tokens-on-deepseek-r1-undercut-nvidia-gb300-ai-tco/
