Wormhole (Tenstorrent)

Updated Jul 7, 2026

Wormhole is the second-generation AI accelerator application-specific integrated circuit (ASIC) designed by Tenstorrent, a Toronto-based hardware startup led by chip architect Jim Keller. Announced in 2021 and made commercially available to developers in mid-2024, Wormhole is sold as two PCIe add-in cards, the n150 (single ASIC, 72 Tensix cores, 12 GB GDDR6) and the n300 (dual ASIC, 128 Tensix cores, 24 GB GDDR6), with launch prices starting at $999. ^[1]^[2]^[4] The chip is notable as one of the first commercially shipping AI accelerators built around a many-core grid of in-house Tensix tiles managed by an array of small RISC-V control cores, and for its emphasis on a fully open-source software stack that includes the TT-Metalium low-level environment and the TT-Forge and TT-Buda compilers. ^[1]^[5]

Wormhole sits between Tenstorrent's first-generation Grayskull part and its third-generation Blackhole chip in the company's product line. While its raw FP8 throughput per card is well below contemporary flagship parts from NVIDIA and AMD, Wormhole was designed as a building block for scale-out systems, with each ASIC exposing 16 ports of 100 Gigabit Ethernet for direct chip-to-chip communication. ^[7] The 32-chip Galaxy server assembles these links into a single mesh delivering 9.32 PetaFLOPS of FP8 compute behind a unified 384 GB GDDR6 memory pool. ^[3]^[14]

Who developed Wormhole, and when was it released?

Tenstorrent was founded in 2016 by Ljubisa Bajic, Ivan Hamer, and Milos Trajkovic. ^[7] The company set out to design a chip architecture that would scale across a wide range of AI deployments, from a single PCIe card in a developer workstation to large multi-rack inference clusters. The first-generation product, code-named Grayskull, was a 120-core chip fabricated on GlobalFoundries' 12 nanometer process and positioned as a proof-of-concept developer card.

Wormhole is the follow-on architecture and the first Tenstorrent design to include native high-speed Ethernet for direct chip-to-chip scale-out. The chip was first publicly analyzed in detail by SemiAnalysis in June 2021, which described it as a 670 square millimeter die on the same GlobalFoundries 12 nanometer process as Grayskull, but with the addition of sixteen 100 Gigabit Ethernet ports along the perimeter and an upgraded Tensix core. ^[7]

Jim Keller, the architect behind AMD's Zen CPU family, Apple's A4 and A5 application processors, and the original Tesla Full Self-Driving chip, joined Tenstorrent as Chief Technology Officer in early 2021 and was promoted to Chief Executive Officer in January 2023. ^[1] His arrival raised the company's profile considerably and signaled an intent to build not just an inference accelerator but a long-term competitor to NVIDIA in both AI silicon and licensable RISC-V CPU intellectual property. Keller has been openly combative about the incumbent's software advantage, arguing "Cuda's a swamp, not a moat. x86 was a swamp too," a framing he uses to justify Tenstorrent's fully open stack. ^[17] In December 2024 the company closed a Series D round of more than $693 million at a $2 billion pre-money valuation, led by Samsung Securities and AFW Partners, with participation from investors including Bezos Expeditions, Fidelity, Hyundai Motor Group, and LG Technology Ventures. ^[16]

Tenstorrent quietly shipped Wormhole-based development boards to selected partners and research groups beginning in 2022, and the parts were available through the Tenstorrent DevCloud remote access service. Wide commercial availability of the n150 and n300 PCIe cards arrived on 20 July 2024 with an online store launch. ^[4]

What is inside the Wormhole ASIC?

Physical design

The Wormhole die measures approximately 670 square millimeters and is manufactured on GlobalFoundries' 12 nanometer FinFET process. ^[7]^[13] This is the same node as the prior Grayskull part, a deliberate choice that traded peak transistor density for lower mask costs and faster time to silicon. The Wormhole package exposes a 192 bit GDDR6 memory interface that is wired to six external GDDR6 memory devices on the carrier board. It also brings out 80 SerDes lanes that the on-package controllers configure as sixteen 100 Gigabit Ethernet ports, giving each chip 1.6 terabits per second of aggregate off-chip bandwidth dedicated to direct neighbor links. ^[7]^[13]
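The bandwidth figures in this section follow from simple arithmetic. The sketch below reproduces them, assuming the 12 GT/s GDDR6 data rate quoted in the card specifications and the nominal 100 Gb/s line rate per Ethernet port. The function names are illustrative and are not part of any Tenstorrent tool.

```python
def gddr6_bandwidth_gb_s(bus_width_bits: int, data_rate_gt_s: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bits times transfers per second, over 8 bits per byte."""
    return bus_width_bits * data_rate_gt_s / 8


def ethernet_aggregate_tbps(ports: int, gbps_per_port: float) -> float:
    """Aggregate one-direction Ethernet bandwidth in Tb/s."""
    return ports * gbps_per_port / 1000


# Values taken from the article; 12 GT/s comes from the card specification table.
bits_per_device = 192 // 6                 # six GDDR6 devices share the 192-bit bus
mem_bw = gddr6_bandwidth_gb_s(192, 12.0)   # peak GDDR6 bandwidth per ASIC
eth_bw = ethernet_aggregate_tbps(16, 100)  # sixteen 100 GbE ports

print(f"{bits_per_device} bits per GDDR6 device")
print(f"{mem_bw:.0f} GB/s peak memory bandwidth per ASIC")
print(f"{eth_bw:.1f} Tb/s aggregate Ethernet bandwidth per ASIC")
```

These are peak figures; sustained bandwidth on real workloads will be lower.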

Tensix cores

Each Wormhole ASIC contains a grid of Tensix cores. In the n150 product configuration, 72 Tensix cores are enabled (with additional disabled cores on the die for yield), while the full mesh exposed in some internal Tenstorrent documentation is reported as 80 tiles. ^[13] Each Tensix tile is a self-contained processing element that bundles together:

A dense matrix math unit optimized for low precision tensor operations.
A SIMD vector unit for elementwise and reduction operations.
A block of approximately 1.5 megabytes of fast on-tile SRAM, totaling 108 megabytes per ASIC across the 72 active tiles.
Five small RISC-V control processors, nicknamed baby RISC-V cores, that handle instruction issue, network-on-chip transactions, and data movement orchestration.
A pair of bi-directional 2D mesh router endpoints that connect the tile into the on-chip network-on-chip (NoC).

The split into five baby RISC-V cores per tile is a distinctive feature: two of the cores are dedicated to moving data between external GDDR6, neighbor tiles, and the local SRAM scratchpad, while the remaining three orchestrate the matrix and vector compute units. ^[13] This decoupled data-movement model is meant to let the compiler explicitly schedule tensor traffic across the mesh rather than relying on a hardware-managed cache hierarchy, which Tenstorrent argues yields more predictable performance for repetitive AI workloads.
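The division of labor described above can be pictured as a three-stage pipeline joined by small bounded buffers. The following is a conceptual Python simulation, not TT-Metalium code: threads stand in for the data-movement and compute cores, bounded queues stand in for buffers in the local SRAM, and the names `run_pipeline` and `buffer_depth` are hypothetical.

```python
import queue
import threading


def run_pipeline(tiles, buffer_depth=2):
    """Simulate a reader -> compute -> writer pipeline joined by bounded buffers."""
    in_buf = queue.Queue(maxsize=buffer_depth)   # stand-in for an input buffer in SRAM
    out_buf = queue.Queue(maxsize=buffer_depth)  # stand-in for an output buffer in SRAM
    results = []
    done = object()  # sentinel marking the end of the stream

    def reader():
        # Data-movement role: fetch tiles (here, from a Python list) into local memory.
        for tile in tiles:
            in_buf.put(tile)  # blocks while the buffer is full (back-pressure)
        in_buf.put(done)

    def compute():
        # Compute role: consume tiles as they arrive, independent of the reader's pace.
        while True:
            tile = in_buf.get()
            if tile is done:
                out_buf.put(done)
                return
            out_buf.put([x * 2 for x in tile])  # placeholder for a matrix or vector op

    def writer():
        # Data-movement role: drain finished tiles back out of local memory.
        while True:
            tile = out_buf.get()
            if tile is done:
                return
            results.append(tile)

    threads = [threading.Thread(target=f) for f in (reader, compute, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline([[1, 2], [3, 4], [5, 6]]))
```

Because each stage only blocks on its own buffers, data movement overlaps with compute, and the schedule is set by software rather than by a cache. The bounded depth mimics the fixed SRAM budget each tile has for staging data.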

On-chip network and scale-out fabric

The Tensix tiles are arranged in a 2D mesh and stitched together by a bi-directional NoC. At the chip edge, the NoC fabric extends outward through the 16 Ethernet ports, allowing data to traverse from any tile on one chip to any tile on a neighboring chip without going back through the host CPU or PCIe complex. ^[7] Tenstorrent refers to this property as a scale-out architecture, with the chip-to-chip Ethernet links treated by software as an extension of the on-die NoC.
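One common way to move data across a 2D mesh is dimension-ordered routing, where a packet travels along one axis and then the other. The sketch below shows that generic technique on a small hypothetical grid. The article's sources do not state which routing algorithm Wormhole's NoC actually uses, so this is purely an illustration of mesh traversal.

```python
def xy_route(src, dst):
    """Return the (x, y) tiles visited from src to dst, moving along X first, then Y.

    Generic dimension-ordered routing on a mesh without wraparound links;
    an assumption for illustration, not Wormhole's documented behavior.
    """
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)                      # tiles visited, including source and destination
print(len(route) - 1, "hops")     # hop count equals the Manhattan distance
```

Under this model, a hop that leaves the chip edge through an Ethernet port would simply be one more link in the path, which is the sense in which software can treat the chip-to-chip links as an extension of the on-die NoC.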

Each Ethernet link is a standard 100 Gigabit Ethernet PHY, and pairs of links are exposed externally through QSFP-DD cages on the n150 and n300 carrier boards. This choice means Wormhole-based systems can be cabled together using off-the-shelf datacenter optics or direct-attach copper, with no proprietary cabling standard analogous to NVIDIA's NVLink or AMD's Infinity Fabric.

Number formats

Wormhole's matrix engines natively support a wide range of numeric formats:

FP8 (used for headline throughput numbers)
FP16 and BFLOAT16
FP32 accumulation and output
Block floating point variants (BFP8, BFP4, BFP2)
Integer formats including INT8
A TF32-like tensor float variant

The combination of dense low precision math, on-tile SRAM, and software-managed data movement is the basis for Tenstorrent's claim that Wormhole achieves a high fraction of its peak FLOPS on real transformer workloads even at small batch sizes.
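Block floating point stores one exponent for a group of values and a short mantissa for each value. The sketch below shows a generic version of the idea in plain Python. The block contents, the 7-bit magnitude plus sign, and the function names are assumptions for illustration; the article does not describe the exact bit layout or block size of Tenstorrent's BFP8, BFP4, or BFP2 formats.

```python
import math


def bfp_quantize(block, mantissa_bits=7):
    """Quantize floats to one shared exponent plus a signed integer mantissa per value."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp returns max_abs = m * 2**e with 0.5 <= m < 1, so every |v| / 2**e is below 1.
    _, shared_exp = math.frexp(max_abs)
    scale = 2 ** mantissa_bits
    limit = scale - 1  # largest representable magnitude
    mantissas = []
    for v in block:
        q = round(v / 2.0 ** shared_exp * scale)
        mantissas.append(max(-limit, min(limit, q)))  # clamp after rounding
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits=7):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    scale = 2 ** mantissa_bits
    return [m / scale * 2.0 ** shared_exp for m in mantissas]


exp, mants = bfp_quantize([1.0, 0.5, -0.25, 0.0])
print(exp, mants)
print(bfp_dequantize(exp, mants))

# A value much smaller than the block maximum rounds to zero: the cost of sharing an exponent.
print(bfp_quantize([1.0, 0.001]))
```

The trade-off is visible in the last line: storage per value shrinks, but small values in a block dominated by a large one lose precision. Narrower variants make that loss more severe.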

What are the Wormhole n150 and n300 cards?

Wormhole is sold to developers and small deployments on two three-quarter length PCIe Gen 4 x16 add-in cards. The n150 carries a single Wormhole ASIC and 12 gigabytes of GDDR6; the n300 carries two Wormhole ASICs on a single board for 24 gigabytes of GDDR6 and double the headline compute. ^[2] Each variant is offered in two cooling configurations: the d suffix indicates a desktop active cooler with integrated fan, while the s suffix indicates a passive server cooler designed for chassis with strong directed airflow.

n150 versus n300 specifications

Specification	Wormhole n150 (n150d / n150s)	Wormhole n300 (n300d / n300s)
Wormhole ASICs per card	1	2
Tensix cores	72	128 (64 per ASIC)
On-chip SRAM	108 MB	192 MB (96 MB per ASIC)
GDDR6 memory	12 GB	24 GB
Memory speed	12 GT/sec	12 GT/sec
Memory bandwidth	288 GB/sec	576 GB/sec
FP8 peak throughput	262 TFLOPS	466 TFLOPS
AI clock	1.0 GHz	1.0 GHz
Board power (TDP)	160 W	300 W
QSFP-DD scale-out ports	2 x 200 G active	2 x 200 G active
Host interface	PCIe Gen 4 x16	PCIe Gen 4 x16
Form factor	3/4 length, dual slot	3/4 length, 2.5 slot (active)

A few details in the table merit explanation. The n300's Tensix core count of 128 is less than twice the n150's 72 (which would be 144), because Tenstorrent fuses off additional tiles on the n300 to land at a clean 64 active tiles per ASIC for symmetric mesh routing across the dual-chip card. For the same reason, the n300's 96 megabytes of SRAM per ASIC is slightly below the n150's full 108 megabytes.
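The per-core figures implied by the specification table can be checked with a few divisions. This is a minimal sketch using only numbers from the table; the dictionary layout and the `per_core` helper are illustrative.

```python
# Card-level figures copied from the specification table.
CARDS = {
    "n150": {"asics": 1, "cores": 72, "sram_mb": 108, "fp8_tflops": 262},
    "n300": {"asics": 2, "cores": 128, "sram_mb": 192, "fp8_tflops": 466},
}


def per_core(card):
    """Derive per-ASIC and per-core figures from the card-level totals."""
    c = CARDS[card]
    return {
        "cores_per_asic": c["cores"] // c["asics"],
        "sram_mb_per_core": c["sram_mb"] / c["cores"],
        "tflops_per_core": c["fp8_tflops"] / c["cores"],
    }


for name in CARDS:
    print(name, per_core(name))
```

Both cards come out at 1.5 MB of SRAM and about 3.64 peak FP8 TFLOPS per active core, so the n300's lower totals per ASIC are explained by its core count alone.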

The two QSFP-DD ports on each card carry the chip's high-speed Ethernet links, and the n300 internally also wires its two ASICs together over a chip-to-chip link so that the pair appears as a single tightly coupled compute domain. Multiple n150 or n300 cards in the same chassis can be cabled together through the QSFP-DD ports to form larger meshes without involving the PCIe bus for data plane traffic.

How much did the Wormhole cards cost?

The direct-sale prices at the July 2024 launch were:

Product	Configuration	Launch price (USD)
Wormhole n150s	Single ASIC, passive cooler	$999
Wormhole n300s	Dual ASIC, passive cooler	$1,399
TT-LoudBox workstation	4 x n300s (8 ASICs), tower	$12,000
TT-QuietBox workstation	4 x n300s (8 ASICs), liquid cooled	$15,000

The $999 and $1,399 retail prices placed Wormhole well below contemporary datacenter AI accelerators (an H100 SXM, for example, generally sold for more than $20,000) and made it one of the few server-class AI parts that an individual developer could realistically buy. ^[4]^[6] The pricing strategy was explicitly aimed at building a software ecosystem around the chip, in the same way that NVIDIA's consumer GeForce cards seeded CUDA adoption.
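A rough way to put the launch prices in context is dollars per peak FP8 TFLOPS. The sketch below uses the prices and peak figures quoted in this article, and takes $20,000 as a floor for the H100 SXM because the text only says prices were above that level. Peak throughput is not delivered throughput, so the result is indicative at best.

```python
def usd_per_tflops(price_usd: float, peak_fp8_tflops: float) -> float:
    """Price divided by peak FP8 throughput."""
    return price_usd / peak_fp8_tflops


# (price in USD, peak FP8 TFLOPS) as quoted in the article.
# The H100 price is a lower bound, not a list price.
parts = {
    "Wormhole n150s": (999, 262),
    "Wormhole n300s": (1399, 466),
    "H100 SXM (at least)": (20000, 1979),
}

for name, (price, tflops) in parts.items():
    print(f"{name}: ${usd_per_tflops(price, tflops):.2f} per peak FP8 TFLOPS")
```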

What is the Wormhole Galaxy server?

The Wormhole Galaxy server is Tenstorrent's reference design for rack-scale deployment of the chip. The system is a 4U chassis that houses 32 Wormhole Tensix Processors interconnected through their Ethernet links, with an integrated x86 head node for host duties. ^[14]^[3] Galaxy is the first product to fully exploit the chip's scale-out fabric: the 32 ASICs form a single mesh that the software stack presents as one logical accelerator with a pooled memory and combined compute budget.

Galaxy rack specifications

Specification	Wormhole Galaxy
Wormhole ASICs per chassis	32
Chassis form factor	Custom 4U
Aggregate FP8 compute	9.32 PetaFLOPS
Aggregate on-die SRAM	~3.8 GB
Aggregate GDDR6 memory	384 GB (globally accessible)
Per-chip scale-out bandwidth	3.2 Tbps Ethernet (16 x 200 Gb effective via NoC)
Integrated head node	Yes (x86)
Cabling	Standard Ethernet (200 G QSFP-DD)

Because each ASIC contributes 12 gigabytes of GDDR6 to the pool, the full Galaxy presents 384 gigabytes of memory addressable from any compute tile in the mesh through the NoC and Ethernet fabric. ^[3] This is a far larger working set than a single H100 SXM (80 GB HBM3) and twice the per-GPU memory of AMD's MI300X (192 GB HBM3), though Galaxy's GDDR6 provides less bandwidth per gigabyte of capacity than HBM3.
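The Galaxy aggregates can be derived from per-chip figures. The sketch below makes two assumptions that the article does not state outright: that each Galaxy ASIC runs all 80 tiles of the full mesh (which is what the roughly 3.8 GB SRAM total implies), and that each ASIC has the same 288 GB/s GDDR6 bandwidth as on the n150 card.

```python
ASICS = 32
GDDR6_GB_PER_ASIC = 12
GALAXY_FP8_PFLOPS = 9.32
TILES_PER_ASIC = 80      # assumption: full mesh enabled in Galaxy
SRAM_MB_PER_TILE = 1.5

pooled_gddr6_gb = ASICS * GDDR6_GB_PER_ASIC
fp8_tflops_per_asic = GALAXY_FP8_PFLOPS * 1000 / ASICS
sram_gb = ASICS * TILES_PER_ASIC * SRAM_MB_PER_TILE / 1000


def bandwidth_per_gb(bandwidth_gb_s: float, capacity_gb: float) -> float:
    """Memory bandwidth available per gigabyte of capacity, in GB/s per GB."""
    return bandwidth_gb_s / capacity_gb


ratios = {
    "Wormhole (per ASIC, assumed 288 GB/s)": bandwidth_per_gb(288, 12),
    "H100 SXM": bandwidth_per_gb(3350, 80),
    "MI300X": bandwidth_per_gb(5300, 192),
}

print(f"Pooled GDDR6: {pooled_gddr6_gb} GB")
print(f"Peak FP8 per ASIC: {fp8_tflops_per_asic:.2f} TFLOPS")
print(f"Aggregate SRAM: {sram_gb:.2f} GB")
for name, value in ratios.items():
    print(f"{name}: {value:.1f} GB/s per GB of capacity")
```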

Multiple Galaxy chassis can themselves be cabled together through their QSFP-DD ports, because the underlying transport uses standard 100 Gigabit Ethernet PHYs. Tenstorrent positions this property as the principal architectural advantage of Wormhole: scale-out beyond a single rack does not require proprietary switches.

What software stack does Wormhole use?

A distinguishing feature of Wormhole is that its entire programming environment is open source, with code published on GitHub and developed in the open. ^[12] The stack is layered to expose progressively lower-level control of the hardware. Keller has framed the openness as a recruiting and adoption advantage rather than a giveaway, saying that "if somebody smart says: I want open-source access to your hardware so I can program it the way I want, why would I say no to that?" ^[21]

TT-Forge

TT-Forge is Tenstorrent's current high-level compiler stack and the recommended path for running models on Wormhole and Blackhole. It is an MLIR-based compiler that accepts model graphs from front ends such as PyTorch, ONNX, and JAX, lowers them through TTIR, TTNN, and TTKernel dialects, and emits kernels for the Tensix mesh on top of TT-Metalium. ^[20] Tenstorrent reports that more than 800 model variants are tested in continuous integration on the stack, and by 2026 the company was claiming that "Ninety percent of models from Hugging Face just run on Tenstorrent." ^[18]^[20]

TT-Buda

TT-Buda is the original high-level inference and training framework. It accepts model graphs from PyTorch, TensorFlow, ONNX, and Hugging Face Transformers, lowers them through an internal intermediate representation, and emits kernels for the Tensix mesh. TT-Buda was the recommended entry point at the Wormhole launch, but Tenstorrent has since designated it a legacy stack, positioning the MLIR-based TT-Forge as its successor. ^[20]

TT-Metalium

TT-Metalium (often abbreviated TT-Metal) is the low-level C++ programming environment, analogous in spirit to NVIDIA's CUDA Driver API or AMD's HIP. ^[12] It exposes the Tensix grid, the on-chip NoC, the per-tile SRAM, and the baby RISC-V control cores as first-class entities. Kernels written in TT-Metalium are programs running on the baby RISC-V cores that explicitly orchestrate data movement and matrix engine invocations. This degree of explicit control is intended to let library authors hand-tune critical kernels for transformers, convolutions, and attention.

llama-tt and reference models

llama-tt is Tenstorrent's open source reference implementation of the Llama 2 and Llama 3 model families on Wormhole. It is built on TT-Metalium and is the canonical performance demonstration for the chip on large language model inference, used for the throughput numbers Tenstorrent quotes against competing accelerators.

DevCloud

For developers who want to evaluate Wormhole without buying hardware, Tenstorrent operates a remote access service called DevCloud that hosts n150 cards, n300 cards, and Galaxy systems behind a queueing system. DevCloud has been used by academic groups and prospective customers as the primary on-ramp for evaluating the TT-Buda, TT-Forge, and TT-Metalium stacks.

How does Wormhole compare to the NVIDIA H100 and AMD MI300X?

Wormhole sits in an unusual position in the AI accelerator landscape: its per-card throughput is well below the flagship datacenter parts, but its open software stack, low entry price, and built-in scale-out Ethernet give it a different design point. The table below summarizes how a single n300 compares to a small selection of contemporary 2023-2024 AI accelerators at a high level. ^[4]^[15]

Accelerator	Process	Peak FP8 (per package)	Memory	Memory bandwidth	TDP	Notable interconnect
Tenstorrent Wormhole n300	GF 12 nm	466 TFLOPS	24 GB GDDR6	576 GB/s	300 W	2 x 200 G Ethernet (QSFP-DD)
NVIDIA H100 SXM5	TSMC 4N	~1,979 TFLOPS (dense)	80 GB HBM3	3.35 TB/s	700 W	NVLink 4 (900 GB/s)
AMD Instinct MI300X	TSMC N5/N6	~2,615 TFLOPS (dense)	192 GB HBM3	5.3 TB/s	750 W	Infinity Fabric (896 GB/s)
Groq LPU (v1)	GF 14 nm	N/A (INT8 750 TOPS)	230 MB SRAM (no DRAM)	~80 TB/s on-die	275 W	Proprietary chip-to-chip
Cerebras WSE-3	TSMC 5 nm	125 PFLOPS (sparse FP16)	44 GB on-wafer SRAM	21 PB/s on-wafer	~23 kW (system)	Wafer-scale, no external

On raw peak FP8 throughput per card, the Wormhole n300 trails the H100 by roughly a factor of four and the MI300X by more than a factor of five. Wormhole's memory subsystem uses GDDR6 rather than HBM3, so the n300 delivers roughly one sixth of the H100's memory bandwidth and one ninth of the MI300X's. Tenstorrent's architectural rebuttal is that Wormhole is designed to be deployed at scale: in this view, the headline number a user should care about is the throughput of a 32-chip Galaxy mesh rather than a single board, and throughput per dollar at the rack level is closer to parity than the per-card comparison suggests. ^[15]
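The ratios in this comparison can be recomputed directly from the table. A minimal sketch, using only the peak FP8 and memory bandwidth columns; the dictionary and helper names are illustrative.

```python
# (peak FP8 TFLOPS, memory bandwidth in GB/s) from the comparison table.
SPECS = {
    "n300": (466, 576),
    "H100 SXM5": (1979, 3350),
    "MI300X": (2615, 5300),
}


def ratio_vs_n300(part: str) -> tuple:
    """Return (throughput ratio, bandwidth ratio) of a part relative to the n300."""
    tflops, bw = SPECS[part]
    base_tflops, base_bw = SPECS["n300"]
    return tflops / base_tflops, bw / base_bw


for part in ("H100 SXM5", "MI300X"):
    t, b = ratio_vs_n300(part)
    print(f"{part}: {t:.2f}x peak FP8, {b:.2f}x memory bandwidth vs n300")
```

These are peak, vendor-rated numbers for a single package. They say nothing about achieved utilization, which is where Tenstorrent's scale-out argument is aimed.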

Wormhole's competitive position with respect to inference-only accelerators such as Groq's LPU is more nuanced. Groq's first-generation parts dispense with external DRAM entirely and rely on hundreds of LPUs ganged together to provide model capacity, which yields extremely low latency but constrains practical model sizes. Wormhole's GDDR6 gives it a much larger per-card working set, at the cost of a more conventional memory hierarchy.

How was Wormhole received?

The technical press reaction to Wormhole at its July 2024 launch was generally positive on the architecture and software openness, but cautious on real-world performance. ^[4]^[8]^[9] Reviewers and analysts highlighted four points consistently:

The $999 entry price for the n150 was described as a significant lowering of the cost of entry for hands-on experimentation with a server-class AI ASIC, particularly relative to NVIDIA's datacenter cards. ^[4]
The fully open source TT-Metalium and TT-Buda stack drew comparisons to early-CUDA-era NVIDIA in terms of community access, though the maturity gap relative to CUDA was acknowledged as substantial. ^[5]
The use of 100 Gigabit Ethernet for chip-to-chip scale-out, rather than a proprietary fabric, was viewed as a strategic differentiator that could allow Wormhole-based systems to be deployed in standard datacenter racks.
Per-card peak throughput was acknowledged as well below H100 and MI300X, with most independent commentary noting that direct apples-to-apples benchmarks against NVIDIA's CUDA software stack were difficult to produce and that vendor numbers from both sides should be read carefully.

Independent benchmarks published in 2025 and 2026 by parties such as SemiAnalysis and the Spheron Network blog generally found that Wormhole achieved competitive throughput per dollar on transformer inference at the rack level, particularly for medium-sized language models that fit comfortably in the Galaxy's 384 GB pooled memory, while CUDA remained the dominant software environment for training and for production serving with heterogeneous workloads. ^[15]

What is Wormhole's successor? Blackhole

Tenstorrent announced the Wormhole successor, Blackhole, in August 2024 and launched developer products at the Tenstorrent Dev Day event in San Francisco on 3 April 2025. ^[10]^[11] Blackhole moves the design to a 6 nanometer process, increases the number of integrated general purpose RISC-V cores, raises the on-die NoC bandwidth, and increases memory density. ^[11] At launch the headline Blackhole p150 PCIe card was rated at up to 774 FP8 TFLOPS, roughly 1.7 times the per-board throughput of the Wormhole n300 with which it is otherwise broadly comparable in form factor. In January 2026 Tenstorrent cut the p150's active Tensix core count from 140 to 120 through firmware version 19.5.0, lowering its rated block-FP8 throughput from 774 to 664 TFLOPS while stating that typical workloads see only a 1 to 2 percent difference. ^[19]
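The revised p150 rating is close to what linear scaling with core count would predict. The sketch below assumes peak throughput is proportional to the number of active Tensix cores at an unchanged clock, which is a simplification and not something the cited sources state.

```python
def scaled_tflops(rated_tflops: float, cores_before: int, cores_after: int) -> float:
    """Scale a peak rating linearly with the active core count (assumed model)."""
    return rated_tflops * cores_after / cores_before


p150_after = scaled_tflops(774, 140, 120)  # predicted rating after the firmware change
p150_vs_n300 = 774 / 466                   # launch p150 rating relative to the n300

print(f"Predicted p150 rating at 120 cores: {p150_after:.1f} TFLOPS (quoted: 664)")
print(f"p150 at launch vs n300: {p150_vs_n300:.2f}x")
```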

Blackhole's developer cards launched at the same $999 and $1,399 price points that Wormhole established: the p100 at $999 (single processor, no Ethernet, active cooled) and the p150 at $1,399 (single processor with Ethernet, offered in passive, active, and liquid cooled variants). ^[10] The TT-QuietBox workstation built around four Blackhole processors launched at $11,999. ^[10] The Galaxy Blackhole rack-scale server reached general availability on 28 April 2026 as a 6U system with 32 Blackhole accelerators, delivering 23 PetaFLOPS of dense FP8 compute, 1 terabyte of GDDR6, 16 terabytes per second of memory bandwidth, and 100 terabits per second of aggregate Ethernet mesh bandwidth, priced at $110,000 per system (or $440,000 for a four-node Supercluster). ^[18] Tenstorrent claimed that a four-node Blackhole Supercluster can process a 100,000 token prompt in under four seconds while generating up to 300 tokens per second per user. ^[18] Wormhole itself remained in the catalog as the lower-priced option in the Tenstorrent product range.

References

1. Tenstorrent. "Wormhole." Product page. https://tenstorrent.com/hardware/wormhole
2. Tenstorrent. "Wormhole n150d/n150s/n300d/n300s Tensix Processor: Specifications/Requirements." Tenstorrent Documentation. https://docs.tenstorrent.com/aibs/wormhole/specifications.html
3. Tenstorrent. "Galaxy." Product page. https://tenstorrent.com/hardware/galaxy
4. AnandTech (Ryan Smith). "Tenstorrent Launches Wormhole AI Processors: 466 FP8 TFLOPS at 300W." 20 July 2024. https://www.anandtech.com/show/21482/tenstorrent-launches-wormhole-ai-processors-466-fp8-tflops-at-300w
5. ServeTheHome. "Tenstorrent Wormhole Developer Kits Launched." July 2024. https://www.servethehome.com/tenstorrent-wormhole-developer-kits-launched-risc-v/
6. Tom's Hardware. "Tenstorrent's RISC-V-based Wormhole AI accelerators are available for pre-order today; pre-built workstations start at $12,000." 2024. https://www.tomshardware.com/pc-components/gpus/tenstorrents-risc-v-based-wormhole-ai-accelerators-are-available-for-pre-order-today-pre-built-workstations-start-at-dollar12000
7. SemiAnalysis. "Tenstorrent Wormhole Analysis: A Scale Out Architecture for Machine Learning That Could Put Nvidia On Their Back Foot." 25 June 2021. https://semianalysis.com/2021/06/25/tenstorrent-wormhole-analysis-a-scale/
8. LinuxGizmos. "Tenstorrent Unveils Next Generation Wormhole-based Developer Kits and Workstations." 2024. https://linuxgizmos.com/tenstorrent-unveils-next-generation-wormhole-based-developer-kits-and-workstations/
9. HotHardware. "Tenstorrent Wormhole Dev Kits and Workstations Power High-End AI Development." 2024. https://hothardware.com/news/tenstorrent-wormhole-developer-kits-workstations
10. Tenstorrent. "Tenstorrent Launches Blackhole Developer Products at Tenstorrent Dev Day." 3 April 2025. https://tenstorrent.com/newsroom/tenstorrent-launches-blackhole-developer-products-at-tenstorrent-dev-day
11. The Register. "Tenstorrent details its RISC-V packed Blackhole chips." 27 August 2024. https://www.theregister.com/2024/08/27/tenstorrent_ai_blackhole/
12. Tenstorrent. "tt-metal" (TT-Metalium SDK), GitHub repository. https://github.com/tenstorrent/tt-metal
13. Corsix. "Tenstorrent Wormhole Series Part 1: Physicalities." https://www.corsix.org/content/tt-wh-part1
14. EPCC / RISC-V (HPC Asia 2025). "Introduction to Tenstorrent." http://riscv.epcc.ed.ac.uk/assets/files/hpcasia25/Tenstorrent.pdf
15. Spheron Network Blog. "Tenstorrent vs NVIDIA: Open-Source AI Hardware Compared for Inference and Training (2026)." https://www.spheron.network/blog/tenstorrent-vs-nvidia-open-source-ai-hardware/
16. Tenstorrent. "Tenstorrent closes $693M+ of Series D funding led by Samsung Securities and AFW Partners." Newsroom, December 2024. https://tenstorrent.com/newsroom/tenstorrent-closes-693m-of-series-d-funding-led-by-samsung-securities-and-afw-partners
17. Tom's Hardware. "Jim Keller criticizes Nvidia's CUDA, x86: 'Cuda's a swamp, not a moat. x86 was a swamp too.'" https://www.tomshardware.com/tech-industry/artificial-intelligence/jim-keller-criticizes-nvidias-cuda-and-x86-cudas-a-swamp-not-a-moat-x86-was-a-swamp-too
18. The Register. "Tenstorrent's Galaxy Blackhole AI servers are finally out." 28 April 2026. https://www.theregister.com/2026/04/28/tenstorrent_galaxy_blackhole_ai_servers_ga/
19. Tom's Hardware. "Jim Keller's Tenstorrent is downgrading Blackhole p150 cards from 140 to 120 tensor cores via firmware update." 2026. https://www.tomshardware.com/tech-industry/semiconductors/jim-kellers-tenstorrent-is-downgrading-blackhole-p150-cards-from-140-to-120-tensor-cores-via-firmware-update
20. Tenstorrent. "tt-forge" (MLIR-based compiler), GitHub repository. https://github.com/tenstorrent/tt-forge
21. EE Times. "Jim Keller: 'Whatever Nvidia Does, We'll Do The Opposite.'" https://www.eetimes.com/jim-keller-whatever-nvidia-does-well-do-the-opposite/
