Grand Teton (AI hardware)
Last reviewed
Jun 3, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 1,358 words
Grand Teton is an open GPU hardware platform designed by Meta for training and running large AI models. Meta announced it at the 2022 Open Compute Project (OCP) Global Summit on October 18, 2022, and contributed the design to the OCP community as open hardware.[1][2] The platform succeeded Meta's earlier Zion family of AI systems and folded the separate compute, networking, power, and cooling pieces of that design into a single chassis. The first version was built around an NVIDIA HGX baseboard carrying eight NVIDIA H100 Tensor Core GPUs.[3][4] In 2024 Meta extended the platform to support the AMD Instinct MI300X accelerator and contributed that revision to OCP as well.[5]
The name follows Meta's convention of naming AI hardware after geographic features in the western United States, in this case the Grand Teton peak in Wyoming. The platform sits alongside other Meta open designs such as the Open Rack and, later, the Catalina rack.
Meta has built its data center infrastructure on open designs since it helped found the Open Compute Project in 2011, and it released its first GPU server for AI, Big Sur, in 2015.[2] Grand Teton directly replaced Zion and its updated form, Zion-EX, an AI training system that Meta had been using for deep learning recommendation models and other workloads.
Zion-EX was not a single machine. It was assembled from three separate subsystems: a CPU "head" node, a GPU system, and a switching system, all tied together with external cabling.[2][6] That arrangement worked but added cabling, failure points, and complexity to every deployment, and it limited how much power and bandwidth the design could carry. Grand Teton was Meta's answer to those limits. The company described it as its next-generation, GPU-based platform for AI at scale, built to handle both memory-bandwidth-bound workloads, such as deep learning recommendation models with tens of trillions of parameters, and compute-bound workloads, such as content understanding.[1]
The central change in Grand Teton was consolidation. Where Zion-EX strung three boxes together with external cables, Grand Teton put the components onto a single integrated chassis with fully integrated power, control, compute, and fabric interfaces.[1][2] NextPlatform described the result as a single motherboard into which the CPUs, GPUs, PCI Express switches, and network interface cards all plug, replacing the cabled multi-box layout of the earlier systems.[6]
Meta argued that the integrated design improved performance, signal integrity, and thermal behavior, and that it made the system faster to deploy across a fleet because there were fewer parts to cable up and fewer points that could fail.[1][2] The single chassis also raised the power and bandwidth ceiling. Relative to Zion-EX, Meta cited four times the host-to-GPU bandwidth, twice the network bandwidth, and twice the power envelope.[1][3]
Grand Teton was introduced together with Open Rack v3 (ORv3), Meta's updated rack standard that moved power distribution to a busbar and supported air-assisted liquid cooling and facility water cooling for racks drawing up to about 30 kW.[2] Reporting at the time noted that the GPU tray could be driven toward roughly 1,300 watts per socket using air-assisted liquid cooling combined with facility water, reflecting the higher power envelope.[6] On the system itself, observers documented a pull-out GPU tray, multiple OCP NIC 3.0 network cards capable of 400 Gbps Ethernet or InfiniBand over PCIe Gen5, and EDSFF (E1.S) flash storage in place of the older M.2 drives.[4]
The original 2022 Grand Teton used NVIDIA's HGX platform with eight NVIDIA H100 Tensor Core GPUs built on the Hopper architecture, the data-center generation NVIDIA had introduced earlier that year.[3][4] NVIDIA framed Grand Teton as bringing Hopper into Meta's data centers and pointed to the H100's Transformer Engine for accelerating the neural networks behind large models.[3] NextPlatform noted that the Hopper H100 offered roughly 3x to 6x the performance of the prior Ampere A100 generation, which is why Meta moved to it rather than building Grand Teton around A100.[6] The A100 had powered Meta's earlier AI Research SuperCluster, not Grand Teton.[4]
Meta contributed the Grand Teton specifications to the Open Compute Project, with the design slated to enter the OCP repositories in April 2023 so that other operators and vendors could build the same hardware.[6]
In March 2024 Meta disclosed that it had built two AI training clusters of 24,576 GPUs each on top of Grand Teton, one using an RDMA over Converged Ethernet (RoCE) fabric with Arista switches and the other using NVIDIA Quantum-2 400 Gbps InfiniBand. Meta used these clusters to train its Llama 3 models and said it expected its fleet to reach the equivalent of nearly 600,000 H100 GPUs by the end of 2024.[7][8]
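As a rough illustration of that scale, the sketch below divides each cluster's GPU count by the eight GPUs that a Grand Teton chassis carries. It assumes every GPU in a cluster sits in a fully populated eight-GPU chassis, which the cited sources do not state explicitly, so the result is an implied figure rather than a number Meta published. The function and constant names are illustrative only.

```python
# Back-of-the-envelope estimate; names here are hypothetical, not Meta's.
GPUS_PER_CHASSIS = 8        # eight H100 GPUs per Grand Teton chassis
GPUS_PER_CLUSTER = 24_576   # per-cluster GPU count disclosed in March 2024
NUM_CLUSTERS = 2            # one RoCE cluster and one InfiniBand cluster


def chassis_needed(total_gpus: int, gpus_per_chassis: int = GPUS_PER_CHASSIS) -> int:
    """Return how many chassis are needed to host total_gpus (rounds up)."""
    if total_gpus < 0 or gpus_per_chassis <= 0:
        raise ValueError("GPU counts must be non-negative and chassis size positive")
    # Ceiling division without floating point.
    return -(-total_gpus // gpus_per_chassis)


per_cluster = chassis_needed(GPUS_PER_CLUSTER)
both_clusters = per_cluster * NUM_CLUSTERS

print(f"Chassis per cluster (assumed fully populated): {per_cluster}")
print(f"Chassis across both clusters: {both_clusters}")
```

Under that assumption, each cluster works out to 3,072 chassis, or 6,144 across the two clusters.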
At the OCP Global Summit held October 15 to 17, 2024, Meta announced an updated Grand Teton that supports the AMD Instinct MI300X accelerator, and it contributed this version to OCP.[5][9] The revision kept the single integrated chassis approach, with fully integrated power, control, compute, and fabric interfaces, while adding greater compute capacity, expanded memory to hold larger models locally, and more network bandwidth to scale training clusters. Meta positioned the MI300X variant for large-scale AI inference in particular.[5]
It is worth keeping the 2024 announcements separate, because they are easy to conflate. The AMD Instinct MI300X went into Grand Teton. NVIDIA's Blackwell generation, in the form of the GB200 Grace Blackwell Superchip, went into a different and newer Meta design called Catalina, a full rack-scale system built on a high-power ORv3 rack rated for up to about 140 kW.[5][9] In other words, Grand Teton's accelerator lineup runs NVIDIA H100 (Hopper) and AMD Instinct MI300X, while the NVIDIA Blackwell GB200 belongs to Catalina rather than to Grand Teton.
| Revision | Year announced | GPU / accelerator | Notable changes |
|---|---|---|---|
| Grand Teton (original) | 2022 | 8x NVIDIA H100 (Hopper), HGX baseboard | Single integrated chassis replacing Zion-EX's three boxes; 4x host-to-GPU bandwidth, 2x network bandwidth, 2x power envelope; paired with Open Rack v3; OCP NIC 3.0 and EDSFF storage |
| Grand Teton (MI300X update) | 2024 | AMD Instinct MI300X | Same single integrated chassis; greater compute, expanded local memory, more network bandwidth; aimed at large-scale AI inference; contributed to OCP |
Grand Teton continued Meta's practice of releasing its data center hardware designs through the Open Compute Project rather than keeping them proprietary, so that suppliers and other operators can manufacture and deploy the same systems.[1][2] By integrating the compute, networking, power, and cooling of a GPU server into one chassis, the platform set a template that Meta carried forward into its later rack-scale designs. Within Meta, Grand Teton became the workhorse for generative AI training, including the clusters used for Llama 3, and its open release gave the wider industry a reference design for dense GPU servers during the rapid AI infrastructure buildout of 2023 and 2024.[2][7]