Grand Teton (AI hardware)

Updated Jul 16, 2026

Grand Teton is an open GPU hardware platform designed by Meta for training and running large AI models. Meta announced it at the 2022 Open Compute Project (OCP) Global Summit on October 18, 2022, and contributed the design to the OCP community as open hardware.^[1]^[2] The platform succeeded Meta's earlier Zion family of AI systems and folded the separate compute, networking, power, and cooling pieces of that design into a single chassis. The first version was built around an NVIDIA HGX baseboard carrying eight NVIDIA H100 Tensor Core GPUs.^[3]^[4] In 2024 Meta extended the platform to support the AMD Instinct MI300X accelerator and contributed that revision to OCP as well.^[5]

The name follows Meta's convention of naming AI hardware after geographic features in the western United States, in this case the Grand Teton peak in Wyoming. The platform sits alongside other Meta open designs such as the Open Rack and, later, the Catalina rack.

Background and the Zion predecessor

Meta has built its data center infrastructure on open designs since it helped found the Open Compute Project in 2011, and it released its first GPU server for AI, Big Sur, in 2015.^[2] The platform that Grand Teton directly replaced was Zion, and its updated form Zion-EX, an AI training system that Meta had been using for deep learning recommendation models and other workloads.

Zion-EX was not a single machine. It was assembled from three separate subsystems: a CPU "head" node, a GPU system, and a switching system, all tied together with external cabling.^[2]^[6] That arrangement worked but added cabling, failure points, and complexity to every deployment, and it limited how much power and bandwidth the design could carry. Grand Teton was Meta's answer to those limits. The company described it as its next-generation, GPU-based platform for AI at scale, built to handle both memory-bandwidth-bound workloads, such as deep learning recommendation models with tens of trillions of parameters, and compute-bound workloads, such as content understanding.^[1]

Design and the integrated chassis

The central change in Grand Teton was consolidation. Where Zion-EX strung three boxes together with external cables, Grand Teton put the components into a single chassis with fully integrated power, control, compute, and fabric interfaces.^[1]^[2] The Next Platform described the result as a single motherboard into which the CPUs, GPUs, PCI Express switches, and network interface cards all plug, replacing the cabled multi-box layout of the earlier systems.^[6]

Meta argued that the integrated design improved performance, signal integrity, and thermal behavior, and that it made the system faster to deploy across a fleet because there were fewer parts to cable and fewer points of failure.^[1]^[2] The single chassis also raised the power and bandwidth ceiling. Relative to Zion-EX, Meta cited:^[1]^[3]

4x the host-to-GPU bandwidth
2x the compute and data network bandwidth
2x the power envelope

Grand Teton was introduced together with Open Rack v3 (ORv3), Meta's updated rack standard, which uses busbar power distribution and supports air-assisted liquid cooling and facility water cooling for racks drawing up to about 30 kW.^[2] Reporting at the time noted that the GPU tray could be driven toward roughly 1,300 watts per socket using air-assisted liquid cooling combined with facility water, reflecting the higher power envelope.^[6] On the system itself, observers documented a pull-out GPU tray, multiple OCP NIC 3.0 network cards capable of 400 Gbps Ethernet or InfiniBand over PCIe Gen5, and EDSFF (E1.S) flash storage in place of the older M.2 drives.^[4]
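
The pairing of 400 Gbps NICs with PCIe Gen5 can be checked with a quick calculation. The sketch below is illustrative only: the x16 link width is an assumption (the sources cited here do not state the lane count), and the function names are hypothetical. It uses the standard PCIe per-lane signaling rates and 128b/130b line encoding, and it ignores packet-level protocol overhead, so achievable throughput is somewhat lower than the figures it prints.

```python
# Back-of-the-envelope check: can a PCIe link feed a 400 Gbps NIC?
# Assumption (not from the cited sources): the NIC sits on a x16 link.

# Raw signaling rate per lane, in GT/s, for recent PCIe generations.
PCIE_GT_PER_LANE = {"Gen3": 8.0, "Gen4": 16.0, "Gen5": 32.0}

# PCIe Gen3 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbps(generation: str, lanes: int = 16) -> float:
    """Approximate one-direction link bandwidth in Gbps, before protocol overhead."""
    return PCIE_GT_PER_LANE[generation] * lanes * ENCODING_EFFICIENCY


def can_feed(generation: str, nic_gbps: float, lanes: int = 16) -> bool:
    """True if the link's encoded bandwidth is at least the NIC's line rate."""
    return pcie_gbps(generation, lanes) >= nic_gbps


if __name__ == "__main__":
    for gen in PCIE_GT_PER_LANE:
        bandwidth = pcie_gbps(gen)
        print(f"{gen} x16: {bandwidth:.0f} Gbps, feeds a 400G NIC: {can_feed(gen, 400)}")
```

Under these assumptions a Gen5 x16 link offers roughly 504 Gbps per direction, which leaves headroom for a 400 Gbps port, while a Gen4 x16 link at roughly 252 Gbps would bottleneck it.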

NVIDIA H100 launch configuration

The original 2022 Grand Teton used NVIDIA's HGX platform with eight NVIDIA H100 Tensor Core GPUs built on the Hopper architecture, the data-center generation NVIDIA had introduced earlier that year.^[3]^[4] NVIDIA framed Grand Teton as bringing Hopper into Meta's data centers and pointed to the H100's Transformer Engine for accelerating the neural networks behind large models.^[3] The Next Platform noted that the Hopper H100 offered roughly 3x to 6x the performance of the prior Ampere A100 generation, which is why Meta moved to it rather than building Grand Teton around the A100.^[6] The A100 had powered Meta's earlier AI Research SuperCluster, not Grand Teton.^[4]

Meta contributed the Grand Teton specifications to the Open Compute Project, with the design slated to enter the OCP repositories in April 2023 so that other operators and vendors could build the same hardware.^[6]

In March 2024 Meta disclosed that it had built two AI training clusters of 24,576 GPUs each on top of Grand Teton, one using an RDMA over Converged Ethernet (RoCE) fabric with Arista switches and the other using NVIDIA Quantum-2 400 Gbps InfiniBand. Meta used these clusters to train its Llama 3 models and said it expected its fleet to reach the equivalent of nearly 600,000 H100 GPUs by the end of 2024.^[7]^[8]
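
The cluster size maps directly onto the eight-GPU chassis described above. The following sketch works through that arithmetic; it assumes every GPU in each cluster sits in an eight-GPU Grand Teton node, and the function and constant names are hypothetical.

```python
# How many eight-GPU Grand Teton nodes make up a 24,576-GPU cluster?
# Assumption: each cluster is built entirely from eight-GPU nodes.

GPUS_PER_CLUSTER = 24_576  # per-cluster GPU count from the article
GPUS_PER_NODE = 8          # original H100 Grand Teton configuration
CLUSTER_COUNT = 2          # one RoCE cluster, one InfiniBand cluster


def nodes_needed(total_gpus: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Return the node count, rejecting totals that do not divide evenly."""
    nodes, remainder = divmod(total_gpus, gpus_per_node)
    if remainder:
        raise ValueError(f"{total_gpus} GPUs do not fill whole {gpus_per_node}-GPU nodes")
    return nodes


if __name__ == "__main__":
    per_cluster = nodes_needed(GPUS_PER_CLUSTER)
    print(f"Nodes per cluster: {per_cluster}")
    print(f"Nodes across both clusters: {per_cluster * CLUSTER_COUNT}")
    print(f"GPUs across both clusters: {GPUS_PER_CLUSTER * CLUSTER_COUNT}")
```

On that assumption each cluster comes to 3,072 nodes, or 6,144 nodes and 49,152 GPUs across the two clusters.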

2024 update for AMD Instinct MI300X

At the OCP Global Summit held October 15 to 17, 2024, Meta announced an updated Grand Teton that supports the AMD Instinct MI300X accelerator, and it contributed this version to OCP.^[5]^[9] The revision kept the single integrated chassis approach, with fully integrated power, control, compute, and fabric interfaces, while adding greater compute capacity, expanded memory to hold larger models locally, and more network bandwidth to scale training clusters. Meta positioned the MI300X variant for large-scale AI inference in particular.^[5]

The two accelerator announcements of 2024 are easy to conflate, but they belong to different platforms. The AMD Instinct MI300X went into Grand Teton. NVIDIA's Blackwell generation, in the form of the GB200 Grace Blackwell Superchip, went into a different and newer Meta design called Catalina, a full rack-scale system built on a high-power ORv3 rack rated for up to about 140 kW.^[5]^[9] Grand Teton's accelerator lineup is therefore the NVIDIA H100 (Hopper) and the AMD Instinct MI300X; the GB200 belongs to Catalina.

Platform revisions

Revision	Year announced	GPU / accelerator	Notable changes
Grand Teton (original)	2022	8x NVIDIA H100 (Hopper), HGX baseboard	Single integrated chassis replacing Zion-EX's three boxes; 4x host-to-GPU bandwidth, 2x network bandwidth, 2x power envelope; paired with Open Rack v3; OCP NIC 3.0 and EDSFF storage
Grand Teton (MI300X update)	2024	AMD Instinct MI300X	Same monolithic chassis; greater compute, expanded local memory, more network bandwidth; aimed at large-scale AI inference; contributed to OCP

Open hardware contribution and significance

Grand Teton continued Meta's practice of releasing its data center hardware designs through the Open Compute Project rather than keeping them proprietary, so that suppliers and other operators can manufacture and deploy the same systems.^[1]^[2] By integrating the compute, networking, power, and cooling of a GPU server into one chassis, the platform set a template that Meta carried forward into its later rack-scale designs. Within Meta, Grand Teton became the workhorse for generative AI training, including the clusters used for Llama 3, and its open release gave the wider industry a reference design for dense GPU servers during the rapid AI infrastructure buildout of 2023 and 2024.^[2]^[7]

References

1. Björlin, Alexis. "OCP Summit 2022: Open hardware for AI infrastructure." Engineering at Meta, October 18, 2022. https://engineering.fb.com/2022/10/18/open-source/ocp-summit-2022-grand-teton/
2. "Meta's open AI hardware vision." Engineering at Meta, October 15, 2024. https://engineering.fb.com/2024/10/15/data-infrastructure/metas-open-ai-hardware-vision/
3. "Meta's Grand Teton Brings NVIDIA Hopper to Its Data Centers." NVIDIA Blog, October 18, 2022. https://blogs.nvidia.com/blog/meta-grand-teton/
4. "This is the Meta Grand Teton 8x NVIDIA H100 AI Machine." ServeTheHome, 2022. https://www.servethehome.com/this-is-the-meta-grand-teton-8x-nvidia-h100-ai-machine/
5. "Meta expands Grand Teton server to support AMD MI300X; unveils Catalina rack to support Nvidia Blackwell chips." Data Center Dynamics, October 2024. https://www.datacenterdynamics.com/en/news/meta-expands-grand-teton-server-to-support-amd-mi300x-unveils-catalina-rack-to-support-nvidia-blackwell-chips/
6. Morgan, Timothy Prickett. "The Iron That Will Drive AI At Meta Platforms." The Next Platform, October 20, 2022. https://www.nextplatform.com/2022/10/20/the-iron-that-will-drive-ai-at-meta-platforms/
7. "Building Meta's GenAI Infrastructure." Engineering at Meta, March 12, 2024. https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/
8. "Meta Unveils 24k GPU AI Infrastructure Design." InfoQ, April 2024. https://www.infoq.com/news/2024/04/meta-ai-infrastructure/
9. "Meta Announces AMD Instinct MI300X for AI Inference and NVIDIA GB200 Catalina." ServeTheHome, October 2024. https://www.servethehome.com/meta-announces-amd-mi300x-for-ai-inference-marvell-fbnic-cisco-arista-broadcom/
