Research SuperCluster (RSC)

Updated Jul 16, 2026

The Research SuperCluster (RSC) is an AI supercomputer built by Meta AI, the artificial intelligence research division of Meta Platforms (the company formerly known as Facebook). Meta announced RSC on January 24, 2022, describing it at launch as among the fastest AI supercomputers then running and stating that, once fully built out, it would be the fastest AI supercomputer in the world. ^[1]^[2] The machine was constructed in two phases: an initial deployment of 6,080 NVIDIA A100 GPUs, expanded by 2023 to a full configuration of 16,000 A100 GPUs. Meta built RSC to train large models on real production data, including data generated by users of its services, and later used it to train models such as LLaMA in early 2023. ^[1]^[3]

Announcement and purpose

Meta presented RSC as the successor to an earlier in-house research cluster that used NVIDIA V100 GPUs. The company said RSC was designed to let researchers train models with trillions of parameters, work across hundreds of languages, and jointly analyze text, images, and video. Meta framed these capabilities partly in terms of its longer-term metaverse ambitions, including translation and augmented reality tools, while the immediate workloads were natural language processing, computer vision, and speech research. ^[1]^[2]

A distinguishing design goal was training on real, sometimes user-generated, data rather than only on open or synthetic datasets. To support this while addressing privacy and security concerns, Meta said that data had to pass a privacy review before reaching the system, that it was encrypted end to end and decrypted only immediately before training, and that the facility was isolated from the wider internet with no direct inbound or outbound connections. ^[1] NVIDIA, Meta's primary hardware partner on the project, said RSC took roughly 18 months to go from concept to a working system and described it as the largest customer installation of NVIDIA DGX A100 systems to that date. ^[2]

Phase 1 configuration (January 2022)

At announcement, the first phase of RSC was operational with 760 NVIDIA DGX A100 systems serving as compute nodes. Each DGX A100 contains eight A100 GPUs, for a total of 6,080 A100 GPUs. ^[1]^[2] The nodes were connected with NVIDIA Quantum InfiniBand in a two-level Clos fabric with no oversubscription. NVIDIA described the per-link rate as 200 Gb/s InfiniBand (the HDR rate of the A100 generation), while Meta's own write-up described the fabric in aggregate as NVIDIA Quantum 1600 Gb/s InfiniBand, reflecting the multiple adapters per node. ^[1]^[2]
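The two figures above can be reconciled with simple arithmetic. The Python sketch below recomputes the GPU count from the node count and shows how a 200 Gb/s per-link rate could add up to 1600 Gb/s per node. The adapter count of eight per node is an assumption made for illustration (the cited sources say only that each node has multiple adapters), and the variable names are hypothetical.

```python
# Phase 1 figures quoted in the text.
DGX_NODES = 760
GPUS_PER_DGX = 8
LINK_RATE_GBPS = 200  # HDR InfiniBand per-link rate

# Assumption (not stated in the cited sources): eight compute-fabric
# adapters per node, i.e. one per GPU.
ADAPTERS_PER_NODE = 8

total_gpus = DGX_NODES * GPUS_PER_DGX
per_node_gbps = ADAPTERS_PER_NODE * LINK_RATE_GBPS

print(f"GPUs in phase 1: {total_gpus}")
print(f"Aggregate InfiniBand rate per node: {per_node_gbps} Gb/s")
```

Under that assumption, the per-link and aggregate descriptions refer to the same hardware, counted per adapter and per node respectively.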

For storage, RSC used a high-performance tier built primarily on Pure Storage hardware plus a large cache layer. Meta reported 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage hosted on Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade. ^[1]^[4] NVIDIA put phase 1 compute at 1,895 petaflops of TF32 performance. ^[2]
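As a rough cross-check of these figures, the Python sketch below sums the three reported storage tiers and compares NVIDIA's 1,895-petaflop figure with a naive peak estimate. The per-GPU TF32 rates (156 teraflops dense, 312 teraflops with structured sparsity) come from NVIDIA's published A100 specifications and are an assumption here; the cited sources do not say how the RSC figure was derived. Variable names are illustrative.

```python
# Storage tiers as reported by Meta, in petabytes.
storage_pb = {
    "FlashArray": 175,
    "cache (Penguin Computing Altus)": 46,
    "FlashBlade": 10,
}
total_storage_pb = sum(storage_pb.values())

PHASE1_GPUS = 6080

# Assumption: A100 peak TF32 rates per GPU from NVIDIA's datasheet,
# not from the RSC announcements themselves.
A100_TF32_DENSE_TFLOPS = 156
A100_TF32_SPARSE_TFLOPS = 312

# 1 petaflop = 1,000 teraflops.
dense_pflops = PHASE1_GPUS * A100_TF32_DENSE_TFLOPS / 1000
sparse_pflops = PHASE1_GPUS * A100_TF32_SPARSE_TFLOPS / 1000

print(f"Total reported storage: {total_storage_pb} PB")
print(f"Peak TF32, dense:  {dense_pflops:.0f} PF")
print(f"Peak TF32, sparse: {sparse_pflops:.0f} PF")
```

Under these assumptions the sparsity-based peak lands within a fraction of a percent of the quoted 1,895 petaflops, which suggests, but does not confirm, that the quoted figure is a theoretical peak rather than a measured result.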

Full buildout target (mid-2022)

Meta said that through 2022 it would expand RSC from 6,080 to 16,000 GPUs, an increase the company estimated would raise AI training performance by more than 2.5 times. Once complete, the InfiniBand fabric was designed to connect 16,000 GPU endpoints in a two-layer topology with no oversubscription, which Meta noted would make it one of the largest such networks deployed at the time. The storage and caching system was designed to serve 16 terabytes per second of training data and to scale toward exabyte capacity. Meta projected that the finished machine would deliver nearly 5 exaflops of mixed-precision AI compute. ^[1]^[2]
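The buildout targets above imply a few per-GPU figures that are easy to derive. The Python sketch below is a back-of-the-envelope calculation using only the numbers quoted in this section; treating "nearly 5 exaflops" as exactly 5 and dividing evenly across GPUs are simplifying assumptions, and the variable names are illustrative.

```python
PHASE1_GPUS = 6080
FULL_GPUS = 16000
STORAGE_TB_PER_SEC = 16   # design target for training-data throughput
TARGET_EXAFLOPS = 5       # assumption: "nearly 5" rounded to 5

# Growth in GPU count between the two phases.
gpu_ratio = FULL_GPUS / PHASE1_GPUS

# Storage bandwidth available per GPU if shared evenly (1 TB = 1,000 GB).
storage_gb_per_sec_per_gpu = STORAGE_TB_PER_SEC * 1000 / FULL_GPUS

# Implied compute per GPU (1 exaflop = 1,000,000 teraflops).
tflops_per_gpu = TARGET_EXAFLOPS * 1_000_000 / FULL_GPUS

print(f"GPU count ratio: {gpu_ratio:.2f}x")
print(f"Storage throughput per GPU: {storage_gb_per_sec_per_gpu:.1f} GB/s")
print(f"Implied compute per GPU: {tflops_per_gpu:.1f} TFLOPS")
```

The GPU count grows by about 2.6 times, in line with Meta's "more than 2.5 times" performance estimate, and the storage target works out to roughly 1 GB/s of training data per GPU.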

The table below summarizes the two configurations.

Specification	Phase 1 (Jan 2022)	Full buildout (target / 2023)
DGX A100 systems	760	2,000
A100 GPUs	6,080	16,000
Networking	NVIDIA Quantum InfiniBand, two-level Clos, no oversubscription	NVIDIA Quantum InfiniBand, 16 Tb/s fabric
InfiniBand per-link rate	200 Gb/s (HDR)	200 Gb/s (HDR)
Storage (FlashArray)	175 PB	175 PB
Storage (FlashBlade)	10 PB	10 PB
Cache	46 PB (Penguin Computing Altus)	80 PB
Bulk storage capacity	N/A	over half an exabyte
Storage throughput target	N/A	16 TB/s
AI compute (TF32 / mixed precision)	1,895 petaflops TF32	nearly 5 exaflops

Performance claims and the "fastest" framing

Meta supported its speed claims with early internal benchmarks comparing RSC against its previous production and research infrastructure. The company reported that RSC ran computer vision workflows up to 20 times faster, ran the NVIDIA Collective Communication Library (NCCL) more than nine times faster, and trained large-scale NLP models about three times faster, such that models with tens of billions of parameters could finish training in roughly three weeks rather than nine. ^[1]^[2]

The "fastest AI supercomputer in the world" language should be read carefully. It was Meta's own forward-looking claim about the fully built-out machine, conditioned on a mid-2022 completion, and was stated in terms of AI-oriented mixed-precision performance rather than the double-precision (FP64) benchmark used to rank the TOP500 list. Independent reporting at the time treated the ranking cautiously. TechCrunch noted that Meta said the full system might rank near the top of the world's fastest supercomputers, but pointed out that mixed-precision figures (such as RSC's roughly 1.9 exaflops at phase 1) are not directly comparable to the FP64 results of systems like Fugaku or Summit, and questioned how much such cross-precision comparisons mean. ^[5] In its later materials, Meta described the finished RSC more modestly as "one of the fastest AI supercomputers in the world." ^[3]

Completion and models trained on RSC

Meta reported the second phase complete in 2023. By then RSC comprised 2,000 NVIDIA DGX A100 systems, totaling 16,000 NVIDIA A100 Tensor Core GPUs, connected by an NVIDIA Quantum InfiniBand fabric rated at 16 Tb/s, and reaching close to 5 exaflops of compute at full strength. The completed storage configuration was described as 80 PB of cache and over half an exabyte of bulk storage, with throughput reaching 16 TB/s. ^[3]^[6]

RSC was used to train several notable Meta models. Most prominently, the LLaMA family of large language models was trained on the cluster in early 2023; Meta reported that the largest model, LLaMA 65B, was trained on 2,048 A100 GPUs in about 21 days at a rate of roughly 380 tokens per second per GPU, on 1.4 trillion tokens. ^[3] Meta also cited RSC's role in other research efforts, including the NLLB-200 multilingual translation model (part of the No Language Left Behind project), a universal speech translator for the Hokkien language, and a neural theorem prover. ^[3]
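The LLaMA 65B figures above are mutually consistent, which a short calculation shows. The Python sketch below derives the training duration from the reported GPU count, per-GPU throughput, and token count. It assumes constant throughput with no restarts or idle time, so it is an idealized estimate rather than a record of the actual run; variable names are illustrative.

```python
GPUS = 2048
TOKENS_PER_SEC_PER_GPU = 380
TOTAL_TOKENS = 1.4e12
FULL_CLUSTER_GPUS = 16000

# Aggregate throughput across all GPUs used for the run.
cluster_tokens_per_sec = GPUS * TOKENS_PER_SEC_PER_GPU

# Idealized wall-clock time, assuming constant throughput throughout.
seconds = TOTAL_TOKENS / cluster_tokens_per_sec
days = seconds / 86_400

# Total GPU time consumed and share of the full cluster in use.
gpu_hours = GPUS * seconds / 3600
cluster_fraction = GPUS / FULL_CLUSTER_GPUS

print(f"Cluster throughput: {cluster_tokens_per_sec:,} tokens/s")
print(f"Estimated training time: {days:.1f} days")
print(f"Estimated GPU-hours: {gpu_hours:,.0f}")
print(f"Share of RSC GPUs used: {cluster_fraction:.1%}")
```

The estimate comes out just under 21 days, matching the reported duration, and shows that the run occupied about one eighth of the completed cluster.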

Significance

RSC marked a shift in how Meta provisioned AI research compute, moving from a V100-based cluster to a purpose-built A100 system intended for training very large models on production-scale data with explicit privacy controls. Its completed scale of 16,000 A100 GPUs placed it among the larger AI training clusters of its era and provided the infrastructure behind Meta's first LLaMA release, which in turn seeded a broad ecosystem of open-weight language models. By 2024, Meta had begun describing newer GPU clusters built on later NVIDIA hardware, positioning RSC as one milestone in a continuing buildout of AI infrastructure rather than an endpoint. ^[6]

References

1. Meta AI, "Introducing the AI Research SuperCluster: Meta's cutting-edge AI supercomputer for AI research," January 24, 2022. https://ai.meta.com/blog/ai-rsc/
2. NVIDIA, "Meta Collaborates with NVIDIA to Build Largest Customer DGX A100 AI Supercomputer," NVIDIA Blog, January 24, 2022. https://blogs.nvidia.com/blog/meta-ai-supercomputer-dgx/
3. Meta AI, "Pursuing groundbreaking scale and accelerating research using Meta's Research SuperCluster," 2023. https://ai.meta.com/blog/supercomputer-meta-research-supercluster-2023/
4. Pure Storage, "Pure Storage Partners with Meta on AI Research SuperCluster (RSC)," press release, 2022. https://www.purestorage.com/company/newsroom/press-releases/pure-partners-with-meta-on-ai-research-supercluster.html
5. Devin Coldewey, "Meta leaps into the supercomputer game with its AI Research SuperCluster," TechCrunch, January 24, 2022. https://techcrunch.com/2022/01/24/meta-leaps-into-the-supercomputer-game-with-its-ai-research-supercluster/
6. HPCwire, "Meta Completes Research SuperCluster, Announces Next-Gen Datacenter," May 18, 2023. https://www.hpcwire.com/2023/05/18/meta-completes-research-supercluster-announces-next-gen-datacenter/
