Meta MTIA

Updated Jun 27, 2026

MTIA (Meta Training and Inference Accelerator) is a family of custom AI chips that Meta designs in-house to run its largest artificial intelligence workloads, beginning with the deep learning recommendation and ranking models that decide what appears in Facebook and Instagram feeds and ads. Meta announced the first chip, MTIA v1, in May 2023 on a TSMC 7 nanometer process, followed by a next-generation MTIA in April 2024 on TSMC's 5 nanometer process with roughly 3.5 times the dense compute. MTIA is an AI chip built mainly for inference: it runs alongside graphics processing units in Meta's data centers through a software stack built on PyTorch, and it is part of a broader effort to improve performance-per-watt and reduce Meta's dependence on a single GPU supplier.^[1]^[2]

Meta described the program as "a family of recommendation-specific Meta Training and Inference Accelerator (MTIA) ASICs," created because "GPUs were not always optimal for running Meta's specific recommendation workloads at the levels of efficiency required at our scale."^[1] Beyond the added compute, the next-generation part doubles both the on-chip SRAM and the off-chip LPDDR5 capacity of the original.^[2]^[4] The work is part of a wider shift among hyperscale operators toward in-house silicon, a trend that also produced Google's TPU and Amazon's Trainium and Inferentia lines.^[5]^[6]

What is Meta MTIA?

MTIA is an application-specific integrated circuit (ASIC), a chip purpose-built for one kind of work rather than a general-purpose processor. Meta built it to serve deep learning recommendation models (DLRMs), the systems that rank candidate posts, reels, and ads for billions of users.^[1]^[2] MTIA is not a product Meta sells; it is internal silicon deployed across the company's own fleet and co-designed with its compiler and runtime so that the hardware targets Meta's exact model shapes.^[1]^[7] Because MTIA is programmed through PyTorch, Meta's open-source machine learning framework, model authors can keep working in PyTorch while a compiler maps their code onto the chip.^[1]^[2]

Why did Meta build its own AI chip?

Meta serves recommendation models at a scale that is hard to appreciate from the outside. Ranking the candidate posts, reels, and ads for billions of people happens billions of times a day, and the models behind it have grown steadily larger and more complex.^[1]^[7] These deep learning recommendation models have an unusual shape. They mix dense neural network layers with very large embedding tables, and they are bound as much by memory access and data movement as by raw arithmetic.^[1]^[8]
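The claim that these models are bound by data movement as much as by arithmetic can be made concrete with a rough arithmetic-intensity estimate, meaning operations performed per byte moved. The sketch below is illustrative only: the function names, batch size, layer widths, lookup count, and the 2-byte element size are all assumptions chosen for the example, not figures from Meta.

```python
# Rough arithmetic intensity (operations per byte moved) for the two halves
# of a recommendation model. Every size used here is hypothetical.


def embedding_pooling_intensity(lookups: int, dim: int, bytes_per_elem: int = 2) -> float:
    """Sum-pool `lookups` embedding rows, each `dim` elements wide."""
    ops = lookups * dim                           # one add per fetched element
    bytes_moved = lookups * dim * bytes_per_elem  # each row is read once
    return ops / bytes_moved


def dense_layer_intensity(batch: int, n_in: int, n_out: int, bytes_per_elem: int = 2) -> float:
    """One fully connected layer: (batch x n_in) @ (n_in x n_out)."""
    ops = 2 * batch * n_in * n_out                # one multiply and one add per term
    elems = batch * n_in + n_in * n_out + batch * n_out  # inputs, weights, outputs
    return ops / (elems * bytes_per_elem)


sparse = embedding_pooling_intensity(lookups=80, dim=128)
dense = dense_layer_intensity(batch=256, n_in=1024, n_out=1024)
print(f"embedding pooling: {sparse:.2f} ops/byte")
print(f"dense layer:       {dense:.2f} ops/byte")
```

Under these assumptions the embedding lookups perform far fewer operations per byte than the dense layer, which is why memory capacity and bandwidth weigh so heavily in a recommendation accelerator.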

General-purpose GPUs can run this work, but they are tuned for the dense matrix math of training large neural networks rather than the sparse, memory-heavy access patterns of recommendation inference. For a workload that runs constantly across a fleet of this size, the figures that matter most are performance per watt and performance per total cost of ownership.^[1]^[7] As Meta put it, "Efficiency is one of the most important factors for deploying accelerators in the data center, and TCO is a measure of efficiency."^[1] By designing the chip and its compiler together, Meta could target its own model shapes directly, squeeze more useful work out of each watt, and gain more control over its hardware roadmap rather than depending entirely on outside suppliers.^[2]^[7] Reducing reliance on a single GPU vendor, Nvidia, is a recurring theme in how analysts read the program, though Meta has framed MTIA as a complement to GPUs rather than a replacement.^[4]^[6]

The first chip was presented in a paper at the International Symposium on Computer Architecture in 2023, where Meta described the hardware and software as co-designed for recommendation inference and reported improved performance per watt over GPUs on its internal models.^[3]^[8] At the time of the May 2023 announcement, Meta said the silicon was already running in its data centers.^[1]

When was MTIA released and what are its specifications?

The two announced generations share a family resemblance. Both use an eight by eight grid of processing elements with RISC-V control, on-chip SRAM measured in hundreds of megabytes, and off-chip LPDDR5. The second generation moves to a denser process node, runs at a higher clock, delivers roughly three and a half times the dense throughput, and doubles both on-chip and off-chip memory capacity.^[1]^[2]^[4]

Feature	MTIA v1	Next-generation MTIA
Announced	May 2023	April 2024
Process	TSMC 7 nm	TSMC 5 nm
Clock frequency	800 MHz	1.35 GHz
Processing element grid	8 x 8 (64 PEs)	8 x 8 (64 PEs)
On-chip SRAM	128 MB	256 MB
Off-chip memory	64 GB LPDDR5	128 GB LPDDR5
LPDDR5 bandwidth	not stated	204.8 GB/s
INT8 compute (dense)	102.4 TOPS	354 TOPS
INT8 compute (sparse)	not stated	708 TOPS
FP16/BF16 compute (dense)	51.2 TFLOPS	177 TFLOPS
TDP	25 W	90 W
Die size	not stated	421 mm²

Sources: Meta AI blog and independent reporting.^[1]^[2]^[4]

MTIA v1 ran at 800 MHz and delivered 102.4 TOPS at INT8 precision and 51.2 TFLOPS at FP16, inside a 25 watt thermal envelope.^[1] The next-generation part lifts the clock to 1.35 GHz on a 421 square millimeter die and reports 354 TOPS of dense INT8 throughput, with sparse INT8 reaching 708 TOPS thanks to hardware support for sparsity.^[2]^[4] Meta said the new chip "more than doubles the compute and memory bandwidth of our previous solution while maintaining our close tie-in to our workloads."^[2] Independent coverage put the jump at about three and a half times the dense compute and seven times the sparse compute of the original, with on-chip SRAM doubled to 256 MB and LPDDR5 capacity doubled to 128 GB.^[4] The higher 90 watt power budget reflects the larger, faster chip.^[2]^[4]
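The generation-to-generation ratios quoted above can be rechecked directly from the table. The following is a minimal sketch: the dictionary keys and helper names are invented for the example, and the values are the peak figures from the table, not measurements.

```python
# Peak figures copied from the specification table; nothing here is measured.
V1 = {"clock_ghz": 0.8, "int8_dense_tops": 102.4,
      "sram_mb": 128, "lpddr5_gb": 64, "tdp_w": 25}
NEXT_GEN = {"clock_ghz": 1.35, "int8_dense_tops": 354.0, "int8_sparse_tops": 708.0,
            "sram_mb": 256, "lpddr5_gb": 128, "tdp_w": 90}


def gain(key: str) -> float:
    """Next-generation value divided by the MTIA v1 value for one spec."""
    return NEXT_GEN[key] / V1[key]


def peak_tops_per_watt(chip: dict) -> float:
    """Peak dense INT8 TOPS divided by TDP: a crude ceiling, not measured efficiency."""
    return chip["int8_dense_tops"] / chip["tdp_w"]


dense_gain = gain("int8_dense_tops")
# v1 has no stated sparse figure, so sparse is compared against v1 dense.
sparse_gain = NEXT_GEN["int8_sparse_tops"] / V1["int8_dense_tops"]
clock_gain = gain("clock_ghz")

print(f"dense INT8 gain:  {dense_gain:.2f}x")
print(f"sparse INT8 gain: {sparse_gain:.2f}x (vs. v1 dense)")
print(f"clock gain:       {clock_gain:.2f}x")
print(f"SRAM gain:        {gain('sram_mb'):.1f}x, LPDDR5 gain: {gain('lpddr5_gb'):.1f}x")
print(f"TDP gain:         {gain('tdp_w'):.1f}x")
print(f"peak dense TOPS per TDP watt: v1 {peak_tops_per_watt(V1):.2f}, "
      f"next-gen {peak_tops_per_watt(NEXT_GEN):.2f}")
```

Because the dense gain is about twice the clock gain while the grid stays at 64 processing elements, most of the added throughput appears to come from more work per cycle rather than from frequency alone. Peak throughput divided by TDP is only a rough ceiling; the efficiency figures Meta reports are measured on its own models.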

How is an MTIA chip built?

An MTIA accelerator is built around a grid of processing elements connected by an on-chip mesh network, with each generation using an eight by eight layout of 64 elements.^[1]^[2] Each processing element contains two RISC-V cores, one of them carrying a vector extension, together with a set of fixed-function blocks that handle common operations such as matrix multiplication and the data movement around the embeddings.^[1]^[3] Choosing the open RISC-V instruction set for control gave Meta flexibility to shape the cores around its own software rather than around a fixed commercial design.^[1]
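As a back-of-the-envelope illustration of the grid, the sketch below enumerates the 64 processing elements, counts hops between them, and divides each generation's peak INT8 figure evenly across elements and clock cycles. It rests on assumptions not stated in the sources: that the mesh links only nearest neighbors, so hop count equals Manhattan distance, and that peak throughput is split evenly across elements. All function names are hypothetical.

```python
from itertools import product

GRID = 8  # eight by eight layout, as described above
pes = list(product(range(GRID), range(GRID)))  # (row, col) of each processing element


def mesh_hops(a: tuple, b: tuple) -> int:
    """Hops between two PEs, ASSUMING a nearest-neighbor 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def int8_ops_per_pe_per_cycle(peak_tops: float, clock_hz: float, num_pes: int = 64) -> float:
    """Peak INT8 ops per PE per clock, ASSUMING an even split across PEs."""
    return peak_tops * 1e12 / (num_pes * clock_hz)


max_hops = max(mesh_hops(a, b) for a in pes for b in pes)
v1_ops = int8_ops_per_pe_per_cycle(102.4, 800e6)
next_ops = int8_ops_per_pe_per_cycle(354.0, 1.35e9)

print(f"processing elements: {len(pes)}")
print(f"longest path under the mesh assumption: {max_hops} hops")
print(f"MTIA v1:  ~{v1_ops:.0f} INT8 ops per PE per cycle")
print(f"next-gen: ~{next_ops:.0f} INT8 ops per PE per cycle")
```

Under the even-split assumption, the per-element figure works out to roughly 2,000 operations per cycle for MTIA v1 and roughly 4,100 for the next generation, consistent with each element doing about twice the work per cycle.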

Memory is organized as a hierarchy. Each processing element has a small amount of local storage, the chip carries a large pool of on-chip SRAM, 128 MB on the first generation and 256 MB on the second, and off-package LPDDR5 provides the bulk capacity, 64 GB and then 128 GB.^[1]^[2] That structure suits recommendation models well, because keeping more of the frequently used embedding data close to the compute reduces the trips to slower memory that otherwise dominate the workload.^[1]^[8]
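To see why the larger SRAM matters, one can estimate which tier a single embedding table would fit in. This is a simplified sketch: the table shape is invented, capacities are treated as binary units (an assumption), the per-element local storage is omitted because the sources give no size for it, and real placement is decided by Meta's compiler and runtime, which must also leave room for activations and other tables.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Capacities from the article, fastest tier first. Binary units are an assumption.
TIERS = {
    "v1": [("on-chip SRAM", 128 * MIB), ("LPDDR5", 64 * GIB)],
    "next-gen": [("on-chip SRAM", 256 * MIB), ("LPDDR5", 128 * GIB)],
}


def table_bytes(rows: int, dim: int, bytes_per_elem: int) -> int:
    """Storage needed for one embedding table."""
    return rows * dim * bytes_per_elem


def smallest_tier(size: int, generation: str):
    """First (fastest) tier whose total capacity could hold `size` bytes, else None."""
    for name, capacity in TIERS[generation]:
        if size <= capacity:
            return name
    return None


# Hypothetical table: 1.2 million rows, 64-wide, 2 bytes per element (~146 MiB).
size = table_bytes(rows=1_200_000, dim=64, bytes_per_elem=2)
for generation in TIERS:
    print(f"{generation}: {size / MIB:.1f} MiB table -> {smallest_tier(size, generation)}")
```

In this example the same table spills to LPDDR5 on the first generation but could sit entirely in SRAM on the second, the kind of shift that cuts trips to slower memory.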

The chips do not run on their own. The next-generation design is packaged onto boards that go into a rack-scale system. Meta described a configuration of three chassis per rack, twelve boards per chassis, and two accelerators per board, for 72 accelerators in a single system serving production traffic.^[2]^[9]
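The rack arithmetic is easy to verify. The sketch below multiplies out the stated configuration and, as a naive illustration, sums the per-chip LPDDR5 capacity and peak dense INT8 figures from the specification table. The aggregate numbers are simple sums added for this example, not figures Meta has published, and they ignore interconnect and utilization.

```python
# Rack layout as described by Meta for the next-generation system.
CHASSIS_PER_RACK = 3
BOARDS_PER_CHASSIS = 12
ACCELERATORS_PER_BOARD = 2

# Per-chip figures from the specification table (next-generation MTIA).
LPDDR5_GB_PER_CHIP = 128
INT8_DENSE_TOPS_PER_CHIP = 354

accelerators = CHASSIS_PER_RACK * BOARDS_PER_CHASSIS * ACCELERATORS_PER_BOARD

# Naive aggregates: straight sums of per-chip peaks, not achievable throughput.
rack_lpddr5_gb = accelerators * LPDDR5_GB_PER_CHIP
rack_int8_dense_tops = accelerators * INT8_DENSE_TOPS_PER_CHIP

print(f"accelerators per rack: {accelerators}")
print(f"aggregate LPDDR5:      {rack_lpddr5_gb} GB")
print(f"aggregate dense INT8:  {rack_int8_dense_tops} TOPS (sum of per-chip peaks)")
```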

What software does MTIA use?

Hardware is only useful if Meta's engineers and models can reach it without rewriting everything, so the MTIA effort has always been paired with a software stack built around PyTorch, Meta's open-source machine learning framework.^[1]^[2] The aim is for model authors to keep working in PyTorch while a compiler and runtime map their code onto the accelerator.^[1]^[7]

For the next generation, Meta leaned on Triton, an open-source language and compiler for writing GPU and accelerator kernels, building a Triton-MTIA backend that turns Triton code into instructions for the chip.^[2]^[9] This lets the same high-level kernels target different hardware and frees engineers from hand-writing low-level code for every operation. Tying the silicon to a widely used framework also smooths the path for moving models between GPUs and MTIA inside the same fleet.^[2]^[7]

What is MTIA used for, and is it moving to generative AI?

MTIA started life squarely as a recommendation and ranking engine, and that remains its main job in production today.^[2]^[4] But the boom in generative AI, including Meta's own Llama family of large language models, reshaped the company's priorities and its appetite for compute.^[6]^[7] Meta has said it intends to broaden MTIA to handle more of these generative workloads over time, extending a chip first built for sparse recommendation math toward the dense transformer math that drives text and image generation.^[2]^[9]

That ambition has limits worth stating plainly. The next-generation MTIA was widely read as an inference and internal-workload part rather than a competitor to Nvidia's top training GPUs such as the H100, and several reviewers stressed that it was not meant to be an Nvidia killer.^[4] Meta described the chip as "highly complementary to commercially available GPUs in delivering the optimal mix of performance and efficiency on Meta-specific workloads," and the company continued to buy large quantities of GPUs for training even as it expanded MTIA.^[2]^[4] That fits the picture of custom silicon handling steady, well-understood inference while GPUs cover the frontier of training and experimentation.^[4]^[6]

How does MTIA fit Meta's infrastructure and the hyperscaler trend?

MTIA is one piece of a larger AI infrastructure that Meta has been building out, which also includes massive GPU clusters and the open hardware designs the company contributes through the Open Compute Project.^[2]^[7] Within that mix, the accelerators are positioned as the efficient option for the recommendation and ranking work that runs constantly across Facebook and Instagram, leaving GPUs to absorb the heaviest training jobs and the fast-moving generative experiments.^[2]^[4]

The broader context is a move by the largest cloud and platform companies to design their own AI chips. Google has shipped successive generations of its TPU since the mid-2010s, Amazon offers Trainium for training and Inferentia for inference, and Microsoft introduced its Maia accelerator in 2023.^[5]^[6] The motivations rhyme across all of them, namely better efficiency on their own workloads, more control over the hardware and software stack, and less exposure to the cost and supply constraints of buying every chip from one vendor.^[5]^[6] MTIA is Meta's entry in that group, distinctive for how tightly it is shaped around recommendation models and the embedding-heavy access patterns they create.^[1]^[8]

What is MTIA's significance, and what are its limits?

MTIA shows that a company with a large and well-characterized internal workload can justify designing its own silicon, and that the design can pay off in efficiency when the chip and its compiler are built together for that workload.^[3]^[8] The generation-to-generation jump, from 7 nm to 5 nm and from roughly 100 to over 350 dense INT8 TOPS, suggests the program is a sustained investment rather than a one-off experiment.^[1]^[2]^[4]

The limits are equally clear. MTIA is internal hardware, not a product Meta sells, so its reach is bounded by Meta's own needs and its success is hard to measure from outside the company.^[4]^[7] It targets inference and recommendation rather than the most demanding training, which keeps Meta dependent on GPUs for the leading edge of model development.^[4]^[6] And designing custom chips carries real cost and schedule risk, including the long lead times of taping out new silicon and the work of keeping a software stack current with fast-changing models.^[6]^[7] How well MTIA stretches to cover generative AI, and whether later generations close more of the gap with general-purpose accelerators, will shape how large a role it ends up playing in Meta's infrastructure.^[2]^[4]

References

1. Meta AI. "MTIA v1: Meta's first-generation AI inference accelerator." May 18, 2023. https://ai.meta.com/blog/meta-training-inference-accelerator-AI-MTIA/
2. Meta AI. "Our next-generation Meta Training and Inference Accelerator (MTIA)." April 10, 2024. https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/
3. Firoozshahian, Amin, et al. "MTIA: First Generation Silicon Targeting Meta's Recommendation Systems." Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA), 2023. https://dl.acm.org/doi/10.1145/3579371.3589350
4. Mann, Tobias. "Meta's next-gen AI chip nearly triples performance, but it's no Nvidia killer." The Register, April 10, 2024. https://www.theregister.com/2024/04/10/meta_mtia_ai_chip/
5. Futurum Group. "Meta's Second Gen AI Inference Chip." April 2024. https://futurumgroup.com/insights/metas-second-gen-ai-inference-chip/
6. SiliconANGLE. "Meta triples power of its AI data centers with new custom silicon." April 10, 2024. https://siliconangle.com/2024/04/10/meta-triples-power-ai-data-centers-new-custom-silicon/
7. SiliconANGLE. "Meta unveils MTIA, its first custom AI inference chip." May 18, 2023. https://siliconangle.com/2023/05/18/meta-unveils-mtia-first-custom-ai-inference-chip/
8. ACM Digital Library. "MTIA: First Generation Silicon Targeting Meta's Recommendation Systems" (abstract). ISCA 2023. https://dl.acm.org/doi/10.1145/3579371.3589350
9. DataCenterDynamics. "Meta's MTIA v1 is a custom AI inference accelerator." May 2023. https://www.datacenterdynamics.com/en/news/metas-mtia-v1-is-a-custom-ai-inference-accelerator/
