# ASIC

> Source: https://aiwiki.ai/wiki/asic
> Updated: 2026-06-23
> Categories: AI Hardware
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

An **application-specific integrated circuit** (**ASIC**) is a chip designed to perform one job, in contrast to a general-purpose [CPU](/wiki/cpu) or a broadly programmable [GPU](/wiki/gpu). By committing a single workload to fixed silicon, an ASIC gives up flexibility in exchange for higher throughput, lower energy per operation, and lower unit cost at volume. In artificial intelligence, the term has become shorthand for the custom [AI accelerator](/wiki/ai_chips) chips built around the dense matrix arithmetic of neural networks. [Google](/wiki/google)'s [Tensor Processing Unit](/wiki/tpu) (TPU), serving production traffic since 2015, proved the approach at datacenter scale: Google measured the first TPU at 15 to 30 times the inference throughput of contemporary CPUs and GPUs and 30 to 80 times their performance per watt [1]. By 2026 every US hyperscaler fields in-house AI silicon, [OpenAI](/wiki/openai) has commissioned 10 gigawatts of its own accelerators through [Broadcom](/wiki/broadcom) [7], and industry projections cited in May 2026 put custom ASICs at 27.8 percent of AI accelerator shipments for the year, growing nearly three times as fast as merchant GPUs [13].

## Overview

An ASIC implements its target computation directly in dedicated logic, recovering the overhead that general-purpose processors spend on instruction decode, cache hierarchies, and dynamic scheduling. The category dates to the 1980s gate-array era, but the economics have always been the defining feature: design costs are enormous and fixed, so custom silicon pays off only when a workload is large, stable, and predictable enough to amortize them.

Modern AI accelerators sit on a spectrum rather than at a binary extreme. [FPGAs](/wiki/fpga), reconfigurable chips that Microsoft used to serve neural networks in its pre-Maia Project Brainwave era, occupy a middle ground between processors and fixed logic. Most AI ASICs are better described as domain-specific architectures: programmable enough to run many different networks, but specialized for tensor operations rather than arbitrary code. What they cannot do is absorb a fundamental change in workload; a chip optimized for yesterday's architecture can be stranded by tomorrow's.

## How do AI ASICs differ from GPUs?

Neural networks are a near-ideal ASIC target for three reasons: training and inference are dominated by matrix multiplication, the dataflow is regular and largely known at compile time, and the arithmetic tolerates reduced precision (8-bit formats, and by 2026 4-bit floating point for inference). Google's original TPU exploited all three. Its core was a 256 x 256 [systolic array](/wiki/systolic_array) of 8-bit multiply-accumulate units delivering 92 trillion operations per second, fed from 28 MiB of software-managed on-chip memory instead of caches; Google measured it at 15 to 30 times the inference throughput of contemporary CPUs and GPUs and 30 to 80 times their performance per watt [1].
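The 92 trillion operations per second figure follows from the array size and the clock rate. The sketch below is a hedged illustration, not a model of Google's hardware. It recomputes the peak from the 256 x 256 array and the 700 MHz clock reported in the TPU paper [1] (the clock rate is an input not stated above), and it walks through a toy output-stationary systolic schedule. The function names are hypothetical, and the first TPU used a weight-stationary design, so the second function shows only the general wavefront idea.

```python
def peak_ops_per_second(rows: int, cols: int, clock_hz: int) -> int:
    """Peak ops/s if every MAC unit fires each cycle (1 MAC = 2 ops)."""
    return rows * cols * 2 * clock_hz


def systolic_matmul(a, b):
    """Toy output-stationary systolic schedule computing C = A @ B.

    Rows of A stream in from the left and columns of B from the top, each
    skewed by one cycle per row/column, so processing element (i, j) sees
    the operand pair with index k = t - i - j at cycle t.
    """
    n, depth, m = len(a), len(b), len(b[0])
    if any(len(row) != depth for row in a):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0] * m for _ in range(n)]  # one accumulator per processing element
    cycles = n + m + depth - 2         # cycles until the last wavefront drains
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                k = t - i - j
                if 0 <= k < depth:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


if __name__ == "__main__":
    tops = peak_ops_per_second(256, 256, 700_000_000) / 1e12
    print(f"Peak throughput: {tops:.1f} TOPS")
    product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, "in", cycles, "cycles")
```

The peak works out to about 91.8 trillion operations per second, which rounds to the quoted 92. The schedule also shows why regular dataflow matters: every operand's arrival cycle is fixed in advance, so no cache or dynamic scheduler is needed.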

GPUs responded by becoming more ASIC-like. [NVIDIA](/wiki/nvidia) added tensor cores in 2017 and a transformer engine in 2022, and nearly all the FLOPS in a modern datacenter GPU come from matrix units. What GPUs retain is generality plus the [CUDA](/wiki/cuda) software ecosystem, which lets researchers run novel model architectures immediately. ASIC owners must build and maintain their own compilers and kernels (XLA for TPUs, Neuron for Trainium, Triton-based stacks at Meta), and immature software has sunk more custom-silicon efforts than transistors have. Memory strategy is the other structural divide: hyperscaler ASICs pair matrix engines with [high-bandwidth memory](/wiki/hbm) (HBM) much as GPUs do, while [Groq](/wiki/groq) and [Cerebras](/wiki/cerebras) instead keep model weights in on-chip SRAM, trading capacity for an order-of-magnitude gain in bandwidth and deterministic latency [8][12].
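The capacity-for-bandwidth trade can be made concrete with a rough calculation. The sketch below is an illustration under stated assumptions: the bandwidth and capacity figures are the ones this article quotes for Groq's LPU (230 MB of SRAM at roughly 80 TB/s) and Google's Ironwood (192 GB of HBM3e at 7.4 TB/s), while the 70-billion-parameter model stored at one byte per weight is a hypothetical workload chosen for round numbers. Function names are illustrative.

```python
def bandwidth_ratio(fast_bytes_per_s: int, slow_bytes_per_s: int) -> float:
    """How many times faster one memory system is than another."""
    return fast_bytes_per_s / slow_bytes_per_s


def chips_to_hold_weights(weight_bytes: int, capacity_bytes_per_chip: int) -> int:
    """Minimum chips needed to keep all weights resident (ceiling division)."""
    return -(-weight_bytes // capacity_bytes_per_chip)


# Figures quoted in this article.
SRAM_BANDWIDTH = 80_000_000_000_000  # ~80 TB/s on-chip SRAM (Groq LPU)
HBM_BANDWIDTH = 7_400_000_000_000    # 7.4 TB/s HBM3e (Ironwood)
SRAM_CAPACITY = 230_000_000          # 230 MB per LPU
HBM_CAPACITY = 192_000_000_000       # 192 GB per Ironwood chip

# Hypothetical workload: 70 billion parameters at 1 byte each (8-bit weights).
MODEL_BYTES = 70_000_000_000

if __name__ == "__main__":
    ratio = bandwidth_ratio(SRAM_BANDWIDTH, HBM_BANDWIDTH)
    print(f"SRAM vs HBM bandwidth: {ratio:.1f}x")
    print("SRAM chips to hold weights:", chips_to_hold_weights(MODEL_BYTES, SRAM_CAPACITY))
    print("HBM chips to hold weights:", chips_to_hold_weights(MODEL_BYTES, HBM_CAPACITY))
```

Under these assumptions the SRAM design has roughly eleven times the per-chip bandwidth, but the weights must be sharded across hundreds of chips, whereas they fit in the HBM of a single accelerator.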

## Which companies build their own AI ASICs?

| Chip | Developer | Primary focus | Process | Status (June 2026) |
|---|---|---|---|---|
| TPU v7 "Ironwood" | Google (with Broadcom) | Training and inference | TSMC N3P | Generally available since November 2025 [2] |
| Trainium3 | AWS (with Marvell) | Training and inference | TSMC 3nm | Generally available since December 2025 [4] |
| Maia 200 | Microsoft | Inference | TSMC 3nm | Deploying in Azure since January 2026 [5] |
| MTIA 300 to 500 | Meta (with Broadcom) | Ranking and generative inference | Various | Rolling deployment through 2027 [6] |
| OpenAI XPU | OpenAI and Broadcom | Custom accelerator racks | 3nm and 2nm | First racks targeted for second half of 2026 [7] |
| LPU | Groq | Low-latency inference | 14nm (first generation) | Technology licensed to NVIDIA in December 2025 [9] |
| Sohu | Etched | Transformer-only inference | TSMC 4nm | Not publicly shipped as of mid-2026 [11] |
| WSE-3 | Cerebras | Wafer-scale training and inference | TSMC 5nm | Shipping in CS-3 systems [12] |
| Dojo D1 | Tesla | Vision model training | TSMC 7nm | Program shut down in August 2025 [15] |
| Ascend 910C | Huawei | Training and inference | SMIC 7nm-class | Volume production for the Chinese market [13] |

Google has iterated on TPUs for a decade, making its program the longest-running of those listed. The seventh generation, [Ironwood](/wiki/ironwood), was announced in April 2025 and reached general availability in November 2025; each chip delivers 4,614 teraFLOPS of FP8 compute with 192 GB of HBM3e at 7.4 TB/s, and pods of 9,216 chips reach 42.5 FP8 exaflops [2][3]. [Anthropic](/wiki/anthropic) has committed to using up to one million TPUs for its Claude models, a deal Google says will bring well over a gigawatt of capacity online in 2026 [3][22]. Google Cloud CEO Thomas Kurian framed the rationale in the announcement: "Anthropic's choice to significantly expand its usage of TPUs reflects the strong price-performance and efficiency its teams have seen with TPUs for several years" [22].

Amazon introduced [Inferentia](/wiki/inferentia) for inference in 2018 and [Trainium](/wiki/trainium) for training in 2020. Trainium3, launched at re:Invent in December 2025 on TSMC's N3P 3nm process, provides about 2.52 petaflops of FP8 per chip and 144 GB of HBM3e; a 144-chip UltraServer reaches 362 petaflops, roughly 4.4 times the compute of its Trainium2-based predecessor, with about 4 times the performance per watt [4]. AWS says it has deployed more than one million Trainium chips, including the Project Rainier cluster built for Anthropic, and its planned Trainium4 will support NVIDIA's NVLink Fusion interconnect [4][13].
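The pod and UltraServer totals quoted for Ironwood and Trainium3 are simply per-chip peak compute multiplied by chip count. The short check below is a sketch using only the figures given in this article; the function name is illustrative, and the results are peak numbers, not measured throughput.

```python
def aggregate_teraflops(per_chip_teraflops: int, chips: int) -> int:
    """Peak aggregate compute, assuming perfect scaling across chips."""
    return per_chip_teraflops * chips


# Per-chip FP8 figures and system sizes quoted in this article.
ironwood_pod = aggregate_teraflops(4_614, 9_216)         # Ironwood pod
trainium3_ultraserver = aggregate_teraflops(2_520, 144)  # Trainium3 UltraServer

if __name__ == "__main__":
    print(f"Ironwood pod: {ironwood_pod / 1e6:.1f} exaflops")
    print(f"Trainium3 UltraServer: {trainium3_ultraserver / 1e3:.1f} petaflops")
```

The Ironwood product matches the quoted 42.5 exaflops. The Trainium3 product lands slightly above the quoted 362 petaflops, which is consistent with the per-chip figure of "about 2.52" being rounded.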

Microsoft announced its first accelerator, [Maia](/wiki/maia_100) 100, in November 2023. After a reported slip in its silicon roadmap [20], it launched Maia 200 in January 2026: a TSMC 3nm inference chip with 140 billion transistors, 216 GB of HBM3e at 7 TB/s, 272 MB of on-chip SRAM, and native FP4 throughput above 10 petaflops (more than 5 petaflops in FP8) within a 750-watt envelope. Microsoft says the chip delivers 30 percent better performance per dollar than its existing fleet and serves GPT-5.2 and Copilot workloads [5][13].

Meta's [MTIA](/wiki/mtia) line began in 2023 as a 7nm inference chip for recommendation models, and hundreds of thousands are now deployed for Facebook and Instagram ranking. In March 2026 Meta outlined four further generations (MTIA 300, 400, 450, and 500) for deployment through 2027, designed around the premise that HBM bandwidth is the binding constraint on inference [6][13].

OpenAI announced its own program in October 2025: it designs the accelerators and rack systems, Broadcom co-develops and deploys them, and the partners target 10 gigawatts of Ethernet-networked capacity deployed from late 2026 through 2029 [7]. The announcement said the effort lets the company "embed what it's learned from developing frontier models and products directly into the hardware," and OpenAI CEO Sam Altman added that "developing our own accelerators adds to the broader ecosystem of partners all building the capacity required to push the frontier of AI" [7]. OpenAI also continues to buy NVIDIA and AMD GPUs and, since January 2026, Cerebras systems [14].

Among startups, Groq, founded in 2016 by TPU veteran [Jonathan Ross](/wiki/jonathan_ross), built the Language Processing Unit, a single-core deterministic processor that stores weights in up to 230 MB of on-chip SRAM at roughly 80 TB/s and schedules every cycle at compile time [8]. In December 2025 NVIDIA agreed to pay about $20 billion, its largest transaction ever, for Groq's assets and a non-exclusive license to its inference technology, with Ross and other leaders joining NVIDIA [9]. [Etched](/wiki/etched) took the most extreme specialization bet: its Sohu chip, on TSMC 4nm, runs only transformer models, and the company claims an eight-chip server can replace about 160 H100 GPUs at over 500,000 tokens per second on Llama 70B [10]. Those claims remained unverified as of mid-2026, with no public shipments or third-party benchmarks despite a reported valuation around $5 billion [11]. Cerebras takes the opposite scaling approach: its WSE-3 is a single 46,225 mm² wafer-scale chip with four trillion transistors, 900,000 cores, 44 GB of SRAM, and 125 FP16 petaflops [12].

Not every program survives. [Tesla](/wiki/tesla) disbanded its [Dojo](/wiki/dojo) supercomputer team in August 2025, with Elon Musk calling the planned Dojo 2 "an evolutionary dead end" as the company refocused on its AI5 and AI6 chips with external partners [15]. In China, [Huawei](/wiki/huawei)'s Ascend 910C fills the niche under US export controls, reportedly targeting 600,000 units in 2026 despite yield constraints [13].

## Who designs custom AI chips: Broadcom and Marvell

Few ASIC owners design chips alone. In the prevailing model the customer owns the architecture while a partner supplies SerDes and interconnect IP, physical design, advanced packaging, and manufacturing slots at [TSMC](/wiki/tsmc). Broadcom, which co-designs Google's TPUs and Meta's MTIA and builds OpenAI's XPUs, dominates this business: it reported $8.4 billion of AI revenue in its first fiscal quarter of 2026, up 106 percent year over year, against a $73 billion AI backlog, with named or reported customers including Google, Meta, ByteDance, and OpenAI [13]. CEO Hock Tan told investors the company has "line of sight" to more than $100 billion in AI chip revenue in 2027 [13]. [Marvell](/wiki/marvell), the clear number two, works with AWS on Trainium and with Microsoft, and together the two firms control roughly 95 percent of custom AI ASIC co-design [13][21]. Taiwanese design-service houses such as [Alchip](/wiki/alchip) and Global Unichip provide turnkey physical design and tape-out services for hyperscaler programs at 5nm and 3nm [21].

## Why do hyperscalers build their own AI chips?

The case for an AI ASIC is a volume calculation. Non-recurring engineering (NRE) costs at leading-edge nodes are severe: a single 3nm tape-out runs on the order of $100 million, and full program costs including IP licensing, verification, and software reach hundreds of millions of dollars, climbing further at 2nm [18][19]. Against that stands the cost of buying merchant GPUs from NVIDIA, whose gross margins run around 70 to 75 percent. For a hyperscaler deploying hundreds of thousands of accelerators, in-house silicon priced near manufacturing cost can undercut that markup decisively: Google claims roughly 44 percent lower total cost of ownership per Ironwood chip than NVIDIA's GB200, and Microsoft claims a 30 percent performance-per-dollar advantage for Maia 200, according to figures the companies published or analysts compiled in 2026 [5][13]. Custom chips also offer supply security and the ability to tune silicon to a known workload, as with Meta's bandwidth-first MTIA roadmap [6]. The same arithmetic excludes smaller buyers, which is why custom ASICs remain a hyperscaler and frontier-lab phenomenon.
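The volume calculation can be sketched as a break-even formula: the fixed program cost divided by the saving on each chip. The code below is a minimal illustration, not a cost model. The $500 million program cost is a placeholder within the "hundreds of millions" range cited above, and the unit costs are hypothetical values chosen for round numbers; only the 75 percent gross margin comes from the text. It also ignores performance differences between the two chips, which the TCO claims above do account for.

```python
import math


def merchant_price(unit_cost: float, gross_margin: float) -> float:
    """Selling price implied by a unit cost and a gross margin in [0, 1)."""
    if not 0 <= gross_margin < 1:
        raise ValueError("gross_margin must be in [0, 1)")
    return unit_cost / (1 - gross_margin)


def break_even_units(nre: float, merchant_unit_price: float,
                     inhouse_unit_cost: float) -> int:
    """Units needed before per-chip savings repay the fixed NRE cost."""
    saving = merchant_unit_price - inhouse_unit_cost
    if saving <= 0:
        raise ValueError("in-house chip must cost less than the merchant chip")
    return math.ceil(nre / saving)


if __name__ == "__main__":
    # Hypothetical inputs: a merchant GPU costing $7,500 to build and sold at
    # a 75 percent gross margin, against a $10,000 in-house chip.
    gpu_price = merchant_price(7_500, 0.75)
    units = break_even_units(500_000_000, gpu_price, 10_000)
    print(f"Merchant price: ${gpu_price:,.0f}; break-even at {units:,} units")
```

With these placeholder inputs the program breaks even at 25,000 chips, a volume that is small for a hyperscaler but out of reach for most other buyers.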

## How is an AI ASIC designed and built?

An AI ASIC takes roughly two to three years from specification to deployment, though Google's first TPU went from design start to datacenter in about 15 months [1]. Architects fix the datapath and memory hierarchy; engineers express the design in register-transfer level (RTL) code, typically SystemVerilog; and functional verification consumes the largest share of engineering effort. Synthesis plus physical design (floorplanning, placement, routing, timing closure) then produce the layout that is taped out to a foundry, almost always TSMC for leading AI chips. The dies are packaged with HBM stacks using advanced techniques such as CoWoS before silicon bring-up and software enablement.

AI increasingly assists this flow. Commercial tools like Synopsys DSO.ai and Cadence Cerebrus apply machine learning to design-space exploration, and Google's [AlphaChip](/wiki/alphachip) uses reinforcement learning for floorplanning; published in Nature in 2021, it has produced layouts used in three TPU generations and Google's Axion CPUs [16][17], though its benchmark claims have been debated within the academic EDA community.

## Will custom ASICs challenge NVIDIA?

The momentum behind AI ASICs reflects a structural shift toward inference, where workloads are stable enough to justify specialization, and toward buyers large enough to fund it. NVIDIA has adapted rather than ceded ground: its NVLink Fusion program lets custom chips such as Trainium4 join NVIDIA-style rack architectures [4], and its $20 billion Groq transaction folded the most prominent inference-ASIC architecture into its own roadmap [9]. The risks for ASIC builders are equally clear: model architectures can shift beneath a multi-year design (the failure mode Etched's transformer-only bet makes explicit), software maturity lags CUDA, HBM and advanced-packaging supply are contested, and execution slips are costly, as Tesla's Dojo shutdown and Microsoft's roadmap delays showed [15][20]. As of mid-2026 the consensus trajectory is coexistence: NVIDIA retains roughly 70 percent of the accelerator market while custom silicon grows faster from a smaller base, concentrated in the handful of companies whose AI bills are measured in gigawatts [13].

## References

1. Jouppi, N. P., et al. "In-Datacenter Performance Analysis of a Tensor Processing Unit." ISCA 2017. https://arxiv.org/abs/1704.04760
2. Google Cloud Blog. "Ironwood TPUs and new Axion-based VMs for your AI workloads." November 6, 2025. https://cloud.google.com/blog/products/compute/ironwood-tpus-and-new-axion-based-vms-for-your-ai-workloads
3. CNBC. "Google unveils Ironwood, seventh generation TPU, competing with Nvidia." November 6, 2025. https://www.cnbc.com/2025/11/06/google-unveils-ironwood-seventh-generation-tpu-competing-with-nvidia.html
4. TechCrunch. "Amazon releases an impressive new AI chip and teases an Nvidia-friendly roadmap." December 2, 2025. https://techcrunch.com/2025/12/02/amazon-releases-an-impressive-new-ai-chip-and-teases-a-nvidia-friendly-roadmap/
5. Microsoft. "Maia 200: The AI accelerator built for inference." January 26, 2026. https://blogs.microsoft.com/blog/2026/01/26/maia-200-the-ai-accelerator-built-for-inference/
6. Meta AI. "Four MTIA Chips in Two Years: Scaling AI Experiences for Billions." https://ai.meta.com/blog/meta-mtia-scale-ai-chips-for-billions/
7. OpenAI. "OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators." October 13, 2025. https://openai.com/index/openai-and-broadcom-announce-strategic-collaboration/
8. Groq. "What is a Language Processing Unit?" https://groq.com/blog/the-groq-lpu-explained
9. CNBC. "Nvidia buying AI chip startup Groq's assets for about $20 billion in its largest deal on record." December 24, 2025. https://www.cnbc.com/2025/12/24/nvidia-buying-ai-chip-startup-groq-for-about-20-billion-biggest-deal.html
10. TechCrunch. "Etched is building an AI chip that only runs one type of model." June 25, 2024. https://techcrunch.com/2024/06/25/etched-is-building-an-ai-chip-that-only-runs-transformer-models/
11. Spheron. "Etched AI Sohu vs NVIDIA: Transformer ASIC vs General-Purpose GPU for LLM Inference." 2026. https://www.spheron.network/blog/etched-ai-sohu-vs-nvidia-transformer-asic-inference/
12. Cerebras. "Cerebras Systems Unveils World's Fastest AI Chip with Whopping 4 Trillion Transistors." March 2024. https://www.cerebras.ai/press-release/cerebras-announces-third-generation-wafer-scale-engine
13. Tom's Hardware. "The custom AI ASIC state of play (May 2026)." May 2026. https://www.tomshardware.com/tech-industry/semiconductors/custom-ai-asics-examined-from-broadcom-to-mtia
14. CNBC. "OpenAI chip deal with Cerebras adds to roster of Nvidia, AMD, Broadcom." January 16, 2026. https://www.cnbc.com/2026/01/16/openai-chip-deal-with-cerebras-adds-to-roster-of-nvidia-amd-broadcom.html
15. TechCrunch. "Elon Musk confirms shutdown of Tesla Dojo, 'an evolutionary dead end'." August 11, 2025. https://techcrunch.com/2025/08/11/elon-musk-confirms-shutdown-of-tesla-dojo-an-evolutionary-dead-end/
16. Google DeepMind. "How AlphaChip transformed computer chip design." September 2024. https://deepmind.google/blog/how-alphachip-transformed-computer-chip-design/
17. Mirhoseini, A., Goldie, A., et al. "A graph placement methodology for fast chip design." Nature 594, 207-212 (2021). https://www.nature.com/articles/s41586-021-03544-w
18. Semiconductor Engineering. "What Will That Chip Cost?" https://semiengineering.com/what-will-that-chip-cost/
19. AnySilicon. "ASIC NRE Explained: What You're Actually Paying For." https://anysilicon.com/asic-nre-explained/
20. Data Center Dynamics. "Microsoft delays production of next-generation Maia AI chip to 2026 - report." https://www.datacenterdynamics.com/en/news/microsoft-delays-production-of-maia-100-ai-chip-to-2026-report/
21. Silicon Analysts. "AI Data Center Value Chain: Every Layer from Chips to Cloud." 2026. https://siliconanalysts.com/research/ai-data-center-value-chain
22. Anthropic. "Expanding our use of Google Cloud TPUs and Services." October 23, 2025. https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
