ASIC
Sources: 22 citations
Review status: Source-backed
Revision: v2 · 2,503 words
An application-specific integrated circuit (ASIC) is a chip designed to perform one job, in contrast to a general-purpose CPU or a broadly programmable GPU. By committing a single workload to fixed silicon, an ASIC gives up flexibility in exchange for higher throughput, lower energy per operation, and lower unit cost at volume. In artificial intelligence, the term has become shorthand for the custom AI accelerator chips built around the dense matrix arithmetic of neural networks. Google's Tensor Processing Unit (TPU), serving production traffic since 2015, proved the approach at datacenter scale: Google measured the first TPU at 15 to 30 times the inference throughput of contemporary CPUs and GPUs and 30 to 80 times their performance per watt [1]. By 2026 every US hyperscaler fields in-house AI silicon, OpenAI has commissioned 10 gigawatts of its own accelerators through Broadcom [7], and industry projections cited in May 2026 put custom ASICs at 27.8 percent of AI accelerator shipments for the year, growing nearly three times as fast as merchant GPUs [13].
An ASIC implements its target computation directly in dedicated logic, recovering the silicon area and energy that general-purpose processors spend on instruction decode, cache hierarchies, and dynamic scheduling. The category dates to the 1980s gate-array era, but economics have always been its defining feature: design costs are enormous and fixed, so custom silicon pays off only when a workload is large, stable, and predictable enough to amortize them.
Modern AI accelerators sit on a spectrum rather than at a binary extreme. FPGAs, reconfigurable chips that Microsoft used to serve neural networks in Project Brainwave before it developed Maia, occupy a middle ground between processors and fixed logic. Most AI ASICs are better described as domain-specific architectures: programmable enough to run many different networks, but specialized for tensor operations rather than arbitrary code. What they cannot do is absorb a fundamental change in workload; a chip optimized for yesterday's model architecture can be stranded by tomorrow's.
Neural networks are a near-ideal ASIC target for three reasons: training and inference are dominated by matrix multiplication, the dataflow is regular and largely known at compile time, and the arithmetic tolerates reduced precision (8-bit formats, and by 2026 4-bit floating point for inference). Google's original TPU exploited all three. Its core was a 256 × 256 systolic array of 8-bit multiply-accumulate units delivering 92 trillion operations per second, fed from 28 MiB of software-managed on-chip memory instead of caches [1].
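The systolic dataflow can be sketched in a few lines of Python. The model below is a simplified, cycle-level illustration of a weight-stationary array, assuming each cell holds one weight, activations stream in from the left with a one-cycle skew per row, and partial sums flow downward; it is not Google's hardware design, and the function names are hypothetical. The peak-rate helper assumes the 700 MHz clock reported in Google's TPU v1 paper, a figure not stated in this article.

```python
def systolic_matmul(x, w):
    """Compute x @ w on a K x M grid of multiply-accumulate (MAC) cells.

    Weight-stationary: cell (k, j) holds w[k][j] for the whole computation.
    Row i of x enters the array skewed in time (element k arrives one cycle
    later than element k - 1), activations shift one cell right per cycle,
    and partial sums shift one cell down per cycle until they leave the
    bottom row. Python ints stand in for the 8-bit inputs and the wider
    accumulators real hardware uses.
    """
    n, K, M = len(x), len(w), len(w[0])
    act = [[0] * M for _ in range(K)]   # activation latched in each cell
    psum = [[0] * M for _ in range(K)]  # partial sum latched in each cell
    out = [[0] * M for _ in range(n)]

    # The last result (row n-1, column M-1) leaves the array at cycle n+K+M-3.
    for t in range(n + K + M - 2):
        new_act = [[0] * M for _ in range(K)]
        new_psum = [[0] * M for _ in range(K)]
        for k in range(K):
            for j in range(M):
                if j == 0:
                    i = t - k  # left edge: row k receives x[i][k] at cycle i + k
                    a_in = x[i][k] if 0 <= i < n else 0
                else:
                    a_in = act[k][j - 1]  # shifted in from the left neighbour
                p_in = psum[k - 1][j] if k > 0 else 0  # from the cell above
                new_act[k][j] = a_in
                new_psum[k][j] = p_in + a_in * w[k][j]  # one MAC per cell per cycle
        act, psum = new_act, new_psum
        for j in range(M):  # completed dot products exit the bottom row
            i = t - (K - 1) - j
            if 0 <= i < n:
                out[i][j] = psum[K - 1][j]
    return out


def peak_ops_per_second(rows, cols, clock_hz):
    """Peak rate of a rows x cols MAC array; each MAC counts as 2 operations."""
    return rows * cols * 2 * clock_hz


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(round(peak_ops_per_second(256, 256, 700e6) / 1e12, 2))  # 91.75 (trillion ops/s)
```

Each weight is read from memory once and then reused by every row of activations that streams past it, which is why the array needs no cache; the 65,536 cells at 700 MHz account for the roughly 92 trillion operations per second quoted above.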
GPUs responded by becoming more ASIC-like. NVIDIA added tensor cores in 2017 and a transformer engine in 2022, and nearly all the FLOPS in a modern datacenter GPU now come from matrix units. What GPUs retain is generality plus the CUDA software ecosystem, which lets researchers run novel model architectures immediately. ASIC owners must build and maintain their own compilers and kernels (XLA for TPUs, Neuron for Trainium, Triton-based stacks at Meta), and immature software has sunk more custom-silicon efforts than hardware flaws have. Memory strategy is the other structural divide: hyperscaler ASICs pair matrix engines with high-bandwidth memory (HBM) much as GPUs do, while Groq and Cerebras instead keep model weights in on-chip SRAM, trading capacity for an order-of-magnitude gain in bandwidth and deterministic latency [8][12].
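A back-of-the-envelope calculation shows the trade-off. During unbatched token-by-token generation, a model must read all of its weights for every token, so memory bandwidth sets a floor on latency. The sketch below assumes a hypothetical 70-billion-parameter model stored at one byte per weight and uses the per-chip Ironwood and Groq figures quoted later in this article; it ignores compute, KV-cache traffic, batching, and chip-to-chip links, so its outputs are bounds rather than predictions.

```python
import math


def min_seconds_per_token(weight_bytes, bandwidth_bytes_per_s):
    """Bandwidth floor for one decode step of a memory-bound model.

    Assumes every weight is read once per generated token and layers run one
    after another; ignores compute time, KV-cache traffic, batching and
    chip-to-chip links.
    """
    return weight_bytes / bandwidth_bytes_per_s


def chips_to_hold(weight_bytes, capacity_bytes_per_chip):
    """Minimum number of chips whose combined memory can hold the weights."""
    return math.ceil(weight_bytes / capacity_bytes_per_chip)


# Hypothetical model: 70 billion parameters at 8 bits (1 byte) each.
WEIGHT_BYTES = 70e9

# Per-chip figures quoted elsewhere in this article.
HBM_BANDWIDTH = 7.4e12    # Ironwood HBM3e, bytes/s
SRAM_BANDWIDTH = 80e12    # Groq LPU on-chip SRAM, bytes/s
SRAM_CAPACITY = 230e6     # Groq LPU on-chip SRAM, bytes

hbm_ms = min_seconds_per_token(WEIGHT_BYTES, HBM_BANDWIDTH) * 1e3
sram_ms = min_seconds_per_token(WEIGHT_BYTES, SRAM_BANDWIDTH) * 1e3
sram_chips = chips_to_hold(WEIGHT_BYTES, SRAM_CAPACITY)

print(f"HBM, weights on one chip: >= {hbm_ms:.2f} ms/token")
print(f"SRAM, weights over {sram_chips} chips: >= {sram_ms:.3f} ms/token")
```

Under these assumptions the HBM chip cannot go below about 9.46 ms per token, while the SRAM design's floor is about 0.875 ms, roughly ten times lower, but only after spreading the weights across 305 chips.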
| Chip | Developer | Primary focus | Process | Status (June 2026) |
|---|---|---|---|---|
| TPU v7 "Ironwood" | Google (with Broadcom) | Training and inference | TSMC N3P | Generally available since November 2025 [2] |
| Trainium3 | AWS (with Marvell) | Training and inference | TSMC 3nm | Generally available since December 2025 [4] |
| Maia 200 | Microsoft | Inference | TSMC 3nm | Deploying in Azure since January 2026 [5] |
| MTIA 300 to 500 | Meta (with Broadcom) | Ranking and generative inference | Various | Rolling deployment through 2027 [6] |
| OpenAI XPU | OpenAI and Broadcom | Custom accelerator racks | 3nm and 2nm | First racks targeted for second half of 2026 [7] |
| LPU | Groq | Low-latency inference | 14nm (first generation) | Technology licensed to NVIDIA in December 2025 [9] |
| Sohu | Etched | Transformer-only inference | TSMC 4nm | Not publicly shipped as of mid-2026 [11] |
| WSE-3 | Cerebras | Wafer-scale training and inference | TSMC 5nm | Shipping in CS-3 systems [12] |
| Dojo D1 | Tesla | Vision model training | TSMC 7nm | Program shut down in August 2025 [15] |
| Ascend 910C | Huawei | Training and inference | SMIC 7nm-class | Volume production for the Chinese market [13] |
Google has iterated on TPUs for a decade, the longest-running program of its kind among the hyperscalers. The seventh generation, Ironwood, was announced in April 2025 and reached general availability in November 2025; each chip delivers 4,614 teraFLOPS of FP8 compute with 192 GB of HBM3e at 7.4 TB/s, and pods of 9,216 chips reach 42.5 FP8 exaflops [2][3]. Anthropic has committed to using up to one million TPUs for its Claude models, a deal Google says will bring well over a gigawatt of capacity online in 2026 [3][22]. Google Cloud CEO Thomas Kurian framed the rationale in the announcement: "Anthropic's choice to significantly expand its usage of TPUs reflects the strong price-performance and efficiency its teams have seen with TPUs for several years" [22].
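The pod figure follows directly from the per-chip number, and a roofline model (a standard performance-analysis technique, not something Google's announcement describes) shows why memory bandwidth matters as much as peak FLOPS. This sketch assumes ideal scaling across the pod and the rounded 7.4 TB/s bandwidth; the helper names are illustrative.

```python
def pod_peak_flops(chips, flops_per_chip):
    """Aggregate peak compute, assuming every chip runs at its rated peak."""
    return chips * flops_per_chip


def attainable_flops(peak_flops, bytes_per_s, flops_per_byte):
    """Roofline model: a kernel is limited by compute or by memory traffic."""
    return min(peak_flops, bytes_per_s * flops_per_byte)


IRONWOOD_FP8_FLOPS = 4614e12   # per chip, from the figures above
IRONWOOD_HBM_BYTES = 7.4e12    # per chip, bytes/s

pod = pod_peak_flops(9216, IRONWOOD_FP8_FLOPS)
ridge = IRONWOOD_FP8_FLOPS / IRONWOOD_HBM_BYTES  # FLOP/byte needed to be compute-bound

# Matrix-vector product with FP8 weights: each 1-byte weight feeds one
# multiply-accumulate (2 FLOPs), so intensity is about 2 FLOP/byte.
gemv = attainable_flops(IRONWOOD_FP8_FLOPS, IRONWOOD_HBM_BYTES, 2)

print(f"pod peak: {pod / 1e18:.1f} FP8 exaflops")        # 42.5
print(f"ridge point: {ridge:.0f} FLOP/byte")             # 624
print(f"GEMV: {gemv / IRONWOOD_FP8_FLOPS:.1%} of peak")  # 0.3%
```

A kernel must perform roughly 624 FLOPs per byte fetched from HBM before the chip's compute, rather than its memory, becomes the limit; low-intensity work such as unbatched matrix-vector products reaches well under 1 percent of peak.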
Amazon introduced Inferentia for inference in 2018 and Trainium for training in 2020. Trainium3, launched at re:Invent in December 2025 on TSMC's N3P 3nm process, provides about 2.52 petaflops of FP8 per chip and 144 GB of HBM3e; a 144-chip UltraServer reaches 362 petaflops, roughly 4.4 times the compute of its Trainium2 predecessor, with about 4 times better performance per watt [4]. AWS says it has deployed more than one million Trainium chips, including the Project Rainier cluster built for Anthropic, and its planned Trainium4 will support NVIDIA's NVLink Fusion interconnect [4][13].
Microsoft announced its first accelerator, Maia 100, in November 2023 and, after a reported slip in its silicon roadmap [20], launched Maia 200 in January 2026. Maia 200 is a TSMC 3nm inference chip with 140 billion transistors, 216 GB of HBM3e at 7 TB/s, 272 MB of on-chip SRAM, and native FP4 throughput above 10 petaflops (more than 5 petaflops in FP8) within a 750-watt envelope. Microsoft says it delivers 30 percent better performance per dollar than its existing fleet and serves GPT-5.2 and Copilot workloads [5][13].
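Four-bit floating point halves operand size relative to FP8, which is one reason FP4 throughput can be roughly double. A common 4-bit layout, E2M1 (1 sign, 2 exponent, 1 mantissa bit), can represent only the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6, so values are scaled per block before rounding. The sketch below assumes that layout and uses a simplified floating-point block scale; production formats such as OCP MXFP4 restrict the shared scale to a power of two, and the exact encoding Maia 200 uses is not specified here. Function names are hypothetical.

```python
# Magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1 = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def _nearest_fp4(x):
    """Round a non-negative value to the nearest E2M1 magnitude.

    Ties go to the smaller magnitude (a simplification; hardware typically
    rounds ties to even).
    """
    return min(FP4_E2M1, key=lambda q: abs(x - q))


def quantize_block_fp4(values):
    """Quantize one block of floats to FP4 E2M1 values sharing one scale.

    Simplification: the scale is an arbitrary float (largest magnitude / 6);
    MX-style formats would round the shared scale to a power of two.
    Returns (scale, codes); each original value is approximately code * scale.
    """
    amax = max(abs(v) for v in values)
    scale = amax / FP4_E2M1[-1] if amax > 0 else 1.0
    codes = []
    for v in values:
        mag = _nearest_fp4(abs(v) / scale)
        codes.append(-mag if v < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Map FP4 codes back to approximate real values."""
    return [c * scale for c in codes]


scale, codes = quantize_block_fp4([0.03, -0.9, 2.4, -1.2])
print(codes)                            # [0.0, -2.0, 6.0, -3.0]
print(dequantize_block(scale, codes))   # roughly [0.0, -0.8, 2.4, -1.2]
```

The example shows the cost of so few levels: the small value 0.03 collapses to zero and -0.9 becomes about -0.8, which is why 4-bit formats are paired with fine-grained block scales.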
Meta's MTIA line began in 2023 as a 7nm inference chip for recommendation models; hundreds of thousands are deployed for Facebook and Instagram ranking, and in March 2026 Meta outlined four further generations (MTIA 300, 400, 450, and 500) for deployment through 2027, designed around the premise that HBM bandwidth is the binding constraint on inference [6][13].
OpenAI announced its own program in October 2025: it designs the accelerators and rack systems, Broadcom co-develops and deploys them, and the partners target 10 gigawatts of Ethernet-networked capacity deployed from late 2026 through 2029 [7]. OpenAI CEO Sam Altman said the effort lets the company "embed what it's learned from developing frontier models and products directly into the hardware," adding that "developing our own accelerators adds to the broader ecosystem of partners all building the capacity required to push the frontier of AI" [7]. OpenAI simultaneously buys NVIDIA and AMD GPUs and, since January 2026, Cerebras systems [14].
Among startups, Groq, founded in 2016 by TPU veteran Jonathan Ross, built the Language Processing Unit, a single-core deterministic processor that stores weights in up to 230 MB of on-chip SRAM at roughly 80 TB/s and schedules every cycle at compile time [8]. In December 2025 NVIDIA agreed to pay about $20 billion, its largest transaction ever, for Groq's assets and a non-exclusive license to its inference technology, with Ross and other leaders joining NVIDIA [9]. Etched took the most extreme specialization bet: its Sohu chip, on TSMC 4nm, runs only transformer models, and the company claims an eight-chip server can replace about 160 H100 GPUs at over 500,000 tokens per second on Llama 70B [10]. Those claims remained unverified as of mid-2026, with no public shipments or third-party benchmarks despite a reported valuation around $5 billion [11]. Cerebras makes the opposite bet on scale: rather than splitting a model across many small chips, it builds one processor from an entire wafer. Its WSE-3 is a single 46,225 mm² chip with four trillion transistors, 900,000 cores, 44 GB of SRAM, and 125 FP16 petaflops [12].
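Compile-time scheduling means the compiler, not the hardware, decides the cycle on which every operation starts, so the running time is known before the program executes. A minimal sketch of the idea is a greedy as-soon-as-possible list scheduler with fixed latencies and a fixed number of pipelined units per type. This is a generic textbook technique, not Groq's compiler, and the operation names, unit types, and latencies below are hypothetical.

```python
def static_schedule(ops, latency, units):
    """Greedy as-soon-as-possible list scheduling, fully decided ahead of time.

    ops:     (name, unit_type, dependencies) tuples in dependency order
    latency: cycles until a unit type's result is ready
    units:   how many pipelined copies of each unit type exist; each copy can
             accept one new operation per cycle
    Returns ({name: start_cycle}, total_cycles).
    """
    start, ready = {}, {}
    issued = {}  # (unit_type, cycle) -> operations already issued that cycle
    for name, unit, deps in ops:
        cycle = max((ready[d] for d in deps), default=0)
        while issued.get((unit, cycle), 0) >= units[unit]:
            cycle += 1  # every copy is already issuing this cycle; try the next
        issued[(unit, cycle)] = issued.get((unit, cycle), 0) + 1
        start[name] = cycle
        ready[name] = cycle + latency[unit]
    return start, max(ready.values())


# Hypothetical dataflow for one layer, y = x @ W + b (bias assumed already on chip).
ops = [
    ("load_W", "mem", []),
    ("load_x", "mem", []),
    ("matmul", "matrix", ["load_W", "load_x"]),
    ("add_b", "vector", ["matmul"]),
    ("store_y", "mem", ["add_b"]),
]
schedule, cycles = static_schedule(
    ops,
    latency={"mem": 2, "matrix": 4, "vector": 1},
    units={"mem": 1, "matrix": 1, "vector": 1},
)
print(schedule, cycles)
# {'load_W': 0, 'load_x': 1, 'matmul': 3, 'add_b': 7, 'store_y': 8} 10
```

Because nothing in the schedule depends on runtime events such as cache misses, the same program takes the same number of cycles every time it runs, which is the property that gives deterministic latency.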
Not every program survives. Tesla disbanded its Dojo supercomputer team in August 2025, with Elon Musk calling the planned Dojo 2 "an evolutionary dead end" as the company refocused on its AI5 and AI6 chips with external partners [15]. In China, Huawei's Ascend 910C fills the niche under US export controls, reportedly targeting 600,000 units in 2026 despite yield constraints [13].
Few ASIC owners design chips alone. In the prevailing model the customer owns the architecture while a partner supplies SerDes and interconnect IP, physical design, advanced packaging, and manufacturing slots at TSMC. Broadcom, which co-designs Google's TPUs and Meta's MTIA and builds OpenAI's XPUs, dominates this business: it reported $8.4 billion of AI revenue in its first fiscal quarter of 2026, up 106 percent year over year, against a $73 billion AI backlog, with named or reported customers including Google, Meta, ByteDance, and OpenAI [13]. CEO Hock Tan told investors the company has "line of sight" to more than $100 billion in AI chip revenue in 2027 [13]. Marvell, the clear number two, works with AWS on Trainium and with Microsoft, and together the two firms control roughly 95 percent of custom AI ASIC co-design [13][21]. Taiwanese design-service houses such as Alchip and Global Unichip provide turnkey physical design and tape-out services for hyperscaler programs at 5nm and 3nm [21].
The case for an AI ASIC is a volume calculation. Non-recurring engineering (NRE) costs at leading-edge nodes are severe: a single 3nm tape-out runs on the order of $100 million, and full program costs including IP licensing, verification, and software reach hundreds of millions of dollars, climbing further at 2nm [18][19]. Against that stands the cost of buying merchant GPUs from NVIDIA at gross margins around 70 to 75 percent. For a hyperscaler deploying hundreds of thousands of accelerators, in-house silicon priced near manufacturing cost can undercut that markup decisively: Google claims roughly 44 percent lower total cost of ownership per Ironwood chip than NVIDIA's GB200, and Microsoft claims a 30 percent performance-per-dollar advantage for Maia 200, according to figures the companies published or analysts compiled in 2026 [5][13]. Custom chips also offer supply security and the ability to tune silicon to a known workload, as with Meta's bandwidth-first MTIA roadmap [6]. The same arithmetic excludes smaller buyers, which is why custom ASICs remain a hyperscaler and frontier-lab phenomenon.
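The volume argument reduces to a break-even count: the NRE is repaid once the per-unit saving, multiplied by the number of chips deployed, exceeds it. The sketch below uses purely hypothetical inputs (a $500 million program, a GPU that costs $10,000 to build sold at the 75 percent gross margin quoted above, and an ASIC costing $12,000 per unit) and assumes one ASIC replaces one GPU; it ignores software, power, the time value of money, and performance differences.

```python
import math


def price_from_margin(unit_cost, gross_margin):
    """Selling price implied by a unit cost and a gross margin (0.75 = 75%)."""
    return unit_cost / (1.0 - gross_margin)


def breakeven_units(nre, merchant_price, asic_unit_cost):
    """Chips needed before an in-house design has repaid its fixed NRE.

    Assumes one ASIC does the work of one merchant GPU. Returns None when the
    ASIC is no cheaper per unit, since the NRE can then never be recovered.
    """
    saving = merchant_price - asic_unit_cost
    if saving <= 0:
        return None
    return math.ceil(nre / saving)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
gpu_price = price_from_margin(unit_cost=10_000, gross_margin=0.75)   # 40,000
units = breakeven_units(nre=500e6, merchant_price=gpu_price, asic_unit_cost=12_000)
print(f"GPU price ${gpu_price:,.0f}; break-even at {units:,} chips")
# GPU price $40,000; break-even at 17,858 chips
```

With these illustrative numbers the program pays for itself within a few tens of thousands of chips, well inside a hyperscaler deployment of hundreds of thousands of accelerators but out of reach for smaller buyers.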
An AI ASIC takes roughly two to three years from specification to deployment, though Google's first TPU went from design start to datacenter in about 15 months [1]. Architects fix the datapath and memory hierarchy; engineers express the design in register-transfer level (RTL) code, typically SystemVerilog; functional verification consumes the largest share of engineering effort; and synthesis plus physical design (floorplanning, placement, routing, timing closure) produce the layout that is taped out to a foundry, almost always TSMC for leading AI chips. The resulting dies are packaged with HBM stacks using advanced techniques such as CoWoS, followed by silicon bring-up and software enablement.
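Timing closure decides the achievable clock rate: the slowest path between registers must settle within one clock period. A toy longest-path timing check over a netlist modeled as a directed acyclic graph illustrates the idea. The gate names and delays are hypothetical, and setup time, clock skew, wire delay, and process corners, which real signoff tools model, are ignored.

```python
from graphlib import TopologicalSorter


def worst_slack(delays, fanin, clock_period_ns):
    """Longest-path static timing check on a combinational netlist.

    delays: gate -> propagation delay in ns (every gate must appear here)
    fanin:  gate -> gates that drive its inputs (must form a DAG)
    Returns (arrival times, slack); negative slack means the path is too slow.
    """
    graph = {g: fanin.get(g, []) for g in delays}
    arrival = {}
    for g in TopologicalSorter(graph).static_order():  # drivers come first
        latest_input = max((arrival[d] for d in graph[g]), default=0.0)
        arrival[g] = latest_input + delays[g]
    return arrival, clock_period_ns - max(arrival.values())


# Hypothetical register-to-register path through a multiply-accumulate.
delays = {"a": 0.2, "b": 0.3, "mul": 1.1, "add": 0.6, "reg_d": 0.05}
fanin = {"mul": ["a", "b"], "add": ["mul", "b"], "reg_d": ["add"]}

arrival, slack = worst_slack(delays, fanin, clock_period_ns=2.0)  # 500 MHz target
print(f"critical path {max(arrival.values()):.2f} ns, slack {slack:+.2f} ns")
# critical path 2.05 ns, slack -0.05 ns
```

Negative slack sends the design back through synthesis or physical design (restructuring logic, resizing gates, or adding a pipeline stage) or forces a lower clock target.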
AI increasingly assists this flow. Commercial tools like Synopsys DSO.ai and Cadence Cerebrus apply machine learning to design-space exploration, and Google's AlphaChip uses reinforcement learning for floorplanning; published in Nature in 2021, it has produced layouts used in three TPU generations and Google's Axion CPUs [16][17], though its benchmark claims have been debated within the academic EDA community.
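Placement tools commonly score candidate layouts with cheap proxies for wire length. The sketch below computes half-perimeter wirelength (HPWL), a standard proxy, and improves a toy placement by greedy random swaps. It is a deliberately simpler stand-in for AlphaChip's reinforcement learning, not Google's method; the cell names, nets, and grid are hypothetical.

```python
import random


def hpwl(placement, nets):
    """Half-perimeter wirelength: bounding-box width + height, summed over nets."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def greedy_swap(placement, nets, steps=2000, seed=0):
    """Swap random pairs of cells, keeping a swap only if HPWL does not increase."""
    rng = random.Random(seed)
    place = dict(placement)
    names = list(place)
    cost = hpwl(place, nets)
    for _ in range(steps):
        a, b = rng.sample(names, 2)
        place[a], place[b] = place[b], place[a]
        new_cost = hpwl(place, nets)
        if new_cost <= cost:
            cost = new_cost
        else:
            place[a], place[b] = place[b], place[a]  # undo the swap
    return place, cost


# Hypothetical: nine blocks on a 3 x 3 grid, wired as a chain c0-c1-...-c8.
cells = [f"c{i}" for i in range(9)]
slots = [(x, y) for x in range(3) for y in range(3)]
start = dict(zip(cells, random.Random(42).sample(slots, 9)))
nets = [(cells[i], cells[i + 1]) for i in range(8)]

final, cost = greedy_swap(start, nets)
print(hpwl(start, nets), "->", cost)
```

Greedy search like this can stall in a local minimum; learned approaches such as AlphaChip aim to find better layouts across much larger designs, though, as noted above, how much better has been debated.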
The momentum behind AI ASICs reflects a structural shift toward inference, where workloads are stable enough to justify specialization, and toward buyers large enough to fund it. NVIDIA has adapted rather than ceded ground: its NVLink Fusion program lets custom chips such as Trainium4 join NVIDIA-style rack architectures [4], and its $20 billion Groq transaction folded the most prominent inference-ASIC architecture into its own roadmap [9]. The risks for ASIC builders are equally clear: model architectures can shift beneath a multi-year design (the failure mode Etched's transformer-only bet makes explicit), software maturity lags CUDA, HBM and advanced-packaging supply are contested, and execution slips are costly, as Tesla's Dojo shutdown and Microsoft's roadmap delays showed [15][20]. As of mid-2026 the consensus trajectory is coexistence: NVIDIA retains roughly 70 percent of the accelerator market while custom silicon grows faster from a smaller base, concentrated in the handful of companies whose AI bills are measured in gigawatts [13].