Pathways (Google AI)

Updated Jun 28, 2026

Pathways is the name Google has used for two related but distinct things in its artificial-intelligence work: a research vision for a next-generation AI architecture, first articulated by Jeff Dean in October 2021, and a large-scale orchestration system, described in a 2022 systems paper by Paul Barham and colleagues, that runs machine-learning computations across thousands of accelerator chips. The vision describes what future models should look like: a single multimodal, sparsely activated model that handles many tasks. The system is the software infrastructure that made very large models such as PaLM (540 billion parameters, trained on 6,144 TPU v4 chips) trainable in practice.^[1]^[2]^[3] The name later resurfaced as "Pathways on Cloud," a Google Cloud product.^[9]^[10]

What is Pathways?

Pathways is, in its original 2021 sense, Google's proposed architecture for a single AI model that can "handle many tasks at once, learn new tasks quickly and reflect a better understanding of the world," rather than the prevailing practice of training one narrow model per task.^[1] In its 2022 systems sense, Pathways is "a new large scale orchestration layer for accelerators" that coordinates a single training run across many TPU pods.^[2] The two senses share a name and overlapping authors at Google, but they operate at different levels: the vision is the agenda, and the system is the infrastructure. PaLM, whose name is an acronym for Pathways Language Model, was trained on the system but is itself a dense model, not the sparse, multimodal model imagined by the vision.^[3]

Background

By the early 2020s, the dominant pattern in deep learning was to train a separate model for each task, with each model typically handling a single modality such as text, images, or speech. These models were usually "dense," meaning the entire network was activated for every input regardless of the task. Google argued that this approach was inefficient and limited, and it framed Pathways as a response to those limitations. The argument was set out by Jeff Dean, at the time a Google Senior Fellow and Senior Vice President of Google Research, who had earlier co-founded Google Brain and was a principal designer of systems such as MapReduce, DistBelief, and TensorFlow.^[1]^[7]

What is the Pathways vision?

The vision was set out in a blog post titled "Introducing Pathways: a next-generation AI architecture," published on the official Google blog (The Keyword) on October 28, 2021, and authored by Jeff Dean. It described an aspiration rather than a finished product: "a new AI architecture that will handle many tasks at once, learn new tasks quickly and reflect a better understanding of the world."^[1]

The post identified three shortcomings of contemporary AI and a corresponding goal for each:

Limitation of contemporary AI (2021)	Pathways goal
Models are trained from scratch for a single task	Train a single model "to do thousands or millions of things," reusing existing skills to learn new ones faster
Models handle one modality at a time (text, or images, or audio)	Build "multimodal models that encompass vision, auditory, and language understanding simultaneously"
Dense networks activate the entire model for every input	Sparse activation, in which "only small pathways through the network are called into action as needed," learned dynamically per task

Google presented sparse, conditionally activated models as both more capable and more efficient, because unused parts of the network need not be computed. The blog described such models as faster and much more energy-efficient than equivalently capable dense networks, though it did not, in its primary text, attach a single headline efficiency figure.^[1] In the months that followed, Google published research it positioned as steps toward this vision, including sparse mixture-of-experts work such as V-MoE for vision and LIMoE, a sparse mixture-of-experts model trained jointly on images and text.^[8]
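Sparse activation can be illustrated with a toy top-k router. The sketch below is not Google's routing code and does not reproduce V-MoE or LIMoE. The function names (`route_top_k`, `sparse_forward`), the gate scores, and the scalar "experts" are hypothetical, chosen only to show that experts outside the selected set are never evaluated.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the softmax probabilities of the selected experts,
    renormalised so that they sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def sparse_forward(x, experts, gate_scores, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    calls = []  # records which experts actually ran
    out = 0.0
    for i, w in route_top_k(gate_scores, k):
        calls.append(i)
        out += w * experts[i](x)
    return out, calls

# Hypothetical experts: four scalar functions standing in for sub-networks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x * x, lambda x: -x]
gate_scores = [0.1, 2.0, 1.5, -1.0]  # assumed output of a learned gate

output, used = sparse_forward(3.0, experts, gate_scores, k=2)
```

With `k=2`, only the two experts with the highest gate scores run. The other two contribute no computation, which is the source of the efficiency argument made for sparse models.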

What is the Pathways system?

The Pathways system is a separate engineering effort: the orchestration layer that actually schedules and runs large models on hardware. It was described in the paper "Pathways: Asynchronous Distributed Dataflow for ML," submitted to arXiv on March 23, 2022, and presented at the MLSys 2022 conference, where it was given as an oral presentation and received an Outstanding Paper Award. The paper has 16 authors, all from Google, including Paul Barham, Aakanksha Chowdhery, Michael Isard, Sanjay Ghemawat, and Jeff Dean.^[2]^[4]

The paper frames Pathways as "a new large scale orchestration layer for accelerators," built to support experimental systems and ML research while still matching the performance of existing production systems. Its central technical idea is a single-controller programming model layered on "a novel asynchronous distributed dataflow design that lets the control plane execute in parallel despite dependencies in the data plane."^[2] Most high-performance ML systems of the era, including JAX, PyTorch, and some TensorFlow configurations, used a multi-controller model in which the same program runs independently on every host. That arrangement gives low dispatch latency but is awkward for pipelined or sparse computations and for anything beyond standard collective communication. Pathways instead uses one client program as a central controller with a unified view of all devices, which makes complex parallelism patterns easier to express. To avoid paying a latency penalty for that flexibility, it dispatches work asynchronously, using a "sharded dataflow graph of asynchronous operators that consume and produce futures" and gang-scheduling heterogeneous computations across thousands of accelerators.^[2]^[4]^[6]
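The idea of a single controller that dispatches operators consuming and producing futures can be sketched with Python's standard `concurrent.futures` module. This is a toy analogy, not the Pathways API: the `dispatch` and `controller` names are hypothetical, a thread pool stands in for accelerators, and nothing here models sharding or gang scheduling.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(pool, fn, *inputs):
    """Enqueue fn and return a future immediately.

    Inputs are futures; the worker, not the controller, waits for them.
    Dependencies must be dispatched before their consumers so that the
    FIFO pool always has a runnable task.
    """
    def task():
        return fn(*(f.result() for f in inputs))
    return pool.submit(task)

def controller():
    """Single client program that describes the whole graph up front."""
    with ThreadPoolExecutor(max_workers=4) as pool:  # stand-in for devices
        a = dispatch(pool, lambda: 2)
        b = dispatch(pool, lambda: 3)
        c = dispatch(pool, lambda x, y: x + y, a, b)  # depends on a and b
        d = dispatch(pool, lambda x, y: x * y, c, a)  # depends on c and a
        # The controller blocks only here, on the final result.
        return d.result()

result = controller()
```

The controller enqueues all four operators without waiting for any of them to finish, so control-plane work proceeds even though the data-plane values depend on one another.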

A key practical result was scale. The paper reported that Pathways achieved "performance parity (~100% accelerator utilization)" with state-of-the-art systems when running single-program-multiple-data (SPMD) computations over 2,048 TPUs, while delivering comparable throughput for Transformer models pipelined across 16 stages or sharded across two "islands" of accelerators connected by a data-center network.^[2] This let JAX programs scale beyond a single TPU pod for the first time, coordinating computation over both the fast intra-pod interconnect and the slower inter-pod data-center network.^[5]^[6]

What did Pathways train?

The system's most visible early payoff was PaLM. The model's name is an acronym for Pathways Language Model, and the connection is direct: PaLM was trained using the Pathways system. The paper "PaLM: Scaling Language Modeling with Pathways," led by Aakanksha Chowdhery and colleagues and posted to arXiv in April 2022, describes "a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM," trained "on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods."^[3]

The shared name can obscure an important distinction. PaLM is a dense model, not the sparsely activated, multimodal system imagined in the 2021 vision. What it demonstrated was the Pathways system's ability to coordinate a single very large training run across two Cloud TPU v4 Pods, using data parallelism between the pods and standard data and model parallelism within each pod. Google reported that PaLM "achieves a training efficiency of 57.8% hardware FLOPs utilization, the highest yet achieved for LLMs at this scale," along with state-of-the-art few-shot results across many benchmarks.^[3]^[5]

Aspect	Detail
Full name	Pathways Language Model
Parameters	540 billion
Architecture	Dense (densely activated) decoder-only Transformer
Hardware	6,144 TPU v4 chips across two Cloud TPU v4 Pods
Training software	The Pathways system
Reported hardware FLOPs utilization	57.8%
Paper	Chowdhery et al., arXiv:2204.02311 (April 2022)
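The data parallelism used between the two pods can be illustrated with a small numerical sketch. It is not PaLM's training code: the one-parameter model, the data, and the function names (`grad_mse`, `data_parallel_grad`) are hypothetical, and a plain average stands in for the cross-pod gradient exchange.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_grad(w, shards):
    """Each replica computes a local gradient; the results are averaged.

    Averaging stands in for a cross-pod all-reduce. Equal-sized shards
    make the average identical to the full-batch gradient.
    """
    local = [grad_mse(w, xs, ys) for xs, ys in shards]
    return sum(local) / len(local)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Two hypothetical "pods", each holding half of the batch.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

full = grad_mse(w, xs, ys)
sharded = data_parallel_grad(w, shards)
```

Because each replica holds a full copy of the parameters and an equal share of the batch, the averaged gradient matches the one a single machine would compute on the whole batch. Only gradients, not activations, need to cross the slower link between replicas.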

Where is Pathways used now (Cloud)?

Pathways was originally an internal Google system, and Google has said it continued to use it to train large models including later generations such as Gemini. The name re-entered public view as a commercial product, "Pathways on Cloud," made available to Google Cloud customers as part of the company's AI Hypercomputer stack and announced around Google Cloud Next 2025. In that setting Pathways is described as a single-controller runtime that lets a single JAX client orchestrate workloads across multiple large Cloud TPU slices spanning thousands of chips, with emphasis on multihost inference, disaggregated serving (scaling the prefill and decode stages of inference independently), resilient training, and interactive development.^[9]^[10]
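Disaggregated serving, in which prefill and decode capacity are sized separately, can be sketched with two independent worker pools. This is a toy model under loud assumptions: it does not use any Pathways on Cloud API, the names `prefill`, `decode`, and `serve` are hypothetical, a list of words stands in for a key-value cache, and the "generated" tokens are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def prefill(prompt):
    """Process the whole prompt once; the list stands in for a KV cache."""
    return prompt.split()

def decode(cache, max_new_tokens):
    """Produce placeholder tokens one step at a time from the cache."""
    out = []
    for step in range(max_new_tokens):
        out.append(f"tok{len(cache) + step}")
    return out

def serve(prompts, prefill_workers, decode_workers, max_new_tokens=3):
    """Run prefill and decode on separately sized pools."""
    with ThreadPoolExecutor(prefill_workers) as pre, \
         ThreadPoolExecutor(decode_workers) as dec:
        caches = [pre.submit(prefill, p) for p in prompts]
        outputs = [dec.submit(decode, c.result(), max_new_tokens)
                   for c in caches]
        return [o.result() for o in outputs]

replies = serve(["hello world", "a b c"], prefill_workers=1, decode_workers=4)
```

The two pool sizes are independent arguments, so capacity for the prompt-processing stage can be changed without touching capacity for the token-generation stage.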

Why does Pathways matter?

Pathways matters in two distinct ways, and the persistent confusion between its two senses shows why they should be kept apart. As a vision, Pathways named an agenda that the broader field would pursue over the following years: larger, more general, multimodal, and often sparsely activated models, rather than fleets of narrow single-task systems. As a system, Pathways was a concrete piece of infrastructure whose chief contribution was showing that a single-controller, asynchronous-dataflow design could match multi-controller performance while scaling training across multiple TPU pods, an approach validated by training PaLM. The fully realized vision (one sparse, multimodal model doing millions of tasks) and the system that trained a dense 540-billion-parameter model are not the same thing, even though they share a name and a lineage at Google.^[1]^[2]^[3]

References

1. Jeff Dean, "Introducing Pathways: a next-generation AI architecture," The Keyword (Google blog), October 28, 2021.
2. Barham, P., Chowdhery, A., Dean, J., et al., "Pathways: Asynchronous Distributed Dataflow for ML," arXiv:2203.12533, March 23, 2022.
3. Chowdhery, A., Narang, S., Devlin, J., et al., "PaLM: Scaling Language Modeling with Pathways," arXiv:2204.02311, April 2022.
4. "Pathways: Asynchronous Distributed Dataflow for ML" (Oral, Outstanding Paper Award), MLSys 2022.
5. "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance," Google Research blog.
6. "Pathways: Asynchronous Distributed Dataflow for ML" (full paper PDF), MLSys 2022 Proceedings.
7. "Jeffrey Dean," Google Research.
8. "LIMoE: Learning Multiple Modalities with One Sparse Mixture-of-Experts Model," Google Research blog.
9. "Introduction to Pathways on Cloud," Google Cloud Documentation.
10. "AI Hypercomputer inference updates for Google Cloud TPU and GPU," Google Cloud Blog.
