Pathways (Google AI)
Last reviewed
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,438 words
Pathways is the name Google has used for two related but distinct things in its artificial-intelligence work: a research vision for a next-generation AI architecture, first articulated by Jeff Dean in October 2021, and a large-scale orchestration system, described in a 2022 systems paper, that runs machine-learning computations across thousands of accelerator chips. The two senses are easy to confuse because they share a name and overlapping authors, but they operate at different levels. The vision is about what future models should look like; the system is the software plumbing that makes very large models trainable in practice. The name later resurfaced as "Pathways on Cloud," a Google Cloud product.[1][2][3]
By the early 2020s, the dominant pattern in deep learning was to train a separate model for each task, with each model typically handling a single modality such as text, images, or speech. These models were usually "dense," meaning the entire network was activated for every input regardless of the task. Google argued that this approach was inefficient and limited, and it framed Pathways as a response to those limitations. The articulation of that argument came from Jeff Dean, at the time a Google Senior Fellow and Senior Vice President of Google Research; he had earlier co-founded Google Brain and was a principal designer of systems such as MapReduce, DistBelief, and TensorFlow.[1][7]
The vision was set out in a blog post titled "Introducing Pathways: a next-generation AI architecture," published on the official Google blog on October 28, 2021. It described an aspiration rather than a finished product: "a new AI architecture that will handle many tasks at once, learn new tasks quickly and reflect a better understanding of the world."[1]
The post identified three shortcomings of contemporary AI and a corresponding goal for each:
| Limitation of contemporary AI (2021) | Pathways goal |
|---|---|
| Models are trained from scratch for a single task | Train one model "to do thousands or millions of things," reusing existing skills to learn new ones faster |
| Models handle one modality at a time (text, or images, or audio) | Build multimodal models spanning "vision, auditory, and language understanding simultaneously" |
| Dense networks activate the entire model for every input | Sparse activation, in which "only small pathways through the network are called into action as needed," learned dynamically per task |
Google presented sparse, conditionally activated models as both more capable and more efficient, because unused parts of the network need not be computed. The blog described such models as faster and much more energy-efficient than equivalently capable dense networks, though it did not, in its primary text, attach a single headline efficiency figure.[1] In the months that followed, Google published research it positioned as steps toward this vision, including sparse mixture-of-experts work such as V-MoE for vision and LIMoE, a sparse mixture-of-experts model trained jointly on images and text.[8]
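The idea of sparse activation can be illustrated with a minimal top-k routing sketch in plain Python. This is an illustration of the general mixture-of-experts pattern, not the architecture of V-MoE, LIMoE, or any Pathways model; the names `sparse_forward`, `make_expert`, and the gate scores are hypothetical, and a real router would learn its scores rather than take them as constants.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_forward(x, experts, gate_scores, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    # Indices of the k largest gate scores (ties keep their original order).
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([gate_scores[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Hypothetical experts: each one just scales its input and records that it ran.
calls = []

def make_expert(scale, name):
    def expert(x):
        calls.append(name)
        return scale * x
    return expert

experts = [make_expert(s, f"expert_{i}") for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
gate_scores = [0.0, 1.0, -1.0, 1.0]  # assumed values; learned in a real model

y, chosen = sparse_forward(3.0, experts, gate_scores, k=2)
print(chosen, calls)  # only two of the four experts were evaluated
```

With four experts and `k=2`, half of the expert functions are never called for this input, which is the source of the compute savings the vision describes.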
The Pathways system is a separate engineering effort: the orchestration layer that actually schedules and runs large models on hardware. It was described in the paper "Pathways: Asynchronous Distributed Dataflow for ML," posted to arXiv on March 23, 2022 and presented at the MLSys 2022 conference, where it was an oral presentation and received an Outstanding Paper Award. The authors included Paul Barham, Aakanksha Chowdhery, Michael Isard, Sanjay Ghemawat, and Jeff Dean, among others, all from Google.[2][4]
The paper frames Pathways as "a new large scale orchestration layer for accelerators," built to support experimental systems and ML research while still matching the performance of existing production systems. Its central technical idea is a single-controller programming model layered on an asynchronous distributed dataflow design. Most high-performance ML systems of the era, including JAX, PyTorch, and some TensorFlow configurations, used a multi-controller model in which the same program runs independently on every host. That arrangement gives low dispatch latency but is awkward for pipelined or sparse computations and for anything beyond standard collective communication. Pathways instead uses one client program as a central controller with a unified view of all devices, which makes complex parallelism patterns easier to express. To avoid paying a latency penalty for that flexibility, it dispatches work asynchronously, using a "sharded dataflow graph of asynchronous operators that consume and produce futures" and gang-scheduling heterogeneous computations across thousands of accelerators.[2][4][6]
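The single-controller pattern with operators that consume and produce futures can be sketched with Python's standard `concurrent.futures` module. This is a toy under stated assumptions, not the Pathways implementation: the `Controller` class and its `dispatch` method are hypothetical, each "device" is a single-threaded executor standing in for an accelerator, and there is no gang scheduling or sharding.

```python
from concurrent.futures import Future, ThreadPoolExecutor

def resolve(value):
    """Wait for a value if it is a future; pass plain values through."""
    return value.result() if isinstance(value, Future) else value

class Controller:
    """Toy single controller: one client program enqueues every operator."""

    def __init__(self, num_devices):
        # One single-threaded executor per hypothetical device.
        self.devices = [ThreadPoolExecutor(max_workers=1) for _ in range(num_devices)]

    def dispatch(self, device, fn, *inputs):
        # Returns a future immediately; inputs are awaited on the device side,
        # so the client can keep enqueuing work before earlier results exist.
        return self.devices[device].submit(
            lambda: fn(*[resolve(i) for i in inputs])
        )

    def shutdown(self):
        for d in self.devices:
            d.shutdown(wait=True)

ctrl = Controller(num_devices=2)
a = ctrl.dispatch(0, lambda x: x + 1, 1)        # stage 1 on device 0
b = ctrl.dispatch(1, lambda x: x * 10, a)       # stage 2 on device 1 consumes a future
c = ctrl.dispatch(0, lambda x, y: x + y, a, b)  # combine both results on device 0
result = c.result()                             # the client blocks only here
ctrl.shutdown()
print(result)
```

The client issues all three `dispatch` calls without waiting for any result, which is the property that lets a central controller hide dispatch latency. Operators must be enqueued in dependency order, which this sketch gets for free because a future has to exist before it can be passed as an input.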
A key practical result was scale. The paper reported that Pathways could achieve performance parity with state-of-the-art systems, which it glossed as roughly 100 percent accelerator utilization relative to those systems, when running single-program-multiple-data (SPMD) computations over 2,048 TPUs. It also reported comparable throughput for Transformer models pipelined across 16 stages or sharded across two "islands" of accelerators connected by a data-center network. This let JAX programs scale beyond a single TPU pod for the first time, coordinating computation over both the fast intra-pod interconnect and the slower inter-pod data-center network.[2][5][6]
The system's most visible early payoff was PaLM. The model's name is an acronym for Pathways Language Model, and the connection is direct: PaLM was trained using the Pathways system. The paper "PaLM: Scaling Language Modeling with Pathways," led by Aakanksha Chowdhery and colleagues and posted to arXiv in April 2022, describes "a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM," trained "on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods."[3]
The shared name can obscure an important nuance. PaLM is a dense model, not the sparsely activated, multimodal system imagined in the 2021 vision. What it demonstrated was the Pathways system's ability to coordinate one enormous training run across hardware, specifically across two Cloud TPU v4 pods, using data parallelism between the pods and standard data and model parallelism within each pod. Google reported a training efficiency of 57.8 percent hardware FLOPs utilization, which it described as the highest then achieved for a large language model at that scale, and reported state-of-the-art few-shot results across many benchmarks.[3]
| Aspect | Detail |
|---|---|
| Full name | Pathways Language Model |
| Parameters | 540 billion |
| Architecture | Dense (densely activated) decoder-only Transformer |
| Hardware | 6,144 TPU v4 chips across two Cloud TPU v4 pods |
| Training software | The Pathways system |
| Reported hardware FLOPs utilization | 57.8% |
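Data parallelism between two pods can be shown with a small worked example. The sketch below assumes a scalar model `y = w * x` trained with mean squared error; the function `grad`, the toy batch, and the learning rate are all hypothetical, and nothing here reflects PaLM's actual sharding or optimizer. It only shows why averaging per-pod gradients over equal-sized half batches reproduces the full-batch gradient.

```python
def grad(w, batch):
    """Gradient of mean squared error with respect to w for y = w * x."""
    return sum(2.0 * (w * x - t) * x for x, t in batch) / len(batch)

# Toy (input, target) pairs; assumed data, chosen so the arithmetic is easy to check.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0

# Each hypothetical pod holds a full copy of w and sees half of the batch.
pod_a, pod_b = batch[:2], batch[2:]
g_a = grad(w, pod_a)
g_b = grad(w, pod_b)

# Cross-pod step: average the two local gradients (equal-sized shards).
g_cross = (g_a + g_b) / 2.0

# Reference: the gradient computed on the whole batch in one place.
g_full = grad(w, batch)

# Both pods apply the same update, so their copies of w stay identical.
learning_rate = 0.01
w_new = w - learning_rate * g_cross
print(g_a, g_b, g_cross, g_full, w_new)
```

Because only the gradients cross the pod boundary once per step, this pattern tolerates a slower link between pods than the interconnect used inside each pod.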
Pathways was originally an internal Google system, and Google has said it continued to use it to train large models including later generations such as Gemini. The name re-entered public view as a commercial product, "Pathways on Cloud," made available to Google Cloud customers as part of the company's AI Hypercomputer stack and announced around Google Cloud Next 2025. In that setting Pathways is described as a single-controller runtime that lets a single JAX client orchestrate workloads across multiple large Cloud TPU slices spanning thousands of chips, with emphasis on multihost inference, disaggregated serving (scaling the prefill and decode stages of inference independently), resilient training, and interactive development.[9][10]
The confusion around the name is best resolved by reading Pathways at two levels. As a vision, Pathways named an agenda that the broader field would pursue over the following years: larger, more general, multimodal, and often sparsely activated models, rather than fleets of narrow single-task systems. As a system, Pathways was a concrete piece of infrastructure whose chief contribution was showing that a single-controller, asynchronous-dataflow design could match multi-controller performance while scaling training across multiple TPU pods, an approach validated by training PaLM. The fully realized vision (one sparse, multimodal model doing millions of tasks) and the system that trained a dense 540-billion-parameter model are not the same thing, even though they share a name and a lineage at Google.[1][2][3]