# Pathways (Google AI)

> Source: https://aiwiki.ai/wiki/pathways
> Updated: 2026-06-03
> Categories: AI Infrastructure, AI Research, Google
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Pathways** is the name Google has used for two related but distinct things in its artificial-intelligence work: a research *vision* for a next-generation AI architecture, first articulated by [Jeff Dean](/wiki/jeff_dean) in October 2021, and a large-scale orchestration *system* (described in a 2022 systems paper) that runs machine-learning computations across thousands of accelerator chips. The two senses are easy to confuse because they share a name and overlapping authors, but they operate at different levels. The vision is about what future models should look like; the system is the software plumbing that makes very large models trainable in practice. The name later resurfaced as "Pathways on Cloud," a Google Cloud product.[1][2][3]

## Background

By the early 2020s, the dominant pattern in deep learning was to train a separate model for each task, with each model typically handling a single modality such as text, images, or speech. These models were usually "dense," meaning the entire network was activated for every input regardless of the task. Google argued that this approach was inefficient and limited, and it framed Pathways as a response to those limitations. The argument was articulated by [Jeff Dean](/wiki/jeff_dean), at the time a Google Senior Fellow and Senior Vice President of Google Research; he had earlier co-founded [Google Brain](/wiki/google_brain) and was a principal designer of systems such as MapReduce, DistBelief, and TensorFlow.[1][7]

## The Pathways vision

The vision was set out in a blog post titled "Introducing Pathways: a next-generation AI architecture," published on the official Google blog on October 28, 2021. It described an aspiration rather than a finished product: "a new AI architecture that will handle many tasks at once, learn new tasks quickly and reflect a better understanding of the world."[1]

The post identified three shortcomings of contemporary AI and a corresponding goal for each:

| Limitation of contemporary AI (2021) | Pathways goal |
| --- | --- |
| Models are trained from scratch for a single task | Train one model "to do thousands or millions of things," reusing existing skills to learn new ones faster |
| Models handle one modality at a time (text, or images, or audio) | Build multimodal models spanning "vision, auditory, and language understanding simultaneously" |
| Dense networks activate the entire model for every input | Sparse activation, in which "only small pathways through the network are called into action as needed," learned dynamically per task |

Google presented sparse, conditionally activated models as both more capable and more efficient, because unused parts of the network need not be computed. The blog described such models as faster and much more energy-efficient than equivalently capable dense networks, though its main text did not attach a specific efficiency figure to that claim.[1] In the months that followed, Google published research it positioned as steps toward this vision, including sparse mixture-of-experts work such as V-MoE for vision and LIMoE, a sparse mixture-of-experts model trained jointly on images and text.[8]
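
To make "sparse activation" concrete, the following is a minimal sketch of top-k expert routing, the general mechanism used in mixture-of-experts layers. It is illustrative only: the function names, the toy experts, and the gate scores are hypothetical, and nothing here is taken from Google's blog post or from the V-MoE or LIMoE implementations.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_forward(x, experts, gate_scores, k=2):
    """Run only the k highest-scoring experts and mix their outputs.

    experts: list of callables (hypothetical stand-ins for sub-networks).
    gate_scores: one router score per expert for this input.
    Returns (output, sorted indices of the experts that actually ran).
    """
    # Select the k experts with the highest gate scores.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([gate_scores[i] for i in top])
    # Experts outside `top` are never called, which is where the savings come from.
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, sorted(top)

calls = []  # records which experts were actually executed

def make_expert(idx, scale):
    def expert(x):
        calls.append(idx)
        return scale * x
    return expert

# Four toy experts; each just multiplies its input by a constant.
experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
output, used = sparse_forward(3.0, experts, gate_scores=[0.1, 2.0, -1.0, 2.0], k=2)
print(used, output)  # experts 1 and 3 run; experts 0 and 2 are skipped
```

In a dense layer all four experts would run for every input; here only two of the four are evaluated, and the choice depends on the input's gate scores.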

## The Pathways system

The Pathways *system* is a separate engineering effort: the orchestration layer that actually schedules and runs large models on hardware. It was described in the paper "Pathways: Asynchronous Distributed Dataflow for ML," posted to arXiv on March 23, 2022 and presented at the MLSys 2022 conference, where it was an oral presentation and received an Outstanding Paper Award. The authors included Paul Barham, Aakanksha Chowdhery, Michael Isard, Sanjay Ghemawat, and [Jeff Dean](/wiki/jeff_dean), among others, all from Google.[2][4]

The paper frames Pathways as "a new large scale orchestration layer for accelerators," built to support experimental systems and ML research while still matching the performance of existing production systems. Its central technical idea is a single-controller programming model layered on an asynchronous distributed dataflow design. Most high-performance ML systems of the era, including [JAX](/wiki/jax), PyTorch, and some TensorFlow configurations, used a *multi-controller* model in which the same program runs independently on every host. That arrangement gives low dispatch latency but is awkward for pipelined or sparse computations and for anything beyond standard collective communication. Pathways instead uses one client program as a central controller with a unified view of all devices, which makes complex parallelism patterns easier to express. To avoid paying a latency penalty for that flexibility, it dispatches work asynchronously, using a "sharded dataflow graph of asynchronous operators that consume and produce futures" and gang-scheduling heterogeneous computations across thousands of accelerators.[2][4][6]
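
The single-controller, futures-based idea can be sketched with Python's standard library. This is a loose analogy under stated assumptions, not the Pathways API: threads stand in for accelerators, `stage_one`, `stage_two`, and `run_controller` are hypothetical names, and gang scheduling is not modeled. The point is only that one controller can enqueue a downstream operator before its upstream result exists, because operators exchange futures rather than values.

```python
from concurrent.futures import ThreadPoolExecutor

def stage_one(shard):
    """First operator: runs on one shard of the input."""
    return [v * 2 for v in shard]

def stage_two(upstream):
    """Second operator: consumes a future and blocks only when the value is needed."""
    return sum(upstream.result())

def run_controller(shards):
    """A single controller with a view of every shard dispatches both stages."""
    # Enough workers that waiting second-stage tasks cannot starve the first stage.
    with ThreadPoolExecutor(max_workers=2 * len(shards)) as pool:
        first = [pool.submit(stage_one, s) for s in shards]
        # The second stage is enqueued immediately, before the first stage finishes.
        second = [pool.submit(stage_two, f) for f in first]
        return [f.result() for f in second]

shards = [[1, 2], [3, 4], [5, 6], [7, 8]]
totals = run_controller(shards)
print(totals)  # [6, 14, 22, 30]
```

Because the controller never waits between the two `submit` loops, dispatch of the second stage overlaps with execution of the first, which is the property the paper relies on to hide single-controller latency.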

A key practical result was scale. The paper reported that Pathways could achieve performance parity with state-of-the-art systems (approximately 100 percent accelerator utilization) when running single-program-multiple-data (SPMD) computations over 2,048 TPUs, while also handling models pipelined across many stages or sharded across separate "islands" of accelerators connected by a data-center network. This let [JAX](/wiki/jax) programs scale beyond a single TPU pod for the first time, coordinating computation over both the fast intra-pod interconnect and the slower inter-pod data-center network.[2][5][6]

## Use in training PaLM

The system's most visible early payoff was PaLM. The model's name is an acronym for Pathways Language Model, and the connection is direct: PaLM was trained using the Pathways system. The paper "PaLM: Scaling Language Modeling with Pathways," led by Aakanksha Chowdhery and colleagues and posted to arXiv in April 2022, describes "a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM," trained "on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods."[3]

The shared name can obscure an important nuance. PaLM is a *dense* model, not the sparsely activated, multimodal system imagined in the 2021 vision. What it demonstrated was the Pathways *system's* ability to coordinate one enormous training run across hardware, specifically across two Cloud TPU v4 pods, using data parallelism between the pods and standard data and model parallelism within each pod. Google reported a training efficiency of 57.8 percent hardware FLOPs utilization, which it described as the highest then achieved for a large language model at that scale, and reported state-of-the-art few-shot results across many benchmarks.[3]
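
Data parallelism between two pods can be illustrated with a toy example. The sketch below is an assumption-laden simplification: it uses a one-parameter linear model, hypothetical function names, and ordinary Python lists in place of TPU pods, and it omits the within-pod data and model parallelism entirely. It shows only that averaging per-pod gradients over equal-sized batches reproduces the gradient of the combined batch.

```python
def local_gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x on one pod's batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def cross_pod_average(pod_gradients):
    """Combine per-pod gradients; valid as a plain mean when batches are equal-sized."""
    return sum(pod_gradients) / len(pod_gradients)

w = 0.5
pod_a = [(1.0, 2.0), (2.0, 4.0)]  # (x, y) pairs seen by the first pod
pod_b = [(3.0, 6.0), (4.0, 8.0)]  # (x, y) pairs seen by the second pod

# Each pod computes a gradient on its own half of the global batch.
pod_grads = [local_gradient(w, pod_a), local_gradient(w, pod_b)]

# One cross-pod exchange yields the gradient both pods apply.
synced = cross_pod_average(pod_grads)

# Reference: the gradient a single machine would compute on the whole batch.
full = local_gradient(w, pod_a + pod_b)
print(synced, full)  # -22.5 -22.5
```

Only the gradients cross the pod boundary in this scheme, which is why it suits a setting where the link between pods is slower than the interconnect inside each pod.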

| Aspect | Detail |
| --- | --- |
| Full name | Pathways Language Model |
| Parameters | 540 billion |
| Architecture | Dense (densely activated) decoder-only Transformer |
| Hardware | 6,144 TPU v4 chips across two Cloud TPU v4 pods |
| Training software | The Pathways system |
| Reported hardware FLOPs utilization | 57.8% |
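
Hardware FLOPs utilization is a ratio of observed throughput to theoretical peak throughput. The helper below is a minimal sketch of that ratio; the function name and the input numbers are hypothetical and chosen only to make the arithmetic easy to follow. They are not PaLM's measured throughput or the TPU v4 peak rate.

```python
def hardware_flops_utilization(observed_flops_per_sec, num_chips, peak_flops_per_chip):
    """Observed throughput divided by the combined theoretical peak of all chips."""
    peak = num_chips * peak_flops_per_chip
    return observed_flops_per_sec / peak

# Hypothetical example: 4 chips with a peak of 1e14 FLOP/s each,
# collectively sustaining 2e14 FLOP/s.
hfu = hardware_flops_utilization(
    observed_flops_per_sec=2.0e14,
    num_chips=4,
    peak_flops_per_chip=1.0e14,
)
print(f"{hfu:.1%}")  # 50.0%
```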

## Later developments (Cloud)

Pathways was originally an internal Google system, and Google has said it continued to use it to train large models including later generations such as Gemini. The name re-entered public view as a commercial product, "Pathways on Cloud," made available to Google Cloud customers as part of the company's AI Hypercomputer stack and announced around Google Cloud Next 2025. In that setting Pathways is described as a single-controller runtime that lets a single [JAX](/wiki/jax) client orchestrate workloads across multiple large [Cloud TPU](/wiki/cloud_tpu) slices spanning thousands of chips, with emphasis on multihost inference, disaggregated serving (scaling the prefill and decode stages of inference independently), resilient training, and interactive development.[9][10]
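
Disaggregated serving can be sketched as two independently sized worker pools, one for prefill and one for decode. This is a toy illustration, not the Pathways on Cloud API: the function names, the list used as a "cache," and the next-token rule are all hypothetical, and threads stand in for TPU slices.

```python
from concurrent.futures import ThreadPoolExecutor

def prefill(prompt_tokens):
    """Process the whole prompt once and return a toy cache (a copy of the tokens)."""
    return list(prompt_tokens)

def decode(cache, steps):
    """Generate tokens one at a time, extending the cache after each step."""
    generated = []
    for _ in range(steps):
        next_token = sum(cache) % 10  # toy rule standing in for a model forward pass
        cache.append(next_token)
        generated.append(next_token)
    return generated

def serve(prompts, steps, prefill_workers, decode_workers):
    """Run prefill and decode on separate pools whose sizes are set independently."""
    with ThreadPoolExecutor(prefill_workers) as prefill_pool, \
         ThreadPoolExecutor(decode_workers) as decode_pool:
        caches = list(prefill_pool.map(prefill, prompts))
        return list(decode_pool.map(lambda c: decode(c, steps), caches))

# Decode gets four workers while prefill gets one; either can be resized alone.
outputs = serve([[1, 2, 3], [4, 5]], steps=3, prefill_workers=1, decode_workers=4)
print(outputs)  # [[6, 2, 4], [9, 8, 6]]
```

Changing `prefill_workers` or `decode_workers` alters capacity for one stage without touching the other, which is the property the "scaling independently" description refers to.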

## Significance

Because the two senses of Pathways are so often conflated, it helps to assess each on its own terms. As a *vision*, Pathways named an agenda that the broader field would pursue over the following years: larger, more general, multimodal, and often sparsely activated models, rather than fleets of narrow single-task systems. As a *system*, Pathways was a concrete piece of infrastructure whose chief contribution was showing that a single-controller, asynchronous-dataflow design could match multi-controller performance while scaling training across multiple TPU pods, an approach validated by training PaLM. The fully realized vision (one sparse, multimodal model doing millions of tasks) and the system that trained a dense 540-billion-parameter model are not the same thing, even though they share a name and a lineage at Google.[1][2][3]

## References

1. Jeff Dean, "Introducing Pathways: a next-generation AI architecture," [The Keyword (Google blog)](https://blog.google/innovation-and-ai/products/introducing-pathways-next-generation-ai-architecture/), October 28, 2021.
2. Barham, P., Chowdhery, A., Dean, J., et al., "Pathways: Asynchronous Distributed Dataflow for ML," [arXiv:2203.12533](https://arxiv.org/abs/2203.12533), March 23, 2022.
3. Chowdhery, A., Narang, S., Devlin, J., et al., "PaLM: Scaling Language Modeling with Pathways," [arXiv:2204.02311](https://arxiv.org/abs/2204.02311), April 2022.
4. "Pathways: Asynchronous Distributed Dataflow for ML" (Oral, Outstanding Paper Award), [MLSys 2022](https://mlsys.org/virtual/2022/oral/2146).
5. "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance," [Google Research blog](https://research.google/blog/pathways-language-model-palm-scaling-to-540-billion-parameters-for-breakthrough-performance/).
6. "Pathways: Asynchronous Distributed Dataflow for ML" (full paper PDF), [MLSys 2022 Proceedings](https://proceedings.mlsys.org/paper_files/paper/2022/file/37385144cac01dff38247ab11c119e3c-Paper.pdf).
7. "Jeffrey Dean," [Google Research](https://research.google/people/jeff/).
8. "LIMoE: Learning Multiple Modalities with One Sparse Mixture-of-Experts Model," [Google Research blog](https://research.google/blog/limoe-learning-multiple-modalities-with-one-sparse-mixture-of-experts-model/).
9. "Introduction to Pathways on Cloud," [Google Cloud Documentation](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro).
10. "AI Hypercomputer inference updates for Google Cloud TPU and GPU," [Google Cloud Blog](https://cloud.google.com/blog/products/compute/ai-hypercomputer-inference-updates-for-google-cloud-tpu-and-gpu).

