Ray is an open-source distributed computing framework designed for scaling artificial intelligence and Python applications. Developed at the University of California, Berkeley's RISELab, Ray provides a universal API for building distributed applications and includes a suite of libraries tailored to machine learning workloads such as model training, hyperparameter tuning, model serving, reinforcement learning, and data processing. Ray was introduced in a 2018 paper at the USENIX OSDI conference by Philipp Moritz, Robert Nishihara, and collaborators [1]. Since then, it has become one of the most widely adopted distributed computing frameworks in the AI industry, with over 237 million downloads and more than 41,000 stars on GitHub as of early 2026. Ray is licensed under the Apache License 2.0 and joined the PyTorch Foundation in October 2025 [2].
Ray originated from research at UC Berkeley's RISELab (Real-time Intelligent Secure Execution Lab), a successor to the AMPLab that had produced Apache Spark. In 2016, graduate students Philipp Moritz and Robert Nishihara, working under professors Ion Stoica and Michael I. Jordan, began building a system to address the computational demands of emerging AI applications, particularly reinforcement learning. Traditional cluster computing systems like MapReduce and Spark were designed for bulk-synchronous parallel processing and could not handle the heterogeneous, fine-grained workloads common in RL training, which require millions of short-lived tasks mixed with long-running stateful computations [1].
The first public release of Ray (version 0.1) appeared in May 2017. Version 0.2 followed in September 2017, introducing improvements to the Plasma object store, an initial Jupyter-based web UI, early reinforcement learning support, and fault tolerance for actors. Version 0.3, released in November 2017, added distributed actor handles and introduced Ray Tune for hyperparameter search [3].
The foundational paper, "Ray: A Distributed Framework for Emerging AI Applications," was published at the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18) in October 2018. The authors were Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. The paper argued that next-generation AI applications would need a system capable of supporting both stateless parallel tasks and stateful actor computations, with millisecond-level task latencies and the ability to scale to millions of tasks per second. Ray's design met these requirements through a distributed scheduler, a distributed object store built on shared memory, and a Global Control Store (GCS) that maintains system metadata and enables fault tolerance [1].
Ray continued to evolve through the late 2010s and early 2020s with growing adoption in both research and industry. In August 2022, Anyscale announced Ray 2.0 at the Ray Summit conference. Ray 2.0 introduced the Ray AI Runtime (AIR), a unified layer that brought Ray's separate libraries (Tune, Train, Serve, Data, RLlib) into a cohesive framework for end-to-end ML pipelines. This release also featured improvements to Ray Serve's Deployment Graph API and new algorithms in RLlib [4].
By 2025, Ray had reached version 2.50+ with steady improvements to performance, security (built-in token authentication), data processing (Iceberg support, expression framework enhancements), and Python compatibility. Python 3.9 support was dropped after Ray 2.51.x. As of March 2026, the latest stable release is Ray 2.53.0 [5].
On October 22, 2025, the PyTorch Foundation announced at the PyTorch Conference that Ray would join The Linux Foundation as part of the PyTorch Foundation. This move positioned Ray alongside PyTorch for model development and vLLM for inference, forming an integrated open-source foundation for the AI compute stack [2].
Ray's architecture is divided into two layers: the application layer and the system layer.
The application layer consists of three types of processes:

Driver: the process executing the user program.

Worker: a stateless process that executes tasks (remote functions) submitted by the driver or by other workers.

Actor: a stateful process that, when invoked, executes the methods it exposes [1].
The system layer has three major components:
Global Control Store (GCS): A key-value store with pub-sub functionality that maintains the entire control state of the system. The GCS stores task specifications, remote function code, computation graph metadata, object location tables, and scheduling events. It uses sharding for scalability and per-shard chain replication for fault tolerance. By centralizing control state in the GCS and keeping all other components stateless, Ray achieves clean separation between state management and computation [1].
Distributed Object Store: Ray uses a shared-memory object store (originally forked from Apache Arrow's Plasma store) that runs on every node. Objects created by tasks and actors are stored as immutable entries in this distributed store, enabling zero-copy reads for processes on the same node via shared memory. When objects are needed on a different node, they are transferred over the network. The immutability of objects simplifies consistency and enables lineage-based fault recovery [1].
Distributed Scheduler: Ray employs a bottom-up hierarchical scheduling architecture. Each node has a local scheduler, and there is a global scheduler for the cluster. Tasks created on a node are first submitted to the local scheduler. If the local scheduler determines that the task cannot run locally (due to resource constraints or data locality), it forwards the task to the global scheduler, which uses load information and object location metadata from the GCS to place the task on an appropriate node. This design avoids the bottleneck of a centralized scheduler while still enabling cluster-wide optimization [1].
A Ray cluster consists of a head node and zero or more worker nodes. The head node runs the same workloads as worker nodes but also hosts cluster-level services including the GCS, the autoscaler, and the Ray dashboard. The autoscaler monitors pending resource requests and automatically adds or removes nodes to match workload demand. Ray supports heterogeneous hardware, allowing clusters to mix CPU-only nodes with GPU-equipped nodes (including multi-GPU machines) [6].
Ray Core is the foundational distributed runtime that provides three simple but powerful primitives for building distributed applications.
Tasks are stateless function invocations that execute remotely. Any Python function can be turned into a Ray task by decorating it with @ray.remote. Once decorated, the function is called using .remote() instead of a direct invocation, and it returns a future (an object reference) rather than the result itself. Tasks can be scheduled across any available node in the cluster and can specify resource requirements such as CPU cores or GPU fractions.
```python
import ray

ray.init()

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(4)]
results = ray.get(futures)  # [0, 1, 4, 9]
```
Actors are stateful computations implemented as Python classes decorated with @ray.remote. When an actor is instantiated, Ray creates a dedicated worker process that maintains the actor's state. Methods on the actor can be called remotely, and each method invocation is executed sequentially on that worker, preserving state consistency. Actors are suitable for scenarios requiring mutable state, such as parameter servers, simulators, or long-running services.
```python
@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()
ray.get(counter.increment.remote())  # 1
ray.get(counter.increment.remote())  # 2
```
Objects are immutable values stored in Ray's distributed object store. When a task or actor method returns a value, it is placed in the object store and an object reference is returned to the caller. Object references can be passed to other tasks or actors, enabling data to flow through a computation graph without requiring explicit serialization by the user. The ray.put() function allows users to explicitly place large objects into the store for efficient sharing across tasks [7].
Ray includes a set of high-level libraries (collectively known as the Ray AI Libraries) built on top of Ray Core. Each library addresses a specific phase of the ML lifecycle.
Ray Train is a library for distributed training that abstracts the complexities of multi-node, multi-GPU training. It supports PyTorch (with TorchTrainer), TensorFlow/Keras (with TensorFlowTrainer), and other frameworks.
For PyTorch workloads, Ray Train handles torch.distributed initialization, NCCL backend setup, and DistributedDataParallel (DDP) wrapping behind the scenes. For models too large to fit on a single GPU, it supports FullyShardedDataParallel (FSDP) to shard parameters, gradients, and optimizer states across workers. For TensorFlow, it configures TF_CONFIG and supports MultiWorkerMirroredStrategy [8].
Key features of Ray Train include:

- Automatic setup of distributed process groups, device placement, and data sharding across workers
- Built-in checkpointing and fault tolerance, including automatic restart of failed workers
- Integration with Ray Tune for hyperparameter search and Ray Data for streaming data ingestion
Ray Tune is a library for hyperparameter tuning and experiment execution at any scale. It supports running experiments from a single machine to a large distributed cluster with minimal code changes.
Tune provides a pluggable architecture for search algorithms and trial schedulers:
| Component | Description | Examples |
|---|---|---|
| Search Algorithms | Suggest hyperparameter configurations to evaluate | Random Search, Bayesian Optimization (BayesOpt), Optuna, Ax, BOHB, Nevergrad |
| Trial Schedulers | Decide whether to stop, pause, or modify running trials based on intermediate results | ASHA (Asynchronous Successive Halving), HyperBand, Population Based Training (PBT), FIFO (default) |
| Reporters | Log and display metrics during the tuning process | CLIReporter, JupyterNotebookReporter |
When no search algorithm is specified, Tune defaults to random search. When no scheduler is specified, it uses a FIFO scheduler that runs trials in the order they were selected without early stopping. For efficient large-scale tuning, Tune also offers utilities such as ConcurrencyLimiter (to cap concurrent trials) and Repeater (to run each configuration with multiple random seeds) [9].
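ASHA is the asynchronous variant of successive halving; the underlying idea can be sketched in plain Python, independent of Tune (the quadratic objective and noise model below are hypothetical):

```python
import random

def evaluate(x, budget):
    # Hypothetical objective: minimum at x = 3; more budget, less noise.
    return (x - 3) ** 2 + random.random() / budget

def successive_halving(trials, eta=2, rungs=3):
    """Evaluate all trials cheaply, then repeatedly promote the best
    1/eta fraction to the next rung with a larger training budget."""
    for rung in range(rungs):
        budget = eta ** rung
        scores = {t: evaluate(t, budget) for t in trials}
        keep = max(1, len(trials) // eta)
        trials = sorted(trials, key=scores.get)[:keep]
    return trials

random.seed(0)
candidates = [random.uniform(0, 6) for _ in range(16)]
survivors = successive_halving(candidates)  # 2 of the 16 trials survive
```

Most of the compute budget is thereby spent on promising configurations, which is why ASHA-style schedulers dominate unscheduled random search at scale.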
Ray Serve is a framework-agnostic library for online model serving and inference. It can serve models built with PyTorch, TensorFlow, Scikit-learn, XGBoost, and arbitrary Python logic. Built on Ray's actor model, Ray Serve enables complex inference pipelines where multiple models and business logic are composed into a directed acyclic graph (DAG).
Core features include:

- Autoscaling of each deployment independently, including scale-to-zero
- Dynamic request batching to improve hardware utilization
- Response streaming for incremental outputs
- Model composition and model multiplexing across deployments
- Fractional resource allocation, such as multiple replicas sharing a single GPU
Ray Serve has developed specialized support for LLM serving through the Ray Serve LLM APIs, which provide OpenAI-compatible endpoints, integration with vLLM as a backend engine, multi-node and multi-GPU serving, and advanced parallelism strategies including pipeline parallelism, tensor parallelism, expert parallelism (for Mixture of Experts models), and prefill-decode disaggregation [10].
Ray Data is a library for scalable data processing designed for AI workloads. It features a streaming execution engine that processes data through pipelines of operators, where each operator can be scaled independently. This streaming model keeps both CPUs and GPUs active, reducing idle time during training or inference.
Ray Data supports reading from and writing to common formats (Parquet, CSV, JSON, images, Iceberg tables) and integrates with cloud storage providers (AWS S3, Google Cloud Storage, Azure Blob Storage). It provides first-class integration with AI frameworks including vLLM, PyTorch, Hugging Face, and TensorFlow [11].
The library uses a two-phase planning process: a logical plan (describing what operations to perform) is built first, then converted into a physical plan (specifying exactly how to execute those operations) when execution begins. Data processing is parallelized at the block level, with blocks distributed across the cluster for parallel execution.
Primary use cases include:

- Batch inference over large datasets, including GPU-accelerated model scoring
- Data preprocessing and streaming ingestion for distributed training
- Large-scale pipelines over unstructured data such as images, audio, and text
RLlib is Ray's library for scalable reinforcement learning. It supports production-level, fault-tolerant RL workloads with a modular architecture centered on the Algorithm class. An Algorithm manages EnvRunner actors for collecting training samples from the environment and Learner actors for computing gradients and updating models. The number of each actor type is independently configurable, allowing users to scale data collection and model updates separately [12].
RLlib supports a broad range of RL paradigms:
| Category | Description |
|---|---|
| Model-free RL | On-policy (PPO, A2C) and off-policy (DQN, SAC, TD3) algorithms |
| Model-based RL | Algorithms that learn and plan with environment models |
| Offline RL | Training on pre-collected datasets (e.g., Critic Regularized Regression) |
| Multi-agent RL | Native support for cooperative, competitive, and mixed multi-agent settings with independent, shared, or centralized critics |
| Self-play | Adversarial training using self-play or league-based self-play |
RLlib's RLModule API supports arbitrary model architectures with PyTorch, complex multi-model setups, and multi-agent models with shared components between agents. It integrates with the Farama Foundation's Gymnasium, PettingZoo for multi-agent environments, and custom environment formats [12].
Anyscale is the commercial company behind Ray, founded in 2019 by Ion Stoica, Robert Nishihara, and Philipp Moritz. The company is headquartered in San Francisco, California. Anyscale provides a managed cloud platform for running Ray workloads, offering features such as cluster management, autoscaling, monitoring, and enterprise support on top of the open-source framework [13].
Anyscale has raised approximately $260 million across multiple funding rounds:
| Round | Date | Amount | Lead Investors |
|---|---|---|---|
| Seed | 2019 | $20.6M | Not disclosed |
| Series B | 2020 | $40M | NEA |
| Series C | December 2021 | $100M | Andreessen Horowitz, Addition |
| Series C extension | August 2022 | $99M | Addition, Intel Capital |
The December 2021 Series C round valued Anyscale at $1 billion, making it a unicorn. Investors include Andreessen Horowitz, NEA, Addition, Intel Capital, and Foundation Capital [14].
Ion Stoica serves as Co-Founder and Executive Chairman. Stoica is also a UC Berkeley professor and co-creator of Apache Spark. Philipp Moritz serves as Co-Founder and CTO. Robert Nishihara, who served as CEO in the company's early years, transitioned to a product-focused role in 2024. Keerti Melkote, founder of Aruba Networks, became CEO of Anyscale in late 2024 [15].
The Anyscale Platform provides a managed runtime for Ray with additional enterprise features. At Ray Summit 2025, Anyscale announced new capabilities including lineage tracking, a new Anyscale Runtime API-compatible engine, Azure support, and a Global Resource Scheduler with multi-resource cloud support. The platform also integrates with NVIDIA NeMo for LLM fine-tuning and customization [16].
Ray has become a central component in the infrastructure used to train and serve large language models. Its ability to coordinate distributed computation across thousands of GPUs makes it well suited for the massive scale of modern LLM workloads.
OpenAI used Ray to coordinate the training of ChatGPT, including GPT-3.5 and GPT-4. Ray Train handled the orchestration of distributed training workers across large GPU clusters [17].
Cohere uses Ray on top of TPU v4 pods for distributing and scheduling tasks across many TPU hosts as part of their LLM training pipeline, alongside PyTorch, JAX, and TPU-specific tooling [17].
vLLM, the widely adopted open-source LLM inference engine, uses Ray as its distributed runtime for multi-node inference. While vLLM can use native Python multiprocessing for single-node deployments, multi-node deployments require Ray. In a vLLM Ray cluster, the head node coordinates the cluster and serves the API endpoint, while worker nodes provide additional GPU resources for tensor parallelism and pipeline parallelism. Ray handles all inter-node communication on behalf of vLLM [18].
Other companies using Ray at scale include Uber, Shopify, Spotify, Pinterest, Instacart, and Roblox [13].
KubeRay is the official Kubernetes operator for deploying and managing Ray clusters on Kubernetes. It is maintained in the ray-project/kuberay GitHub repository and provides a toolkit for running Ray applications in containerized environments.
KubeRay introduces three Kubernetes Custom Resource Definitions (CRDs):
| CRD | Purpose | Key Features |
|---|---|---|
| RayCluster | Manages the lifecycle of a Ray cluster | Cluster creation/deletion, autoscaling, fault tolerance |
| RayJob | Submits a one-off job to a Ray cluster | Automatically creates a RayCluster, submits the job, and optionally deletes the cluster when the job finishes |
| RayService | Manages Ray Serve deployments | Zero-downtime upgrades, high availability, combines a RayCluster with a Serve deployment graph |
KubeRay supports heterogeneous compute nodes (including GPUs), running multiple Ray clusters with different Ray versions in the same Kubernetes cluster, and optional autoscaling that sizes Ray clusters according to workload demand. Starting from KubeRay v1.3.0, users can use the kubectl ray plugin to simplify common workflows. KubeRay v1.4.0 introduced a dashboard for viewing and managing KubeRay resources [19].
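A minimal RayCluster manifest sketch (field names follow the KubeRay CRD; the cluster name, image tag, and replica counts are illustrative):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: example-cluster
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.53.0
  workerGroupSpecs:
    - groupName: gpu-workers
      replicas: 2
      minReplicas: 0        # autoscaler may scale this group to zero
      maxReplicas: 4
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.53.0
              resources:
                limits:
                  nvidia.com/gpu: 1
```

Applying this manifest with `kubectl apply` has the operator create the head and worker pods and wire them into a single Ray cluster.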
KubeRay integrates with the broader Kubernetes ecosystem, including observability tools (Prometheus, Grafana), queuing systems (Volcano, Apache YuniKorn, Kueue), and ingress controllers (Nginx). Major cloud providers offer managed KubeRay support: Google Cloud provides Ray on GKE, and Databricks offers Ray on Databricks [19].
Ray operates in a space shared by several distributed computing frameworks, each with different design philosophies and strengths.
| Feature | Ray | Apache Spark | Dask |
|---|---|---|---|
| Primary focus | AI/ML workloads, general distributed Python | Large-scale data processing (ETL) | Scaling Python analytics (pandas, NumPy) |
| Task latency | Sub-millisecond | Millisecond-level | Millisecond-level |
| Task throughput | Millions of tasks per second | Optimized for batch throughput | Moderate |
| Programming model | Tasks + Actors (stateful and stateless) | Functional transformations on RDDs/DataFrames | Task graphs on collections |
| Language | Python-first (also C++, Java) | Scala, Java, Python, R | Python |
| GPU support | Native, first-class GPU scheduling | Limited (via third-party libraries) | Basic GPU support |
| Fault tolerance | Lineage-based reconstruction via GCS | Lineage-based reconstruction via RDD lineage | Task graph recomputation |
| Ecosystem | Train, Tune, Serve, Data, RLlib | MLlib, Spark SQL, Structured Streaming | Dask ML, Dask DataFrame, Dask Array |
| Maturity | First released 2017 | First released 2010 | First released 2014 |
| Best for | Distributed ML training, RL, model serving, LLM inference | Production ETL pipelines, large-scale batch processing | Scaling pandas/NumPy workflows beyond single-machine memory |
Notably, these frameworks are not mutually exclusive. It is possible to run Spark on Ray (via the RayDP library), Ray on Spark (via Databricks' integration), and Dask on Ray (via the Dask-on-Ray scheduler). This interoperability allows organizations to adopt Ray incrementally alongside existing data infrastructure [20].
Ray is the strongest choice when workloads involve fine-grained task parallelism, stateful computations (actors), GPU-heavy ML training or inference, reinforcement learning, or real-time model serving. Its low task-scheduling overhead and shared-memory object store make it particularly efficient for workloads that require rapid communication between distributed components. For pure ETL or SQL-heavy analytics on petabyte-scale data, Spark remains the more proven option. For scaling pandas-style analytics on medium-sized data, Dask offers a lower barrier to entry [20].
As of early 2026, the Ray project (hosted at github.com/ray-project/ray) has over 41,800 GitHub stars, 7,400 forks, and more than 1,400 contributors. The codebase is primarily written in Python (76%) and C++ (18%), with additional components in Java, TypeScript, Starlark, and Cython [21].
Ray's community gathers at the annual Ray Summit conference organized by Anyscale, with Ray Summit 2025 held November 3 to 5 in San Francisco. The community also maintains a discussion board and a Slack workspace for developers and users.
Ray's adoption spans multiple sectors:
| Sector | Example Users | Use Cases |
|---|---|---|
| AI research labs | OpenAI, Cohere | LLM training orchestration |
| E-commerce | Shopify, Instacart | ML pipelines, recommendation systems |
| Media and entertainment | Spotify, Roblox | Content recommendation, game AI |
| Ride-sharing and delivery | Uber | Large-scale ML training and serving |
| Social media | Pinterest | Visual search, recommendation models |
| Open-source AI | vLLM | Distributed LLM inference |
The framework's position within the PyTorch Foundation, alongside PyTorch and vLLM, solidifies its role as a foundational layer in the open-source AI compute stack [2].
Ray can be installed via pip:
```shell
pip install ray
```
To install specific library components:
```shell
pip install "ray[train]"   # For Ray Train
pip install "ray[tune]"    # For Ray Tune
pip install "ray[serve]"   # For Ray Serve
pip install "ray[data]"    # For Ray Data
pip install "ray[rllib]"   # For RLlib
pip install "ray[all]"     # All libraries
```
Ray supports Python 3.10 and later (Python 3.9 support was dropped after Ray 2.51.x). It runs on Linux, macOS, and Windows. Clusters can be deployed on bare metal, cloud VMs (AWS, GCP, Azure), or Kubernetes via KubeRay [5].