Ray is an open-source distributed computing framework designed for scaling artificial intelligence and Python applications. Developed at the University of California, Berkeley's RISELab, Ray provides a universal API for building distributed applications and includes a suite of libraries tailored to machine learning workloads such as model training, hyperparameter tuning, model serving, reinforcement learning, and data processing. Ray was introduced in a 2018 paper at the USENIX OSDI conference by Philipp Moritz, Robert Nishihara, and collaborators [1]. Since then, it has become one of the most widely adopted distributed computing frameworks in the AI industry, with over 237 million downloads and more than 41,000 stars on GitHub as of early 2026. Ray is licensed under the Apache License 2.0 and joined the PyTorch Foundation in October 2025 [2].
Ray originated from research at UC Berkeley's RISELab (Real-time Intelligent Secure Execution Lab), a successor to the AMPLab that had produced Apache Spark. In 2016, graduate students Philipp Moritz and Robert Nishihara, working under professors Ion Stoica and Michael I. Jordan, began building a system to address the computational demands of emerging AI applications, particularly reinforcement learning. Traditional cluster computing systems like MapReduce and Spark were designed for bulk-synchronous parallel processing and could not handle the heterogeneous, fine-grained workloads common in RL training, which require millions of short-lived tasks mixed with long-running stateful computations [1].
The first public release of Ray (version 0.1) appeared in May 2017. Version 0.2 followed in September 2017, introducing improvements to the Plasma object store, an initial Jupyter-based web UI, early reinforcement learning support, and fault tolerance for actors. Version 0.3, released in November 2017, added distributed actor handles and introduced Ray Tune for hyperparameter search [3].
The foundational paper, "Ray: A Distributed Framework for Emerging AI Applications," was published at the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18) in October 2018. The authors were Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. The paper argued that next-generation AI applications would need a system capable of supporting both stateless parallel tasks and stateful actor computations, with millisecond-level task latencies and the ability to scale to millions of tasks per second. Ray's design met these requirements through a distributed scheduler, a distributed object store built on shared memory, and a Global Control Store (GCS) that maintains system metadata and enables fault tolerance [1].
Ray continued to evolve through the late 2010s and early 2020s with growing adoption in both research and industry. In August 2022, Anyscale announced Ray 2.0 at the Ray Summit conference. Ray 2.0 introduced the Ray AI Runtime (AIR), a unified layer that brought Ray's separate libraries (Tune, Train, Serve, Data, RLlib) into a cohesive framework for end-to-end ML pipelines. This release also featured improvements to Ray Serve's Deployment Graph API and new algorithms in RLlib [4].
By 2025, Ray had reached version 2.50+ with steady improvements to performance, security (built-in token authentication), data processing (Iceberg support, expression framework enhancements), and Python compatibility. Python 3.9 support was dropped after Ray 2.51.x. As of March 2026, the latest stable release is Ray 2.53.0 [5].
On October 22, 2025, the PyTorch Foundation announced at the PyTorch Conference that Ray would join The Linux Foundation as part of the PyTorch Foundation. This move positioned Ray alongside PyTorch for model development and vLLM for inference, forming an integrated open-source foundation for the AI compute stack [2].
Ray's architecture is divided into two layers: the application layer and the system layer.
The application layer consists of three types of processes:

Driver: the process executing the user program.

Worker: a stateless process that executes tasks (remote functions) submitted by the driver or by other workers.

Actor: a stateful process that, when invoked, executes the methods it exposes [1].
The system layer has three major components:
Global Control Store (GCS): A key-value store with pub-sub functionality that maintains the entire control state of the system. The GCS stores task specifications, remote function code, computation graph metadata, object location tables, and scheduling events. It uses sharding for scalability and per-shard chain replication for fault tolerance. By centralizing control state in the GCS and keeping all other components stateless, Ray achieves clean separation between state management and computation [1].
Distributed Object Store: Ray uses a shared-memory object store (originally forked from Apache Arrow's Plasma store) that runs on every node. Objects created by tasks and actors are stored as immutable entries in this distributed store, enabling zero-copy reads for processes on the same node via shared memory. When objects are needed on a different node, they are transferred over the network. The immutability of objects simplifies consistency and enables lineage-based fault recovery [1].
Distributed Scheduler: Ray employs a bottom-up hierarchical scheduling architecture. Each node has a local scheduler, and there is a global scheduler for the cluster. Tasks created on a node are first submitted to the local scheduler. If the local scheduler determines that the task cannot run locally (due to resource constraints or data locality), it forwards the task to the global scheduler, which uses load information and object location metadata from the GCS to place the task on an appropriate node. This design avoids the bottleneck of a centralized scheduler while still enabling cluster-wide optimization [1].
A Ray cluster consists of a head node and zero or more worker nodes. The head node runs the same workloads as worker nodes but also hosts cluster-level services including the GCS, the autoscaler, and the Ray dashboard. The autoscaler monitors pending resource requests and automatically adds or removes nodes to match workload demand. Ray supports heterogeneous hardware, allowing clusters to mix CPU-only nodes with GPU-equipped nodes (including multi-GPU machines) [6].
Ray Core is the foundational distributed runtime that provides three simple but powerful primitives for building distributed applications.
Tasks are stateless function invocations that execute remotely. Any Python function can be turned into a Ray task by decorating it with @ray.remote. Once decorated, the function is called using .remote() instead of a direct invocation, and it returns a future (an object reference) rather than the result itself. Tasks can be scheduled across any available node in the cluster and can specify resource requirements such as CPU cores or GPU fractions.
```python
import ray

ray.init()

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(4)]
results = ray.get(futures)  # [0, 1, 4, 9]
```
Actors are stateful computations implemented as Python classes decorated with @ray.remote. When an actor is instantiated, Ray creates a dedicated worker process that maintains the actor's state. Methods on the actor can be called remotely, and each method invocation is executed sequentially on that worker, preserving state consistency. Actors are suitable for scenarios requiring mutable state, such as parameter servers, simulators, or long-running services.
```python
@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()
ray.get(counter.increment.remote())  # 1
ray.get(counter.increment.remote())  # 2
```
Objects are immutable values stored in Ray's distributed object store. When a task or actor method returns a value, it is placed in the object store and an object reference is returned to the caller. Object references can be passed to other tasks or actors, enabling data to flow through a computation graph without requiring explicit serialization by the user. The ray.put() function allows users to explicitly place large objects into the store for efficient sharing across tasks [7].
Ray includes a set of high-level libraries (collectively known as the Ray AI Libraries) built on top of Ray Core. Each library addresses a specific phase of the ML lifecycle.
Ray Train is a library for distributed training that abstracts the complexities of multi-node, multi-GPU training. It supports PyTorch (with TorchTrainer), TensorFlow/Keras (with TensorFlowTrainer), and other frameworks.
For PyTorch workloads, Ray Train handles torch.distributed initialization, NCCL backend setup, and DistributedDataParallel (DDP) wrapping behind the scenes. For models too large to fit on a single GPU, it supports FullyShardedDataParallel (FSDP) to shard parameters, gradients, and optimizer states across workers. For TensorFlow, it configures TF_CONFIG and supports MultiWorkerMirroredStrategy [8].
Key features of Ray Train include:

- Automatic setup of distributed process groups, device placement, and data sharding across workers
- Built-in checkpointing and fault tolerance, including automatic restart of failed workers
- Integration with Ray Tune for hyperparameter search and Ray Data for streaming data ingestion
Ray Tune is a library for hyperparameter tuning and experiment execution at any scale. It supports running experiments from a single machine to a large distributed cluster with minimal code changes.
Tune provides a pluggable architecture for search algorithms and trial schedulers:
| Component | Description | Examples |
|---|---|---|
| Search Algorithms | Suggest hyperparameter configurations to evaluate | Random Search, Bayesian Optimization (BayesOpt), Optuna, Ax, BOHB, Nevergrad |
| Trial Schedulers | Decide whether to stop, pause, or modify running trials based on intermediate results | ASHA (Asynchronous Successive Halving), HyperBand, Population Based Training (PBT), FIFO (default) |
| Reporters | Log and display metrics during the tuning process | CLIReporter, JupyterNotebookReporter |
When no search algorithm is specified, Tune defaults to random search. When no scheduler is specified, it uses a FIFO scheduler that runs trials in the order they were selected without early stopping. For efficient large-scale tuning, Tune also offers utilities such as ConcurrencyLimiter (to cap concurrent trials) and Repeater (to run each configuration with multiple random seeds) [9].
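ASHA is the asynchronous variant of successive halving; the underlying idea can be sketched in plain Python, independent of Tune (the quadratic objective and noise model below are hypothetical):

```python
import random

def evaluate(x, budget):
    # Hypothetical objective: minimum at x = 3; more budget, less noise.
    return (x - 3) ** 2 + random.random() / budget

def successive_halving(trials, eta=2, rungs=3):
    """Evaluate all trials cheaply, then repeatedly promote the best
    1/eta fraction to the next rung with a larger training budget."""
    for rung in range(rungs):
        budget = eta ** rung
        scores = {t: evaluate(t, budget) for t in trials}
        keep = max(1, len(trials) // eta)
        trials = sorted(trials, key=scores.get)[:keep]
    return trials

random.seed(0)
candidates = [random.uniform(0, 6) for _ in range(16)]
survivors = successive_halving(candidates)  # 2 of the 16 trials survive
```

Most of the compute budget is thereby spent on promising configurations, which is why ASHA-style schedulers dominate unscheduled random search at scale.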
Ray Serve is a framework-agnostic library for online model serving and inference. It can serve models built with PyTorch, TensorFlow, Scikit-learn, XGBoost, and arbitrary Python logic. Built on Ray's actor model, Ray Serve enables complex inference pipelines where multiple models and business logic are composed into a directed acyclic graph (DAG).
Core features include:

- Autoscaling of each deployment independently, including scale-to-zero
- Dynamic request batching to improve hardware utilization
- Response streaming for incremental outputs
- Model composition and model multiplexing across deployments
- Fractional resource allocation, such as multiple replicas sharing a single GPU
Ray Serve has developed specialized support for LLM serving through the Ray Serve LLM APIs, which provide OpenAI-compatible endpoints, integration with vLLM as a backend engine, multi-node and multi-GPU serving, and advanced parallelism strategies including pipeline parallelism, tensor parallelism, expert parallelism (for Mixture of Experts models), and prefill-decode disaggregation [10].
Ray Data is a library for scalable data processing designed for AI workloads. It features a streaming execution engine that processes data through pipelines of operators, where each operator can be scaled independently. This streaming model keeps both CPUs and GPUs active, reducing idle time during training or inference.
Ray Data supports reading from and writing to common formats (Parquet, CSV, JSON, images, Iceberg tables) and integrates with cloud storage providers (AWS S3, Google Cloud Storage, Azure Blob Storage). It provides first-class integration with AI frameworks including vLLM, PyTorch, Hugging Face, and TensorFlow [11].
The library uses a two-phase planning process: a logical plan (describing what operations to perform) is built first, then converted into a physical plan (specifying exactly how to execute those operations) when execution begins. Data processing is parallelized at the block level, with blocks distributed across the cluster for parallel execution.
Primary use cases include:

- Batch inference over large datasets, including GPU-accelerated model scoring
- Data preprocessing and streaming ingestion for distributed training
- Large-scale pipelines over unstructured data such as images, audio, and text
RLlib is Ray's library for scalable reinforcement learning. It supports production-level, fault-tolerant RL workloads with a modular architecture centered on the Algorithm class. An Algorithm manages EnvRunner actors for collecting training samples from the environment and Learner actors for computing gradients and updating models. The number of each actor type is independently configurable, allowing users to scale data collection and model updates separately [12].
RLlib supports a broad range of RL paradigms:
| Category | Description |
|---|---|
| Model-free RL | On-policy (PPO, A2C) and off-policy (DQN, SAC, TD3) algorithms |
| Model-based RL | Algorithms that learn and plan with environment models |
| Offline RL | Training on pre-collected datasets (e.g., Critic Regularized Regression) |
| Multi-agent RL | Native support for cooperative, competitive, and mixed multi-agent settings with independent, shared, or centralized critics |
| Self-play | Adversarial training using self-play or league-based self-play |
RLlib's RLModule API supports arbitrary model architectures with PyTorch, complex multi-model setups, and multi-agent models with shared components between agents. It integrates with the Farama Foundation's Gymnasium, PettingZoo for multi-agent environments, and custom environment formats [12].
Anyscale is the commercial company behind Ray, founded in 2019 by Ion Stoica, Robert Nishihara, and Philipp Moritz. The company is headquartered in San Francisco, California. Anyscale provides a managed cloud platform for running Ray workloads, offering features such as cluster management, autoscaling, monitoring, and enterprise support on top of the open-source framework [13].
Anyscale has raised approximately $260 million across multiple funding rounds:
| Round | Date | Amount | Lead Investors |
|---|---|---|---|
| Seed | 2019 | $20.6M | Not disclosed |
| Series B | 2020 | $40M | NEA |
| Series C | December 2021 | $100M | Andreessen Horowitz, Addition |
| Series C extension | August 2022 | $99M | Addition, Intel Capital |
The December 2021 Series C round valued Anyscale at $1 billion, making it a unicorn. Investors include Andreessen Horowitz, NEA, Addition, Intel Capital, and Foundation Capital [14].
Ion Stoica serves as Co-Founder and Executive Chairman. Stoica is also a UC Berkeley professor and co-creator of Apache Spark. Philipp Moritz serves as Co-Founder and CTO. Robert Nishihara, who served as CEO in the company's early years, transitioned to a product-focused role in 2024. Keerti Melkote, founder of Aruba Networks, became CEO of Anyscale in late 2024 [15].
The Anyscale Platform provides a managed runtime for Ray with additional enterprise features. At Ray Summit 2025, Anyscale announced new capabilities including lineage tracking, a new Anyscale Runtime API-compatible engine, Azure support, and a Global Resource Scheduler with multi-resource cloud support. The platform also integrates with NVIDIA NeMo for LLM fine-tuning and customization [16].
Ray has become a central component in the infrastructure used to train and serve large language models. Its ability to coordinate distributed computation across thousands of GPUs makes it well suited for the massive scale of modern LLM workloads.
OpenAI used Ray to coordinate the training of ChatGPT, including GPT-3.5 and GPT-4. Ray Train handled the orchestration of distributed training workers across large GPU clusters [17].
Cohere uses Ray on top of TPU v4 pods for distributing and scheduling tasks across many TPU hosts as part of their LLM training pipeline, alongside PyTorch, JAX, and TPU-specific tooling [17].
vLLM, the widely adopted open-source LLM inference engine, uses Ray as its distributed runtime for multi-node inference. While vLLM can use native Python multiprocessing for single-node deployments, multi-node deployments require Ray. In a vLLM Ray cluster, the head node coordinates the cluster and serves the API endpoint, while worker nodes provide additional GPU resources for tensor parallelism and pipeline parallelism. Ray handles all inter-node communication on behalf of vLLM [18].
Other companies using Ray at scale include Uber, Shopify, Spotify, Pinterest, Instacart, and Roblox [13].
KubeRay is the official Kubernetes operator for deploying and managing Ray clusters on Kubernetes. It is maintained in the ray-project/kuberay GitHub repository and provides a toolkit for running Ray applications in containerized environments.
KubeRay introduces three Kubernetes Custom Resource Definitions (CRDs):
| CRD | Purpose | Key Features |
|---|---|---|
| RayCluster | Manages the lifecycle of a Ray cluster | Cluster creation/deletion, autoscaling, fault tolerance |
| RayJob | Submits a one-off job to a Ray cluster | Automatically creates a RayCluster, submits the job, and optionally deletes the cluster when the job finishes |
| RayService | Manages Ray Serve deployments | Zero-downtime upgrades, high availability, combines a RayCluster with a Serve deployment graph |
KubeRay supports heterogeneous compute nodes (including GPUs), running multiple Ray clusters with different Ray versions in the same Kubernetes cluster, and optional autoscaling that sizes Ray clusters according to workload demand. Starting from KubeRay v1.3.0, users can use the kubectl ray plugin to simplify common workflows. KubeRay v1.4.0 introduced a dashboard for viewing and managing KubeRay resources [19].
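A minimal RayCluster manifest sketch (field names follow the KubeRay CRD; the cluster name, image tag, and replica counts are illustrative):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: example-cluster
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.53.0
  workerGroupSpecs:
    - groupName: gpu-workers
      replicas: 2
      minReplicas: 0        # autoscaler may scale this group to zero
      maxReplicas: 4
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.53.0
              resources:
                limits:
                  nvidia.com/gpu: 1
```

Applying this manifest with `kubectl apply` has the operator create the head and worker pods and wire them into a single Ray cluster.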
KubeRay integrates with the broader Kubernetes ecosystem, including observability tools (Prometheus, Grafana), queuing systems (Volcano, Apache YuniKorn, Kueue), and ingress controllers (Nginx). Major cloud providers offer managed KubeRay support: Google Cloud provides Ray on GKE, and Databricks offers Ray on Databricks [19].
Ray operates in a space shared by several distributed computing frameworks, each with different design philosophies and strengths.
| Feature | Ray | Apache Spark | Dask |
|---|---|---|---|
| Primary focus | AI/ML workloads, general distributed Python | Large-scale data processing (ETL) | Scaling Python analytics (pandas, NumPy) |
| Task latency | Sub-millisecond | Millisecond-level | Millisecond-level |
| Task throughput | Millions of tasks per second | Optimized for batch throughput | Moderate |
| Programming model | Tasks + Actors (stateful and stateless) | Functional transformations on RDDs/DataFrames | Task graphs on collections |
| Language | Python-first (also C++, Java) | Scala, Java, Python, R | Python |
| GPU support | Native, first-class GPU scheduling | Limited (via third-party libraries) | Basic GPU support |
| Fault tolerance | Lineage-based reconstruction via GCS | Lineage-based reconstruction via RDD lineage | Task graph recomputation |
| Ecosystem | Train, Tune, Serve, Data, RLlib | MLlib, Spark SQL, Structured Streaming | Dask ML, Dask DataFrame, Dask Array |
| Maturity | First released 2017 | First released 2010 | First released 2014 |
| Best for | Distributed ML training, RL, model serving, LLM inference | Production ETL pipelines, large-scale batch processing | Scaling pandas/NumPy workflows beyond single-machine memory |
Notably, these frameworks are not mutually exclusive. It is possible to run Spark on Ray (via the RayDP library), Ray on Spark (via Databricks' integration), and Dask on Ray (via the Dask-on-Ray scheduler). This interoperability allows organizations to adopt Ray incrementally alongside existing data infrastructure [20].
Ray is the strongest choice when workloads involve fine-grained task parallelism, stateful computations (actors), GPU-heavy ML training or inference, reinforcement learning, or real-time model serving. Its low task-scheduling overhead and shared-memory object store make it particularly efficient for workloads that require rapid communication between distributed components. For pure ETL or SQL-heavy analytics on petabyte-scale data, Spark remains the more proven option. For scaling pandas-style analytics on medium-sized data, Dask offers a lower barrier to entry [20].
As of early 2026, the Ray project (hosted at github.com/ray-project/ray) has over 41,800 GitHub stars, 7,400 forks, and more than 1,400 contributors. The codebase is primarily written in Python (76%) and C++ (18%), with additional components in Java, TypeScript, Starlark, and Cython [21].
Ray's community gathers at the annual Ray Summit conference organized by Anyscale, with Ray Summit 2025 held November 3 to 5 in San Francisco. The community also maintains a discussion board and a Slack workspace for developers and users.
Ray's adoption spans multiple sectors:
| Sector | Example Users | Use Cases |
|---|---|---|
| AI research labs | OpenAI, Cohere | LLM training orchestration |
| E-commerce | Shopify, Instacart | ML pipelines, recommendation systems |
| Media and entertainment | Spotify, Roblox | Content recommendation, game AI |
| Ride-sharing and delivery | Uber | Large-scale ML training and serving |
| Social media | Pinterest | Visual search, recommendation models |
| Open-source AI | vLLM | Distributed LLM inference |
The framework's position within the PyTorch Foundation, alongside PyTorch and vLLM, solidifies its role as a foundational layer in the open-source AI compute stack [2].
Ray can be installed via pip:
```shell
pip install ray
```
To install specific library components:
```shell
pip install "ray[train]"   # For Ray Train
pip install "ray[tune]"    # For Ray Tune
pip install "ray[serve]"   # For Ray Serve
pip install "ray[data]"    # For Ray Data
pip install "ray[rllib]"   # For RLlib
pip install "ray[all]"     # All libraries
```
Ray supports Python 3.10 and later (Python 3.9 support was dropped after Ray 2.51.x). It runs on Linux, macOS, and Windows. Clusters can be deployed on bare metal, cloud VMs (AWS, GCP, Azure), or Kubernetes via KubeRay [5].