# Ray (framework)

> Source: https://aiwiki.ai/wiki/ray
> Updated: 2026-06-22
> Categories: AI Infrastructure, Developer Tools, Machine Learning, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

Ray is an open-source distributed computing framework, developed at the University of California, Berkeley's RISELab and commercialized by Anyscale, that lets developers scale Python and [artificial intelligence](/wiki/artificial_intelligence) applications from a laptop to a cluster of thousands of CPUs and GPUs with minimal code changes. Ray provides a universal API for distributed execution plus a suite of libraries for [machine learning](/wiki/machine_learning) workloads: distributed training (Ray Train), [hyperparameter](/wiki/hyperparameter) tuning (Ray Tune), model serving (Ray Serve), [reinforcement learning](/wiki/reinforcement_learning) (RLlib), and data processing (Ray Data). Introduced in a 2018 paper at the USENIX OSDI conference by Philipp Moritz, Robert Nishihara, and collaborators, Ray's original prototype demonstrated scaling to more than 1.8 million tasks per second [1]. As of early 2026 it had surpassed 237 million downloads and more than 41,000 GitHub stars, is licensed under the Apache License 2.0, and in October 2025 joined the [PyTorch](/wiki/pytorch) Foundation, where it sits alongside PyTorch and [vLLM](/wiki/vllm) as a foundational layer of the open-source AI compute stack [2].

## What is Ray used for?

Ray is used to scale the full [MLOps](/wiki/mlops) lifecycle: data preprocessing, [distributed training](/wiki/distributed_training), hyperparameter tuning, [reinforcement learning](/wiki/reinforcement_learning), batch inference, and online model serving. Its core value is that the same lightweight primitives (tasks, actors, and objects) that run a program on a single machine also run it across a cluster, so teams can move from prototype to production without rewriting their code for a separate distributed system. "With Ray, our goal is to make distributed computing as straightforward as writing Python code," said Robert Nishihara, co-founder of Anyscale, on Ray joining the PyTorch Foundation [2]. The framework is especially well suited to AI workloads that mix many short-lived parallel tasks with long-running stateful computation, a pattern common in [large language model](/wiki/large_language_model) training pipelines, reinforcement learning, and real-time serving.

## History

### Origins at UC Berkeley RISELab

Ray originated from research at UC Berkeley's RISELab (Real-time Intelligent Secure Execution Lab), a successor to the AMPLab that had produced Apache Spark. In 2016, graduate students Philipp Moritz and Robert Nishihara, working under professors Ion Stoica and Michael I. Jordan, began building a system to address the computational demands of emerging AI applications, particularly reinforcement learning. Traditional cluster computing systems like MapReduce and Spark were designed for bulk-synchronous parallel processing and could not handle the heterogeneous, fine-grained workloads common in RL training, which require millions of short-lived tasks mixed with long-running stateful computations [1].

The first public release of Ray (version 0.1) appeared in May 2017. Version 0.2 followed in September 2017, introducing improvements to the Plasma object store, an initial Jupyter-based web UI, early reinforcement learning support, and fault tolerance for actors. Version 0.3, released in November 2017, added distributed actor handles and introduced Ray Tune for hyperparameter search [3].

### What did the OSDI 2018 paper claim?

The foundational paper, "Ray: A Distributed Framework for Emerging AI Applications," was published at the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18) in October 2018. The authors were Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. The paper argued that next-generation AI applications would need a system capable of supporting both stateless parallel tasks and stateful actor computations, with millisecond-level task latencies and the ability to scale to millions of tasks per second. The authors reported that their implementation could "scale beyond 1.8 million tasks per second" while outperforming existing specialized systems on several reinforcement learning workloads [1]. Ray's design met these requirements through a distributed scheduler, a distributed object store built on shared memory, and a Global Control Store (GCS) that maintains system metadata and enables fault tolerance [1].

### Growth and Ray 2.0

Ray continued to evolve through the late 2010s and early 2020s with growing adoption in both research and industry. In August 2022, Anyscale announced Ray 2.0 at the Ray Summit conference. Ray 2.0 introduced the Ray AI Runtime (AIR), a unified layer that brought Ray's separate libraries (Tune, Train, Serve, Data, RLlib) into a cohesive framework for end-to-end ML pipelines. This release also featured improvements to Ray Serve's Deployment Graph API and new algorithms in RLlib [4].

By 2025, Ray had reached version 2.50+ with steady improvements to performance, security (built-in token authentication), data processing (Iceberg support, expression framework enhancements), and Python compatibility. Python 3.9 support was dropped after Ray 2.51.x. As of March 2026, the latest stable release is Ray 2.53.0 [5].

### When did Ray join the PyTorch Foundation?

On October 22, 2025, the PyTorch Foundation announced at the PyTorch Conference that Ray would join The Linux Foundation as part of the PyTorch Foundation. At the time of the announcement Ray had reached 237 million downloads, and the foundation described the move as completing a critical layer of the open-source AI stack. This positioned Ray alongside [PyTorch](/wiki/pytorch) for model development and [vLLM](/wiki/vllm) for inference, forming an integrated open-source foundation for the AI compute stack [2].

## Architecture

Ray's architecture is divided into two layers: the application layer and the system layer.

### Application Layer

The application layer consists of three components:

- **Driver**: The process that runs the user's main program. It is responsible for submitting tasks and creating actors.
- **Worker**: A stateless process that executes tasks (remote functions). Workers are created by the system and can run on any node in the cluster.
- **Actor**: A stateful process that exposes methods for remote invocation. Each actor runs on a dedicated worker and maintains internal state between method calls.

### System Layer

The system layer has three major components:

**Global Control Store (GCS)**: A key-value store with pub-sub functionality that maintains the entire control state of the system. The GCS stores task specifications, remote function code, computation graph metadata, object location tables, and scheduling events. It uses sharding for scalability and per-shard chain replication for fault tolerance. By centralizing control state in the GCS and keeping all other components stateless, Ray achieves clean separation between state management and computation [1].

**Distributed Object Store**: Ray uses a shared-memory object store (originally forked from Apache Arrow's Plasma store) that runs on every node. Objects created by tasks and actors are stored as immutable entries in this distributed store, enabling zero-copy reads for processes on the same node via shared memory. When objects are needed on a different node, they are transferred over the network. The immutability of objects simplifies consistency and enables lineage-based fault recovery [1].

**Distributed Scheduler**: Ray employs a bottom-up hierarchical scheduling architecture. Each node has a local scheduler, and there is a global scheduler for the cluster. Tasks created on a node are first submitted to the local scheduler. If the local scheduler determines that the task cannot run locally (due to resource constraints or data locality), it forwards the task to the global scheduler, which uses load information and object location metadata from the GCS to place the task on an appropriate node. This design avoids the bottleneck of a centralized scheduler while still enabling cluster-wide optimization [1].

### Cluster Architecture

A Ray cluster consists of a head node and zero or more worker nodes. The head node runs the same workloads as worker nodes but also hosts cluster-level services including the GCS, the autoscaler, and the Ray dashboard. The autoscaler monitors pending resource requests and automatically adds or removes nodes to match workload demand. Ray supports heterogeneous hardware, allowing clusters to mix CPU-only nodes with [GPU](/wiki/gpu)-equipped nodes (including multi-GPU machines) [6].

## Ray Core

Ray Core is the foundational distributed runtime that provides three simple but powerful primitives for building distributed applications.

### Tasks

Tasks are stateless function invocations that execute remotely. Any Python function can be turned into a Ray task by decorating it with `@ray.remote`. Once decorated, the function is called using `.remote()` instead of a direct invocation, and it returns a future (an object reference) rather than the result itself. Tasks can be scheduled across any available node in the cluster and can specify resource requirements such as CPU cores or GPU fractions.

```python
import ray

ray.init()

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(4)]
results = ray.get(futures)  # [0, 1, 4, 9]
```

### Actors

Actors are stateful computations implemented as Python classes decorated with `@ray.remote`. When an actor is instantiated, Ray creates a dedicated worker process that maintains the actor's state. Methods on the actor can be called remotely, and each method invocation is executed sequentially on that worker, preserving state consistency. Actors are suitable for scenarios requiring mutable state, such as parameter servers, simulators, or long-running services.

```python
@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()
ray.get(counter.increment.remote())  # 1
ray.get(counter.increment.remote())  # 2
```

### Objects

Objects are immutable values stored in Ray's distributed object store. When a task or actor method returns a value, it is placed in the object store and an object reference is returned to the caller. Object references can be passed to other tasks or actors, enabling data to flow through a computation graph without requiring explicit serialization by the user. The `ray.put()` function allows users to explicitly place large objects into the store for efficient sharing across tasks [7].

## Ray Libraries

Ray includes a set of high-level libraries (collectively known as the Ray AI Libraries) built on top of Ray Core. Each library addresses a specific phase of the ML lifecycle.

### Ray Train

Ray Train is a library for [distributed training](/wiki/distributed_training) that abstracts the complexities of multi-node, multi-GPU training. It supports [PyTorch](/wiki/pytorch) (with `TorchTrainer`), [TensorFlow](/wiki/tensorflow)/Keras (with `TensorFlowTrainer`), and other frameworks.

For PyTorch workloads, Ray Train handles `torch.distributed` initialization, NCCL backend setup, and `DistributedDataParallel` (DDP) wrapping behind the scenes. For models too large to fit on a single GPU, it supports `FullyShardedDataParallel` (FSDP) to shard parameters, gradients, and optimizer states across workers. For TensorFlow, it configures `TF_CONFIG` and supports `MultiWorkerMirroredStrategy` [8].

Key features of Ray Train include:

- **Fault tolerance**: When a worker process crashes, Ray Train restarts it and resumes training from the most recent checkpoint. If an entire node goes down, a replacement is provisioned and training recovers automatically.
- **Data splitting**: Training data is automatically partitioned across workers on the fly, so only the fraction needed by each worker is transferred and read, enabling training on datasets larger than any single node's memory.
- **Checkpoint management**: Integrated checkpointing allows models to be saved periodically and restored on failure or for later evaluation.

### Ray Tune

Ray Tune is a library for [hyperparameter](/wiki/hyperparameter) tuning and experiment execution at any scale. It supports running experiments from a single machine to a large distributed cluster with minimal code changes.

Tune provides a pluggable architecture for search algorithms and trial schedulers:

| Component | Description | Examples |
|-----------|-------------|----------|
| Search Algorithms | Suggest hyperparameter configurations to evaluate | Random Search, Bayesian Optimization (BayesOpt), Optuna, Ax, BOHB, Nevergrad |
| Trial Schedulers | Decide whether to stop, pause, or modify running trials based on intermediate results | ASHA (Asynchronous Successive Halving), HyperBand, Population Based Training (PBT), FIFO (default) |
| Reporters | Log and display metrics during the tuning process | CLIReporter, JupyterNotebookReporter |

When no search algorithm is specified, Tune defaults to random search. When no scheduler is specified, it uses a FIFO scheduler that runs trials in the order they were selected without early stopping. For efficient large-scale tuning, Tune also offers utilities such as `ConcurrencyLimiter` (to cap concurrent trials) and `Repeater` (to run each configuration with multiple random seeds) [9].

### Ray Serve

Ray Serve is a framework-agnostic library for online model serving and inference. It can serve models built with [PyTorch](/wiki/pytorch), [TensorFlow](/wiki/tensorflow), [Scikit-learn](/wiki/scikit_learn), XGBoost, and arbitrary Python logic. Built on Ray's actor model, Ray Serve enables complex inference pipelines where multiple models and business logic are composed into a directed acyclic graph (DAG).

Core features include:

- **Autoscaling**: Serve monitors request load and scales the number of model replicas up or down automatically.
- **Dynamic request batching**: Incoming requests are batched on the fly to maximize GPU throughput.
- **Fractional GPU allocation**: Multiple models can share a single GPU by requesting fractional GPU resources, reducing infrastructure costs.
- **Multi-model composition**: The Deployment Graph API allows chaining multiple models and preprocessing steps into a single inference pipeline.
- **Response streaming**: For [large language models](/wiki/large_language_model), Serve supports streaming responses token by token.

Ray Serve has developed specialized support for [LLM](/wiki/large_language_model) serving through the Ray Serve LLM APIs, which provide OpenAI-compatible endpoints, integration with [vLLM](/wiki/vllm) as a backend engine, multi-node and multi-GPU serving, and advanced parallelism strategies including pipeline parallelism, [tensor parallelism](/wiki/model_parallelism), expert parallelism (for [Mixture of Experts](/wiki/mixture_of_experts) models), and prefill-decode disaggregation [10].

### Ray Data

Ray Data is a library for scalable data processing designed for AI workloads. It features a streaming execution engine that processes data through pipelines of operators, where each operator can be scaled independently. This streaming model keeps both CPUs and GPUs active, reducing idle time during training or inference.

Ray Data supports reading from and writing to common formats (Parquet, CSV, JSON, images, Iceberg tables) and integrates with cloud storage providers (AWS S3, Google Cloud Storage, Azure Blob Storage). It provides first-class integration with AI frameworks including vLLM, PyTorch, [Hugging Face](/wiki/hugging_face), and TensorFlow [11].

The library uses a two-phase planning process: a logical plan (describing what operations to perform) is built first, then converted into a physical plan (specifying exactly how to execute those operations) when execution begins. Data processing is parallelized at the block level, with blocks distributed across the cluster for parallel execution.

Primary use cases include:

- **Offline batch inference**: Processing large datasets through ML models for batch predictions.
- **Data preprocessing for training**: [Feature engineering](/wiki/feature_engineering), augmentation, and data transformation at scale.
- **Last-mile data ingest**: Streaming preprocessed data into Ray Train for distributed training.

### Ray RLlib

RLlib is Ray's library for scalable [reinforcement learning](/wiki/reinforcement_learning). It supports production-level, fault-tolerant RL workloads with a modular architecture centered on the `Algorithm` class. An Algorithm manages `EnvRunner` actors for collecting training samples from the environment and `Learner` actors for computing gradients and updating models. The number of each actor type is independently configurable, allowing users to scale data collection and model updates separately [12].

RLlib supports a broad range of RL paradigms:

| Category | Description |
|----------|-------------|
| Model-free RL | On-policy (PPO, A2C) and off-policy (DQN, SAC, TD3) algorithms |
| Model-based RL | Algorithms that learn and plan with environment models |
| Offline RL | Training on pre-collected datasets (e.g., Critic Regularized Regression) |
| Multi-agent RL | Native support for cooperative, competitive, and mixed multi-agent settings with independent, shared, or centralized critics |
| Self-play | Adversarial training using self-play or league-based self-play |

RLlib's `RLModule` API supports arbitrary model architectures with PyTorch, complex multi-model setups, and multi-agent models with shared components between agents. It integrates with the Farama Foundation's Gymnasium, PettingZoo for multi-agent environments, and custom environment formats [12].

## Anyscale

Anyscale is the commercial company behind Ray, founded in 2019 by Ion Stoica, Robert Nishihara, and Philipp Moritz. The company is headquartered in San Francisco, California. Anyscale provides a managed cloud platform for running Ray workloads, offering features such as cluster management, autoscaling, monitoring, and enterprise support on top of the open-source framework [13].

### How much funding has Anyscale raised?

Anyscale has raised approximately $260 million across multiple funding rounds [14]:

| Round | Date | Amount | Lead Investors |
|-------|------|--------|----------------|
| Seed | 2019 | $20.6M | Not disclosed |
| Series B | 2020 | $40M | NEA |
| Series C | December 2021 | $100M | Andreessen Horowitz, Addition |
| Series C extension | August 2022 | $99M | Addition, Intel Capital |

The December 2021 Series C round valued Anyscale at $1 billion, making it a unicorn [14]. The August 2022 extension added a further $99 million from existing investors Addition, Intel Capital, and Foundation Capital, announced alongside the Ray 2.0 release at Ray Summit 2022 [4]. Investors across the rounds include Andreessen Horowitz, NEA, Addition, Intel Capital, and Foundation Capital.

### Leadership

Ion Stoica serves as Co-Founder and Executive Chairman. Stoica is also a UC Berkeley professor and co-creator of Apache Spark. Philipp Moritz serves as Co-Founder and CTO. Robert Nishihara, who served as CEO in the company's early years, transitioned to a product-focused role in 2024. Keerti Melkote, who previously founded Aruba Networks and led it through its 2007 IPO and 2015 acquisition by Hewlett Packard Enterprise, became CEO of Anyscale in 2024 [15].

### Anyscale Platform

The Anyscale Platform provides a managed runtime for Ray with additional enterprise features. At Ray Summit 2025, Anyscale announced new capabilities including lineage tracking, a new Anyscale Runtime API-compatible engine, Azure support, and a Global Resource Scheduler with multi-resource cloud support. The platform also integrates with NVIDIA NeMo for [LLM](/wiki/large_language_model) fine-tuning and customization [16].

## Use in LLM Training and Serving

Ray has become a central component in the infrastructure used to train and serve [large language models](/wiki/large_language_model). Its ability to coordinate distributed computation across thousands of GPUs makes it well suited for the massive scale of modern LLM workloads.

### Which companies use Ray?

**OpenAI** used Ray to coordinate the training of [ChatGPT](/wiki/chatgpt), including [GPT-3.5](/wiki/gpt-3.5) and [GPT-4](/wiki/gpt-4). Ray Train handled the orchestration of distributed training workers across large GPU clusters [17].

**[Cohere](/wiki/cohere)** uses Ray on top of TPU v4 pods for distributing and scheduling tasks across many TPU hosts as part of their LLM training pipeline, alongside [PyTorch](/wiki/pytorch), [JAX](/wiki/jax), and TPU-specific tooling [17].

**[vLLM](/wiki/vllm)**, the widely adopted open-source LLM inference engine, uses Ray as its distributed runtime for multi-node inference. While vLLM can use native Python multiprocessing for single-node deployments, multi-node deployments require Ray. In a vLLM Ray cluster, the head node coordinates the cluster and serves the API endpoint, while worker nodes provide additional GPU resources for [tensor parallelism](/wiki/model_parallelism) and pipeline parallelism. Ray handles all inter-node communication on behalf of vLLM [18].

Other companies using Ray at scale include Uber, Shopify, Spotify, Pinterest, Instacart, and Roblox [13].

## Ray on Kubernetes (KubeRay)

KubeRay is the official Kubernetes operator for deploying and managing Ray clusters on Kubernetes. It is maintained in the `ray-project/kuberay` GitHub repository and provides a toolkit for running Ray applications in containerized environments.

### Custom Resource Definitions

KubeRay introduces three Kubernetes Custom Resource Definitions (CRDs):

| CRD | Purpose | Key Features |
|-----|---------|---------------|
| RayCluster | Manages the lifecycle of a Ray cluster | Cluster creation/deletion, autoscaling, fault tolerance |
| RayJob | Submits a one-off job to a Ray cluster | Automatically creates a RayCluster, submits the job, and optionally deletes the cluster when the job finishes |
| RayService | Manages Ray Serve deployments | Zero-downtime upgrades, high availability, combines a RayCluster with a Serve deployment graph |

### Features

KubeRay supports heterogeneous compute nodes (including GPUs), running multiple Ray clusters with different Ray versions in the same Kubernetes cluster, and optional autoscaling that sizes Ray clusters according to workload demand. Starting from KubeRay v1.3.0, users can use the `kubectl ray` plugin to simplify common workflows. KubeRay v1.4.0 introduced a dashboard for viewing and managing KubeRay resources [19].

KubeRay integrates with the broader Kubernetes ecosystem, including observability tools (Prometheus, Grafana), queuing systems (Volcano, Apache YuniKorn, Kueue), and ingress controllers (Nginx). Major cloud providers offer managed KubeRay support: Google Cloud provides Ray on GKE, and [Databricks](/wiki/databricks) offers Ray on Databricks [19].

## How does Ray differ from Spark and Dask?

Ray operates in a space shared by several distributed computing frameworks, each with different design philosophies and strengths.

| Feature | Ray | Apache Spark | Dask |
|---------|-----|----|------|
| Primary focus | AI/ML workloads, general distributed Python | Large-scale data processing (ETL) | Scaling Python analytics (pandas, NumPy) |
| Task latency | Microsecond-level | Millisecond-level | Millisecond-level |
| Task throughput | Millions of tasks per second | Optimized for batch throughput | Moderate |
| Programming model | Tasks + Actors (stateful and stateless) | Functional transformations on RDDs/DataFrames | Task graphs on collections |
| Language | Python-first (also C++, Java) | Scala, Java, Python, R | Python |
| GPU support | Native, first-class GPU scheduling | Limited (via third-party libraries) | Basic GPU support |
| Fault tolerance | Lineage-based reconstruction via GCS | Lineage-based reconstruction via RDD lineage | Task graph recomputation |
| Ecosystem | Train, Tune, Serve, Data, RLlib | MLlib, Spark SQL, Structured Streaming | Dask ML, Dask DataFrame, Dask Array |
| Maturity | First released 2017 | First released 2010 | First released 2014 |
| Best for | Distributed ML training, RL, model serving, LLM inference | Production ETL pipelines, large-scale batch processing | Scaling pandas/NumPy workflows beyond single-machine memory |

Notably, these frameworks are not mutually exclusive. It is possible to run Spark on Ray (via the RayDP library), Ray on Spark (via Databricks' integration), and [Dask](/wiki/dask) on Ray (via the Dask-on-Ray scheduler). This interoperability allows organizations to adopt Ray incrementally alongside existing data infrastructure [20].

### When to Choose Ray

Ray is the strongest choice when workloads involve fine-grained task parallelism, stateful computations (actors), GPU-heavy ML training or inference, reinforcement learning, or real-time model serving. Its microsecond-level task scheduling overhead and shared-memory object store make it particularly efficient for workloads that require rapid communication between distributed components. For pure ETL or SQL-heavy analytics on petabyte-scale data, Spark remains the more proven option. For scaling pandas-style analytics on medium-sized data, Dask offers a lower barrier to entry [20].

## Community and Adoption

As of early 2026, the Ray project (hosted at `github.com/ray-project/ray`) has over 41,800 GitHub stars, 7,400 forks, and more than 1,400 contributors. The codebase is primarily written in Python (76%) and C++ (18%), with additional components in Java, TypeScript, Starlark, and Cython [21].

Ray's community gathers at the annual Ray Summit conference organized by Anyscale, with Ray Summit 2025 held November 3 to 5 in San Francisco [2]. The community also maintains a discussion board and a Slack workspace for developers and users.

Ray's adoption spans multiple sectors:

| Sector | Example Users | Use Cases |
|--------|---------------|----------|
| AI research labs | [OpenAI](/wiki/openai), Cohere | LLM training orchestration |
| E-commerce | Shopify, Instacart | ML pipelines, recommendation systems |
| Media and entertainment | Spotify, Roblox | Content recommendation, game AI |
| Ride-sharing and delivery | Uber | Large-scale ML training and serving |
| Social media | Pinterest | Visual search, recommendation models |
| Open-source AI | vLLM | Distributed LLM inference |

The framework's position within the PyTorch Foundation, alongside PyTorch and vLLM, solidifies its role as a foundational layer in the open-source AI compute stack [2].

## Is Ray open source and how do I install it?

Ray is free and open source under the Apache License 2.0, with development hosted on GitHub and governed under the PyTorch Foundation. It can be installed via pip:

```bash
pip install ray
```

To install specific library components:

```bash
pip install "ray[train]"   # For Ray Train
pip install "ray[tune]"    # For Ray Tune
pip install "ray[serve]"   # For Ray Serve
pip install "ray[data]"    # For Ray Data
pip install "ray[rllib]"   # For RLlib
pip install "ray[all]"     # All libraries
```

Ray supports Python 3.10 and later (Python 3.9 support was dropped after Ray 2.51.x). It runs on Linux, macOS, and Windows. Clusters can be deployed on bare metal, cloud VMs (AWS, GCP, Azure), or Kubernetes via KubeRay [5].

## References

1. Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., Jordan, M. I., & Stoica, I. (2018). "Ray: A Distributed Framework for Emerging AI Applications." 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18), pp. 561-577. https://www.usenix.org/conference/osdi18/presentation/moritz
2. PyTorch Foundation. (2025). "PyTorch Foundation Welcomes Ray to Deliver a Unified Open Source AI Compute Stack." https://pytorch.org/blog/pytorch-foundation-welcomes-ray-to-deliver-a-unified-open-source-ai-compute-stack/
3. Ray Project. (2017). "Ray: 0.2 Release." https://ray-project.github.io/2017/09/30/ray-0.2-release.html
4. Anyscale. (2022). "Anyscale Unveils Ray 2.0 and Anyscale Innovations at Ray Summit 2022; Adds an Additional $99M Funding from Existing Investors Addition, Intel Capital, and Foundation Capital." https://www.anyscale.com/blog/anyscale-unveils-ray-2-0-and-anyscale-innovations-at-ray-summit-2022-adds-an
5. Ray Project. (2026). "Welcome to Ray! Ray Documentation." https://docs.ray.io/en/latest/index.html
6. Ray Project. (2026). "Ray Clusters Overview." https://docs.ray.io/en/latest/cluster/kubernetes/index.html
7. Ray Project. (2026). "Key Concepts: Ray Core." https://docs.ray.io/en/latest/ray-core/key-concepts.html
8. Ray Project. (2026). "Ray Train: Scalable Model Training." https://docs.ray.io/en/latest/train/train.html
9. Ray Project. (2026). "Ray Tune: Hyperparameter Tuning." https://docs.ray.io/en/latest/tune/index.html
10. Ray Project. (2026). "Ray Serve: Scalable and Programmable Serving." https://docs.ray.io/en/latest/serve/index.html
11. Ray Project. (2026). "Ray Data: Scalable Data Processing for AI Workloads." https://docs.ray.io/en/latest/data/data.html
12. Ray Project. (2026). "RLlib: Industry-Grade, Scalable Reinforcement Learning." https://docs.ray.io/en/latest/rllib/index.html
13. Anyscale. (2026). "About Us." https://www.anyscale.com/about-us
14. Intel Capital. (2021). "Anyscale Secures $100M Series C at $1B Valuation." https://www.intelcapital.com/anyscale-secures-100m-series-c-at-1b-valuation-to-radically-simplify-scaling-and-productionizing-ai-applications/
15. Anyscale. (2024). "Anyscale Names Industry Veteran Keerti Melkote Chief Executive Officer." https://www.anyscale.com/press/anyscale-names-industry-veteran-keerti-melkote-chief-executive-officer
16. Anyscale. (2025). "Ray Summit 2025: Anyscale Platform Updates." https://www.anyscale.com/blog/ray-summit-2025-anyscale-product-updates
17. The New Stack. (2023). "How Ray, a Distributed AI Framework, Helps Power ChatGPT." https://thenewstack.io/how-ray-a-distributed-ai-framework-helps-power-chatgpt/
18. vLLM. (2025). "Distributed Inference and Serving." https://docs.vllm.ai/en/v0.8.0/serving/distributed_serving.html
19. KubeRay. (2026). "Welcome to KubeRay." https://ray-project.github.io/kuberay/
20. Onehouse. (2025). "Ray vs Dask vs Apache Spark: Comparing Data Science & Machine Learning Engines." https://www.onehouse.ai/blog/apache-spark-vs-ray-vs-dask-comparing-data-science-machine-learning-engines
21. GitHub. (2026). "ray-project/ray." https://github.com/ray-project/ray