| Anyscale, Inc. | |
|---|---|
| Type | Private company |
| Industry | Artificial intelligence, cloud computing |
| Founded | 2019 |
| Founders | Robert Nishihara, Philipp Moritz, Ion Stoica, Michael I. Jordan |
| Headquarters | Berkeley, California, U.S. |
| CEO | Keerti Melkote |
| Key product | Anyscale Platform |
| Total funding | ~$281 million |
| Valuation | $1 billion (2021) |
| Website | anyscale.com |
Anyscale is an American technology company that develops the Anyscale Platform, a fully managed compute platform built on top of Ray, an open-source distributed computing framework for scaling artificial intelligence and Python applications. Founded in 2019 by the creators of Ray at the University of California, Berkeley, Anyscale provides infrastructure for distributed machine learning training, model serving, batch inference, and fine-tuning of large language models. The company's technology is used by organizations including OpenAI, Uber, Shopify, Spotify, Instacart, Netflix, and ByteDance.
Anyscale's roots trace back to 2016, when Robert Nishihara and Philipp Moritz, then PhD students at the University of California, Berkeley's RISELab (Real-time Intelligent Secure Execution Laboratory), began developing Ray as a class project. The initial goal was to build a framework for distributed training of machine learning models, particularly for reinforcement learning applications that required both task-parallel and actor-based computations.
Nishihara and Moritz worked under the guidance of UC Berkeley computer science professors Ion Stoica and Michael I. Jordan. Stoica, who also co-founded Databricks and co-created Apache Spark, brought deep expertise in distributed systems and cloud computing. Jordan, the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics, contributed foundational knowledge in machine learning and optimization theory.
The Ray framework was formally introduced in the 2017 paper "Ray: A Distributed Framework for Emerging AI Applications" (arXiv:1712.05889), and the work was presented at the USENIX Symposium on Operating Systems Design and Implementation (OSDI) in 2018. The paper described Ray's unified interface for expressing both task-parallel and actor-based computations, supported by a distributed scheduler and a distributed, fault-tolerant store for managing system control state.
Anyscale was officially founded in 2019 and headquartered in Berkeley, California. The co-founders were Robert Nishihara (CEO at launch), Philipp Moritz (CTO), Ion Stoica, and Michael I. Jordan. The company's mission was to "make scalable computing effortless" by building a commercially supported platform around the open-source Ray framework.
In December 2019, Anyscale launched publicly with a $20.6 million Series A funding round led by Andreessen Horowitz (a16z). Additional participants included NEA (New Enterprise Associates), Intel Capital, Ant Financial, Amplify Partners, 11.2 Capital, and The House Fund. The funding was announced alongside the company's goal of democratizing distributed programming for developers who lacked specialized expertise in distributed systems.
Less than a year after its Series A, Anyscale raised $40 million in a Series B round in October 2020, led by existing investor NEA with participation from Andreessen Horowitz, Intel Capital, and Foundation Capital. This round coincided with the launch of the Anyscale managed platform, which offered a hosted service for running Ray workloads without manual cluster management.
In December 2021, Anyscale closed the first tranche of its Series C funding, raising $100 million in a round led by NEA. This pushed the company to a $1 billion valuation, making it a unicorn. The investors in this round included Addition, Andreessen Horowitz, Intel Capital, and Foundation Capital.
In August 2022, Anyscale raised an additional $99 million in an extension of its Series C round, co-led by existing investors Addition, Intel Capital, and Foundation Capital. This brought total Series C funding to $199 million and total funding to approximately $281 million. The announcement coincided with the release of Ray 2.0 at the Ray Summit 2022 conference.
In July 2024, Anyscale appointed Keerti Melkote as its new CEO, succeeding Robert Nishihara. Melkote brought substantial enterprise experience to the role: he founded Aruba Networks in 2001, led the company through its IPO in 2007, and oversaw its acquisition by Hewlett Packard Enterprise (HPE) in 2015 for approximately $3 billion. As President of HPE's Intelligent Edge division, he grew the Aruba business to over $5 billion in revenue. Melkote's appointment followed a year of 4x revenue growth at Anyscale and reflected the company's pivot toward broader enterprise sales and adoption.
Ion Stoica transitioned to the role of Executive Chairman, while Robert Nishihara and Philipp Moritz continued in co-founder capacities.
Ray is an open-source distributed computing framework written in Python and C++. It provides a simple, universal API for building distributed applications that can scale from a single laptop to a large cluster. As of early 2026, the Ray repository on GitHub has over 40,000 stars and more than 1,000 contributors, and Ray is used to launch more than one million clusters per month.
Ray's architecture is organized as a three-layer stack:
| Layer | Description |
|---|---|
| Ray AI Libraries | Domain-specific libraries for common ML tasks (training, tuning, serving, data processing, reinforcement learning) |
| Ray Core | General-purpose distributed computing foundation providing tasks, actors, and objects as primitives |
| Ray Clusters | Infrastructure layer consisting of a head node and worker nodes, with optional autoscaling |
A Ray cluster consists of a head node responsible for cluster management (including the Global Control Store, or GCS, for metadata) and a set of worker nodes that execute tasks. The system supports dynamic scheduling and automatic resource management across nodes.
Ray Core provides three fundamental building blocks:
| Primitive | Type | Description |
|---|---|---|
| Tasks | Stateless | Remote functions executed in the cluster; declared with the @ray.remote decorator and invoked with .remote() |
| Actors | Stateful | Long-running worker processes that maintain mutable state; instantiated from classes decorated with @ray.remote |
| Objects | Immutable | Values stored in Ray's distributed object store, referenced through object refs for inter-task communication |
Developers convert standard Python functions into distributed tasks by adding the @ray.remote decorator. For stateful computations, Ray actors provide persistent worker processes whose methods can access and modify internal state. Objects serve as shared, immutable data that can be passed between tasks and actors across the cluster.
Ray includes a suite of specialized libraries for machine learning workloads:
| Library | Purpose | Key features |
|---|---|---|
| Ray Data | Scalable data loading and transformation | Framework-agnostic preprocessing for training, tuning, and prediction pipelines |
| Ray Train | Distributed model training | Multi-node and multi-GPU training with fault tolerance; integrates with PyTorch, TensorFlow, and other frameworks |
| Ray Tune | Hyperparameter tuning | Scalable hyperparameter search with support for various optimization algorithms |
| Ray Serve | Model serving and inference | Scalable, programmable serving for real-time inference with low latency; integrates with vLLM |
| Ray RLlib | Reinforcement learning | Distributed RL training with support for a wide range of algorithms and environments |
These libraries are designed to work together within a single Ray application, allowing users to build end-to-end ML pipelines that cover data processing, training, tuning, and serving in one unified system.
Ray 2.0, released in August 2022, introduced several major improvements. The Ray AI Runtime (AIR) provided a unified toolkit for combining Ray's libraries into cohesive ML workflows. Ray 2.0 also added native support for shuffling 100 terabytes or more of data through Ray Datasets, improved Kubernetes support through the KubeRay operator, and enhanced the monitoring dashboard.
Subsequent releases continued to expand the framework's capabilities. Ray 2.3 (February 2023) added Python 3.11 support and Linux ARM64 builds. Ray 2.4 introduced dedicated infrastructure for LLM training, tuning, inference, and serving. Ray 2.7 added RayLLM for simplified large language model deployment and introduced major stability improvements to KubeRay. As of early 2026, the latest stable release is Ray 2.54.
Ray can be deployed across a variety of environments:
| Environment | Details |
|---|---|
| Local machine | Single-machine mode for development and testing |
| Cloud providers | AWS, Google Cloud, Microsoft Azure |
| Kubernetes | Via KubeRay on EKS, GKE, AKS, CoreWeave CKS, OKE, or on-premise clusters |
| Cluster managers | YARN, Slurm |
| Managed platforms | Anyscale Platform, Google Cloud Vertex AI, Databricks |
The Anyscale Platform is the company's commercial product: a fully managed service for running Ray workloads at scale. It abstracts away the complexity of cluster provisioning, scaling, monitoring, and failure recovery, allowing data scientists and ML engineers to focus on their application code rather than infrastructure.
**Managed infrastructure.** Anyscale handles all aspects of Ray cluster management, including provisioning, configuration, monitoring, and teardown. Users can migrate existing Ray workloads to Anyscale with no code changes.

**Serverless autoscaling.** The platform automatically scales clusters up or down based on workload demands, optimizing resource utilization and cost. Idle resources are released when not in use.

**Production jobs and services.** Anyscale supports both scheduled batch jobs and low-latency production services. The system automatically creates and manages clusters for job execution and tears them down upon completion.

**Cost tracking.** A unified dashboard provides visibility into compute expenses organized by jobs, clusters, and individual users.

**Observability.** Built-in Grafana dashboards enable monitoring of cluster health, job progress, and resource utilization. Users can also integrate their own observability tools.

**APIs and SDKs.** Anyscale provides programmatic interfaces for automating job submission, cluster management, and integration with CI/CD pipelines.
Anyscale offers two primary deployment models:
| Model | Description |
|---|---|
| Hosted (fully managed) | Anyscale manages all infrastructure on its own cloud; the simplest way to get started |
| Bring Your Own Cloud (BYOC) | Anyscale runs inside the customer's own AWS, GCP, or Azure account, providing greater control over data residency and security; customers can also leverage existing GPU reservations with their cloud provider |
The BYOC model involves a platform fee on top of the customer's existing cloud provider costs. Billing for BYOC deployments is typically handled through cloud marketplaces. Anyscale also supports Kubernetes-based deployments on Amazon EKS, Google GKE, Azure AKS, CoreWeave CKS, and Oracle OKE.
Anyscale uses a pay-as-you-go pricing model with no monthly fixed fees. Users pay per-hour charges for compute instances, with rates varying based on hardware type (especially GPU model and specifications), cloud provider, and region. A $100 credit is provided for new users to get started. Enterprise customers can negotiate committed contracts with volume discounts. Anyscale is also available through the Microsoft Azure Marketplace.
Anyscale has positioned itself as a platform for building, fine-tuning, and serving large language models. The platform provides integrated tools for the full LLM lifecycle.
Anyscale supports both full fine-tuning and parameter-efficient methods for adapting pre-trained models to specific tasks:
| Method | Description |
|---|---|
| Full fine-tuning | Updates all model parameters; requires more compute but can achieve the highest quality |
| LoRA | Low-Rank Adaptation; trains small adapter layers while freezing the base model |
| QLoRA | Quantized LoRA; combines 4-bit quantization with LoRA for reduced memory usage |
| Freeze-tuning | Freezes most layers and trains only a subset of parameters |
Distributed training is supported through strategies such as FSDP (Fully Sharded Data Parallel), DeepSpeed, and Megatron for efficient scaling across multiple GPUs. Monitoring integrations with Weights & Biases, MLflow, and TensorBoard allow users to track training metrics and debug performance issues.
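The appeal of LoRA is visible in simple parameter arithmetic: a frozen weight matrix W of shape d×k is adapted by two small trainable factors, so the effective weight is W + (α/r)·BA. The sketch below (plain NumPy, not Anyscale-specific) shows the adapter math and the parameter savings:

```python
import numpy as np

# LoRA in miniature: W stays frozen; only the low-rank factors A and B train.
d, k, r, alpha = 1024, 1024, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pre-trained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

def effective_weight(W, A, B, alpha, r):
    return W + (alpha / r) * B @ A

# With B initialized to zero, the adapted layer starts identical to the base model.
assert np.allclose(effective_weight(W, A, B, alpha, r), W)

full_params = d * k          # parameters updated by full fine-tuning
lora_params = r * (d + k)    # parameters updated by LoRA
print(f"full fine-tuning: {full_params:,} params")  # 1,048,576
print(f"LoRA (r={r}):     {lora_params:,} params")  # 16,384
```

At rank 8 the trainable parameter count drops by roughly 64x for this layer; QLoRA applies the same idea while also holding the frozen base weights in 4-bit precision.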
Anyscale has published case studies showing that users can fine-tune a 6-billion-parameter LLM for less than $7 using their platform.
Anyscale combines Ray Serve for orchestration with vLLM for high-performance inference. This integration provides features such as continuous batching, PagedAttention for efficient memory management, streaming output, and multi-model serving from a single cluster. The serving layer is compatible with the OpenAI API format, allowing developers to switch from proprietary APIs with minimal code changes.
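OpenAI API compatibility means a client keeps the same request shape and swaps only the base URL and model name. The sketch below builds such a request body with the standard library only; the URL is a hypothetical placeholder, and no network call is made:

```python
import json

# An OpenAI-format chat completion request. Against an OpenAI-compatible
# server (e.g. one backed by Ray Serve + vLLM), only the base URL and the
# model identifier change; the body shape stays the same.
base_url = "https://my-llm-service.example.com/v1"  # hypothetical endpoint
payload = {
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Ray in one sentence."},
    ],
    "temperature": 0.7,
    "stream": False,
}
body = json.dumps(payload)
print(f"POST {base_url}/chat/completions ({len(body)} bytes)")
```

Because the wire format is identical, code written against the official OpenAI client typically needs only its `base_url` (and API key) changed to target a self-hosted model.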
Anyscale Endpoints is a serverless API service for accessing open-source large language models. Launched in 2023, the service offered hosted access to models from the Llama, Mistral, and CodeLlama families, among others. Anyscale marketed the service as up to 10x more cost-effective than comparable proprietary solutions, with pricing at $1 per million tokens for the 70-billion-parameter Llama 2 model.
The service supported the OpenAI API format, enabling straightforward migration from OpenAI's API. Anyscale also offered fine-tuning through the Endpoints platform, allowing users to customize open-source models with their own data.
As of August 2024, Anyscale Endpoints was integrated into the Anyscale Platform as a unified offering rather than a standalone service.
Anyscale supports large-scale batch inference through Ray Data, which enables offline processing of large datasets with LLMs. The platform can distribute inference workloads across multiple GPUs and nodes, handling tasks such as text classification, summarization, entity extraction, and embedding generation at scale.
Ray and the Anyscale Platform are used by a wide range of technology companies and enterprises. Some of the most notable users include:
| Company | Use case |
|---|---|
| OpenAI | Used Ray to train its largest models, including those powering ChatGPT; Ray distributes training work across thousands of CPUs and GPUs |
| Uber | ML platform infrastructure for scaling machine learning workloads |
| Shopify | Product classification and attribute tagging using hundreds of ML models |
| Spotify | ML infrastructure scaling |
| Instacart | Demand forecasting and ML training acceleration (reported 10x speedup over previous tools) |
| Netflix | ML and data processing workloads |
| ByteDance | Large-scale ML infrastructure |
| Roblox | ML workload management |
| Coinbase | AI workloads including training, inference, and data processing |
| Canva | AI-powered features |
| Cruise | Autonomous vehicle ML pipelines |
| Lyft | ML platform infrastructure |
| Amazon | AI development workloads |
OpenAI's use of Ray received particular attention when it became publicly known that the framework powered much of the training infrastructure behind ChatGPT and other OpenAI models. At Ray Summit 2022, OpenAI engineers discussed how they used Ray to coordinate thousands of hardware components during model training, managing data distribution, failure recovery, and resource allocation across the cluster.
Ray Summit is Anyscale's annual conference, bringing together AI engineers, researchers, and open-source contributors. The conference features keynotes, technical talks, workshops, and community networking.
| Year | Location | Highlights |
|---|---|---|
| 2021 | Virtual | Early community gathering |
| 2022 | San Francisco | Ray 2.0 announcement, $99M funding, OpenAI keynote |
| 2023 | San Francisco | Growth of LLM use cases on Ray |
| 2024 | San Francisco (Sep 30 - Oct 2) | Community growth to 1,000+ contributors, 1M+ clusters per month |
| 2025 | San Francisco (Nov 3-5) | Focus on evolution of Ray ecosystem, open-source AI stack, new Anyscale platform capabilities; keynotes from Meta, NVIDIA, UC Berkeley, Azure |
Anyscale operates in the AI infrastructure and MLOps market, competing with several categories of platforms:
| Competitor | Differentiator compared to Anyscale |
|---|---|
| Databricks | End-to-end data and AI platform built around a governed lakehouse; integrates Spark, MLflow, and data governance; anchors ML in a unified data platform |
| AWS SageMaker | Amazon's managed ML platform deeply integrated with AWS services (IAM, VPC, S3); covers training, deployment, pipelines, and monitoring |
| Modal | Serverless Python platform for ML and data workloads; emphasizes simplicity with automatic packaging, scaling, and pay-per-second billing |
| Google Cloud Vertex AI | Google's managed ML platform with built-in Ray support; integrates with GCP services and TPU infrastructure |
| Northflank | Cloud deployment platform with predictable pricing and container-based workload management |
Anyscale differentiates itself through its tight integration with the open-source Ray ecosystem, its focus on distributed compute as a core strength rather than an add-on, and its cloud-agnostic approach that avoids lock-in to a single cloud provider.
Anyscale's founding team has made significant contributions to systems research and the broader AI ecosystem:
Ion Stoica is a Romanian-American computer scientist, a professor in the EECS Department at UC Berkeley, and the Director of SkyLab. He co-founded both Databricks (with the creators of Apache Spark) and Conviva. He is an ACM Fellow, a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences. In 2022, Forbes ranked him as one of the wealthiest Romanians with a net worth of $1.6 billion.
Michael I. Jordan is the Pehong Chen Distinguished Professor at UC Berkeley in both the EECS and Statistics departments. He is widely regarded as one of the most influential figures in machine learning and artificial intelligence. He is a member of the National Academy of Sciences, the National Academy of Engineering, the American Academy of Arts and Sciences, and a Foreign Member of the Royal Society.
Robert Nishihara completed his PhD at UC Berkeley's RISELab, focusing on machine learning, optimization, and AI. He served as Anyscale's initial CEO and continues as co-founder.
Philipp Moritz completed his PhD in the EECS Department at UC Berkeley, with research interests spanning artificial intelligence, machine learning, and distributed systems. He serves as Anyscale's CTO.
Ray is released under the Apache License 2.0 and maintains an active open-source community. The project is hosted on GitHub at ray-project/ray. As of 2024, the project has accumulated over 40,000 stars and more than 1,000 contributors.
The Ray ecosystem includes several related open-source projects and integrations:
| Project | Description |
|---|---|
| KubeRay | Kubernetes operator for deploying and managing Ray clusters on K8s |
| vLLM | High-performance LLM inference engine that integrates with Ray Serve for distributed serving |
| Ray on Vertex AI | Google Cloud's managed Ray integration within the Vertex AI platform |
| Ray on Databricks | Integration for running Ray workloads on the Databricks platform |
| Ray Workflows | Durable execution engine for long-running, fault-tolerant workflows |
Ray's integration with vLLM is particularly significant for LLM deployment. While standalone vLLM handles single-node inference efficiently, Ray Serve adds multi-node scaling, autoscaling, fault tolerance, and distributed observability on top of vLLM, making it suitable for production-grade LLM serving.