RAPIDS
Last reviewed
Jun 3, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 · 1,905 words
RAPIDS is an open-source suite of GPU-accelerated software libraries for data science, analytics, and machine learning, developed and maintained by Nvidia. It lets data scientists run common end-to-end workflows (loading and wrangling tabular data, training classical machine-learning models, analyzing graphs, and visualizing results) entirely on graphics processing units (GPUs) rather than on the central processing unit (CPU), often delivering order-of-magnitude speedups on large datasets. RAPIDS is designed so that its libraries mirror the application programming interfaces (APIs) of the most widely used Python data tools, such as pandas, scikit-learn, and NetworkX, allowing practitioners to keep their existing code and habits while moving the computation to the GPU. It is built on CUDA, Nvidia's parallel-computing platform, and uses the Apache Arrow columnar memory format as a foundation for efficient in-memory data representation. [1][2]
Nvidia announced RAPIDS on October 10, 2018, at the GPU Technology Conference (GTC) Europe in Munich, where chief executive Jensen Huang introduced it as a platform to bring GPU acceleration to the parts of high-performance computing that had so far run only on CPUs. In the announcement Huang argued that "data analytics and machine learning are the largest segments of the high performance computing market that have not been accelerated, until now." The software was developed over roughly the preceding two years by Nvidia engineers working with open-source contributors, and it was released under the Apache 2.0 license with the code immediately available on the project site at rapids.ai. [1][2]
From the outset Nvidia positioned RAPIDS as an open-source project rather than a proprietary product, building on and contributing back to established projects including Apache Arrow, pandas, and scikit-learn. The launch was accompanied by a broad list of ecosystem and hardware partners, among them Hewlett Packard Enterprise, IBM, Oracle, Databricks, Anaconda, Cisco, Dell EMC, Lenovo, NetApp, Pure Storage, SAP, and SAS, signaling that the suite was intended to plug into existing enterprise data-science and infrastructure stacks. An early headline benchmark, training the XGBoost gradient-boosting algorithm on an Nvidia DGX-2 system, showed roughly a 50-times speedup over CPU-only systems, cutting training times from days to hours, or from hours to minutes, depending on dataset size. [1][3]
Since launch, RAPIDS has followed a frequent, calendar-versioned release cadence, with releases numbered by year and month (for example, 24.02 or 25.02), steadily adding libraries, broadening hardware support, and introducing the zero-code-change accelerators described below. [4]
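To make the year.month numbering concrete, here is a minimal sketch that parses release strings such as "24.02" into a (year, month) tuple so they sort chronologically. The helper name `parse_calendar_version` is hypothetical and not part of any RAPIDS package; it assumes two-digit years in the 2000s and an optional third patch component (for example "24.02.01").

```python
from typing import NamedTuple


class CalendarVersion(NamedTuple):
    """A release identified by year, month, and an optional patch number."""
    year: int
    month: int
    patch: int = 0


def parse_calendar_version(text: str) -> CalendarVersion:
    """Parse 'YY.MM' or 'YY.MM.PP' into a CalendarVersion (format assumed, see lead-in)."""
    parts = text.strip().split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected YY.MM or YY.MM.PP, got {text!r}")
    year, month = int(parts[0]), int(parts[1])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    patch = int(parts[2]) if len(parts) == 3 else 0
    # Assumption: two-digit years refer to the 2000s, so "24" means 2024.
    return CalendarVersion(2000 + year, month, patch)


releases = ["25.02", "24.10", "24.02", "24.02.01"]
print(sorted(releases, key=parse_calendar_version))
# ['24.02', '24.02.01', '24.10', '25.02']
```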
RAPIDS is not a single library but a collection of interoperable components, most of them prefixed "cu" to denote their CUDA foundation. The core design principle is API compatibility: each GPU library is meant to behave like a familiar CPU counterpart from the PyData ecosystem, so that adopting RAPIDS does not require learning an entirely new programming model. The following table maps the principal RAPIDS libraries to their CPU-side equivalents.
| RAPIDS library | Purpose | CPU / PyData equivalent |
|---|---|---|
| cuDF | GPU DataFrames for loading, joining, filtering, and aggregating tabular data | pandas (and Polars, Apache Spark) |
| cuML | Classical machine-learning algorithms (regression, clustering, trees, dimensionality reduction) | scikit-learn |
| cuGraph | Graph analytics on millions of nodes and edges | NetworkX |
| cuSpatial | Geospatial and vector GIS operations (point-in-polygon, spatial joins, distances, trajectories) | GeoPandas / shapely |
| cuxfilter | Cross-filtered interactive dashboards over large tabular datasets | Datashader / Bokeh dashboards |
| cuCIM | I/O and image-processing primitives for n-dimensional images, with a biomedical-imaging focus | scikit-image |
| cuVS | Vector search and clustering, including the CAGRA index | FAISS / vector-search libraries |
| RMM | The RAPIDS Memory Manager, for fast device-memory allocation and pooling | (CUDA memory allocators) |
| RAFT | Reusable algorithmic primitives shared across the other libraries | (internal building blocks) |
cuDF is the cornerstone of the suite: a GPU DataFrame library that provides a pandas-like interface for reading data onto the GPU and performing operations such as filters, joins, group-bys, and column arithmetic, with its on-GPU data laid out in the Apache Arrow columnar format. cuML implements machine-learning algorithms, including linear and logistic regression, random forests, k-means clustering, k-nearest neighbors, principal component analysis (PCA), UMAP, and HDBSCAN, behind an API that matches scikit-learn. cuGraph brings graph analytics to the GPU with algorithms for centrality, community detection, traversal, and link analysis, and integrates with NetworkX and with graph neural network frameworks. cuSpatial accelerates geospatial workloads, while cuxfilter connects GPU DataFrames to charting libraries to build interactive dashboards over datasets of 100 million rows or more, and cuCIM targets image and signal processing, notably in biomedical imaging. Supporting these are lower-level components such as RMM for memory management, RAFT for shared algorithmic primitives, KvikIO for high-throughput file input/output using GPUDirect Storage, and cuVS for vector search. An older signal-processing library, cuSignal, has since been folded into the CuPy project. [2][4][5]
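RMM's pooling idea can be illustrated without a GPU: reserve one large region up front, satisfy each request by carving a piece out of it, and return freed pieces to a free list for reuse instead of handing them back to the driver, which avoids paying for a device allocation call on every request. The sketch below simulates this with integer offsets; the class name `ArenaPool`, the first-fit policy, and the 256-byte alignment are illustrative assumptions, not RMM's actual algorithm or API.

```python
class ArenaPool:
    """Toy pool allocator that hands out offsets inside one pre-reserved region."""

    def __init__(self, capacity: int, alignment: int = 256):
        self.alignment = alignment
        self.free_blocks = [(0, capacity)]  # sorted list of (offset, size)
        self.live = {}  # offset -> size of each outstanding allocation

    def _round_up(self, n: int) -> int:
        return (n + self.alignment - 1) // self.alignment * self.alignment

    def allocate(self, nbytes: int) -> int:
        size = self._round_up(max(nbytes, 1))
        for i, (offset, block_size) in enumerate(self.free_blocks):
            if block_size >= size:  # first free block that is large enough
                if block_size == size:
                    del self.free_blocks[i]
                else:
                    self.free_blocks[i] = (offset + size, block_size - size)
                self.live[offset] = size
                return offset
        raise MemoryError(f"pool exhausted: cannot fit {size} bytes")

    def deallocate(self, offset: int) -> None:
        size = self.live.pop(offset)
        self.free_blocks.append((offset, size))
        self.free_blocks.sort()
        # Coalesce neighboring free blocks so large requests can still fit later.
        merged = [self.free_blocks[0]]
        for off, sz in self.free_blocks[1:]:
            last_off, last_sz = merged[-1]
            if last_off + last_sz == off:
                merged[-1] = (last_off, last_sz + sz)
            else:
                merged.append((off, sz))
        self.free_blocks = merged


pool = ArenaPool(capacity=4096)
a = pool.allocate(1000)  # rounded up to 1024 bytes, placed at offset 0
b = pool.allocate(100)   # rounded up to 256 bytes, placed at offset 1024
pool.deallocate(a)       # offset 0 goes back on the free list, not to the "driver"
c = pool.allocate(512)   # first fit reuses the freed region at offset 0
print(a, b, c)           # 0 1024 0
```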
Two technologies underpin RAPIDS. The first is CUDA, the parallel-computing platform and programming model that exposes the thousands of cores on an Nvidia GPU; every RAPIDS library ultimately compiles its operations down to CUDA kernels, and the suite is frequently described as part of Nvidia's broader "CUDA-X" family of domain libraries. The second is the Apache Arrow columnar memory format, a language-independent standard for representing structured data in memory as contiguous columns. By adopting Arrow, RAPIDS gives its DataFrames a layout that is both efficient for the data-parallel access patterns GPUs excel at and interoperable with other Arrow-based tools, reducing the cost of moving data between libraries. Together, CUDA provides the compute and Arrow provides the common data representation that lets cuDF, cuML, cuGraph, and the rest share data on the GPU without expensive format conversions. [2][6]
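A pure-Python sketch can show what "columnar" means in practice. It turns row records into per-column buffers the way the Arrow format lays them out: a validity bitmap with one bit per value (least-significant bit first), a flat value array for fixed-width columns, and, for strings, one contiguous byte buffer indexed by n + 1 offsets. Because each column is just a few contiguous buffers, many threads can read neighboring elements in parallel. The function names are hypothetical and this is not pyarrow or cuDF code; real Arrow buffers also carry typed storage and alignment padding that the sketch omits.

```python
def to_columns(rows, fields):
    """Convert a list of row dicts into Arrow-style column buffers (illustration only)."""
    n = len(rows)
    columns = {}
    for name in fields:
        values = [row.get(name) for row in rows]
        # Validity bitmap: bit i is 1 when value i is present (least-significant-bit order).
        validity = bytearray((n + 7) // 8)
        for i, v in enumerate(values):
            if v is not None:
                validity[i // 8] |= 1 << (i % 8)
        if all(v is None or isinstance(v, str) for v in values):
            # Variable-length strings: one contiguous data buffer plus n + 1 offsets.
            offsets, data = [0], bytearray()
            for v in values:
                if v is not None:
                    data += v.encode("utf-8")
                offsets.append(len(data))
            columns[name] = {"validity": bytes(validity), "offsets": offsets, "data": bytes(data)}
        else:
            # Fixed-width values: a null still occupies a slot (here 0), marked invalid in the bitmap.
            columns[name] = {"validity": bytes(validity),
                             "values": [0 if v is None else v for v in values]}
    return columns


def string_at(column, i):
    """Recover element i of a string column, or None if its validity bit is 0."""
    if not ((column["validity"][i // 8] >> (i % 8)) & 1):
        return None
    start, end = column["offsets"][i], column["offsets"][i + 1]
    return column["data"][start:end].decode("utf-8")


rows = [{"city": "Oslo", "temp": 4}, {"city": "Lima", "temp": None}, {"city": None, "temp": 19}]
cols = to_columns(rows, ["city", "temp"])
print(cols["city"]["offsets"], cols["city"]["data"])      # [0, 4, 8, 8] b'OsloLima'
print(cols["temp"]["validity"], cols["temp"]["values"])   # b'\x05' [4, 0, 19]
print([string_at(cols["city"], i) for i in range(3)])     # ['Oslo', 'Lima', None]
```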
A defining feature of modern RAPIDS is a set of "zero-code-change" accelerators that let existing CPU-based scripts run on the GPU without rewriting them. The flagship is cudf.pandas, a pandas accelerator mode that reached general availability in RAPIDS version 24.02, announced at GTC in March 2024. Once it is enabled, pandas operations execute on the GPU through cuDF when possible and fall back transparently to ordinary pandas on the CPU when an operation is unsupported, synchronizing data between the two as needed so that results are identical to running pandas alone. It is turned on either with the Jupyter notebook magic `%load_ext cudf.pandas` or by running a script with `python -m cudf.pandas`, after which unmodified code that uses `import pandas` is accelerated. On a five-gigabyte workload from DuckDB's database-like operations benchmark, Nvidia reported cudf.pandas running roughly 150 times faster than pandas v2.2. [7][8]
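The accelerate-or-fall-back behavior can be sketched in plain Python: try the fast implementation first and, if it signals that an input is unsupported, rerun the same call with the reference implementation, so the caller always receives the same result. Everything here (`FastPathUnsupported`, `with_fallback`, and the integer-only "fast" kernel) is hypothetical; the real accelerator also has to keep data synchronized between GPU and CPU memory, which this sketch does not model.

```python
from fractions import Fraction


class FastPathUnsupported(Exception):
    """Raised by the hypothetical accelerated backend for inputs it cannot handle."""


def tree_sum(values):
    """Pairwise (tree) reduction: the shape a parallel sum takes across many threads."""
    level = list(values) or [0]
    while len(level) > 1:
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # the odd element out is carried to the next level
            paired.append(level[-1])
        level = paired
    return level[0]


def fast_total(values):
    # Stand-in for a GPU kernel that only understands plain integers.
    if not all(type(v) is int for v in values):
        raise FastPathUnsupported("only integer columns are accelerated")
    return tree_sum(values)


def with_fallback(fast_impl, slow_impl):
    """Prefer fast_impl; on FastPathUnsupported, rerun the call with slow_impl."""
    def dispatch(*args, **kwargs):
        try:
            result = fast_impl(*args, **kwargs)
            dispatch.last_path = "fast"
        except FastPathUnsupported:
            result = slow_impl(*args, **kwargs)
            dispatch.last_path = "slow"
        return result
    dispatch.last_path = None  # records which path served the most recent call
    return dispatch


total = with_fallback(fast_total, sum)  # built-in sum plays the role of CPU pandas
print(total([1, 2, 3, 4, 5]), total.last_path)                   # 15 fast
print(total([Fraction(1, 3), Fraction(2, 3)]), total.last_path)  # 1 slow
```

In a real environment the accelerator is enabled rather than written by hand: besides the two methods above, cuDF documents calling `cudf.pandas.install()` before `import pandas`.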
The same pattern has been extended across the suite. cuML added a zero-code-change accelerator, cuml.accel, released as an open beta with cuML 25.02 in March 2025. Enabled via `%load_ext cuml.accel` or `python -m cuml.accel`, it transparently speeds up popular scikit-learn algorithms (such as random forests, k-nearest neighbors, PCA, and k-means) as well as UMAP and HDBSCAN, falling back to CPU execution for anything unsupported. Nvidia reported representative speedups of roughly 25 times for random-forest training, about 50 times for t-SNE, up to 60 times for UMAP, and up to 175 times for HDBSCAN. cuGraph provides a zero-code-change backend for NetworkX (nx-cugraph), a headline feature of the RAPIDS 24.10 release in October 2024, which lets NetworkX programs run graph algorithms on the GPU unchanged. RAPIDS also powers the Polars GPU engine: developed jointly with the Polars project and built on cuDF, it launched in open beta in September 2024 and accelerates Polars queries when the GPU engine is selected, with Nvidia citing workflows up to roughly 13 times faster than CPU execution. [9][10][11]
| Accelerator | Library accelerated | How it is enabled | Status / first availability |
|---|---|---|---|
| cudf.pandas | pandas | %load_ext cudf.pandas or python -m cudf.pandas | Generally available, RAPIDS 24.02 (March 2024) |
| cuml.accel | scikit-learn, UMAP, HDBSCAN | %load_ext cuml.accel or python -m cuml.accel | Open beta, cuML 25.02 (March 2025) |
| nx-cugraph | NetworkX | set the cuGraph backend for NetworkX | RAPIDS 24.10 (October 2024) |
| Polars GPU engine | Polars | select the GPU engine when collecting a query | Open beta (September 2024) |
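The NetworkX backend and the Polars GPU engine share one idea: the user-facing call stays the same, and a backend or engine name chooses which implementation runs. The sketch below is a hypothetical dispatch registry in plain Python; `register`, `dispatchable`, the `"toy-accelerated"` backend, and both degree-centrality functions are illustrative stand-ins, not the real NetworkX or nx-cugraph machinery.

```python
from collections import Counter

_BACKENDS = {}  # function name -> {backend name -> implementation}


def register(backend_name, func_name):
    """Decorator that registers an alternative implementation under a backend name."""
    def decorator(impl):
        _BACKENDS.setdefault(func_name, {})[backend_name] = impl
        return impl
    return decorator


def dispatchable(default_impl):
    """Wrap a reference implementation so callers can choose a backend by keyword."""
    name = default_impl.__name__

    def dispatch(*args, backend=None, **kwargs):
        if backend is None:
            return default_impl(*args, **kwargs)
        try:
            impl = _BACKENDS[name][backend]
        except KeyError:
            raise ValueError(f"no {backend!r} implementation of {name}") from None
        return impl(*args, **kwargs)

    dispatch.__name__ = name
    return dispatch


@dispatchable
def degree_centrality(edges):
    """Reference version: degree / (n - 1) for each node of an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    scale = 1 / (len(degree) - 1) if len(degree) > 1 else 1.0
    return {node: d * scale for node, d in degree.items()}


@register("toy-accelerated", "degree_centrality")
def _degree_centrality_accelerated(edges):
    # Stand-in for a GPU implementation: same math, different code path.
    degree = Counter(node for edge in edges for node in edge)
    scale = 1 / (len(degree) - 1) if len(degree) > 1 else 1.0
    return {node: d * scale for node, d in degree.items()}


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
reference = degree_centrality(edges)
accelerated = degree_centrality(edges, backend="toy-accelerated")
print(reference == accelerated)  # True
```

With the real packages installed, the corresponding calls are NetworkX's `backend=` keyword on dispatchable algorithms, for example `nx.pagerank(G, backend="cugraph")`, and Polars' `collect(engine="gpu")` on a lazy query. Keeping the reference implementation as the default means code that never names a backend behaves exactly as before.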
A single GPU has a fixed amount of high-bandwidth memory, so RAPIDS relies on Dask, a Python library for parallel and distributed computing, to scale beyond it. The dask-cudf library extends cuDF DataFrames across many GPUs and across multiple machines, partitioning the data and coordinating computation, while dask-cuda provides utilities for launching and managing Dask worker processes on multi-GPU systems. This combination lets a RAPIDS workflow grow from a single desktop GPU to a multi-node, multi-GPU (commonly abbreviated MNMG) cluster while keeping a familiar DataFrame-style interface. cuML and cuGraph likewise offer Dask-backed, distributed implementations of many algorithms. For high-speed communication between workers, RAPIDS can use UCX-based transports over interconnects such as InfiniBand and NVLink. [4][5]
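Scaling out works because many DataFrame operations split into a per-partition step and a cheap combine step. The sketch below shows the idea for a grouped mean in plain Python: each partition (standing in for the slice of data held by one GPU worker) reduces to per-key (sum, count) pairs, and only those small partial results are merged before a single division. The function names are hypothetical, and dask-cudf's actual task graphs are more elaborate.

```python
from collections import defaultdict


def partial_sums(partition):
    """Map step: reduce one partition of (key, value) rows to {key: (sum, count)}."""
    acc = defaultdict(lambda: [0, 0])
    for key, value in partition:
        acc[key][0] += value
        acc[key][1] += 1
    return {k: (s, c) for k, (s, c) in acc.items()}


def combine(partials):
    """Combine step: merge per-partition (sum, count) pairs, then divide once per key."""
    totals = defaultdict(lambda: [0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            totals[key][0] += s
            totals[key][1] += c
    return {k: s / c for k, (s, c) in totals.items()}


def distributed_groupby_mean(partitions):
    # In a real cluster, each partial_sums call would run on a different worker.
    return combine(partial_sums(p) for p in partitions)


partitions = [
    [("x", 1.0), ("y", 10.0), ("x", 3.0)],  # data held by worker 0
    [("x", 5.0)],                           # worker 1
    [("y", 20.0), ("y", 30.0)],             # worker 2
]
result = distributed_groupby_mean(partitions)
print(result)  # {'x': 3.0, 'y': 20.0}
```

Carrying (sum, count) rather than per-partition means matters: averaging the partition means for "x" would give 3.5 instead of 3.0, because the partitions hold different numbers of rows. With the real libraries, a cluster is typically started with dask-cuda's `LocalCUDACluster` and a `dask.distributed.Client`, and dask-cudf applies a broadly similar split-and-combine strategy when a grouped aggregation is computed.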
RAPIDS is open source under the Apache 2.0 license and is developed in the open across many repositories under the rapidsai organization on GitHub, where Nvidia engineers and external contributors file issues, submit pull requests, and discuss design. The project deliberately tracks and integrates with the wider PyData ecosystem rather than replacing it: it mirrors the pandas, scikit-learn, and NetworkX APIs, interoperates through Apache Arrow, plugs into Dask for scaling, and provides an accelerator plug-in, the RAPIDS Accelerator for Apache Spark, that offloads Spark SQL and DataFrame operations to GPUs. RAPIDS is distributed through conda and pip packages and as prebuilt Docker container images, and it is included and supported as part of NVIDIA AI Enterprise, the company's supported software platform, which adds certified configurations and enterprise support on top of the open-source libraries. [4][12][13]
RAPIDS represents a deliberate effort to extend GPU acceleration, long established in deep-learning training, into the broader and historically CPU-bound world of data preparation, classical machine learning, and analytics, which together make up a large share of practical data-science work. By matching the APIs of the tools data scientists already use and, increasingly, by removing the need to change any code at all, it lowers the barrier to GPU adoption for that audience. Its open-source model and tight integration with the PyData stack have made it a reference point for GPU-accelerated data science and a building block within Nvidia's wider CUDA-X and AI Enterprise software strategy, where fast, end-to-end data pipelines feed the model-training and inference systems that run on the same GPUs. [2][13]