# R (programming language)

> Source: https://aiwiki.ai/wiki/r_programming_language
> Updated: 2026-06-27
> Categories: Machine Learning, Programming Languages, Statistics
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Python](/wiki/python), [Data science](/wiki/data_science), [Machine learning](/wiki/machine_learning), [Statistics](/wiki/statistics)*

**R** is a free, open-source programming language and software environment for statistical computing and graphics. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand, who began the project as a research experiment around 1991 and posted the first public binary in August 1993 [1][27]. R is a GNU project and a free implementation of the S language developed at Bell Laboratories in the 1970s by John Chambers and colleagues; it is distributed under the GNU General Public License and maintained by the R Core Team and the R Foundation for Statistical Computing [2][4]. Version 1.0.0, the first stable release, shipped on February 29, 2000 [27].

The defining feature of R is not the language itself, which is small and a bit eccentric, but the package ecosystem around it. The Comprehensive R Archive Network (CRAN), founded in 1997, hosts more than 24,000 contributed packages covering most of applied statistics, plus a separate ecosystem called Bioconductor (over 2,400 packages) for genomics and bioinformatics [3][5]. The tidyverse, a set of packages built around tidy-data principles by Hadley Wickham and collaborators at Posit (formerly RStudio), has become the de-facto modern dialect for data manipulation and visualization [7][23]. R sits next to [Python](/wiki/python) as one of the two dominant languages in [data science](/wiki/data_science). Python has clearly won the deep-learning and production-ML race, but R remains the first choice in academic statistics, biostatistics, clinical trials, official statistics, and much social-science work.

For [machine learning](/wiki/machine_learning) work, R has bindings to most of the same numerical engines Python users rely on. The `xgboost` and `lightgbm` packages share C++ cores with their Python siblings, and `glmnet` is the original authors' own implementation of lasso and elastic-net fitting [18]; `caret` and `tidymodels` give a unified interface to dozens of model families; `keras`, `tensorflow`, and `torch` provide [deep learning](/wiki/deep_learning) bindings against [TensorFlow](/wiki/tensorflow) and [PyTorch](/wiki/pytorch); and `reticulate` lets R and Python share data and call each other in the same session [13][15].

## ELI5: explain like I'm 5

Imagine a programmable calculator. You can ask it things like "draw me a graph of how ice-cream sales change with temperature" or "is the difference between these two groups bigger than I would expect by chance?" and it gives you back a chart or a number. R is that calculator, with tens of thousands of free add-ons for almost any kind of math, especially the kind statisticians use.

R is a sibling of [Python](/wiki/python). Python is a general-purpose language that is also good at math; R is a stats language that is also good at general purpose stuff if you push it. Biologists measuring gene expression, economists forecasting inflation, and grad students fitting mixed-effects models on Tuesday usually reach for R. People training huge neural networks or shipping web services usually reach for Python. The two worlds talk to each other, and most working analysts use both.

## What is R, and where did it come from?

### What is the S language, and how is R related to it?

R's lineage starts with the S language, developed at Bell Labs by John Chambers, Rick Becker, Allan Wilks, and others starting around 1976. S was an interactive layer over existing Fortran statistical libraries, a way to do exploratory data analysis without recompiling a Fortran program every time you wanted a histogram. Chambers's stated aim was "to turn ideas into software, quickly and faithfully," and that line still describes the ambition of R better than any tagline R itself ever had [6].

S evolved through the 1980s and 1990s. The third version ("new S," 1988) modernized the language, and the statistical-modeling extensions built on it in the early 1990s (described in Chambers and Hastie's *Statistical Models in S*) introduced the formula notation (`y ~ x1 + x2`) that survives in R today; the fourth version (1998) brought the S4 object system. S was sold commercially as S-PLUS by StatSci (later Insightful, then TIBCO). Chambers received the 1998 ACM Software System Award and later joined the R Core Team, which is one of those rare cases where the creator of a language follows it into the open-source clone [6].

R's own designers were explicit that it is an S dialect built on a different engine. In their 1996 paper, Ihaka and Gentleman wrote that "in developing this new language, we sought to combine what we felt were useful features from two existing computer languages": Becker, Chambers, and Wilks's S and Steele and Sussman's Scheme [1]. They added that "we implemented the language by first writing an interpreter for a Scheme subset and then progressively mutating it to resemble S" [1]. That is why R looks like S on the surface but has Scheme-style lexical scoping and first-class functions underneath.

### When was R created, and who made it?

Ross Ihaka and Robert Gentleman were junior staff in the Statistics Department at the University of Auckland in the early 1990s, both teaching introductory statistics, both unhappy with the available tools. They wanted something free and small enough to run on the Macintosh computers in the teaching lab. They began writing an interpreter inspired by S but with a Scheme-like internal model (lexical scoping, first-class functions), describing the build as roughly "two years of part-time effort" [1]. The first binary was posted to the StatLib server in August 1993 with an announcement on the s-news mailing list [27]. Ihaka and Gentleman published the design rationale in 1996 in the Journal of Computational and Graphical Statistics under the title "R: A Language for Data Analysis and Graphics" [1]. The name R was chosen, in their words, "in part to acknowledge the influence of S and in part to celebrate our own efforts" (the name is also a play on the authors' shared first initial) [1].

R became a GNU project in December 1997, the same year CRAN was set up by Kurt Hornik and Friedrich Leisch at TU Wien [3][27]. Version 1.0 was released on February 29, 2000, a leap day chosen so the developers would not have to commit to a yearly anniversary [27]. Notable later versions include R 2.0.0 (October 2004), R 3.0.0 (April 2013, which broke binary backward compatibility for compiled packages), and R 4.0.0 (April 2020). Minor versions ship roughly once a year. R 4.5.0 "How About a Twenty-Six" was released April 11, 2025, and the current stable release is R 4.6.0 "Because it was There," released April 24, 2026; R release names are running jokes drawn from Peanuts strips [27].

### Who governs R?

The R Core Team, formed informally in 1997, holds commit rights to the language. As of the mid-2020s the team has roughly twenty members, including Ihaka, Gentleman, John Chambers, Brian Ripley, Kurt Hornik, Peter Dalgaard, Martin Maechler, Luke Tierney, Duncan Murdoch, and Tomas Kalibera [2]. The R Foundation for Statistical Computing, registered as a non-profit association under Austrian law in Vienna in April 2003, holds the copyrights and runs finances [4]. The R Consortium, a Linux Foundation project launched in 2015 with corporate members including Microsoft, Google, RStudio/Posit, and IBM, funds infrastructure and outreach but does not control the language [26].

## How is R designed as a language?

R is a multi-paradigm language: procedural, functional, object-oriented, with strong vector semantics and a heavy emphasis on interactive use. Most of the surface idioms come from S, the internals borrow from Scheme, and a handful of design choices come from APL [1].

### Vectors and recycling

The atomic unit in R is the vector. There are no scalars in the strict sense; a single number is a length-1 vector, and arithmetic is vectorized by default:

```
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)
x + y  # returns 11 22 33 44
```

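When vectors of different lengths interact, R recycles the shorter one to match the longer, and it warns when the longer length is not a multiple of the shorter. Because most numerical work is done on whole vectors and matrices, tight `for` loops are generally slower than the equivalent vectorized expressions, which run their loops in compiled code.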
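A minimal base-R sketch of recycling and vectorization follows; the variable names are illustrative, and the warning check assumes an English-language R session.

```r
x <- c(1, 2, 3, 4)

# A length-1 vector is recycled across every element of x
plus_one <- x + 1                 # 2 3 4 5

# A length-2 vector is recycled as 10, 100, 10, 100
scaled <- x * c(10, 100)          # 10 200 30 400

# Lengths that are not multiples still recycle, but R raises a warning
msg <- tryCatch(x + c(1, 2, 3), warning = function(w) conditionMessage(w))

# The same computation as an explicit loop and as a vectorized expression
loop_sq <- numeric(length(x))
for (i in seq_along(x)) loop_sq[i] <- x[i]^2
vec_sq <- x^2                     # a single call; the loop happens in compiled code
```

On small inputs the two versions are indistinguishable; the gap shows up on long vectors, where the R-level loop pays interpreter overhead on every iteration.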

### Data frames and functional features

The core abstraction for tabular data is the data frame, a named list of equal-length vectors introduced in S alongside its statistical-modeling tools in the early 1990s. It is what makes R feel like a statistics environment rather than a numeric one. The tidyverse `tibble` and Matt Dowle's `data.table` are alternative implementations with different performance and syntax trade-offs. Modern R also supports Arrow-backed data frames through the `arrow` package, which gives zero-copy interop with [Python](/wiki/python) (pyarrow and pandas) and with other Arrow-aware engines such as Apache Spark.
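To make the "list of equal-length vectors" definition concrete, here is a small base-R sketch; the column names and values are made up for illustration.

```r
# Three equal-length vectors become the columns of a data frame
df <- data.frame(
  id    = 1:3,
  score = c(2.5, 3.0, 4.5),
  group = c("a", "b", "a")   # stays character in R >= 4.0 (stringsAsFactors = FALSE)
)

is.list(df)          # TRUE: a data frame is a list of columns
length(df)           # 3, the number of columns (list elements)
nrow(df)             # 3, the shared column length
sapply(df, class)    # one type per column

# Columns are ordinary vectors, so vectorized arithmetic works column-wise
df$scaled <- df$score / max(df$score)

# Rows are selected with a logical vector built from a column
group_a <- df[df$group == "a", ]
```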

Functions in R are first-class objects. Closures capture their enclosing environment (lexical scoping, inherited from Scheme), `...` collects extra arguments to forward, and lazy evaluation defers evaluating each argument until it is first used [1]. Lazy evaluation, together with functions such as `substitute()` that can capture an argument's unevaluated expression, enables some of R's most distinctive idioms. The formula notation is one of them: `y ~ x` is a call to the `~` operator, which returns its operands unevaluated as a formula object, so `lm(y ~ x, data = df)` can look up `y` and `x` inside `df`.
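The following base-R sketch exercises each feature named above: a closure, `...` forwarding, lazy evaluation, expression capture, and a formula object. The helper names (`make_counter`, `mean_of`, `only_a`, `show_expr`) are hypothetical.

```r
# Closure: the inner function keeps a reference to `count` in its enclosing environment
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # `<<-` assigns in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()                 # 1
counter()                 # 2
third <- counter()        # 3

# `...` forwards extra arguments unchanged
mean_of <- function(x, ...) mean(x, ...)
m <- mean_of(c(1, NA, 3), na.rm = TRUE)          # 2

# Lazy evaluation: `b` is never used, so its expression is never evaluated
only_a <- function(a, b) a * 2
lazy_result <- only_a(5, stop("never evaluated")) # 10, no error

# substitute() returns the unevaluated expression supplied for an argument
show_expr <- function(arg) deparse(substitute(arg))
expr_text <- show_expr(price + tax)               # "price + tax"

# A formula stores its expression without evaluating it
f <- y ~ x1 + x2
class(f)                  # "formula"
all.vars(f)               # "y" "x1" "x2"
```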

### What object systems does R have?

R has not one but several object systems, layered on top of each other for historical reasons:

| System | Year | Style | Used in |
|--------|------|-------|---------|
| S3 | 1988 (S), inherited by R | Generic-function dispatch on a `class` attribute, very lightweight | Most base R, `lm`, `glm`, `print`, `summary` |
| S4 | 1998 (S), 2001 (R) | Formal multiple-dispatch generics, slot-based | Bioconductor, lme4, sp |
| Reference classes (R5) | 2010 | Mutable Java-style classes, methods bound to objects | Some packages, mostly superseded |
| R6 | 2014 | Lightweight reference classes by Winston Chang | Shiny, plumber, modern object-oriented packages |
| S7 | 2023 | Joint successor to S3 and S4, designed by R Core and Posit | Experimental, slowly being adopted |

Most everyday R code uses S3, which is so minimal it almost feels like a convention rather than a system. S4 shows up in Bioconductor, where multiple dispatch is useful for biological data structures. R6 is the typical choice for stateful objects in modern packages [22].
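A minimal sketch contrasting S3 and S4, using only base R and the bundled `methods` package (R6 is omitted because it is a separate CRAN package). The class and generic names (`temperature`, `describe`, `Reading`, `describe4`) are hypothetical.

```r
# S3: a "class" is just an attribute, and UseMethod() dispatches on it
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "unknown object"
describe.temperature <- function(x, ...) paste0(x$celsius, " C")

t1 <- new_temperature(21.5)
describe(t1)      # "21.5 C"
describe(42)      # "unknown object"

# S4: formal classes with typed slots and explicitly declared generics
library(methods)
setClass("Reading", slots = c(value = "numeric", unit = "character"))
setGeneric("describe4", function(obj) standardGeneric("describe4"))
setMethod("describe4", "Reading", function(obj) paste(obj@value, obj@unit))

r <- new("Reading", value = 3.2, unit = "mm")
describe4(r)      # "3.2 mm"

# Unlike S3, S4 checks slot types when an object is created
bad <- tryCatch(new("Reading", value = "oops", unit = "mm"),
                error = function(e) "rejected")
```

The S3 half needs no declarations at all, which is why it reads as a convention; the S4 half trades that brevity for validation and formal method tables.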

### Pipes

The `magrittr` package by Stefan Milton Bache (2014) introduced the `%>%` pipe, modelled on F#'s forward pipe. It quickly became the standard in tidyverse code: `data %>% filter(x > 0) %>% mutate(y = x^2) %>% summarize(mean(y))` reads left to right rather than inside out. R Core added a native pipe `|>` in version 4.1.0 (May 2021), with slightly different semantics [27]. Most new code uses `|>`, but `%>%` is still everywhere because old code still works.
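A short sketch of the native pipe (requires R 4.1.0 or later), using the built-in `mtcars` data; the `magrittr` `%>%` version would read the same but needs that package installed. The 0.425 factor is an approximate miles-per-gallon to kilometres-per-litre conversion chosen for illustration.

```r
# The native pipe is a parser rewrite: x |> f(y) becomes f(x, y)
identical(quote(x |> f(y)), quote(f(x, y)))   # TRUE

# Keep 4-cylinder cars, add a km-per-litre column, then average it
avg_kpl <- mtcars |>
  subset(cyl == 4) |>
  transform(kpl = mpg * 0.425) |>
  with(mean(kpl))

# The same computation written inside out, without a pipe
nested <- with(transform(subset(mtcars, cyl == 4), kpl = mpg * 0.425), mean(kpl))
```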

## What is CRAN, and how big is the R package ecosystem?

CRAN is R's distinguishing institution, mirrored on around 90 sites worldwide. Every package goes through automated checks (the `R CMD check` machinery) on multiple operating systems before being accepted, and packages whose checks start failing, or that break the packages depending on them, are flagged and archived if they are not fixed. As of 2026, CRAN hosts more than 24,000 contributed packages [3].

The quality bar is uneven by design. CRAN does not vet the science or the statistical correctness of a package; it vets that the package builds, passes its own tests, and does not break other packages. The long tail includes both careful, peer-reviewed work and weekend hobby projects. The tidyverse, Bioconductor, and rOpenSci act as curated layers on top [5].

### Major package families

| Family | Maintained by | Focus |
|--------|---------------|-------|
| base R | R Core | Language plus core stats: `lm`, `glm`, `aov`, `t.test`, etc. |
| Recommended packages | R Core | Ships with R: `MASS`, `Matrix`, `survival`, `nlme`, `lattice` |
| tidyverse | Posit (Wickham et al.) | Data wrangling and visualization: `dplyr`, `tidyr`, `ggplot2`, `purrr`, `readr`, `tibble`, `stringr`, `forcats` |
| data.table | Matt Dowle, Arun Srinivasan | Fast in-memory data tables with their own DSL |
| Bioconductor | Bioconductor Core | Genomics, microarrays, sequencing |
| tidymodels | Posit (Max Kuhn et al.) | Modern ML wrappers (`parsnip`, `recipes`, `rsample`, `yardstick`) |
| mlr3 | Bernd Bischl's group | Object-oriented ML framework |
| rOpenSci | rOpenSci collective | Peer-reviewed scientific packages |
| spatial | r-spatial team | `sf`, `terra`, `stars` for geospatial |

### What is Bioconductor?

Bioconductor is a parallel ecosystem to CRAN focused on bioinformatics, started in 2001 by Robert Gentleman and colleagues, then at the Dana-Farber Cancer Institute; the project's core team was later based at the Fred Hutchinson Cancer Center [5]. It uses S4 heavily, has its own twice-yearly release cycle (one release each year coincides with the new R minor version), and hosts more than 2,400 packages for genomics, proteomics, flow cytometry, and single-cell analysis [5]. Papers in Nature Biotechnology that say "analysis was performed in R using DESeq2 / limma / edgeR / Seurat" are usually citing Bioconductor (Seurat is on CRAN, but the methodological neighborhood is the same).

## What is the tidyverse?

The tidyverse is arguably the most influential development in R of the last fifteen years. The name was coined by Hadley Wickham around 2016, and the meta-package `tidyverse` was published to CRAN on September 15, 2016 [23][28]. The idea is that data should be in *tidy* form (each variable a column, each observation a row, and each value in its own cell, following Wickham's 2014 paper) and that a small grammar of verbs should be enough to manipulate it [7].
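The sketch below reshapes a small "wide" table into tidy form with base R only, so it runs without the tidyverse installed; the data values are invented for illustration. The commented `tidyr::pivot_longer()` call shows the usual tidyverse equivalent, with `year` coming back as character unless converted.

```r
# A "wide" table: one row per country, one column per year (made-up values)
wide <- data.frame(
  country = c("A", "B"),
  y1999   = c(10, 40),
  y2000   = c(15, 55)
)

# Tidy ("long") form: one row per country-year observation, one column per variable
tidy <- data.frame(
  country = rep(wide$country, times = 2),
  year    = rep(c(1999L, 2000L), each = nrow(wide)),
  cases   = c(wide$y1999, wide$y2000)
)
# tidyverse equivalent:
# tidyr::pivot_longer(wide, c(y1999, y2000), names_to = "year",
#                     names_prefix = "y", values_to = "cases")

# Once the data are tidy, a grouped summary is one formula-driven call
totals <- aggregate(cases ~ year, data = tidy, FUN = sum)
```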

The core packages are:

| Package | Purpose | First release |
|---------|---------|---------------|
| `ggplot2` | Layered grammar of graphics | 2007 |
| `dplyr` | Data manipulation verbs (`filter`, `mutate`, `summarize`, `group_by`, `arrange`, `select`) | 2014 |
| `tidyr` | Reshaping (wide-to-long, long-to-wide, pivoting, unnesting) | 2014 |
| `readr` | Fast text-file reading with type guessing | 2015 |
| `purrr` | Functional iteration (`map`, `walk`, `pmap`) | 2015 |
| `tibble` | Modern data frame with cleaner printing and stricter rules | 2016 |
| `stringr` | String manipulation wrappers around `stringi` | 2009 |
| `forcats` | Factor (categorical) manipulation | 2016 |

Around the core sit `broom` (turn model objects into tidy tibbles), `lubridate` (dates and times), `rvest` (web scraping), `httr2` (HTTP), `dbplyr` (translate dplyr to SQL), and many others. The book that taught a generation of working data scientists tidyverse-style R is Wickham and Grolemund's *R for Data Science*, free online; the second edition (2023) is co-authored with Mine Cetinkaya-Rundel [8][9].

### What is ggplot2?

Wickham wrote ggplot2 as part of his 2008 PhD dissertation at Iowa State, building on Leland Wilkinson's 1999 book *The Grammar of Graphics*, and first released it on CRAN in June 2007 [10][11][29]. A plot is built up from layers (data, geometric mark, statistical transform, scale, coordinate system, facet) rather than chosen from a menu of plot types. A faceted scatter plot, colored by vehicle class, with a smoother in each panel takes a few lines:

```
library(ggplot2)

# `mpg` is a fuel-economy dataset that ships with ggplot2
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +                  # points colored by vehicle class
  geom_smooth(method = "loess", formula = y ~ x) +  # one loess smoother per panel
  facet_wrap(~ year)                                # one panel per model year
```

The aesthetic is so distinctive that ggplot's gray-grid panels became almost a marker of "a chart from a data scientist" in 2010s journalism and academic papers. Extension packages include `patchwork` for composition, `gganimate` for animation, `ggrepel` for label placement, and `ggdist` for visualizing uncertainty.

## What are RStudio and Posit?

RStudio the integrated development environment was first released in February 2011 by RStudio Inc., a company founded by JJ Allaire (creator of ColdFusion and Open Live Writer) [30]. The IDE was an immediate hit. R had previously been used through plain editors plus a console, and RStudio bundled a code editor, console, plot pane, environment browser, package manager, and Sweave/knitr integration into one window. The free edition is open-source under the AGPL; commercial editions add team features [30].

In July 2022 RStudio Inc. renamed itself Posit PBC (a registered Public Benefit Corporation), to signal that the company was no longer R-only and was investing seriously in [Python](/wiki/python) [23]. Posit now ships Posit Workbench (multi-language IDE), Posit Connect (publishing platform), and Posit Cloud (hosted environments), and it maintains the tidyverse, the `keras` and `tensorflow` R bindings, the `torch` R port, `reticulate`, Quarto, Shiny, and many other open-source projects [23][24][25].

## What tools does R provide for reports and web apps (R Markdown, Quarto, Shiny)?

Reproducible reporting in R has gone through several generations. Friedrich Leisch's `Sweave` (2002) wove R code into LaTeX. Yihui Xie's `knitr` (first released in 2012) generalized it to Markdown, HTML, and other formats. R Markdown, built on `knitr` and Pandoc by JJ Allaire, Yihui Xie, and the RStudio team starting in 2014, made literate programming the default for analyses, papers, books, and slides. The first edition of *R for Data Science* and *Forecasting: Principles and Practice* are both written in R Markdown, using Xie's `bookdown` extension, which made book-length projects practical [8][20].

Quarto, released by Posit in 2022, is the next-generation rewrite. It is language-agnostic by design (R, Python, Julia, Observable JavaScript) and runs as a separate command-line tool rather than an R package [24]. The same `.qmd` file can render to HTML, PDF, Word, websites, books, slides, or dashboards. Quarto is one of the clearest artifacts of Posit's broader-than-R turn; scientific Python users have adopted it alongside Jupyter.

Shiny, released in 2012 by Joe Cheng at RStudio, is R's web application framework [25]. A Shiny app is an R script that defines a UI (HTML widgets) and a server function (reactive R code), with the framework keeping them in sync over a websocket. An analyst can build an interactive dashboard or modeling tool in R alone, no JavaScript required. Shiny is used heavily in pharma, finance, government, and academia. A companion Shiny for Python launched in 2022 [25].

## What statistical and machine-learning packages does R have?

R's classical statistics coverage is essentially exhaustive. Linear and generalized linear models (`lm`, `glm` in base), mixed-effects models (`lme4` by Doug Bates, Martin Maechler, Ben Bolker), survival analysis (`survival` by Terry Therneau), generalized additive models (`mgcv` by Simon Wood), Bayesian inference (`rstan`, `brms`, `cmdstanr`, `rjags`), and thousands more in econometrics, psychometrics, and ecology [2]. If a paper proposes a statistical method, there is a good chance one of the authors put a working R package on CRAN.
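As a small illustration of the base modeling functions mentioned above, the sketch below fits a linear model, a logistic GLM, and a nested-model comparison on the built-in `mtcars` data (`wt` is weight in thousands of pounds; `am` is 1 for a manual transmission). The choice of predictors is arbitrary.

```r
# Linear model: fuel economy explained by weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)                   # intercept plus one slope per predictor
summary(fit)$r.squared      # share of variance explained

# Logistic regression (a GLM with a binomial family): chance of a manual gearbox
logit <- glm(am ~ wt, data = mtcars, family = binomial)
p_manual <- predict(logit, newdata = data.frame(wt = 3), type = "response")

# Nested-model F test: does adding horsepower improve on weight alone?
smaller <- lm(mpg ~ wt, data = mtcars)
comparison <- anova(smaller, fit)
```

The same formula-plus-`data` pattern carries over to most modeling packages on CRAN, including `lme4`, `mgcv`, and `survival`.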

### Classical ML packages

For [machine learning](/wiki/machine_learning) specifically, R has long had wrappers around the standard algorithm families:

| Package | Author / year | Method |
|---------|---------------|--------|
| `randomForest` | Andy Liaw & Matthew Wiener (2002), wrapping Breiman & Cutler's Fortran code | [Random forest](/wiki/random_forest) |
| `ranger` | Marvin N. Wright (2015) | Faster random forests, especially for high dimensions |
| `xgboost` | Tianqi Chen et al. (2014) | [XGBoost](/wiki/xgboost) gradient boosting, R binding [17] |
| `lightgbm` | Microsoft (2017) | [LightGBM](/wiki/lightgbm), R binding |
| `glmnet` | Friedman, Hastie, Tibshirani (2010) | Lasso and elastic-net regularized regression [18] |
| `e1071` | David Meyer et al. | LIBSVM bindings, naive Bayes, fuzzy c-means |
| `kernlab` | Alexandros Karatzoglou et al. | Kernel methods, SVM, kernel PCA |
| `nnet` | Brian Ripley | Single-hidden-layer neural networks (a recommended package shipped with R for decades) |
| `mboost` | Hothorn et al. | Model-based boosting |
| `MASS` | Venables & Ripley | LDA, QDA, robust regression, negative binomial GLMs, and other classic methods |

The `caret` package (Classification And REgression Training) was built by Max Kuhn starting in 2007 to wrap the dozens of inconsistent ML packages behind a single API: a unified `train()` function that handles preprocessing, resampling, hyperparameter tuning, and prediction across hundreds of models [12]. For about a decade, `caret` was how most R users did ML.
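The inconsistency `caret` set out to hide is easy to see even among R's recommended packages. This sketch fits two classifiers on the built-in `iris` data, shows that each returns class predictions differently, and wraps them in a hypothetical `predict_class()` helper; it is an illustration of the idea, not how `caret` itself is implemented. It assumes the recommended packages `MASS` and `rpart` are installed, as they are in standard R distributions.

```r
library(MASS)    # lda()
library(rpart)   # rpart()

# Deterministic split: every 5th row is held out (10 rows per species)
test_rows <- seq(1, nrow(iris), by = 5)
train <- iris[-test_rows, ]
test  <- iris[test_rows, ]

lda_fit  <- lda(Species ~ ., data = train)
tree_fit <- rpart(Species ~ ., data = train)

# Each package exposes class predictions in its own way
lda_pred  <- predict(lda_fit, newdata = test)$class             # list element `class`
tree_pred <- predict(tree_fit, newdata = test, type = "class")  # `type` argument

# Hypothetical helper giving both models one calling convention
predict_class <- function(model, newdata) {
  if (inherits(model, "lda")) {
    predict(model, newdata = newdata)$class
  } else if (inherits(model, "rpart")) {
    predict(model, newdata = newdata, type = "class")
  } else {
    stop("unsupported model class: ", class(model)[1])
  }
}

accuracy <- function(pred, truth) mean(pred == truth)
acc_lda  <- accuracy(predict_class(lda_fit, test), test$Species)
acc_tree <- accuracy(predict_class(tree_fit, test), test$Species)
```

Multiply that branching by a few hundred model types and the value of a single `train()`/`predict()` front end becomes clear.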

### Tidymodels and mlr3

Kuhn rewrote the framework with the same philosophy but tidyverse-native idioms; the result is `tidymodels`, a meta-package and collection of components (`parsnip` for models, `recipes` for preprocessing, `rsample` for resampling, `yardstick` for metrics, `tune` for hyperparameter search, `workflows` for bundling preprocessing with a model, `workflowsets` for comparing many such bundles) first released to CRAN in 2018 and now the usual recommendation for new code [13]. `mlr3`, by Bernd Bischl's group at LMU Munich, is the third generation of the `mlr` framework and a more object-oriented alternative to tidymodels [14]. It uses R6 classes throughout, has good support for benchmarking and pipelines, and is favored in some research labs for experiment management.
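Much of what `rsample`, `tune`, and `yardstick` automate is resampling-based model evaluation. The sketch below hand-rolls 5-fold cross-validation in base R to compare two linear models on `mtcars`; the fold count, seed, and candidate formulas are arbitrary choices, and the `cv_rmse()` helper is hypothetical rather than part of any framework.

```r
set.seed(42)
k <- 5
# Assign each row of mtcars to one of k folds, as evenly as possible
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# For each fold: fit on the other k - 1 folds, then score on the held-out fold
cv_rmse <- function(formula) {
  vapply(seq_len(k), function(i) {
    fit <- lm(formula, data = mtcars[folds != i, ])
    held_out <- mtcars[folds == i, ]
    rmse(held_out$mpg, predict(fit, newdata = held_out))
  }, numeric(1))
}

small <- cv_rmse(mpg ~ wt)
large <- cv_rmse(mpg ~ wt + hp + qsec)
c(small = mean(small), large = mean(large))   # lower mean RMSE is better
```

The frameworks add what this sketch lacks: stratified and repeated splits, preprocessing fitted inside each fold, parallel execution, and a common interface across model types.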

## Can you do deep learning in R?

R is not where most [deep learning](/wiki/deep_learning) research happens. The community of paper authors writing in R is small relative to the [Python](/wiki/python)/[PyTorch](/wiki/pytorch) crowd, and the papers-with-code culture grew up around Python notebooks. But R has good bindings for the major frameworks.

The `tensorflow` R package calls [TensorFlow](/wiki/tensorflow) through its Python API via `reticulate`, so it needs a Python installation of TensorFlow. The companion `keras` package wraps the [Keras](/wiki/keras) high-level API the same way. Both were released by RStudio starting in 2017, with JJ Allaire as lead author. Francois Chollet co-wrote *Deep Learning with R* (2018, second edition 2022) with Allaire and Tomasz Kalinowski, an R port of his *Deep Learning with Python* [15].

The `torch` R package, written by Daniel Falbel and the mlverse team and first released to CRAN in October 2020, is a different beast [16]. It does not bind [PyTorch](/wiki/pytorch) through Python; it links directly against `libtorch`, the C++ library underlying PyTorch. That means torch for R has no Python dependency, runs in a single process, and gets near-native speed. The API mirrors PyTorch closely (autograd, an `nn_module()` counterpart to `nn.Module`, datasets and dataloaders), and the mlverse organization on GitHub maintains extensions including `torchvision`, `torchaudio`, `tabnet`, and `luz` (a high-level training loop akin to PyTorch Lightning).

R's deep-learning footprint is biggest in research areas where R was already entrenched: epidemiological modeling, ecology, psychometrics, single-cell genomics. Most production deep-learning systems still run in Python, but a researcher who needs to fit a CNN as part of a larger statistical analysis can stay inside R if they want to.

## How does R interoperate with Python?

The `reticulate` package, first released by RStudio in 2017, embeds a Python interpreter inside an R session and translates objects between the two [13]. You can call Python functions from R, source `.py` files, use NumPy arrays and pandas DataFrames as if they were R vectors and data frames, and run a Python REPL inside the R console. Reticulate is what makes the R-side Keras and TensorFlow bindings work, and it is what makes mixed-language Quarto documents practical. The other direction, calling R from Python, is handled by `rpy2` (Laurent Gautier, 2008). Apache Arrow's R and Python bindings share an in-memory format, so large data frames move between languages with little or no serialization cost. In practice many data teams write data pipelines and ML training in Python and reach for R for the final modeling and reporting steps.

## R vs Python: which is better for data science?

The rivalry between R and [Python](/wiki/python) has cooled. Both communities mostly accept that they overlap heavily and complement each other for the rest. Neither is universally "better": R is typically stronger for classical statistics, static visualization, and bioinformatics, while Python dominates deep learning, production engineering, and general-purpose programming. A rough division of labor:

| Task | R is typically stronger | Python is typically stronger |
|------|-------------------------|------------------------------|
| Classical statistics (mixed models, survival, GAMs, Bayesian inference) | Yes, by a wide margin | Catching up via PyMC and statsmodels but still behind |
| Data wrangling | Tidyverse, `data.table`, `dplyr` are excellent | pandas is excellent; Polars is excellent |
| Static plotting | ggplot2 is widely considered best in class | matplotlib, seaborn, and plotnine (a ggplot2-style port) are all good |
| Interactive dashboards | Shiny | Streamlit, Dash, Gradio, Shiny for Python |
| Reproducible reports | R Markdown, Quarto | Jupyter, Quarto |
| ML pipelines | tidymodels, mlr3 | scikit-learn |
| Deep learning | torch, keras (R bindings to the same engines) | [PyTorch](/wiki/pytorch), [TensorFlow](/wiki/tensorflow), JAX (the actual research happens here) |
| LLM tooling | Some, mainly through `ellmer` and Ollama bindings | Vast: LangChain, LlamaIndex, transformers, vLLM, etc. |
| Bioinformatics / genomics | Bioconductor is the standard | Biopython exists but Bioconductor is bigger |
| Production web services | Plumber, Shiny | FastAPI, Django, Flask |
| General-purpose programming | Possible but awkward | Designed for it |

Most serious data teams now use both. There is no Python equivalent of Bioconductor for genomics, and no R equivalent of the PyTorch / Hugging Face ecosystem for deep learning. Pick the language that matches where the rest of the work in your subfield already lives.

## What is R used for?

R is the working language of a sizable chunk of academic and applied statistics. Specific strongholds: pharma and clinical trials (the FDA accepts R for regulatory submissions, and frameworks like the R Validation Hub are part of the toolchain); official statistics (Eurostat, the U.S. Bureau of Labor Statistics, Statistics Canada, Statistics NZ); genomics and biostatistics (Bioconductor and single-cell packages like Seurat are the default) [5]; econometrics and quantitative finance; ecology and geospatial analysis through `vegan`, `sf`, and `terra`; data journalism at outlets like the BBC, FiveThirtyEight, and the Financial Times; and a large share of public sports analytics, such as NFL play-by-play work built on the nflverse packages.

The textbook *An Introduction to Statistical Learning* by James, Witten, Hastie, and Tibshirani (2013, second edition 2021) was originally written with R examples and has become one of the most widely used ML textbooks in undergraduate statistics [19]. A companion Python edition appeared in 2023, which is itself a small data point about the shifting balance.

## Is R open source? Versioning and licensing

R is free software released under the GNU General Public License version 2 or 3, at the user's option [2]. CRAN does not require GPL for contributed packages; common alternatives include MIT, Apache 2.0, and BSD.

Major version timeline:

| Version | Release | Notes |
|---------|---------|-------|
| R 0.16 | 1995 | Early alpha release (the first binary had been posted in August 1993) |
| R 0.49 | April 1997 | Earliest source release available on CRAN |
| R 0.60 | December 1997 | Became a GNU project |
| R 1.0.0 | February 29, 2000 | First stable release |
| R 2.0.0 | October 4, 2004 | Lazy-loading package data |
| R 3.0.0 | April 3, 2013 | Long vectors; broke binary compatibility |
| R 3.4.0 | April 2017 | JIT compilation by default |
| R 3.5.0 | April 2018 | ALTREP framework |
| R 4.0.0 | April 24, 2020 | `stringsAsFactors = FALSE` default |
| R 4.1.0 | May 2021 | Native pipe `\|>`; lambda `\(x) x + 1` |
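| R 4.2.0 | April 2022 | `_` placeholder for the native pipe (named arguments only); UTF-8 on Windows |
| R 4.3.0 | April 2023 | Pipe placeholder `_` allowed at the head of an extraction chain, e.g. `_$coef` |
| R 4.4.0 | April 2024 | `%\|\|%` null-default operator added to base R |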
| R 4.5.0 | April 11, 2025 | "How About a Twenty-Six" |
| R 4.6.0 | April 24, 2026 | "Because it was There"; current stable |

## What are R's limitations and criticisms?

R is productive for what it was designed to do, but it has rough edges that practitioners have lived with for decades.

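- Memory model. Until R 4.0, R tracked whether an object was shared with a coarse `NAMED` counter rather than true reference counting, so many operations made defensive copies. R 4.0 switched to reference counting, which avoids some of them, but copy-on-modify semantics still mean R copies freely, and large in-memory datasets are easier to handle in `data.table` or Arrow than in base R.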
- Speed. Vectorized R is comparable to NumPy. Loop-heavy R is slow. The standard remedy is to call C++ via the `Rcpp` package by Dirk Eddelbuettel and Romain Francois (2008), which has its own thriving ecosystem [21].
- Surface inconsistency. Different packages disagree about `NA` handling, factor defaults, base graphics versus ggplot2, and a dozen other small things. The tidyverse cleaned up a lot of this, but old code still mixes idioms.
- Production deployment. Putting an R model behind a web service is harder than in Python. Plumber and Shiny work, but the broader microservice infrastructure (containers, observability, schedulers) is more Python-shaped.
- General-purpose programming. R is not a great fit for non-statistical work. Writing a parser, a game, or a queueing system in R is possible but unusual.
- Concurrency. The R interpreter is single-threaded; the `parallel` (shipped with R), `future`, and `mirai` packages provide multi-process backends, but there is nothing like Python's `asyncio` built into the language.
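The copy-on-modify behavior behind the memory bullet can be seen directly. In this base-R sketch (helper names are hypothetical), modifying a vector inside a function leaves the caller's vector untouched, while an environment, the mutable reference type that R6 and reference classes build on, is shared rather than copied.

```r
# Copy-on-modify: the function changes its own copy, not the caller's vector
bump_first <- function(v) {
  v[1] <- 999
  v
}
original <- c(1, 2, 3)
changed  <- bump_first(original)
original           # still 1 2 3
changed            # 999 2 3

# Environments have reference semantics and are not copied on assignment
bump_env <- function(e) {
  e$value <- e$value + 1
  invisible(NULL)
}
box <- new.env()
box$value <- 1
alias <- box       # a second name for the same environment
bump_env(alias)
bump_env(alias)
box$value          # 3
```

The first pattern is safe but can duplicate large objects; the second avoids copies at the cost of shared mutable state, which is the trade-off `data.table` and R6 lean into.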

Most R users would say none of these are dealbreakers for the work R is actually used for, and the ones that matter (Rcpp for speed, Arrow for memory, Plumber for serving) have working answers.

## See also

- [Python](/wiki/python), [Julia](/wiki/julia), [Data science](/wiki/data_science), [Statistics](/wiki/statistics)
- [Machine learning](/wiki/machine_learning), [Deep learning](/wiki/deep_learning), [TensorFlow](/wiki/tensorflow), [PyTorch](/wiki/pytorch), [Keras](/wiki/keras)
- [XGBoost](/wiki/xgboost), [LightGBM](/wiki/lightgbm), [Random forest](/wiki/random_forest), [Decision tree](/wiki/decision_tree)
- [Linear regression](/wiki/linear_regression), [Logistic regression](/wiki/logistic_regression), [Principal component analysis](/wiki/principal_component_analysis_pca), [t-SNE](/wiki/t_sne)
- [Time series analysis](/wiki/time_series_analysis), [ARIMA](/wiki/arima), [Markov chain Monte Carlo](/wiki/markov_chain_monte_carlo), [Bayesian inference](/wiki/bayesian_inference)
- [pandas](/wiki/pandas), [NumPy](/wiki/numpy), [scikit-learn](/wiki/scikit_learn), [Matplotlib](/wiki/matplotlib), [DataFrame](/wiki/dataframe), [Kaggle](/wiki/kaggle)

## References

1. Ihaka, R., & Gentleman, R. (1996). R: A language for data analysis and graphics. *Journal of Computational and Graphical Statistics*, 5(3), 299-314.
2. R Core Team (2026). *R: A Language and Environment for Statistical Computing*. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org/
3. Comprehensive R Archive Network (CRAN). https://cran.r-project.org/
4. R Foundation for Statistical Computing. https://www.r-project.org/foundation/
5. Bioconductor Project. https://www.bioconductor.org/
6. Chambers, J. M. (2008). *Software for Data Analysis: Programming with R*. Springer.
7. Wickham, H. (2014). Tidy data. *Journal of Statistical Software*, 59(10), 1-23.
8. Wickham, H., & Grolemund, G. (2017). *R for Data Science*. O'Reilly Media. https://r4ds.had.co.nz/
9. Wickham, H., Cetinkaya-Rundel, M., & Grolemund, G. (2023). *R for Data Science* (2nd ed.). O'Reilly. https://r4ds.hadley.nz/
10. Wickham, H. (2016). *ggplot2: Elegant Graphics for Data Analysis* (2nd ed.). Springer.
11. Wilkinson, L. (1999). *The Grammar of Graphics*. Springer.
12. Kuhn, M. (2008). Building predictive models in R using the caret package. *Journal of Statistical Software*, 28(5), 1-26.
13. Kuhn, M., & Silge, J. (2022). *Tidy Modeling with R*. O'Reilly. https://www.tmwr.org/
14. Bischl, B., Sonabend, R., Kotthoff, L., & Lang, M. (eds.) (2024). *Applied Machine Learning Using mlr3 in R*. Chapman & Hall/CRC. https://mlr3book.mlr-org.com/
15. Allaire, J. J., & Chollet, F. (2022). *Deep Learning with R* (2nd ed.). Manning.
16. Falbel, D. (2020). torch for R. https://torch.mlverse.org/
17. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. *Proceedings of KDD 2016*.
18. Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. *Journal of Statistical Software*, 33(1), 1-22.
19. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). *An Introduction to Statistical Learning with Applications in R* (2nd ed.). Springer. https://www.statlearning.com/
20. Hyndman, R. J., & Athanasopoulos, G. (2021). *Forecasting: Principles and Practice* (3rd ed.). OTexts. https://otexts.com/fpp3/
21. Eddelbuettel, D., & Francois, R. (2011). Rcpp: Seamless R and C++ integration. *Journal of Statistical Software*, 40(8), 1-18.
22. Chang, W. (2014). R6: Encapsulated classes with reference semantics. https://r6.r-lib.org/
23. Posit PBC (formerly RStudio). https://posit.co/
24. Quarto. https://quarto.org/
25. Shiny. https://shiny.posit.co/
26. R Consortium. https://www.r-consortium.org/
27. Wikipedia: R (programming language). https://en.wikipedia.org/wiki/R_(programming_language)
28. Wikipedia: Tidyverse. https://en.wikipedia.org/wiki/Tidyverse
29. Wikipedia: ggplot2. https://en.wikipedia.org/wiki/Ggplot2
30. Wikipedia: RStudio. https://en.wikipedia.org/wiki/RStudio
31. Wikipedia: Bioconductor. https://en.wikipedia.org/wiki/Bioconductor
32. Wikipedia: S (programming language). https://en.wikipedia.org/wiki/S_(programming_language)

