The Iris dataset, sometimes referred to as Fisher's Iris dataset or the Iris flower dataset, is a multivariate dataset introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems," published in the Annals of Eugenics. The dataset comprises 150 samples of iris flowers, with 50 specimens drawn from each of three species (Iris setosa, Iris versicolor, and Iris virginica). For each flower, four numerical features are recorded in centimeters: sepal length, sepal width, petal length, and petal width. A categorical label identifies the species [1].
The Iris dataset is one of the most widely used examples in pattern recognition, multivariate statistics, and machine learning education. It has served as the canonical "hello world" of classification and clustering algorithms for several decades, appearing in introductory textbooks, software tutorials, and the standard documentation of tools such as scikit-learn, seaborn, and R [2][3]. Although the data themselves were not Fisher's original creation, his use of them to demonstrate linear discriminant analysis bound the dataset to his name and to the early history of statistical pattern recognition.
The measurements that make up the Iris dataset were collected and published by the American botanist Edgar Anderson (1897-1969). Anderson's botanical work focused on the genus Iris, and he was particularly interested in morphological variation among closely related species and in questions of hybridization. In 1935 he published a short paper titled "The irises of the Gaspé Peninsula" in the Bulletin of the American Iris Society, reporting careful measurements of Iris setosa and Iris versicolor that he had gathered on the Gaspé Peninsula in Quebec, Canada [4].
For the dataset that became famous through Fisher's paper, Anderson combined his Gaspé measurements with measurements of a third species, Iris virginica, drawn from a different population. The two species I. setosa and I. versicolor in the dataset were collected on the same day, in the same place, by the same person, using the same instruments, while I. virginica came from a separate locality. This sampling structure, while perfectly reasonable for Anderson's botanical purposes, is sometimes overlooked when the data are treated as a generic classification benchmark.
Ronald Fisher (1890-1962), then working at the Rothamsted Experimental Station in England, used Anderson's data in his 1936 paper to illustrate a multivariate technique he had developed: the linear discriminant function. Fisher had previously corresponded with Anderson, and the choice of iris measurements suited his goal of showing how multiple correlated measurements could be combined into a single classification rule. The paper was about the method, not about iris taxonomy as such; the dataset was a convenient and well-documented example.
The canonical Iris dataset consists of 150 rows and 5 columns. Four columns hold continuous numerical features, all measured in centimeters with one decimal place. The fifth column is a categorical class label identifying the species.
| Column | Type | Description | Units |
|---|---|---|---|
| Sepal length | Numeric | Length of the sepal | cm |
| Sepal width | Numeric | Width of the sepal | cm |
| Petal length | Numeric | Length of the petal | cm |
| Petal width | Numeric | Width of the petal | cm |
| Species | Categorical | I. setosa, I. versicolor, or I. virginica | - |
The class distribution is perfectly balanced: 50 samples each of Iris setosa, Iris versicolor, and Iris virginica. Each row describes a single flower, and there are no missing values in the canonical version printed in Fisher's paper.
In botanical terms, sepals are the green leaf-like structures that enclose the flower bud and protect the developing flower, while petals are typically the colored, showier parts of the bloom. In irises both sepals and petals are conspicuous and visually similar, with the sepals often referred to as "falls" and the petals as "standards" in horticultural usage. Anderson's measurements treated each pair as a length and a width, giving the four-feature schema that propagated into the dataset.
The Iris dataset has several well-known statistical properties that make it a useful pedagogical example, but also constrain what can be learned from it.
Iris setosa is linearly separable from the other two species. A simple threshold on petal length or petal width is enough to identify setosa with no errors. This is visually obvious in any pairwise scatter plot involving the petal dimensions: the setosa points form a tight, well-separated cluster with notably smaller petals than the other two species.
Iris versicolor and Iris virginica, by contrast, overlap in feature space: no single feature, and no single linear combination of features, separates them without misclassifying a few cases. They can be distinguished with reasonable accuracy using all four features together, and a classifier such as linear discriminant analysis typically reaches around 96 to 98 percent accuracy, depending on the train and test split. Non-linear methods such as kernel support vector machines, neural networks, or tree ensembles can usually achieve perfect or near-perfect training accuracy, but only at the cost of overfitting a tiny dataset.
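Both claims are easy to check in a few lines. The sketch below (assuming scikit-learn is installed; the 2.5 cm threshold is an illustrative choice, not part of the original dataset) separates setosa with a single petal-length cutoff and then cross-validates linear discriminant analysis on all three classes.

```python
# Minimal sketch: setosa is linearly separable by one threshold,
# while separating all three classes calls for a multivariate method.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # features in cm; classes 0/1/2

# Setosa (class 0) vs. the rest: petal length (feature index 2) < 2.5 cm.
# Setosa petal lengths top out at 1.9 cm; the other species start at 3.0 cm.
is_setosa_pred = X[:, 2] < 2.5
setosa_accuracy = (is_setosa_pred == (y == 0)).mean()  # perfect separation

# All three classes with LDA under 5-fold stratified cross-validation.
lda_accuracy = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
```

The exact cross-validated figure depends on the fold assignment, but it lands in the range the text describes.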
The four features are correlated, especially petal length and petal width, which are nearly collinear. Sepal width has the lowest correlation with the species label and is the noisiest predictor on its own. Mean values for each species and feature are summarized below.
| Species | Sepal length (cm) | Sepal width (cm) | Petal length (cm) | Petal width (cm) |
|---|---|---|---|---|
| Iris setosa | 5.006 | 3.428 | 1.462 | 0.246 |
| Iris versicolor | 5.936 | 2.770 | 4.260 | 1.326 |
| Iris virginica | 6.588 | 2.974 | 5.552 | 2.026 |
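The per-species means in the table above can be reproduced with a single groupby, sketched here under the assumption that scikit-learn and pandas are installed:

```python
# Reproduce the per-species feature means with pandas.
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["species"] = iris.target_names[iris.target]  # map 0/1/2 to species names

# One row per species, one column per feature, rounded to three decimals.
means = df.groupby("species").mean().round(3)
```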
The class balance is exact, at 50 samples per species, which removes any need to handle class imbalance in pedagogical use, and the original Fisher table has no missing values.
Fisher's paper "The use of multiple measurements in taxonomic problems" appeared in the Annals of Eugenics in 1936 and laid out what is now known as Fisher's linear discriminant analysis, or LDA [1]. The journal was later renamed the Annals of Human Genetics and continues under that title.
The central question of the paper was how to combine several measurements taken on the same specimen into a single number that would best separate two or more taxonomic groups. Fisher proposed projecting the multivariate measurements onto a one-dimensional axis chosen to maximize the ratio of between-class variance to within-class variance. Solving this maximization problem reduces to a generalized eigenvalue problem in the scatter matrices of the data, and the resulting linear combination of variables is the discriminant function.
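In modern notation (the symbols below are today's conventions, not Fisher's original ones), the criterion can be written as:

```latex
% Fisher's criterion: choose the projection direction w that maximizes
% between-class scatter relative to within-class scatter.
J(w) = \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w},
\qquad
w^{*} \propto S_W^{-1}\,(\mu_1 - \mu_2) \quad \text{(two-class case)},
```

where $S_B$ and $S_W$ are the between-class and within-class scatter matrices and $\mu_1$, $\mu_2$ are the class mean vectors; maximizing $J(w)$ in general reduces to the generalized eigenvalue problem $S_B w = \lambda S_W w$.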
Fisher demonstrated the technique on Anderson's iris measurements, working out the discriminant between I. versicolor and I. virginica using all four variables. The setosa species was set aside as trivially separable. The paper's tables include the raw data for all 150 specimens, which is how the dataset was preserved and propagated. Subsequent users of the dataset have generally reproduced the values directly from those tables.
The paper is significant in the history of statistics for at least three reasons. First, it formalized a multivariate classification method that remains a baseline in machine learning textbooks. Second, it was an early application of matrix algebra to a biological problem and helped establish multivariate statistics as a coherent discipline. Third, the dataset itself escaped the original paper and went on to a life of its own in computing.
Despite its reputation for clean simplicity, the Iris dataset distributed in software libraries and online repositories contains a small number of transcription errors that have propagated for decades. The most thorough investigation is by James C. Bezdek and colleagues in their 1999 paper "Will the real iris data please stand up?" published in IEEE Transactions on Fuzzy Systems [5].
Bezdek and his coauthors compared the version of the dataset in the UCI Machine Learning Repository against the data as printed in Fisher's 1936 paper. They found that two rows of the UCI version differ from Fisher's printed values, specifically the 35th and 38th rows of Iris setosa. The errors involve a small number of digits and do not change the gross structure of the data, but they are real and reproducible.
The consequence is that there is no single canonical version of the dataset. The version in the UCI repository, which has been the de facto standard since the late 1980s, is slightly different from Anderson's original. The version supplied by scikit-learn follows the UCI version. The version in R's built-in iris data frame matches Fisher's 1936 paper and differs from the UCI version in a couple of cells. Bezdek et al. published the corrected values, and a small number of researchers have since used those, but most practitioners continue to use the inherited UCI form.
For pedagogical work the discrepancies are negligible. For careful comparison of papers that report exact accuracies on the dataset, however, the version used should be specified, and any claim of "100 percent accuracy" should be evaluated with the underlying data version in mind.
The Iris dataset is one of the first datasets that students of statistics and machine learning encounter. Its appeal is practical: it is small enough to fit on a screen, its features are interpretable in terms anyone can picture, and its three classes can be visualized clearly in two-dimensional projections. The dataset shows up in nearly every introductory course and book on the subject.
Common educational uses include:
| Use case | Algorithms commonly demonstrated |
|---|---|
| Supervised classification | Logistic regression, k-nearest neighbors, decision trees, random forests, support vector machines, naive Bayes, neural networks |
| Clustering | k-means, hierarchical clustering, Gaussian mixture models, DBSCAN |
| Dimensionality reduction | Principal component analysis (PCA), linear discriminant analysis, t-SNE, UMAP |
| Visualization | Pair plots, scatter plot matrices, box plots, parallel coordinates |
| Model evaluation | Cross-validation, stratified sampling, ROC curves for one-vs-rest |
| Hyperparameter tuning | Grid search, random search, validation curves |
In pandas and seaborn tutorials, the dataset commonly appears as sns.load_dataset("iris"), returning a tidy DataFrame ready for plotting. The pair plot of the four features colored by species is one of the most reproduced charts in data science teaching: it shows at a glance that setosa sits apart while versicolor and virginica overlap, motivating the move from univariate thresholds to multivariate methods.
Clustering demonstrations on the Iris dataset typically run k-means with k = 3 and observe that the recovered clusters approximately, but not exactly, match the species labels. The mismatch between unsupervised clusters and the supervised ground truth is itself a useful teaching point about the difference between data structure and labels.
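The mismatch is easy to quantify with a chance-corrected agreement score. A sketch, assuming scikit-learn is installed:

```python
# k-means with k = 3 on the raw features, compared against species labels.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Adjusted Rand index: 1.0 would mean the clusters match the species exactly,
# 0.0 would mean no better than chance. Iris lands clearly in between.
ari = adjusted_rand_score(y, kmeans.labels_)
```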
The Iris dataset is available in many forms across computing environments. The most widely used sources are listed below.
| Source | Access | Notes |
|---|---|---|
| UCI Machine Learning Repository | https://archive.ics.uci.edu/ml/datasets/iris | The de facto standard online source since 1988; contains the small transcription errors documented by Bezdek et al. |
| scikit-learn | from sklearn.datasets import load_iris | Mirrors UCI; returns NumPy arrays and a Bunch object |
| R | Built-in iris data frame in datasets package | Matches Fisher's 1936 paper; differs slightly from UCI |
| Seaborn | sns.load_dataset("iris") | Returns a pandas DataFrame with named columns |
| Kaggle | Multiple mirror datasets | Often used in beginner Kaggle notebooks |
| TensorFlow Datasets | tfds.load("iris") | Mirrors UCI |
| OpenML | Dataset 61 | Versioned with metadata |
The UCI Machine Learning Repository version, hosted by the University of California, Irvine, has been the standard reference since the late 1980s and is the source from which many other versions descend [6]. The scikit-learn distribution, in particular, copies the UCI values directly.
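In scikit-learn, the loader returns a Bunch object that bundles the measurements, labels, and metadata as attributes; a minimal sketch:

```python
# Inspect the structure of the scikit-learn distribution of the dataset.
from sklearn.datasets import load_iris

iris = load_iris()
shape = iris.data.shape             # 150 samples by 4 features
feature_names = iris.feature_names  # measurement names, with cm units
species = list(iris.target_names)   # the three class names
```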
The Iris dataset is among the most-cited datasets in the history of statistics and machine learning, although the citation footprint is hard to measure precisely because many papers use it without citing the underlying paper. It has appeared in tens of thousands of textbooks, tutorials, blog posts, course slide decks, and software documentation pages.
Its cultural status is comparable to that of "Hello, World!" in introductory programming. It serves a few purposes simultaneously. As a teaching example, it is small enough to inspect by hand, and its three-class structure makes it instantly accessible. As a sanity check for new software, running a classifier on the dataset and observing reasonable accuracy is a common smoke test. As a worked example in documentation, it appears in the front-page tutorials of scikit-learn, R, MATLAB, Weka, and many others.
The dataset has also acquired a kind of mythical status. Many practitioners can recite the species names, the four feature names, and the rough shape of the data from memory. It appears in interview questions, on cheat sheets, and in puzzle-style exercises in machine learning courses.
For all its pedagogical value, the Iris dataset has significant limitations as a benchmark or as a basis for any modern claim about machine learning performance.
Tiny size. With only 150 samples, the dataset is far too small to support meaningful comparisons between modern algorithms. Random variation in train and test splits dominates differences between methods, and no method's reported accuracy on the dataset can be considered statistically meaningful at the level of fractions of a percent.
Low dimensionality. Four features, all highly correlated, do not reflect the high-dimensional, sparse, and noisy data on which contemporary machine learning operates. Lessons learned on Iris do not always transfer to text, image, or genomic data.
Trivial separability. The fact that setosa is linearly separable means that almost any reasonable classifier will appear to perform well, which can be misleading for beginners trying to understand the differences between methods. The hard part of the dataset, separating versicolor from virginica, is also fairly small in scope.
Sampling structure. As noted above, the three species were not drawn from a single sampling design: Iris setosa and Iris versicolor came from the Gaspé Peninsula, while Iris virginica was collected in another region. For a teaching example this is irrelevant, but the dataset should not be mistaken for a clean experimental design.
Eugenics-publication context. Fisher's paper appeared in the Annals of Eugenics, a journal whose mid-20th-century history reflects the eugenics movement of the period. Fisher himself held views aligned with that movement, and he served on bodies that promoted eugenic policies. The journal was later renamed the Annals of Human Genetics and reformed in scope, but the original publication context has prompted academic discussion about how foundational statistical methods are taught and named, and has motivated some educators to seek alternative example datasets.
These criticisms do not mean the dataset is useless. They mean it should be used for what it is: a small, clean, historical example for illustrating techniques, rather than a benchmark for measuring real-world performance.
Several datasets have taken over particular roles once played by Iris.
| Dataset | Domain | Year | Role |
|---|---|---|---|
| MNIST | Handwritten digit images | 1998 | The new "hello world" for image classification at moderate scale |
| Fashion-MNIST | Clothing images | 2017 | Drop-in replacement for MNIST with harder tasks |
| CIFAR-10 | Small natural images | 2009 | Standard early benchmark for convolutional networks |
| Palmer Penguins | Penguin morphometrics | 2020 | Direct stylistic replacement for Iris with similar tabular structure |
| Wine, Boston Housing, Titanic | Various tabular | Various | Other small classic tabular datasets used in tutorials |
The Palmer Penguins dataset, assembled by ecologist Kristen Gorman and popularized in software form by Allison Horst, Alison Hill, and others, was explicitly designed as a tabular replacement for Iris [7]. It has three penguin species (Adelie, Gentoo, Chinstrap), four numerical body measurements, and a handful of additional features including sex and the island of collection. Its data are recent, the collection methods are documented, and it carries no eugenics-related publication history. Many machine learning courses and tutorials have switched to Palmer Penguins for introductory examples.
For higher-dimensional benchmarks, the lineage of "hello world" datasets has moved through MNIST, Fashion-MNIST, and CIFAR-10, each adding more features, more samples, or more difficulty. None of these replaces Iris in its original niche of small, four-feature, three-class tabular data, but in practice they cover the role that Iris played in early demonstrations of classifiers.
The Iris dataset remains a fixture of introductory teaching, library documentation, and quick-start examples, even as it has been displaced from serious benchmarking. A typical first scikit-learn tutorial still loads Iris, fits a logistic regression model, prints the confusion matrix, and reports accuracy in the high nineties. Course slides on PCA and LDA still illustrate the methods on Iris, because the resulting two-dimensional plots are particularly clean.
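That typical first tutorial fits in a dozen lines; a sketch, assuming scikit-learn is installed (split size and random seed are illustrative choices):

```python
# Load Iris, hold out a test set, fit logistic regression, and inspect
# accuracy and the confusion matrix.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)               # typically in the high nineties
cm = confusion_matrix(y_test, clf.predict(X_test)) # 3x3, nearly diagonal
```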
At the same time, modern teaching has begun to diversify. Some instructors prefer Palmer Penguins for its more recent provenance and absence of eugenics associations. Others reach immediately for higher-dimensional data such as MNIST or text datasets, skipping small tabular examples entirely. Both shifts reflect a broader change in machine learning education, away from a single canonical example and toward a wider range of datasets that better represent the variety of problems practitioners encounter.
The Iris dataset's continued role, then, is largely historical and pedagogical. It is a familiar shared reference point that lets explanations focus on the method rather than on the data, and it is small enough to debug by hand. For those uses it remains valuable, even as the warnings about its limitations become more standard in introductory texts.