# Positive class

> Source: https://aiwiki.ai/wiki/positive_class
> Updated: 2026-06-27
> Categories: Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

In [binary classification](/wiki/binary_classification), the **positive class** is the class a model is testing for: the outcome it exists to detect, such as "spam," "fraud," or "tumor." The other label, covering everything else, is the [negative class](/wiki/negative_class).[1] The Google Machine Learning Glossary defines it as "the class you are testing for," and notes that "the positive class in a cancer model might be 'tumor.'"[1] The positive class is the label referenced by [true positives](/wiki/true_positive) and [false positives](/wiki/false_positive), and it is the class that anchors precision, recall, and the rest of the binary classification metrics.

The positive class is a design choice, not a property of the data. Two engineers labeling the same dataset can swap which label is positive, and every downstream metric (precision, recall, the [confusion matrix](/wiki/confusion_matrix), the ROC curve) flips its meaning along with that choice. By convention the positive class is the outcome of interest, and it is often (though not always) the rarer or higher-cost class.

The label is written as the integer 1 in most code, with the negative class written as 0. In scikit-learn this default appears in the `pos_label` parameter on metrics such as `precision_score`, `recall_score`, and `roc_curve`; when targets are in {0, 1} or {-1, 1}, `pos_label` defaults to 1.[5] The convention is a holdover from medical and statistical testing, where a positive test result means the suspected condition is present and a negative result means it is absent.[4]
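The role of a `pos_label`-style argument can be sketched in plain Python. The `precision` helper and the toy labels below are hypothetical, written only for illustration; they are not scikit-learn's implementation, though they follow the same idea of naming the positive label explicitly.

```python
def precision(y_true, y_pred, pos_label=1):
    """Precision with respect to pos_label (illustrative helper, not a library API)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == pos_label and t == pos_label)
    fp = sum(1 for t, p in pairs if p == pos_label and t != pos_label)
    # Return 0.0 when nothing was predicted positive, to avoid dividing by zero.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy data: the same predictions scored from two points of view.
y_true = ["spam", "ham", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham"]

precision_spam = precision(y_true, y_pred, pos_label="spam")  # 1 TP, 1 FP
precision_ham = precision(y_true, y_pred, pos_label="ham")    # 3 TP, 1 FP

print(precision_spam, precision_ham)
```

The predictions never change; only the label named as positive does, and the reported precision moves from 0.5 to 0.75.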

## What is the positive class?

The positive class is the label assigned to the event a binary classifier is built to flag, while the negative class is assigned to everything else. The Google Machine Learning Glossary puts the contrast plainly: "one class is termed positive and the other is termed negative. The positive class is the thing or event that the model is testing for and the negative class is the other possibility."[1] For a spam detector the positive class is spam; for a cancer model it might be "tumor" and the negative class "not tumor."[1]

A positive class can be rare or common; what makes it positive is that the practitioner has decided this is the outcome the model exists to flag. A radiologist scoring mammograms cares about catching tumors, so tumor is positive. A search engine engineer cares about retrieving relevant documents, so relevant is positive.[4] A fraud team cares about catching fraudulent transactions, so fraud is positive.

## Where does the positive/negative convention come from?

The positive/negative vocabulary predates [machine learning](/wiki/machine_learning) by about a century. It originated in epidemiology and diagnostic medicine, where a "positive" test indicated the presence of the disease under investigation, and migrated into statistical hypothesis testing, signal detection theory, and information retrieval.[4] By the time the modern machine learning vocabulary stabilized in the 1990s, the same words already anchored the language of the confusion matrix, [precision](/wiki/precision), [recall](/wiki/recall), and receiver operating characteristic (ROC) analysis.

## How do you choose the positive class?

The choice of positive class is rarely arbitrary, but it is always deliberate. A useful default is to pick the class whose detection triggers an action. If the model predicts churn and a high score means the customer service team will call, then churn is the positive class. If the model predicts spam and a positive score sends the email to a junk folder, then spam is the positive class. Picking the wrong side does not break the math, but it makes every report harder to read because metrics describe the opposite of what people are tracking.

A second common rule of thumb is that the positive class is the rare or [minority class](/wiki/minority_class). In imbalanced settings such as fraud detection, rare disease screening, or manufacturing defect detection, the event of interest is uncommon and the negative class dominates the data. The Google class-imbalanced datasets guide defines the majority class as the more common label and the minority class as the less common one, and most practitioners treat the minority class as positive.[3] This convention lines up with both intuition (a positive screening result is the alarming one) and the math: precision and recall focus on the positive class, and that focus is most useful when the positive class is small.

A third consideration is downstream cost. False negatives and false positives usually carry different consequences, and naming the more dangerous error often determines which side is positive. In rare-disease screening a [false negative](/wiki/false_negative) (a sick patient told they are fine) is far worse than a false positive (a healthy patient sent for follow-up tests), so the disease is the positive class and recall is the headline metric.[4] In a content recommendation system the balance runs the other way: a false negative (hiding the perfect post) is invisible to the user, while a false positive (showing the wrong post) is individually minor but seen directly, so precision becomes the headline metric and the positive class is "would engage."

A few practical patterns:

| Domain | Positive class | Why |
| --- | --- | --- |
| Email spam filter | Spam | The category that triggers a quarantine or filter action |
| Medical screening | Disease present | The condition the test is designed to catch |
| Credit card fraud | Fraudulent transaction | The minority outcome with material financial cost |
| Manufacturing QC | Defect | The minority outcome that triggers a stop or inspection |
| Sentiment analysis | Positive sentiment (or negative, depending on use) | Whichever class the downstream system acts on |
| Churn prediction | Will churn | The event that triggers a retention campaign |
| Click prediction | Click | The event used to rank or bid |
| Information retrieval | Relevant document | The document the user wants to see |

When the two classes are roughly symmetric (cat versus dog, English versus French), the choice is mostly arbitrary and teams pick whatever makes the metrics easier to read. When the two classes are clearly asymmetric, the positive class is whatever the deployed system reacts to.

## What are the four outcomes tied to the positive class?

Once the positive class is fixed, every prediction made by the model falls into one of four buckets, summarized in the confusion matrix:[1]

| Outcome | Definition |
| --- | --- |
| True positive (TP) | Actual positive, predicted positive. The model correctly detects the target. |
| False positive (FP) | Actual negative, predicted positive. The model raises a false alarm. Also called a type I error. |
| True negative (TN) | Actual negative, predicted negative. The model correctly leaves the case alone. |
| False negative (FN) | Actual positive, predicted negative. The model misses the target. Also called a type II error. |

The four cells are not symmetric in importance. Most reporting metrics use only the row or column tied to the positive class. A model with many true negatives but also many false negatives is a poor detector of the positive class, even though its accuracy can look fine.
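The four buckets can be counted directly from paired labels. The sketch below is a minimal illustration; the `confusion_counts` helper and the toy labels are assumptions made for this example rather than part of any library.

```python
def confusion_counts(y_true, y_pred, pos_label=1):
    """Return (TP, FP, TN, FN) relative to pos_label (illustrative helper)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == pos_label:
            if actual == pos_label:
                tp += 1  # correctly detected the target
            else:
                fp += 1  # false alarm
        else:
            if actual == pos_label:
                fn += 1  # missed the target
            else:
                tn += 1  # correctly left the case alone
    return tp, fp, tn, fn


y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)
```

On this toy data the counts are 2 true positives, 1 false positive, 4 true negatives, and 1 false negative.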

## How does the positive class relate to precision and recall?

The positive class anchors most binary classification metrics. The formulas below all use the four counts from the confusion matrix.

**Precision** (positive predictive value) measures how trustworthy the model's positive predictions are. Google's classification crash course defines it as "the proportion of all the model's positive classifications that are actually positive," with the formula:[2]

Precision = TP / (TP + FP)

**Recall** (sensitivity, true positive rate) measures how many actual positives the model finds. Google defines it as "the proportion of all actual positives that were classified correctly as positives."[2]

Recall = TP / (TP + FN)

**F1 score** is the harmonic mean of precision and recall and gives a single number when both matter. It rewards models that are good at both, and Google notes it is "preferable to accuracy for class-imbalanced datasets."[2]

F1 = 2 * (Precision * Recall) / (Precision + Recall)

**False positive rate (FPR)** is the share of actual negatives that the model wrongly flags as positive.[2]

FPR = FP / (FP + TN)

**Accuracy** counts every correct prediction, both positive and negative.[2]

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Four of the five formulas change meaning when the positive class is swapped; accuracy is the exception, because it treats the two classes symmetrically and keeps the same value. Precision computed after the swap equals the negative predictive value under the original labeling, and recall after the swap equals the original specificity.[4] F1 changes value, often by a large margin if the classes are imbalanced. This is why reports should always state which label is treated as positive; numbers that look identical can hide opposite results.
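The formulas above, and the effect of swapping the positive class, can be checked on a small set of counts. The counts and the helper names `metrics` and `swap_positive` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def metrics(tp, fp, tn, fn):
    """Compute the five metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * (precision * recall) / (precision + recall),
        "fpr": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def swap_positive(tp, fp, tn, fn):
    """Relabel the classes: old negatives become the positive class."""
    # New TP = old TN, new FP = old FN, new TN = old TP, new FN = old FP.
    return tn, fn, tp, fp


counts = (8, 2, 85, 5)  # assumed toy counts: TP, FP, TN, FN
original = metrics(*counts)
swapped = metrics(*swap_positive(*counts))

for name in original:
    print(f"{name:9s} original={original[name]:.3f} swapped={swapped[name]:.3f}")
```

With these counts, precision moves from 0.800 to about 0.944 and F1 from about 0.696 to about 0.960, while accuracy stays at 0.930 because it counts correct predictions on both classes.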

The ROC curve plots recall (TPR) against FPR for every threshold the model could use, and the precision-recall curve plots precision against recall. Both curves are defined relative to the positive class. A model with a strong ROC curve for the original positive class can have a weak precision-recall curve on the same data when the classes are imbalanced: with negatives vastly outnumbering positives, even a low FPR can produce more false positives than true positives, so precision collapses.
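A short calculation shows why the two curves can disagree. The sketch below holds TPR and FPR fixed (one point on a ROC curve) and varies the share of positives in the data; the operating point of 0.90 TPR and 0.05 FPR is an assumed example, not a figure from any source.

```python
def precision_at(prevalence, tpr, fpr):
    """Precision implied by a fixed (TPR, FPR) point at a given positive-class share."""
    true_pos = tpr * prevalence          # fraction of all cases that are true positives
    false_pos = fpr * (1 - prevalence)   # fraction of all cases that are false positives
    return true_pos / (true_pos + false_pos)


tpr, fpr = 0.90, 0.05  # assumed operating point
for prevalence in (0.5, 0.1, 0.01, 0.001):
    p = precision_at(prevalence, tpr, fpr)
    print(f"positives={prevalence:>6.1%}  precision={p:.3f}")
```

The ROC point is identical in every row, yet precision falls from roughly 0.95 on balanced data to under 0.02 when positives are one in a thousand.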

## How does a probability output become a positive prediction?

Most classifiers in practice output a probability rather than a hard label. For a binary classifier this is a single number between 0 and 1, interpreted as the probability that the input belongs to the positive class. In scikit-learn this is accessed through `predict_proba(X)[:, 1]`; the columns follow the order of the estimator's `classes_` attribute, so with labels in {0, 1} the second column corresponds to the positive label.[5] The model produces a class prediction by comparing the probability against a threshold (usually 0.5 by default), and predictions above the threshold are labeled positive.

Moving the threshold trades precision against recall. A higher threshold demands more confidence to flag a positive, which typically raises precision and never raises recall. A lower threshold flags more cases, which raises recall and usually lowers precision. The right threshold depends on the cost of the two types of error, and in production it is often not 0.5.
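The trade-off can be seen by sweeping a threshold over a handful of scores. The scores, labels, and helper name below are made up for illustration, and the sketch treats a score at or above the threshold as a positive prediction, which is one common convention.

```python
# Assumed toy data: model scores for the positive class and the true labels.
scores = [0.95, 0.80, 0.70, 0.55, 0.45, 0.30, 0.20, 0.10]
y_true = [1, 1, 0, 1, 0, 1, 0, 0]


def precision_recall(threshold):
    """Precision and recall when scores >= threshold are labeled positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.25, 0.50, 0.75):
    precision, recall = precision_recall(threshold)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this data a threshold of 0.75 gives precision 1.00 with recall 0.50, while 0.25 gives recall 1.00 with precision about 0.67.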

## Why is the positive class usually the minority class?

In most real applications the positive class is much smaller than the negative class. Credit card fraud, rare disease screening, manufacturing defects, click-through prediction, and intrusion detection all sit at imbalance ratios where the positive class is well under five percent of the data, sometimes under one tenth of one percent. The Google Machine Learning Glossary defines a class-imbalanced dataset as one "in which the total number of labels of each class differs significantly," giving the example of a binary dataset with 1,000,000 negative labels and 10 positive labels.[1]

Imbalance is not a bug in the data; it is a feature of the world being modeled. A naive classifier on a 1:99 problem can hit 99 percent accuracy by predicting "negative" for everything, with zero true positives. Google warns that "a model that predicts negative 100% of the time would score 99% on accuracy, despite being useless."[2] Accuracy therefore says little once the imbalance passes roughly 10:1, and metrics that emphasize the positive class (precision, recall, F1, area under the precision-recall curve) take over.
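The 1:99 case is easy to reproduce. This sketch builds an assumed dataset of 10 positives and 990 negatives and scores a classifier that always predicts negative.

```python
# Assumed toy dataset: 10 positives and 990 negatives (a 1:99 ratio).
y_true = [1] * 10 + [0] * 990
y_pred = [0] * len(y_true)  # a "classifier" that always predicts negative

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / sum(y_true)

print(f"accuracy={accuracy:.2f}  recall={recall:.2f}")
```

Accuracy comes out at 0.99 while recall on the positive class is 0.00, which is the gap the positive-class metrics are meant to expose.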

Several techniques are used to give the model a fair shot at the positive class. Resampling rebalances the training set: random undersampling drops negatives, random oversampling duplicates positives, and [SMOTE](/wiki/smote) generates synthetic positives by interpolating between minority examples. Cost-sensitive learning weights the loss so that mistakes on the positive class hurt more, which is the strategy behind `class_weight='balanced'` in scikit-learn;[5] [focal loss](/wiki/focal_loss) in deep learning is a related idea that reshapes the loss to down-weight easy, well-classified examples. The Google class-imbalanced datasets guide pairs downsampling with upweighting so the model still sees enough minority examples per batch without biasing the prediction distribution.[3] See [class imbalance](/wiki/class_imbalance) for the full treatment.
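Cost-sensitive weighting can be sketched with the inverse-frequency heuristic that scikit-learn documents for its "balanced" mode, `n_samples / (n_classes * count)`. The helper name `balanced_class_weights` and the toy labels are assumptions for this example; the code reimplements the formula in plain Python rather than calling the library.

```python
from collections import Counter


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Assumed toy labels: 10 positives and 990 negatives.
y = [1] * 10 + [0] * 990
weights = balanced_class_weights(y)

print(weights)

# Each class ends up contributing the same total weight to the loss.
total_positive_weight = 10 * weights[1]
total_negative_weight = 990 * weights[0]
print(total_positive_weight, total_negative_weight)
```

Each positive example is weighted 50.0 and each negative about 0.505, so the two classes carry equal total weight even though negatives outnumber positives 99 to 1.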

All of these methods center the positive class. Choosing which side is positive determines what the model is being asked to be good at, which examples are duplicated or weighted, and which curve a team will watch during model selection.

## How does the positive class extend to multiclass problems?

The positive/negative framing extends into [multi-class classification](/wiki/multi-class_classification) through the one-vs-rest (OvR) reduction. For each class, a binary classifier is trained that treats that class as positive and the union of every other class as negative. With ten classes, the same training example serves as positive for one classifier and as negative for the other nine. Per-class precision, recall, and F1 are reported by treating each class in turn as the positive one, then aggregated with macro, weighted, or micro averaging depending on the use case. The `average` parameter on scikit-learn's classification metrics controls this aggregation.[5]
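Per-class scoring under the one-vs-rest view can be sketched by treating each label in turn as the positive class. The labels, predictions, and helper name below are invented for illustration, and only macro averaging (a plain mean over classes) is shown.

```python
def one_vs_rest_scores(y_true, y_pred, label):
    """Precision and recall with `label` as positive and every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]

labels = sorted(set(y_true))
per_class = {label: one_vs_rest_scores(y_true, y_pred, label) for label in labels}

# Macro averaging: an unweighted mean of the per-class scores.
macro_precision = sum(p for p, _ in per_class.values()) / len(labels)
macro_recall = sum(r for _, r in per_class.values()) / len(labels)

for label, (p, r) in per_class.items():
    print(f"{label:5s} precision={p:.2f} recall={r:.2f}")
print(f"macro precision={macro_precision:.2f} recall={macro_recall:.2f}")
```

Macro averaging gives every class equal say regardless of size; weighted and micro averaging would instead let larger classes count for more.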

## What are common confusions about the positive class?

A few traps catch new practitioners. The positive class is not necessarily the more common one; in many practical problems it is the rare one. A positive prediction is not necessarily correct; correctness requires the actual label to also be positive. A false positive does not mean the model labeled something as false; it means the model labeled a negative case as positive. A high accuracy does not imply a high recall on the positive class; on a 1:99 problem it can hide a recall of zero. A model that looks great on one labeling can look bad after a label swap, even though nothing about its behavior changed.

Reports that compare models from different teams should always state explicitly which label is treated as positive. Otherwise precision and recall can be reported in either direction without anyone noticing.

## Explain like I'm 5 (ELI5)

Imagine you are looking through a basket of mostly apples to find the one rotten apple. The rotten apple is the **positive class**, because that is what you are trying to find. The good apples are the **negative class**, because they are everything else.

Now think about how to score yourself:

* If you find the rotten apple and say "rotten," that is a **true positive**.
* If you point at a good apple and say "rotten," that is a **false positive** (a false alarm).
* If you miss the rotten apple and say it is good, that is a **false negative** (you missed the bad one).
* If you correctly leave the good apples alone, those are **true negatives**.

The positive class is just a name. It is the thing you actually care about catching.

## References

1. Google for Developers, *Machine Learning Glossary: Fundamentals*. Definitions of positive class, negative class, binary classification, true/false positive, true/false negative, class imbalance. https://developers.google.com/machine-learning/glossary/fundamentals
2. Google for Developers, *Machine Learning Crash Course: Classification: Accuracy, recall, precision, and related metrics*. Formulas for accuracy, recall, precision, FPR, F1. https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall
3. Google for Developers, *Machine Learning Crash Course: Datasets: Class-imbalanced datasets*. Definitions of majority/minority class, downsampling, upweighting. https://developers.google.com/machine-learning/crash-course/overfitting/imbalanced-datasets
4. Wikipedia, *Binary classification*. History of the positive/negative convention; sensitivity, specificity, precision, recall; domain conventions in medicine and information retrieval. https://en.wikipedia.org/wiki/Binary_classification
5. scikit-learn developers, *Metrics and scoring*. `pos_label` parameter, `predict_proba`, `average` parameter for multiclass metrics. https://scikit-learn.org/stable/modules/model_evaluation.html
