Positive class
Last reviewed: May 11, 2026
Sources: 5 citations
Review status: Source-backed
Revision: v2 · 2,205 words
See also: Machine learning terms
In binary classification, the positive class is the label assigned to the outcome a model is trying to detect. The other label, used for everything else, is the negative class. The Google Machine Learning Glossary defines it directly: in a binary classifier that detects spam, the positive class is "spam," and the positive class in a cancer model might be "tumor." Which label counts as positive is not a property of the data but a design choice. Two engineers labeling the same dataset can swap which label is positive, and the metrics defined relative to that label (precision, recall, the layout of the confusion matrix, the ROC curve) flip their meaning along with that choice.
The positive class is written as the integer 1 in most code, with the negative class written as 0. In scikit-learn the convention shows up in the pos_label parameter: precision_score and recall_score default to pos_label=1, and roc_curve treats 1 as positive when pos_label is left unset and the labels are {0, 1} or {-1, 1}. The convention is a holdover from medical and statistical testing, where a positive test result means the suspected condition is present and a negative result means it is absent.
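As a rough illustration of how this parameter behaves, the sketch below scores the same predictions twice, once with the default positive label and once with pos_label=0. The toy labels are made up for the example; only precision_score, recall_score, and their pos_label argument come from scikit-learn.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]

# Default behaviour: label 1 is treated as the positive class.
precision_pos1 = precision_score(y_true, y_pred)
recall_pos1 = recall_score(y_true, y_pred)

# Same predictions, but label 0 is now treated as the positive class.
precision_pos0 = precision_score(y_true, y_pred, pos_label=0)
recall_pos0 = recall_score(y_true, y_pred, pos_label=0)

print(f"pos_label=1: precision={precision_pos1:.3f}, recall={recall_pos1:.3f}")
print(f"pos_label=0: precision={precision_pos0:.3f}, recall={recall_pos0:.3f}")
```

Nothing about the predictions changes between the two pairs of calls; only the label named as positive does, and the reported numbers move with it.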
The positive/negative vocabulary predates machine learning by about a century. It originated in epidemiology and diagnostic medicine, where a "positive" test indicated the presence of the disease under investigation, and migrated into statistical hypothesis testing, signal detection theory, and information retrieval. By the time the modern machine learning vocabulary stabilized in the 1990s, the same words already anchored the language of the confusion matrix, precision, recall, and receiver operating characteristic (ROC) analysis.
A positive class can be rare or common; what makes it positive is that the practitioner has decided this is the outcome the model exists to flag. A radiologist scoring mammograms cares about catching tumors, so tumor is positive. A search engine engineer cares about retrieving relevant documents, so relevant is positive. A fraud team cares about catching fraudulent transactions, so fraud is positive.
The choice of positive class is rarely arbitrary, but it is always deliberate. A useful default is to pick the class whose detection triggers an action. If the model predicts churn and a high score means the customer service team will call, then churn is the positive class. If the model predicts spam and a positive score sends the email to a junk folder, then spam is the positive class. Picking the wrong side does not break the math, but it makes every report harder to read because metrics describe the opposite of what people are tracking.
A second common rule of thumb is that the positive class is the rare or minority class. In imbalanced settings such as fraud detection, rare disease screening, or manufacturing defect detection, the event of interest is uncommon and the negative class dominates the data. The Google class-imbalanced datasets guide defines the majority class as the more common label and the minority class as the less common one, and most practitioners treat the minority class as positive. This convention lines up with both intuition (a positive screening result is the alarming one) and the math: precision and recall focus on the positive class, and that focus is most useful when the positive class is small.
A third consideration is downstream cost. False negatives and false positives usually carry different consequences, and naming the more dangerous error often determines which side is positive. In rare-disease screening a false negative (a sick patient told they are fine) is far worse than a false positive (a healthy patient sent for follow-up tests), so the disease is the positive class and recall is the headline metric. In a content recommendation system the balance runs the other way: a false negative (hiding the perfect post) is invisible to the user, while every false positive (showing the wrong post) is on display, even if any single one is mostly harmless. Precision becomes the headline metric and the positive class is "would engage."
A few practical patterns:
| Domain | Positive class | Why |
|---|---|---|
| Email spam filter | Spam | The category that triggers a quarantine or filter action |
| Medical screening | Disease present | The condition the test is designed to catch |
| Credit card fraud | Fraudulent transaction | The minority outcome with material financial cost |
| Manufacturing QC | Defect | The minority outcome that triggers a stop or inspection |
| Sentiment analysis | Positive sentiment (or negative, depending on use) | Whichever class the downstream system acts on |
| Churn prediction | Will churn | The event that triggers a retention campaign |
| Click prediction | Click | The event used to rank or bid |
| Information retrieval | Relevant document | The document the user wants to see |
When the two classes are roughly symmetric (cat versus dog, English versus French), the choice is mostly arbitrary and teams pick whatever makes the metrics easier to read. When the two classes are clearly asymmetric, the positive class is whatever the deployed system reacts to.
Once the positive class is fixed, every prediction made by the model falls into one of four buckets, summarized in the confusion matrix:
| Outcome | Definition |
|---|---|
| True positive (TP) | Actual positive, predicted positive. The model correctly detects the target. |
| False positive (FP) | Actual negative, predicted positive. The model raises a false alarm. Also called a type I error. |
| True negative (TN) | Actual negative, predicted negative. The model correctly leaves the case alone. |
| False negative (FN) | Actual positive, predicted negative. The model misses the target. Also called a type II error. |
The four cells are not symmetric in importance. Most reporting metrics use only the row or column tied to the positive class. A model that piles up true negatives while missing most of the actual positives (many false negatives relative to true positives) is a poor detector of the positive class, even though its accuracy looks fine.
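The four buckets can be counted directly from paired labels. The sketch below is a minimal plain-Python version; the helper name confusion_counts and the toy data are assumptions made for illustration, not part of any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN), treating `positive` as the positive label."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1   # correctly detected target
        elif predicted == positive:
            fp += 1   # false alarm
        elif actual == positive:
            fn += 1   # missed target
        else:
            tn += 1   # correctly left alone
    return tp, fp, tn, fn


# Made-up data: 10 actual positives followed by 90 actual negatives.
y_true = [1] * 10 + [0] * 90
# A hypothetical model that catches 2 of the 10 positives and raises 1 false alarm.
y_pred = [1] * 2 + [0] * 8 + [1] * 1 + [0] * 89

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)

print(tp, fp, tn, fn)    # 2 1 89 8
print(accuracy, recall)  # 0.91 0.2
```

Accuracy of 0.91 sits alongside a recall of 0.2, which is the pattern described above: plenty of true negatives, but most of the positives are missed.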
The positive class anchors most binary classification metrics. The formulas below all use the four counts from the confusion matrix.
Precision (positive predictive value) measures how trustworthy the model's positive predictions are. Google's classification crash course defines it as "the proportion of all the model's positive classifications that are actually positive," with the formula:
Precision = TP / (TP + FP)
Recall (sensitivity, true positive rate) measures how many actual positives the model finds. Google defines it as "the proportion of all actual positives that were classified correctly as positives."
Recall = TP / (TP + FN)
F1 score is the harmonic mean of precision and recall and gives a single number when both matter. It rewards models that are good at both.
F1 = 2 * (Precision * Recall) / (Precision + Recall)
False positive rate (FPR) is the share of actual negatives that the model wrongly flags as positive.
FPR = FP / (FP + TN)
Accuracy counts every correct prediction, both positive and negative.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Four of these five formulas change meaning when the positive class is swapped; accuracy is the exception, because it is symmetric in TP and TN and keeps its value. Precision for the original positive class becomes "negative predictive value" for the swapped one. Recall becomes specificity, and FPR becomes the false negative rate. F1 changes value, often by a large margin if the classes are imbalanced. This is why reports should always state which label is treated as positive; numbers that look identical can describe opposite results.
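The effect of a swap can be checked numerically. The sketch below applies the formulas above to one set of made-up counts, then swaps the roles of the two classes (TP trades places with TN, FP with FN). The helper name binary_metrics is hypothetical, and the code assumes no denominator is zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Metrics from the four confusion-matrix counts (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * (precision * recall) / (precision + recall),
        "fpr": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),  # negative predictive value
    }


# Made-up counts for an imbalanced problem.
tp, fp, tn, fn = 2, 1, 89, 8
original = binary_metrics(tp, fp, tn, fn)

# Swapping the positive class: old TN become TP, old FN become FP, and vice versa.
swapped = binary_metrics(tp=tn, fp=fn, tn=tp, fn=fp)

for name in ("precision", "recall", "f1", "fpr", "accuracy"):
    print(f"{name:>9}: original={original[name]:.3f}  swapped={swapped[name]:.3f}")
```

With these counts, accuracy is 0.91 under both labelings, while F1 moves from about 0.31 to about 0.95. The original precision reappears as the swapped labeling's negative predictive value, and the original recall reappears as its specificity.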
The ROC curve plots recall (TPR) against FPR for every threshold the model could use, and the precision-recall curve plots precision against recall. Both curves are defined relative to the positive class. A model with a strong ROC curve can have a weak precision-recall curve on the same data when the positive class is rare, because even a low FPR applied to a very large negative class produces enough false positives to swamp the true positives, and precision collapses.
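A short worked calculation makes the last point concrete. The sketch below fixes one operating point on a ROC curve (TPR 0.90, FPR 0.05, both chosen arbitrarily for illustration) and derives the precision it implies for two different class sizes. The helper name precision_at is hypothetical.

```python
def precision_at(tpr, fpr, n_pos, n_neg):
    """Precision implied by a (TPR, FPR) operating point and the class sizes."""
    tp = tpr * n_pos   # expected true positives
    fp = fpr * n_neg   # expected false positives
    return tp / (tp + fp)


# Same ROC operating point, two class balances (made-up sizes).
balanced = precision_at(0.90, 0.05, n_pos=100, n_neg=100)
imbalanced = precision_at(0.90, 0.05, n_pos=100, n_neg=10_000)

print(f"1:1 classes   -> precision {balanced:.3f}")
print(f"1:100 classes -> precision {imbalanced:.3f}")
```

The ROC point is identical in both cases, yet precision falls from roughly 0.95 to roughly 0.15 once negatives outnumber positives a hundred to one: 5 percent of 10,000 negatives is 500 false positives against only 90 true positives.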
Most classifiers in practice output a probability rather than a hard label. For a binary classifier this is a single number between 0 and 1, interpreted as the probability that the input belongs to the positive class. In scikit-learn this is accessed through predict_proba(X)[:, 1]; the columns follow the order of the estimator's classes_ attribute, so with labels 0 and 1 the second column corresponds to the positive label. The model produces a class prediction by comparing the probability against a threshold (usually 0.5 by default), and predictions above the threshold are labeled positive.
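The sketch below shows one way to pull positive-class probabilities out of a scikit-learn estimator without hard-coding the column index. The tiny one-feature dataset and the choice of LogisticRegression are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up one-feature dataset; label 1 is the positive class.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Columns of predict_proba follow clf.classes_, so look the positive label up
# instead of assuming it is always column 1.
positive_index = list(clf.classes_).index(1)
scores = clf.predict_proba(X)[:, positive_index]

# Turn probabilities into hard labels with an explicit threshold.
threshold = 0.5
hard_labels = (scores > threshold).astype(int)

print(clf.classes_)
print(np.round(scores, 3))
print(hard_labels)
```

Looking the column up through classes_ matters mostly when labels are strings or are not 0 and 1, where "the second column" is simply whichever label sorts last.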
Moving the threshold trades precision against recall. A higher threshold demands more confidence to flag a positive, which lowers recall and usually raises precision. A lower threshold flags more cases, which raises recall and usually lowers precision. The right threshold depends on the cost of the two types of error, and in production it is often not 0.5.
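The trade-off can be seen by sweeping a threshold over a handful of scored examples. The sketch below uses made-up scores and labels; the helper name precision_recall_at is hypothetical, and returning 0.0 when a denominator is empty is a convention chosen here for simplicity.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when scores above `threshold` are labeled positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
    # Convention for this sketch: report 0.0 when there is nothing to divide by.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up positive-class scores and true labels (1 = positive).
scores = [0.95, 0.80, 0.70, 0.55, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

results = {t: precision_recall_at(scores, labels, t) for t in (0.25, 0.50, 0.75)}
for t, (precision, recall) in results.items():
    print(f"threshold={t:.2f}  precision={precision:.3f}  recall={recall:.3f}")
```

On this toy data, moving the threshold from 0.25 to 0.75 lifts precision from about 0.67 to 1.0 while recall drops from 1.0 to 0.5.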
In most real applications the positive class is much smaller than the negative class. Credit card fraud, rare disease screening, manufacturing defects, click-through prediction, and intrusion detection all sit at imbalance ratios where the positive class is well under five percent of the data, sometimes under one tenth of one percent. The Google ML Glossary calls a dataset "class-imbalanced" when the total number of labels of each class differs significantly.
Imbalance is not a bug in the data; it is a feature of the world being modeled. A naive classifier on a 1:99 problem can hit 99 percent accuracy by predicting "negative" for everything, with zero true positives. Accuracy is therefore close to useless beyond about a 10:1 ratio, and metrics that emphasize the positive class (precision, recall, F1, area under the precision-recall curve) take over.
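The 1:99 arithmetic is easy to reproduce. The sketch below builds a made-up dataset with one positive per hundred examples and scores an "always negative" baseline.

```python
# Made-up 1:99 dataset: one positive, ninety-nine negatives.
y_true = [1] * 1 + [0] * 99

# Naive baseline that predicts "negative" for everything.
y_pred = [0] * len(y_true)

correct = sum(actual == predicted for actual, predicted in zip(y_true, y_pred))
accuracy = correct / len(y_true)

true_positives = sum(a == 1 and p == 1 for a, p in zip(y_true, y_pred))
actual_positives = sum(a == 1 for a in y_true)
recall = true_positives / actual_positives

print(accuracy)  # 0.99
print(recall)    # 0.0
```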
Several techniques are used to give the model a fair shot at the positive class. Resampling rebalances the training set: random undersampling drops negatives, random oversampling duplicates positives, and SMOTE generates synthetic positives by interpolating between minority examples. Cost-sensitive learning weights the loss so that mistakes on the positive class hurt more, which is the strategy behind class_weight='balanced' in scikit-learn; focal loss in deep learning follows a related idea, down-weighting easy, well-classified examples so that hard ones dominate the loss. The Google class-imbalanced datasets guide pairs downsampling with upweighting so the model still sees enough minority examples per batch without biasing the prediction distribution.
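Two of these ideas are small enough to sketch in plain Python. The first helper computes class weights as n_samples / (n_classes * class_count), the heuristic scikit-learn documents for its 'balanced' mode; the second performs random oversampling by duplicating positive indices until the classes match. Both helper names and the toy labels are assumptions for illustration, and the oversampler assumes the positive class is the smaller one.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(labels, positive=1, seed=0):
    """Return indices with positives duplicated at random until classes match.

    Assumes the positive class is the minority; otherwise nothing is added.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == positive]
    neg_idx = [i for i, y in enumerate(labels) if y != positive]
    extra = [rng.choice(pos_idx) for _ in range(len(neg_idx) - len(pos_idx))]
    return pos_idx + extra + neg_idx


# Made-up labels: 10 positives, 990 negatives.
labels = [1] * 10 + [0] * 990

weights = balanced_class_weights(labels)
resampled = random_oversample(labels)
resampled_counts = Counter(labels[i] for i in resampled)

print(weights)
print(resampled_counts)
```

With this 1 percent positive rate, each positive example receives a weight of 50 against roughly 0.5 for each negative, and the oversampled index list holds 990 examples of each class.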
All of these methods center the positive class. Choosing which side is positive determines what the model is being asked to be good at, which examples are duplicated or weighted, and which curve a team will watch during model selection.
The positive/negative framing extends into multi-class classification through the one-vs-rest (OvR) reduction. For each class, a binary classifier is trained that treats that class as positive and the union of every other class as negative. With ten classes, the same training example serves as positive for one classifier and as negative for the other nine. Per-class precision, recall, and F1 are reported by treating each class in turn as the positive one, then aggregated with macro, weighted, or micro averaging depending on the use case. The average parameter on scikit-learn's classification metrics controls this aggregation.
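The evaluation side of this reduction can be sketched without any library: each class takes a turn as the positive one, and the per-class numbers are then averaged. The helper name and the three-class toy labels below are made up for illustration, and only macro and micro averaging of precision are shown.

```python
def per_class_precision_recall(y_true, y_pred):
    """Treat each class in turn as positive and all other classes as negative."""
    stats = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(a == cls and p == cls for a, p in zip(y_true, y_pred))
        fp = sum(a != cls and p == cls for a, p in zip(y_true, y_pred))
        fn = sum(a == cls and p != cls for a, p in zip(y_true, y_pred))
        stats[cls] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
            "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        }
    return stats


# Made-up three-class labels.
y_true = ["cat", "cat", "dog", "dog", "bird", "bird", "bird", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat", "bird", "cat"]

stats = per_class_precision_recall(y_true, y_pred)

# Macro average: unweighted mean of the per-class values.
macro_precision = sum(s["precision"] for s in stats.values()) / len(stats)
# Micro average: pool the counts across classes first, then apply the formula.
total_tp = sum(s["tp"] for s in stats.values())
total_fp = sum(s["fp"] for s in stats.values())
micro_precision = total_tp / (total_tp + total_fp)

for cls, s in stats.items():
    print(f"{cls:>4}: precision={s['precision']:.3f} recall={s['recall']:.3f}")
print(f"macro precision={macro_precision:.3f}  micro precision={micro_precision:.3f}")
```

Macro averaging gives every class equal say regardless of size, while micro averaging pools the counts and so is dominated by the larger classes.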
A few traps catch new practitioners. The positive class is not necessarily the more common one; in most useful problems it is the rare one. A positive prediction is not necessarily correct; correctness requires the actual label to also be positive. A false positive does not mean the model labeled something as false; it means the model labeled a negative case as positive. A high accuracy does not imply a high recall on the positive class; on a 1:99 problem it can hide a recall of zero. A model that looks great on one labeling can look bad after a label swap, even though nothing about its behavior changed.
Reports that compare models from different teams should always state explicitly which label is treated as positive. Otherwise precision and recall can be reported in either direction without anyone noticing.
Imagine you are looking through a basket of mostly apples to find the one rotten apple. The rotten apple is the positive class, because that is what you are trying to find. The good apples are the negative class, because they are everything else.
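Now think about how to score yourself. Picking out the rotten apple is a true positive. Throwing away a good apple because you thought it was rotten is a false positive. Leaving a good apple in the basket is a true negative. Missing the rotten apple and leaving it in the basket is a false negative.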
The positive class is just a name. It is the thing you actually care about catching.