The true positive rate (TPR) is the proportion of actual positive cases that a classifier correctly identifies as positive. It is one of the most widely used evaluation metrics in binary classification, medical diagnostics, signal detection, and information retrieval, and it appears under several names depending on the field. In statistics and medicine it is called sensitivity; in machine learning and information retrieval it is called recall; in radar and psychophysics it is called the hit rate or probability of detection. All of these refer to the same quantity: the conditional probability that a positive case is correctly flagged.
Formally, TPR answers the question: "Of all the cases that are truly positive, how many did the system catch?" It is computed as the number of true positives (TP) divided by the total number of actual positives, which is the sum of true positives and false negatives (FN). TPR is the y-axis of the ROC curve and the recall component of the F1 score.
Imagine you are playing hide and seek with ten friends. They all hide, and your job is to find them. If you find seven of them and miss three, your true positive rate is 7 out of 10, or 70 percent. The TPR does not care how many bushes you searched or how many times you yelled "Found you!" at a squirrel. It only cares about the friends who were actually hiding and whether you found them. A high TPR means you are good at finding the people who are really there. A low TPR means a lot of your friends are still hiding and waiting to be found.
The true positive rate is defined as the conditional probability of a positive prediction given that the true class is positive:
TPR = TP / (TP + FN)
where TP is the number of true positives (actual positives predicted positive) and FN is the number of false negatives (actual positives predicted negative).
In probability notation:
TPR = P(predicted positive | actual positive)
TPR ranges from 0 to 1. A value of 1 means every positive case was caught; a value of 0 means every positive case was missed. Because TPR conditions only on the actual positive class, it is mathematically independent of the number of true negatives and false positives, and therefore independent of class prevalence in the test set. This is why it remains stable under class imbalance, although interpreting it correctly still requires looking at the false positive rate or precision alongside it.
TPR is also the complement of the false negative rate (FNR):
TPR = 1 - FNR
where FNR = FN / (TP + FN).
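These identities are easy to sanity-check in a few lines of Python. The sketch below simply reuses the hide-and-seek counts from earlier (7 friends found, 3 missed) and is illustrative only.

```python
# Minimal sketch: TPR and FNR from raw counts.
# Counts reuse the hide-and-seek example above: 7 friends found, 3 missed.
tp, fn = 7, 3

tpr = tp / (tp + fn)  # true positive rate (sensitivity, recall)
fnr = fn / (tp + fn)  # false negative rate

print(f"TPR = {tpr:.2f}")              # 0.70
print(f"FNR = {fnr:.2f}")              # 0.30
print(f"TPR + FNR = {tpr + fnr:.2f}")  # always 1.00
```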
TPR is one of the rare statistics that was invented independently in several disciplines, each of which gave it a different name. The table below summarizes the common synonyms.
| Term | Field | Typical context |
|---|---|---|
| True positive rate (TPR) | Machine learning, statistics | ROC analysis, classifier evaluation |
| Sensitivity | Epidemiology, medicine | Diagnostic test performance |
| Recall | Information retrieval, NLP | Search, document classification |
| Hit rate | Signal detection theory, psychophysics | Stimulus detection experiments |
| Probability of detection (Pd) | Radar, sonar, engineering | Target detection systems |
| Statistical power (1 - beta) | Hypothesis testing | Likelihood of detecting a true effect |
The equivalence is exact. A 0.92 sensitivity in a clinical study, a 0.92 recall in a search engine, and a 0.92 hit rate in a radar trial all describe the same fraction of true positives that were correctly flagged.
TPR is one of several metrics derived from the confusion matrix, the 2x2 table that tabulates the four possible outcomes of a binary classification task. The matrix is conventionally laid out with actual classes as rows and predicted classes as columns:
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | True positive (TP) | False negative (FN) |
| Actual negative | False positive (FP) | True negative (TN) |
TPR is computed by dividing the TP cell by the sum of its row (TP + FN). In other words, TPR is a row-wise statistic of the matrix: it normalizes by the actual positive count. Several related metrics use the same matrix but normalize differently:
| Metric | Formula | What it normalizes by |
|---|---|---|
| True positive rate (TPR), recall, sensitivity | TP / (TP + FN) | Actual positives (row) |
| False positive rate (FPR) | FP / (FP + TN) | Actual negatives (row) |
| Specificity, true negative rate (TNR) | TN / (TN + FP) | Actual negatives (row) |
| False negative rate (FNR) | FN / (TP + FN) | Actual positives (row) |
| Precision, positive predictive value | TP / (TP + FP) | Predicted positives (column) |
| Negative predictive value | TN / (TN + FN) | Predicted negatives (column) |
| F1 score | 2 * P * R / (P + R) | Harmonic mean of precision and recall |
A useful mental shortcut: TPR and specificity are paired (one for each true class), and precision and recall are paired (precision conditions on the prediction, recall on the truth).
Suppose a fraud detection model is evaluated on 10,000 credit card transactions, of which 200 are actual fraud. The model produces the following confusion matrix:
| | Predicted fraud | Predicted legitimate | Total |
|---|---|---|---|
| Actually fraud | 160 (TP) | 40 (FN) | 200 |
| Actually legitimate | 320 (FP) | 9,480 (TN) | 9,800 |
| Total | 480 | 9,520 | 10,000 |
The true positive rate is:
TPR = 160 / (160 + 40) = 160 / 200 = 0.80 (80 percent)
The model catches 80 percent of fraudulent transactions. For comparison, the false positive rate is 320 / 9,800 = 0.033 (3.3 percent), and the precision is 160 / 480 = 0.333 (33.3 percent). This combination is typical of fraud and rare-event detection: a respectable TPR, a low FPR, but a precision that suffers because the negative class is so much larger than the positive class. The TPR alone says nothing about the cost of those 320 false alarms, which is why production systems usually report TPR together with at least one prediction-conditioned metric.
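The same numbers are straightforward to reproduce in code; the short sketch below recomputes the three metrics directly from the counts in the table above.

```python
# Illustrative recomputation of the fraud example's metrics
# from its confusion-matrix counts (no library calls needed).
tp, fn = 160, 40
fp, tn = 320, 9_480

tpr = tp / (tp + fn)        # 0.800  -> catches 80% of fraud
fpr = fp / (fp + tn)        # ~0.033 -> 3.3% of legitimate transactions flagged
precision = tp / (tp + fp)  # ~0.333 -> only a third of alerts are real fraud

print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}, precision = {precision:.3f}")
```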
The concept of measuring how often a true positive is detected has roots in two parallel traditions: medical diagnostics in the 1940s and signal detection theory in the 1950s.
The terms sensitivity and specificity were introduced by the American biostatistician Jacob Yerushalmy in a 1947 paper for the U.S. Public Health Reports titled "Statistical Problems in Assessing Methods of Medical Diagnosis, with Special Reference to X-ray Techniques." Yerushalmy was studying how reliably radiologists could diagnose tuberculosis from chest X-rays. He found that two readers looking at the same film often disagreed, and that the same reader looking at the same film twice could disagree with himself. To quantify this, he proposed evaluating any diagnostic test using two probabilities: "a measure of sensitivity or the probability of correct diagnosis of positive cases, and a measure of specificity or the probability of correct diagnosis of negative cases." That paper effectively gave the modern definition of TPR.
Independently, during and after World War II, radar engineers needed a way to describe the trade-off between detecting real aircraft and reacting to noise, and that work grew into signal detection theory. Wilson P. Tanner and John A. Swets applied the framework to human perception in a 1954 Psychological Review paper, "A decision-making theory of visual detection," introducing the receiver operating characteristic to psychology. In this tradition the true positive rate is called the hit rate and the false positive rate is called the false alarm rate. The signal detection framework established that any detector, biological or mechanical, can be characterized by a curve of hit rate versus false alarm rate as its decision criterion is varied; that curve is the modern ROC curve.
The medical and signal detection lineages converged in machine learning in the 1990s, when researchers including Foster Provost, Tom Fawcett, and others popularized ROC analysis as a tool for comparing classifiers under skewed class distributions and asymmetric misclassification costs. The recall name entered the same vocabulary from information retrieval, where it had been used since at least Cyril Cleverdon's Cranfield experiments in the 1960s. By the 2000s the three terms (TPR, sensitivity, recall) were used interchangeably in the machine learning literature, with the choice usually signaling the author's home discipline.
Most classifiers do not output hard 0/1 labels directly. They output a continuous score, such as a probability, a margin, or a logit, and a threshold converts that score into a class label. Sweeping the threshold from very strict (predict positive only when very confident) to very lenient (predict positive almost always) traces out a curve in TPR-FPR space. This curve is the receiver operating characteristic (ROC). TPR sits on the y-axis, FPR on the x-axis.
A few canonical points on the ROC curve are worth memorizing:
| Threshold behavior | TPR | FPR | Meaning |
|---|---|---|---|
| Predict positive for everything | 1 | 1 | All positives caught, but every negative is also flagged |
| Predict negative for everything | 0 | 0 | No false alarms, but every positive is missed |
| Perfect classifier | 1 | 0 | Top-left corner of ROC space |
| Random guessing | p | p | Diagonal line from (0,0) to (1,1) |
The area under this curve, the AUC, has a clean probabilistic interpretation: it equals the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance. An AUC of 1.0 is a perfect ranker; 0.5 is no better than chance. Because the ROC curve is built from TPR and FPR, both of which condition only on the true class, the curve and its AUC are invariant to class prior shifts in the test set. That property makes ROC analysis attractive for imbalanced problems, although for very rare positives the precision-recall curve often gives a more honest picture.
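The pairwise-ranking interpretation can be verified directly. The sketch below compares a brute-force count over positive-negative pairs with roc_auc_score; the labels and scores are arbitrary toy values.

```python
# Sketch: check the pairwise-ranking interpretation of AUC on toy data.
from itertools import product
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_scores = [0.2, 0.4, 0.35, 0.1, 0.8, 0.65, 0.5, 0.9]

# Brute force: over all (positive, negative) pairs, count how often the
# positive scores higher; ties count as half.
pos = [s for s, y in zip(y_scores, y_true) if y == 1]
neg = [s for s, y in zip(y_scores, y_true) if y == 0]
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
pairwise_auc = wins / (len(pos) * len(neg))

print(pairwise_auc, roc_auc_score(y_true, y_scores))  # the two values match
```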
Moving the decision threshold changes TPR and FPR in lockstep, and it changes precision in the opposite direction from recall. Lowering the threshold increases TPR (you catch more positives) but also increases FPR and lowers precision (you raise more false alarms). Raising the threshold does the reverse. The right operating point depends on the relative cost of false negatives versus false positives in the application.
| Application | Cost asymmetry | Typical threshold choice |
|---|---|---|
| Cancer screening | Missing a tumor is far worse than a follow-up scan | Low threshold, high TPR, accept high FPR |
| Spam filtering | A wrongly blocked legitimate email is worse than a missed spam | High threshold, lower TPR, very low FPR |
| Fraud detection | Missing fraud and blocking a real customer both cost money | Tuned per merchant using cost-weighted ROC |
| Search ranking | Missing a relevant document hurts recall, surfacing junk hurts precision | Tuned by F1 or position-aware metrics |
In medical screening it is common to fix a target sensitivity (for example, 95 percent) and report the corresponding specificity. In information retrieval it is more common to fix recall and report precision, or to compute F1. Both conventions are picking a single point on the same underlying ROC or PR curve.
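One way to follow the screening convention in code is to sweep the thresholds returned by roc_curve and pick the first operating point that reaches the target sensitivity. The sketch below does this on synthetic scores; the data and the 0.95 target are illustrative assumptions, not a recipe for any particular application.

```python
# Sketch: fix a target sensitivity and read off the corresponding specificity.
# y_true and y_scores are placeholder synthetic data.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = np.array([0] * 500 + [1] * 100)
y_scores = np.concatenate([rng.normal(0.0, 1.0, 500),   # negatives
                           rng.normal(1.5, 1.0, 100)])  # positives score higher

fpr, tpr, thresholds = roc_curve(y_true, y_scores)

target_tpr = 0.95
idx = np.argmax(tpr >= target_tpr)  # first operating point reaching the target
print(f"threshold   = {thresholds[idx]:.3f}")
print(f"sensitivity = {tpr[idx]:.3f}")
print(f"specificity = {1 - fpr[idx]:.3f}")
```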
A frequent source of confusion is whether TPR is robust to class imbalance. The arithmetic answer is yes: because TPR = TP / (TP + FN) only involves actual positives, multiplying the number of negatives by ten or by ten thousand leaves TPR unchanged. The interpretive answer is more nuanced. A model that predicts positive for every input has a TPR of 1.0 regardless of how rare the positive class is, because it catches every positive by definition. That model is useless. So TPR is necessary but not sufficient as a summary of classifier quality on imbalanced data, and it is almost always reported together with at least one of FPR, precision, or specificity.
For very imbalanced problems where the positive class is the minority of interest (rare disease screening, fraud, defect detection), the precision-recall curve and its area (average precision) are often more informative than the ROC curve, because precision actively penalizes the false positives that a low-prevalence ROC plot can hide.
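The contrast is easy to demonstrate. In the sketch below, a fixed block of positives and a fixed 5 percent false positive rate are held constant while the number of negatives grows: recall stays put while precision collapses. The synthetic labels are illustrative only.

```python
# Sketch: recall (TPR) ignores how many negatives there are; precision does not.
from sklearn.metrics import precision_score, recall_score

# A fixed block of positives: 8 of 10 are caught.
pos_true, pos_pred = [1] * 10, [1] * 8 + [0] * 2

for n_neg in (100, 10_000, 1_000_000):
    n_fp = n_neg // 20  # fixed 5% false positive rate on the negatives
    y_true = pos_true + [0] * n_neg
    y_pred = pos_pred + [1] * n_fp + [0] * (n_neg - n_fp)
    r = recall_score(y_true, y_pred)
    p = precision_score(y_true, y_pred)
    print(f"{n_neg:>9} negatives: recall = {r:.2f}, precision = {p:.4f}")
```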
TPR is naturally a binary-classification metric, but it generalizes to multi-class problems through one-vs-rest decomposition. For each class, treat that class as the positive label and all others as negative, compute the per-class TPR (which is the per-class recall), and then aggregate. The standard aggregation strategies, as implemented in scikit-learn's recall_score function, are:
| Average | Formula | Behavior |
|---|---|---|
| macro | unweighted mean of per-class recalls | Treats every class equally, regardless of frequency |
| micro | sum of TP across classes / (sum of TP + FN across classes) | Equivalent to overall accuracy in the multi-class single-label case |
| weighted | mean of per-class recalls weighted by class support | Reflects performance on the actual class distribution |
| samples | per-sample recall averaged over samples | Used for multi-label problems |
The choice matters. Macro recall punishes a model that ignores rare classes; micro recall does not. Reporting both is a common diagnostic for understanding whether a multi-class classifier is genuinely competent or is leaning on the dominant class.
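A short example with scikit-learn's recall_score makes the difference concrete. The three-class labels below are toy values chosen so that one class has noticeably lower per-class recall.

```python
# Sketch: per-class and averaged recall for a three-class problem (toy labels).
from sklearn.metrics import recall_score

y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 0, 2]

print(recall_score(y_true, y_pred, average=None))        # per-class recalls
print(recall_score(y_true, y_pred, average="macro"))     # unweighted mean
print(recall_score(y_true, y_pred, average="micro"))     # equals accuracy here
print(recall_score(y_true, y_pred, average="weighted"))  # support-weighted mean
```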
Medical screening. A mammography program reports a sensitivity of 0.87, meaning 87 percent of women who actually have breast cancer in the screened population get a positive screening result. The remaining 13 percent are false negatives, the most consequential kind of error in cancer screening.
Fraud detection. A credit card issuer's model has a TPR of 0.65 at an FPR of 0.001. Sixty-five percent of fraudulent charges are flagged, and only 0.1 percent of legitimate charges trigger a false alarm. Because legitimate transactions outnumber fraud by roughly 1,000 to 1, even that low FPR translates into a substantial absolute number of false alarms, which is why precision is also tracked.
Information retrieval. A search engine returns 80 of the 100 truly relevant documents in its index for a given query, achieving a recall (TPR) of 0.80. The remaining 20 relevant documents are false negatives that the user never sees.
Spam filtering. An email provider tunes its classifier to a recall of 0.99 on spam while keeping the false positive rate on legitimate mail under 0.001, reflecting the asymmetric cost of accidentally blocking a real message.
Object detection. In computer vision, the recall of an object detector at a given intersection-over-union (IoU) threshold is the fraction of ground-truth objects that the detector successfully localized. Average precision, the area under the per-class precision-recall curve, is the standard summary for benchmarks like COCO.
The standard Python implementation lives in scikit-learn's metrics module. The relevant functions are recall_score, roc_curve, roc_auc_score, and the RocCurveDisplay plotting helper.
```python
from sklearn.metrics import recall_score, roc_curve, roc_auc_score, RocCurveDisplay
import matplotlib.pyplot as plt

# Binary classification: TPR equals recall
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

recall = recall_score(y_true, y_pred)
print(f"TPR (recall) = {recall:.3f}")  # 0.800

# ROC curve from continuous scores
y_scores = [0.1, 0.9, 0.4, 0.2, 0.8, 0.6, 0.7, 0.95]
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
auc = roc_auc_score(y_true, y_scores)

RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=auc).plot()
plt.show()
```
The roc_curve function returns three arrays: false positive rates, true positive rates, and the decreasing thresholds used to compute them. As of scikit-learn 1.3, the first threshold is set to np.inf to represent a classifier that always predicts the negative class, so the curve always starts at (0, 0). For multi-class problems, recall_score accepts an average parameter taking the values described above.
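Assuming scikit-learn 1.3 or newer, that behavior can be checked directly on the toy data from the block above:

```python
# Sketch (assumes scikit-learn >= 1.3): the first threshold returned by
# roc_curve is np.inf, i.e. an all-negative classifier, so the curve starts at (0, 0).
import numpy as np
from sklearn.metrics import roc_curve

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_scores = [0.1, 0.9, 0.4, 0.2, 0.8, 0.6, 0.7, 0.95]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(np.isinf(thresholds[0]), fpr[0], tpr[0])  # True 0.0 0.0
```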
TPR is a single number summarizing one corner of classifier behavior, and treating it as a complete report is the most common mistake. A few specific pitfalls:
- A classifier that always predicts the positive class has a perfect TPR of 1.0 and is useless. Always check FPR or precision alongside TPR.
- TPR computed at a single fixed threshold can hide large differences between models that would be visible from the full ROC or PR curve. Whenever scores are available, prefer curve-based summaries.
- TPR is unstable for very small positive classes. With ten true positives, a single misclassification swings TPR by 10 percentage points. Confidence intervals (Wilson, Clopper-Pearson, or bootstrap) are worth reporting; see the sketch after this list.
- In multi-class settings, micro-averaged recall in a single-label problem equals accuracy, which can mask poor performance on rare classes. Reporting macro recall as well is standard hygiene.
- TPR is invariant to the class prior on the test set, but precision is not. Models tuned and validated on a balanced test set may behave differently in deployment if the underlying base rate shifts. The TPR will hold; the precision will not.
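Because TPR is a binomial proportion over the actual positives, a Wilson score interval can be computed by hand. The sketch below uses illustrative counts (8 of 10 positives caught) to show how wide the interval is when only ten positives are available.

```python
# Sketch: 95% Wilson score interval for a TPR estimated from few positives.
# Computed by hand so no extra dependency is needed; z = 1.96 for 95%.
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 8 of 10 positives caught: point estimate 0.80, interval roughly (0.49, 0.94).
print(wilson_interval(8, 10))
```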