The false positive rate (FPR), also called fall-out or the false alarm rate, is the proportion of actual negative instances that are incorrectly classified as positive by a test, model, or decision process. It is one of the most widely used metrics in binary classification, hypothesis testing, medical diagnostics, and signal detection theory. Formally, the FPR answers the question: "Of all the cases that are truly negative, how many did the system mistakenly label as positive?"
The false positive rate is mathematically equivalent to the Type I error rate (alpha) in statistical hypothesis testing and equals 1 minus the specificity (true negative rate) of a classifier or diagnostic test. It serves as the x-axis of the receiver operating characteristic (ROC) curve, making it central to evaluating classifier performance across all possible decision thresholds.
Imagine you have a metal detector at the beach. Every time it beeps, you dig in the sand looking for treasure. Sometimes you find a real coin (that is a true positive). But sometimes the detector beeps over a soda can or a bottle cap, and you dig for nothing (that is a false positive). The false positive rate tells you how often the detector beeps when there is no treasure at all. If you checked 100 spots with no treasure and the detector beeped at 5 of them, your false positive rate would be 5 out of 100, or 5%. A lower false positive rate means the detector wastes less of your time digging up junk.
The false positive rate is defined as the conditional probability of a positive prediction given that the true class is negative:
FPR = FP / (FP + TN)
where FP is the number of false positives (actual negatives incorrectly labeled positive) and TN is the number of true negatives (actual negatives correctly labeled negative).
Equivalently, using probability notation:
FPR = P(Predicted Positive | Actual Negative)
The FPR can also be expressed in terms of specificity:
FPR = 1 - Specificity = 1 - TNR
where Specificity (also called the true negative rate, TNR) equals TN / (TN + FP).
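In code, the definition is a one-liner. Here is a minimal Python sketch (the function names are illustrative, not from any particular library):

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): the fraction of actual negatives flagged positive."""
    return fp / (fp + tn)

def specificity(fp: int, tn: int) -> float:
    """Specificity (TNR) = TN / (TN + FP) = 1 - FPR."""
    return tn / (tn + fp)

# 40 false positives against 760 true negatives give an FPR of 5%.
print(false_positive_rate(40, 760))  # 0.05
print(specificity(40, 760))          # 0.95
```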
The false positive rate is derived from the confusion matrix, which tabulates the four possible outcomes of a binary classification task. The following table summarizes these outcomes:
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | True positive (TP) | False negative (FN) |
| Actual negative | False positive (FP) | True negative (TN) |
From this matrix, several related metrics can be computed:
| Metric | Formula | Description |
|---|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Proportion of all correct predictions |
| Precision (PPV) | TP / (TP + FP) | Proportion of positive predictions that are correct |
| Recall (sensitivity, TPR) | TP / (TP + FN) | Proportion of actual positives correctly identified |
| Specificity (TNR) | TN / (TN + FP) | Proportion of actual negatives correctly identified |
| False positive rate (FPR) | FP / (FP + TN) | Proportion of actual negatives incorrectly classified as positive |
| False negative rate (FNR) | FN / (FN + TP) | Proportion of actual positives incorrectly classified as negative |
| F1 score | 2 * (Precision * Recall) / (Precision + Recall) | Harmonic mean of precision and recall |
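The following Python sketch (an illustrative helper, not taken from any library) computes all of these metrics from the four confusion-matrix counts:

```python
def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity, TPR
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # TNR
        "fpr": fp / (fp + tn),          # 1 - specificity
        "fnr": fn / (fn + tp),          # 1 - recall
        "f1": 2 * precision * recall / (precision + recall),
    }
```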
Consider a spam filter that classifies 1,000 emails. Suppose 200 emails are actually spam (positive class) and 800 are legitimate (negative class). The filter produces the following results:
| | Predicted spam | Predicted not spam | Total |
|---|---|---|---|
| Actually spam | 170 (TP) | 30 (FN) | 200 |
| Actually not spam | 40 (FP) | 760 (TN) | 800 |
| Total | 210 | 790 | 1,000 |
The false positive rate is:
FPR = 40 / (40 + 760) = 40 / 800 = 0.05 (5%)
This means 5% of legitimate emails were incorrectly flagged as spam. By contrast, the recall (sensitivity) is 170 / 200 = 0.85 (85%), and the precision is 170 / 210 = 0.81 (81%).
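Plugging the spam-filter counts into the confusion_metrics helper sketched above reproduces these figures:

```python
m = confusion_metrics(tp=170, fn=30, fp=40, tn=760)
print(m["fpr"])                  # 0.05 -> 5% of legitimate emails flagged as spam
print(m["recall"])               # 0.85 -> 85% of actual spam caught
print(round(m["precision"], 2))  # 0.81 -> 81% of flagged emails are really spam
```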
The formal treatment of the false positive rate originated with Jerzy Neyman and Egon Pearson, who developed a rigorous framework for hypothesis testing between 1928 and 1933. In their landmark 1933 paper, "On the Problem of the Most Efficient Tests of Statistical Hypotheses" (published in the Philosophical Transactions of the Royal Society), they distinguished between two types of errors: incorrectly rejecting a true null hypothesis (Type I error, corresponding to the false positive rate) and failing to reject a false null hypothesis (Type II error). They formally labeled these "errors of type I and errors of type II respectively."
The Neyman-Pearson lemma demonstrates that the likelihood ratio test is the most powerful test for a given false positive rate. Specifically, it shows how to construct a test that maximizes statistical power (the probability of correctly rejecting a false null hypothesis) while constraining the Type I error rate to a pre-specified level alpha. This insight established the FPR as a controllable design parameter in statistical testing rather than merely an observed outcome.
In the 1940s and 1950s, the concept of the false positive rate found a parallel development in signal detection theory (SDT), which originated from radar research during World War II. Radar operators needed to distinguish genuine aircraft signals from random noise, and the false alarm rate (the probability of reporting a target when none was present) became a core metric.
Peterson, Birdsall, and Fox formalized this framework in 1954, and psychologists Wilson P. Tanner, David M. Green, and John A. Swets extended it to human perception and decision-making. Green and Swets's influential work demonstrated that traditional psychophysics methods failed to separate an observer's genuine sensitivity from their response bias. In SDT, the false alarm rate is the probability that an observer reports "signal present" when only noise is present, directly analogous to the false positive rate in classification.
The receiver operating characteristic (ROC) curve plots the true positive rate (sensitivity) on the y-axis against the false positive rate on the x-axis at every possible classification threshold. Each point on the ROC curve represents an operating point, a specific threshold setting that yields a particular combination of TPR and FPR.
Key properties of ROC space include:
| Point in ROC space | FPR | TPR | Interpretation |
|---|---|---|---|
| (0, 1) | 0 | 1 | Perfect classifier: no false positives, all true positives detected |
| (0, 0) | 0 | 0 | Classifier predicts all instances as negative |
| (1, 1) | 1 | 1 | Classifier predicts all instances as positive |
| (0.5, 0.5) | 0.5 | 0.5 | Random guessing; any point on the diagonal TPR = FPR is equivalent to chance |
The area under the ROC curve (AUC) summarizes overall classifier performance across all thresholds. An AUC of 1.0 indicates a perfect classifier, while an AUC of 0.5 corresponds to random chance. The AUC can be interpreted as the probability that the classifier assigns a higher score to a randomly chosen positive instance than to a randomly chosen negative instance, a property linked to the Mann-Whitney U statistic.
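In practice, the ROC curve and AUC are computed from classifier scores rather than by hand. A brief sketch using scikit-learn's roc_curve and roc_auc_score (the labels and scores below are invented for illustration):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Invented ground-truth labels (1 = positive) and classifier scores.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) pair per threshold
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.8125: probability a random positive outscores a random negative
```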
Adjusting the classification threshold controls the tradeoff between the false positive rate and the true positive rate: lowering the threshold flags more instances as positive, raising both the TPR and the FPR, while raising the threshold reduces both.
The optimal threshold depends on the relative costs of false positives and false negatives in a given application. In some contexts, a high FPR is acceptable if the cost of missing a true positive is severe (for example, cancer screening). In other contexts, a low FPR is essential because false positives carry heavy consequences (for example, criminal sentencing).
Signal detection theory provides a complementary framework for understanding the false positive rate, separating observer sensitivity from decision criteria.
The sensitivity index d' (d-prime) measures how well an observer can distinguish signal from noise, independent of their tendency to say "yes" or "no." It is computed as:
d' = z(Hit Rate) - z(False Alarm Rate)
where z() denotes the inverse of the standard normal cumulative distribution function. A higher d' indicates better discrimination. Importantly, d' remains constant regardless of where the observer sets their decision criterion; only the balance between hits and false alarms changes.
The decision criterion (often denoted as c or beta) determines the observer's willingness to report a signal. A liberal criterion means the observer says "yes" more readily, increasing both the hit rate and the false alarm rate. A conservative criterion means the observer requires stronger evidence before saying "yes," reducing both the hit rate and the false alarm rate. Bias and sensitivity are independent: an observer can have excellent discrimination (high d') but a very liberal criterion (high false alarm rate), or poor discrimination (low d') with a conservative criterion (low false alarm rate).
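Both quantities are straightforward to compute with SciPy's inverse normal CDF (norm.ppf). The d' formula is the one given above; the expression for c is a standard SDT definition added here for illustration:

```python
from scipy.stats import norm

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index: d' = z(hit rate) - z(false alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

def criterion_c(hit_rate: float, fa_rate: float) -> float:
    """Criterion c = -(z(hit rate) + z(false alarm rate)) / 2.
    Negative values indicate a liberal criterion, positive a conservative one."""
    return -(norm.ppf(hit_rate) + norm.ppf(fa_rate)) / 2

# An observer with a 90% hit rate and a 20% false alarm rate:
print(d_prime(0.9, 0.2))      # about 2.12: good discrimination
print(criterion_c(0.9, 0.2))  # about -0.22: a slightly liberal criterion
```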
The false positive rate is sometimes confused with other error metrics that have different denominators and interpretations. The following table clarifies these distinctions:
| Metric | Formula | Denominator | Interpretation |
|---|---|---|---|
| False positive rate (FPR) | FP / (FP + TN) | All actual negatives | Fraction of negatives misclassified as positive |
| False discovery rate (FDR) | FP / (FP + TP) | All predicted positives | Fraction of positive predictions that are wrong |
| Family-wise error rate (FWER) | P(at least 1 FP) | All tests (joint probability) | Probability of making any false positive across multiple tests |
| Type I error rate (alpha) | Set a priori | Theoretical, under H0 | Pre-specified probability of rejecting a true null hypothesis |
| Precision (PPV) | TP / (TP + FP) | All predicted positives | Fraction of positive predictions that are correct (1 - FDR) |
A common source of confusion is the difference between FPR and FDR. The FPR conditions on actual negatives (how many negatives did the system misclassify?), while the FDR conditions on predicted positives (of the results flagged as positive, how many are actually negative?). For a single hypothesis test the distinction matters little, but when multiple hypothesis tests are conducted simultaneously the two diverge sharply. A p-value threshold of 0.05 implies that 5% of the tests in which the null hypothesis is true will produce false positives (controlling the per-test FPR). An FDR-adjusted threshold of 0.05 implies that, on average, 5% of the tests declared significant will be false positives.
The Benjamini-Hochberg procedure controls the FDR rather than the FPR or FWER and provides greater statistical power at the cost of allowing some individual false positives. This approach has become standard in genomics, neuroimaging, and other fields where thousands of tests are performed simultaneously.
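As an illustration, here is a minimal NumPy sketch of the Benjamini-Hochberg step-up procedure (for real analyses, an established implementation such as statsmodels' multipletests is preferable):

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of rejections, controlling the FDR at level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)  # indices that sort the p-values ascending
    ranked = p[order]
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    passes = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()  # 0-based index of the largest passing rank
        reject[order[: k + 1]] = True    # reject the hypotheses with the k+1 smallest p
    return reject

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.60]))
# -> [ True  True False False False]
```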
The FWER is the probability of making at least one Type I error across a family of hypothesis tests. Unlike the FPR, which remains fixed at alpha for each individual test, the FWER increases with the number of tests. For m independent tests each conducted at significance level alpha, the FWER is:
FWER = 1 - (1 - alpha)^m
For example, with 20 independent tests at alpha = 0.05, the FWER is approximately 1 - (0.95)^20 = 0.64, meaning there is a 64% chance of at least one false positive. The Bonferroni correction addresses this by testing each hypothesis at alpha/m, controlling the FWER at alpha but at the cost of reduced power.
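This arithmetic is easy to verify in a few lines of Python:

```python
alpha, m = 0.05, 20

fwer = 1 - (1 - alpha) ** m
print(round(fwer, 2))  # 0.64: a 64% chance of at least one false positive

# Bonferroni correction: run each test at alpha / m to keep the FWER below alpha.
per_test = alpha / m                      # 0.0025
print(round(1 - (1 - per_test) ** m, 3))  # 0.049, just under the target 0.05
```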
The false positive rate alone does not determine how much trust to place in a positive result. The base rate (prevalence) of the condition in the population is equally important. The base rate fallacy occurs when people interpret a positive test result without accounting for how rare the condition actually is.
To understand why a positive result from a test with a low FPR can still be unreliable, consider Bayes' theorem:
P(Condition | Positive Test) = [Sensitivity * Prevalence] / [Sensitivity * Prevalence + FPR * (1 - Prevalence)]
This formula computes the positive predictive value (PPV), the probability that a positive result is correct.
Suppose a disease affects 1 in 1,000 people (prevalence = 0.1%). A diagnostic test has a sensitivity of 99% and a false positive rate of 1%:
| Group | Population | Test positive | Test negative |
|---|---|---|---|
| Diseased (1 in 1,000) | 100 | 99 (TP) | 1 (FN) |
| Healthy (999 in 1,000) | 99,900 | 999 (FP) | 98,901 (TN) |
| Total | 100,000 | 1,098 | 98,902 |
The PPV is 99 / 1,098 = 9.0%. Despite a 1% false positive rate and 99% sensitivity, only about 9% of positive results are correct because the condition is rare. This counterintuitive result demonstrates why FPR must always be considered alongside prevalence.
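The same numbers fall out of the Bayes formula directly; here is a short Python check using the values from this example:

```python
def ppv(sensitivity: float, fpr: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = fpr * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

print(round(ppv(sensitivity=0.99, fpr=0.01, prevalence=0.001), 3))  # 0.090
# The same test applied where the disease affects 1 in 10 people:
print(round(ppv(sensitivity=0.99, fpr=0.01, prevalence=0.10), 3))   # 0.917
```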
In clinical medicine, the false positive rate directly affects patient outcomes. Mammography screening for breast cancer has an FPR of approximately 10%, meaning roughly 1 in 10 women without cancer receive a false alarm that requires follow-up testing, additional imaging, or biopsy. While this rate is accepted because the cost of missing cancer is high, it still imposes financial burden and psychological stress.
During the COVID-19 pandemic, RT-PCR tests had false positive rates between 0.2% and 0.9%. Although this seems low, in populations with low prevalence, even a small FPR led to a substantial fraction of positive results being false. Rapid antigen tests exhibited higher false positive rates; one large workplace screening study found that 42% of positive rapid tests were false positives when confirmed by PCR.
Intrusion detection systems (IDS) analyze network traffic to identify potential attacks. These systems face a particular challenge with false positives because normal network traffic vastly outnumbers actual attacks. Studies have found that over 90% of IDS alerts can be false positives, leading to alert fatigue where security analysts begin ignoring warnings. Reducing the FPR while maintaining detection sensitivity is one of the primary engineering challenges in this field.
Email spam filters must balance catching unwanted messages against accidentally blocking legitimate emails. A false positive in spam filtering means a legitimate email is sent to the junk folder, potentially causing the recipient to miss important communication. Because users perceive missed legitimate emails as more damaging than receiving occasional spam, spam filters are typically tuned to maintain very low FPR (often below 0.1%) even at the cost of letting some spam through.
Facial recognition systems used in law enforcement must contend with extremely low base rates of wanted individuals in the general population. Even a system with a 0.1% false positive rate will generate many false matches when scanning thousands of faces daily. This has raised concerns about wrongful identification, especially given documented disparities in FPR across different demographic groups.
In pharmaceutical research, a false positive occurs when a drug is concluded to be effective when it is not. The conventional alpha level of 0.05 means that approximately 1 in 20 tests of an ineffective drug will produce a statistically significant result by chance alone. For this reason, clinical trials require replication and often use more stringent significance thresholds.
Several techniques can be applied to lower the FPR in classification and testing scenarios:
| Strategy | How it works | Tradeoff |
|---|---|---|
| Raise the classification threshold | Require higher confidence before predicting positive | Increases false negatives (lower recall) |
| Cost-sensitive learning | Assign higher misclassification cost to false positives during training | May reduce overall accuracy |
| Ensemble methods | Combine multiple models and require consensus for positive predictions | Higher computational cost |
| Better feature engineering | Improve the input representation to help the model distinguish classes | Requires domain expertise |
| Data augmentation | Increase training data for underrepresented classes | Not always feasible |
| Model calibration | Use Platt scaling or isotonic regression to produce well-calibrated probabilities | Requires a held-out calibration set |
| Bonferroni or Benjamini-Hochberg correction | Adjust significance levels for multiple comparisons | Reduces statistical power |
| Two-stage testing | Use a sensitive first-pass screen followed by a specific confirmatory test | Increases time and cost |
The most direct way to reduce FPR is to raise the decision threshold. In a logistic regression classifier, for example, the default threshold of 0.5 can be increased to 0.7 or 0.9. This forces the model to predict positive only when the estimated probability is very high. The ROC curve provides a visual guide for selecting the threshold: moving left along the x-axis (lower FPR) also moves down on the y-axis (lower TPR), so the optimal operating point depends on the application's tolerance for each type of error.
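A minimal sketch of this threshold sweep (the labels and predicted probabilities are invented; any model that exposes predicted probabilities works the same way):

```python
import numpy as np

def fpr_at_threshold(y_true, y_prob, threshold):
    """FPR among actual negatives when predicting positive at or above threshold."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) >= threshold
    negatives = y_true == 0
    return (y_pred & negatives).sum() / negatives.sum()

# Invented labels and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.20, 0.40, 0.55, 0.60, 0.90, 0.50, 0.75, 0.95]

for t in (0.5, 0.7, 0.9):
    print(t, fpr_at_threshold(y_true, y_prob, t))
# Raising the threshold from 0.5 to 0.7 drops the FPR from 0.6 to 0.2,
# but it also lowers the TPR (the positive with probability 0.50 is missed).
```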
When researchers test many hypotheses simultaneously (for example, testing whether each of 20,000 genes is differentially expressed), the expected number of false positives grows with the number of true null hypotheses, even if each individual test maintains a low FPR. If all 20,000 null hypotheses are true and each test is conducted at alpha = 0.05, the expected number of false positives is 20,000 * 0.05 = 1,000; more generally, it is alpha times the number of true nulls.
Three main approaches address this problem: performing each test at an uncorrected significance level and accepting an inflated number of false positives; controlling the family-wise error rate with procedures such as the Bonferroni correction; and controlling the false discovery rate with procedures such as Benjamini-Hochberg, trading some individual false positives for greater power.
The false positive rate connects to a broad network of statistical and machine learning concepts, including specificity and sensitivity, precision and the false discovery rate, ROC analysis and the AUC, Type I error and significance testing, the family-wise error rate, signal detection theory, and the base rate fallacy.
"A low FPR means the test is reliable." A low FPR is necessary but not sufficient. If the condition being tested for is rare, even a low FPR can produce a high proportion of false results among positive predictions (low PPV), as the base rate fallacy demonstrates.
"FPR and FDR are the same thing." They use different denominators. FPR is conditioned on actual negatives; FDR is conditioned on predicted positives. In most practical settings, they yield different values.
"The significance level alpha is the false positive rate." While alpha is the maximum allowable probability of a Type I error under the null hypothesis, the observed false positive rate in practice can differ from alpha due to violations of test assumptions, multiple testing, or data dependencies.
"Minimizing FPR is always the right goal." In many applications, minimizing FPR at the expense of a high false negative rate can be worse. A cancer screening test that never flags anyone (FPR = 0) also catches no cancers (recall = 0).
| Quantity | Formula |
|---|---|
| False positive rate | FP / (FP + TN) |
| Specificity | 1 - FPR = TN / (TN + FP) |
| Sensitivity (TPR) | TP / (TP + FN) |
| Positive predictive value | TP / (TP + FP) |
| d-prime (SDT) | z(Hit Rate) - z(False Alarm Rate) |
| FWER (independent tests) | 1 - (1 - alpha)^m |
| Bayes' PPV formula | (Sensitivity * Prevalence) / (Sensitivity * Prevalence + FPR * (1 - Prevalence)) |