# False Positive Rate (FPR)

> Source: https://aiwiki.ai/wiki/false_positive_rate_fpr
> Updated: 2026-06-23
> Categories: Machine Learning, Model Evaluation, Statistics
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

The **false positive rate** (**FPR**) is the proportion of actual negative cases that a test, model, or decision process incorrectly classifies as positive, defined as **FPR = FP / (FP + TN)** where FP is the number of false positives and TN the number of true negatives. Also called **fall-out** or the **false alarm rate**, it equals 1 minus [specificity](/wiki/specificity), corresponds to the [Type I error](/wiki/type_i_and_type_ii_errors) rate (alpha) in hypothesis testing, and forms the x-axis of the [receiver operating characteristic (ROC) curve](/wiki/roc_receiver_operating_characteristic_curve).[1][3][14] In plain terms, the FPR answers: "Of all the cases that are truly negative, how many did the system mistakenly label as positive?"

The false positive rate is one of the most widely used metrics in [binary classification](/wiki/binary_classification), [hypothesis testing](/wiki/hypothesis_testing), medical diagnostics, and [signal detection theory](/wiki/signal_detection_theory).[14] It is distinct from a single [false positive](/wiki/false_positive) (one misclassified case) and from the false discovery rate, which has a different denominator.[4] A low FPR alone also does not guarantee a trustworthy positive result, because of the base rate fallacy (see below).[12]

## Explain like I'm 5 (ELI5)

Imagine you have a metal detector at the beach. Every time it beeps, you dig in the sand looking for treasure. Sometimes you find a real coin (that is a true positive). But sometimes the detector beeps over a soda can or a bottle cap, and you dig for nothing (that is a false positive). The false positive rate tells you how often the detector beeps when there is no treasure at all. If you checked 100 spots with no treasure and the detector beeped at 5 of them, your false positive rate would be 5 out of 100, or 5%. A lower false positive rate means the detector wastes less of your time digging up junk.

## What is the formula for the false positive rate?

The false positive rate is defined as the conditional probability of a positive prediction given that the true class is negative:

**FPR = FP / (FP + TN)**

where:

- **FP** (false positives) is the number of negative instances incorrectly classified as positive
- **TN** (true negatives) is the number of negative instances correctly classified as negative
- **FP + TN** is the total number of actual negative instances

Equivalently, using probability notation:

**FPR = P(Predicted Positive | Actual Negative)**

The FPR can also be expressed in terms of specificity:

**FPR = 1 - Specificity = 1 - TNR**

where Specificity (also called the true negative rate, TNR) equals TN / (TN + FP). Tom Fawcett, whose 2006 ROC tutorial is among the most-cited references in the field, defines the same quantity directly as "Negatives incorrectly classified / Total negatives" and notes that "specificity = 1 - fp rate."[3]
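The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `false_positive_rate`, the 0/1 label encoding, and the toy data are assumptions made for the example.

```python
def false_positive_rate(y_true, y_pred):
    """Return FP / (FP + TN) for binary labels (assumed: 1 = positive, 0 = negative)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    negatives = fp + tn
    if negatives == 0:
        # With no actual negatives the denominator is zero, so FPR is undefined.
        raise ValueError("FPR is undefined when there are no actual negatives")
    return fp / negatives


# Hypothetical toy data: 4 actual negatives, 1 of which is predicted positive.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0]

fpr = false_positive_rate(y_true, y_pred)
specificity = 1 - fpr
print(fpr, specificity)  # 0.25 0.75
```

Only the actual negatives enter the calculation; the two positive cases in the toy data have no effect on the result.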

## How does the FPR relate to the confusion matrix?

The false positive rate is derived from the [confusion matrix](/wiki/confusion_matrix), which tabulates the four possible outcomes of a binary classification task.[14] The following table summarizes these outcomes:

| | **Predicted positive** | **Predicted negative** |
|---|---|---|
| **Actual positive** | True positive (TP) | False negative (FN) |
| **Actual negative** | **False positive (FP)** | True negative (TN) |

From this matrix, several related metrics can be computed:

| Metric | Formula | Description |
|---|---|---|
| [Accuracy](/wiki/accuracy) | (TP + TN) / (TP + TN + FP + FN) | Proportion of all correct predictions |
| [Precision](/wiki/precision) (PPV) | TP / (TP + FP) | Proportion of positive predictions that are correct |
| [Recall](/wiki/recall) (sensitivity, TPR) | TP / (TP + FN) | Proportion of actual positives correctly identified |
| Specificity (TNR) | TN / (TN + FP) | Proportion of actual negatives correctly identified |
| **False positive rate (FPR)** | **FP / (FP + TN)** | **Proportion of actual negatives incorrectly classified as positive** |
| [False negative rate](/wiki/false_negative_fn) (FNR) | FN / (FN + TP) | Proportion of actual positives incorrectly classified as negative |
| [F1 score](/wiki/f1_score) | 2 * (Precision * Recall) / (Precision + Recall) | Harmonic mean of precision and recall |

## Worked example

Consider a spam filter that classifies 1,000 emails. Suppose 200 emails are actually spam (positive class) and 800 are legitimate (negative class). The filter produces the following results:

| | **Predicted spam** | **Predicted not spam** | **Total** |
|---|---|---|---|
| **Actually spam** | 170 (TP) | 30 (FN) | 200 |
| **Actually not spam** | 40 (FP) | 760 (TN) | 800 |
| **Total** | 210 | 790 | 1,000 |

The false positive rate is:

**FPR = 40 / (40 + 760) = 40 / 800 = 0.05 (5%)**

This means 5% of legitimate emails were incorrectly flagged as spam. By contrast, the recall (sensitivity) is 170 / 200 = 0.85 (85%), and the precision is 170 / 210 = 0.81 (81%).
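The arithmetic in this worked example can be checked with a short script. The helper name `binary_metrics` is hypothetical; the four counts are taken directly from the table above.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common confusion-matrix metrics from the four cell counts."""
    return {
        "fpr": fp / (fp + tn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts from the spam filter example.
metrics = binary_metrics(tp=170, fp=40, fn=30, tn=760)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The script reproduces the 5% FPR, 85% recall, and roughly 81% precision quoted above, and also shows that specificity (95%) and FPR sum to 1.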

## Where does the false positive rate come from?

### Neyman-Pearson framework

The formal treatment of the false positive rate originated with [Jerzy Neyman](/wiki/jerzy_neyman) and [Egon Pearson](/wiki/egon_pearson), who developed a rigorous framework for hypothesis testing between 1928 and 1933. In their landmark 1933 paper, "On the Problem of the Most Efficient Tests of Statistical Hypotheses" (published in the *Philosophical Transactions of the Royal Society*), they distinguished between two types of errors: incorrectly rejecting a true null hypothesis (Type I error, corresponding to the false positive rate) and failing to reject a false null hypothesis (Type II error). They formally labeled these "errors of type I and errors of type II respectively."[1]

The Neyman-Pearson lemma demonstrates that the likelihood ratio test is the most powerful test for a given false positive rate. Specifically, it shows how to construct a test that maximizes statistical power (the probability of correctly rejecting a false null hypothesis) while constraining the Type I error rate to a pre-specified level alpha.[1] This insight established the FPR as a controllable design parameter in statistical testing rather than merely an observed outcome.

### Signal detection theory

In the 1940s and 1950s, the concept of the false positive rate found a parallel development in signal detection theory (SDT), which originated from radar research during World War II. Radar operators needed to distinguish genuine aircraft signals from random noise, and the false alarm rate (the probability of reporting a target when none was present) became a core metric.

Peterson, Birdsall, and Fox formalized this framework in 1954, and psychologists Wilson P. Tanner, David M. Green, and John A. Swets extended it to human perception and decision-making.[6] Green and Swets's influential work demonstrated that traditional psychophysics methods failed to separate an observer's genuine sensitivity from their response bias.[2] In SDT, the false alarm rate is the probability that an observer reports "signal present" when only noise is present, directly analogous to the false positive rate in classification.

## Why is the FPR the x-axis of the ROC curve?

The [receiver operating characteristic (ROC) curve](/wiki/roc_receiver_operating_characteristic_curve) plots the true positive rate (sensitivity) on the y-axis against the false positive rate on the x-axis at every possible classification threshold.[3] As Fawcett states in his canonical tutorial, "ROC graphs are two-dimensional graphs in which tp rate is plotted on the Y axis and fp rate is plotted on the X axis."[3] Each point on the ROC curve represents an operating point, a specific threshold setting that yields a particular combination of TPR and FPR.

Key properties of ROC space include:

| Point in ROC space | FPR | TPR | Interpretation |
|---|---|---|---|
| (0, 1) | 0 | 1 | Perfect classifier: no false positives, all true positives detected |
| (0, 0) | 0 | 0 | Classifier predicts all instances as negative |
| (1, 1) | 1 | 1 | Classifier predicts all instances as positive |
| (0.5, 0.5) | 0.5 | 0.5 | Random guessing (one point on the diagonal line TPR = FPR) |

The [area under the ROC curve (AUC)](/wiki/auc_area_under_the_roc_curve) summarizes overall classifier performance across all thresholds. An AUC of 1.0 indicates a perfect classifier, while an AUC of 0.5 corresponds to random chance.[7] The AUC can be interpreted as the probability that the classifier assigns a higher score to a randomly chosen positive instance than to a randomly chosen negative instance, a property linked to the Mann-Whitney U statistic.[7]
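The ranking interpretation of the AUC can be illustrated by counting pairs directly. The sketch below is a brute-force calculation for small samples, with ties counted as half a win; the function name `auc_pairwise` and the scores are made up for the example.

```python
def auc_pairwise(pos_scores, neg_scores):
    """Estimate AUC as P(score of a random positive > score of a random negative).

    Ties count as 0.5, matching the usual Mann-Whitney convention.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for 4 positive and 4 negative instances.
pos = [0.9, 0.8, 0.55, 0.4]
neg = [0.7, 0.5, 0.3, 0.2]

auc = auc_pairwise(pos, neg)
print(auc)  # 0.8125 (13 of 16 positive-negative pairs are ranked correctly)
```

This pairwise count is quadratic in the sample size, so it is meant as an illustration of the interpretation rather than an efficient way to compute AUC.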

### Threshold selection and the FPR-TPR tradeoff

Adjusting the classification threshold controls the tradeoff between the false positive rate and the true positive rate:

- **Lowering the threshold** causes the model to classify more instances as positive, increasing both the TPR (more true positives are captured) and the FPR (more false positives occur).
- **Raising the threshold** makes the model more selective, decreasing both the TPR and FPR.

The optimal threshold depends on the relative costs of false positives and false negatives in a given application.[3] In some contexts, a high FPR is acceptable if the cost of missing a true positive is severe (for example, cancer screening). In other contexts, a low FPR is essential because false positives carry heavy consequences (for example, criminal sentencing).
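The tradeoff described in this section can be seen by sweeping a threshold over a small set of scores. This is a sketch under stated assumptions: the scores and labels are invented, and a score greater than or equal to the threshold is treated as a positive prediction.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (FPR, TPR) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical scores with 5 positives (1) and 5 negatives (0).
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

operating_points = [rates_at_threshold(scores, labels, t) for t in (0.8, 0.5, 0.2)]
for threshold, (fpr, tpr) in zip((0.8, 0.5, 0.2), operating_points):
    print(f"threshold={threshold}: FPR={fpr:.1f}, TPR={tpr:.1f}")
```

On this toy data, lowering the threshold from 0.8 to 0.5 to 0.2 moves the operating point from (FPR 0.0, TPR 0.4) to (0.2, 0.8) to (0.6, 1.0): both rates rise together.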

## FPR in signal detection theory

Signal detection theory provides a complementary framework for understanding the false positive rate, separating observer sensitivity from decision criteria.[2]

### Sensitivity (d-prime)

The sensitivity index d' (d-prime) measures how well an observer can distinguish signal from noise, independent of their tendency to say "yes" or "no." It is computed as:

**d' = z(Hit Rate) - z(False Alarm Rate)**

where z() denotes the inverse of the standard normal cumulative distribution function. A higher d' indicates better discrimination. Under the standard equal-variance Gaussian model, d' remains constant regardless of where the observer sets their decision criterion; only the balance between hits and false alarms changes.[10]
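As a minimal sketch, d' can be computed with Python's standard library, where `statistics.NormalDist().inv_cdf` plays the role of z(). The function name `d_prime` and the example rates are assumptions for illustration.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Return z(hit rate) - z(false alarm rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is not finite at 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical observer: 85% hits, 5% false alarms.
print(round(d_prime(0.85, 0.05), 2))  # 2.68
# An observer who cannot discriminate has equal hit and false alarm rates.
print(d_prime(0.5, 0.5))  # 0.0
```

In practice, observed rates of exactly 0 or 1 need some correction before this formula can be applied; which correction to use is a modeling choice outside the scope of this sketch.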

### Response bias and criterion

The decision criterion (often denoted as *c* or *beta*) determines the observer's willingness to report a signal. A liberal criterion means the observer says "yes" more readily, increasing both the hit rate and the false alarm rate. A conservative criterion means the observer requires stronger evidence before saying "yes," reducing both the hit rate and the false alarm rate. Bias and sensitivity are independent: an observer can have excellent discrimination (high d') but a very liberal criterion (high false alarm rate), or poor discrimination (low d') with a conservative criterion (low false alarm rate).[10]
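The independence of bias and sensitivity can be illustrated numerically. The sketch below assumes the equal-variance Gaussian model (noise centered at -d'/2, signal at +d'/2, unit variance) and the common convention c = -0.5 * (z(Hit Rate) + z(False Alarm Rate)) for the criterion; neither the model parameters nor the function names come from this article.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rates_from_model(d_prime, criterion):
    """Hit and false alarm rates under the assumed equal-variance Gaussian model.

    The observer says "yes" whenever the evidence exceeds `criterion`.
    """
    hit_rate = STD_NORMAL.cdf(d_prime / 2 - criterion)
    false_alarm_rate = STD_NORMAL.cdf(-d_prime / 2 - criterion)
    return hit_rate, false_alarm_rate


def recover_parameters(hit_rate, false_alarm_rate):
    """Recover (d', c) from observed hit and false alarm rates."""
    z = STD_NORMAL.inv_cdf
    d = z(hit_rate) - z(false_alarm_rate)
    c = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d, c


results = []
for criterion in (-0.5, 0.0, 0.5):  # liberal, neutral, conservative
    hit, fa = rates_from_model(1.5, criterion)
    d, c = recover_parameters(hit, fa)
    results.append((criterion, hit, fa, d, c))
    print(f"c={criterion:+.1f}: hit={hit:.3f}, false alarm={fa:.3f}, d'={d:.3f}")
```

Moving the criterion from liberal to conservative lowers both the hit rate and the false alarm rate, while the recovered d' stays at 1.5 throughout.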

## How does the FPR differ from related error metrics?

The false positive rate is sometimes confused with other error metrics that have different denominators and interpretations. The following table clarifies these distinctions:

| Metric | Formula | Denominator | Interpretation |
|---|---|---|---|
| False positive rate (FPR) | FP / (FP + TN) | All actual negatives | Fraction of negatives misclassified as positive |
| False discovery rate (FDR) | FP / (FP + TP) | All predicted positives | Fraction of positive predictions that are wrong |
| Family-wise error rate (FWER) | P(at least 1 FP) | All tests (joint probability) | Probability of making any false positive across multiple tests |
| Type I error rate (alpha) | Set a priori | Theoretical, under H0 | Pre-specified probability of rejecting a true null hypothesis |
| [Precision](/wiki/precision) (PPV) | TP / (TP + FP) | All predicted positives | Fraction of positive predictions that are correct (1 - FDR) |

### False positive rate vs. false discovery rate

A common source of confusion is the difference between FPR and FDR. The FPR conditions on actual negatives (of the truly negative cases, how many did the system flag as positive?), while the FDR conditions on predicted positives (of the results flagged as positive, how many are actually negative?). When a single hypothesis test is performed, FPR and FDR can coincide. However, when multiple hypothesis tests are conducted simultaneously, they diverge. A p-value threshold of 0.05 implies that about 5% of the tests whose null hypothesis is true are expected to produce false positives (controlling the FPR). An FDR-adjusted threshold of 0.05 implies that, in expectation, at most 5% of the tests declared significant will be false positives.[4]

The Benjamini-Hochberg procedure controls the FDR rather than the FPR or FWER. It provides greater statistical power than FWER-controlling procedures, at the cost of allowing some individual false positives.[4] This approach has become standard in genomics, neuroimaging, and other fields where thousands of tests are performed simultaneously.
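A minimal sketch of the Benjamini-Hochberg step-up rule follows: sort the p-values, find the largest rank k whose p-value is at most (k / m) * q, and reject the k smallest. The function name, the FDR level `q`, and the p-values are illustrative assumptions, not data from any study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from 10 tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

bh_rejections = sum(benjamini_hochberg(p_values, q=0.05))
naive_rejections = sum(p <= 0.05 for p in p_values)
bonferroni_rejections = sum(p <= 0.05 / len(p_values) for p in p_values)
print(naive_rejections, bh_rejections, bonferroni_rejections)  # 5 2 1
```

On these made-up p-values, the uncorrected threshold rejects five hypotheses, Benjamini-Hochberg rejects two, and Bonferroni rejects one, which shows BH sitting between no correction and FWER control.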

### False positive rate vs. family-wise error rate

The FWER is the probability of making at least one Type I error across a family of hypothesis tests. Unlike the FPR, which remains fixed at alpha for each individual test, the FWER increases with the number of tests. For *m* independent tests each conducted at significance level alpha, the FWER is:

**FWER = 1 - (1 - alpha)^m**

For example, with 20 independent tests at alpha = 0.05, the FWER is approximately 1 - (0.95)^20 = 0.64, meaning there is a 64% chance of at least one false positive. The Bonferroni correction addresses this by testing each hypothesis at alpha/m, controlling the FWER at alpha but at the cost of reduced power.[11]
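The FWER formula and the effect of the Bonferroni correction can be checked numerically. This is a small sketch that assumes independent tests, as the formula does; the function name is hypothetical.

```python
def fwer_independent(alpha, m):
    """Probability of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


uncorrected = fwer_independent(0.05, 20)
bonferroni = fwer_independent(0.05 / 20, 20)  # each test run at alpha / m

print(round(uncorrected, 3))  # 0.642
print(bonferroni < 0.05)      # True
```

With the per-test level reduced to 0.05 / 20 = 0.0025, the family-wise rate falls just below 0.05, which is what the correction is designed to guarantee.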

## Why can a positive result be wrong even with a low FPR?

The false positive rate alone does not determine how much trust to place in a positive result. The [base rate](/wiki/base_rate) (prevalence) of the condition in the population is equally important. The **base rate fallacy** occurs when people interpret a positive test result without accounting for how rare the condition actually is.[12]

To understand why a positive result from a test with a low FPR can still be unreliable, consider [Bayes' theorem](/wiki/bayes_theorem):

**P(Condition | Positive Test) = [Sensitivity * Prevalence] / [Sensitivity * Prevalence + FPR * (1 - Prevalence)]**

This formula computes the positive predictive value (PPV), the probability that a positive result is correct.

### Numerical example

Suppose a disease affects 1 in 1,000 people (prevalence = 0.1%). A diagnostic test has a sensitivity of 99% and a false positive rate of 1%:

| Group | Population | Test positive | Test negative |
|---|---|---|---|
| Diseased (1 in 1,000) | 100 | 99 (TP) | 1 (FN) |
| Healthy (999 in 1,000) | 99,900 | 999 (FP) | 98,901 (TN) |
| **Total** | **100,000** | **1,098** | **98,902** |

The PPV is 99 / 1,098 = 9.0%. Despite a 1% false positive rate and 99% sensitivity, only about 9% of positive results are correct because the condition is rare.[12] This counterintuitive result demonstrates why FPR must always be considered alongside prevalence.
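The same calculation can be expressed with the Bayes formula above. In this sketch the function name is hypothetical; the first prevalence matches the numerical example, and the other two are added only to show how PPV changes as the condition becomes more common.

```python
def positive_predictive_value(sensitivity, fpr, prevalence):
    """P(condition | positive test) via Bayes' theorem."""
    true_positive_mass = sensitivity * prevalence
    false_positive_mass = fpr * (1 - prevalence)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Same test (99% sensitivity, 1% FPR) applied at three prevalence levels.
ppv_by_prevalence = {
    prevalence: positive_predictive_value(0.99, 0.01, prevalence)
    for prevalence in (0.001, 0.01, 0.1)
}
for prevalence, ppv in ppv_by_prevalence.items():
    print(f"prevalence={prevalence}: PPV={ppv:.3f}")
```

With the test held fixed, PPV rises from about 9% at a prevalence of 0.1% to 50% at 1% and about 92% at 10%, so the same FPR supports very different levels of trust in a positive result.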

## Applications across domains

### Medical diagnostics

In clinical medicine, the false positive rate directly affects patient outcomes. Mammography screening for breast cancer recalls roughly 10-12% of women for additional workup, meaning about 1 in 10 women without cancer receive a false alarm that requires follow-up imaging or biopsy.[5] The burden compounds over time: a JAMA Network Open analysis of more than 900,000 screening exams found that after 10 years of annual digital mammography, the cumulative probability of at least one false-positive recall reached roughly 50% for women aged 40 to 79, falling to about 36% with biennial (every two years) screening.[15] While these rates are accepted because the cost of missing cancer is high, they still impose financial burden and psychological stress.

During the COVID-19 pandemic, RT-PCR tests had false positive rates between 0.2% and 0.9%.[13] Although this seems low, in populations with low prevalence, even a small FPR led to a substantial fraction of positive results being false. Rapid antigen tests also produced false positives; one large workplace screening study found that 42% of positive rapid tests were false positives when confirmed by PCR, a figure that describes the share of positive results that were wrong rather than the FPR itself.[13]

### Cybersecurity and intrusion detection

Intrusion detection systems (IDS) analyze network traffic to identify potential attacks. These systems face a particular challenge with false positives because normal network traffic vastly outnumbers actual attacks. Studies have found that over 90% of IDS alerts can be false positives (a share of alerts raised, which reflects the low base rate of attacks as well as the FPR), leading to alert fatigue where security analysts begin ignoring warnings. Reducing the FPR while maintaining detection sensitivity is one of the primary engineering challenges in this field.

### Spam filtering

Email spam filters must balance catching unwanted messages against accidentally blocking legitimate emails. A false positive in spam filtering means a legitimate email is sent to the junk folder, potentially causing the recipient to miss important communication. Because users perceive missed legitimate emails as more damaging than receiving occasional spam, spam filters are typically tuned to maintain very low FPR (often below 0.1%) even at the cost of letting some spam through.

### Criminal justice and biometrics

Facial recognition systems used in law enforcement must contend with extremely low base rates of wanted individuals in the general population. Even a system with a 0.1% false positive rate produces about one false match for every 1,000 innocent faces scanned, which can outnumber true matches when wanted individuals are rare.[12] This has raised concerns about wrongful identification, especially given documented disparities in FPR across different demographic groups.

### Drug discovery and clinical trials

In pharmaceutical research, a false positive occurs when a drug is concluded to be effective when it is not. The conventional alpha level of 0.05 means that approximately 1 in 20 tests of an ineffective drug will produce a statistically significant result by chance alone.[9] For this reason, clinical trials require replication and often use more stringent significance thresholds.

## How can the false positive rate be reduced?

Several techniques can be applied to lower the FPR in classification and testing scenarios:

| Strategy | How it works | Tradeoff |
|---|---|---|
| Raise the classification threshold | Require higher confidence before predicting positive | Increases false negatives (lower recall) |
| Cost-sensitive learning | Assign higher misclassification cost to false positives during [training](/wiki/training) | May reduce overall accuracy |
| [Ensemble methods](/wiki/ensemble) | Combine multiple models and require consensus for positive predictions | Higher computational cost |
| Better [feature engineering](/wiki/feature_engineering) | Improve the input representation to help the model distinguish classes | Requires domain expertise |
| [Data augmentation](/wiki/data_augmentation) | Increase training data for underrepresented classes | Not always feasible |
| Model calibration | Use Platt scaling or isotonic regression to produce well-calibrated probabilities | Requires a held-out calibration set |
| Bonferroni or Benjamini-Hochberg correction | Adjust significance levels for multiple comparisons | Reduces statistical power |
| Two-stage testing | Use a sensitive first-pass screen followed by a specific confirmatory test | Increases time and cost |

### Threshold adjustment in practice

The most direct way to reduce FPR is to raise the decision threshold. In a [logistic regression](/wiki/logistic_regression) classifier, for example, the default threshold of 0.5 can be increased to 0.7 or 0.9. This forces the model to predict positive only when the estimated probability is very high. The ROC curve provides a visual guide for selecting the threshold: moving left along the x-axis (lower FPR) also moves down on the y-axis (lower TPR), so the optimal operating point depends on the application's tolerance for each type of error.[3]
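One way to turn this into a procedure is to fix a maximum acceptable FPR and pick the lowest threshold that satisfies it, since FPR can only fall or stay the same as the threshold rises. The sketch below assumes scores where higher means more likely positive and a `score >= threshold` decision rule; the function name and data are hypothetical, and in practice the threshold would be chosen on held-out validation data.

```python
def threshold_for_max_fpr(scores, labels, max_fpr):
    """Return the lowest observed score usable as a threshold with FPR <= max_fpr."""
    negative_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not negative_scores:
        raise ValueError("need at least one actual negative to measure FPR")
    # Candidate thresholds are the distinct observed scores, lowest first.
    # The first one that meets the FPR budget keeps the most true positives.
    for threshold in sorted(set(scores)):
        false_positives = sum(1 for s in negative_scores if s >= threshold)
        if false_positives / len(negative_scores) <= max_fpr:
            return threshold
    # No observed score works: only a threshold above every score meets the budget.
    return float("inf")


# Hypothetical validation scores with 5 positives (1) and 5 negatives (0).
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

print(threshold_for_max_fpr(scores, labels, max_fpr=0.2))  # 0.55
print(threshold_for_max_fpr(scores, labels, max_fpr=0.0))  # 0.85
```

Tightening the FPR budget from 20% to 0% pushes the threshold from 0.55 up to 0.85 on this toy data, and the number of positives captured drops from four to two.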

## FPR in multiple hypothesis testing

When researchers test many hypotheses simultaneously (for example, testing whether each of 20,000 genes is differentially expressed), the expected number of false positives increases even if each individual test maintains a low FPR. If 20,000 tests are each conducted at alpha = 0.05 and every null hypothesis is true, the expected number of false positives is 20,000 * 0.05 = 1,000. More generally, the expected count is alpha times the number of true null hypotheses, so 1,000 is an upper bound when some hypotheses are truly alternative.[4]

Three main approaches address this problem:

1. **Bonferroni correction**: Test each hypothesis at alpha/m. For 20,000 tests at alpha = 0.05, each test uses a threshold of 0.0000025. This controls the FWER but is very conservative.
2. **Holm-Bonferroni method**: A step-down procedure that is uniformly more powerful than Bonferroni while still controlling the FWER.[11]
3. **Benjamini-Hochberg procedure**: Controls the FDR rather than the FWER, allowing more discoveries at the cost of a controlled proportion of false positives among the rejected hypotheses.[4]
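The difference between the first two approaches can be sketched in code. In the Holm-Bonferroni step-down procedure, the smallest p-value is compared with alpha/m, the next with alpha/(m - 1), and so on, stopping at the first failure. The function names and p-values below are illustrative assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm procedure; returns True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order):
        # The (step + 1)-th smallest p-value is compared with alpha / (m - step).
        if p_values[idx] <= alpha / (m - step):
            rejected[idx] = True
        else:
            break  # stop at the first p-value that fails its threshold
    return rejected


# Hypothetical p-values from 4 tests.
p_values = [0.004, 0.011, 0.02, 0.3]
print(bonferroni(p_values))       # [True, True, False, False]
print(holm_bonferroni(p_values))  # [True, True, True, False]
```

On these values Holm rejects everything Bonferroni rejects plus one more hypothesis (p = 0.02, compared against 0.05 / 2 = 0.025), which is the sense in which it is more powerful at the same FWER.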

## Relationship to other concepts

The false positive rate connects to a broad network of statistical and machine learning concepts:

- **Specificity**: FPR = 1 - specificity. A classifier with 95% specificity has a 5% false positive rate.
- **Sensitivity (recall, TPR)**: Together with FPR, sensitivity defines the ROC curve. The two are linked through the classification threshold but are not inherently inversely proportional; a good model can have both high sensitivity and low FPR.
- **[Precision](/wiki/precision)**: Precision depends on both FPR and the class distribution (prevalence). Even a low FPR can yield low precision when positives are rare.
- **[Negative predictive value](/wiki/negative_predictive_value) (NPV)**: The probability that a negative prediction is correct. Like PPV, it depends on prevalence.
- **[Loss function](/wiki/loss_function)**: Asymmetric loss functions can penalize false positives more heavily than false negatives (or vice versa), directly influencing the learned FPR.
- **[Overfitting](/wiki/overfitting)**: A model that overfits to training data may have an artificially low FPR on training data but a much higher FPR on unseen test data.

## Common misconceptions

1. **"A low FPR means the test is reliable."** A low FPR is necessary but not sufficient. If the condition being tested for is rare, even a low FPR can produce a high proportion of false results among positive predictions (low PPV), as the base rate fallacy demonstrates.[12]

2. **"FPR and FDR are the same thing."** They use different denominators. FPR is conditioned on actual negatives; FDR is conditioned on predicted positives. In most practical settings, they yield different values.[4]

3. **"The significance level alpha is the false positive rate."** While alpha is the maximum allowable probability of a Type I error under the null hypothesis, the observed false positive rate in practice can differ from alpha due to violations of test assumptions, multiple testing, or data dependencies.[9]

4. **"Minimizing FPR is always the right goal."** In many applications, minimizing FPR at the expense of a high false negative rate can be worse. A cancer screening test that never flags anyone (FPR = 0) also catches no cancers (recall = 0).

## Summary of key formulas

| Formula | Expression |
|---|---|
| False positive rate | FP / (FP + TN) |
| Specificity | 1 - FPR = TN / (TN + FP) |
| Sensitivity (TPR) | TP / (TP + FN) |
| Positive predictive value | TP / (TP + FP) |
| d-prime (SDT) | z(Hit Rate) - z(False Alarm Rate) |
| FWER (independent tests) | 1 - (1 - alpha)^m |
| Bayes' PPV formula | (Sensitivity * Prevalence) / (Sensitivity * Prevalence + FPR * (1 - Prevalence)) |

## See also

- [Confusion matrix](/wiki/confusion_matrix)
- [ROC curve](/wiki/roc_receiver_operating_characteristic_curve)
- [AUC (area under the ROC curve)](/wiki/auc_area_under_the_roc_curve)
- [False positive](/wiki/false_positive)
- [Precision](/wiki/precision)
- [Recall](/wiki/recall)
- [F1 score](/wiki/f1_score)
- [Accuracy](/wiki/accuracy)
- [Binary classification](/wiki/binary_classification)
- [Type I and Type II errors](/wiki/type_i_and_type_ii_errors)
- [Overfitting](/wiki/overfitting)

## References

1. Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. *Philosophical Transactions of the Royal Society of London, Series A*, 231, 289-337.
2. Green, D. M., & Swets, J. A. (1966). *Signal Detection Theory and Psychophysics*. New York: Wiley.
3. Fawcett, T. (2006). An introduction to ROC analysis. *Pattern Recognition Letters*, 27(8), 861-874. https://doi.org/10.1016/j.patrec.2005.10.010
4. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. *Journal of the Royal Statistical Society, Series B*, 57(1), 289-300.
5. Burke, D. S., Brundage, J. F., Redfield, R. R., et al. (1988). Measurement of the false positive rate in a screening program for human immunodeficiency virus infections. *The New England Journal of Medicine*, 319(15), 961-964.
6. Peterson, W. W., Birdsall, T. G., & Fox, W. C. (1954). The theory of signal detectability. *Transactions of the IRE Professional Group on Information Theory*, 4(4), 171-212.
7. Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*, 143(1), 29-36.
8. Suissa, S., & Shuster, J. J. (1991). The 2 x 2 matched-pairs trial: Exact unconditional design and analysis. *Biometrics*, 47(2), 361-372.
9. Lehmann, E. L., & Romano, J. P. (2005). *Testing Statistical Hypotheses* (3rd ed.). New York: Springer.
10. Macmillan, N. A., & Creelman, C. D. (2005). *Detection Theory: A User's Guide* (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
11. Holm, S. (1979). A simple sequentially rejective multiple test procedure. *Scandinavian Journal of Statistics*, 6(2), 65-70.
12. Gigerenzer, G. (2002). *Calculated Risks: How to Know When Numbers Deceive You*. New York: Simon & Schuster.
13. Suero, M., et al. (2021). False positive results with SARS-CoV-2 RT-PCR tests and how to evaluate a RT-PCR-positive test for the possibility of a false positive result. *Journal of Clinical Microbiology*, 59(4), e02080-21.
14. Powers, D. M. W. (2011). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. *Journal of Machine Learning Technologies*, 2(1), 37-63.
15. Ho, T. H., Bissell, M. C. S., Kerlikowske, K., et al. (2022). Cumulative probability of false-positive results after 10 years of screening with digital breast tomosynthesis vs digital mammography. *JAMA Network Open*, 5(3), e222440. https://doi.org/10.1001/jamanetworkopen.2022.2440
