# False Negative Rate

> Source: https://aiwiki.ai/wiki/false_negative_rate
> Updated: 2026-06-24
> Categories: Machine Learning, Model Evaluation, Statistics
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

The **false negative rate** (FNR), also known as the **miss rate**, is the proportion of actual positive instances that a model or test incorrectly classifies as negative, computed as FNR = FN / (FN + TP). It is the complement of [sensitivity](/wiki/true_positive_rate_tpr) (also called [recall](/wiki/recall) or the true positive rate), so FNR = 1 - Recall. In statistical hypothesis testing it equals the probability of a Type II error, denoted by the Greek letter beta. The standard reference defines it tersely: "False negative rate (FNR), miss rate = FN / P = 1 - TPR." [1]

The false negative rate matters most in domains where failing to detect a positive case carries serious consequences, such as medical screening, [fraud detection](/wiki/fraud_detection), and security systems. A model with a high FNR misses too many positive cases, which can lead to delayed diagnoses, undetected threats, or financial losses. Because FNR = 1 - Recall, the two metrics move in lockstep: any increase in recall lowers the false negative rate by the same amount.

## Explain like I'm 5 (ELI5)

Imagine you have a basket full of red and blue balls, and your job is to pick out all the red ones. Every time you look at a red ball but accidentally think it is blue and leave it behind, that is a "false negative." The false negative rate tells you how often you make this kind of mistake. If there are 10 red balls and you miss 2 of them, your false negative rate is 2 out of 10, or 20%. A lower number means you are better at finding the red balls.

## What is the false negative rate (formal definition)?

The false negative rate is defined as the conditional probability that a classifier produces a negative prediction given that the true class is positive.

**FNR = FN / (FN + TP)**

Where:

- **FN** (false negatives): the number of positive instances incorrectly predicted as negative
- **TP** (true positives): the number of positive instances correctly predicted as positive
- **FN + TP**: the total number of actual positive instances (also called the condition positive count, P)

The FNR always falls between 0 and 1 (or 0% and 100%). A value of 0 means the classifier correctly identifies every positive instance, while a value of 1 means it misses all of them. The rate is undefined when the data contain no actual positives (FN + TP = 0). [3] [11]

### How does the false negative rate relate to other metrics?

The false negative rate is directly related to several other evaluation metrics. The table below summarizes these connections.

| Metric | Formula | Relationship to FNR |
|---|---|---|
| [Sensitivity](/wiki/true_positive_rate_tpr) (recall, true positive rate) | TP / (TP + FN) | FNR = 1 - Sensitivity |
| [Recall](/wiki/recall) | TP / (TP + FN) | Recall and sensitivity are identical; FNR = 1 - Recall |
| Specificity (true negative rate) | TN / (TN + FP) | Computed only on the negative class, so it is not determined by FNR; the two usually trade off as the decision threshold changes |
| [False positive rate](/wiki/false_positive_rate_fpr) | FP / (FP + TN) | FPR = 1 - Specificity; measures errors on negatives, not positives |
| [Precision](/wiki/precision) (positive predictive value) | TP / (TP + FP) | Precision and FNR are not complements but are both affected by threshold changes |
| [F1 score](/wiki/f1_score) | 2 x (Precision x Recall) / (Precision + Recall) | FNR affects F1 through its impact on recall |
| Type II error rate (beta) | P(fail to reject H0 \| H1 true) | In hypothesis testing, beta equals the FNR |
| Statistical power (1 - beta) | P(reject H0 \| H1 true) | Power = 1 - FNR = Sensitivity |

Because FNR = 1 - Recall, any improvement in recall directly lowers the false negative rate, and vice versa. The same identity holds on the hypothesis-testing side: the standard formulation states that "Power = sensitivity = 1 - beta," which makes the false negative rate (beta) and statistical power exact complements. [1]

## How is the false negative rate read from a confusion matrix?

The false negative rate is derived from the [confusion matrix](/wiki/confusion_matrix), a table that summarizes the predictions of a [binary classifier](/wiki/binary_classification) against the ground truth labels. [3]

| | Predicted positive | Predicted negative |
|---|---|---|
| **Actual positive** | True positive (TP) | False negative (FN) |
| **Actual negative** | False positive (FP) | True negative (TN) |

The FNR is computed from the top row of the confusion matrix: the false negatives divided by the sum of true positives and false negatives. This is the error rate conditioned on the actual positives.

### Worked example

Consider a medical screening model that classifies 1,000 patient samples as either "disease present" (positive) or "disease absent" (negative). The results are:

| | Predicted positive | Predicted negative | Row total |
|---|---|---|---|
| **Actual positive** | 72 (TP) | 8 (FN) | 80 |
| **Actual negative** | 45 (FP) | 875 (TN) | 920 |
| **Column total** | 117 | 883 | 1,000 |

From this confusion matrix:

- FNR = FN / (FN + TP) = 8 / (8 + 72) = 8 / 80 = 0.10 (10%)
- Sensitivity (recall) = TP / (TP + FN) = 72 / 80 = 0.90 (90%)
- FNR + Sensitivity = 0.10 + 0.90 = 1.00

The model misses 10% of patients who actually have the disease. In medical contexts, this 10% miss rate may be unacceptable because those patients will not receive timely treatment.
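The arithmetic above can be written as a short Python sketch. The `ConfusionCounts` class and the function names are illustrative, not taken from any library; the sketch raises an error when there are no actual positives, because the FNR is then undefined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix."""
    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative


def false_negative_rate(c: ConfusionCounts) -> float:
    """FNR = FN / (FN + TP), the miss rate among actual positives."""
    positives = c.fn + c.tp
    if positives == 0:
        raise ValueError("FNR is undefined when there are no actual positives")
    return c.fn / positives


def sensitivity(c: ConfusionCounts) -> float:
    """Sensitivity (recall, TPR) = TP / (TP + FN), the complement of FNR."""
    return 1.0 - false_negative_rate(c)


# Counts taken from the screening table above.
screening = ConfusionCounts(tp=72, fn=8, fp=45, tn=875)
print(f"FNR         = {false_negative_rate(screening):.2f}")  # FNR         = 0.10
print(f"Sensitivity = {sensitivity(screening):.2f}")          # Sensitivity = 0.90
```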

## How does FNR relate to Type II error in hypothesis testing?

In the framework developed by Jerzy Neyman and Egon Pearson in 1933, hypothesis testing involves deciding between a null hypothesis (H0) and an alternative hypothesis (H1). Neyman and Pearson named the two sources of error "errors of type I and errors of type II respectively," and showed that without an explicit alternative hypothesis it is impossible to compute the probability of the second kind of error. [2]

| Error type | Definition | Probability | Also known as |
|---|---|---|---|
| Type I error | Rejecting H0 when H0 is true | alpha | False positive, false alarm |
| Type II error | Failing to reject H0 when H1 is true | beta | False negative, miss |

The false negative rate in this context is beta, the probability of failing to detect a real effect. The complement of beta is the **statistical power** of the test (1 - beta), which represents the probability of correctly rejecting a false null hypothesis.

The Neyman-Pearson lemma provides a theoretical foundation for constructing hypothesis tests that maximize statistical power (minimize beta) at a given significance level alpha. It shows that, when testing a simple null hypothesis against a simple alternative, the likelihood ratio test is the most powerful test of that level. [2]

Several factors influence the Type II error rate:

- **Sample size**: Larger samples reduce beta by increasing the test's ability to detect a true effect.
- **Effect size**: Larger effects are easier to detect, leading to lower beta.
- **Significance level (alpha)**: Decreasing alpha (making the test more stringent against false positives) generally increases beta, and vice versa. This is the fundamental tradeoff between Type I and Type II errors.
- **Variance**: Higher variability in the data makes it harder to detect effects, increasing beta.
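To see how these factors interact, the sketch below computes beta for one concrete case, assuming a one-sided z-test on the mean of normally distributed data with known standard deviation. The function name and baseline numbers are illustrative assumptions; other tests (t-tests, two-sided tests) have different formulas. Only the standard library's `statistics.NormalDist` is used.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def type_ii_error(effect: float, sigma: float, n: int, alpha: float) -> float:
    """Beta for a one-sided z-test of H0: mu = mu0 against H1: mu = mu0 + effect.

    Assumes n i.i.d. normal observations with known standard deviation sigma
    and effect >= 0. H0 is rejected when Z = (mean - mu0) / (sigma / sqrt(n))
    exceeds the (1 - alpha) quantile of the standard normal distribution.
    """
    z_crit = STANDARD_NORMAL.inv_cdf(1.0 - alpha)  # rejection cutoff under H0
    shift = effect * sqrt(n) / sigma               # mean of Z when H1 is true
    return STANDARD_NORMAL.cdf(z_crit - shift)     # P(Z <= z_crit | H1) = beta


# Vary one factor at a time from a hypothetical baseline.
baseline = {"effect": 0.5, "sigma": 1.0, "n": 20, "alpha": 0.05}
variations = {
    "baseline": {},
    "larger sample": {"n": 80},
    "larger effect": {"effect": 1.0},
    "stricter alpha": {"alpha": 0.01},
    "more variance": {"sigma": 2.0},
}
for label, change in variations.items():
    beta = type_ii_error(**{**baseline, **change})
    print(f"{label:<15} beta = {beta:.3f}  power = {1.0 - beta:.3f}")
```

Each row moves beta in the direction the list predicts: a larger sample or effect lowers it, while a stricter alpha or more variance raises it.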

## False negative rate in machine learning

### Binary classification

In [binary classification](/wiki/binary_classification), a model assigns each input to one of two classes: positive or negative. The false negative rate measures the fraction of true positives that the model fails to identify. Many classifiers output a probability score, and a [classification threshold](/wiki/classification_threshold) determines the cutoff above which an instance is labeled positive.

Lowering the threshold causes the model to classify more instances as positive, which typically reduces the number of false negatives (and thus reduces the FNR), but it also increases the number of false positives. This fundamental tradeoff is visualized by the [ROC curve](/wiki/roc_receiver_operating_characteristic_curve) and the [precision-recall curve](/wiki/precision-recall_curve). [3]
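The tradeoff can be reproduced on a handful of hypothetical scores. This sketch assumes the rule "score >= threshold means positive" and that both classes are present in the data; the function name and the data are illustrative.

```python
from typing import Sequence, Tuple


def fnr_fpr_at(scores: Sequence[float], labels: Sequence[int],
               threshold: float) -> Tuple[float, float]:
    """Return (FNR, FPR) when every instance with score >= threshold is predicted positive."""
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1  # a missed positive
        elif predicted_positive:
            fp += 1      # a false alarm
        else:
            tn += 1
    return fn / (fn + tp), fp / (fp + tn)


# Hypothetical validation scores: label 1 = positive, 0 = negative.
scores = [0.95, 0.85, 0.70, 0.62, 0.55, 0.48, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

for threshold in (0.9, 0.7, 0.5, 0.3):
    fnr, fpr = fnr_fpr_at(scores, labels, threshold)
    print(f"threshold {threshold:.1f}: FNR = {fnr:.1f}, FPR = {fpr:.1f}")
```

On this toy data, lowering the threshold from 0.9 to 0.3 drives the FNR from 0.8 to 0.0 while the FPR rises from 0.0 to 0.6.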

### Multiclass classification

In [multiclass classification](/wiki/multi-class_classification), the false negative rate can be computed on a per-class basis using the one-vs-rest approach. For each class, instances belonging to that class are treated as positives, and all other instances are treated as negatives. The FNR for each class is then calculated using the same formula: FN / (FN + TP). [5]

Per-class FNR values can be aggregated in several ways:

| Aggregation method | Description |
|---|---|
| Macro average | Unweighted mean of per-class FNR values; treats all classes equally |
| Weighted average | Weighted mean of per-class FNR values using class support (number of instances) as weights |
| Micro average | Computed from FN and TP counts pooled across all classes; for single-label multiclass data this equals the overall error rate (1 - accuracy) |
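The one-vs-rest computation and the three averages can be sketched in plain Python. The function name and the cat/dog/bird labels are hypothetical; the sketch assumes single-label data and computes FNR only for classes that appear in `y_true`.

```python
from typing import Dict, Sequence, Tuple


def fnr_averages(y_true: Sequence[str],
                 y_pred: Sequence[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (per-class FNR, {"macro", "weighted", "micro"} averages)."""
    counts = {}
    for cls in sorted(set(y_true)):
        # One-vs-rest: instances of cls are positives, everything else negatives.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        counts[cls] = (tp, fn)

    per_class = {c: fn / (tp + fn) for c, (tp, fn) in counts.items()}
    support = {c: tp + fn for c, (tp, fn) in counts.items()}  # actual instances per class
    total_tp = sum(tp for tp, _ in counts.values())
    total_fn = sum(fn for _, fn in counts.values())

    averages = {
        "macro": sum(per_class.values()) / len(per_class),
        "weighted": sum(support[c] * per_class[c] for c in per_class) / sum(support.values()),
        "micro": total_fn / (total_fn + total_tp),  # pooled counts
    }
    return per_class, averages


y_true = ["cat"] * 4 + ["dog"] * 3 + ["bird"] * 3
y_pred = ["cat", "cat", "dog", "cat", "dog", "cat", "dog", "bird", "cat", "dog"]
per_class, averages = fnr_averages(y_true, y_pred)
print(per_class)
print(averages)
```

Here the per-class FNRs are 0.25 (cat), 1/3 (dog) and 2/3 (bird). The micro and support-weighted averages are both 0.4, which is 1 - accuracy; in single-label data the weighted average always coincides with the micro average. The macro average (about 0.417) gives every class equal influence, so the poorly detected "bird" class pulls it up.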

### Imbalanced datasets

The false negative rate is particularly informative when working with [imbalanced datasets](/wiki/class-imbalanced_dataset), where one class is much more prevalent than the other. In such settings, overall [accuracy](/wiki/accuracy) can be misleadingly high because the model can achieve good accuracy simply by predicting the majority class. The FNR reveals whether the model is actually detecting the minority (positive) class. [9]

For example, in a dataset where only 1% of transactions are fraudulent, a model that labels every transaction as legitimate achieves 99% accuracy but has an FNR of 100%, meaning it misses every single fraudulent transaction.
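The fraud example can be reproduced in a few lines. The dataset size of 10,000 transactions is an arbitrary assumption; only the 1% positive rate comes from the text.

```python
# Hypothetical dataset: 10,000 transactions, 1% of them fraudulent (label 1).
n_total = 10_000
n_fraud = n_total // 100
labels = [1] * n_fraud + [0] * (n_total - n_fraud)

# Trivial baseline that predicts "legitimate" (0) for every transaction.
predictions = [0] * n_total

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)

accuracy = correct / n_total
fnr = fn / (fn + tp)
print(f"accuracy = {accuracy:.2%}")  # accuracy = 99.00%
print(f"FNR      = {fnr:.0%}")       # FNR      = 100%
```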

## How does the decision threshold trade off false negatives against false positives?

Most probabilistic classifiers produce a continuous score (e.g., a probability between 0 and 1), which is converted into a binary prediction using a decision threshold. The choice of threshold directly controls the balance between false negatives and [false positives](/wiki/false_positive_rate_fpr). [3]

| Threshold change | Effect on FNR | Effect on FPR | Practical implication |
|---|---|---|---|
| Lower threshold | Decreases (fewer misses) | Increases (more false alarms) | Better for applications where missing positives is costly |
| Higher threshold | Increases (more misses) | Decreases (fewer false alarms) | Better for applications where false alarms are costly |

Selecting the right threshold requires domain knowledge. A cancer screening test, for instance, should use a low threshold to minimize missed diagnoses, even if it produces more false positives that can be resolved through follow-up testing. Conversely, a system that flags emails as spam might tolerate a higher FNR to avoid incorrectly blocking legitimate messages.

### ROC and precision-recall curves

The [ROC curve](/wiki/roc_receiver_operating_characteristic_curve) plots the true positive rate (1 - FNR) against the [false positive rate](/wiki/false_positive_rate_fpr) across all possible thresholds. The area under the ROC curve ([AUC](/wiki/auc_area_under_the_roc_curve)) summarizes the model's ability to discriminate between classes across thresholds. [3]

The [precision-recall curve](/wiki/precision-recall_curve) is often preferred for imbalanced datasets. It plots [precision](/wiki/precision) against [recall](/wiki/recall) (which equals 1 - FNR), showing the tradeoff more clearly when the positive class is rare. Saito and Rehmsmeier (2015) demonstrated that "the precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets," because a large number of true negatives can make ROC curves look deceptively optimistic. [10] A low FNR places an operating point far to the right along the recall axis; reaching the ideal upper-right corner (recall 1, precision 1) also requires high precision.

## How can you reduce the false negative rate?

Several strategies can reduce the false negative rate in [machine learning](/wiki/machine_learning) models:

### Threshold adjustment

The simplest approach is to lower the classification threshold. This is effective when the model already assigns reasonably high probabilities to positive instances but the default threshold (often 0.5) is too strict. Threshold adjustment does not require retraining the model.
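One common way to apply threshold adjustment is to fix a maximum acceptable FNR on held-out validation data and pick the highest threshold that meets it. The sketch assumes the rule "score >= threshold means positive"; the function names and toy data are hypothetical, and a target met on validation data is only an estimate of the FNR on new data.

```python
from typing import Sequence


def fnr_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """FNR when instances with score >= threshold are predicted positive."""
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    missed = sum(1 for s in positive_scores if s < threshold)
    return missed / len(positive_scores)


def highest_threshold_for_fnr(scores: Sequence[float], labels: Sequence[int],
                              max_fnr: float) -> float:
    """Largest threshold whose FNR on the given data does not exceed max_fnr.

    FNR only changes when the threshold crosses an observed positive score, so those
    scores are the only candidates. Among thresholds meeting the target, the highest
    flags the fewest instances and so keeps false positives as low as possible here.
    """
    candidates = sorted({s for s, y in zip(scores, labels) if y == 1}, reverse=True)
    for threshold in candidates:
        if fnr_at(scores, labels, threshold) <= max_fnr:
            return threshold
    raise ValueError("no threshold meets max_fnr (negative target or no positives?)")


scores = [0.95, 0.85, 0.70, 0.62, 0.55, 0.48, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(highest_threshold_for_fnr(scores, labels, max_fnr=0.2))  # 0.55
print(highest_threshold_for_fnr(scores, labels, max_fnr=0.0))  # 0.4
```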

### Cost-sensitive learning

[Cost-sensitive learning](/wiki/cost-sensitive_learning) modifies the [loss function](/wiki/loss_function) to assign different penalties to different types of errors. By assigning a higher cost to false negatives than to false positives, the training process encourages the model to prioritize recall over precision. [9]
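A minimal sketch of the loss-side idea is a class-weighted binary cross-entropy, in which `pos_weight` scales the penalty on positive examples; the parameter names here are illustrative rather than any particular library's API. The block also includes a related but distinct decision-time technique: for calibrated probabilities, predicting positive when p > C_FP / (C_FP + C_FN) minimizes expected misclassification cost, where C_FP and C_FN are the assumed costs of each error type.

```python
import math
from typing import Sequence


def weighted_bce(y_true: Sequence[int], p_pred: Sequence[float],
                 pos_weight: float = 1.0, neg_weight: float = 1.0,
                 eps: float = 1e-12) -> float:
    """Mean binary cross-entropy with separate weights for positive and negative examples.

    With pos_weight > neg_weight, a positive scored near 0 (a likely false negative)
    is penalized more than a negative scored near 1, pushing training toward recall.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -neg_weight * math.log(1.0 - p)
    return total / len(y_true)


def cost_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Threshold on a calibrated probability that minimizes expected cost.

    Predict positive when p * cost_fn > (1 - p) * cost_fp,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


y_true = [1, 0]
p_pred = [0.2, 0.2]  # a poorly scored positive and a correctly scored negative
print(weighted_bce(y_true, p_pred))                      # unweighted loss
print(weighted_bce(y_true, p_pred, pos_weight=5.0))      # the missed positive now costs 5x
print(cost_optimal_threshold(cost_fp=1.0, cost_fn=9.0))  # 0.1
```

When a false negative is assumed to cost nine times as much as a false positive, the cost-minimizing threshold drops from 0.5 to 0.1.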

### Data-level techniques

When the positive class is underrepresented in the training data, the model may not learn enough about positive instances to detect them reliably. Techniques to address this include:

- **Oversampling** the minority class (e.g., using [SMOTE](/wiki/smote), which generates synthetic examples of the minority class) [7]
- **Undersampling** the majority class to balance class proportions
- **[Data augmentation](/wiki/data_augmentation)** to increase the diversity of positive training examples
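The core interpolation step behind SMOTE can be sketched without any library. This is a simplified illustration rather than the reference algorithm: it picks base points at random instead of iterating over every minority sample, and `smote_sketch` is a hypothetical name. Production code would typically use a maintained implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote_sketch(minority: Sequence[Point], n_synthetic: int,
                 k: int = 5, seed: int = 0) -> List[Point]:
    """Create synthetic minority-class points by interpolating between near neighbours.

    Choose a random minority point x, choose one of its k nearest minority
    neighbours nn, and return x + gap * (nn - x) with gap uniform in [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest other minority points, by Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Hypothetical 2-D minority-class samples.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.9)]
new_points = smote_sketch(minority, n_synthetic=5, k=2, seed=42)
for point in new_points:
    print(tuple(round(v, 3) for v in point))
```

Because each new point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies instead of duplicating existing points.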

### Ensemble methods

[Ensemble methods](/wiki/ensemble) such as [bagging](/wiki/bagging) and [boosting](/wiki/boosting) combine multiple models to improve predictive performance. [Gradient boosting](/wiki/gradient_boosting) in particular can be configured with custom loss functions that penalize false negatives more heavily.

### Model selection and feature engineering

Choosing a more expressive model architecture or performing better [feature engineering](/wiki/feature_engineering) can improve the model's ability to distinguish positive from negative cases, thereby reducing both FNR and [FPR](/wiki/false_positive_rate_fpr). Adding informative features that are predictive of the positive class often has the most direct impact.

### Model calibration

Poorly calibrated models produce probability estimates that do not reflect true likelihoods, which makes threshold selection unreliable. [Calibration](/wiki/calibration_layer) techniques such as Platt scaling and isotonic regression can improve the quality of predicted probabilities, enabling better threshold choices and indirectly reducing FNR.
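Platt scaling fits a one-dimensional logistic regression that maps a model's raw score to a probability. The sketch below is simplified under stated assumptions: plain gradient descent, no target smoothing, and hypothetical held-out scores; `fit_platt` is an illustrative name. When the fitted slope `a` is positive the map is monotonic, so it leaves the ranking of instances unchanged and only makes a probability threshold mean what it says.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, n_iter: int = 5000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss.

    Simplified Platt scaling: Platt's original method also smooths the 0/1
    targets and uses a more robust optimizer; both are omitted here.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical held-out scores (e.g. raw margins) and their true labels.
val_scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
val_labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(val_scores, val_labels)
for s in val_scores:
    print(f"score {s:+.1f} -> calibrated p = {sigmoid(a * s + b):.2f}")
```

The scaler should be fit on data the underlying model was not trained on; otherwise the calibrated probabilities inherit the model's overconfidence on its own training set.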

## What is the false negative rate used for? (applications)

### Medical diagnosis and screening

In healthcare, the false negative rate of a diagnostic test directly affects patient outcomes. A false negative in cancer screening means a patient with cancer is told they are healthy, potentially delaying treatment until the disease progresses. The terms "sensitivity" and "specificity" were introduced by American biostatistician Jacob Yerushalmy in 1947 specifically for evaluating medical diagnostic tests, in his study of chest X-ray screening for tuberculosis. [6]

Examples of FNR in medical testing:

| Test | Typical FNR range | Notes |
|---|---|---|
| Pap smear (cervical cancer) | 15-25% | Sampling false negatives (abnormal cells absent from the smear) are slightly more common than laboratory errors [6] |
| Mammography (breast cancer) | 10-20% | Sensitivity falls from roughly 75-85% in fatty breasts to 30-50% in dense breasts, which can mask tumors [6] |
| RT-PCR for COVID-19 | 2-29% | A review of 34 studies (12,057 confirmed cases) found a highly heterogeneous false-negative proportion [8] |
| Rapid antigen tests (COVID-19) | 50-100% in early infection | Sensitivity can be near 0% in the first 48 hours of infection |

The timing of a test strongly affects its false negative rate. A pooled analysis by Kucirka and colleagues (2020) found that the false-negative rate of RT-PCR for SARS-CoV-2 was 100% on day 1 after exposure, fell to a minimum of about 20% on day 8 (roughly 3 days after symptom onset), then began rising again. [12] This illustrates that a single FNR figure can be misleading when detectability changes over time.

For screening programs applied to large populations, [Bayes' theorem](/wiki/bayes_theorem) shows that the predictive value of test results depends not only on sensitivity and specificity but also on disease prevalence. When prevalence is low, even a test with a low FNR and high specificity may produce a large number of false positives relative to true positives, which lowers the probability that a positive result is correct (the positive predictive value). [6]
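Bayes' theorem makes the prevalence effect concrete. The sketch computes positive and negative predictive values from an FNR, a specificity, and a prevalence; the test characteristics (FNR 10%, specificity 95%) and the prevalence values are hypothetical.

```python
from typing import Tuple


def predictive_values(fnr: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem; all inputs are proportions in [0, 1]."""
    sensitivity = 1.0 - fnr
    true_pos = sensitivity * prevalence                   # P(test +, disease)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(test +, no disease)
    true_neg = specificity * (1.0 - prevalence)           # P(test -, no disease)
    false_neg = fnr * prevalence                          # P(test -, disease)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: FNR = 10% (sensitivity 90%), specificity = 95%.
for prevalence in (0.01, 0.10, 0.30):
    ppv, npv = predictive_values(fnr=0.10, specificity=0.95, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With these assumed figures, the PPV is only about 15% at 1% prevalence but about 89% at 30% prevalence, even though the FNR and specificity are unchanged.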

### Fraud detection

In financial [fraud detection](/wiki/fraud_detection), a false negative occurs when a fraudulent transaction is classified as legitimate. This results in direct monetary losses to the financial institution and its customers. Fraud detection systems typically prioritize low FNR (high recall) and accept a higher false positive rate, because the cost of a missed fraud case usually exceeds the cost of investigating a flagged legitimate transaction.

Common approaches to reducing FNR in fraud detection include digital footprint analysis, device and browser fingerprinting, and regularly updating models to account for evolving fraud techniques.

### Cybersecurity and intrusion detection

In [cybersecurity](/wiki/cybersecurity), a false negative means that an actual security threat, such as a network intrusion or malware infection, goes undetected. The consequences can include data breaches, system compromise, and regulatory penalties. Intrusion detection systems face a persistent tension between minimizing false negatives (ensuring all threats are detected) and minimizing false positives (avoiding alert fatigue for security analysts).

### Spam filtering

In email spam filtering, a false negative occurs when a spam message passes through the filter and reaches the user's inbox. While less immediately dangerous than false negatives in medical or security contexts, spam false negatives expose users to phishing attempts and malware. Spam filters must balance FNR against the false positive rate, since incorrectly filtering legitimate email (a false positive) can also cause significant problems.

### Information retrieval

In [information retrieval](/wiki/information_retrieval) systems such as search engines, the false negative rate corresponds to the proportion of relevant documents that the system fails to return. Recall (1 - FNR) measures the system's completeness in retrieving relevant results. In legal discovery and patent searches, high recall (low FNR) is often more important than high precision, because missing a relevant document can have serious legal consequences. [13]

### Object detection in computer vision

In [object detection](/wiki/object_detection) tasks within [computer vision](/wiki/computer_vision), false negatives occur when the model fails to detect an object that is present in the image. For safety applications like autonomous driving or pedestrian detection, the FNR must be extremely low because missing a pedestrian or obstacle can result in accidents. Techniques such as non-maximum suppression tuning and multi-scale detection help reduce false negatives in these systems.

## How does FNR differ from FPR and the false discovery rate?

The following table compares the false negative rate with other commonly used error and performance metrics in classification. The key distinction is what each rate conditions on: FNR conditions on the actual positives, FPR on the actual negatives, and the false discovery rate on the predicted positives. [4]

| Metric | Formula | What it measures | When to prioritize |
|---|---|---|---|
| False negative rate (FNR) | FN / (FN + TP) | Proportion of positives missed | Medical screening, fraud detection, safety systems |
| [False positive rate](/wiki/false_positive_rate_fpr) (FPR) | FP / (FP + TN) | Proportion of negatives incorrectly flagged | Spam filtering, justice systems, quality control |
| False discovery rate (FDR) | FP / (FP + TP) | Proportion of positive predictions that are wrong | Genomics, multiple hypothesis testing |
| False omission rate (FOR) | FN / (FN + TN) | Proportion of negative predictions that are wrong | Assessing how trustworthy negative predictions are (FOR = 1 - negative predictive value) |
| [Accuracy](/wiki/accuracy) | (TP + TN) / (TP + TN + FP + FN) | Overall correctness | Balanced datasets with equal error costs |
| Error rate | (FP + FN) / (TP + TN + FP + FN) | Overall proportion of mistakes | General model evaluation |

A common point of confusion is FNR versus the false discovery rate. The FNR answers "of all the truly positive cases, what fraction did I miss?", while the FDR answers "of all the cases I flagged as positive, what fraction were wrong?". The two can move in opposite directions as the decision threshold changes. [4]
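The distinction is easiest to see by computing all four rates from one confusion matrix. The sketch reuses the counts from the medical screening example earlier in the article; the function name is illustrative.

```python
from typing import Dict


def error_rates(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Four error rates that differ only in which row or column they condition on."""
    return {
        "FNR": fn / (fn + tp),  # among actual positives
        "FPR": fp / (fp + tn),  # among actual negatives
        "FDR": fp / (fp + tp),  # among predicted positives
        "FOR": fn / (fn + tn),  # among predicted negatives
    }


# Counts from the medical screening example (TP=72, FN=8, FP=45, TN=875).
for name, value in error_rates(tp=72, fn=8, fp=45, tn=875).items():
    print(f"{name} = {value:.1%}")
```

With these counts the model misses only 10% of actual positives (FNR), yet about 38% of its positive predictions are wrong (FDR), because negatives outnumber positives by more than ten to one.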

## When was the false negative rate first defined? (history and terminology)

The concept of two types of errors in hypothesis testing was formalized by Jerzy Neyman and Egon Pearson in their foundational 1933 paper on the theory of testing statistical hypotheses. They introduced the terms "error of the first kind" (Type I) and "error of the second kind" (Type II), corresponding to false positives and false negatives respectively. [2]

The application of these ideas to diagnostic testing came through the work of Jacob Yerushalmy, who introduced the terms "sensitivity" and "specificity" in 1947 while studying chest X-ray screening for tuberculosis. Since sensitivity equals 1 minus the false negative rate, Yerushalmy's framework implicitly defined FNR as a key property of diagnostic tests. [6]

In the machine learning literature, the false negative rate became widely used alongside the adoption of the [confusion matrix](/wiki/confusion_matrix) as a standard tool for evaluating classifiers. [3] The closely related concept of [recall](/wiki/recall) has roots in [information retrieval](/wiki/information_retrieval) research dating back to the 1950s and 1960s, particularly the Cranfield experiments conducted by Cyril Cleverdon. [13]

The term "miss rate" is commonly used in signal detection theory and radar engineering, where it refers to the probability of failing to detect a signal that is present. This usage predates its adoption in machine learning but describes the same mathematical quantity. [1]

## See also

- [Confusion matrix](/wiki/confusion_matrix)
- [Recall](/wiki/recall)
- [Precision](/wiki/precision)
- [F1 score](/wiki/f1_score)
- [True positive rate (TPR)](/wiki/true_positive_rate_tpr)
- [ROC curve](/wiki/roc_receiver_operating_characteristic_curve)
- [AUC](/wiki/auc_area_under_the_roc_curve)
- [False positive rate (FPR)](/wiki/false_positive_rate_fpr)
- [Binary classification](/wiki/binary_classification)
- [Classification threshold](/wiki/classification_threshold)

## References

1. "Sensitivity and specificity." Wikipedia. Defines false negative rate (miss rate) = FN / P = 1 - TPR and states Power = sensitivity = 1 - beta. Accessed 2026. https://en.wikipedia.org/wiki/Sensitivity_and_specificity
2. Neyman, J., & Pearson, E. S. (1933). "On the Problem of the Most Efficient Tests of Statistical Hypotheses." *Philosophical Transactions of the Royal Society A*, 231(694-706), 289-337.
3. Fawcett, T. (2006). "An Introduction to ROC Analysis." *Pattern Recognition Letters*, 27(8), 861-874.
4. Powers, D. M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation." *Journal of Machine Learning Technologies*, 2(1), 37-63.
5. Sokolova, M., & Lapalme, G. (2009). "A Systematic Analysis of Performance Measures for Classification Tasks." *Information Processing & Management*, 45(4), 427-437.
6. Yerushalmy, J. (1947). "Statistical Problems in Assessing Methods of Medical Diagnosis with Special Reference to X-Ray Techniques." *Public Health Reports*, 62(40), 1432-1449.
7. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). "SMOTE: Synthetic Minority Over-sampling Technique." *Journal of Artificial Intelligence Research*, 16, 321-357.
8. Arevalo-Rodriguez, I., et al. (2020). "False-Negative Results of Initial RT-PCR Assays for COVID-19: A Systematic Review." *PLoS ONE*, 15(12), e0242958.
9. He, H., & Garcia, E. A. (2009). "Learning from Imbalanced Data." *IEEE Transactions on Knowledge and Data Engineering*, 21(9), 1263-1284.
10. Saito, T., & Rehmsmeier, M. (2015). "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets." *PLoS ONE*, 10(3), e0118432.
11. Flach, P. (2012). *Machine Learning: The Art and Science of Algorithms that Make Sense of Data.* Cambridge University Press.
12. Kucirka, L. M., Lauer, S. A., Laeyendecker, O., Boon, D., & Lessler, J. (2020). "Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction-Based SARS-CoV-2 Tests by Time Since Exposure." *Annals of Internal Medicine*, 173(4), 262-267.
13. Cleverdon, C. W. (1960). "The Aslib Cranfield Research Project on the Comparative Efficiency of Indexing Systems." *Aslib Proceedings*, 12(12), 421-431.

