The false negative rate (FNR), also known as the miss rate, is a classification metric that measures the proportion of actual positive instances that a model or test incorrectly classifies as negative. In statistical hypothesis testing, it corresponds to the probability of a Type II error, denoted by the Greek letter beta (β). Mathematically, the false negative rate is the complement of sensitivity (also called recall or the true positive rate), meaning FNR = 1 − Recall.
The false negative rate is especially important in domains where failing to detect a positive case carries serious consequences, such as medical screening, fraud detection, and security systems. A model with a high FNR misses too many positive cases, which can lead to delayed diagnoses, undetected threats, or financial losses.
Imagine you have a basket full of red and blue balls, and your job is to pick out all the red ones. Every time you look at a red ball but accidentally think it is blue and leave it behind, that is a "false negative." The false negative rate tells you how often you make this kind of mistake. If there are 10 red balls and you miss 2 of them, your false negative rate is 2 out of 10, or 20%. A lower number means you are better at finding the red balls.
The false negative rate is defined as the conditional probability that a classifier produces a negative prediction given that the true class is positive.
FNR = FN / (FN + TP)
Where:

- FN is the number of false negatives (actual positives predicted as negative)
- TP is the number of true positives (actual positives predicted as positive)
The FNR always falls between 0 and 1 (or 0% and 100%). A value of 0 means the classifier correctly identifies every positive instance, while a value of 1 means it misses all of them.
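As a minimal illustration, the formula translates directly into Python; the example call uses the numbers from the red-ball analogy above (8 found, 2 missed):

```python
def false_negative_rate(fn: int, tp: int) -> float:
    """FNR = FN / (FN + TP), the share of actual positives that were missed."""
    if fn + tp == 0:
        raise ValueError("FNR is undefined when there are no actual positives")
    return fn / (fn + tp)

print(false_negative_rate(fn=2, tp=8))  # 0.2 -- the 20% from the ball example
```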
The false negative rate is directly related to several other evaluation metrics. The table below summarizes these connections.
| Metric | Formula | Relationship to FNR |
|---|---|---|
| Sensitivity (recall, true positive rate) | TP / (TP + FN) | FNR = 1 − Sensitivity |
| Recall | TP / (TP + FN) | Recall and sensitivity are identical; FNR = 1 − Recall |
| Specificity (true negative rate) | TN / (TN + FP) | Independent of FNR; measures performance on negative class |
| False positive rate | FP / (FP + TN) | FPR = 1 − Specificity; measures errors on negatives, not positives |
| Precision (positive predictive value) | TP / (TP + FP) | Precision and FNR are not complements but are both affected by threshold changes |
| F1 score | 2 × (Precision × Recall) / (Precision + Recall) | FNR affects F1 through its impact on recall |
| Type II error rate (β) | P(fail to reject H₀ \| H₁ true) | In hypothesis testing, β equals the FNR |
| Statistical power (1 − β) | P(reject H₀ \| H₁ true) | Power = 1 − FNR = Sensitivity |
Because FNR = 1 − Recall, any improvement in recall directly lowers the false negative rate, and vice versa.
The false negative rate is derived from the confusion matrix, a table that summarizes the predictions of a binary classifier against the ground truth labels.
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | True positive (TP) | False negative (FN) |
| Actual negative | False positive (FP) | True negative (TN) |
The FNR is computed from the top row of the confusion matrix: the false negatives divided by the sum of true positives and false negatives. This is the error rate conditioned on the actual positives.
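As a sketch, the same computation with scikit-learn, assuming binary labels where 1 marks the positive class. Note that scikit-learn places actual negatives in the first row of its confusion matrix, so `ravel()` yields the counts in the order tn, fp, fn, tp:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

# scikit-learn orders rows and columns as [negative, positive]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(fn / (fn + tp))  # 0.5: two of the four actual positives were missed
```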
Consider a medical screening model that classifies 1,000 patient samples as either "disease present" (positive) or "disease absent" (negative). The results are:
| | Predicted positive | Predicted negative | Row total |
|---|---|---|---|
| Actual positive | 72 (TP) | 8 (FN) | 80 |
| Actual negative | 45 (FP) | 875 (TN) | 920 |
| Column total | 117 | 883 | 1,000 |
From this confusion matrix:

- FNR = FN / (FN + TP) = 8 / (8 + 72) = 8 / 80 = 0.10 (10%)
- Sensitivity = TP / (TP + FN) = 72 / 80 = 0.90 (90%)
The model misses 10% of patients who actually have the disease. In medical contexts, this 10% miss rate may be unacceptable because those patients will not receive timely treatment.
In the framework developed by Jerzy Neyman and Egon Pearson in 1933, hypothesis testing involves deciding between a null hypothesis (H₀) and an alternative hypothesis (H₁). Two types of errors can occur:
| Error type | Definition | Probability | Also known as |
|---|---|---|---|
| Type I error | Rejecting H₀ when H₀ is true | α (alpha) | False positive, false alarm |
| Type II error | Failing to reject H₀ when H₁ is true | β (beta) | False negative, miss |
The false negative rate in this context is β, the probability of failing to detect a real effect. The complement of β is the statistical power of the test (1 − β), which represents the probability of correctly rejecting a false null hypothesis.
The Neyman-Pearson lemma provides a theoretical foundation for constructing hypothesis tests that maximize statistical power (minimize β) for a given significance level α. This lemma shows that the likelihood ratio test is the most powerful test for simple hypotheses.
Several factors influence the Type II error rate:

- Sample size: larger samples make real effects easier to detect, lowering β
- Effect size: larger true effects are detected more reliably, lowering β
- Significance level: a stricter (smaller) α raises β, all else being equal
- Data variability: noisier measurements raise β
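To make the sample-size effect concrete, here is a small sketch that computes β for a one-sided, one-sample z-test with known variance; the effect size of 0.5 is an illustrative assumption:

```python
import numpy as np
from scipy.stats import norm

def type_ii_error_rate(effect_size, n, alpha=0.05):
    """beta for a one-sided, one-sample z-test with known variance."""
    z_crit = norm.ppf(1 - alpha)  # critical value z_(1-alpha)
    return norm.cdf(z_crit - effect_size * np.sqrt(n))

# beta shrinks (power grows) as the sample size increases
for n in (10, 30, 100):
    beta = type_ii_error_rate(effect_size=0.5, n=n)
    print(f"n={n:4d}  beta={beta:.3f}  power={1 - beta:.3f}")
```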
In binary classification, a model assigns each input to one of two classes: positive or negative. The false negative rate measures the fraction of true positives that the model fails to identify. Many classifiers output a probability score, and a classification threshold determines the cutoff above which an instance is labeled positive.
Lowering the threshold causes the model to classify more instances as positive, which typically reduces the number of false negatives (and thus reduces the FNR), but it also increases the number of false positives. This fundamental tradeoff is visualized by the ROC curve and the precision-recall curve.
In multiclass classification, the false negative rate can be computed on a per-class basis using the one-vs-rest approach. For each class, instances belonging to that class are treated as positives, and all other instances are treated as negatives. The FNR for each class is then calculated using the same formula: FN / (FN + TP).
Per-class FNR values can be aggregated in several ways:
| Aggregation method | Description |
|---|---|
| Macro average | Unweighted mean of per-class FNR values; treats all classes equally |
| Weighted average | Weighted mean of per-class FNR values using class support (number of instances) as weights |
| Micro average | Computed from global FN and TP counts across all classes; equivalent to overall FNR |
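A sketch of per-class and aggregated FNR using scikit-learn's one-vs-rest confusion matrices; the labels below are made up for illustration:

```python
from sklearn.metrics import multilabel_confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 0, 2, 2, 1, 2]

# one 2x2 one-vs-rest matrix per class, laid out as [[tn, fp], [fn, tp]]
mcm = multilabel_confusion_matrix(y_true, y_pred)
fn, tp = mcm[:, 1, 0], mcm[:, 1, 1]

per_class_fnr = fn / (fn + tp)
macro_fnr = per_class_fnr.mean()              # unweighted mean over classes
micro_fnr = fn.sum() / (fn.sum() + tp.sum())  # global counts; overall FNR
print(per_class_fnr, macro_fnr, micro_fnr)
```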
The false negative rate is particularly informative when working with imbalanced datasets, where one class is much more prevalent than the other. In such settings, overall accuracy can be misleadingly high because the model can achieve good accuracy simply by predicting the majority class. The FNR reveals whether the model is actually detecting the minority (positive) class.
For example, in a dataset where only 1% of transactions are fraudulent, a model that labels every transaction as legitimate achieves 99% accuracy but has an FNR of 100%, meaning it misses every single fraudulent transaction.
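A quick sketch of this failure mode with an always-negative "classifier":

```python
import numpy as np

# 1% fraud: 10 fraudulent transactions among 1,000
y_true = np.array([1] * 10 + [0] * 990)
y_pred = np.zeros_like(y_true)  # label everything "legitimate"

accuracy = (y_pred == y_true).mean()  # 0.99 -- looks impressive
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
print(accuracy, fn / (fn + tp))  # 0.99 accuracy, 1.0 FNR: every fraud missed
```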
Most probabilistic classifiers produce a continuous score (e.g., a probability between 0 and 1), which is converted into a binary prediction using a decision threshold. The choice of threshold directly controls the balance between false negatives and false positives.
| Threshold change | Effect on FNR | Effect on FPR | Practical implication |
|---|---|---|---|
| Lower threshold | Decreases (fewer misses) | Increases (more false alarms) | Better for applications where missing positives is costly |
| Higher threshold | Increases (more misses) | Decreases (fewer false alarms) | Better for applications where false alarms are costly |
Selecting the right threshold requires domain knowledge. A cancer screening test, for instance, should use a low threshold to minimize missed diagnoses, even if it produces more false positives that can be resolved through follow-up testing. Conversely, a system that flags emails as spam might tolerate a higher FNR to avoid incorrectly blocking legitimate messages.
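The tradeoff in the table can be reproduced with a small threshold sweep; the scores and labels here are illustrative:

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05])

for threshold in (0.5, 0.3):
    y_pred = (scores >= threshold).astype(int)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    # lowering the threshold from 0.5 to 0.3 cuts FNR but raises FPR
    print(f"threshold={threshold}: FNR={fn/(fn+tp):.2f}, FPR={fp/(fp+tn):.2f}")
```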
The ROC curve plots the true positive rate (1 − FNR) against the false positive rate across all possible thresholds. The area under the ROC curve (AUC) summarizes the model's ability to discriminate between classes across thresholds.
The precision-recall curve is often preferred for imbalanced datasets. It plots precision against recall (which equals 1 − FNR), showing the tradeoff more clearly when the positive class is rare. A model with a low FNR will have high recall, pulling the curve toward the upper-right corner.
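A sketch extracting the miss rate at every candidate threshold from scikit-learn's ROC computation (the data are the same illustrative scores as above):

```python
from sklearn.metrics import roc_curve

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]

fpr, tpr, thresholds = roc_curve(y_true, scores)
fnr = 1 - tpr  # FNR is the complement of the true positive rate
for t, miss in zip(thresholds, fnr):
    print(f"threshold={t:.2f}  FNR={miss:.2f}")
```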
Several strategies can reduce the false negative rate in machine learning models:
The simplest approach is to lower the classification threshold. This is effective when the model already assigns reasonably high probabilities to positive instances but the default threshold (often 0.5) is too strict. Threshold adjustment does not require retraining the model.
Cost-sensitive learning modifies the loss function to assign different penalties to different types of errors. By assigning a higher cost to false negatives than to false positives, the training process encourages the model to prioritize recall over precision.
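One common realization in scikit-learn is the `class_weight` parameter, which scales each class's contribution to the training loss; the 10:1 weighting below is an arbitrary illustration that would normally be tuned or derived from actual error costs:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# toy imbalanced dataset: roughly 10% positives
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# errors on the positive class count 10x more heavily during training
model = LogisticRegression(class_weight={0: 1, 1: 10}).fit(X, y)
```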
When the positive class is underrepresented in the training data, the model may not learn enough about positive instances to detect them reliably. Techniques to address this include:

- Oversampling the minority class, either by duplicating existing positives or by generating synthetic examples (e.g., SMOTE; see the sketch below)
- Undersampling the majority class to rebalance the training set
- Collecting or labeling additional positive examples where feasible
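A sketch of the oversampling route, assuming the third-party imbalanced-learn package is available:

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# synthesize new minority-class points until the two classes are balanced
X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
```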
Ensemble methods such as bagging and boosting combine multiple models to improve predictive performance. Gradient boosting in particular can be configured with custom loss functions that penalize false negatives more heavily.
Choosing a more expressive model architecture or performing better feature engineering can improve the model's ability to distinguish positive from negative cases, thereby reducing both FNR and FPR. Adding informative features that are predictive of the positive class often has the most direct impact.
Poorly calibrated models produce probability estimates that do not reflect true likelihoods, which makes threshold selection unreliable. Calibration techniques such as Platt scaling and isotonic regression can improve the quality of predicted probabilities, enabling better threshold choices and indirectly reducing FNR.
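A sketch using scikit-learn's calibration wrapper; `method="sigmoid"` corresponds to Platt scaling and `method="isotonic"` to isotonic regression:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# wrap an uncalibrated margin classifier and fit Platt scaling via cross-validation
model = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5).fit(X, y)
probs = model.predict_proba(X)[:, 1]  # calibrated positive-class probabilities
```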
In healthcare, the false negative rate of a diagnostic test directly affects patient outcomes. A false negative in cancer screening means a patient with cancer is told they are healthy, potentially delaying treatment until the disease progresses. The terms "sensitivity" and "specificity" were introduced by American biostatistician Jacob Yerushalmy in 1947 specifically for evaluating medical diagnostic tests.
Examples of FNR in medical testing:
| Test | Typical FNR range | Notes |
|---|---|---|
| Pap smear (cervical cancer) | 5-30% | Two-thirds of false negatives result from improper collection and processing of cellular material |
| Mammography (breast cancer) | 10-20% | Dense breast tissue can obscure tumors, increasing the FNR |
| RT-PCR for COVID-19 | 2-29% | FNR varies with timing; nasal samples show approximately 27% FNR in the first week of symptoms |
| Rapid antigen tests (COVID-19) | 50-100% in early infection | Sensitivity can be near 0% in the first 48 hours of infection |
For screening programs applied to large populations, Bayes' theorem shows that the predictive value of test results depends not only on sensitivity and specificity but also on disease prevalence. When prevalence is low, even a test with a low FNR may produce a large number of false positives relative to true positives.
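A worked sketch of this prevalence effect, using illustrative test characteristics:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Apply Bayes' theorem to get PPV and NPV from test characteristics."""
    tp = sensitivity * prevalence              # P(test+, disease present)
    fp = (1 - specificity) * (1 - prevalence)  # P(test+, disease absent)
    fn = (1 - sensitivity) * prevalence        # P(test-, disease present)
    tn = specificity * (1 - prevalence)        # P(test-, disease absent)
    return tp / (tp + fp), tn / (tn + fn)

# sensitivity 90% (FNR 10%), specificity 95%, prevalence 1%
ppv, npv = predictive_values(0.90, 0.95, 0.01)
print(f"PPV={ppv:.3f}  NPV={npv:.4f}")  # PPV ~ 0.154: most positives are false
```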
In financial fraud detection, a false negative occurs when a fraudulent transaction is classified as legitimate. This results in direct monetary losses to the financial institution and its customers. Fraud detection systems typically prioritize low FNR (high recall) and accept a higher false positive rate, because the cost of a missed fraud case usually exceeds the cost of investigating a flagged legitimate transaction.
Common approaches to reducing FNR in fraud detection include digital footprint analysis, device and browser fingerprinting, and regularly updating models to account for evolving fraud techniques.
In cybersecurity, a false negative means that an actual security threat, such as a network intrusion or malware infection, goes undetected. The consequences can include data breaches, system compromise, and regulatory penalties. Intrusion detection systems face a persistent tension between minimizing false negatives (ensuring all threats are detected) and minimizing false positives (avoiding alert fatigue for security analysts).
In email spam filtering, a false negative occurs when a spam message passes through the filter and reaches the user's inbox. While less immediately dangerous than false negatives in medical or security contexts, spam false negatives expose users to phishing attempts and malware. Spam filters must balance FNR against the false positive rate, since incorrectly filtering legitimate email (a false positive) can also cause significant problems.
In information retrieval systems such as search engines, the false negative rate corresponds to the proportion of relevant documents that the system fails to return. Recall (1 − FNR) measures the system's completeness in retrieving relevant results. In legal discovery and patent searches, high recall (low FNR) is often more important than high precision, because missing a relevant document can have serious legal consequences.
In object detection tasks within computer vision, false negatives occur when the model fails to detect an object that is present in the image. For safety applications like autonomous driving or pedestrian detection, the FNR must be extremely low because missing a pedestrian or obstacle can result in accidents. Techniques such as non-maximum suppression tuning and multi-scale detection help reduce false negatives in these systems.
The following table compares the false negative rate with other commonly used error and performance metrics in classification.
| Metric | Formula | What it measures | When to prioritize |
|---|---|---|---|
| False negative rate (FNR) | FN / (FN + TP) | Proportion of positives missed | Medical screening, fraud detection, safety systems |
| False positive rate (FPR) | FP / (FP + TN) | Proportion of negatives incorrectly flagged | Spam filtering, justice systems, quality control |
| False discovery rate (FDR) | FP / (FP + TP) | Proportion of positive predictions that are wrong | Genomics, multiple hypothesis testing |
| False omission rate (FOR) | FN / (FN + TN) | Proportion of negative predictions that are wrong | Evaluating negative predictive value |
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Overall correctness | Balanced datasets with equal error costs |
| Error rate | (FP + FN) / (TP + TN + FP + FN) | Overall proportion of mistakes | General model evaluation |
The concept of two types of errors in hypothesis testing was formalized by Jerzy Neyman and Egon Pearson in their foundational 1933 paper on the theory of testing statistical hypotheses. They introduced the terms "error of the first kind" (Type I) and "error of the second kind" (Type II), corresponding to false positives and false negatives respectively.
The application of these ideas to diagnostic testing came through the work of Jacob Yerushalmy, who introduced the terms "sensitivity" and "specificity" in 1947 while studying chest X-ray screening for tuberculosis. Since sensitivity equals 1 minus the false negative rate, Yerushalmy's framework implicitly defined FNR as a key property of diagnostic tests.
In the machine learning literature, the false negative rate became widely used alongside the adoption of the confusion matrix as a standard tool for evaluating classifiers. The closely related concept of recall has roots in information retrieval research dating back to the 1950s and 1960s, particularly the Cranfield experiments conducted by Cyril Cleverdon.
The term "miss rate" is commonly used in signal detection theory and radar engineering, where it refers to the probability of failing to detect a signal that is present. This usage predates its adoption in machine learning but describes the same mathematical quantity.