The false negative rate (FNR), also known as the miss rate, is a classification metric that measures the proportion of actual positive instances that a model or test incorrectly classifies as negative. In statistical hypothesis testing, it corresponds to the probability of a Type II error, denoted by the Greek letter beta (β). Mathematically, the false negative rate is the complement of sensitivity (also called recall or the true positive rate), meaning FNR = 1 − Recall.
The false negative rate is especially important in domains where failing to detect a positive case carries serious consequences, such as medical screening, fraud detection, and security systems. A model with a high FNR misses too many positive cases, which can lead to delayed diagnoses, undetected threats, or financial losses.
Imagine you have a basket full of red and blue balls, and your job is to pick out all the red ones. Every time you look at a red ball but accidentally think it is blue and leave it behind, that is a "false negative." The false negative rate tells you how often you make this kind of mistake. If there are 10 red balls and you miss 2 of them, your false negative rate is 2 out of 10, or 20%. A lower number means you are better at finding the red balls.
The false negative rate is defined as the conditional probability that a classifier produces a negative prediction given that the true class is positive.
FNR = FN / (FN + TP)
Where:

- FN is the number of false negatives (actual positives predicted as negative)
- TP is the number of true positives (actual positives predicted as positive)
The FNR always falls between 0 and 1 (or 0% and 100%). A value of 0 means the classifier correctly identifies every positive instance, while a value of 1 means it misses all of them.
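As a minimal illustration, the formula translates directly into Python; the example call uses the numbers from the red-ball analogy above (8 found, 2 missed):

```python
def false_negative_rate(fn: int, tp: int) -> float:
    """FNR = FN / (FN + TP), the share of actual positives that were missed."""
    if fn + tp == 0:
        raise ValueError("FNR is undefined when there are no actual positives")
    return fn / (fn + tp)

print(false_negative_rate(fn=2, tp=8))  # 0.2 -- the 20% from the ball example
```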
The false negative rate is directly related to several other evaluation metrics. The table below summarizes these connections.
| Metric | Formula | Relationship to FNR |
|---|---|---|
| Sensitivity (recall, true positive rate) | TP / (TP + FN) | FNR = 1 − Sensitivity |
| Recall | TP / (TP + FN) | Recall and sensitivity are identical; FNR = 1 − Recall |
| Specificity (true negative rate) | TN / (TN + FP) | Independent of FNR; measures performance on negative class |
| False positive rate | FP / (FP + TN) | FPR = 1 − Specificity; measures errors on negatives, not positives |
| Precision (positive predictive value) | TP / (TP + FP) | Precision and FNR are not complements but are both affected by threshold changes |
| F1 score | 2 × (Precision × Recall) / (Precision + Recall) | FNR affects F1 through its impact on recall |
| Type II error rate (β) | P(fail to reject H₀ \| H₁ true) | In hypothesis testing, β equals the FNR |
| Statistical power (1 − β) | P(reject H₀ \| H₁ true) | Power = 1 − FNR = Sensitivity |
Because FNR = 1 − Recall, any improvement in recall directly lowers the false negative rate, and vice versa.
The false negative rate is derived from the confusion matrix, a table that summarizes the predictions of a binary classifier against the ground truth labels.
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | True positive (TP) | False negative (FN) |
| Actual negative | False positive (FP) | True negative (TN) |
The FNR is computed from the top row of the confusion matrix: the false negatives divided by the sum of true positives and false negatives. This is the error rate conditioned on the actual positives.
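As a sketch, the same computation with scikit-learn, assuming binary labels where 1 marks the positive class. Note that scikit-learn places actual negatives in the first row of its confusion matrix, so `ravel()` yields the counts in the order tn, fp, fn, tp:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

# scikit-learn orders rows and columns as [negative, positive]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(fn / (fn + tp))  # 0.5: two of the four actual positives were missed
```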
Consider a medical screening model that classifies 1,000 patient samples as either "disease present" (positive) or "disease absent" (negative). The results are:
| | Predicted positive | Predicted negative | Row total |
|---|---|---|---|
| Actual positive | 72 (TP) | 8 (FN) | 80 |
| Actual negative | 45 (FP) | 875 (TN) | 920 |
| Column total | 117 | 883 | 1,000 |
From this confusion matrix:

- FNR = FN / (FN + TP) = 8 / (8 + 72) = 8 / 80 = 0.10 (10%)
- Sensitivity = TP / (TP + FN) = 72 / 80 = 0.90 (90%)
The model misses 10% of patients who actually have the disease. In medical contexts, this 10% miss rate may be unacceptable because those patients will not receive timely treatment.
In the framework developed by Jerzy Neyman and Egon Pearson in 1933, hypothesis testing involves deciding between a null hypothesis (H₀) and an alternative hypothesis (H₁). Two types of errors can occur:
| Error type | Definition | Probability | Also known as |
|---|---|---|---|
| Type I error | Rejecting H₀ when H₀ is true | α (alpha) | False positive, false alarm |
| Type II error | Failing to reject H₀ when H₁ is true | β (beta) | False negative, miss |
The false negative rate in this context is β, the probability of failing to detect a real effect. The complement of β is the statistical power of the test (1 − β), which represents the probability of correctly rejecting a false null hypothesis.
The Neyman-Pearson lemma provides a theoretical foundation for constructing hypothesis tests that maximize statistical power (minimize β) for a given significance level α. This lemma shows that the likelihood ratio test is the most powerful test for simple hypotheses.
Several factors influence the Type II error rate:

- Sample size: larger samples make real effects easier to detect, lowering β
- Effect size: larger true effects are detected more reliably, lowering β
- Significance level: a stricter (smaller) α raises β, all else being equal
- Data variability: noisier measurements raise β
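To make the sample-size effect concrete, here is a small sketch that computes β for a one-sided, one-sample z-test with known variance; the effect size of 0.5 is an illustrative assumption:

```python
import numpy as np
from scipy.stats import norm

def type_ii_error_rate(effect_size, n, alpha=0.05):
    """beta for a one-sided, one-sample z-test with known variance."""
    z_crit = norm.ppf(1 - alpha)  # critical value z_(1-alpha)
    return norm.cdf(z_crit - effect_size * np.sqrt(n))

# beta shrinks (power grows) as the sample size increases
for n in (10, 30, 100):
    beta = type_ii_error_rate(effect_size=0.5, n=n)
    print(f"n={n:4d}  beta={beta:.3f}  power={1 - beta:.3f}")
```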
In binary classification, a model assigns each input to one of two classes: positive or negative. The false negative rate measures the fraction of true positives that the model fails to identify. Many classifiers output a probability score, and a classification threshold determines the cutoff above which an instance is labeled positive.
Lowering the threshold causes the model to classify more instances as positive, which typically reduces the number of false negatives (and thus reduces the FNR), but it also increases the number of false positives. This fundamental tradeoff is visualized by the ROC curve and the precision-recall curve.
In multiclass classification, the false negative rate can be computed on a per-class basis using the one-vs-rest approach. For each class, instances belonging to that class are treated as positives, and all other instances are treated as negatives. The FNR for each class is then calculated using the same formula: FN / (FN + TP).
Per-class FNR values can be aggregated in several ways:
| Aggregation method | Description |
|---|---|
| Macro average | Unweighted mean of per-class FNR values; treats all classes equally |
| Weighted average | Weighted mean of per-class FNR values using class support (number of instances) as weights |
| Micro average | Computed from global FN and TP counts across all classes; equivalent to overall FNR |
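A sketch of per-class and aggregated FNR using scikit-learn's one-vs-rest confusion matrices; the labels below are made up for illustration:

```python
from sklearn.metrics import multilabel_confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 0, 2, 2, 1, 2]

# one 2x2 one-vs-rest matrix per class, laid out as [[tn, fp], [fn, tp]]
mcm = multilabel_confusion_matrix(y_true, y_pred)
fn, tp = mcm[:, 1, 0], mcm[:, 1, 1]

per_class_fnr = fn / (fn + tp)
macro_fnr = per_class_fnr.mean()              # unweighted mean over classes
micro_fnr = fn.sum() / (fn.sum() + tp.sum())  # global counts; overall FNR
print(per_class_fnr, macro_fnr, micro_fnr)
```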
The false negative rate is particularly informative when working with imbalanced datasets, where one class is much more prevalent than the other. In such settings, overall accuracy can be misleadingly high because the model can achieve good accuracy simply by predicting the majority class. The FNR reveals whether the model is actually detecting the minority (positive) class.
For example, in a dataset where only 1% of transactions are fraudulent, a model that labels every transaction as legitimate achieves 99% accuracy but has an FNR of 100%, meaning it misses every single fraudulent transaction.
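A quick sketch of this failure mode with an always-negative "classifier":

```python
import numpy as np

# 1% fraud: 10 fraudulent transactions among 1,000
y_true = np.array([1] * 10 + [0] * 990)
y_pred = np.zeros_like(y_true)  # label everything "legitimate"

accuracy = (y_pred == y_true).mean()  # 0.99 -- looks impressive
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
print(accuracy, fn / (fn + tp))  # 0.99 accuracy, 1.0 FNR: every fraud missed
```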
Most probabilistic classifiers produce a continuous score (e.g., a probability between 0 and 1), which is converted into a binary prediction using a decision threshold. The choice of threshold directly controls the balance between false negatives and false positives.
| Threshold change | Effect on FNR | Effect on FPR | Practical implication |
|---|---|---|---|
| Lower threshold | Decreases (fewer misses) | Increases (more false alarms) | Better for applications where missing positives is costly |
| Higher threshold | Increases (more misses) | Decreases (fewer false alarms) | Better for applications where false alarms are costly |
Selecting the right threshold requires domain knowledge. A cancer screening test, for instance, should use a low threshold to minimize missed diagnoses, even if it produces more false positives that can be resolved through follow-up testing. Conversely, a system that flags emails as spam might tolerate a higher FNR to avoid incorrectly blocking legitimate messages.
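The tradeoff in the table can be reproduced with a small threshold sweep; the scores and labels here are illustrative:

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05])

for threshold in (0.5, 0.3):
    y_pred = (scores >= threshold).astype(int)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    # lowering the threshold from 0.5 to 0.3 cuts FNR but raises FPR
    print(f"threshold={threshold}: FNR={fn/(fn+tp):.2f}, FPR={fp/(fp+tn):.2f}")
```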
The ROC curve plots the true positive rate (1 − FNR) against the false positive rate across all possible thresholds. The area under the ROC curve (AUC) summarizes the model's ability to discriminate between classes across thresholds.
The precision-recall curve is often preferred for imbalanced datasets. It plots precision against recall (which equals 1 − FNR), showing the tradeoff more clearly when the positive class is rare. A model with a low FNR will have high recall, pulling the curve toward the upper-right corner.
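A sketch extracting the miss rate at every candidate threshold from scikit-learn's ROC computation (the data are the same illustrative scores as above):

```python
from sklearn.metrics import roc_curve

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]

fpr, tpr, thresholds = roc_curve(y_true, scores)
fnr = 1 - tpr  # FNR is the complement of the true positive rate
for t, miss in zip(thresholds, fnr):
    print(f"threshold={t:.2f}  FNR={miss:.2f}")
```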
Several strategies can reduce the false negative rate in machine learning models:
The simplest approach is to lower the classification threshold. This is effective when the model already assigns reasonably high probabilities to positive instances but the default threshold (often 0.5) is too strict. Threshold adjustment does not require retraining the model.
Cost-sensitive learning modifies the loss function to assign different penalties to different types of errors. By assigning a higher cost to false negatives than to false positives, the training process encourages the model to prioritize recall over precision.
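One common realization in scikit-learn is the `class_weight` parameter, which scales each class's contribution to the training loss; the 10:1 weighting below is an arbitrary illustration that would normally be tuned or derived from actual error costs:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# toy imbalanced dataset: roughly 10% positives
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# errors on the positive class count 10x more heavily during training
model = LogisticRegression(class_weight={0: 1, 1: 10}).fit(X, y)
```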
When the positive class is underrepresented in the training data, the model may not learn enough about positive instances to detect them reliably. Techniques to address this include:

- Oversampling the minority class, either by duplicating existing positives or by generating synthetic examples (e.g., SMOTE; see the sketch below)
- Undersampling the majority class to rebalance the training set
- Collecting or labeling additional positive examples where feasible
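A sketch of the oversampling route, assuming the third-party imbalanced-learn package is available:

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# synthesize new minority-class points until the two classes are balanced
X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
```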
Ensemble methods such as bagging and boosting combine multiple models to improve predictive performance. Gradient boosting in particular can be configured with custom loss functions that penalize false negatives more heavily.
Choosing a more expressive model architecture or performing better feature engineering can improve the model's ability to distinguish positive from negative cases, thereby reducing both FNR and FPR. Adding informative features that are predictive of the positive class often has the most direct impact.
Poorly calibrated models produce probability estimates that do not reflect true likelihoods, which makes threshold selection unreliable. Calibration techniques such as Platt scaling and isotonic regression can improve the quality of predicted probabilities, enabling better threshold choices and indirectly reducing FNR.
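A sketch using scikit-learn's calibration wrapper; `method="sigmoid"` corresponds to Platt scaling and `method="isotonic"` to isotonic regression:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# wrap an uncalibrated margin classifier and fit Platt scaling via cross-validation
model = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5).fit(X, y)
probs = model.predict_proba(X)[:, 1]  # calibrated positive-class probabilities
```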
In healthcare, the false negative rate of a diagnostic test directly affects patient outcomes. A false negative in cancer screening means a patient with cancer is told they are healthy, potentially delaying treatment until the disease progresses. The terms "sensitivity" and "specificity" were introduced by American biostatistician Jacob Yerushalmy in 1947 specifically for evaluating medical diagnostic tests.
Examples of FNR in medical testing:
| Test | Typical FNR range | Notes |
|---|---|---|
| Pap smear (cervical cancer) | 5-30% | Two-thirds of false negatives result from improper collection and processing of cellular material |
| Mammography (breast cancer) | 10-20% | Dense breast tissue can obscure tumors, increasing the FNR |
| RT-PCR for COVID-19 | 2-29% | FNR varies with timing; nasal samples show approximately 27% FNR in the first week of symptoms |
| Rapid antigen tests (COVID-19) | 50-100% in early infection | Sensitivity can be near 0% in the first 48 hours of infection |
For screening programs applied to large populations, Bayes' theorem shows that the predictive value of test results depends not only on sensitivity and specificity but also on disease prevalence. When prevalence is low, even a test with a low FNR may produce a large number of false positives relative to true positives.
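A worked sketch of this prevalence effect, using illustrative test characteristics:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Apply Bayes' theorem to get PPV and NPV from test characteristics."""
    tp = sensitivity * prevalence              # P(test+, disease present)
    fp = (1 - specificity) * (1 - prevalence)  # P(test+, disease absent)
    fn = (1 - sensitivity) * prevalence        # P(test-, disease present)
    tn = specificity * (1 - prevalence)        # P(test-, disease absent)
    return tp / (tp + fp), tn / (tn + fn)

# sensitivity 90% (FNR 10%), specificity 95%, prevalence 1%
ppv, npv = predictive_values(0.90, 0.95, 0.01)
print(f"PPV={ppv:.3f}  NPV={npv:.4f}")  # PPV ~ 0.154: most positives are false
```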
In financial fraud detection, a false negative occurs when a fraudulent transaction is classified as legitimate. This results in direct monetary losses to the financial institution and its customers. Fraud detection systems typically prioritize low FNR (high recall) and accept a higher false positive rate, because the cost of a missed fraud case usually exceeds the cost of investigating a flagged legitimate transaction.
Common approaches to reducing FNR in fraud detection include digital footprint analysis, device and browser fingerprinting, and regularly updating models to account for evolving fraud techniques.
In cybersecurity, a false negative means that an actual security threat, such as a network intrusion or malware infection, goes undetected. The consequences can include data breaches, system compromise, and regulatory penalties. Intrusion detection systems face a persistent tension between minimizing false negatives (ensuring all threats are detected) and minimizing false positives (avoiding alert fatigue for security analysts).
In email spam filtering, a false negative occurs when a spam message passes through the filter and reaches the user's inbox. While less immediately dangerous than false negatives in medical or security contexts, spam false negatives expose users to phishing attempts and malware. Spam filters must balance FNR against the false positive rate, since incorrectly filtering legitimate email (a false positive) can also cause significant problems.
In information retrieval systems such as search engines, the false negative rate corresponds to the proportion of relevant documents that the system fails to return. Recall (1 − FNR) measures the system's completeness in retrieving relevant results. In legal discovery and patent searches, high recall (low FNR) is often more important than high precision, because missing a relevant document can have serious legal consequences.
In object detection tasks within computer vision, false negatives occur when the model fails to detect an object that is present in the image. For safety applications like autonomous driving or pedestrian detection, the FNR must be extremely low because missing a pedestrian or obstacle can result in accidents. Techniques such as non-maximum suppression tuning and multi-scale detection help reduce false negatives in these systems.
The following table compares the false negative rate with other commonly used error and performance metrics in classification.
| Metric | Formula | What it measures | When to prioritize |
|---|---|---|---|
| False negative rate (FNR) | FN / (FN + TP) | Proportion of positives missed | Medical screening, fraud detection, safety systems |
| False positive rate (FPR) | FP / (FP + TN) | Proportion of negatives incorrectly flagged | Spam filtering, justice systems, quality control |
| False discovery rate (FDR) | FP / (FP + TP) | Proportion of positive predictions that are wrong | Genomics, multiple hypothesis testing |
| False omission rate (FOR) | FN / (FN + TN) | Proportion of negative predictions that are wrong | Evaluating negative predictive value |
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Overall correctness | Balanced datasets with equal error costs |
| Error rate | (FP + FN) / (TP + TN + FP + FN) | Overall proportion of mistakes | General model evaluation |
The concept of two types of errors in hypothesis testing was formalized by Jerzy Neyman and Egon Pearson in their foundational 1933 paper on the theory of testing statistical hypotheses. They introduced the terms "error of the first kind" (Type I) and "error of the second kind" (Type II), corresponding to false positives and false negatives respectively.
The application of these ideas to diagnostic testing came through the work of Jacob Yerushalmy, who introduced the terms "sensitivity" and "specificity" in 1947 while studying chest X-ray screening for tuberculosis. Since sensitivity equals 1 minus the false negative rate, Yerushalmy's framework implicitly defined FNR as a key property of diagnostic tests.
In the machine learning literature, the false negative rate became widely used alongside the adoption of the confusion matrix as a standard tool for evaluating classifiers. The closely related concept of recall has roots in information retrieval research dating back to the 1950s and 1960s, particularly the Cranfield experiments conducted by Cyril Cleverdon.
The term "miss rate" is commonly used in signal detection theory and radar engineering, where it refers to the probability of failing to detect a signal that is present. This usage predates its adoption in machine learning but describes the same mathematical quantity.