True negative (TN)

Introduction

A true negative (TN) is an outcome in which a machine learning model correctly predicts the negative class. Classification is the task of predicting a data point's class from its features. In binary classification, each prediction leads to one of four outcomes: true positive (TP), true negative (TN), false positive (FP) or false negative (FN).

A true negative is the diagonal counterpart of a true positive. Both represent correct predictions; the difference is that a true positive correctly identifies a member of the positive class, while a true negative correctly identifies an example that does not belong to the positive class. Counting true negatives is a basic step in evaluating a binary classifier, and it feeds into several widely used metrics, including accuracy, specificity, and negative predictive value.

What is True Negative (TN)?

True Negative (TN) is one of the four possible outcomes in a binary classification problem. It occurs when the model predicts a negative result and the actual label is in fact negative. In other words, when the model correctly recognizes that a data point does not belong to the positive class, the prediction is counted as a true negative.

Consider a machine learning model that predicts whether an email is spam, with spam as the positive class. If the model predicts that an email is not spam and it is indeed not spam, the result is a true negative: the model correctly identified that this data point does not belong to the positive class. Google's machine learning crash course gives the same example: a legitimate email that the spam filter correctly routes to the inbox is a true negative, while a spam email that the filter correctly sends to the spam folder is a true positive.

The outcome depends on two things: the ground truth label of the example and the model's predicted label. A prediction is a true negative only when both are negative. If the ground truth is negative but the model says positive, the result is a false positive. If the ground truth is positive and the model says negative, the result is a false negative.
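The rule above can be written as a small function. This is a minimal sketch; the function name `classify_outcome` and its boolean arguments are illustrative choices, not part of any library.

```python
def classify_outcome(actual_positive: bool, predicted_positive: bool) -> str:
    """Name the outcome for one example given its true and predicted labels."""
    if actual_positive and predicted_positive:
        return "TP"  # positive example, predicted positive
    if not actual_positive and not predicted_positive:
        return "TN"  # negative example, predicted negative
    if not actual_positive and predicted_positive:
        return "FP"  # negative example, wrongly predicted positive
    return "FN"      # positive example, wrongly predicted negative


# A legitimate email (actual negative) that the filter sends to the inbox.
print(classify_outcome(actual_positive=False, predicted_positive=False))
```

Only the case where both arguments are false yields a true negative.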

Position in the confusion matrix

A confusion matrix for a binary classifier is a two by two table that counts how many examples fall into each combination of actual class and predicted class. The standard layout places actual classes in rows and predicted classes in columns. True negatives sit in the bottom right cell, the position where both the actual class and the predicted class are negative. True positives sit in the top left cell along the same diagonal. The two off diagonal cells hold false positives and false negatives.

	Predicted positive	Predicted negative
Actual positive	True positive (TP)	False negative (FN)
Actual negative	False positive (FP)	True negative (TN)

Reading the diagonal of the matrix gives the total number of correct predictions. TP plus TN is the number of examples the model labeled correctly. FP plus FN is the number of mistakes. This layout makes it easy to spot the kind of error a model is making, since false positives and false negatives have different costs in most real applications.
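As a rough sketch of how the four cells are filled, the snippet below tallies outcomes from two lists of labels. The encoding (1 for positive, 0 for negative), the helper name `confusion_counts`, and the sample labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1
        elif actual == 1 and predicted == 0:
            counts["FN"] += 1
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


# Made-up labels for eight examples.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

cells = confusion_counts(y_true, y_pred)
correct = cells["TP"] + cells["TN"]   # the diagonal
mistakes = cells["FP"] + cells["FN"]  # the off diagonal cells
print(cells, "correct:", correct, "mistakes:", mistakes)
```

With these sample labels the diagonal holds 6 of the 8 examples (2 TP and 4 TN), and the remaining 2 are one false positive and one false negative.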

Formulas that use TN

Several common evaluation metrics depend directly on the count of true negatives.

Accuracy. Accuracy is the share of all predictions that the model got right.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

If a model classifies 9,000 of 10,000 emails correctly across both classes, its accuracy is 0.9. True negatives contribute to the numerator on equal footing with true positives.

Specificity, also called true negative rate (TNR). Specificity measures how well the model identifies examples from the negative class.

$$\text{Specificity} = \frac{TN}{TN + FP}$$

The denominator is the total number of actual negatives in the test set. A specificity of 0.95 in a spam filter means that 95 percent of legitimate emails are correctly routed to the inbox, and 5 percent are wrongly flagged as spam. Specificity is the complement of the false positive rate: $\text{FPR} = 1 - \text{Specificity}$.

Negative predictive value (NPV). NPV looks at the same true negatives from a different angle. Instead of asking what fraction of actual negatives the model catches, it asks what fraction of negative predictions are correct.

$$\text{NPV} = \frac{TN}{TN + FN}$$

This distinction matters in practice. Specificity and sensitivity are intrinsic properties of a classifier and stay the same regardless of how common the positive class is. NPV and positive predictive value depend on the base rate, so they shift when the prevalence of the positive class changes in the population being tested.
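To illustrate the dependence on base rate, NPV can be rewritten in terms of sensitivity, specificity and prevalence using Bayes' rule. The sketch below assumes sensitivity and specificity stay fixed while prevalence varies; the function name `npv_from_rates` and the 0.9 values are illustrative only.

```python
def npv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV = P(actual negative | predicted negative), via Bayes' rule."""
    true_negative_mass = specificity * (1.0 - prevalence)    # share of all cases that are TN
    false_negative_mass = (1.0 - sensitivity) * prevalence   # share of all cases that are FN
    return true_negative_mass / (true_negative_mass + false_negative_mass)


# Same classifier (sensitivity = specificity = 0.9) applied to three populations.
for prevalence in (0.01, 0.5, 0.9):
    npv = npv_from_rates(sensitivity=0.9, specificity=0.9, prevalence=prevalence)
    print(f"prevalence={prevalence:.2f}  NPV={npv:.3f}")
```

With these assumed rates, NPV falls from roughly 0.999 at 1 percent prevalence to 0.9 at 50 percent and 0.5 at 90 percent, even though specificity never changes.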

False positive rate (FPR). FPR is the share of actual negatives that the model misclassifies as positive.

$$\text{FPR} = \frac{FP}{FP + TN} = 1 - \text{Specificity}$$

FPR is the horizontal axis of the ROC curve, so an operating point with more true negatives (and therefore a lower FPR) sits further to the left on the curve.
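The four formulas in this section can be collected into one helper. This is a minimal sketch: the names `safe_ratio` and `tn_based_metrics` are hypothetical, the counts are made up, and returning NaN for an empty denominator is one possible convention among several.

```python
import math


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def tn_based_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the metrics in this section that use the TN count."""
    actual_negatives = tn + fp
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "specificity": safe_ratio(tn, actual_negatives),
        "npv": safe_ratio(tn, tn + fn),
        "fpr": safe_ratio(fp, actual_negatives),
    }


# Made-up counts for illustration.
metrics = tn_based_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Specificity and FPR share the denominator TN + FP, which is why they always sum to 1 when that denominator is non-zero.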

A worked example

Suppose a hospital screens 2,030 people for colorectal cancer. The true labels are known from follow up testing. The screening model produces this confusion matrix:

	Predicted positive	Predicted negative
Actual positive (30)	20 (TP)	10 (FN)
Actual negative (2,000)	180 (FP)	1,820 (TN)

From these counts:

Accuracy = (20 + 1,820) / 2,030 = 1,840 / 2,030 = 0.906.
Specificity = 1,820 / (1,820 + 180) = 1,820 / 2,000 = 0.91.
NPV = 1,820 / (1,820 + 10) = 1,820 / 1,830 = 0.995.
FPR = 180 / 2,000 = 0.09.

The 1,820 true negatives are people who do not have cancer and were correctly told their screening result was negative. They dominate the dataset because the prevalence is low, which is why the accuracy and NPV come out high even though the model misses a third of the actual positives.
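The contrast between the high headline numbers and the missed positives can be checked directly from the counts in the table. This short sketch adds prevalence and sensitivity, which the list above does not compute; the variable names are illustrative.

```python
# Counts taken from the screening confusion matrix above.
tp, fn, fp, tn = 20, 10, 180, 1820
total = tp + fn + fp + tn

prevalence = (tp + fn) / total   # share of screened people who have cancer
sensitivity = tp / (tp + fn)     # share of actual positives the model catches
accuracy = (tp + tn) / total
npv = tn / (tn + fn)

print(f"prevalence:  {prevalence:.4f}")
print(f"sensitivity: {sensitivity:.3f}")
print(f"accuracy:    {accuracy:.3f}")
print(f"NPV:         {npv:.3f}")
```

Prevalence is about 1.5 percent and sensitivity is about 0.667, so accuracy and NPV are driven almost entirely by the 1,820 true negatives.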

Why true negatives matter

True negatives matter because they measure how well a model handles the negative class, and they contribute directly to accuracy. A high share of true negatives among the actual negatives indicates that the model rarely raises false alarms. On its own, though, a large TN count does not show that the model can be trusted, because it says nothing about how many positives are missed.

In some instances, the true negative rate is more critical than the true positive rate. For instance, in medical diagnosis, a high true negative rate helps prevent false positives which could lead to unnecessary medical treatments or surgeries. In a screening test, a patient who is truly healthy and receives a negative result is spared the cost, anxiety, and risk of follow up procedures. Specificity is the metric clinicians use to quantify this property of the test.

The same logic shows up in other domains:

In spam filtering, true negatives are legitimate emails that reach the inbox. A spam filter that lets through real mail without mistakes has a high true negative rate. Users tend to be more upset about a missing wedding invitation than a spam message slipping through, so specificity is often weighted heavily.
In fraud detection, true negatives are genuine transactions that the model correctly leaves alone. Falsely flagging a normal purchase as fraud annoys customers and triggers manual review costs, so banks track FPR (the share of legitimate transactions wrongly flagged) closely.
In content moderation, true negatives are posts that follow the rules and that the model correctly leaves up. A high TN count means the platform is not over removing legitimate speech.
In defect inspection, true negatives are non defective items that pass quality control. A factory line with a high TN rate keeps throughput high because good parts are not rejected.

The accuracy paradox and class imbalance

True negatives can also mislead, especially when the negative class is much larger than the positive class. The accuracy paradox refers to a situation where a classifier looks excellent on accuracy but is useless in practice.

Consider credit card fraud detection. A typical dataset might contain 990 genuine transactions and 10 fraudulent ones. A naive model that labels every transaction as genuine produces:

TN = 990, FN = 10, TP = 0, FP = 0.
Accuracy = 990 / 1,000 = 0.99.
Specificity = 990 / 990 = 1.0.
Recall = 0 / 10 = 0.

The model has perfect specificity and 99 percent accuracy, yet it detects no fraud. In imbalanced settings, true negatives can dominate the diagonal and inflate accuracy while the positive class is ignored entirely. This is why practitioners on imbalanced problems often prefer metrics like precision, recall, F1 score, precision recall curves, or AUC ROC, which are more sensitive to performance on the minority class.
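The numbers above can be reproduced with a few lines of code. This sketch builds the 1,000 labels described in the example and applies a model that always predicts "genuine"; the label encoding (0 for genuine, 1 for fraud) is an assumption for illustration.

```python
# 990 genuine transactions (0) and 10 fraudulent ones (1).
y_true = [0] * 990 + [1] * 10
# The naive model labels every transaction as genuine.
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tn = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 0)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)

accuracy = (tp + tn) / len(y_true)
specificity = tn / (tn + fp)
recall = tp / (tp + fn)

print(f"TN={tn} FN={fn} TP={tp} FP={fp}")
print(f"accuracy={accuracy:.2f} specificity={specificity:.2f} recall={recall:.2f}")
```

Precision is left out on purpose: with TP + FP = 0 its denominator is zero, so it is undefined for this model.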

Effect of the classification threshold

Many classifiers output a probability score instead of a hard class label. To turn the score into a decision, the model compares it against a classification threshold. The threshold directly controls how many true negatives the model produces.

As the threshold for predicting positive increases, the counts move as follows (each one can also stay unchanged if no score crosses the threshold):

More examples are classified as negative.
True negatives increase, since more actual negatives now sit below the threshold and get the negative label.
False positives decrease for the same reason.
False negatives increase, because some actual positives also fall below the threshold.
True positives decrease.

Lowering the threshold has the opposite effect. Choosing a threshold is a trade off between catching positives and avoiding false alarms, and the ROC curve plots this trade off across all possible thresholds.
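A small sweep makes the direction of each change concrete. In this sketch the scores and labels are made up, the helper `counts_at_threshold` is hypothetical, and an example is labeled positive when its score is at or above the threshold (an assumed convention).

```python
def counts_at_threshold(scores, labels, threshold):
    """Return (TP, TN, FP, FN) when score >= threshold means 'predict positive'."""
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 0 and predicted == 0:
            tn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


# Made-up model scores and ground truth labels (1 = positive).
scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    tp, tn, fp, fn = counts_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold}: TP={tp} TN={tn} FP={fp} FN={fn}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves TN from 2 to 4 and FP from 2 to 0, while TP drops from 4 to 2 and FN rises from 0 to 2.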

Factors that affect True Negative

The true negative rate depends on factors such as the quality and quantity of training data, model complexity, and hyperparameter selection. If the training data are not representative of the data seen at test time, the model may struggle to recognize negative examples. If the model is too simple, it may fail to capture the features that distinguish negative from positive data points; if it is too complex, it may overfit and perform poorly on unseen data.

A few other factors influence TN counts in practice:

Class balance in the training set. If the negative class is heavily overrepresented, the model may learn to default to negative predictions, which inflates TN at the cost of TP.
Label noise. If some examples that are truly negative are labeled positive in the training data, or vice versa, the model will internalize those errors and the test set TN count will suffer.
Feature engineering. Adding features that distinguish typical negatives from positives can raise both TN and TP without changing the threshold.
Calibration of the score. A poorly calibrated model may produce probability scores that cluster in a narrow range, making the choice of threshold awkward and producing unstable TN counts.

Relationship with the other three outcomes

The four outcomes always sum to the total number of test examples:

$$TP + TN + FP + FN = N$$

They also pair up by ground truth label and by predicted label.

Actual negatives equal $TN + FP$.
Actual positives equal $TP + FN$.
Predicted negatives equal $TN + FN$.
Predicted positives equal $TP + FP$.

Specificity uses the first sum as its denominator. NPV uses the third. Switching between row sums and column sums is a useful way to keep these metrics straight.
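The row and column sums can be checked in a few lines. This sketch reuses the counts from the screening example earlier in the article and treats rows as actual classes and columns as predicted classes; the variable names are illustrative.

```python
# Counts from the screening example: rows are actual, columns are predicted.
tp, fn = 20, 10       # actual positive row
fp, tn = 180, 1820    # actual negative row

actual_negatives = tn + fp      # row sum
actual_positives = tp + fn      # row sum
predicted_negatives = tn + fn   # column sum
predicted_positives = tp + fp   # column sum
n = tp + tn + fp + fn

# Rows and columns each partition the same N examples.
assert actual_negatives + actual_positives == n
assert predicted_negatives + predicted_positives == n

specificity = tn / actual_negatives   # TN over a row sum
npv = tn / predicted_negatives        # TN over a column sum
print(f"specificity={specificity:.3f} npv={npv:.3f}")
```

Both metrics have TN in the numerator; only the choice of row sum or column sum in the denominator differs.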

Explain Like I'm 5 (ELI5)

In machine learning, a "true negative" is like when your mom inspects your room for messes and doesn't find any, because you didn't make any.

Imagine your mom checking your room and saying, "Good job, there are no messes in here!" This would be a true negative as there was no mess made and she correctly identified that there weren't any in the room.

Machine learning teaches computers how to recognize things, like pictures of dogs. A true negative is when the computer correctly determines that something is not a dog when it actually is not one.

So it's like the computer playing "find the dog", correctly saying "this picture is not a dog" when there is none present. This is good as we want the computer to be able to accurately identify when something isn't a dog.

References

Wikipedia. "Confusion matrix." https://en.wikipedia.org/wiki/Confusion_matrix
Wikipedia. "Sensitivity and specificity." https://en.wikipedia.org/wiki/Sensitivity_and_specificity
Wikipedia. "Positive and negative predictive values." https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values
Wikipedia. "Accuracy paradox." https://en.wikipedia.org/wiki/Accuracy_paradox
Google for Developers. "Thresholds and the confusion matrix." Machine Learning Crash Course. https://developers.google.com/machine-learning/crash-course/classification/thresholding
Google for Developers. "Classification: Accuracy, recall, precision, and related metrics." Machine Learning Crash Course. https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall
GeeksforGeeks. "Essential Metrics for Model Assessment: TP, TN, FP, FN in Machine Learning." https://www.geeksforgeeks.org/machine-learning/essential-metrics-for-model-assessment-tp-tn-fp-fn-in-machine-learning/
GeeksforGeeks. "Understanding the Confusion Matrix in Machine Learning." https://www.geeksforgeeks.org/machine-learning/confusion-matrix-machine-learning/
Monaghan TF, Rahman SN, Agudelo CW, et al. "Foundational Statistical Principles in Medical Research: Sensitivity, Specificity, Positive Predictive Value, and Negative Predictive Value." Medicina (Kaunas). 2021. PMC8156826.
