True negative (TN)
Last reviewed
May 11, 2026
Sources
9 citations
Review status
Source-backed
Revision
v2 · 2,151 words
See also: Machine learning terms
A true negative (TN) is an outcome in which a machine learning model correctly predicts the negative class. Classification is the task of predicting a data point's class from its features. In binary classification, each prediction falls into one of four outcomes: true positive (TP), true negative (TN), false positive (FP), or false negative (FN).
A true negative is the diagonal counterpart of a true positive. Both represent correct predictions; the difference is that a true positive correctly identifies a member of the positive class, while a true negative correctly identifies an example that does not belong to the positive class. Counting true negatives is a basic step in evaluating a binary classifier, and it feeds into several widely used metrics, including accuracy, specificity, and negative predictive value.
A true negative (TN) is one of the possible outcomes in a binary classification problem: the model predicts the negative class and the actual label is in fact negative. In other words, when the model correctly recognizes that a data point does not belong to the positive class, the prediction is counted as a true negative.
Consider a machine learning model that predicts whether an email is spam, with spam as the positive class. If the model predicts that an email is not spam and the email is in fact legitimate, the result is a true negative: the model has correctly identified that the email does not belong to the positive class. Google's Machine Learning Crash Course gives the same example: a legitimate email that the spam filter correctly routes to the inbox is a true negative, while a spam email that the filter correctly sends to the spam folder is a true positive.
The outcome depends on two things: the ground truth label of the example and the model's predicted label. A prediction is a true negative only when both are negative. If the ground truth is negative but the model says positive, the result is a false positive. If the ground truth is positive and the model says negative, the result is a false negative.
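The rule above can be written as a small lookup on the two labels. This is a minimal sketch; the function name `outcome` and the boolean encoding of the labels are assumptions made for illustration.

```python
def outcome(actual_positive: bool, predicted_positive: bool) -> str:
    """Name the outcome for one example from its ground truth and prediction."""
    if actual_positive and predicted_positive:
        return "TP"
    if actual_positive and not predicted_positive:
        return "FN"
    if predicted_positive:
        # Ground truth is negative but the model said positive.
        return "FP"
    # Both ground truth and prediction are negative.
    return "TN"


# A legitimate email (not spam) that the filter leaves in the inbox.
print(outcome(actual_positive=False, predicted_positive=False))
```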
A confusion matrix for a binary classifier is a two by two table that counts how many examples fall into each combination of actual class and predicted class. The standard layout places actual classes in rows and predicted classes in columns. True negatives sit in the bottom right cell, the position where both the actual class and the predicted class are negative. True positives sit in the top left cell along the same diagonal. The two off diagonal cells hold false positives and false negatives.
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | True positive (TP) | False negative (FN) |
| Actual negative | False positive (FP) | True negative (TN) |
Reading the diagonal of the matrix gives the total number of correct predictions. TP plus TN is the number of examples the model labeled correctly. FP plus FN is the number of mistakes. This layout makes it easy to spot the kind of error a model is making, since false positives and false negatives have different costs in most real applications.
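As a rough illustration, the four cells can be tallied from paired lists of actual and predicted labels. The helper `confusion_counts` and the toy labels are hypothetical and are not taken from any particular library.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, FN, FP and TN for a binary problem."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["FP" if predicted == positive else "TN"] += 1
    # Counter returns 0 for cells that never occurred.
    return {cell: counts[cell] for cell in ("TP", "FN", "FP", "TN")}


# Toy data: 1 is the positive class, 0 the negative class.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
correct = counts["TP"] + counts["TN"]   # the diagonal
mistakes = counts["FP"] + counts["FN"]  # the off-diagonal cells
print(counts, correct, mistakes)
```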
Several common evaluation metrics depend directly on the count of true negatives.
Accuracy. Accuracy is the share of all predictions that the model got right.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
If a model classifies 9,000 of 10,000 emails correctly across both classes, its accuracy is 0.9. True negatives contribute to the numerator on equal footing with true positives.
Specificity, also called true negative rate (TNR). Specificity measures how well the model identifies examples from the negative class.
$$\text{Specificity} = \frac{TN}{TN + FP}$$
The denominator is the total number of actual negatives in the test set. A specificity of 0.95 in a spam filter means that 95 percent of legitimate emails are correctly routed to the inbox, and 5 percent are wrongly flagged as spam. Specificity is the complement of the false positive rate: $\text{FPR} = 1 - \text{Specificity}$.
Negative predictive value (NPV). NPV looks at the same true negatives from a different angle. Instead of asking what fraction of actual negatives the model catches, it asks what fraction of negative predictions are correct.
$$\text{NPV} = \frac{TN}{TN + FN}$$
This distinction matters in practice. Specificity and sensitivity are intrinsic properties of a classifier and stay the same regardless of how common the positive class is. NPV and positive predictive value depend on the base rate, so they shift when the prevalence of the positive class changes in the population being tested.
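One way to see the base rate effect is to hold sensitivity and specificity fixed and vary the prevalence. The sketch below derives NPV from the expected share of true negatives and false negatives in a population; the function name `npv_at_prevalence` and the example rates are assumptions chosen for illustration.

```python
def npv_at_prevalence(sensitivity, specificity, prevalence):
    """Expected NPV for a test with fixed sensitivity and specificity."""
    # Share of the population that is negative and correctly called negative.
    tn_share = specificity * (1 - prevalence)
    # Share of the population that is positive but called negative.
    fn_share = (1 - sensitivity) * prevalence
    return tn_share / (tn_share + fn_share)


# Same classifier (90% sensitivity, 90% specificity), different populations.
for prevalence in (0.01, 0.10, 0.50):
    value = npv_at_prevalence(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  NPV={value:.4f}")
```

NPV falls as the positive class becomes more common, even though the classifier itself has not changed.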
False positive rate (FPR). FPR is the share of actual negatives that the model misclassifies as positive.
$$\text{FPR} = \frac{FP}{FP + TN} = 1 - \text{Specificity}$$
FPR is the horizontal axis of the ROC curve, so models with more true negatives at a given threshold sit further to the left on the curve.
Suppose a hospital screens 2,030 people for colorectal cancer. The true labels are known from follow up testing. The screening model produces this confusion matrix:
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive (30) | 20 (TP) | 10 (FN) |
| Actual negative (2,000) | 180 (FP) | 1,820 (TN) |
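From these counts: accuracy is (20 + 1,820) / 2,030 ≈ 0.906, sensitivity is 20 / 30 ≈ 0.667, specificity is 1,820 / 2,000 = 0.91, NPV is 1,820 / 1,830 ≈ 0.995, and FPR is 180 / 2,000 = 0.09.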
The 1,820 true negatives are people who do not have cancer and were correctly told their screening result was negative. They dominate the dataset because the prevalence is low, which is why the accuracy and NPV come out high even though the model misses a third of the actual positives.
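The metrics for this screening example can be recomputed directly from the four cells of the confusion matrix. This is a minimal sketch; the variable names are illustrative.

```python
# Counts taken from the screening confusion matrix above.
tp, fn, fp, tn = 20, 10, 180, 1820
total = tp + fn + fp + tn

metrics = {
    "accuracy": (tp + tn) / total,
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "npv": tn / (tn + fn),
    "fpr": fp / (fp + tn),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```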
True negatives matter in machine learning because they count toward the overall accuracy of a model, which is the share of all predictions it gets right. A high count of true negatives relative to the number of actual negatives shows that the model handles the negative class well. Taken alone, however, it does not show that the model can be trusted, because it says nothing about how the model treats the positive class.
In some instances, the true negative rate is more critical than the true positive rate. For instance, in medical diagnosis, a high true negative rate helps prevent false positives which could lead to unnecessary medical treatments or surgeries. In a screening test, a patient who is truly healthy and receives a negative result is spared the cost, anxiety, and risk of follow up procedures. Specificity is the metric clinicians use to quantify this property of the test.
The same logic shows up in other domains:
True negatives can also mislead, especially when the negative class is much larger than the positive class. The accuracy paradox refers to a situation where a classifier looks excellent on accuracy but is useless in practice.
Consider credit card fraud detection. A typical dataset might contain 990 genuine transactions and 10 fraudulent ones. A naive model that labels every transaction as genuine produces TP = 0, FN = 10, FP = 0, and TN = 990.
The model has perfect specificity and 99 percent accuracy, yet it detects no fraud. In imbalanced settings, true negatives can dominate the diagonal and inflate accuracy while the positive class is ignored entirely. This is why practitioners on imbalanced problems often prefer metrics like precision, recall, F1 score, precision recall curves, or AUC ROC, which are more sensitive to performance on the minority class.
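The fraud example can be checked with a few lines of code. The sketch below builds the 990 genuine and 10 fraudulent labels described above and scores an always-negative model; the encoding (1 for fraud, 0 for genuine) is an assumption.

```python
# 1 = fraudulent (positive class), 0 = genuine (negative class).
y_true = [0] * 990 + [1] * 10
y_pred = [0] * len(y_true)  # naive model: every transaction is genuine

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
specificity = tn / (tn + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} specificity={specificity:.2f} recall={recall:.2f}")
```

Accuracy and specificity look excellent only because true negatives fill the diagonal; recall exposes that no fraud was caught.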
Many classifiers output a probability score instead of a hard class label. To turn the score into a decision, the model compares it against a classification threshold. The threshold directly controls how many true negatives the model produces.
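As the threshold for predicting positive increases, fewer examples are labeled positive. True negatives and false negatives rise or stay the same, while true positives and false positives fall or stay the same.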
Lowering the threshold has the opposite effect. Choosing a threshold is a trade off between catching positives and avoiding false alarms, and the ROC curve plots this trade off across all possible thresholds.
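A small sweep makes the effect of the threshold concrete. The scores and labels below are made-up toy data, and the rule "predict positive when the score is at or above the threshold" is an assumption about how the cutoff is applied.

```python
# Hypothetical model scores and ground truth labels (1 = positive).
scores = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
labels = [0, 0, 1, 0, 1, 1]


def counts_at(threshold):
    """Return (TP, FP, TN, FN) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


thresholds = (0.1, 0.3, 0.5, 0.7, 0.9)
sweep = [counts_at(t) for t in thresholds]
for t, (tp, fp, tn, fn) in zip(thresholds, sweep):
    print(f"threshold={t:.1f}  TP={tp} FP={fp} TN={tn} FN={fn}")
```

In this toy data the true negative count never decreases as the threshold rises, while the true positive count never increases.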
The true negative rate depends on factors such as the quality and quantity of the training data, the complexity of the model, and the choice of hyperparameters. If the training data are not representative of the data the model sees at test time, the model may struggle to predict negative data points accurately. If the model is too simple, it may fail to capture the features that distinguish negative from positive data points; if it is too complex, it can overfit and perform poorly on unseen test data.
A few other factors influence TN counts in practice:
The four outcomes always sum to the total number of test examples:
$$TP + TN + FP + FN = N$$
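They also pair up by ground truth label and by predicted label:
$$TN + FP = \text{actual negatives}$$
$$TP + FN = \text{actual positives}$$
$$TN + FN = \text{predicted negatives}$$
$$TP + FP = \text{predicted positives}$$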
Specificity uses the first sum as its denominator. NPV uses the third. Switching between row sums and column sums is a useful way to keep these metrics straight.
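These bookkeeping identities are easy to verify numerically. The sketch below reuses the counts from the screening example earlier in the article; the variable names are illustrative.

```python
tp, fn, fp, tn = 20, 10, 180, 1820
n = tp + tn + fp + fn

# Sums by ground truth label (rows of the confusion matrix).
actual_negatives = tn + fp
actual_positives = tp + fn

# Sums by predicted label (columns of the confusion matrix).
predicted_negatives = tn + fn
predicted_positives = tp + fp

# Either pair of sums accounts for every test example.
assert actual_negatives + actual_positives == n
assert predicted_negatives + predicted_positives == n

specificity = tn / actual_negatives   # row sum as denominator
npv = tn / predicted_negatives        # column sum as denominator
print(n, specificity, round(npv, 4))
```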
Machine learning often refers to a "true negative." It is like when your mom inspects your room for messes and doesn't find any, because you didn't make any.
Imagine your mom checking your room and saying, "Good job, there are no messes in here!" This would be a true negative as there was no mess made and she correctly identified that there weren't any in the room.
Machine learning teaches computers how to recognize things, like pictures of dogs. A true negative is when the computer correctly determines that something is not a dog when it actually is not one.
So it's like the computer playing "find the dog", correctly saying "this picture is not a dog" when there is none present. This is good as we want the computer to be able to accurately identify when something isn't a dog.