Area under the curve

Updated Jul 13, 2026

Area under the curve (AUC) is a single scalar metric that summarizes the performance of a binary classifier or diagnostic test across all possible decision thresholds by measuring the area beneath a performance curve. In machine learning the term most often refers to the area under the ROC (receiver operating characteristic) curve (AUC-ROC, or simply AUC), which plots the true positive rate (sensitivity) against the false positive rate (1 minus specificity) as the threshold is varied; it can also refer to the area under the precision-recall curve (AUC-PR, or average precision).^[1]^[2] AUC-ROC has an equivalent probabilistic interpretation, established by Bamber in 1975 and popularized in medicine by Hanley and McNeil in 1982: it is the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative instance, which makes it numerically identical to the normalized Mann-Whitney U statistic.^[3]^[1] The metric ranges from 0 to 1, with 0.5 corresponding to a non-informative classifier that ranks no better than chance and 1.0 corresponding to perfect ranking. It is widely used in machine learning model evaluation, medical diagnostic studies, credit scoring, and information retrieval. AUC has also been criticized, most prominently by David Hand in 2009, for using an implicit cost distribution that depends on the classifier being evaluated,^[13] and for being optimistic relative to precision-based metrics when class distributions are highly skewed.^[4]^[5]

Hanley and McNeil gave the most-cited statement of the metric's meaning in their 1982 Radiology paper, writing that the area "represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject."^[1]

This page covers AUC as a general evaluation concept (the AUC-ROC and AUC-PR families, the probabilistic interpretation, and the range and baseline). For the full mathematics, geometry, and history of the ROC curve itself, see ROC (receiver operating characteristic) curve; for a focused treatment of the ROC area metric, see AUC (Area Under the ROC Curve); and for the precision-recall area, see precision-recall curve.

What is the difference between AUC-ROC and AUC-PR?

The unqualified term "AUC" is ambiguous because two different curves are summarized by an area in common practice. AUC-ROC integrates the true positive rate against the false positive rate, while AUC-PR integrates precision against recall. Because the false positive rate uses the (typically large) negative class in its denominator, AUC-ROC can remain high even when the absolute number of false positives is large; AUC-PR, which ignores true negatives entirely, reacts directly to that number and is therefore preferred on highly imbalanced data.^[4]^[5] In scikit-learn, sklearn.metrics.roc_auc_score computes AUC-ROC, while sklearn.metrics.average_precision_score computes AUC-PR as the weighted mean of precisions across thresholds (a non-interpolated estimate that is more conservative than trapezoidal integration of the precision-recall curve).^[8] The two areas are related but not interchangeable: Davis and Goadrich proved a one-to-one correspondence between points in ROC and PR space, yet the two scalar summaries reward different aspects of behavior.^[4]

Area metric	Curve summarized	Axes	Baseline for random model
AUC-ROC	ROC curve	TPR vs FPR	0.5 (constant, independent of class prior)
AUC-PR (average precision)	precision-recall curve	precision vs recall	positive class prevalence (varies with prior)
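The two baselines in the table can be checked numerically. The sketch below is illustrative only: it assigns uninformative random scores to a synthetic sample (300 positives and 2,700 negatives, an assumed prevalence of 0.10) and computes both areas with small pure-Python helpers. The names `roc_auc` and `average_precision` are hypothetical helpers written for this example, not library functions, and the average-precision helper assumes there are no tied scores.

```python
import random

def roc_auc(labels, scores):
    """Pairwise AUC-ROC: P(pos score > neg score) + 0.5 * P(tie)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Non-interpolated average precision: mean of precision at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits

rng = random.Random(0)                     # fixed seed for repeatability
labels = [1] * 300 + [0] * 2700            # assumed prevalence: 0.10
scores = [rng.random() for _ in labels]    # scores carry no information
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(round(auc, 3), round(ap, 3))
```

Up to sampling noise, the ROC area should land near 0.5 and the average precision near the prevalence of 0.10.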

Background and origins

The receiver operating characteristic curve has its roots in signal detection theory developed during World War II for radar engineering, where operators had to choose between declaring a faint signal a real target (aircraft) or a noise artifact (bird, weather echo).^[6]^[2] The English-language term "receiver operating characteristic" reflects this radar engineering heritage, with the curve describing how the operating point of a receiver (its sensitivity to weak signals) traded off correct detections against false alarms. Egan and others formalized the framework in psychophysics during the 1950s and 1960s, and Charles Metz, Lee Lusted, David Green, and John Swets transferred the methodology into radiology and experimental psychology.^[2]^[6]

Two papers from the 1970s and 1980s consolidated the scalar AUC as the dominant summary measure of an ROC curve. Donald Bamber's 1975 paper in the Journal of Mathematical Psychology, "The area above the ordinal dominance graph and the area below the receiver operating characteristic graph," proved that the area below an empirical ROC curve equals the probability that a randomly chosen positive observation has a higher value than a randomly chosen negative observation.^[3] James Hanley and Barbara McNeil's 1982 paper in Radiology, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," made the same probabilistic interpretation accessible to a medical audience, provided closed-form standard-error expressions for the empirical AUC, and gave sample-size guidance for diagnostic studies.^[1] Both papers explicitly identified the equivalence between the empirical AUC and the Mann-Whitney U statistic (also known as the Wilcoxon rank-sum statistic), which gave the metric a firm grounding in nonparametric statistics.^[1]^[3]

By the late 1990s and early 2000s, AUC had become a default model-selection criterion in machine learning, partly because the KDD Cup and other competitions adopted it for tasks with highly imbalanced class priors where raw classification accuracy was uninformative. Tom Fawcett's 2006 article "An introduction to ROC analysis" in Pattern Recognition Letters codified the visualization and the AUC computation algorithm for a machine-learning audience and is among the most-cited methodological tutorials in the field, with more than 21,000 citations.^[2]

How is AUC defined mathematically?

Let a binary classifier or diagnostic test produce a real-valued score s(x) for each instance x, with class labels y in {0, 1}. The decision rule predicts the positive class if s(x) is above a threshold t. As t varies, the classifier produces a family of (false positive rate, true positive rate) operating points. The ROC curve is the locus of these points in the unit square, where:

True positive rate (sensitivity, recall): $\mathrm{TPR}(t) = P(s(X) > t \mid Y = 1)$
False positive rate (1 minus specificity): $\mathrm{FPR}(t) = P(s(X) > t \mid Y = 0)$
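As a concrete illustration of these two rates, the sketch below evaluates the rule "predict positive if the score exceeds t" at a few thresholds. The function name `rates_at_threshold` and the eight-instance dataset are made up for this example.

```python
def rates_at_threshold(labels, scores, t):
    """Return (FPR, TPR) for the rule 'predict positive if score > t'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > t)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > t)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return fp / n_neg, tp / n_pos

# Toy data: four positives and four negatives.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

for t in (0.85, 0.5, 0.25):
    fpr, tpr = rates_at_threshold(labels, scores, t)
    print(f"t={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold moves the operating point up and to the right: on this toy data from (0, 0.25) to (0.25, 0.75) to (0.5, 1.0).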

The area under the ROC curve is the integral

$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}(\mathrm{FPR}^{-1}(u))\, du$$

Equivalently, by integration by parts and a change of variables, AUC equals the double expectation

$$\mathrm{AUC} = P(s(X_p) > s(X_n)) + \frac{1}{2} P(s(X_p) = s(X_n))$$

where $X_p$ denotes a random positive instance, $X_n$ a random negative instance, and the second term accounts for ties.^[3]^[1] In words: AUC is the probability that the classifier ranks a randomly chosen positive example above a randomly chosen negative example. This probabilistic interpretation is the most widely cited intuitive meaning of the metric and underlies the connection to nonparametric statistics described in the next section.
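The step from the integral to the pairwise probability can be made explicit. The following is a sketch for the continuous case only. It assumes that the negative-class scores have a density $f_n$ (a symbol introduced here for illustration), so that ties have probability zero, and that $X_p$ and $X_n$ are drawn independently:

$$
\begin{aligned}
\mathrm{AUC} &= \int_0^1 \mathrm{TPR}(\mathrm{FPR}^{-1}(u))\, du
  && \text{substitute } u = \mathrm{FPR}(t),\ du = -f_n(t)\, dt \\
&= \int_{-\infty}^{\infty} \mathrm{TPR}(t)\, f_n(t)\, dt
  && u: 0 \to 1 \text{ corresponds to } t: +\infty \to -\infty \\
&= \int_{-\infty}^{\infty} P(s(X_p) > t)\, f_n(t)\, dt \\
&= P(s(X_p) > s(X_n)).
\end{aligned}
$$

When ties have positive probability, the same argument yields the extra term $\frac{1}{2} P(s(X_p) = s(X_n))$.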

The empirical ROC curve for a finite sample is a step function joining the operating points produced as the threshold sweeps through the observed scores. The empirical AUC is the area under this step function, which corresponds to numerical integration by the trapezoidal rule.^[2]^[7]

A diagonal line from (0,0) to (1,1) has AUC = 0.5 and corresponds to a classifier whose ranking is no better than a coin flip on the positive-versus-negative discrimination task. An AUC of 1 corresponds to perfect ranking: every positive instance scores above every negative one. An AUC below 0.5 indicates that the classifier ranks negatives above positives more often than the reverse; reversing the sign of the score produces a complementary classifier whose AUC is 1 minus the original value.^[2]

Probabilistic interpretation and rank-sum equivalence

Bamber's 1975 result, restated by Hanley and McNeil in 1982, places AUC in the family of two-sample rank statistics.^[3]^[1] Given $n_p$ positive instances with scores $s_1, \ldots, s_{n_p}$ and $n_n$ negative instances with scores $t_1, \ldots, t_{n_n}$, define the indicator

$$I(s_i, t_j) = \begin{cases} 1 & \text{if } s_i > t_j \\ 1/2 & \text{if } s_i = t_j \\ 0 & \text{if } s_i < t_j \end{cases}$$

Then the empirical AUC is

$$\widehat{\mathrm{AUC}} = \frac{1}{n_p n_n} \sum_{i,j} I(s_i, t_j)$$

This expression is identical to the Mann-Whitney U statistic divided by $n_p n_n$, and Mann-Whitney U is in turn linearly related to the Wilcoxon rank-sum statistic by $W = U + n_p(n_p + 1)/2$.^[1]^[3] The rank-sum equivalence has three practical consequences. First, the empirical AUC can be computed in $O(N \log N)$ time by sorting all scores, ranking the combined sample, and summing the positive-class ranks, which is faster than the $O(n_p n_n)$ double loop implied by the direct pair-counting formula.^[2] Second, hypothesis testing on AUC can use the well-developed asymptotic theory of U-statistics, leading to closed-form variance estimators and the DeLong test described later. Third, AUC inherits invariance to monotonic transformations of the score from the rank-sum statistic: any strictly increasing function applied to $s(x)$ leaves the empirical AUC unchanged, because it preserves all pairwise comparisons.^[2]
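The pair-counting definition and the invariance property can both be checked on a small example. The sketch below uses a hypothetical helper `pairwise_auc` and made-up scores that include one tie across classes; the exponential is just one arbitrary choice of strictly increasing transformation.

```python
import math

def pairwise_auc(pos, neg):
    """Empirical AUC by direct pair counting, with ties weighted 1/2."""
    total = 0.0
    for s in pos:
        for t in neg:
            if s > t:
                total += 1.0
            elif s == t:
                total += 0.5
    return total / (len(pos) * len(neg))

pos = [0.9, 0.7, 0.7, 0.4]   # scores of positive instances
neg = [0.7, 0.5, 0.3, 0.1]   # scores of negative instances

auc_raw = pairwise_auc(pos, neg)
u_statistic = auc_raw * len(pos) * len(neg)   # Mann-Whitney U

# Apply a strictly increasing transformation to every score.
auc_exp = pairwise_auc([math.exp(5 * s) for s in pos],
                       [math.exp(5 * t) for t in neg])

print(auc_raw, u_statistic, auc_exp)
```

On this data the pair count is 13 out of 16, so the AUC is 0.8125, and the transformed scores give the same value because every pairwise comparison is preserved.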

This last property explains a feature of AUC that is sometimes mistaken for a defect. Two classifiers can produce identical AUC values while one is well-calibrated (outputs probabilities matching empirical positive rates) and the other is wildly miscalibrated, because AUC only measures the ranking of scores, not their absolute values.^[2] Calibration requires distinct diagnostics such as reliability diagrams or the Brier score, and a classifier optimized for AUC need not produce probabilities suitable for cost-sensitive decision making.
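A small numerical illustration of this point, under made-up data: the two score vectors below have the same ordering, so their AUCs agree, but their Brier scores differ because the second set of "probabilities" has been distorted by raising each value to the fourth power. The helper names are hypothetical.

```python
def pairwise_auc(labels, scores):
    """Empirical AUC by pair counting, with ties weighted 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

labels = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
distorted = [p ** 4 for p in probs]   # strictly increasing on [0, 1]

auc_a, auc_b = pairwise_auc(labels, probs), pairwise_auc(labels, distorted)
brier_a, brier_b = brier_score(labels, probs), brier_score(labels, distorted)
print(auc_a, auc_b, round(brier_a, 4), round(brier_b, 4))
```

The ranking metric cannot tell the two apart, while the Brier score penalizes the distorted probabilities.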

How is AUC computed in practice?

Two algorithms dominate practical computation of the empirical AUC.

The first is trapezoidal integration on the empirical ROC curve. Sort the predicted scores in decreasing order. Sweep a threshold through the sorted scores; at each step, increase TPR by $1/n_p$ if the next instance is positive and FPR by $1/n_n$ if it is negative. The cumulative trajectory of (FPR, TPR) values forms a piecewise-linear curve from (0,0) to (1,1), and summing the areas of the trapezoids beneath each segment yields the AUC.^[7]^[2] Instances with tied scores are processed as a single group, so TPR and FPR advance together and the curve takes a diagonal step; the trapezoid beneath that step corresponds to the 1/2 weighting for ties in the rank-sum formula.
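The sweep described above can be sketched as follows. This is a minimal illustration rather than any library's implementation; `roc_points` and `trapezoid_auc` are hypothetical names, and the toy data contains a three-way tie at 0.7 to exercise the tie handling.

```python
def roc_points(labels, scores):
    """Sweep the threshold from high to low; tied scores form one group."""
    order = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        j = i
        # Consume every instance sharing the current score before emitting a point.
        while j < len(order) and order[j][0] == order[i][0]:
            if order[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points

def trapezoid_auc(points):
    """Sum the trapezoids beneath consecutive (FPR, TPR) vertices."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.7, 0.7, 0.5, 0.4, 0.3, 0.1]
points = roc_points(labels, scores)
print(points)
print(trapezoid_auc(points))
```

The tie at 0.7 produces the diagonal segment from (0, 0.25) to (0.25, 0.75), and the total area is 0.8125.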

The second is the rank-based formula. Compute the ranks of all $N = n_p + n_n$ scores (using average ranks for ties). Sum the ranks assigned to positive instances to obtain $R_p$. Then

$$\widehat{\mathrm{AUC}} = \frac{R_p - n_p(n_p + 1)/2}{n_p n_n}$$

This formula is mathematically equivalent to the trapezoidal area, runs in O(N log N) sort time, and is used internally by many libraries.^[2]^[1]
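A minimal sketch of the rank-based computation, assuming ascending average ranks starting at 1. The helper names `average_ranks` and `rank_auc` are hypothetical, and the toy data is chosen to include a tie.

```python
def average_ranks(values):
    """Ascending ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0   # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_auc(labels, scores):
    """AUC from the sum of positive-class ranks (rank-sum formula)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    r_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (r_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.7, 0.7, 0.5, 0.4, 0.3, 0.1]
print(rank_auc(labels, scores))
```

Here the positive ranks are 8, 6, 6, and 3, so $R_p = 23$ and the AUC is $(23 - 10)/16 = 0.8125$.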

In the scikit-learn library (version 1.8.0 as of 2026), the function sklearn.metrics.roc_auc_score(y_true, y_score) implements both binary and multiclass AUC, accepts a max_fpr argument that returns the standardized partial AUC over a restricted range of false positive rates, and exposes multi_class options ovr (one-versus-rest) and ovo (one-versus-one) for problems with more than two classes.^[8] The companion function average_precision_score computes AUC-PR. R packages such as pROC and ROCR implement the same computation along with bootstrap and DeLong-based variance estimation.^[9]

Statistical inference

Because the empirical AUC is a U-statistic, it has known asymptotic properties that support hypothesis testing and confidence intervals.

Hanley and McNeil gave a closed-form expression for the standard error of the empirical AUC.^[1] The expression involves AUC itself and two probabilities, Q_1 (the probability that two randomly chosen positives both outrank one randomly chosen negative) and Q_2 (the probability that one randomly chosen positive outranks two randomly chosen negatives). These can be estimated from the data or approximated from AUC alone under a distributional assumption about the scores. The resulting standard error supports approximate normal-theory confidence intervals.
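A minimal sketch of this standard error follows. It assumes the widely quoted AUC-only approximations $Q_1 = A/(2 - A)$ and $Q_2 = 2A^2/(1 + A)$ rather than estimating the two quantities from data; the function name and the example inputs (an AUC of 0.85 with 50 instances per class) are made up for illustration.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of AUC using the AUC-only approximations for Q1 and Q2."""
    q1 = auc / (2.0 - auc)              # assumed approximation for Q1
    q2 = 2.0 * auc ** 2 / (1.0 + auc)   # assumed approximation for Q2
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

auc, n_pos, n_neg = 0.85, 50, 50
se = hanley_mcneil_se(auc, n_pos, n_neg)
lower, upper = auc - 1.96 * se, auc + 1.96 * se   # normal-theory 95% interval
print(round(se, 4), round(lower, 3), round(upper, 3))
```

For these inputs the standard error comes out at roughly 0.039, giving an approximate 95 percent interval of about 0.77 to 0.93.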

For comparing the AUCs of two or more classifiers evaluated on the same individuals, the standard tool is the DeLong test, introduced by Elizabeth DeLong, David DeLong, and Daniel Clarke-Pearson in their 1988 Biometrics paper "Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach."^[9] The DeLong test treats each AUC as a U-statistic, derives an unbiased estimator of the covariance matrix between the AUCs using the theory of generalized U-statistics, and constructs a multivariate normal test for differences. It accounts for the correlation induced by evaluating multiple classifiers on the same sample, and is widely implemented in statistical software including pROC, MedCalc, and SAS.^[9]
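The structure of the test can be sketched in a few lines for the two-classifier case. This is an illustrative pure-Python version under simplifying assumptions (two classifiers only, quadratic-time pair counting, non-identical score vectors), not the code used by pROC or any other package; all function names and the ten-instance dataset are hypothetical.

```python
import math

def psi(x, y):
    """Pairwise kernel: 1 if the positive outranks the negative, 0.5 on a tie."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0

def components(pos, neg):
    """Return the AUC with its per-positive (V10) and per-negative (V01) components."""
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01

def sample_cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)

def delong_test(labels, scores_a, scores_b):
    """Two-sided z-test for AUC(a) - AUC(b) evaluated on the same instances."""
    def split(scores):
        pos = [s for y, s in zip(labels, scores) if y == 1]
        neg = [s for y, s in zip(labels, scores) if y == 0]
        return pos, neg
    auc_a, v10_a, v01_a = components(*split(scores_a))
    auc_b, v10_b, v01_b = components(*split(scores_b))
    m, n = len(v10_a), len(v01_a)
    var_a = sample_cov(v10_a, v10_a) / m + sample_cov(v01_a, v01_a) / n
    var_b = sample_cov(v10_b, v10_b) / m + sample_cov(v01_b, v01_b) / n
    cov_ab = sample_cov(v10_a, v10_b) / m + sample_cov(v01_a, v01_b) / n
    # Assumes the two score vectors differ, so the variance below is positive.
    z = (auc_a - auc_b) / math.sqrt(var_a + var_b - 2.0 * cov_ab)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return auc_a, auc_b, z, p_value

labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.35, 0.5, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.6, 0.9, 0.3, 0.5, 0.2, 0.7, 0.4, 0.8, 0.1, 0.35]
auc_a, auc_b, z, p_value = delong_test(labels, scores_a, scores_b)
print(round(auc_a, 2), round(auc_b, 2), round(z, 3), round(p_value, 3))
```

The covariance term is what distinguishes this from an unpaired comparison: because both classifiers are scored on the same instances, their AUC estimates are correlated, and ignoring that correlation would misstate the variance of the difference. With only ten instances the normal approximation is rough, so the example is meant to show the mechanics rather than a trustworthy p-value.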

A complementary approach uses Efron's bootstrap. Drawing bootstrap samples with replacement from the evaluation set, recomputing the empirical AUC on each, and taking empirical quantiles of the resampled values gives a percentile bootstrap confidence interval for AUC without distributional assumptions; the bias-corrected and accelerated (BCa) variant further adjusts the quantile levels for bias and skewness.^[10] Cross-validation schemes such as stratified k-fold are commonly combined with bootstrap or with permutation tests when sample sizes are small.
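A minimal sketch of the resampling loop is shown below. It computes the simpler percentile interval (a BCa interval would additionally adjust the quantile levels) on synthetic Gaussian scores; the helper names, sample sizes, seed, and number of replicates are all arbitrary choices for illustration.

```python
import random

def pairwise_auc(labels, scores):
    """Empirical AUC by pair counting, with ties weighted 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                          # skip one-class resamples
            stats.append(pairwise_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic evaluation set: positives score about one unit higher on average.
rng = random.Random(1)
labels = [1] * 40 + [0] * 40
scores = [rng.gauss(1.0, 1.0) for _ in range(40)] + [rng.gauss(0.0, 1.0) for _ in range(40)]

point = pairwise_auc(labels, scores)
lo, hi = bootstrap_auc_ci(labels, scores)
print(round(point, 3), round(lo, 3), round(hi, 3))
```

Resampling whole instances (label and score together) keeps the pairing between them intact; a stratified variant that resamples positives and negatives separately would hold the class counts fixed.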

How is AUC extended to multiclass problems?

The original definition of AUC is intrinsically binary because the ROC plot has two axes (TPR and FPR) tied to two-class confusion outcomes. Several generalizations to multiclass problems have been proposed.

David Hand and Robert Till's 2001 paper "A simple generalisation of the area under the ROC curve for multiple class classification problems," published in Machine Learning, defined the Hand-Till multiclass AUC as the average of one-versus-one AUCs over all pairs of classes.^[11] For $c$ classes, this requires $c(c-1)/2$ pairwise AUCs, each computed using only the instances belonging to the two classes in question. The Hand-Till measure reduces to the standard binary AUC when $c = 2$, and is insensitive to class prior distributions because pairwise probabilities are estimated from the relevant pair only.^[11] scikit-learn implements this as multi_class='ovo' with average='macro'.^[8]
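The pairwise averaging can be sketched as follows. The code assumes that class labels are integers that index directly into each instance's probability vector; the function names and the six-instance, three-class dataset are hypothetical.

```python
from itertools import combinations

def pairwise_auc(pos, neg):
    """Empirical AUC by pair counting, with ties weighted 1/2."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def hand_till_auc(labels, probs):
    """Average of symmetrized one-versus-one AUCs over all class pairs."""
    classes = sorted(set(labels))
    pair_aucs = []
    for i, j in combinations(classes, 2):
        # A(i|j): rank members of classes i and j by the estimated P(class i).
        a_ij = pairwise_auc([p[i] for y, p in zip(labels, probs) if y == i],
                            [p[i] for y, p in zip(labels, probs) if y == j])
        # A(j|i): the same pair of classes, ranked by the estimated P(class j).
        a_ji = pairwise_auc([p[j] for y, p in zip(labels, probs) if y == j],
                            [p[j] for y, p in zip(labels, probs) if y == i])
        pair_aucs.append((a_ij + a_ji) / 2.0)
    return sum(pair_aucs) / len(pair_aucs)

labels = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.3, 0.3, 0.4],
]
m = hand_till_auc(labels, probs)
print(round(m, 4))
```

On this data the (0, 1) pair scores 0.9375 and the other two pairs score 1.0, so the average is about 0.979. Instances from a third class never enter a given pairwise AUC, which is why the measure does not depend on class priors.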

An alternative is the one-versus-rest macro AUC: for each class, compute a binary AUC treating that class as positive and all others as negative, then average. This is the multi_class='ovr' option and is sensitive to class prevalence because the negative class is a mixture whose composition depends on the data.^[8] Provost and Domingos proposed weighting the one-versus-rest AUCs by class prevalence,^[2] and other authors have proposed volume-under-the-surface generalizations that extend the geometric area to a higher-dimensional ROC manifold, but the latter are computationally expensive and rarely used in practice.

Partial AUC

In many applications, especially medical screening, only a portion of the ROC curve is clinically relevant. A diagnostic test that requires a 50 percent false positive rate to achieve high sensitivity may be useless for population screening even if it has high overall AUC, because the human and economic costs of large numbers of false positives dominate.

Donna McClish's 1989 paper "Analyzing a portion of the ROC curve" in Medical Decision Making formalized the partial AUC (pAUC) as the area beneath the ROC curve restricted to a clinically meaningful range of false positive rates, typically [0, f] for some small f.^[12] The standardized partial AUC rescales pAUC so that the minimum possible value (corresponding to a non-informative classifier on the restricted region) maps to 0.5 and the maximum (perfect classifier on the restricted region) to 1, placing it on the same scale as full AUC. Variants include partial AUC restricted by a sensitivity range rather than a specificity range, and area under the precision-recall curve restricted to a recall range. scikit-learn supports partial AUC via the max_fpr argument of roc_auc_score.^[8]
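The standardization can be sketched as below for the false-positive-rate range [0, f]. The input `points` is assumed to be the list of (FPR, TPR) vertices of an empirical ROC curve, here hard-coded for a toy example; the function names are hypothetical, and the chance-level area of $f^2/2$ and the maximum area of $f$ follow from the geometry of the restricted region.

```python
def partial_auc(points, max_fpr):
    """Trapezoidal area under the ROC vertices for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the TPR at the cut-off and truncate the segment there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def standardized_partial_auc(points, max_fpr):
    """Rescale pAUC so chance maps to 0.5 and a perfect classifier to 1."""
    pauc = partial_auc(points, max_fpr)
    min_area = 0.5 * max_fpr ** 2   # chance diagonal restricted to [0, max_fpr]
    max_area = max_fpr              # perfect classifier on the same range
    return 0.5 * (1.0 + (pauc - min_area) / (max_area - min_area))

# Toy ROC vertices (FPR, TPR), assumed to come from an empirical curve.
points = [(0.0, 0.0), (0.0, 0.25), (0.25, 0.75), (0.5, 0.75),
          (0.5, 1.0), (0.75, 1.0), (1.0, 1.0)]

print(partial_auc(points, 0.25))
print(round(standardized_partial_auc(points, 0.25), 4))
print(standardized_partial_auc(points, 1.0))
```

With f = 0.25 the raw partial area is 0.125 and the standardized value is 5/7, about 0.714; with f = 1 the standardized value reduces to the full AUC of 0.8125.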

How does AUC-ROC compare to precision-recall AUC?

Jesse Davis and Mark Goadrich's 2006 ICML paper "The relationship between precision-recall and ROC curves" established a one-to-one correspondence between points in ROC space and points in precision-recall space, and proved that a curve dominates another in ROC space if and only if it dominates in PR space.^[4] However, the two summary areas (ROC-AUC and PR-AUC, often called average precision) reward different aspects of classifier behavior. ROC-AUC averages the true positive rate uniformly over all false positive rates, while PR-AUC averages precision over all recall levels. When negatives vastly outnumber positives, the false positive rate can stay near zero even as the absolute number of false positives grows, which makes ROC-AUC look flattering compared to PR-AUC.^[4]^[5]
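A short arithmetic illustration of this effect, with made-up counts: 100,000 negatives, and a fixed 80 true positives at each operating point considered. The function name is hypothetical.

```python
def fpr_and_precision(tp, fp, n_neg):
    """False positive rate and precision for the given confusion counts."""
    return fp / n_neg, tp / (tp + fp)

n_neg = 100_000   # assumed size of the negative class
tp = 80           # assumed true positives, held fixed for the comparison

for fp in (100, 1_000, 5_000):
    fpr, precision = fpr_and_precision(tp, fp, n_neg)
    print(f"FP={fp:>5}  FPR={fpr:.3f}  precision={precision:.3f}")
```

As false positives grow fiftyfold, the false positive rate only moves from 0.001 to 0.05, while precision falls from about 0.44 to under 0.02. The ROC curve barely registers a change that the precision-recall curve shows plainly.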

Takaya Saito and Marc Rehmsmeier's 2015 PLoS ONE paper "The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets" provided empirical evidence on biomedical microRNA prediction tasks, showing that ROC-AUC varied by less than 0.02 across classifiers with very different practical utility while PR-AUC varied by more than 0.50.^[5] Their analysis of 58 genome-wide imbalanced-classifier studies found that 66.7 percent reported ROC curves while only 12.1 percent reported PR curves, and they argued that this contributed to over-optimistic published performance claims.^[5] Modern best practice in binary classification with highly imbalanced classes is to report both ROC-AUC and PR-AUC, and to specify the class prior so that PR-AUC values can be compared meaningfully across datasets.^[5]^[4]

Properties and limitations

AUC has several attractive theoretical properties beyond the probabilistic interpretation. It is invariant under any strictly increasing transformation of scores, so it does not depend on calibration. It is threshold-free and so does not require a particular operating point to be selected before evaluation. It has a well-developed inferential theory through its U-statistic representation.^[2]^[1]

Limitations have been the subject of substantial methodological debate. David Hand's 2009 paper "Measuring classifier performance: a coherent alternative to the area under the ROC curve," published in Machine Learning, presented the most cited critique.^[13] Hand showed that AUC is equivalent to averaging the minimum misclassification loss over a distribution of cost ratios, where that distribution depends on the empirical distribution of scores produced by the classifier being evaluated. As a result, two different classifiers are implicitly evaluated against different cost distributions when their AUCs are compared, which Hand argued is incoherent because the relative costs of false positives and false negatives are a property of the problem rather than of the classifier. He proposed the H-measure, which fixes the cost distribution to a Beta(2,2) prior (or any other application-relevant prior) so that classifiers are compared on a common scale.^[13] The H-measure is implemented in the R package hmeasure.

A separate line of criticism concerns the behavior of ROC-AUC under class imbalance, summarized in the Davis-Goadrich and Saito-Rehmsmeier papers above.^[4]^[5] When the negative class is much larger than the positive class, large absolute increases in false positives correspond to small changes in FPR, producing a flattering ROC curve. In contrast, precision-based measures react directly to the absolute number of false positives.

A further consideration is that AUC averages performance over operating points that may never be used in practice. A medical screening program that operates only at very low false positive rates does not benefit from high AUC if that AUC is achieved mainly by good performance at high false positive rates. Partial AUC and the H-measure address this concern by restricting or weighting the integration region. Empirical studies have shown that AUC estimates from finite samples can be noisy, with confidence interval widths that exceed the typical differences reported between competing classifiers, which suggests caution when ranking models by AUC alone.^[2]

What is AUC used for in machine learning?

AUC is among the most reported metrics in published machine learning evaluations. It serves as the headline summary in many tasks within the OpenML and Kaggle ecosystems, in the binary problems within the SuperGLUE-derived benchmarks for natural language inference, and in clinical prediction model evaluation. Logistic regression and many deep neural network classifiers can be trained directly to maximize a smooth surrogate of AUC. The pairwise indicator in the rank-sum formula is piecewise constant and provides no useful gradient, so the surrogate replaces it with a differentiable function of the score difference, such as a sigmoid or a hinge.^[14] Learning-to-rank losses such as the pairwise RankNet and the listwise ListNet address closely related ranking objectives in classification and information retrieval.
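One common form of smooth surrogate replaces the 0/1 pairwise indicator with a sigmoid of the score difference. The sketch below is illustrative only and is not taken from the cited paper; the function names and the `temperature` parameter are assumptions made for this example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hard_auc(pos, neg):
    """Exact empirical AUC by pair counting (piecewise constant in the scores)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def soft_auc(pos, neg, temperature=0.1):
    """Smooth surrogate: each pair contributes sigmoid((p - n) / temperature)."""
    total = sum(sigmoid((p - n) / temperature) for p in pos for n in neg)
    return total / (len(pos) * len(neg))

pos = [0.9, 0.7, 0.4]
neg = [0.6, 0.5, 0.1]
print(round(hard_auc(pos, neg), 4))
for temperature in (1.0, 0.1, 0.001):
    print(temperature, round(soft_auc(pos, neg, temperature), 4))
```

As the temperature shrinks, the surrogate approaches the exact AUC (7/9 here), at the price of gradients that vanish for all but the closest pairs. Larger temperatures give smoother gradients but a looser approximation.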

In scikit-learn, AUC is not the default scoring function (classifiers are scored by accuracy unless told otherwise), but it is commonly selected for cross-validated model selection on imbalanced binary tasks via the scoring='roc_auc' argument to cross_val_score and GridSearchCV.^[8] TensorFlow exposes AUC as a metric class (tf.keras.metrics.AUC), and PyTorch users typically rely on companion libraries such as TorchMetrics; both accumulate state across batches, which suits streaming evaluation during training. Production ML systems typically monitor AUC alongside accuracy, precision, recall, and PR-AUC to detect drift in model quality.

Use in medicine and other applied fields

In radiology, pathology, cardiology, and clinical chemistry, AUC is the standard summary of diagnostic accuracy for a continuous-valued biomarker or imaging score, with values typically interpreted using the rough rubric: 0.90 to 1.00 excellent, 0.80 to 0.90 good, 0.70 to 0.80 fair, 0.60 to 0.70 poor, 0.50 to 0.60 fail.^[1] Regulatory guidelines from the U.S. Food and Drug Administration and the European Medicines Agency reference AUC when assessing diagnostic devices, although they require additional reporting of sensitivity and specificity at fixed operating points appropriate to the intended clinical use.

Credit scoring, fraud detection, and churn prediction in industrial settings also rely heavily on AUC and on its linear rescaling, the Gini coefficient ($\text{Gini} = 2 \cdot \mathrm{AUC} - 1$), so that an AUC of 0.5 maps to a Gini of 0 and an AUC of 1.0 maps to a Gini of 1.^[2]^[11] Information retrieval often reports area under the precision-recall curve rather than ROC-AUC, reflecting the heavy class imbalance typical of relevance judgments and the precision focus of users browsing ranked results.^[4]

References

1. Hanley, J. A. and McNeil, B. J. "The meaning and use of the area under a receiver operating characteristic (ROC) curve." Radiology, vol. 143, no. 1, pp. 29-36, April 1982. https://pubs.rsna.org/doi/abs/10.1148/radiology.143.1.7063747. Accessed 2026-05-26.
2. Fawcett, T. "An introduction to ROC analysis." Pattern Recognition Letters, vol. 27, no. 8, pp. 861-874, 2006. https://www.sciencedirect.com/science/article/abs/pii/S016786550500303X. Accessed 2026-05-26.
3. Bamber, D. "The area above the ordinal dominance graph and the area below the receiver operating characteristic graph." Journal of Mathematical Psychology, vol. 12, no. 4, pp. 387-415, 1975. https://www.sciencedirect.com/science/article/abs/pii/0022249675900012. Accessed 2026-05-26.
4. Davis, J. and Goadrich, M. "The relationship between precision-recall and ROC curves." Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, pp. 233-240, 25-29 June 2006. https://dl.acm.org/doi/10.1145/1143844.1143874. Accessed 2026-05-26.
5. Saito, T. and Rehmsmeier, M. "The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets." PLoS ONE, vol. 10, no. 3, e0118432, 4 March 2015. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118432. Accessed 2026-05-26.
6. Swets, J. A. "ROC analysis applied to the evaluation of medical imaging techniques." Investigative Radiology, vol. 14, no. 2, pp. 109-121, 1979. https://journals.lww.com/investigativeradiology/Abstract/1979/03000/ROC_Analysis_Applied_to_the_Evaluation_of_Medical.2.aspx. Accessed 2026-05-26.
7. Bradley, A. P. "The use of the area under the ROC curve in the evaluation of machine learning algorithms." Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, 1997. https://www.sciencedirect.com/science/article/abs/pii/S0031320396001422. Accessed 2026-05-26.
8. scikit-learn developers. "sklearn.metrics.roc_auc_score" and "sklearn.metrics.average_precision_score." scikit-learn documentation, 2026. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html. Accessed 2026-05-26.
9. DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. "Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach." Biometrics, vol. 44, no. 3, pp. 837-845, September 1988. https://www.jstor.org/stable/2531595. Accessed 2026-05-26.
10. Efron, B. and Tibshirani, R. J. "An Introduction to the Bootstrap." Chapman and Hall/CRC, 1993. https://doi.org/10.1201/9780429246593. Accessed 2026-05-26.
11. Hand, D. J. and Till, R. J. "A simple generalisation of the area under the ROC curve for multiple class classification problems." Machine Learning, vol. 45, no. 2, pp. 171-186, 2001. https://link.springer.com/article/10.1023/A:1010920819831. Accessed 2026-05-26.
12. McClish, D. K. "Analyzing a portion of the ROC curve." Medical Decision Making, vol. 9, no. 3, pp. 190-195, 1989. https://journals.sagepub.com/doi/10.1177/0272989X8900900307. Accessed 2026-05-26.
13. Hand, D. J. "Measuring classifier performance: a coherent alternative to the area under the ROC curve." Machine Learning, vol. 77, no. 1, pp. 103-123, 2009. https://link.springer.com/article/10.1007/s10994-009-5119-5. Accessed 2026-05-26.
14. Cortes, C. and Mohri, M. "AUC optimization vs. error rate minimization." Advances in Neural Information Processing Systems (NeurIPS), vol. 16, 2003. https://papers.nips.cc/paper/2003/hash/6ef80bb237adf4b6f77d0700e1255907-Abstract.html. Accessed 2026-05-26.
