# Outlier Detection

> Source: https://aiwiki.ai/wiki/outlier_detection
> Updated: 2026-04-26
> Categories: Data & Datasets, Machine Learning, Statistics
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Outlier detection** (also called [anomaly detection](/wiki/anomaly_detection), and closely related to novelty detection) is the process of identifying data points, observations, or patterns that deviate significantly from the expected behavior of a dataset. Douglas Hawkins provided the classical definition in 1980: "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism." Outlier detection is used across many domains, including [fraud detection](/wiki/fraud_detection), network intrusion detection, medical diagnosis, [sensor](/wiki/sensor) monitoring, and manufacturing quality control.

Outlier detection methods draw from statistics, [machine learning](/wiki/machine_learning), and [deep learning](/wiki/deep_learning). Depending on the nature of the data and the problem, practitioners choose from statistical tests, distance-based methods, density-based algorithms, tree-based approaches, and neural network architectures. The choice of method depends on factors such as data dimensionality, the availability of labeled examples, computational constraints, and whether the data arrives as a batch or a stream.

## Explain like I'm 5 (ELI5)

Imagine you are sorting a jar of red gumballs and you find one blue marble mixed in. The blue marble looks different from everything else in the jar. That is what an outlier is: something that does not fit with the rest of the group. Outlier detection is like having a helper who checks every item in the jar and says, "This one does not belong." Computers do the same thing with numbers and data, looking for the items that are unusual or surprising compared to everything else.

## Types of outliers

Outliers are generally classified into three categories.

| Type | Description | Example |
|------|-------------|---------|
| Point outlier (global outlier) | A single data point that is far from the rest of the dataset. | A credit card transaction of $50,000 when typical transactions are under $200. |
| Contextual outlier (conditional outlier) | A data point that is anomalous in a specific context but might be normal in another. | A temperature of 35°C is normal in July but anomalous in January (in a temperate climate). |
| Collective outlier | A group of data points that are individually unremarkable but together form an unusual pattern. | A sequence of small transactions in rapid succession that together indicate a card-testing fraud attack. |

## Detection paradigms

Outlier detection methods fall into three learning paradigms based on the availability of labels.

| Paradigm | Label requirement | Description |
|----------|-------------------|-------------|
| [Supervised learning](/wiki/supervised_learning) | Fully labeled (normal and anomalous) | A [classification](/wiki/classification_model) model is trained on labeled examples of both normal and anomalous data. Effective when labeled anomalies are available, but this is rare in practice. |
| Semi-supervised learning | Only normal labels | A model learns the distribution of normal data and flags deviations at test time. Also called novelty detection. |
| [Unsupervised learning](/wiki/unsupervised_learning) | No labels | The algorithm identifies outliers based on intrinsic properties of the data such as density, distance, or isolation. This is the most common paradigm for outlier detection. |

## Statistical methods

Statistical approaches are among the oldest techniques for identifying outliers. They rely on fitting a statistical model to the data and flagging points that have low probability under that model.

### Z-score method

The Z-score measures how many [standard deviations](/wiki/standard_deviation) a data point is from the [mean](/wiki/mean). For a data point *x* in a dataset with mean *μ* and standard deviation *σ*:

**Z = (x - μ) / σ**

A common convention is to flag data points with |Z| > 3 as outliers, meaning they lie more than three standard deviations from the mean. This method assumes the data follows a [normal distribution](/wiki/normal_distribution), which limits its applicability to datasets that satisfy that assumption. It is also vulnerable to masking: extreme values inflate the very mean and standard deviation used to score them, so an outlier, especially in a small sample, may never reach the |Z| > 3 threshold.
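A minimal NumPy sketch of the |Z| > 3 rule follows. The function name `zscore_outliers` and the synthetic data are illustrative assumptions, and the population standard deviation (`ddof=0`) is used, although some texts use the sample standard deviation instead.

```python
import numpy as np

def zscore_outliers(x, threshold=3.0):
    """Return a boolean mask that is True where |Z| > threshold."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()  # population standard deviation (ddof=0)
    return np.abs(z) > threshold

rng = np.random.default_rng(0)
# 1,000 roughly normal readings plus one injected extreme value at the end.
data = np.append(rng.normal(loc=100.0, scale=5.0, size=1000), 160.0)
mask = zscore_outliers(data)
print(np.flatnonzero(mask))  # indices of flagged points
```

Even on perfectly normal data, about 0.27% of points exceed |Z| > 3, so a few of the 1,000 genuine readings may be flagged alongside the injected value.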

### Modified Z-score

The modified Z-score uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation, making it more robust to the very outliers it is trying to detect:

**M = 0.6745 × (x - median) / MAD**

A threshold of |M| > 3.5 is commonly used. Because the median and MAD are resistant to extreme values, the modified Z-score performs better than the standard Z-score on datasets with heavy contamination.
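The sketch below applies the formula above to a small, heavily contaminated sample and contrasts it with the classic Z-score. The function name and the sample values are illustrative assumptions. Here the two large values inflate the standard deviation enough that neither reaches |Z| > 3, while the median and MAD are barely affected.

```python
import numpy as np

def modified_zscore(x):
    """Modified Z-score: 0.6745 * (x - median) / MAD."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero, so the modified Z-score is undefined")
    return 0.6745 * (x - median) / mad

sample = np.array([10, 11, 12, 11, 10, 12, 11, 10, 95, 100], dtype=float)

robust_flags = np.abs(modified_zscore(sample)) > 3.5
classic_z = (sample - sample.mean()) / sample.std()
print(np.flatnonzero(robust_flags))           # [8 9]
print(np.flatnonzero(np.abs(classic_z) > 3))  # []: the two large values mask each other
```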

### Interquartile range (IQR) method

The IQR method is a non-parametric approach that does not assume any particular data distribution. It computes the first quartile (Q1) and third quartile (Q3) and defines the IQR as Q3 - Q1. Data points that fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are classified as outliers. This is the method used in standard box plots and was popularized by John Tukey.
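A short sketch of Tukey's fences follows; the function name and sample are illustrative. Quartile definitions differ between tools (Tukey's hinges versus interpolated percentiles), so borderline points may be classified differently. This version uses NumPy's default linear interpolation, which gives Q1 = 3.25 and Q3 = 5.0 for the sample below.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # NumPy's default linear interpolation
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

sample = np.array([2, 3, 3, 4, 4, 4, 5, 5, 6, 30])
print(np.flatnonzero(iqr_outliers(sample)))  # [9]
```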

### Grubbs' test

Grubbs' test (also known as the maximum normed residual test) is a formal [hypothesis testing](/wiki/hypothesis_testing) procedure designed to detect a single outlier in a univariate dataset assumed to come from a normal distribution. The test statistic is the largest absolute deviation from the mean divided by the standard deviation. If this statistic exceeds a critical value determined by the sample size and significance level, the most extreme point is declared an outlier. Grubbs' test can be applied iteratively to detect multiple outliers, though each removal changes the sample mean and standard deviation used in the next round.
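A sketch of the two-sided test using SciPy's Student's t quantile function follows. It assumes the common textbook critical value G_crit = ((n − 1)/√n)·√(t² / (n − 2 + t²)), where t is the upper α/(2n) critical value of the t-distribution with n − 2 degrees of freedom. The function name and readings are illustrative.

```python
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier.

    Returns (index of the most extreme point, G statistic, critical value, reject H0?).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 observations")
    deviations = np.abs(x - x.mean())
    idx = int(np.argmax(deviations))
    g = deviations[idx] / x.std(ddof=1)  # sample standard deviation
    # Upper critical value of Student's t with n - 2 degrees of freedom at alpha / (2n).
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return idx, g, g_crit, g > g_crit

readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 12.5]
idx, g, g_crit, is_outlier = grubbs_test(readings)
print(idx, round(g, 2), round(g_crit, 2), is_outlier)
```

To look for a second outlier, one would remove `readings[idx]` and rerun the test on the smaller sample, which is exactly the iterative use whose caveats are noted above.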

### Mahalanobis distance

For multivariate data, the Mahalanobis distance accounts for correlations between variables and differences in scale. Unlike the Euclidean distance, it uses the covariance matrix of the data:

**D = √((x - μ)ᵀ S⁻¹ (x - μ))**

where *S* is the [covariance](/wiki/covariance) matrix. Points with large Mahalanobis distances are flagged as outliers. The Elliptic Envelope method in [scikit-learn](/wiki/scikit-learn) uses the Minimum Covariance Determinant (MCD) estimator to compute a robust version of the covariance matrix, making it more resistant to the influence of outliers on the covariance estimate itself. This approach works best when n_samples > n_features² and when the data is roughly elliptically distributed.
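The sketch below uses scikit-learn's `MinCovDet` (the robust MCD estimator that `EllipticEnvelope` builds on) to compute robust Mahalanobis distances. Its `mahalanobis` method returns *squared* distances. Comparing them with a 97.5% chi-square quantile is a common heuristic that assumes the inliers are roughly Gaussian. The data and cutoff are illustrative choices.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(42)
# Strongly correlated 2-D normal data.
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.8], [0.8, 1.0]], size=300)
# Two points that are moderate on each axis but break the positive correlation.
X_out = np.array([[2.5, -2.5], [-2.0, 2.0]])
X_all = np.vstack([X, X_out])

mcd = MinCovDet(random_state=0).fit(X_all)
d2 = mcd.mahalanobis(X_all)                  # squared Mahalanobis distances
cutoff = chi2.ppf(0.975, df=X_all.shape[1])  # assumes approximately Gaussian inliers
flagged = np.flatnonzero(d2 > cutoff)
print(flagged)
```

Several ordinary points lie farther from the origin in Euclidean terms than `[2.5, -2.5]`, but they follow the correlation and so have small Mahalanobis distances.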

### Summary of statistical methods

| Method | Parametric? | Univariate/Multivariate | Assumptions | Strengths | Limitations |
|--------|-------------|------------------------|-------------|-----------|-------------|
| Z-score | Yes | Univariate | Normal distribution | Simple, fast | Sensitive to extreme values; assumes normality |
| Modified Z-score | Yes | Univariate | Approximate normality | Robust to contamination | Still assumes rough symmetry |
| IQR | No | Univariate | None | Distribution-free; simple | May miss outliers in multimodal data |
| Grubbs' test | Yes | Univariate | Normal distribution | Formal hypothesis test with p-values | Designed for single outliers; iterative use changes data |
| Mahalanobis distance | Yes | Multivariate | Elliptical distribution | Accounts for correlations | Requires stable covariance estimation; degrades in high dimensions |

## Distance-based methods

Distance-based methods define outliers by their remoteness from other data points. Instead of assuming a particular data distribution, they use distance metrics (such as [Euclidean distance](/wiki/euclidean_distance) or Manhattan distance) to quantify how far each point is from its neighbors.

### k-nearest neighbors (k-NN) approach

The [k-nearest neighbors](/wiki/k_nearest_neighbors) approach computes the distance from each data point to its k-th nearest neighbor. Points whose k-th neighbor distance exceeds a threshold are considered outliers. Knorr and Ng introduced one of the earliest distance-based outlier definitions in 1998: a point is a DB(*p*, *D*)-outlier if at least a fraction *p* of all data points lie farther than distance *D* from it. A common variant scores each point by the average distance to its k nearest neighbors instead of the distance to the k-th neighbor alone.

The main advantage is conceptual simplicity. The main limitation is computational cost: computing all pairwise distances requires O(n²) time, which becomes expensive for large datasets. Spatial index structures such as KD-trees or ball trees can speed up exact neighbor queries in low-dimensional spaces.
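Both scoring ideas can be sketched with scikit-learn. The k-NN scores use a ball-tree index. The DB(*p*, *D*) rule is written the direct way, with a full pairwise distance matrix, which makes its O(n²) cost explicit. Function names, parameter values, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import NearestNeighbors

def knn_scores(X, k=5, use_mean=False):
    """Distance to the k-th nearest neighbor, or mean distance to the k nearest."""
    # A ball tree answers exact neighbor queries without a full pairwise matrix.
    nn = NearestNeighbors(n_neighbors=k, algorithm="ball_tree").fit(X)
    dist, _ = nn.kneighbors()  # no argument: each point is excluded from its own neighbors
    return dist.mean(axis=1) if use_mean else dist[:, -1]

def db_outliers(X, p=0.95, D=3.0):
    """Knorr-Ng style DB(p, D) rule computed from the full O(n^2) distance matrix."""
    dist = pairwise_distances(X)
    frac_far = (dist > D).mean(axis=1)  # fraction of the dataset farther than D
    return frac_far >= p

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[6.0, 6.0]]])  # last row is the outlier
print(np.argmax(knn_scores(X)), np.flatnonzero(db_outliers(X)))
```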

## Density-based methods

Density-based methods estimate the local density around each data point and flag points in regions of unusually low density. Their strength is the ability to detect outliers relative to their local neighborhood, which makes them effective when normal data forms clusters of varying density.

### Local Outlier Factor (LOF)

The Local Outlier Factor algorithm was proposed by Breunig, Kriegel, Ng, and Sander in 2000. It assigns each point a score reflecting how isolated it is compared to its neighbors. LOF computes the local reachability density of a point and compares it to the densities of its k nearest neighbors.

The key steps are:

1. For each point, find its k nearest neighbors.
2. Compute the reachability distance, which is the maximum of the actual distance and the k-distance of the neighbor (smoothing out density estimates for points very close to dense clusters).
3. Compute the local reachability density (LRD) as the inverse of the average reachability distance to the k neighbors.
4. Compute the LOF score as the ratio of the average LRD of the point's neighbors to the point's own LRD.

A LOF score near 1 indicates the point has density similar to its neighbors (inlier). A score significantly greater than 1 indicates an outlier. A score below 1 indicates the point is in a denser region than its neighbors.

LOF is effective at finding outliers in datasets with clusters of different densities, where a global threshold approach would fail. For example, a point at moderate distance from a very dense cluster may be an outlier, even though it would be considered normal if judged against a sparser cluster. LOF shares some foundational concepts with [DBSCAN](/wiki/dbscan) and OPTICS, including core distance and reachability distance.
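The four steps can be written directly in NumPy and checked against scikit-learn's `LocalOutlierFactor`, whose `negative_outlier_factor_` attribute stores the negated LOF of the training points. This is a sketch: the function name and data are illustrative, duplicate points (zero reachability distances) are not handled, and scikit-learn adds a tiny constant to avoid that division by zero, so the two agree only up to floating-point tolerance. The data reproduce the scenario above: a point 2.8 units from a tight cluster is a strong outlier, while points that far from the centre of a loose cluster are normal.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def lof_scores(X, k=20):
    """LOF following the four steps above (no duplicate-point handling)."""
    # Step 1: k nearest neighbors, excluding each point itself.
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors()
    k_distance = dist[:, -1]  # distance from each point to its k-th neighbor
    # Step 2: reach-dist(p, o) = max(d(p, o), k-distance(o)).
    reach = np.maximum(dist, k_distance[idx])
    # Step 3: local reachability density = 1 / mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # Step 4: mean LRD of the neighbors divided by the point's own LRD.
    return lrd[idx].mean(axis=1) / lrd

rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.3, size=(100, 2))
sparse = rng.normal(loc=10.0, scale=1.5, size=(100, 2))
X = np.vstack([dense, sparse, [[2.0, 2.0]]])  # last point sits near the dense cluster only

scores = lof_scores(X)
reference = -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_
print(np.argmax(scores), np.allclose(scores, reference, rtol=1e-6))
```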

### DBSCAN as an outlier detector

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is primarily a [clustering](/wiki/clustering) algorithm, but it has built-in outlier detection capabilities. DBSCAN classifies points as core points (with at least *minPts* neighbors within radius *eps*), border points (within *eps* of a core point but with fewer than *minPts* neighbors of their own), or noise points (neither core nor border points). Noise points are not assigned to any cluster and can be treated as outliers.

DBSCAN's advantage for outlier detection is that it does not assume clusters have a particular shape (such as spherical). Its limitation is sensitivity to the choice of *eps* and *minPts* parameters, which can be difficult to set without domain knowledge.
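The sketch below uses scikit-learn's `DBSCAN`, which labels noise points `-1` and counts the point itself toward `min_samples`. The ring-shaped cluster shows the point about cluster shape. The `eps` and `min_samples` values were chosen by hand for this synthetic data and would need tuning elsewhere.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
angles = rng.uniform(0.0, 2.0 * np.pi, size=300)
# A noisy ring of radius 3 (non-spherical cluster) around a compact blob.
ring = 3.0 * np.c_[np.cos(angles), np.sin(angles)] + rng.normal(scale=0.1, size=(300, 2))
blob = rng.normal(scale=0.3, size=(100, 2))
strays = np.array([[5.0, 5.0], [-5.0, 0.5]])
X = np.vstack([ring, blob, strays])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
noise_idx = np.flatnonzero(labels == -1)  # noise points are the outlier candidates
print(len(set(labels) - {-1}), noise_idx)
```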

## Tree-based methods

### Isolation Forest

Isolation Forest was introduced by Liu, Ting, and Zhou in 2008 at the IEEE International Conference on Data Mining. It takes a fundamentally different approach from density-based and distance-based methods: instead of modeling normal data and then identifying deviations, it directly isolates anomalies.

The core principle is that anomalies are few and different. Because of these properties, anomalies are easier to separate from the rest of the data through random partitioning. The algorithm works as follows:

1. **Build isolation trees.** For each tree, randomly sample a subset of the data. Recursively partition the data by randomly selecting a feature and a split value between the feature's minimum and maximum in the current subset. Continue until each point is isolated in its own leaf node or a maximum tree height is reached.
2. **Compute path lengths.** The path length for a data point is the number of edges traversed from the root to the leaf node where it ends up.
3. **Compute anomaly scores.** Average the path lengths across all trees in the forest. Shorter average path lengths indicate anomalies, because anomalous points are easier to isolate. The anomaly score *s* is normalized using the average path length of unsuccessful searches in a Binary Search Tree as a baseline.
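The normalization in step 3 follows Liu et al. (2008): s(x, n) = 2^(−E[h(x)] / c(n)), where c(n) = 2H(n − 1) − 2(n − 1)/n and the harmonic number H(i) is approximated by ln(i) + 0.5772156649 (Euler's constant). The sketch below computes these values. The small-n conventions c(2) = 1 and c(n) = 0 for n ≤ 1 are included as assumptions, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """s = 2 ** (-E[h(x)] / c(n)): near 1 is anomalous, well below 0.5 is normal."""
    return 2.0 ** (-mean_path_length / c(n))

n = 256  # subsample size per tree
print(round(c(n), 2))                    # baseline path length for n = 256
print(round(anomaly_score(2.0, n), 2))   # isolated after about 2 splits: high score
print(round(anomaly_score(c(n), n), 2))  # path length equal to the baseline gives 0.5
```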

Isolation Forest has linear time complexity in the number of data points and low memory requirements, because each tree is built from a small fixed-size subsample rather than from the full dataset. This makes it efficient for large datasets. It handles high-dimensional data better than many density-based methods and does not require distance computations. The main parameters are the number of trees and the subsampling size. Scikit-learn provides an implementation via `sklearn.ensemble.IsolationForest`.
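A usage sketch of `sklearn.ensemble.IsolationForest` on synthetic data follows. Note that scikit-learn's `score_samples` returns the *opposite* of the paper's anomaly score, so lower values mean more anomalous. With `contamination="auto"`, scikit-learn documents the decision threshold as the one from the original paper. The data are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 10))
# Five ordinary points followed by two points far outside the training range.
X_new = np.vstack([rng.normal(size=(5, 10)), np.full((2, 10), 6.0)])

forest = IsolationForest(n_estimators=100, max_samples=256, contamination="auto", random_state=0)
forest.fit(X_train)

labels = forest.predict(X_new)        # +1 for inliers, -1 for outliers
scores = forest.score_samples(X_new)  # negated paper score: lower means more anomalous
print(labels)
print(scores.round(3))
```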

### Extended Isolation Forest

The original Isolation Forest uses axis-aligned splits, which can produce artifacts when anomalies do not align with feature axes. The Extended Isolation Forest (Hariri, Kind, and Brunner, 2019) addresses this by using hyperplane splits with random slopes rather than axis-parallel cuts, allowing it to capture anomalies in data with correlated features more effectively.

## Clustering-based methods

Clustering-based approaches first group data into clusters and then identify points that do not belong to any cluster, belong to very small clusters, or are far from cluster centroids.

### k-means based detection

After running [k-means](/wiki/k-means) clustering, points that are far from their assigned cluster centroid (relative to other points in the same cluster) can be flagged as outliers. The distance to the centroid can be compared against a threshold such as the mean plus some multiple of the standard deviation within each cluster.
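A sketch of the mean-plus-multiple-of-standard-deviation rule with scikit-learn's `KMeans` follows. The function name, the choice of 3 standard deviations, and the number of clusters are illustrative assumptions. In practice the number of clusters must be chosen separately, and the outlier itself slightly inflates its own cluster's statistics.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_outliers(X, n_clusters=3, n_std=3.0, random_state=0):
    """Flag points whose centroid distance exceeds mean + n_std * std within their cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    flagged = np.zeros(len(X), dtype=bool)
    for label in range(n_clusters):
        members = km.labels_ == label
        d = dist[members]
        flagged[members] = d > d.mean() + n_std * d.std()
    return flagged

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
blobs = [ctr + rng.normal(scale=0.5, size=(100, 2)) for ctr in centers]
X = np.vstack(blobs + [np.array([[4.0, 4.0]])])  # last row lies between the clusters
print(np.flatnonzero(kmeans_outliers(X)))
```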

## Boundary-based methods

### One-Class SVM

The One-Class [Support Vector Machine](/wiki/support_vector_machine_svm) (SVM), introduced by Schölkopf et al. in 2001, learns a decision boundary that encloses the normal data in feature space. Points falling outside this boundary are classified as outliers. It maps data into a high-dimensional feature space using a [kernel function](/wiki/kernel_function) and finds the maximum-margin hyperplane separating the data from the origin. One-Class SVM performs well when normal behavior is well represented and anomalies are rare or unknown at training time.
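A usage sketch with scikit-learn's `OneClassSVM` follows, trained only on normal data (the novelty-detection setting). Features are standardized first because the RBF kernel is distance-based. The `nu=0.05` value, roughly an upper bound on the fraction of training points allowed outside the boundary, is an illustrative choice, as are the test points.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))  # normal data only
X_test = np.array([[0.1, -0.2], [0.5, 0.3], [4.0, 4.0], [-5.0, 1.0]])

ocsvm = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", gamma="scale", nu=0.05))
ocsvm.fit(X_train)
pred = ocsvm.predict(X_test)  # +1 inside the learned boundary, -1 outside
print(pred)
```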

## Deep learning approaches

[Neural network](/wiki/neural_network) architectures have become widely used for outlier detection, especially in high-dimensional data such as images, text, and time series.

### Autoencoders

An [autoencoder](/wiki/autoencoder) is a neural network trained to reconstruct its input through a bottleneck layer. During training on normal data, the autoencoder learns to compress and reconstruct typical patterns. At inference time, anomalous inputs produce high reconstruction error because the network has not learned to represent them. The reconstruction error serves as the anomaly score.
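To keep the example dependency-light, the sketch below uses scikit-learn's `MLPRegressor` as a small fully connected autoencoder (input reproduced as output through a 2-unit bottleneck) instead of a deep learning framework. The synthetic data, layer sizes, and 99th-percentile threshold on training errors are all illustrative assumptions. Normal data lie near a 2-D subspace of a 6-D space, so points off that subspace reconstruct poorly.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Normal data: a 2-D latent signal linearly mixed into 6 features, plus small noise.
mixing = rng.normal(size=(2, 6))
X_train = rng.normal(size=(1000, 2)) @ mixing + 0.05 * rng.normal(size=(1000, 6))

scaler = StandardScaler().fit(X_train)
Z_train = scaler.transform(X_train)

# Encoder 6 -> 16 -> 2 (bottleneck), decoder 2 -> 16 -> 6; the target is the input itself.
ae = MLPRegressor(hidden_layer_sizes=(16, 2, 16), activation="tanh", max_iter=2000, random_state=0)
ae.fit(Z_train, Z_train)

def reconstruction_error(model, Z):
    """Mean squared reconstruction error per row, used as the anomaly score."""
    return np.mean((model.predict(Z) - Z) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(ae, Z_train), 99)

# Three points on the normal subspace, then two random points off it.
X_test = np.vstack([rng.normal(size=(3, 2)) @ mixing, rng.normal(scale=2.0, size=(2, 6))])
errors = reconstruction_error(ae, scaler.transform(X_test))
flagged = errors > threshold
print(errors.round(3), flagged)
```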

Variants include:

- **Denoising autoencoders**: trained to reconstruct clean data from artificially corrupted inputs, improving robustness.
- **Sparse autoencoders**: add a sparsity constraint on the bottleneck activations.
- **Convolutional autoencoders**: use [convolutional layers](/wiki/convolutional_neural_network) for image and spatial data.

### Variational autoencoders (VAEs)

A [variational autoencoder](/wiki/variational_autoencoder) (VAE) learns a probabilistic latent space rather than a deterministic encoding. The encoder outputs the parameters of a distribution over latent codes (typically a Gaussian); a latent vector is sampled from this distribution and passed to the decoder to reconstruct the input. Anomalies can be detected by computing the reconstruction probability or the evidence lower bound (ELBO). Because VAEs model uncertainty in the data, they can be more sensitive to subtle anomalies than standard autoencoders.

### Generative adversarial networks (GANs)

A [generative adversarial network](/wiki/generative_adversarial_network) (GAN) consists of a generator and a discriminator trained in an adversarial setup. For anomaly detection, the GAN is trained on normal data. At test time, anomalies are identified by their poor reconstruction by the generator, the discriminator's confidence score, or a combination of both. AnoGAN (Schlegl et al., 2017) was one of the first GAN-based anomaly detection methods, designed for detecting anomalies in retinal optical coherence tomography images.

### Transformer-based methods

[Transformer](/wiki/transformer) architectures, originally developed for [natural language processing](/wiki/natural_language_processing), have been adapted for time series anomaly detection. Models such as AnomalyBERT use self-supervised pretraining with synthetic anomaly injection to learn representations of normal temporal patterns. The [self-attention](/wiki/self_attention) mechanism allows these models to capture long-range dependencies in sequential data.

### Self-supervised learning for anomaly detection

[Self-supervised learning](/wiki/self-supervised_learning) methods train models on pretext tasks (such as predicting rotations, solving jigsaw puzzles, or reconstructing masked portions of input) using only normal data. At test time, anomalous inputs produce poor performance on these pretext tasks, which serves as the detection signal. This approach reduces the need for labeled anomaly data and has been applied in computer vision and industrial defect detection.

### Summary of deep learning approaches

| Method | Architecture | Anomaly signal | Strengths | Limitations |
|--------|-------------|----------------|-----------|-------------|
| [Autoencoder](/wiki/autoencoder) | Encoder-decoder | Reconstruction error | Simple, effective for tabular and image data | May reconstruct anomalies well if model capacity is too high |
| [VAE](/wiki/variational_autoencoder) | Probabilistic encoder-decoder | Reconstruction probability / ELBO | Captures uncertainty; more sensitive to subtle anomalies | More complex to train; requires tuning of latent dimension |
| [GAN](/wiki/generative_adversarial_network) | Generator + discriminator | Generator reconstruction + discriminator score | Can generate realistic normal data for comparison | Training instability; mode collapse |
| [Transformer](/wiki/transformer) | Self-attention blocks | Attention-based anomaly score | Captures long-range temporal dependencies | Computationally expensive; requires large datasets |
| Self-supervised | Task-specific architecture | Pretext task performance drop | No labeled anomalies needed | Performance depends on pretext task design |

## Time series outlier detection

Detecting outliers in [time series](/wiki/time_series_analysis) data requires methods that account for temporal dependencies, trends, and seasonality.

### Seasonal decomposition

Seasonal-Trend decomposition using Loess (STL) splits a time series into trend, seasonal, and residual components. Outliers are then detected in the residual component using standard statistical methods (such as the IQR method or Grubbs' test). This approach removes predictable patterns before testing for anomalies. It underlies methods such as S-ESD (Seasonal Extreme Studentized Deviate), which applies the generalized ESD test, a multi-outlier extension of Grubbs' test, to the residuals.

### LSTM and RNN approaches

[Recurrent neural networks](/wiki/recurrent_neural_network) (RNNs) and [Long Short-Term Memory](/wiki/long_short-term_memory_lstm) (LSTM) networks can be trained to predict the next value in a time series. When the prediction error exceeds a threshold, the observation is flagged as anomalous. LSTM-based methods are effective at capturing both short-term and long-term temporal dependencies.

### Streaming data detection

For data that arrives continuously (such as server logs, financial tickers, or IoT sensor feeds), outlier detection must be performed online. Algorithms such as Amazon's Random Cut Forest (RCF) are designed for streaming scenarios, updating the model incrementally as new data points arrive without storing the entire dataset in memory.
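Random Cut Forest is not part of scikit-learn, so the sketch below is a much simpler stand-in that shows only the incremental-update idea. It is an online Z-score detector that keeps a running mean and variance with Welford's algorithm in constant memory. It is not RCF, and the class name, threshold, and warm-up length are hypothetical.

```python
import math
import random

class StreamingZScore:
    """Online z-score detector using Welford's running mean and variance (O(1) memory)."""

    def __init__(self, threshold=4.0, warmup=30):
        self.threshold = threshold
        self.warmup = max(warmup, 2)  # need at least two points for a variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics seen so far, then fold it in if it looks normal."""
        is_outlier = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_outlier = std > 0 and abs(x - self.mean) / std > self.threshold
        if not is_outlier:
            # Welford update; flagged points are kept out of the baseline statistics.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return is_outlier

rng = random.Random(0)
stream = [rng.gauss(50.0, 2.0) for _ in range(500)]
stream[250] = 75.0  # injected spike
detector = StreamingZScore(threshold=4.0, warmup=30)
flags = [i for i, value in enumerate(stream) if detector.update(value)]
print(flags)
```

Keeping flagged points out of the baseline protects it from contamination, but a genuine, permanent level shift would then be flagged indefinitely. Always updating trades that problem for sensitivity to contamination.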

## Evaluation metrics

Evaluating outlier detection algorithms requires metrics suited to the typically imbalanced nature of the problem (anomalies are rare). Common metrics include:

| Metric | Description | When to use |
|--------|-------------|-------------|
| [Precision](/wiki/precision) | Fraction of detected outliers that are true outliers | When false alarms are costly |
| [Recall](/wiki/recall) | Fraction of true outliers that are detected | When missing anomalies is costly |
| [F1 score](/wiki/f1_score) | Harmonic mean of precision and recall | When balancing false positives and false negatives |
| [AUC-ROC](/wiki/auc_area_under_the_roc_curve) | Area under the receiver operating characteristic curve | For overall ranking quality across all thresholds |
| AUC-PR | Area under the precision-recall curve | Preferred for highly imbalanced datasets |
| Average precision | Weighted mean of precisions at each recall threshold | When the ranking order of anomaly scores matters |

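AUC-ROC and AUC-PR do not require setting a detection threshold, which makes them useful for comparing algorithms independently of threshold selection. In highly imbalanced settings (where normal data vastly outnumbers anomalies), AUC-PR is generally more informative than AUC-ROC. The ROC curve's false-positive rate is divided by the large number of normal points, so a detector can raise many false alarms while its false-positive rate, and therefore its AUC-ROC, still looks good. Precision, by contrast, directly reflects those false alarms.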
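The sketch below computes these metrics with `sklearn.metrics` on a toy example of ten points, two of them true outliers. The labels, scores, and the 0.5 cutoff are illustrative. `average_precision_score` is used as the AUC-PR estimate. It is one common way to summarize the precision-recall curve, and trapezoidal integration of the same curve can give a slightly different number.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Toy labels (1 = true outlier) and detector scores (higher = more anomalous).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.25, 0.70, 0.90, 0.60])

# Threshold-free ranking metrics.
auc_roc = roc_auc_score(y_true, scores)         # 15 of 16 outlier/normal pairs ranked correctly
ap = average_precision_score(y_true, scores)    # 0.5 * 1 + 0.5 * (2/3)

# Threshold-dependent metrics need a cutoff (0.5 is an arbitrary choice here).
y_pred = (scores >= 0.5).astype(int)
precision = precision_score(y_true, y_pred)     # 2 of 3 flagged points are true outliers
recall = recall_score(y_true, y_pred)           # both true outliers are flagged
f1 = f1_score(y_true, y_pred)
print(auc_roc, ap, precision, recall, f1)
```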

## Applications

Outlier detection is used in a wide range of practical domains.

| Domain | Application | Examples |
|--------|-------------|----------|
| Finance | [Fraud detection](/wiki/fraud_detection) | Credit card fraud, money laundering, insider trading, fraudulent insurance claims |
| Cybersecurity | Network intrusion detection | Detecting unauthorized access, malware infections, data exfiltration, unusual login patterns |
| Manufacturing | Quality control and predictive maintenance | Defective products on assembly lines, equipment degradation, sensor anomalies |
| Healthcare | Medical diagnosis | Unusual patient vital signs, rare disease identification, anomalous medical images |
| IoT and smart infrastructure | Sensor monitoring | Abnormal readings from environmental sensors, smart grid anomalies, pipeline leak detection |
| Science | Experimental data cleaning | Removing erroneous measurements from datasets before analysis |
| E-commerce | User behavior analysis | Bot detection, fake review identification, unusual browsing patterns |

## Challenges

Several challenges affect the performance of outlier detection systems in practice.

### Curse of dimensionality

As the number of features increases, the notion of distance becomes less meaningful. In high-dimensional spaces, the distances between all pairs of points tend to become nearly equal relative to their magnitude (a phenomenon called distance concentration). This makes it difficult for distance-based and density-based methods to distinguish outliers from normal points. [Dimensionality reduction](/wiki/dimension_reduction) techniques such as [PCA](/wiki/principal_component_analysis) or autoencoders can help mitigate this problem by projecting data into a lower-dimensional space before applying outlier detection. [t-SNE](/wiki/t_sne) is better suited to visual inspection, since it does not preserve global distances or relative densities.
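Distance concentration is easy to observe numerically. The sketch below draws uniform random points in increasing dimensions and reports the relative contrast, (max − min) / min over all pairwise Euclidean distances. The point count, dimensions, and uniform distribution are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min over all pairwise Euclidean distances of uniform random points."""
    rng = np.random.default_rng(seed)
    d = pdist(rng.uniform(size=(n_points, dim)))
    return (d.max() - d.min()) / d.min()

contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrast.items():
    print(f"dim={dim:>4}: relative contrast {value:.2f}")
```

As the contrast shrinks toward zero, the nearest and farthest neighbors of a point become almost equally far away, which is why k-NN distances and local densities lose their discriminating power.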

### Lack of labeled data

Labeled anomaly data is scarce in most real-world settings. Anomalies are by definition rare, and labeling them often requires expensive domain expertise. This limits the use of supervised methods and makes evaluation difficult, since ground truth labels may be incomplete or noisy.

### Concept drift

In streaming and production environments, the distribution of normal data changes over time. A model trained on historical data may fail to distinguish genuine anomalies from new-but-normal patterns. Adaptive methods that update their models incrementally are needed to handle concept drift.

### Interpretability

Many outlier detection algorithms (especially deep learning methods) produce anomaly scores without explaining why a point was flagged. In applications such as healthcare and finance, explaining the reason for an alert is often as important as detecting it. Research into explainable anomaly detection includes methods like the Subspace Outlier Degree (SOD), which identifies which features contributed most to the anomaly, and Correlation Outlier Probabilities (COP), which computes error vectors showing how a point would need to change to become normal.

### Parameter sensitivity

Most outlier detection algorithms require parameters that influence their behavior: the number of neighbors *k* in LOF, the *eps* and *minPts* in DBSCAN, the contamination rate in Isolation Forest, and the architecture of autoencoders. Setting these parameters without labeled validation data is a persistent difficulty.

## Software and libraries

Several software tools and libraries provide implementations of outlier detection algorithms.

| Library | Language | Key algorithms | Notes |
|---------|----------|---------------|-------|
| [scikit-learn](/wiki/scikit-learn) | Python | Isolation Forest, LOF, One-Class SVM, Elliptic Envelope | Part of the broader scikit-learn ML toolkit; well-documented and widely used |
| PyOD | Python | 50+ algorithms including LOF, k-NN, ECOD, autoencoders, COPOD, deep models | Dedicated outlier detection library; 26 million+ downloads since 2017 |
| ELKI | Java | LOF, ABOD, k-NN, DBSCAN, and many more | Research-oriented; optimized with index acceleration structures |
| [TensorFlow](/wiki/tensorflow) / [PyTorch](/wiki/pytorch) | Python | Custom autoencoder, VAE, GAN implementations | General deep learning frameworks used for building custom anomaly detectors |

## Historical development

The formal study of outliers dates back to the 19th century, but computational outlier detection became a distinct field in the late 20th century.

- **1980**: Douglas Hawkins published "Identification of Outliers," providing a widely cited formal definition.
- **1986**: Dorothy Denning proposed using anomaly detection for intrusion detection systems, linking outlier detection to cybersecurity.
- **1998**: Knorr and Ng introduced distance-based outlier detection, moving beyond distributional assumptions.
- **2000**: Breunig, Kriegel, Ng, and Sander proposed the Local Outlier Factor (LOF) algorithm at ACM SIGMOD, establishing density-based outlier detection.
- **2001**: Schölkopf et al. introduced One-Class SVM for novelty detection.
- **2008**: Liu, Ting, and Zhou proposed Isolation Forest at IEEE ICDM, introducing the isolation-based paradigm.
- **2017**: Schlegl et al. introduced AnoGAN, applying GANs to anomaly detection in medical imaging.
- **2017**: Zhao et al. released PyOD, providing a unified Python toolkit for outlier detection.
- **2019**: Hariri, Kind, and Brunner introduced Extended Isolation Forest with hyperplane splits.

## See also

- [Anomaly detection](/wiki/anomaly_detection)
- [Clustering](/wiki/clustering)
- [Dimensionality reduction](/wiki/dimension_reduction)
- [Feature engineering](/wiki/feature_engineering)
- [Overfitting](/wiki/overfitting)
- [Data augmentation](/wiki/data_augmentation)

## References

1. Hawkins, D. M. (1980). *Identification of Outliers*. Chapman and Hall.
2. Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000). "LOF: Identifying Density-Based Local Outliers." *Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data*, 93-104.
3. Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). "Isolation Forest." *Proceedings of the 2008 IEEE International Conference on Data Mining (ICDM)*, 413-422.
4. Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2012). "Isolation-Based Anomaly Detection." *ACM Transactions on Knowledge Discovery from Data*, 6(1), 1-39.
5. Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A. J., & Williamson, R. C. (2001). "Estimating the Support of a High-Dimensional Distribution." *Neural Computation*, 13(7), 1443-1471.
6. Knorr, E. M. & Ng, R. T. (1998). "Algorithms for Mining Distance-Based Outliers in Large Datasets." *Proceedings of the 24th International Conference on Very Large Data Bases (VLDB)*, 392-403.
7. Hariri, S., Kind, M. C., & Brunner, R. J. (2019). "Extended Isolation Forest." *IEEE Transactions on Knowledge and Data Engineering*, 33(4), 1479-1489.
8. Schlegl, T., Seeböck, P., Waldstein, S. M., Schmidt-Erfurth, U., & Langs, G. (2017). "Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery." *Information Processing in Medical Imaging (IPMI)*, 146-157.
9. Zhao, Y., Nasrullah, Z., & Li, Z. (2019). "PyOD: A Python Toolbox for Scalable Outlier Detection." *Journal of Machine Learning Research*, 20(96), 1-7.
10. Denning, D. E. (1987). "An Intrusion-Detection Model." *IEEE Transactions on Software Engineering*, SE-13(2), 222-232.
11. Aggarwal, C. C. (2017). *Outlier Analysis* (2nd ed.). Springer.
12. Chandola, V., Banerjee, A., & Kumar, V. (2009). "Anomaly Detection: A Survey." *ACM Computing Surveys*, 41(3), 1-58.
13. Pang, G., Shen, C., Cao, L., & van den Hengel, A. (2021). "Deep Learning for Anomaly Detection: A Review." *ACM Computing Surveys*, 54(2), 1-38.
