Saliency map
Last reviewed
Apr 30, 2026
Sources
22 citations
Review status
Source-backed
Revision
v1 · 3,999 words
A saliency map in deep learning highlights which input features, typically image pixels, most influenced a model's prediction. The canonical "vanilla" saliency map for a convolutional neural network is the absolute value of the gradient of an output class score with respect to the input pixels, computed via backpropagation. The technique was introduced for deep image classifiers by Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman in their 2013 paper "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps."
Saliency maps are one of the most widely used tools in explainable AI and model interpretability. They include a large family of methods beyond the original gradient definition: deconvolutional reconstructions, guided backpropagation, class activation mapping (CAM), Grad-CAM, Integrated Gradients, SmoothGrad, Layer-wise Relevance Propagation, DeepLIFT, SHAP, occlusion sensitivity, and perturbation-based approaches such as RISE. The literature on critiques is almost as rich as the literature on methods, with Adebayo et al. (2018), Kindermans et al. (2019), and Hooker et al. (2019) showing that visually plausible heatmaps can fail basic sanity checks.
The word "saliency" in computer vision predates deep learning. Laurent Itti, Christof Koch, and Ernst Niebur introduced a model of saliency-based visual attention in 1998 in IEEE Transactions on Pattern Analysis and Machine Intelligence. Their system decomposed an image into colour, intensity, and orientation feature maps, applied centre-surround comparisons across multiple scales, and combined the result into a single topographical "saliency map" predicting where a viewer's attention would land. The Itti-Koch-Niebur model was inspired by the early primate visual system and used a winner-take-all neural network to scan attended locations in order of decreasing saliency. This bottom-up notion of saliency is still used in psychophysics and cognitive vision.
The deep-learning sense of saliency map is different. It refers to a model-specific attribution: which pixels of this image, processed by this trained network, were responsible for predicting this class. Simonyan, Vedaldi, and Zisserman drew the link between the two senses by noting that, locally, the gradient of the class score tells you how each pixel would change the prediction, so the magnitude of that gradient defines an image-specific saliency map for the model.
Given an input image $x$ and a class score $S_c(x)$ produced by a deep neural network for class $c$, the vanilla saliency map is
$$M_c(x) = \left| \frac{\partial S_c(x)}{\partial x} \right|$$
For a colour image, the absolute gradient is computed at each pixel and channel, and the channel dimension is reduced by taking the maximum (Simonyan et al. 2013) or summing. The resulting two-dimensional map can be displayed as a grayscale heatmap highlighting pixels whose small perturbations would most change the class score. Simonyan and colleagues used the pre-softmax score so that other classes did not contribute through normalisation.
The map is cheap (a single backward pass) and works for any differentiable model, but the visualisation is often noisy. Edges and high-contrast textures dominate the map even when they are not what the network relies on, and gradient saturation through ReLU can wash out important features.
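The definition above can be illustrated with a minimal NumPy sketch. It assumes a hypothetical toy scorer `class_score` whose gradient is written out analytically in place of a real backward pass; the function names, weights, and image shape are all illustrative and not from the original paper.

```python
import numpy as np

def class_score(x, w):
    """Hypothetical toy class score S_c(x) = sum(w * tanh(x))."""
    return float(np.sum(w * np.tanh(x)))

def class_score_grad(x, w):
    """Analytic gradient dS_c/dx of the toy score (stand-in for backprop)."""
    return w * (1.0 - np.tanh(x) ** 2)

def vanilla_saliency(x, w):
    """Absolute gradient, reduced over the channel axis by taking the maximum."""
    grad = class_score_grad(x, w)           # shape (C, H, W)
    return np.abs(grad).max(axis=0)         # shape (H, W)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4, 4))              # illustrative 3-channel 4x4 "image"
w = rng.normal(size=(3, 4, 4))              # illustrative weights for class c
saliency = vanilla_saliency(x, w)
print(saliency.shape)                       # (4, 4)
```

With a real network the analytic gradient would be replaced by a single backward pass from the pre-softmax class score to the input.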
The table below summarises widely used saliency methods, their original papers, and core ideas.
| Method | First proposed | Type | Core idea |
|---|---|---|---|
| Vanilla gradient | Simonyan, Vedaldi, Zisserman 2013 | Gradient | Absolute gradient of class score w.r.t. input pixels |
| Deconvolution | Zeiler & Fergus 2014 | Modified backprop | Transposed (deconv) network with switched unpooling |
| Guided backpropagation | Springenberg et al. 2014 | Modified backprop | Backprop only positive gradients through positive ReLU activations |
| CAM | Zhou et al. 2016 | Activation-based | Weighted sum of last conv feature maps using GAP-class weights |
| Grad-CAM | Selvaraju et al. 2017 | Gradient + activation | Weight feature maps by global-average gradient of class score |
| Grad-CAM++ | Chattopadhyay et al. 2018 | Gradient + activation | Pixel-wise weighting using positive partial derivatives |
| Score-CAM | Wang et al. 2020 | Gradient-free | Weight feature maps by forward-pass score on target class |
| Eigen-CAM | Muhammad & Yeasin 2020 | Activation-based | First principal component of feature maps |
| SmoothGrad | Smilkov et al. 2017 | Gradient ensemble | Average gradient over noisy copies of the input |
| Integrated Gradients | Sundararajan, Taly, Yan 2017 | Path attribution | Integrate gradients along a path from baseline to input |
| LRP | Bach et al. 2015 | Modified backprop | Redistribute output relevance backwards by layer-specific rules |
| DeepLIFT | Shrikumar, Greenside, Kundaje 2017 | Reference-based | Compare activations to a reference and propagate differences |
| SHAP / DeepSHAP | Lundberg & Lee 2017 | Shapley values | Game-theoretic attribution unifying LIME, DeepLIFT, LRP |
| Occlusion sensitivity | Zeiler & Fergus 2014 | Perturbation | Slide a grey patch and record score drop |
| RISE | Petsiuk, Das, Saenko 2018 | Perturbation, black-box | Random binary masks weighted by output score |
Other popular variants include LIME (Ribeiro et al. 2016), Meaningful Perturbation (Fong & Vedaldi 2017), and Expected Gradients.
Class Activation Mapping was proposed by Bolei Zhou and colleagues at MIT in CVPR 2016 ("Learning Deep Features for Discriminative Localization"). CAM only works when a classification network ends in a global average pooling (GAP) layer followed by a single fully-connected layer to the class scores. Under that architecture, the class score $S_c$ can be written as
$$S_c = \sum_k w_k^c \cdot \frac{1}{Z}\sum_{i,j} A^k_{ij}$$
where $A^k_{ij}$ is the activation of the $k$-th feature map at spatial position $(i,j)$ and $w_k^c$ is the weight from feature map $k$ to class $c$. The CAM is then
$$L^c_{\text{CAM}}(i,j) = \sum_k w_k^c A^k_{ij}$$
upsampled to the input resolution. CAM produced strong weakly supervised localisation results on ILSVRC 2014 (37.1% top-5 localisation error against 34.2% for fully supervised CNNs), but its dependence on the GAP-then-linear architecture limited it to networks designed that way.
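A short NumPy sketch of the two CAM equations, assuming the feature maps `A` and the class weights `w_c` have already been extracted from a GAP-then-linear network (both are random placeholders here). Upsampling to the input resolution is omitted.

```python
import numpy as np

def class_activation_map(A, w_c):
    """CAM: sum_k w_k^c * A^k_ij for feature maps A of shape (K, H, W)."""
    return np.tensordot(w_c, A, axes=1)      # shape (H, W)

rng = np.random.default_rng(0)
A = rng.random(size=(8, 7, 7))               # placeholder last-conv feature maps
w_c = rng.normal(size=8)                     # placeholder GAP-to-class weights

cam = class_activation_map(A, w_c)
score = float(w_c @ A.mean(axis=(1, 2)))     # S_c = sum_k w_k^c * GAP(A^k)

# Averaging the CAM over space recovers the class score, because the
# global average pooling and the linear layer can be applied in either order.
print(np.isclose(cam.mean(), score))         # True
```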
Grad-CAM, introduced by Ramprasaath Selvaraju and colleagues at Georgia Tech in ICCV 2017, removed the architectural restriction. Selvaraju et al. observed that the GAP-class weights $w_k^c$ in CAM are essentially the spatial average of $\partial S_c / \partial A^k$. They generalised CAM by defining
$$\alpha_k^c = \frac{1}{Z} \sum_{i,j} \frac{\partial S_c}{\partial A^k_{ij}}$$
and the Grad-CAM map as
$$L^c_{\text{Grad-CAM}} = \text{ReLU}\!\left( \sum_k \alpha_k^c A^k \right)$$
The ReLU keeps only features with a positive influence on the class. The map is computed at the last convolutional layer, then upsampled and overlaid on the input. Grad-CAM works for VGG, ResNet, captioning networks, visual question answering, and reinforcement-learning agents without retraining or modifying the architecture. It quickly became the most widely used saliency method for CNNs in practice.
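The two Grad-CAM equations can be sketched in NumPy as follows. The sketch assumes the activations `A` and the gradients `dS_dA` have already been captured (in practice through framework hooks); here they are constructed by hand for a GAP-plus-linear head, which makes it possible to check that Grad-CAM reduces to a rectified, rescaled CAM in that special case.

```python
import numpy as np

def grad_cam(A, dS_dA):
    """Grad-CAM from activations A and gradients dS_c/dA, both shaped (K, H, W)."""
    alpha = dS_dA.mean(axis=(1, 2))               # alpha_k^c, shape (K,)
    weighted = np.tensordot(alpha, A, axes=1)     # sum_k alpha_k^c * A^k
    return np.maximum(weighted, 0.0)              # ReLU

rng = np.random.default_rng(0)
K, H, W = 8, 7, 7
A = rng.random(size=(K, H, W))                    # placeholder activations
w_c = rng.normal(size=K)                          # placeholder class weights

# For a GAP + linear head, S_c = sum_k w_k^c * mean(A^k), so the gradient
# dS_c/dA^k_ij equals w_k^c / (H * W) at every spatial position.
dS_dA = np.broadcast_to((w_c / (H * W))[:, None, None], (K, H, W))

gc = grad_cam(A, dS_dA)
cam = np.tensordot(w_c, A, axes=1)
print(np.allclose(gc, np.maximum(cam, 0.0) / (H * W)))   # True
```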
Grad-CAM++ (Chattopadhyay et al., WACV 2018) replaces the global gradient average with a per-pixel weighted average using only positive partial derivatives, giving sharper localisation when an image contains multiple instances of the same class. Score-CAM (Wang et al., CVPR Workshops 2020) removes gradients entirely: it weights each feature map $A^k$ by the model's forward-pass score on the target class when only that map is used to mask the input. Eigen-CAM uses the first principal component of the feature maps as the saliency map.
Matthew Zeiler and Rob Fergus in "Visualizing and Understanding Convolutional Networks" (ECCV 2014, honourable mention for best paper) introduced deconvolution networks, which mirror the forward CNN with transposed convolutions and "switched" unpooling that records the location of each max-pool selection. Running an activation backwards through the deconvnet produces an approximate reconstruction in image space of the pattern that triggered that activation. Zeiler and Fergus used these reconstructions to redesign AlexNet's first-layer filters and stride, giving ZFNet the top single-model result on ImageNet 2013.
Guided backpropagation (Jost Tobias Springenberg et al., "Striving for Simplicity: The All Convolutional Net," ICLR Workshop 2015) is a small but influential modification. During the backward pass through a ReLU, the standard gradient zeroes out gradients where the forward activation was zero. Guided backprop additionally zeroes out gradients that are themselves negative. The effect is to keep only neurons that fired in the forward pass and would, if increased, raise the target activation. Visualisations from guided backprop are noticeably sharper than vanilla gradients and often look like clean feature drawings, part of why the method became popular and why later sanity checks targeted it.
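The difference between the two ReLU backward rules described above fits in a few lines. This is a sketch on hand-picked vectors; the function name `relu_backward` and its `mode` argument are illustrative, not part of any library.

```python
import numpy as np

def relu_backward(grad_out, pre_activation, mode="standard"):
    """Backward pass through a ReLU under the standard or guided rule."""
    fired = pre_activation > 0                     # units active in the forward pass
    if mode == "standard":
        return np.where(fired, grad_out, 0.0)
    if mode == "guided":
        # Additionally drop gradients that are themselves negative.
        return np.where(fired & (grad_out > 0), grad_out, 0.0)
    raise ValueError(f"unknown mode: {mode}")

z = np.array([1.5, -0.3, 2.0, 0.7])    # forward inputs to the ReLU
g = np.array([0.4, 0.9, -1.2, 0.1])    # gradient arriving from the layer above

print(relu_backward(g, z, "standard").tolist())   # [0.4, 0.0, -1.2, 0.1]
print(relu_backward(g, z, "guided").tolist())     # [0.4, 0.0, 0.0, 0.1]
```

The third unit fired in the forward pass but receives a negative gradient, so only the guided rule removes it.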
Layer-wise Relevance Propagation (LRP) was introduced by Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek in PLOS ONE 2015. LRP starts from the output and redistributes the prediction $f(x)$ backward through the network using layer-specific rules so the sum of relevances at every layer equals $f(x)$. The simplest rule for a linear layer with weights $w$ and inputs $a$ assigns $R_i = \sum_j (a_i w_{ij} / \sum_{i'} a_{i'} w_{i'j}) R_j$. Variants such as LRP-$\epsilon$, LRP-$\gamma$, and LRP-$\alpha\beta$ handle positive and negative contributions differently. LRP is particularly popular in medical imaging, for example for explaining MRI-based Alzheimer's classifiers.
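The simplest rule can be sketched for a single bias-free linear layer. The helper name `lrp_linear` is illustrative, and the optional `eps` stabiliser follows the usual LRP-$\epsilon$ form, which is an assumption of this sketch rather than something spelled out above.

```python
import numpy as np

def lrp_linear(a, w, R_out, eps=0.0):
    """Redistribute relevance R_out (J,) onto inputs a (I,) through weights w (I, J)."""
    z = a[:, None] * w                           # contributions a_i * w_ij, shape (I, J)
    denom = z.sum(axis=0)                        # sum_i' a_i' * w_i'j, shape (J,)
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)   # optional stabiliser
    return (z / denom) @ R_out                   # R_i = sum_j (z_ij / denom_j) * R_j

a = np.array([1.0, 2.0, 0.5])                    # illustrative input activations
w = np.array([[0.2, -0.5],
              [0.4,  0.3],
              [-0.6, 0.8]])                      # illustrative weights
R_out = a @ w                                    # start from the layer outputs
R_in = lrp_linear(a, w, R_out)

# With eps = 0 the total relevance is conserved across the layer.
print(np.isclose(R_in.sum(), R_out.sum()))       # True
```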
DeepLIFT (Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje, ICML 2017) compares each neuron's activation to a reference activation for a baseline input (often a black image or the dataset mean) and propagates differences backwards in a single pass. DeepLIFT has "Rescale" and "RevealCancel" variants, the latter of which can detect dependencies that gradient methods miss because of saturation in non-linearities such as ReLU.
Mukund Sundararajan, Ankur Taly, and Qiqi Yan introduced Integrated Gradients at ICML 2017 in "Axiomatic Attribution for Deep Networks." They began from the observation that ordinary gradients fail two properties one might want from an attribution method:
Integrated Gradients computes
$$\text{IG}_i(x) = (x_i - x'_i) \cdot \int_0^1 \frac{\partial F(x' + \alpha (x - x'))}{\partial x_i} \, d\alpha$$
integrating the gradient along the straight-line path from a baseline input $x'$ (often a black image, blurred image, or zeros) to the actual input $x$. The integral is approximated by a Riemann sum over typically 20 to 300 evenly spaced points. Sundararajan, Taly, and Yan argued that path-integral methods are the only attribution methods that satisfy sensitivity, implementation invariance, linearity, and completeness (attributions sum to $F(x) - F(x')$), and that Integrated Gradients is the unique path method that also preserves symmetry. The method requires no architectural change and is implemented in nearly every interpretability library.
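The Riemann-sum approximation and the completeness property can be checked on a toy model. The sketch below assumes a hypothetical saturating scorer $F(x) = \tanh(w \cdot x)$ with a hand-written gradient, a zero baseline, and a midpoint rule with 50 steps; none of these choices come from the paper.

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])                   # illustrative weights

def F(x):
    """Toy saturating model F(x) = tanh(w . x)."""
    return float(np.tanh(w @ x))

def grad_F(x):
    """Analytic gradient of F (stand-in for backprop)."""
    return (1.0 - np.tanh(w @ x) ** 2) * w

def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    alphas = (np.arange(steps) + 0.5) / steps            # evenly spaced in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # points on the straight line
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

x = np.array([2.0, -1.0, 3.0])
baseline = np.zeros(3)
ig = integrated_gradients(x, baseline, grad_F)

# Completeness: the attributions sum to F(x) - F(baseline).
print(np.isclose(ig.sum(), F(x) - F(baseline), atol=1e-3))   # True
```

At this input the model is saturated, so the plain gradient `grad_F(x)` is close to zero for every feature even though the output differs from the baseline output by almost 1; the integrated attributions still account for that difference.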
Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg of Google PAIR introduced SmoothGrad in 2017. Vanilla gradient maps look noisy because the gradient fluctuates rapidly across nearby points in input space. SmoothGrad averages the gradient over many copies of the image perturbed with small Gaussian noise:
$$\widehat{M}_c(x) = \frac{1}{n} \sum_{i=1}^n M_c(x + \mathcal{N}(0, \sigma^2))$$
The authors recommended $n$ between 10 and 50 and noise standard deviation around 10 to 20 percent of the input pixel range. SmoothGrad can be applied on top of any underlying gradient method, and Smilkov et al. reported that smoothed maps are more visually coherent than the underlying methods on 200 ImageNet images. A variant, VarGrad, uses the variance of the perturbed gradients rather than the mean.
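A minimal sketch of the averaging step, assuming a caller-supplied `saliency_fn` that returns a map of the same shape as its input. The toy saliency function, the default of 25 samples, and the use of `x.max() - x.min()` as the input range are illustrative choices.

```python
import numpy as np

def smoothgrad(x, saliency_fn, n=25, noise_frac=0.15, seed=0):
    """Average saliency_fn over n copies of x perturbed with Gaussian noise."""
    rng = np.random.default_rng(seed)
    sigma = noise_frac * (x.max() - x.min())     # noise scale relative to input range
    total = np.zeros_like(x, dtype=float)
    for _ in range(n):
        total += saliency_fn(x + rng.normal(0.0, sigma, size=x.shape))
    return total / n

def toy_saliency(x):
    """Hypothetical |dS/dx| for the rapidly varying score S(x) = sum(sin(8 x))."""
    return np.abs(8.0 * np.cos(8.0 * x))

x = np.linspace(0.0, 1.0, 16).reshape(4, 4)      # illustrative 4x4 input
smooth = smoothgrad(x, toy_saliency)
print(smooth.shape)                              # (4, 4)
```

Because `saliency_fn` is a parameter, the same wrapper can sit on top of any gradient-based method, which mirrors how SmoothGrad is usually combined with other attribution techniques.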
Saliency does not have to come from gradients. Occlusion sensitivity, used by Zeiler and Fergus in 2014, slides a grey or zero patch over the image and records how much the target class probability drops at each location. The resulting map highlights regions whose removal hurts the prediction. Occlusion is intuitive and architecture-agnostic but expensive.
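The sliding-patch procedure can be sketched for a single-channel image as below. The scorer `score_fn` is a hypothetical stand-in for a model's target-class probability, and the patch size, stride, and fill value are illustrative.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=2, stride=2, fill=0.5):
    """Record the score drop caused by occluding each patch of a 2-D image."""
    H, W = image.shape
    base = score_fn(image)
    heat = np.zeros((H, W))
    counts = np.zeros((H, W))
    for top in range(0, H - patch + 1, stride):
        for left in range(0, W - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill   # grey patch
            drop = base - score_fn(occluded)
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    return heat / np.maximum(counts, 1)          # average over overlapping patches

def score_fn(img):
    """Toy scorer that only looks at the top-left 2x2 corner."""
    return float(img[:2, :2].mean())

image = np.ones((6, 6))
heat = occlusion_map(image, score_fn)
print(heat[0, 0], heat[3, 3])                    # 0.5 0.0
```

The cost is one forward pass per patch position, which is why the method becomes expensive at fine strides on large images.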
RISE (Vitali Petsiuk, Abir Das, Kate Saenko, BMVC 2018) is a randomised version of occlusion that works on any black-box model. Given $N$ random binary masks $M_n$ (typically a few thousand), the saliency at each pixel $(i,j)$ is the score-weighted average of the masks containing that pixel:
$$S(x)_{i,j} = \frac{1}{\mathbb{E}[M] \cdot N} \sum_{n=1}^N f(x \odot M_n) \cdot M_n(i,j)$$
Petsiuk et al. reported that RISE matches or beats white-box methods on the deletion, insertion, and pointing-game metrics despite using only forward passes.
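A simplified sketch of the RISE estimator follows. It assumes coarse binary grids upsampled by plain block replication, which is a simplification made here for brevity, and a hypothetical black-box `score_fn`.

```python
import numpy as np

def rise_saliency(image, score_fn, n_masks=2000, cells=3, p_keep=0.5, seed=0):
    """Score-weighted average of random binary masks (simplified RISE)."""
    rng = np.random.default_rng(seed)
    H, W = image.shape
    block = np.ones((H // cells, W // cells))    # assumes H and W divisible by cells
    saliency = np.zeros((H, W))
    for _ in range(n_masks):
        coarse = (rng.random((cells, cells)) < p_keep).astype(float)
        mask = np.kron(coarse, block)            # upsample the coarse grid to (H, W)
        saliency += score_fn(image * mask) * mask
    return saliency / (p_keep * n_masks)         # normalise by E[M] * N

def score_fn(img):
    """Toy black-box scorer that only looks at the top-left 2x2 corner."""
    return float(img[:2, :2].mean())

image = np.ones((6, 6))
sal = rise_saliency(image, score_fn)
print(sal[:2, :2].min() > sal[2:, 2:].max())     # True
```

Only forward evaluations of `score_fn` are needed, so the same code would apply to any model that exposes a scalar score.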
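In 2018 at NeurIPS, Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim published "Sanity Checks for Saliency Maps," the standard reference for what can go wrong. They proposed two tests: a model-randomisation test, which compares the saliency maps of a trained network with those of the same architecture after its weights have been randomised, and a data-randomisation test, which compares them with the maps of a network trained on shuffled labels. A method that explains the model should produce clearly different maps in each case.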
The results were striking. Vanilla gradients, Integrated Gradients, and (to a lesser extent) Grad-CAM all responded to randomisation. Guided backpropagation and Guided Grad-CAM produced almost the same map regardless of whether the model was trained, randomly initialised, or trained on shuffled labels. Their maps look like edge maps of the input rather than explanations of the model's behaviour. Adebayo et al. also showed that a simple edge detector (no model at all) produces results visually similar to many popular saliency maps. Visual plausibility is not evidence of correctness.
The table below summarises which methods passed which sanity checks in the Adebayo et al. study:
| Method | Model randomisation | Data randomisation |
|---|---|---|
| Vanilla gradient | Passes | Passes |
| SmoothGrad (on gradient) | Passes | Passes |
| Integrated Gradients | Passes | Passes |
| Grad-CAM | Passes | Passes |
| Guided backpropagation | Fails | Fails |
| Guided Grad-CAM | Fails | Fails |
| Edge detector (no model) | n/a (looks like Guided backprop) | n/a |
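The logic of the model-randomisation check can be illustrated with a toy comparison. The sketch assumes rank correlation as the similarity measure, a hypothetical scorer $S(x) = \sum w \tanh(x)$, and random arrays standing in for trained and re-initialised weights; it is not the protocol of the paper.

```python
import numpy as np

def rank_correlation(a, b):
    """Spearman-style rank correlation of two maps (ties are not handled)."""
    ra = np.argsort(np.argsort(a.ravel()))
    rb = np.argsort(np.argsort(b.ravel()))
    return float(np.corrcoef(ra, rb)[0, 1])

def gradient_saliency(x, w):
    """|dS/dx| for the toy score S(x) = sum(w * tanh(x)); depends on the weights."""
    return np.abs(w * (1.0 - np.tanh(x) ** 2))

def edge_map(x, w):
    """Horizontal edge strength of the input; ignores the weights entirely."""
    return np.abs(np.gradient(x, axis=1))

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16))
w_trained = rng.normal(size=(16, 16))            # stand-in for trained weights
w_random = rng.normal(size=(16, 16))             # stand-in for re-initialised weights

sim_gradient = rank_correlation(gradient_saliency(x, w_trained),
                                gradient_saliency(x, w_random))
sim_edges = rank_correlation(edge_map(x, w_trained), edge_map(x, w_random))
print(round(sim_edges, 6))                       # 1.0
```

A similarity close to 1 after randomisation, as for `edge_map`, indicates that the map does not depend on what the model has learned; `sim_gradient` is noticeably lower because the gradient changes with the weights.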
Pieter-Jan Kindermans and colleagues, in "The (Un)reliability of Saliency Methods" (2019), made a complementary point. A simple input transformation that does not change a model's predictions, such as adding a constant shift to every pixel and absorbing it into the bias of the first layer, can completely change the saliency maps produced by many methods. They formalised the requirement that attributions be unaffected by such transformations as input invariance. In their experiments plain gradients were unaffected by the shift, gradient-times-input attributions changed, and Integrated Gradients and Deep Taylor Decomposition passed or failed depending on the choice of baseline or reference point.
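The constant-shift construction is easy to reproduce on a linear model. In the sketch below the weights, bias, and shift are illustrative, and gradient-times-input is used as one example of an attribution that changes even though the prediction does not.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                           # illustrative weights
b = 0.3                                          # illustrative bias
shift = 2.0                                      # constant added to every input feature

def model(x):
    return float(w @ x + b)

def shifted_model(x):
    # The shift is absorbed into the bias, so predictions are unchanged.
    return float(w @ x + (b - shift * w.sum()))

x = rng.normal(size=5)
x_shifted = x + shift

same_prediction = np.isclose(model(x), shifted_model(x_shifted))
attr = w * x                                     # gradient-times-input, original model
attr_shifted = w * x_shifted                     # gradient-times-input, shifted model

print(same_prediction)                           # True
print(np.allclose(attr, attr_shifted))           # False
```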
Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim added a quantitative benchmark in NeurIPS 2019 with ROAR (RemOve And Retrain). For each saliency method, identify the top-$k$% most important pixels for each training image, replace them with the per-channel mean, retrain the model from scratch, then measure test accuracy. A good method should remove features the model needs, so accuracy should drop sharply. Surprisingly, several popular methods (including Guided backprop, Guided Grad-CAM, and Integrated Gradients) performed at or below random pixel selection. SmoothGrad-Squared and VarGrad gave large accuracy drops. ROAR suggests that many saliency maps that look informative are not actually identifying the features the model uses.
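The pixel-replacement step of ROAR can be sketched as follows. The helper `roar_mask` is illustrative, and it uses the mean of the single image as a stand-in for the per-channel mean; retraining and evaluation are not shown.

```python
import numpy as np

def roar_mask(image, saliency, fraction):
    """Replace the top `fraction` most salient pixels with the per-channel mean."""
    C, H, W = image.shape
    k = int(round(fraction * H * W))
    masked = image.copy()
    if k == 0:
        return masked
    top = np.argsort(saliency.ravel())[::-1][:k]         # indices of the k largest values
    rows, cols = np.unravel_index(top, (H, W))
    channel_mean = image.mean(axis=(1, 2))               # shape (C,)
    masked[:, rows, cols] = channel_mean[:, None]        # same pixels in every channel
    return masked

rng = np.random.default_rng(0)
image = rng.random((3, 4, 4))                            # illustrative 3-channel image
saliency = np.arange(16.0).reshape(4, 4)                 # bottom row is most salient
masked = roar_mask(image, saliency, fraction=0.25)
print(np.array_equal(masked[:, :3, :], image[:, :3, :])) # True
```

In the full benchmark this masking is applied to every training and test image before the model is retrained from scratch.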
Even leaving aside the deeper reliability questions, naive use of saliency maps tends to introduce avoidable errors:
Saliency maps are used across the deep-learning ecosystem in several roles:
A related question for transformer models is whether attention weights themselves can serve as saliency. In a single attention head, the softmaxed scores look superficially like an explanation. In transformers with many layers, however, information from different tokens mixes through repeated self-attention, so a single head's weights at one layer do not capture the actual flow of information from input to output. Samira Abnar and Willem Zuidema proposed attention rollout and attention flow in ACL 2020 ("Quantifying Attention Flow in Transformers") to approximate input-to-output attention by multiplying attention matrices through layers (with an identity term for residual connections). Attention rollout correlates better with input gradients and ablation-based importance scores than raw attention does.
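Attention rollout as described above can be sketched in a few lines. The sketch assumes the per-layer attention matrices have already been averaged over heads, and it uses random row-stochastic matrices as placeholders for real attention weights.

```python
import numpy as np

def attention_rollout(attentions):
    """Multiply per-layer attention matrices, first layer first, with a residual term."""
    T = attentions[0].shape[0]
    rollout = np.eye(T)
    for A in attentions:
        A_res = 0.5 * A + 0.5 * np.eye(T)        # identity term for the residual connection
        rollout = A_res @ rollout                # compose with the layers below
    return rollout

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5, 5))              # placeholder: 4 layers, 5 tokens
attentions = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

rollout = attention_rollout(attentions)
print(np.allclose(rollout.sum(axis=-1), 1.0))    # True
```

Row `t` of the result approximates how much each input token contributes to position `t` at the top layer, and each row remains a probability distribution.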
The broader debate about whether attention is an explanation at all was opened by Sarthak Jain and Byron Wallace in NAACL 2019 ("Attention is not Explanation"). They showed across several NLP tasks that attention weights are often uncorrelated with gradient-based importance, that very different attention distributions can yield identical predictions, and that learned attention rarely picks out the tokens the model is most sensitive to. Sarah Wiegreffe and Yuval Pinter responded later that year with "Attention is not not Explanation," arguing the original tests were too strict and that attention can be a useful explanation under more realistic conditions. Most current interpretability work on transformers uses gradient-based or attention-flow methods rather than raw attention as the saliency signal.
Most mainstream interpretability libraries implement the standard saliency methods:
| Library | Framework | Methods covered |
|---|---|---|
| Captum | PyTorch | Saliency, Integrated Gradients, SmoothGrad, DeepLIFT, GradientSHAP, Occlusion, Feature Ablation, Grad-CAM via LayerGradCam |
| tf-explain | TensorFlow / Keras | Vanilla Gradients, SmoothGrad, GradCAM, Occlusion Sensitivity, Integrated Gradients |
| iNNvestigate | TensorFlow / Keras | LRP variants, DeepTaylor, Guided backprop, Integrated Gradients, SmoothGrad |
| Alibi | scikit-learn / PyTorch / TensorFlow | Integrated Gradients, anchor explanations, counterfactuals, kernel SHAP |
| SHAP | model-agnostic | Kernel, Tree, and Deep variants of SHAP |
| pytorch-grad-cam (Gildenblat) | PyTorch | Grad-CAM, Grad-CAM++, Score-CAM, Eigen-CAM, Ablation-CAM |
Captum, originally from Meta's PyTorch team and now part of meta-pytorch, was published with a 2020 paper by Narine Kokhlikyan and colleagues and is the de facto choice for PyTorch users. The library name comes from the Latin for "comprehension."
The research arc spans roughly a decade. Simonyan, Vedaldi, and Zisserman in 2013 showed that you could just take the gradient. Zeiler and Fergus, Springenberg and colleagues, and Zhou and colleagues developed cleaner visualisations through deconvolution, guided backprop, and CAM. Between 2015 and 2017 the methodological flowering produced LRP, DeepLIFT, SmoothGrad, Integrated Gradients, Grad-CAM, and SHAP, with Grad-CAM emerging as the practical default for vision. Starting in 2017-2018, Adebayo, Kindermans, Hooker, and others showed that several of those methods do not survive randomisation and invariance tests. The mainstream response has been to favour Integrated Gradients or Grad-CAM over Guided backprop, to combine attribution with smoothing (SmoothGrad-Squared, VarGrad), and to evaluate methods quantitatively with ROAR, deletion/insertion, and the pointing game.
For practitioners, the safest approach in 2026 is to use methods that pass at least the model-randomisation sanity check, compute attributions for both the predicted class and runner-up classes, ensemble several methods and look for agreement, and confirm any conclusion with an actual intervention on the input.