# Integrated Gradients

> Source: https://aiwiki.ai/wiki/integrated_gradients
> Updated: 2026-06-25
> Categories: Deep Learning, Interpretability, Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Integrated Gradients** (IG) is a feature-attribution method for [explainable AI](/wiki/explainable_ai) that explains a [neural network](/wiki/neural_network) prediction by assigning each input feature an importance score, computed by integrating the model's gradients along a straight-line path from a neutral reference input (the [baseline](/wiki/baseline)) to the actual input. It was introduced by [Mukund Sundararajan](/wiki/mukund_sundararajan), [Ankur Taly](/wiki/ankur_taly), and Qiqi Yan in the 2017 [ICML](/wiki/icml) paper "Axiomatic Attribution for Deep Networks," which derives the method from two axioms, Sensitivity and Implementation Invariance, that the authors argue every attribution method ought to satisfy. [1] Unlike earlier [saliency map](/wiki/saliency_map) techniques that read off raw gradients at a single point, IG averages gradients over the whole path, and that integration step is what gives the method its name.

The paper frames IG as both principled and lightweight. In its own words, the authors "identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy," then "use the axioms to guide the design of a new attribution method called Integrated Gradients" that "requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator." [1] IG has become one of the most widely cited gradient-based attribution techniques for [deep learning](/wiki/deep_learning) models. The paper also proves that, among *path methods* (methods that integrate gradients along some path from the baseline to the input), IG's straight-line path is the only one that is symmetry-preserving.

IG sits within the broader field of [attribution methods](/wiki/attribution_methods) and is closely tied to the [Aumann-Shapley value](/wiki/aumann_shapley) from cooperative game theory, which generalizes the discrete [Shapley value](/wiki/shapley_value) to continuous settings. It has been adopted by major libraries including [Captum](/wiki/captum) for [PyTorch](/wiki/pytorch), the Saliency library for [TensorFlow](/wiki/tensorflow), Alibi, and SHAP. It has been applied to image classification on networks like [Inception](/wiki/inception) and [ResNet](/wiki/resnet) trained on [ImageNet](/wiki/imagenet), to natural language tasks with [BERT](/wiki/bert) and other [transformer](/wiki/transformer) models, to tabular data in finance and healthcare, and to genomics for identifying regulatory motifs in DNA. [2][3]

## What is Integrated Gradients?

Integrated Gradients is a model interpretability technique that quantifies how much each input feature contributed to a particular prediction, measured relative to a chosen baseline. The core idea is to interpolate between the baseline and the input, compute the model's gradient at many points along that straight line, average those gradients, and scale the result by the input-minus-baseline difference. The output is one attribution score per feature, with positive scores indicating features that pushed the prediction up and negative scores indicating features that pushed it down.

The method was designed to fix a known failure of plain gradients. Raw saliency can report zero importance for a feature that clearly matters, because the model output may be locally flat (saturated) at the exact input point even though the feature changed the prediction overall. By integrating gradients across the whole path rather than evaluating them at a single point, IG satisfies the Sensitivity requirement that plain gradients can violate: if the input and baseline differ in only one feature and the predictions differ, that feature receives a non-zero attribution. The trade-off is cost: where vanilla saliency needs one backward pass, IG needs many (the original paper recommends 20 to 300). [1][3]

## How does Integrated Gradients work? (mathematical formulation)

Let `F: R^n -> R` be a differentiable function representing a neural network output (for example, a class probability or logit). Let `x` be the input of interest and `x'` be a chosen baseline. The Integrated Gradient for input feature `i` is defined as the path integral of the partial derivative of `F` with respect to feature `i`, taken along the straight-line interpolation between `x'` and `x`:

```
IG_i(x) = (x_i - x'_i) * integral from alpha=0 to 1 of dF(x' + alpha * (x - x')) / dx_i  d(alpha)
```

In words: for each input dimension, take the difference between the input and baseline values and multiply it by the average partial derivative of the model along the straight-line path between them. The factor `(x_i - x'_i)` makes the attribution zero whenever a feature equals its baseline value, and the integral averages out local fluctuations in the [gradient](/wiki/gradient) caused by saturated activations or sharp nonlinearities. [1]

### How is the integral approximated numerically?

The integral is rarely computed analytically. In practice, IG is approximated by a [Riemann sum](/wiki/riemann_sum) with `m` discrete steps:

```
IG_i(x) ~ (x_i - x'_i) / m * sum from k=1 to m of dF(x' + (k/m) * (x - x')) / dx_i
```

Each step requires one forward pass and one backward pass through the network. The original paper reports that 20 to 300 steps are usually enough to approximate the integral to within about 5 percent, with the exact number depending on how much the model output changes between the baseline and the input. [1][3] For convolutional networks such as Inception V3 on ImageNet, 50 steps is a common default (it is, for example, Captum's default step count) and typically brings the completeness error to within a few percent; when higher-precision attributions are needed, the step count can be raised further. [3]
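The Riemann sum and the completeness check can be sketched in a few lines of NumPy. This is a minimal illustration rather than a library implementation: it assumes a toy model `F(x) = x0**2 * x1` whose gradient `grad_F` is written out by hand, and the helper name `integrated_gradients` is illustrative.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, m=50):
    """Right Riemann-sum approximation of IG with alpha_k = k/m for k = 1..m."""
    alphas = np.arange(1, m + 1) / m
    path = baseline + alphas[:, None] * (x - baseline)   # (m, n) points on the straight line
    grads = np.stack([grad_fn(p) for p in path])         # (m, n) gradient at each point
    return (x - baseline) * grads.mean(axis=0)

# Toy model F(x) = x0**2 * x1 with a hand-written gradient (illustrative assumption)
def F(x):
    return x[0] ** 2 * x[1]

def grad_F(x):
    return np.array([2 * x[0] * x[1], x[0] ** 2])

x, baseline = np.array([1.0, 2.0]), np.zeros(2)
for m in (20, 50, 300):
    attr = integrated_gradients(grad_F, x, baseline, m)
    delta = attr.sum() - (F(x) - F(baseline))            # completeness error
    print(m, attr, delta)
```

For this toy function the exact attributions are 4/3 and 2/3, and the completeness error shrinks roughly in proportion to `1/m`, from about 0.15 at 20 steps to about 0.01 at 300 steps.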

### A simple worked example

Consider a one-dimensional ReLU network `F(x) = max(0, x - 0.5)` evaluated at `x = 1.0` with baseline `x' = 0.0`. The model output is `F(1.0) = 0.5` and `F(0.0) = 0.0`. The gradient `dF/dx` is 0 for `x < 0.5` and 1 for `x > 0.5`. Along the path from 0 to 1 the gradient equals 1 for half the path and 0 for the other half, so the path-averaged gradient is 0.5. Multiplying by `(x - x') = 1` gives an attribution of 0.5, exactly equal to `F(x) - F(x')`. Vanilla saliency, by contrast, reports a gradient of 1 at `x = 1.0`; multiplied by `(x - x') = 1` it would attribute 1.0, twice the actual output change, because a gradient read at a single point cannot see that the function was flat between 0 and 0.5.
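The worked example can be checked numerically. The sketch below hard-codes the example's function and its derivative (assigning slope 0 at the kink, an arbitrary choice that does not affect the integral) and uses a 1,000-step right Riemann sum.

```python
import numpy as np

# The 1-D example above: F(x) = max(0, x - 0.5), input x = 1.0, baseline x' = 0.0
def F(x):
    return max(0.0, x - 0.5)

def dF(x):
    return 1.0 if x > 0.5 else 0.0   # derivative; the kink at 0.5 is assigned slope 0

x, x_base, m = 1.0, 0.0, 1000
alphas = np.arange(1, m + 1) / m
path_grads = [dF(x_base + a * (x - x_base)) for a in alphas]
ig = (x - x_base) * np.mean(path_grads)        # path-averaged gradient times (x - x')
grad_times_input = (x - x_base) * dF(x)        # gradient read off at the input only

print(ig, F(x) - F(x_base), grad_times_input)
```

`ig` comes out at 0.5, matching `F(x) - F(x')`, while the single-point `grad_times_input` gives 1.0.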

## What are the axioms behind Integrated Gradients?

The central contribution of the IG paper is the introduction of two axioms that any attribution method should satisfy. The authors argue that prior literature had judged methods primarily by visual appeal rather than by formal properties, making it hard to know whether a saliency map captures model behavior or merely produces plausible-looking heatmaps. As the paper puts it, most known attribution methods do not satisfy these axioms, "which we consider to be a fundamental weakness of those methods." [1]

### What is the Sensitivity axiom?

The **Sensitivity (a)** axiom states: if an input `x` and baseline `x'` differ in a single feature, and the model output differs between them, then that feature must receive a non-zero attribution. The **Sensitivity (b)** axiom (called Dummy in the game-theory literature) adds a complementary condition: if the function implemented by the network does not mathematically depend on a variable, the attribution to that variable must be zero.

Vanilla [saliency map](/wiki/saliency_map) methods, which compute `dF/dx` at the input point alone, can fail Sensitivity (a) because of *saturated activations*. In deep networks with many ReLU units, the gradient of a class score with respect to an input pixel can be exactly zero even when changing the pixel from black to its observed value would change the prediction. IG avoids this by averaging gradients along the interpolation path. Sensitivity (a) then follows from Completeness (described below): when `x` and `x'` differ in only one feature, that feature's attribution equals `F(x) - F(x')`, which is non-zero whenever the outputs differ. [1]
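The saturation failure is easy to reproduce with a one-variable function of the kind used in the original paper, `F(x) = 1 - ReLU(1 - x)`, evaluated at `x = 2` with baseline `x' = 0`. The NumPy sketch below uses a hand-written derivative and a 1,000-step Riemann sum; both are illustrative choices.

```python
import numpy as np

# A saturating toy function: F(x) = 1 - max(0, 1 - x), which is flat for x >= 1
def F(x):
    return 1.0 - max(0.0, 1.0 - x)

def dF(x):
    return 1.0 if x < 1.0 else 0.0

x, x_base, m = 2.0, 0.0, 1000
saliency = dF(x)                     # gradient at the input: 0, because F is flat there
alphas = np.arange(1, m + 1) / m
ig = (x - x_base) * np.mean([dF(x_base + a * (x - x_base)) for a in alphas])

print(saliency, ig, F(x) - F(x_base))
```

The gradient at the input is 0 even though the output rose from 0 to 1, while IG returns approximately 1 (0.998 with this step count), crediting the feature for the whole change.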

### What is the Implementation Invariance axiom?

The **Implementation Invariance** axiom requires that two networks computing the exact same input-output function `F` should produce identical attributions, even if they have different internal architectures. Two networks are *functionally equivalent* if `F_1(x) = F_2(x)` for every input `x`.

This axiom is satisfied automatically by methods that depend only on input gradients, since the gradient of `F` is a property of the function, not of its implementation. Vanilla gradients and IG both satisfy it. [DeepLIFT](/wiki/deeplift), proposed by Shrikumar, Greenside, and Kundaje, replaces the partial derivative with a *reference difference quotient* and propagates modified gradients backward through the network. [4] Because DeepLIFT rules depend on the choice of nonlinearity and on how layers are decomposed, two functionally equivalent networks with different layer structures can yield different attributions. The same critique applies to [LRP](/wiki/lrp) (Bach et al. 2015). [5]

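IG therefore satisfies both axioms, whereas the other methods examined in the paper fail at least one. More generally, Sundararajan and colleagues note that *path methods*, which integrate gradients along some path from baseline to input, are the only attribution methods that always satisfy Implementation Invariance, Sensitivity (b), Linearity, and Completeness, and they prove that the straight-line path is the unique path method that is also Symmetry-Preserving: two variables that can be swapped without changing `F`, and that take equal values in the input and equal values in the baseline, receive equal attributions. [1]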

## What is the Completeness property and how does it connect to Shapley values?

IG satisfies a useful accounting property called **Completeness**: the per-feature attributions sum exactly to the difference in model output between the input and the baseline.

```
sum over i of IG_i(x) = F(x) - F(x')
```

This is a direct consequence of the fundamental theorem of calculus applied to the path integral, and it means the attributions can be read as a decomposition of the prediction change into per-feature contributions. Completeness also makes IG comparable to [SHAP](/wiki/shap) and other attribution methods that distribute a fixed total amount of credit among features. In practice, Completeness doubles as a built-in correctness check: summing the attributions and comparing the total against `F(x) - F(x')` reveals whether the Riemann approximation used enough steps. [6]

The connection to game theory runs deeper than the surface analogy. IG is mathematically equivalent to the [Aumann-Shapley value](/wiki/aumann_shapley) for the linear path between baseline and input. Aumann and Shapley introduced this value in 1974 for non-atomic games, a continuous extension of the discrete [Shapley value](/wiki/shapley_value) in which players can participate in fractional amounts; it was later widely applied to cost allocation. [7] In game-theoretic terms, IG distributes the payout `F(x) - F(x')` among features in a way that satisfies the Efficiency, Linearity, Symmetry, and Dummy axioms.
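The difference between the continuous Aumann-Shapley value computed by IG and the discrete Shapley value can be seen on two toy functions. The sketch below assumes a discrete game in which a feature outside the coalition keeps its baseline value (one of several possible Shapley formulations); the helper names `ig` and `baseline_shapley` and both toy models are illustrative.

```python
import itertools
import math
import numpy as np

def ig(grad_fn, x, baseline, m=2000):
    """IG by a right Riemann sum along the straight line from baseline to x."""
    alphas = np.arange(1, m + 1) / m
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def baseline_shapley(f, x, baseline):
    """Exact discrete Shapley values; a feature outside the coalition keeps its baseline value."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, size):
                z = baseline.astype(float).copy()
                for j in coalition:
                    z[j] = x[j]
                without_i = f(z)
                z[i] = x[i]
                phi[i] += weight * (f(z) - without_i)
    return phi

# Two toy models with hand-written gradients (illustrative assumptions)
prod = (lambda z: z[0] * z[1], lambda z: np.array([z[1], z[0]]))
skew = (lambda z: z[0] ** 2 * z[1], lambda z: np.array([2 * z[0] * z[1], z[0] ** 2]))

x, baseline = np.array([1.0, 2.0]), np.zeros(2)
for f, grad_f in (prod, skew):
    print(ig(grad_f, x, baseline), baseline_shapley(f, x, baseline))
```

For `x0 * x1` both methods split the output change equally, as Symmetry requires; for `x0**2 * x1` IG assigns roughly 4/3 and 2/3 while the discrete game gives 1 and 1, because the two values distribute interaction effects differently.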

Sundararajan and Najmi's 2020 paper "The many Shapley values for model explanation" later clarified the relationship between IG, KernelSHAP, baseline Shapley, and other Shapley-derived methods, showing they correspond to different choices of game formulation. [8] IG's specific choice of a *deterministic linear path* makes it computationally tractable in continuous settings: it needs gradients at only a modest number of points along one path, whereas exact Shapley values require evaluating the model on exponentially many feature subsets.

## How do you choose a baseline?

The choice of baseline `x'` is the most consequential modeling decision in IG and the most discussed practical issue in the literature. The baseline defines what "absent" or "neutral" features mean. Different baselines produce different attributions, sometimes dramatically so.

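The original paper recommends choosing a baseline whose output is near zero, `F(x') ~ 0`. Because the attributions sum to `F(x) - F(x')`, such a baseline lets them account for essentially the whole prediction `F(x)`. For an image classifier this typically means the baseline image should receive a low score for the target class. Common baseline choices include: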

| Baseline type | Description | Typical use case |
|---|---|---|
| Zero / black image | Input replaced with all zeros (or zero after normalization) | Default for image models, often used with Inception, ResNet |
| Random noise | Pixels sampled from Gaussian or uniform noise | Avoids systematic bias of black baseline; common alternative |
| Gaussian blur | Heavily blurred version of the input | Removes high-frequency content but preserves overall structure |
| Mean image | Average of training set | Represents an "average" example |
| Multiple baselines | Average IG over many baselines | Reduces baseline sensitivity, used in Expected Gradients |
| Padding token | Embedding of a special PAD or [MASK] token | Standard for transformer NLP models |
| Empty string embedding | Zero vector in embedding space | Alternative for text classification |

The black-image baseline has been criticized because it is not truly neutral. Since every attribution is scaled by `(x_i - x'_i)`, pixels that are already black (or nearly black) in the input receive zero (or near-zero) attribution no matter how much the model relies on them, and a black image is itself an unusual input that the model may respond to in its own way. [9] Many practitioners therefore use blurred, noise, or multi-baseline approaches.

Erion and colleagues introduced [Expected Gradients](/wiki/expected_gradients) in 2021, which marginalizes over a distribution of baselines (typically the training distribution) by averaging IG attributions across sampled baselines. [10] This approximates the Aumann-Shapley value with respect to the data distribution and removes the need to pick a single arbitrary baseline.
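Expected Gradients is usually estimated by Monte Carlo sampling. The sketch below is a minimal NumPy version under stated assumptions: `grad_fn` is a hand-written gradient of a toy model, `background` stands in for a reference set such as training examples, and each draw pairs one random baseline with one `alpha` drawn uniformly from [0, 1].

```python
import numpy as np

def expected_gradients(grad_fn, x, background, n_samples=2000, seed=0):
    """Monte Carlo estimate of Expected Gradients.

    Each draw pairs a baseline x' sampled from `background` with alpha ~ U(0, 1)
    and evaluates (x - x') * grad_fn(x' + alpha * (x - x')); the draws are averaged.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(background), size=n_samples)
    alphas = rng.uniform(0.0, 1.0, size=n_samples)
    total = np.zeros(x.shape, dtype=float)
    for i, a in zip(idx, alphas):
        x_base = background[i]
        total += (x - x_base) * grad_fn(x_base + a * (x - x_base))
    return total / n_samples

# Usage with a hypothetical linear model F(x) = w . x, whose gradient is the constant w
w = np.array([1.0, 2.0])
background = np.array([[0.0, 0.0], [2.0, 2.0]])   # stand-in for a reference dataset
eg = expected_gradients(lambda z: w, np.array([1.0, 3.0]), background)
print(eg)
```

For a linear model the estimate converges to `w * (x - mean baseline)`, here roughly `[0, 4]`; for nonlinear models the same loop applies, with sampling noise that shrinks as `n_samples` grows.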

For text models, the baseline is usually the embedding of a PAD or zero token. Gradients cannot be taken with respect to discrete token IDs, so attribution is computed with respect to the continuous embedding vectors and then summed across embedding dimensions to give a scalar score per token.
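The embedding-level recipe can be illustrated with a hypothetical toy text model whose score is `tanh` of a weighted sum of its token embeddings. Everything here (`score`, `grad_score`, the random embeddings, the zero-embedding baseline) is an assumption made for illustration; with a real transformer the gradient would come from autograd rather than a hand-written formula.

```python
import numpy as np

# Hypothetical toy text model: score = tanh(sum over tokens of E[t] . w),
# where E is a (num_tokens, dim) matrix of token embeddings.
def score(E, w):
    return np.tanh((E @ w).sum())

def grad_score(E, w):
    s = (E @ w).sum()
    # d score / d E[t, d] = (1 - tanh(s)^2) * w[d] for every token t
    return (1.0 - np.tanh(s) ** 2) * np.tile(w, (E.shape[0], 1))

def token_attributions(E, E_base, w, m=300):
    alphas = np.arange(1, m + 1) / m
    avg_grad = np.mean([grad_score(E_base + a * (E - E_base), w) for a in alphas], axis=0)
    ig = (E - E_base) * avg_grad      # attribution per token and embedding dimension
    return ig.sum(axis=1)             # sum over embedding dimensions: one score per token

rng = np.random.default_rng(0)
E = rng.normal(size=(4, 8))           # 4 tokens with 8-dimensional embeddings
E_base = np.zeros_like(E)             # zero-embedding baseline
w = 0.2 * rng.normal(size=8)
per_token = token_attributions(E, E_base, w)
print(per_token, per_token.sum(), score(E, w) - score(E_base, w))
```

The per-token scores sum to approximately `score(E, w) - score(E_base, w)`, so Completeness carries over after summing across embedding dimensions.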

## How is Integrated Gradients implemented in practice?

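A minimal PyTorch version of the procedure, following the Riemann sum above, looks like this. It assumes `model` is a module in evaluation mode that maps a batch of inputs to per-class scores of shape `(batch, num_classes)`, and that `x` and `x_prime` are single inputs of the same shape without a batch dimension:

```python
import torch

def integrated_gradients(model, x, x_prime, target_class, m=50):
    # Right Riemann sum: alpha_k = k/m for k = 1..m, matching the formula above
    alphas = torch.arange(1, m + 1, dtype=x.dtype, device=x.device) / m
    grads = []
    for alpha in alphas:
        # Point on the straight line from the baseline to the input, as a fresh leaf tensor
        x_alpha = (x_prime + alpha * (x - x_prime)).detach().requires_grad_(True)
        # Add a batch dimension, then select the target class score
        output = model(x_alpha.unsqueeze(0))[0, target_class]
        grad = torch.autograd.grad(output, x_alpha)[0]
        grads.append(grad)
    avg_grad = torch.stack(grads).mean(dim=0)
    # Scale the path-averaged gradient by (x - x')
    attributions = (x - x_prime) * avg_grad
    return attributions
```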

In production the interpolated inputs are batched, so a single forward pass computes outputs for all `m` points at once. Memory becomes a constraint at large `m`, and implementations often chunk the integration into mini-batches.
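Chunking can be sketched with NumPy, assuming a hypothetical `batch_grad_fn` that returns gradients for a whole batch of path points at once. The sketch accumulates a running sum instead of storing all `m` gradients, so peak memory depends on `chunk_size` rather than on `m`.

```python
import numpy as np

def integrated_gradients_batched(batch_grad_fn, x, baseline, m=300, chunk_size=64):
    """Right Riemann-sum IG for a 1-D input, evaluated `chunk_size` path points at a time.

    `batch_grad_fn` (a hypothetical interface) maps a (B, n) array of inputs to a (B, n)
    array of gradients of the model output, one row per input.
    """
    alphas = np.arange(1, m + 1) / m
    grad_sum = np.zeros(x.shape, dtype=float)
    for start in range(0, m, chunk_size):
        chunk = alphas[start:start + chunk_size]                   # up to chunk_size alphas
        points = baseline + chunk[:, None] * (x - baseline)        # (len(chunk), n) path points
        grad_sum += batch_grad_fn(points).sum(axis=0)              # keep a running sum only
    return (x - baseline) * grad_sum / m

# Toy model F(x) = sum(sin(x)); its gradient for a batch of points is np.cos
x = np.array([0.5, 1.0, 2.0])
attr = integrated_gradients_batched(np.cos, x, np.zeros(3))
print(attr, np.sin(x))   # with a zero baseline the exact attributions are sin(x)
```

Changing `chunk_size` trades memory for the number of model calls without changing the result beyond floating-point rounding.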

A common diagnostic is to verify Completeness numerically: sum the attributions and compare the total against `F(x) - F(x')`. Captum reports this difference when `return_convergence_delta=True` is passed to the `attribute` method of its `IntegratedGradients` class. A delta that is large relative to `F(x) - F(x')`, say more than a few percent, suggests that more steps are needed or that the model changes sharply or discontinuously along the path.

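IG requires the model to be differentiable almost everywhere along the path. Piecewise-linear networks (ReLU, max-pooling) meet this condition, so in practice the integral is well defined and the Riemann sum converges. Operations such as argmax have zero gradient wherever they are differentiable, so attributions are usually computed for a differentiable quantity instead, such as the target class logit or probability.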

## What is Integrated Gradients used for? (applications)

IG has been deployed across many domains. The table below summarizes representative use cases.

| Domain | Model architecture | Use of IG | Reference |
|---|---|---|---|
| Image classification | Inception V3, ResNet, VGG on ImageNet | Pixel-level saliency for class predictions | Sundararajan et al. 2017 [1] |
| NLP / question answering | BERT, T5, RoBERTa | Token-level attribution for answer spans and classifications | Mudrakarta et al. 2018 [11] |
| Tabular finance | MLPs and other differentiable models | Per-feature credit risk explanation | Erion et al. 2021 [10] |
| Healthcare | CNNs for medical imaging, MLPs for EHR | Clinical decision support and bias auditing | Lundberg et al. 2018 [12] |
| Genomics | DeepBind, DeepSEA, BPNet | Identifying regulatory motifs and DNA-binding sites | Avsec et al. 2021 [13] |
| Recommendation systems | Two-tower neural networks | Feature importance for item ranking | Sundararajan et al. 2019 [14] |
| Drug discovery | Graph neural networks | Atom-level attribution for molecular property prediction | Jimenez-Luna et al. 2020 [15] |
| Speech recognition | Wav2Vec, Conformer | Time-frequency attribution on spectrograms | Becker et al. 2018 [16] |

In image work, IG is usually visualized as a heatmap overlay, with positive attributions in red and negative in blue. The maps tend to be smoother than vanilla saliency because integration averages out local gradient noise. Common practice is to sum attributions across color channels to get one value per pixel, and often to take absolute values before plotting.
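A common way to turn `(H, W, C)` image attributions into a displayable map can be sketched as follows. The function name, the 99th-percentile clipping, and the signed/unsigned switch are illustrative choices rather than any particular library's API.

```python
import numpy as np

def attribution_heatmap(attr, signed=False, clip_percentile=99.0):
    """Collapse (H, W, C) attributions into an (H, W) map for display.

    Unsigned maps lie in [0, 1]; signed maps lie in [-1, 1], so positive values can be
    drawn in red and negative values in blue with a diverging colormap.
    """
    per_pixel = attr.sum(axis=-1)                 # one value per pixel
    if not signed:
        per_pixel = np.abs(per_pixel)
    vmax = np.percentile(np.abs(per_pixel), clip_percentile)   # ignore the most extreme pixels
    if vmax == 0:
        return np.zeros_like(per_pixel)
    return np.clip(per_pixel / vmax, -1.0 if signed else 0.0, 1.0)

rng = np.random.default_rng(0)
attr = rng.normal(size=(32, 32, 3))               # stand-in for real IG attributions
unsigned_map = attribution_heatmap(attr)
signed_map = attribution_heatmap(attr, signed=True)
```

Clipping at a high percentile keeps a handful of very large attributions from washing out the rest of the map.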

For NLP tasks, IG attributions to token embeddings are summed across embedding dimensions to give a scalar score per token, visualized as heatmap-colored text. Mudrakarta and colleagues, for example, used IG to show that question-answering models often ignore important question words, and used those attributions to construct adversarial questions. [11]

In genomics, IG and [DeepLIFT](/wiki/deeplift) are used to extract sequence motifs from convolutional networks trained on regulatory tasks. BPNet, for example, uses DeepLIFT-based contribution scores, a close relative of IG-style attributions, to identify transcription factor binding motifs in DNA. [13]

## What variants and extensions of Integrated Gradients exist?

Several extensions of IG have been proposed to address specific limitations.

**Expected Gradients** (Erion et al. 2021) replaces the single baseline with a distribution sampled from a reference set, often the training data. Averaging IG across these baselines yields the Aumann-Shapley value with respect to the data distribution. The approach reduces baseline sensitivity and can be used as an attribution-guided regularizer during training. [10]

**Blur Integrated Gradients** (Xu et al. 2020) replaces the linear path with a path through progressively blurred versions of the input. The intuition is that a Gaussian-blurred image is a more semantically meaningful "absence of features" baseline than a black image. The resulting attributions are smoother and align better with human perceptual judgments on natural images. [9]

**Guided Integrated Gradients** (Kapishnikov et al. 2021) chooses the integration path adaptively rather than using a straight line. At each step, it moves only the subset of features whose current gradient has the smallest absolute value toward their input values, instead of moving all features uniformly, so the path steers around regions where the gradient is large and noisy. Empirically, Guided IG produces sharper and less noisy attributions on image classifiers, at the cost of breaking the strict Aumann-Shapley interpretation. [17]

**[SmoothGrad](/wiki/smoothgrad)-Integrated Gradients** combines IG with SmoothGrad by averaging IG over multiple noisy versions of the input. [18] This reduces visual noise in the saliency map at the cost of additional compute. The combined method is sometimes called Smooth IG or SmoothGrad-IG.
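A minimal sketch of the combination, assuming Gaussian noise is added only to the input while the baseline stays fixed (published variants differ on such details). `grad_fn` is a hand-written toy gradient, and both helper names are illustrative.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, m=100):
    alphas = np.arange(1, m + 1) / m
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def smooth_ig(grad_fn, x, baseline, n_noise=20, sigma=0.1, m=100, seed=0):
    """Average IG over noisy copies x + N(0, sigma^2) of the input; the baseline stays fixed."""
    rng = np.random.default_rng(seed)
    runs = [integrated_gradients(grad_fn, x + rng.normal(0.0, sigma, size=x.shape), baseline, m)
            for _ in range(n_noise)]
    return np.mean(runs, axis=0)

# Usage with a hypothetical linear model whose gradient is the constant vector w
w = np.array([1.0, -2.0])
x, baseline = np.array([0.5, 1.5]), np.zeros(2)
print(smooth_ig(lambda z: w, x, baseline), integrated_gradients(lambda z: w, x, baseline))
```

The cost is `n_noise * m` gradient evaluations, and each noisy run satisfies Completeness with respect to its own noisy input rather than `x` itself.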

**Integrated Hessians** (Janizek, Sturmfels, Lee 2021) extends IG to capture pairwise feature interactions by integrating second derivatives along the path. The result is a matrix of pairwise attributions that decomposes the prediction not just into per-feature credit but into per-feature-pair interactions, useful for diagnosing how a model combines features. [19]

**XRAI** (Kapishnikov et al. 2019) combines IG with image segmentation. After computing pixel-level IG, XRAI aggregates attributions over superpixel regions to produce region-level explanations that are more interpretable on natural images than pixel-level heatmaps. [20]

## What are the limitations and critiques of Integrated Gradients?

Despite its theoretical grounding, IG has documented limitations. The most influential critique is the 2018 NeurIPS paper "Sanity Checks for Saliency Maps" by Adebayo and colleagues. [21] They proposed two tests: a *parameter randomization test*, where weights are randomized layer by layer, and a *data randomization test*, where the model is retrained on randomly permuted labels. A reliable method should produce attributions that change substantially in both cases, because the model being explained has changed. They found that several popular methods, including some IG configurations depending on the baseline, produced visually similar attributions on randomly initialized and trained networks. The conclusion was that visual similarity to edge maps does not imply faithfulness to the model. This sparked more rigorous evaluation methodology across the saliency literature.
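The comparison step of the parameter randomization test can be sketched on a toy model. This simplified version randomizes all weights at once (the original test also randomizes layer by layer), uses a hypothetical one-hidden-layer network `F(x) = v . tanh(W x)` with a hand-written gradient, and treats one random draw of weights as a stand-in for a trained model; a real test would compare a trained network against its re-initialized copy.

```python
import numpy as np

def mlp_grad(x, W, v):
    """Gradient of F(x) = v . tanh(W x) with respect to x."""
    h = np.tanh(W @ x)
    return W.T @ (v * (1.0 - h ** 2))

def ig_zero_baseline(W, v, x, m=100):
    alphas = np.arange(1, m + 1) / m
    avg_grad = np.mean([mlp_grad(a * x, W, v) for a in alphas], axis=0)
    return x * avg_grad                      # (x - x') with x' = 0

def rank_correlation(a, b):
    """Spearman correlation as the Pearson correlation of ranks (no tie handling)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(0)
x = rng.normal(size=20)
W_trained, v_trained = rng.normal(size=(16, 20)), rng.normal(size=16)   # stand-in for trained weights
W_random, v_random = rng.normal(size=(16, 20)), rng.normal(size=16)     # re-initialized weights

a_trained = np.abs(ig_zero_baseline(W_trained, v_trained, x))
a_random = np.abs(ig_zero_baseline(W_random, v_random, x))
print(rank_correlation(a_trained, a_random))
```

A rank correlation close to 1 would mean the attributions barely depend on the weights, which is the warning sign the test looks for. Because IG multiplies gradients by `(x - x')`, part of any similarity can come from the input itself rather than from the model.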

Other limitations include:

**Path dependence.** The straight-line path is one of many possible paths, and path methods generally give different attributions for different paths. The straight line yields the Aumann-Shapley value and is the unique symmetry-preserving path, but alternatives such as Blur IG and Guided IG are also used. The choice rests on axiomatic arguments and visual quality rather than on empirical ground truth.

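**Computational cost.** A single IG attribution requires 20 to 300 forward and backward passes, which can be prohibitive for a large transformer with billions of parameters. Common mitigations include batching the interpolation points and using a more accurate quadrature rule so that fewer steps are needed (Captum, for example, defaults to Gauss-Legendre quadrature rather than a plain Riemann sum).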

**Baseline sensitivity.** As discussed above, attributions can change significantly with the choice of baseline. There is no universally correct baseline, and different fields have developed domain-specific conventions.

**Discrete inputs.** For categorical or token inputs the partial derivative is undefined. The workaround is to compute IG with respect to the continuous embedding and sum across embedding dimensions. The attribution is then over the embedding, not over the original token.

**Limited feature interactions.** Standard IG produces an additive decomposition into per-feature contributions. Real models combine features non-additively, and this structure is invisible in standard IG. Integrated Hessians and other extensions address this gap.

**Adversarial fragility.** Ghorbani and colleagues showed that small, carefully chosen perturbations that leave the predicted label unchanged can substantially change IG attributions, so an explanation can be fragile even when the prediction is stable. [22]

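**Faithfulness vs intuitiveness.** Visually appealing saliency maps are not always faithful to the model. IG's axioms constrain how credit is distributed, but they do not by themselves guarantee that high-attribution features are the ones the model actually relies on, so faithfulness still has to be checked empirically, for example with randomization tests like those described above.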

## How does Integrated Gradients compare with other attribution methods?

The following table compares IG with other widely used attribution methods. All of them produce per-feature attributions but differ in how they obtain those scores.

| Method | Year | Approach | Implementation invariant | Completeness | Computational cost |
|---|---|---|---|---|---|
| Vanilla Saliency | 2014 | Gradient at input point | Yes | No | 1 backward pass |
| Guided Backprop | 2014 | Gradient with negative values clipped | No | No | 1 backward pass |
| Class Activation Map (CAM) | 2016 | Weighted sum of last-layer activations | No | No | 1 forward pass |
| Grad-CAM | 2017 | Gradient-weighted activation map | Partially | No | 1 backward pass |
| LRP | 2015 | Layer-wise relevance propagation rules | No | Yes (with constraints) | 1 backward pass |
| DeepLIFT | 2017 | Reference-based modified backprop | No | Yes | 1 backward pass |
| Integrated Gradients | 2017 | Path integral of gradients | Yes | Yes | 20 to 300 passes |
| SmoothGrad | 2017 | Average gradients over noisy inputs | Yes | No | n forward+backward passes |
| LIME | 2016 | Local linear model around input | Yes (model agnostic) | Approximately | Many forward passes |
| SHAP (KernelSHAP) | 2017 | Sampled Shapley values | Yes (model agnostic) | Yes | Many forward passes |
| GradientSHAP | 2017 | IG averaged over baselines and noise | Yes | Yes | Many passes |
| Expected Gradients | 2021 | IG averaged over baseline distribution | Yes | Yes | Many passes |
| Blur IG | 2020 | IG along Gaussian blur path | Yes | Yes | 20 to 300 passes |
| Guided IG | 2021 | IG along adaptively chosen path | Yes | Yes | 20 to 300 passes |

[Saliency map](/wiki/saliency_map) methods (Simonyan et al. 2014) compute the absolute value of `dF/dx` at the input. They are cheap but suffer from gradient saturation and noise. [23]

[DeepLIFT](/wiki/deeplift) (Shrikumar et al. 2017) computes attributions by propagating reference activations backward through the network using rules that depend on the specific layer types. DeepLIFT is faster than IG (one backward pass vs many), but it sacrifices Implementation Invariance. [4]

[LRP](/wiki/lrp) (Bach et al. 2015) propagates relevance backward through the network using conservation rules. Like DeepLIFT, it depends on the specific layer types and so is not implementation invariant. [5]

[SHAP](/wiki/shap) (Lundberg and Lee 2017) is a unifying framework that estimates the Shapley value of each input feature using sampling or kernel-based techniques. SHAP is model-agnostic, but its sampling cost grows quickly with the number of features. SHAP can also use IG-like techniques internally; the GradientExplainer in the SHAP library implements an IG variant. [6]

[LIME](/wiki/lime) (Ribeiro et al. 2016) fits a sparse linear model in the neighborhood of the input and uses its coefficients as attributions. LIME is also model-agnostic and works for any classifier, but is more sensitive to neighborhood definition and sampling. [24]

[SmoothGrad](/wiki/smoothgrad) (Smilkov et al. 2017) reduces gradient noise by averaging gradients over many noisy versions of the input. It is often combined with other gradient methods including IG to denoise the attribution maps. [18]

## Which libraries implement Integrated Gradients?

IG is supported by all major interpretability libraries.

**[Captum](/wiki/captum)** is the official PyTorch interpretability library from Meta. Its `IntegratedGradients` class supports batched integration, multiple baselines, and convergence diagnostics. Captum also includes `LayerIntegratedGradients` for attributing to intermediate layer activations. [25]

**TensorFlow Saliency** library (Google's PAIR team) provides a NumPy-based implementation that interoperates with TensorFlow models. It also includes XRAI, Blur IG, and Guided IG. [26]

**Alibi** is a Python library focused on machine learning explanation, with `IntegratedGradients` built on top of TensorFlow and Keras. [27]

**SHAP library** includes a `GradientExplainer` class that approximates IG by averaging gradients along a path between input and a sampled baseline from a background dataset. It is similar to Expected Gradients in spirit. [6]

**InterpretML** (Microsoft) wraps several interpretability methods including IG-like approaches. [28]

All implementations share the same core algorithm: take an input and a baseline, generate `m` interpolated inputs along the linear path, compute gradients at each, and combine them with a numerical integration rule (a Riemann sum, or Gauss-Legendre quadrature, which is Captum's default) scaled by `(x - x')`. Differences arise in batching, layer-level attribution, and convergence diagnostics. For transformers, layer-level IG is usually more practical than input-level IG because gradients cannot be taken with respect to discrete token IDs. Layer IG attributes to continuous activations such as embedding outputs, for which a zero or PAD-token embedding gives a natural baseline.

## ELI5: Integrated Gradients in plain terms

Imagine a model that looks at a photo and decides "this is a cat." You want to know which pixels convinced it. The naive approach is to nudge each pixel a tiny bit and see how much the "cat" score wiggles, but that can be misleading: a pixel might already be "maxed out," so wiggling it does nothing locally even though it was important. Integrated Gradients fixes this by starting from a blank reference image (the baseline, often all black) and slowly fading the real photo in, checking at each step how the "cat" score responds. Adding up those responses tells you how much each pixel mattered on the journey from blank to real. A nice bonus is that all the per-pixel scores add up exactly to the change in the "cat" score between the blank image and the real one, so nothing is double-counted or lost.

## See also

- [Attribution methods](/wiki/attribution_methods)
- [Explainable AI](/wiki/explainable_ai)
- [Interpretability](/wiki/interpretability)
- [Feature attribution](/wiki/feature_attribution)
- [SHAP](/wiki/shap)
- [LIME](/wiki/lime)
- [DeepLIFT](/wiki/deeplift)
- [LRP](/wiki/lrp)
- [SmoothGrad](/wiki/smoothgrad)
- [Expected Gradients](/wiki/expected_gradients)
- [Saliency map](/wiki/saliency_map)
- [Aumann-Shapley value](/wiki/aumann_shapley)
- [Shapley value](/wiki/shapley_value)
- [Convolutional neural network](/wiki/convolutional_neural_network)

## References

[1] Sundararajan, M., Taly, A., & Yan, Q. (2017). "Axiomatic Attribution for Deep Networks." *Proceedings of the 34th International Conference on Machine Learning (ICML)*. https://arxiv.org/abs/1703.01365

[2] Captum documentation. "Integrated Gradients." Meta AI. https://captum.ai/api/integrated_gradients.html

[3] TensorFlow tutorials. "Integrated Gradients." https://www.tensorflow.org/tutorials/interpretability/integrated_gradients

[4] Shrikumar, A., Greenside, P., & Kundaje, A. (2017). "Learning Important Features Through Propagating Activation Differences." *ICML 2017*. https://arxiv.org/abs/1704.02685

[5] Bach, S., Binder, A., Montavon, G., Klauschen, F., Muller, K. R., & Samek, W. (2015). "On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation." *PLOS ONE*. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140

[6] Lundberg, S. M., & Lee, S. I. (2017). "A Unified Approach to Interpreting Model Predictions." *NeurIPS 2017*. https://arxiv.org/abs/1705.07874

[7] Aumann, R. J., & Shapley, L. S. (1974). *Values of Non-Atomic Games*. Princeton University Press. https://press.princeton.edu/books/hardcover/9780691645469/values-of-non-atomic-games

[8] Sundararajan, M., & Najmi, A. (2020). "The Many Shapley Values for Model Explanation." *ICML 2020*. https://arxiv.org/abs/1908.08474

[9] Xu, S., Venugopalan, S., & Sundararajan, M. (2020). "Attribution in Scale and Space." *CVPR 2020*. https://arxiv.org/abs/2004.03383

[10] Erion, G., Janizek, J. D., Sturmfels, P., Lundberg, S. M., & Lee, S. I. (2021). "Improving Performance of Deep Learning Models with Axiomatic Attribution Priors and Expected Gradients." *Nature Machine Intelligence*. https://arxiv.org/abs/1906.10670

[11] Mudrakarta, P. K., Taly, A., Sundararajan, M., & Dhamdhere, K. (2018). "Did the Model Understand the Question?" *ACL 2018*. https://arxiv.org/abs/1805.05492

[12] Lundberg, S. M., Nair, B., Vavilala, M. S., et al. (2018). "Explainable Machine-Learning Predictions for the Prevention of Hypoxaemia During Surgery." *Nature Biomedical Engineering*. https://www.nature.com/articles/s41551-018-0304-0

[13] Avsec, Z., Weilert, M., Shrikumar, A., et al. (2021). "Base-Resolution Models of Transcription-Factor Binding Reveal Soft Motif Syntax." *Nature Genetics*. https://www.nature.com/articles/s41588-021-00782-6

[14] Sundararajan, M., Xu, J., Taly, A., Mukund, S., & Najmi, A. (2019). "Exploring Principled Visualizations for Deep Network Attributions." *IUI Workshops*. https://research.google/pubs/pub47800/

[15] Jimenez-Luna, J., Grisoni, F., & Schneider, G. (2020). "Drug Discovery with Explainable Artificial Intelligence." *Nature Machine Intelligence*. https://www.nature.com/articles/s42256-020-00236-4

[16] Becker, S., Ackermann, M., Lapuschkin, S., Muller, K. R., & Samek, W. (2018). "Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals." arXiv preprint. https://arxiv.org/abs/1807.03418

[17] Kapishnikov, A., Venugopalan, S., Avci, B., Wedin, B., Terry, M., & Bolukbasi, T. (2021). "Guided Integrated Gradients: An Adaptive Path Method for Removing Noise." *CVPR 2021*. https://arxiv.org/abs/2106.09788

[18] Smilkov, D., Thorat, N., Kim, B., Viegas, F., & Wattenberg, M. (2017). "SmoothGrad: Removing Noise by Adding Noise." arXiv preprint. https://arxiv.org/abs/1706.03825

[19] Janizek, J. D., Sturmfels, P., & Lee, S. I. (2021). "Explaining Explanations: Axiomatic Feature Interactions for Deep Networks." *Journal of Machine Learning Research*. https://arxiv.org/abs/2002.04138

[20] Kapishnikov, A., Bolukbasi, T., Viegas, F., & Terry, M. (2019). "XRAI: Better Attributions Through Regions." *ICCV 2019*. https://arxiv.org/abs/1906.02825

[21] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., & Kim, B. (2018). "Sanity Checks for Saliency Maps." *NeurIPS 2018*. https://arxiv.org/abs/1810.03292

[22] Ghorbani, A., Abid, A., & Zou, J. (2019). "Interpretation of Neural Networks is Fragile." *AAAI 2019*. https://arxiv.org/abs/1710.10547

[23] Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps." *ICLR Workshop*. https://arxiv.org/abs/1312.6034

[24] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You? Explaining the Predictions of Any Classifier." *KDD 2016*. https://arxiv.org/abs/1602.04938

[25] Captum library on GitHub. https://github.com/pytorch/captum

[26] PAIR-code Saliency library on GitHub. https://github.com/PAIR-code/saliency

[27] Alibi library documentation. "Integrated Gradients." Seldon. https://docs.seldon.io/projects/alibi/en/stable/methods/IntegratedGradients.html

[28] InterpretML on GitHub. https://github.com/interpretml/interpret

[29] Molnar, C. (2022). *Interpretable Machine Learning: A Guide for Making Black Box Models Explainable*. Chapter on gradient-based attribution. https://christophm.github.io/interpretable-ml-book/

[30] Ancona, M., Ceolini, E., Oztireli, C., & Gross, M. (2018). "Towards Better Understanding of Gradient-Based Attribution Methods for Deep Neural Networks." *ICLR 2018*. https://arxiv.org/abs/1711.06104
