# Layer-wise Relevance Propagation (LRP)

> Source: https://aiwiki.ai/wiki/lrp
> Updated: 2026-06-28
> Categories: Deep Learning, Interpretability, Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Layer-wise Relevance Propagation (LRP)** is an [explainable AI](/wiki/explainable_ai) method that explains the prediction of a deep [neural network](/wiki/neural_network) by propagating the model's output backward through the network, layer by layer, and redistributing it onto the input features (for example, individual pixels) as per-feature relevance scores. The resulting scores form a heatmap that shows which parts of the input drove the decision. LRP was introduced in 2015 by Sebastian Bach, Alexander Binder, [Grégoire Montavon](/wiki/gregoire_montavon), Frederick Klauschen, [Klaus-Robert Müller](/wiki/klaus_robert_muller), and [Wojciech Samek](/wiki/wojciech_samek) in a *PLOS ONE* paper titled "On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation", which has become one of the most widely cited works in explainable AI [1]. Its defining feature is a conservation principle: the total relevance is preserved at every layer, so the per-feature scores sum back to the model's output.

LRP belongs to the broader family of [attribution methods](/wiki/attribution_methods). Unlike gradient-based explanations, it does not multiply gradients along the backward pass; instead it redistributes a conserved quantity (relevance) using layer-specific rules, which is why it is sometimes described as a backward decomposition of the prediction rather than a sensitivity analysis. The original authors describe the goal as producing explanations a human can act on: "These pixel contributions can be visualized as heatmaps and are provided to a human expert who can intuitively not only verify the validity of the classification decision, but also focus further analysis on regions of potential interest" [1]. The conservation property distinguishes LRP from saliency-style methods and makes it one of the foundational techniques in modern [interpretability](/wiki/interpretability) research.

LRP gained particular visibility after Lapuschkin and colleagues used it in 2019 to expose a [Clever Hans](/wiki/clever_hans) effect in image classifiers trained on [PASCAL VOC](/wiki/pascal_voc), where a model appeared to classify horses correctly but was actually relying on copyright watermarks present in many of the training images [2]. The technique is implemented in several open-source libraries, including [iNNvestigate](/wiki/innvestigate) for [TensorFlow](/wiki/tensorflow) and [Keras](/wiki/keras), [Zennit](/wiki/zennit) for [PyTorch](/wiki/pytorch), and [Captum](/wiki/captum), and it has been applied across [medical imaging](/wiki/medical_imaging), [genomics](/wiki/genomics), [NLP](/wiki/nlp), [time series analysis](/wiki/time_series_analysis), and other domains.

## What is Layer-wise Relevance Propagation?

LRP is a technique for explaining individual predictions of a trained neural network by attributing a relevance score to each input feature. Given an input `x` and a network output `f(x)`, LRP treats the output as a fixed budget of "relevance" and pushes that budget backward through the layers until it reaches the input, where it appears as a heatmap of contributions. In the words of the original paper, the authors propose "a novel concept we denote as layer-wise relevance propagation as a general concept for the purpose of achieving a pixel-wise decomposition" [1].

The defining constraint is conservation. At every layer the relevance flowing out equals the relevance flowing in, so nothing is created or destroyed as the explanation propagates. Because of this, the final input-level scores sum (up to bias terms) to the prediction itself, which means the heatmap can be read as a literal decomposition of the model's output rather than an arbitrary importance surface. This is the property the LRP literature usually calls completeness or conservation, and it is the main thing that sets LRP apart from gradient saliency.

## Background: why gradient explanations were not enough

The problem of explaining individual predictions of deep neural networks became pressing as image classifiers grew in size and complexity through the early 2010s. Early approaches relied on gradients of the output with respect to the input. The [saliency map](/wiki/saliency_map) method of Simonyan, Vedaldi, and Zisserman (2014) computed gradient magnitude at each input pixel and visualized it as a heatmap [3]. While intuitive, gradient-based saliency suffers from saturation: when a neuron is firing strongly, its gradient may be near zero even if the feature it represents was crucial for the prediction. Gradients also tend to produce noisy attribution patterns.
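
The saturation effect can be illustrated with a single unit. The sketch below is a toy example, not taken from the cited papers: the function name `tanh_neuron` and the weight `w=4.0` are illustrative assumptions. It shows a strongly activated tanh unit whose input gradient is close to zero.

```python
import math

def tanh_neuron(x, w=4.0):
    """Toy unit a = tanh(w * x); returns (activation, d a / d x)."""
    a = math.tanh(w * x)
    grad = w * (1.0 - a * a)  # derivative of tanh(w * x) with respect to x
    return a, grad

# As x grows the unit fires at almost full strength,
# while the gradient that a saliency map would display collapses toward zero.
for x in (0.1, 1.0, 3.0):
    a, g = tanh_neuron(x)
    print(f"x={x:>4}: activation={a:.4f}  gradient={g:.2e}")
```

A gradient heatmap would mark the input at `x = 3.0` as nearly irrelevant even though the unit is fully active, which is the failure mode described above.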

Deconvolutional networks (Zeiler and Fergus, 2014) and [guided backpropagation](/wiki/guided_backpropagation) (Springenberg et al., 2015) [15] extended the idea of running the network in reverse and produced visually cleaner explanations by zeroing out negative signals during the backward pass. However, these methods do not satisfy a clear conservation principle and can produce explanations that look plausible regardless of the model's actual decision logic.

LRP, introduced in 2015, took a different approach by formulating attribution as a redistribution problem. The method starts with the predicted output value treated as the total relevance, then propagates that relevance backward through the network using rules that ensure the sum of relevances at every layer equals the original output.

In the years following the original paper, several related attribution techniques appeared. [DeepLIFT](/wiki/deeplift) (Shrikumar, Greenside, and Kundaje, 2017) uses a similar backward-decomposition idea but compares each activation to a reference. [Integrated gradients](/wiki/integrated_gradients) (Sundararajan et al., 2017) integrate the gradient along a straight-line path from a baseline input to the actual input. [SHAP](/wiki/shap) (Lundberg and Lee, 2017) provides Shapley-value-based attributions and includes a DeepSHAP variant. [Grad-CAM](/wiki/grad_cam) (Selvaraju et al., 2017) localizes important regions in [convolutional neural network](/wiki/convolutional_neural_network) feature maps. LRP and these methods are often grouped together as [feature attribution](/wiki/feature_attribution) techniques.

## How does LRP work?

LRP starts from the network's output and assigns to it an initial relevance equal to the prediction itself, often denoted `R = f(x)` for input `x`. The relevance is then propagated backward through every layer using a redistribution rule that satisfies a conservation property: at each layer the total relevance is preserved. Concretely, the procedure is a standard forward pass followed by a specific backward pass, with a dedicated propagation rule defined for each type of layer.

For a fully connected layer with input activations `a_i` and weights `w_ij` connecting input neuron `i` to output neuron `j`, the simplest LRP rule (LRP-0) computes the relevance flowing back to neuron `i` as:

`R_i = sum over j of (a_i * w_ij / sum over i' of (a_i' * w_i'j)) * R_j`

Intuitively, the relevance of a downstream neuron `j` is shared back to upstream neurons `i` in proportion to how much each `i` contributed to activating `j`. After propagating back through every layer, the input-layer relevances form a heatmap that can be visualized over the original input.

The conservation property states that `sum over i of R_i = sum over j of R_j` at each layer, so the total relevance flowing into the network from the output equals the total relevance arriving at the input. This is sometimes called completeness: the per-feature attributions sum to the prediction itself.
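
The LRP-0 rule and its conservation property can be checked numerically on one layer. The following is a minimal sketch assuming a bias-free dense layer; the helper name `lrp0_dense` and the example numbers are hypothetical.

```python
import numpy as np

def lrp0_dense(a, W, R_out):
    """LRP-0 for a bias-free dense layer with z_j = sum_i a_i * W[i, j]."""
    z = a @ W            # pre-activations z_j, shape (n_out,)
    s = R_out / z        # R_j / z_j (this sketch assumes no z_j is zero)
    return a * (W @ s)   # R_i = a_i * sum_j W[i, j] * R_j / z_j

a = np.array([1.0, 2.0, 0.5])        # input activations a_i
W = np.array([[ 0.5, -1.0],
              [ 1.0,  0.5],
              [-2.0,  1.0]])         # weights w_ij
R_out = np.array([0.7, 0.3])         # relevance arriving at the outputs

R_in = lrp0_dense(a, W, R_out)
print("input relevances:", R_in)
print("totals in / out:", R_in.sum(), R_out.sum())
```

Individual input relevances can be negative, but their total matches the total relevance of the layer above.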

LRP can be viewed as a deep Taylor decomposition of the network's output around a root point, and several LRP rules correspond to specific choices of root points and approximations [4]. The framework is more general than the [chain rule](/wiki/chain_rule) used in vanilla gradient backpropagation: instead of multiplying gradients, LRP propagates a quantity (relevance) through layer-specific redistribution rules.

## What are the LRP rules (epsilon, gamma, alpha-beta)?

A major source of LRP's flexibility, and a source of complexity, is that there are several propagation rules and the practitioner must select an appropriate combination for the network at hand. The 2019 overview by Montavon and colleagues recommends a composite strategy that applies different rules at different layers [4]. The epsilon rule adds a small positive term to the denominator for stability; the gamma rule favors positive contributions, with larger gamma making negative contributions disappear; and the alpha-beta rule splits positive and negative contributions and is subject to the conservation constraint alpha = beta + 1 [4].

| Rule | Formula sketch | Use case |
|------|----------------|----------|
| LRP-0 | Pure proportional `R_i = sum_j (a_i w_ij / z_j) R_j` | Theoretical baseline; can produce noisy attributions |
| LRP-epsilon | Adds small `eps` to denominator: `z_j + eps * sign(z_j)` | Stabilizes computation; absorbs weakly supported contributions |
| LRP-gamma | Boosts positive weights by factor `(1 + gamma)` | Lower convolutional layers where positive evidence dominates |
| LRP-alpha-beta | Splits positive and negative contributions; alpha for positive, beta = alpha minus 1 for negative | Emphasizes supportive evidence; LRP-alpha1-beta0 ignores negative |
| LRP-zB (z-box) | For input layers with bounded inputs, e.g., pixels in `[0, 255]` | Pixel input layer of image classifiers |
| LRP-w-squared | Redistributes by squared weights `w_ij^2` | Input layers without informative activations |
| LRP-z-plus | Uses only positive contributions | Equivalent to LRP-alpha1-beta0 |
| LRP-flat | Equal redistribution across inputs | Sanity check; uninformative contributions |
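
Three of the rules in the table can be written down for a single dense layer. This is a sketch under stated assumptions rather than a reference implementation: the layer has no bias, activations are non-negative (as after a ReLU), and the function names and default parameter values are illustrative.

```python
import numpy as np

def lrp_epsilon(a, W, R_out, eps=1e-2):
    """LRP-epsilon: stabilize the denominator with eps * sign(z_j)."""
    z = a @ W
    z = z + eps * np.where(z >= 0, 1.0, -1.0)
    return a * (W @ (R_out / z))

def lrp_gamma(a, W, R_out, gamma=0.25):
    """LRP-gamma: scale positive weights by (1 + gamma) before redistributing."""
    Wg = W + gamma * np.clip(W, 0, None)
    z = a @ Wg
    return a * (Wg @ (R_out / z))

def lrp_alpha_beta(a, W, R_out, alpha=2.0, tiny=1e-9):
    """LRP-alpha-beta with beta = alpha - 1; alpha=1 gives the z-plus rule."""
    beta = alpha - 1.0
    contrib = a[:, None] * W               # contributions a_i * w_ij
    pos = np.clip(contrib, 0, None)        # positive parts
    neg = np.clip(contrib, None, 0)        # negative parts
    zp = pos.sum(axis=0) + tiny            # guards against division by zero
    zn = neg.sum(axis=0) - tiny
    return (alpha * pos / zp - beta * neg / zn) @ R_out

a = np.array([1.0, 2.0, 0.5])
W = np.array([[0.5, -1.0], [1.0, 0.5], [-2.0, 1.0]])
R_out = np.array([0.7, 0.3])
for rule in (lrp_epsilon, lrp_gamma, lrp_alpha_beta):
    R = rule(a, W, R_out)
    print(f"{rule.__name__:>15}: {np.round(R, 4)}  sum={R.sum():.4f}")
```

In this example the gamma and alpha-beta rules preserve the total relevance, while the epsilon rule returns slightly less than it received because the stabilizer absorbs part of it.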

The composite recommendation for CNN image classifiers from the 2019 overview is roughly: LRP-0 in upper fully connected layers, LRP-epsilon in middle layers, LRP-gamma in lower convolutional layers, and LRP-zB at the pixel input [4]. This combination produces stable, semantically meaningful heatmaps for typical vision models.
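
The layer-wise assignment can be sketched on a small fully connected network. Everything below is an illustrative assumption: a random bias-free ReLU network stands in for a trained CNN, inputs are taken to lie in `[0, 1]`, and the helper names (`lrp_generic`, `lrp_zbox`, `explain`) are hypothetical.

```python
import numpy as np

TINY = 1e-12  # numerical guard only; not part of any LRP rule

def _safe(z):
    """Push denominators away from zero without changing their sign."""
    return z + TINY * np.where(z >= 0, 1.0, -1.0)

def lrp_generic(a, W, R, gamma=0.0, eps=0.0):
    """LRP-0 (gamma=eps=0), LRP-epsilon (eps>0) or LRP-gamma (gamma>0)."""
    Wg = W + gamma * np.clip(W, 0, None)
    z = a @ Wg
    z = _safe(z + eps * np.where(z >= 0, 1.0, -1.0))
    return a * (Wg @ (R / z))

def lrp_zbox(x, W, R, low, high):
    """LRP-zB for an input layer whose features lie in [low, high]."""
    Wp, Wn = np.clip(W, 0, None), np.clip(W, None, 0)
    s = R / _safe(x @ W - low @ Wp - high @ Wn)
    return x * (W @ s) - low * (Wp @ s) - high * (Wn @ s)

def explain(x, weights, eps=0.1, gamma=0.25):
    """Forward pass, then one backward relevance pass with a composite of rules."""
    acts = [x]
    for W in weights[:-1]:
        acts.append(np.maximum(0.0, acts[-1] @ W))   # ReLU hidden layers
    logits = acts[-1] @ weights[-1]
    target = int(np.argmax(logits))
    R = np.zeros_like(logits)
    R[target] = logits[target]                       # start from the predicted logit
    R = lrp_generic(acts[3], weights[3], R)                # top layer: LRP-0
    R = lrp_generic(acts[2], weights[2], R, eps=eps)       # middle: LRP-epsilon
    R = lrp_generic(acts[1], weights[1], R, gamma=gamma)   # lower: LRP-gamma
    R = lrp_zbox(acts[0], weights[0], R,
                 np.zeros_like(x), np.ones_like(x))        # input: LRP-zB
    return R, logits[target]

rng = np.random.default_rng(0)
weights = [rng.normal(size=s) for s in [(6, 5), (5, 5), (5, 4), (4, 3)]]
x = rng.uniform(0.0, 1.0, size=6)
heatmap, logit = explain(x, weights)
print("input relevances:", np.round(heatmap, 4))
print("sum of relevances:", heatmap.sum(), " target logit:", logit)
```

With `eps=0.0` the input relevances sum to the target logit; with a positive `eps` the sum differs because the epsilon layer absorbs some relevance.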

## What properties does LRP have?

The most important property of LRP is **conservation**: the relevances at every layer sum to the model's output value, up to relevance absorbed by bias terms and by stabilizers such as the epsilon term. This makes the heatmap interpretable as a literal decomposition of the prediction rather than an arbitrary saliency surface.

LRP is **more sensitive than vanilla gradients** because it does not suffer from the saturation problem; even when a neuron's gradient is zero, its relevance can be nonzero if the activation actually contributed to the prediction. This makes LRP particularly suitable for networks with rectified linear units (ReLUs) and other piecewise-linear nonlinearities where gradients commonly vanish.

Unlike integrated gradients and DeepLIFT, **LRP requires no baseline or reference input**. The method only uses the trained network and the actual input, which simplifies application to domains where it is unclear what an appropriate baseline should be. LRP runs in a **single backward pass** with cost comparable to one gradient computation, plus a small constant overhead from the rule-specific arithmetic.

**Implementation invariance** is more nuanced. Applied uniformly to a ReLU network, LRP-0 is equivalent to gradient times input [4] and inherits gradient-style invariances. Other rules, particularly LRP-alpha-beta and LRP-gamma, are not implementation-invariant and may give different results for two networks that compute the same function but are parameterized differently.
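
The relationship between LRP-0 and gradient times input can be verified numerically. The sketch below assumes a two-layer ReLU network without biases and with random weights; the gradient is computed by hand so that no autodiff library is needed, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 5)), rng.normal(size=(5, 1))
x = rng.normal(size=4)

# Forward pass of f(x) = relu(x @ W1) @ W2 (no biases).
pre = x @ W1
h = np.maximum(0.0, pre)
f = (h @ W2)[0]

# Gradient of f with respect to x by manual backpropagation, then times input.
mask = (pre > 0).astype(float)          # derivative of the ReLU
grad = W1 @ (mask * W2[:, 0])
grad_times_input = grad * x

def lrp0(a, W, R):
    """LRP-0; outputs with z_j == 0 carry no relevance, so their z is set to 1."""
    z = a @ W
    z = np.where(z == 0, 1.0, z)
    return a * (W @ (R / z))

R_h = lrp0(h, W2, np.array([f]))        # output -> hidden layer
R_x = lrp0(x, W1, R_h)                  # hidden layer -> input

print("gradient x input:", np.round(grad_times_input, 6))
print("LRP-0           :", np.round(R_x, 6))
```

The two vectors agree in this setting; adding biases or switching to another rule breaks the equality.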

## How does LRP compare to other explainability methods?

| Method | Conservation | Baseline required | Handles saturation | Typical domain |
|--------|--------------|-------------------|--------------------|----------------|
| Vanilla gradient / saliency | No | No | No | Generic networks |
| Guided backpropagation | No | No | Partially | CNN images |
| Grad-CAM | No (coarse localization only) | No | Mostly | CNN classification |
| DeepLIFT | Yes (vs. reference) | Yes | Yes | Genomics, generic |
| Integrated gradients | Yes (completeness) | Yes | Yes | Generic, transformers |
| SHAP / DeepSHAP | Yes (Shapley axioms) | Yes | Yes | Tabular, generic |
| LRP | Yes (conservation) | No | Yes | CNN, time series, medical |

The key difference is what flows backward and what is guaranteed. Vanilla [saliency map](/wiki/saliency_map) and [guided backpropagation](/wiki/guided_backpropagation) propagate gradients and offer no conservation guarantee. [Grad-CAM](/wiki/grad_cam) produces a coarse, class-localized map from convolutional feature maps rather than a per-pixel decomposition. [DeepLIFT](/wiki/deeplift), [integrated gradients](/wiki/integrated_gradients), and [SHAP](/wiki/shap) all satisfy a completeness or axiomatic property similar in spirit to LRP's conservation, but they require a baseline or reference input, whereas LRP does not.

LRP is more popular in European and especially German interpretability research, partly because of its origin at [TU Berlin](/wiki/tu_berlin) and [Fraunhofer HHI](/wiki/fraunhofer_hhi). Integrated gradients and SHAP tend to dominate in American machine learning literature and in the documentation of large industry frameworks. Grad-CAM is among the most widely used attribution methods for CNN classifiers in computer vision, in part because of its simplicity. Other related techniques include [SmoothGrad](/wiki/smoothgrad), which averages gradient maps over noise-perturbed inputs, and [attention rollout](/wiki/attention_rollout) for [transformer](/wiki/transformer) architectures. In typical comparisons, methods agree on which input regions are highly relevant but disagree on fine-grained per-pixel scores.

## What was the Clever Hans finding (Lapuschkin et al. 2019)?

One of the most influential applications of LRP is the 2019 paper by Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller titled "Unmasking Clever Hans predictors and assessing what machines really learn", published in *Nature Communications* [2].

The authors applied LRP to a Fisher Vector classifier and a deep convolutional network trained on the [PASCAL VOC](/wiki/pascal_voc) image classification benchmark. When they inspected the relevance heatmaps for the horse class, they found the Fisher Vector classifier was assigning very high relevance to small regions in the lower-left corners of many images. On closer inspection these regions contained copyright watermarks from a particular photo source overrepresented in the horse training images: roughly one fifth of the horse images in PASCAL VOC carried such a source tag [2]. The classifier had learned to detect the watermark instead of detecting horses. When images of horses without the watermark were presented, the model's accuracy dropped sharply, and when the watermark was added to images of cars, the model classified them as horses.

The authors named this phenomenon a "Clever Hans" effect after the early-twentieth-century horse Clever Hans, who appeared to perform arithmetic but was in fact responding to subtle cues from his handler. The paper showed that high test accuracy does not imply that a model has learned the intended concept and that interpretability tools can reveal these shortcuts even when standard metrics show no problem. It introduced spectral relevance analysis (SpRAy), a method that clusters LRP heatmaps to identify recurring strategies a model is using across many predictions; for the horse class it surfaced several distinct strategies, including detecting the horse and rider and, separately, detecting the source tag [2]. The Clever Hans finding became a landmark example in the explainable AI community and helped motivate subsequent research into shortcut learning, dataset auditing, and robustness.
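
The idea of grouping heatmaps to find recurring strategies can be sketched with a plain spectral analysis. This is a simplified illustration, not the published SpRAy pipeline: the Gaussian affinity, the fixed `sigma`, the eigengap heuristic, and the synthetic 4x4 heatmaps are all assumptions made for the example.

```python
import numpy as np

def estimate_num_strategies(heatmaps, sigma=0.5):
    """Eigengap heuristic on the normalized graph Laplacian of the heatmaps."""
    X = heatmaps.reshape(len(heatmaps), -1)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))           # Gaussian affinity
    d = W.sum(axis=1)
    L = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))    # normalized Laplacian
    eigvals = np.linalg.eigvalsh(L)                     # ascending order
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1, eigvals            # clusters before the largest gap

rng = np.random.default_rng(0)

def fake_heatmap(hot_pixel):
    """Synthetic 4x4 heatmap with relevance concentrated on one pixel."""
    h = np.zeros((4, 4))
    h[hot_pixel] = 1.0
    return h + 0.05 * rng.normal(size=(4, 4))

# Ten heatmaps focused near the object, ten focused on the lower-left corner.
heatmaps = np.stack([fake_heatmap((1, 2)) for _ in range(10)] +
                    [fake_heatmap((3, 0)) for _ in range(10)])
n_strategies, eigvals = estimate_num_strategies(heatmaps)
print("estimated number of strategies:", n_strategies)
print("smallest eigenvalues:", np.round(eigvals[:4], 3))
```

A cluster of heatmaps that all concentrate on an image corner is the kind of pattern that would prompt a closer look at the underlying images.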

## What is LRP used for?

In [medical imaging](/wiki/medical_imaging), LRP has been used for cancer detection in [histopathology](/wiki/histopathology) slides, MRI analysis, and dermatology. Frederick Klauschen, a co-author of the original LRP paper and a pathologist at [Charité](/wiki/charite), has been involved in clinical applications where LRP heatmaps help pathologists understand why a model identified particular regions as cancerous. In bioinformatics and [genomics](/wiki/genomics), LRP has been used to analyze gene expression models, protein-protein interaction predictors, and DNA sequence classifiers.

In audio classification, LRP heatmaps over time-frequency representations such as spectrograms have helped researchers understand which frequency components a model uses. In [NLP](/wiki/nlp), LRP has been applied to bag-of-words classifiers, recurrent networks, and convolutional text models. Applying LRP to transformer models such as [BERT](/wiki/bert) is more challenging because of their multiple attention heads and skip connections; specialized rules and combinations with attention rollout have been proposed.

In [time series analysis](/wiki/time_series_analysis), LRP has been applied to financial models, EEG and ECG signal analysis, and industrial sensor monitoring. In [autonomous vehicles](/wiki/autonomous_vehicle) and driver assistance systems, LRP heatmaps have been used to visualize what regions of a camera image a perception model uses to make steering or braking decisions. In industrial automation, LRP has supported defect detection in manufacturing.

## Which libraries implement LRP?

[iNNvestigate](/wiki/innvestigate) is a TensorFlow and Keras library for attribution methods, introduced in 2019 by Maximilian Alber and colleagues, supporting many LRP rules along with other techniques such as integrated gradients, DeepLIFT, and saliency maps [5]. It became the most commonly used LRP package during the era when TensorFlow was dominant.

[Zennit](/wiki/zennit) is a PyTorch library focused on LRP and related methods, developed by Christopher J. Anders and colleagues. It supports flexible composite rules, custom attribution targets, and integrates with modern PyTorch model definitions [6]. As PyTorch overtook TensorFlow in research, Zennit became a primary tool for LRP work.

[Captum](/wiki/captum) is the official PyTorch interpretability library from Meta. It includes LRP, integrated gradients, SHAP, occlusion, and many other attribution methods [7]. Other tools include [Quantus](/wiki/quantus), a library by Hedström and colleagues for quantitative evaluation of attribution methods including LRP, providing metrics such as faithfulness, robustness, complexity, and randomization sensitivity [8].

## What are the limitations of LRP?

**Rule choice complexity** is a recurring issue. The flexibility of LRP, with multiple rules and parameters such as `gamma` and `alpha`, means practitioners must understand which combinations work for their architecture. Composite recommendations exist but require expertise. Choosing rules poorly can produce misleading heatmaps that look plausible but do not faithfully reflect the model's reasoning.

**Architecture compatibility** has been a persistent challenge. Skip connections, batch normalization, attention mechanisms, and gating structures all require careful handling. LRP for [transformer](/wiki/transformer) architectures, including BERT and the [vision transformer (ViT)](/wiki/vision_transformer_vit), is an active research area; standard LRP rules do not directly apply to attention layers and modifications are needed.

**Sanity check failures** were highlighted by Adebayo and colleagues in 2018 [14], who tested several attribution methods, among them guided backpropagation and gradient times input (to which some LRP rules reduce under certain conditions), with experiments such as randomizing model parameters and labels. Some methods, particularly guided backpropagation and the LRP rules that resemble it, produce similar-looking heatmaps before and after randomization, suggesting they are partly insensitive to the trained model itself. These findings prompted the LRP community to refine rule choices and develop quantitative evaluation tools such as Quantus [8].
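
A parameter-randomization check of this kind can be sketched in a few lines. The code below is an illustration under assumptions, not the protocol of the cited study: it uses a random bias-free ReLU network in place of a trained model, LRP-epsilon in every layer, and a rank correlation computed without tie handling; all function names are hypothetical.

```python
import numpy as np

def lrp_eps_heatmap(x, weights, eps=1e-6):
    """LRP-epsilon in every layer of a bias-free ReLU network."""
    acts = [x]
    for W in weights[:-1]:
        acts.append(np.maximum(0.0, acts[-1] @ W))
    logits = acts[-1] @ weights[-1]
    target = int(np.argmax(logits))
    R = np.zeros_like(logits)
    R[target] = logits[target]
    for a, W in zip(reversed(acts), reversed(weights)):
        z = a @ W
        z = z + eps * np.where(z >= 0, 1.0, -1.0)
        R = a * (W @ (R / z))
    return R

def rank_correlation(u, v):
    """Pearson correlation of ranks (ties are not averaged in this sketch)."""
    ru = np.argsort(np.argsort(u)).astype(float)
    rv = np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(ru, rv)[0, 1])

def cascading_randomization(x, weights, rng):
    """Re-draw weights from the top layer downward and compare heatmaps."""
    reference = lrp_eps_heatmap(x, weights)
    randomized = [W.copy() for W in weights]
    scores = []
    for layer in reversed(range(len(weights))):
        randomized[layer] = rng.normal(size=weights[layer].shape)
        scores.append(rank_correlation(reference,
                                       lrp_eps_heatmap(x, randomized)))
    return scores

rng = np.random.default_rng(0)
weights = [rng.normal(size=s) for s in [(8, 16), (16, 16), (16, 3)]]
x = rng.uniform(size=8)
scores = cascading_randomization(x, weights, rng)
print("similarity to the original heatmap after each step:", np.round(scores, 3))
```

An attribution method that depends on the model should show the similarity dropping as more layers are randomized; scores that stay high would be a warning sign.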

Finally, LRP is less popular than integrated gradients and SHAP in mainstream machine learning literature, especially in American venues and for transformer-based models. This is partly historical, partly because IG and SHAP are tightly integrated with widely used industry libraries, and partly because IG has cleaner axiomatic guarantees that map well to high-stakes applications.

## Recent developments

Research on LRP has continued to evolve, particularly in response to the rise of transformer architectures. In 2021, Hila Chefer, Shir Gur, and Lior Wolf introduced "Transformer Interpretability Beyond Attention Visualization", combining attention rollout with LRP-like propagation through the residual stream and layer normalizations. Their follow-up work from the same year, "Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers", generalized the method to multimodal transformers and other attention-based architectures [9].

AttnLRP, by Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, and Wojciech Samek, was presented at ICML 2024 and provides an efficient and faithful LRP rule for attention layers. According to the authors, it is "the first to faithfully and holistically attribute not only the input but also the latent representations of transformer models" while keeping computational efficiency similar to a single backward pass, with evaluations on LLaMA 2, Mixtral 8x7B, Flan-T5, and vision transformers [10].

Concept relevance propagation (CRP) by Achtibat and colleagues extends the LRP framework to attribute relevance to internal concepts learned by the network rather than just input features. Quantitative evaluation has become a major theme; the Quantus library [8] and benchmark datasets have made it possible to compare attribution methods systematically rather than only through visual inspection.

## Who developed LRP (authors and institutional context)?

LRP emerged from a tightly connected research community in Berlin centered on [TU Berlin](/wiki/tu_berlin) and the [Fraunhofer Heinrich Hertz Institute (HHI)](/wiki/fraunhofer_hhi). The senior figure most associated with the program is [Klaus-Robert Müller](/wiki/klaus_robert_muller), a long-standing professor at TU Berlin and one of the leading machine learning researchers in Europe.

[Wojciech Samek](/wiki/wojciech_samek) leads the AI department at Fraunhofer HHI and has been a continuing co-author on LRP papers since 2015. [Grégoire Montavon](/wiki/gregoire_montavon), affiliated with TU Berlin, has driven much of the theoretical work on LRP including the deep Taylor decomposition framework. Sebastian Bach was the first author of the original 2015 paper, Alexander Binder contributed substantially to early LRP development and applications, and Sebastian Lapuschkin led the Clever Hans work and has continued to develop LRP for new architectures. Frederick Klauschen, a co-author of the original paper, is a pathologist at [Charité](/wiki/charite) and represents the medical side of the collaboration.

## ELI5: Layer-wise Relevance Propagation

Imagine a neural network looks at a photo and says "this is a cat". You want to know which pixels made it say "cat". LRP works backward from the answer: it takes the network's confidence in "cat" and hands that confidence back down through the network, one layer at a time, always splitting it among the parts that helped the most and never adding or losing any along the way. By the time it reaches the photo, every pixel has a share, and the bright spots in the resulting picture show where the network actually looked. The neat trick is the bookkeeping rule that the shares always add back up to the original answer, so the heatmap is a faithful breakdown rather than a guess.

## See also

- [Attribution methods](/wiki/attribution_methods)
- [Feature attribution](/wiki/feature_attribution)
- [Saliency map](/wiki/saliency_map)
- [Integrated gradients](/wiki/integrated_gradients)
- [DeepLIFT](/wiki/deeplift)
- [SHAP](/wiki/shap)
- [Grad-CAM](/wiki/grad_cam)
- [Guided backpropagation](/wiki/guided_backpropagation)
- [SmoothGrad](/wiki/smoothgrad)
- [Attention rollout](/wiki/attention_rollout)
- [Explainable AI](/wiki/explainable_ai)
- [Interpretability](/wiki/interpretability)
- [Clever Hans](/wiki/clever_hans)

## References

1. Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. (2015). On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. *PLOS ONE*, 10(7), e0130140. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140
2. Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., and Müller, K.-R. (2019). Unmasking Clever Hans predictors and assessing what machines really learn. *Nature Communications*, 10, 1096. https://www.nature.com/articles/s41467-019-08987-4
3. Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. https://arxiv.org/abs/1312.6034
4. Montavon, G., Binder, A., Lapuschkin, S., Samek, W., and Müller, K.-R. (2019). Layer-Wise Relevance Propagation: An Overview. In *Explainable AI*. Springer LNCS 11700. https://link.springer.com/chapter/10.1007/978-3-030-28954-6_10
5. Alber, M. et al. (2019). iNNvestigate Neural Networks! *JMLR* 20(93). https://github.com/albermax/innvestigate
6. Anders, C. J., Neumann, D., Samek, W., Müller, K.-R., and Lapuschkin, S. (2021). Software for Dataset-wide XAI: From Local Explanations to Global Insights with Zennit. https://github.com/chr5tphr/zennit
7. Kokhlikyan, N. et al. (2020). Captum: A unified and generic model interpretability library for PyTorch. https://captum.ai/
8. Hedström, A. et al. (2023). Quantus: An Explainable AI Toolkit for Responsible Evaluation. *JMLR* 24(34). https://github.com/understandable-machine-intelligence-lab/Quantus
9. Chefer, H., Gur, S., and Wolf, L. (2021). Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers. *ICCV 2021*. https://arxiv.org/abs/2103.15679
10. Achtibat, R., Hatefi, S. M. V., Dreyer, M., Jain, A., Wiegand, T., Lapuschkin, S., and Samek, W. (2024). AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers. *ICML 2024*. https://arxiv.org/abs/2402.05602
11. Samek, W., Wiegand, T., and Müller, K.-R. (2017). Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. https://arxiv.org/abs/1708.08296
12. Shrikumar, A., Greenside, P., and Kundaje, A. (2017). Learning Important Features Through Propagating Activation Differences. *ICML 2017*. https://arxiv.org/abs/1704.02685
13. Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic Attribution for Deep Networks. *ICML 2017*. https://arxiv.org/abs/1703.01365
14. Adebayo, J. et al. (2018). Sanity Checks for Saliency Maps. *NeurIPS 2018*. https://arxiv.org/abs/1810.03292
15. Springenberg, J. T. et al. (2015). Striving for Simplicity: The All Convolutional Net. https://arxiv.org/abs/1412.6806

