Layer-wise Relevance Propagation (LRP) is a method for explaining the predictions of deep neural networks by decomposing the output back into per-feature relevance scores. It was introduced in 2015 by Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek in a PLOS ONE paper that has become one of the most widely cited works in explainable AI [1]. LRP belongs to the broader family of attribution methods. It produces explanations by propagating the model's output relevance backward through the network, layer by layer, redistributing each neuron's relevance to the neurons of the layer below in proportion to how much they contributed to its activation, while preserving the total amount of relevance at every step. This conservation property distinguishes LRP from gradient-based explanations and makes it one of the foundational techniques in modern interpretability research.
LRP gained particular visibility after Lapuschkin and colleagues used it in 2019 to expose a Clever Hans effect in image classifiers trained on PASCAL VOC, where a model appeared to classify horses correctly but was actually relying on copyright watermarks present in many of the training images [2]. The technique is implemented in several open-source libraries, including iNNvestigate for TensorFlow and Keras, Zennit for PyTorch, and Captum, and it has been applied across medical imaging, genomics, NLP, time series analysis, and other domains.
The problem of explaining individual predictions of deep neural networks became pressing as image classifiers grew in size and complexity through the early 2010s. Early approaches relied on gradients of the output with respect to the input. The saliency map method of Simonyan, Vedaldi, and Zisserman (2014) computed gradient magnitude at each input pixel and visualized it as a heatmap [3]. While intuitive, gradient-based saliency suffers from saturation: when a neuron is firing strongly, its gradient may be near zero even if the feature it represents was crucial for the prediction. Gradients also tend to produce noisy attribution patterns.
Deconvolutional networks (Zeiler and Fergus, 2014) and guided backpropagation (Springenberg et al., 2014) extended the idea of running the network in reverse and produced visually cleaner explanations by zeroing out negative signals during the backward pass. However, these methods do not satisfy a clear conservation principle and can produce explanations that look plausible regardless of the model's actual decision logic.
LRP, introduced in 2015, took a different approach by formulating attribution as a redistribution problem. The method starts with the predicted output value treated as the total relevance, then propagates that relevance backward through the network using rules that ensure the sum of relevances at every layer equals the original output.
In the years following the original paper, several related attribution techniques appeared. DeepLIFT (Shrikumar, Greenside, and Kundaje, 2017) uses a similar backward-decomposition idea but compares each activation to a reference. Integrated gradients (Sundararajan et al., 2017) integrates the gradient along a straight-line path from a baseline input to the actual input. SHAP (Lundberg and Lee, 2017) provides Shapley-value-based attributions and includes a DeepSHAP variant. Grad-CAM (Selvaraju et al., 2017) localizes important regions in convolutional neural network feature maps. LRP and these methods are often grouped together as feature attribution techniques.
LRP starts from the network's output and assigns to it an initial relevance equal to the prediction itself, often denoted R = f(x) for input x. The relevance is then propagated backward through every layer using a redistribution rule that satisfies a conservation property: at each layer the total relevance is preserved.
For a fully connected layer with input activations a_i and weights w_ij connecting input neuron i to output neuron j, the simplest LRP rule (LRP-0) computes the relevance flowing back to neuron i as:
R_i = sum over j of (a_i * w_ij / z_j) * R_j, where z_j = sum over i' of a_i' * w_i'j is the pre-activation of output neuron j
Intuitively, the relevance of a downstream neuron j is shared back to upstream neurons i in proportion to how much each i contributed to activating j. After propagating back through every layer, the input-layer relevances form a heatmap that can be visualized over the original input.
The conservation property states that sum over i of R_i = sum over j of R_j at each layer, so the total relevance flowing into the network from the output equals the total relevance arriving at the input. This is sometimes called completeness: the per-feature attributions sum to the prediction itself.
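To make the rule and the conservation check concrete, here is a minimal NumPy sketch of LRP-0 for a single fully connected layer. The function name, shapes, and toy values are illustrative assumptions rather than the API of any particular library.

```python
import numpy as np

def lrp_0(a, W, R_out, eps=1e-9):
    """LRP-0 for one fully connected layer.

    a     : (n_in,)        input activations a_i
    W     : (n_in, n_out)  weights w_ij
    R_out : (n_out,)       relevance of the output neurons R_j
    """
    z = a @ W                 # pre-activations z_j = sum_i a_i * w_ij
    s = R_out / (z + eps)     # relevance per unit of pre-activation
    return a * (W @ s)        # R_i = a_i * sum_j w_ij * s_j

# Toy check of the conservation property.
rng = np.random.default_rng(0)
a = np.maximum(rng.normal(size=4), 0)   # ReLU-like activations
W = rng.normal(size=(4, 3))
R_out = np.array([1.0, 0.0, 0.5])       # relevance arriving from the layer above
R_in = lrp_0(a, W, R_out)
print(R_in.sum(), R_out.sum())          # the two totals should match closely
```

In a full implementation this redistribution step is simply repeated layer by layer, from the output back to the input.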
LRP can be viewed as a deep Taylor decomposition of the network's output around a root point, and several LRP rules correspond to specific choices of root points and approximations [4]. The framework is more general than the chain rule used in vanilla gradient backpropagation: instead of multiplying gradients, LRP propagates a quantity (relevance) through layer-specific redistribution rules.
A major source of LRP's flexibility, and a source of complexity, is that there are several propagation rules and the practitioner must select an appropriate combination for the network at hand. The 2019 overview by Montavon and colleagues recommends a composite strategy that applies different rules at different layers [4].
| Rule | Formula sketch | Use case |
|---|---|---|
| LRP-0 | Pure proportional R_i = sum_j (a_i w_ij / z_j) R_j | Theoretical baseline; can produce noisy attributions |
| LRP-epsilon | Adds small eps to denominator: z_j + eps * sign(z_j) | Stabilizes computation; absorbs weakly supported contributions |
| LRP-gamma | Boosts positive weights by factor (1 + gamma) | Lower convolutional layers where positive evidence dominates |
| LRP-alpha-beta | Splits positive and negative contributions; alpha for positive, beta = alpha minus 1 for negative | Emphasizes supportive evidence; LRP-alpha1-beta0 ignores negative |
| LRP-zB (z-box) | For input layers with bounded inputs, e.g., pixels in [0, 255] | Pixel input layer of image classifiers |
| LRP-w-squared | Redistributes by squared weights w_ij^2 | Input layers without informative activations |
| LRP-z-plus | Uses only positive contributions | Equivalent to LRP-alpha1-beta0 |
| LRP-flat | Equal redistribution across inputs | Sanity check; uninformative contributions |
The composite recommendation for CNN image classifiers from the 2019 overview is roughly: LRP-0 in upper fully connected layers, LRP-epsilon in middle layers, LRP-gamma in lower convolutional layers, and LRP-zB at the pixel input [4]. This combination produces stable, semantically meaningful heatmaps for typical vision models.
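To illustrate how individual rules modify the basic redistribution, the sketches below adapt the dense-layer LRP-0 example above to the epsilon and gamma rules. The parameter defaults and function names are arbitrary illustrative choices; convolutional layers use the same formulas but are usually implemented through a modified forward and backward pass.

```python
import numpy as np

def lrp_epsilon(a, W, R_out, eps=0.25):
    """LRP-epsilon: stabilize the denominator with eps * sign(z_j)."""
    z = a @ W
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # eps absorbs weak contributions
    return a * (W @ (R_out / z))

def lrp_gamma(a, W, R_out, gamma=0.25, eps=1e-9):
    """LRP-gamma: boost positive weights by (1 + gamma) before redistributing."""
    Wg = W + gamma * np.clip(W, 0, None)        # w_ij + gamma * max(w_ij, 0)
    z = a @ Wg
    return a * (Wg @ (R_out / (z + eps)))
```

A composite then simply dispatches on layer position: for example, lrp_gamma for the lower convolutional layers, lrp_epsilon for the middle layers, and plain LRP-0 near the output.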
The most important property of LRP is conservation, which guarantees that the relevances at any layer sum to the model's output value. This makes the heatmap interpretable as a literal decomposition of the prediction rather than an arbitrary saliency surface.
LRP avoids the saturation problem that affects vanilla gradients: even when a neuron's gradient is zero, its relevance can be nonzero if its activation actually contributed to the prediction. This makes LRP particularly suitable for networks with rectified linear units (ReLUs) and other piecewise-linear nonlinearities, where local gradients commonly vanish.
Unlike integrated gradients and DeepLIFT, LRP requires no baseline or reference input. The method only uses the trained network and the actual input, which simplifies application to domains where it is unclear what an appropriate baseline should be. LRP runs in a single backward pass with cost comparable to one gradient computation, plus a small constant overhead from the rule-specific arithmetic.
Implementation invariance is more nuanced. LRP-0 on linear layers is equivalent to gradient times input and inherits gradient-style invariances. Other rules, particularly LRP-alpha-beta and LRP-gamma, are not implementation-invariant and may give different results for two networks that compute the same function but are parameterized differently.
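The linear-layer equivalence is easy to check numerically. The sketch below builds a tiny bias-free ReLU network (a toy assumption, not a published model) and verifies that applying LRP-0 at every layer reproduces gradient times input.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=6)                 # input
W1 = rng.normal(size=(6, 4))           # first-layer weights (no bias)
w2 = rng.normal(size=4)                # output-neuron weights (no bias)

z = x @ W1                             # hidden pre-activations
h = np.maximum(z, 0)                   # ReLU activations
f = h @ w2                             # scalar prediction = total relevance

# LRP-0, output -> hidden -> input
R_hidden = h * w2                      # (h_j * w2_j / f) * f
s = np.zeros_like(z)
s[z > 0] = R_hidden[z > 0] / z[z > 0]  # inactive units carry zero relevance
R_input = x * (W1 @ s)

# Gradient * input for the same network
grad = W1 @ (w2 * (z > 0))             # d f / d x
print(np.allclose(R_input, x * grad))  # True
```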
| Method | Conservation | Baseline required | Handles saturation | Typical domain |
|---|---|---|---|---|
| Vanilla gradient / saliency | No | No | No | Generic networks |
| Guided backpropagation | No | No | Partially | CNN images |
| Grad-CAM | No (coarse localization only) | No | Mostly | CNN classification |
| DeepLIFT | Yes (vs. reference) | Yes | Yes | Genomics, generic |
| Integrated gradients | Yes (completeness) | Yes | Yes | Generic, transformers |
| SHAP / DeepSHAP | Yes (Shapley axioms) | Yes | Yes | Tabular, generic |
| LRP | Yes (conservation) | No | Yes | CNN, time series, medical |
LRP is more popular in European and especially German interpretability research, partly because of its origin at TU Berlin and Fraunhofer HHI. Integrated gradients and SHAP tend to dominate in American machine learning literature and in the documentation of large industry frameworks. Grad-CAM is by far the most common attribution method for CNN classifiers in computer vision because of its simplicity. Other related techniques include SmoothGrad, which averages gradient maps over noise-perturbed inputs, and attention rollout for transformer architectures. In typical comparisons, methods agree on which input regions are highly relevant but disagree on fine-grained per-pixel scores.
One of the most influential applications of LRP is the 2019 paper by Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller titled "Unmasking Clever Hans predictors and assessing what machines really learn", published in Nature Communications [2].
The authors applied LRP to a Fisher Vector classifier and a deep convolutional network trained on the PASCAL VOC image classification benchmark. When they inspected the relevance heatmaps for the horse class, they found the Fisher Vector classifier was assigning very high relevance to small regions in the lower-left corners of many images. On closer inspection these regions contained copyright watermarks from a particular photo source overrepresented in the horse training images. The classifier had learned to detect the watermark instead of detecting horses. When images of horses without the watermark were presented, the model's accuracy dropped sharply, and when the watermark was added to images of cars, the model classified them as horses.
The authors named this phenomenon a "Clever Hans" effect after the early-twentieth-century horse Clever Hans, who appeared to perform arithmetic but was in fact responding to subtle cues from his handler. The paper showed that high test accuracy does not imply that a model has learned the intended concept and that interpretability tools can reveal these shortcuts even when standard metrics show no problem. It introduced spectral relevance analysis (SpRAy), a method that clusters LRP heatmaps to identify recurring strategies a model is using across many predictions. The Clever Hans finding became a landmark example in the explainable AI community and helped motivate subsequent research into shortcut learning, dataset auditing, and robustness.
In medical imaging, LRP has been used for cancer detection in histopathology slides, MRI analysis, and dermatology. Frederick Klauschen, a co-author of the original LRP paper and a pathologist at Charité, has been involved in clinical applications where LRP heatmaps help pathologists understand why a model identified particular regions as cancerous. In bioinformatics and genomics, LRP has been used to analyze gene expression models, protein-protein interaction predictors, and DNA sequence classifiers.
In audio classification, LRP heatmaps over time-frequency representations such as spectrograms have helped researchers understand which frequency components a model uses. In NLP, LRP has been applied to bag-of-words classifiers, recurrent networks, and convolutional text models. Work on transformer models including BERT has been done but is more challenging because of the multiple attention heads and skip connections; specialized rules and combinations with attention rollout have been proposed.
In time series analysis, LRP has been applied to financial models, EEG and ECG signal analysis, and industrial sensor monitoring. In autonomous vehicles and driver assistance systems, LRP heatmaps have been used to visualize what regions of a camera image a perception model uses to make steering or braking decisions. In industrial automation, LRP has supported defect detection in manufacturing.
iNNvestigate is a TensorFlow and Keras library for attribution methods, introduced in 2019 by Maximilian Alber and colleagues, supporting many LRP rules along with other techniques such as integrated gradients, DeepLIFT, and saliency maps [5]. It became the most commonly used LRP package during the era when TensorFlow was dominant.
Zennit is a PyTorch library focused on LRP and related methods, developed by Christopher J. Anders and colleagues. It supports flexible composite rules, custom attribution targets, and integrates with modern PyTorch model definitions [6]. As PyTorch overtook TensorFlow in research, Zennit became a primary tool for LRP work.
Captum is the official PyTorch interpretability library from Meta. It includes LRP, integrated gradients, SHAP, occlusion, and many other attribution methods [7]. Other tools include Quantus, a library by Hedström and colleagues for quantitative evaluation of attribution methods including LRP, providing metrics such as faithfulness, robustness, complexity, and randomization sensitivity [8].
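As a usage illustration, the snippet below shows roughly how LRP is invoked through Captum on a torchvision classifier. The model choice, weight identifier, and target class index are placeholder assumptions, and Captum's built-in rule assignment may differ from the composite strategies discussed above.

```python
import torch
from torchvision.models import vgg16
from captum.attr import LRP

# Placeholder model and input; any classifier whose layer types Captum's
# LRP implementation supports would work similarly.
model = vgg16(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 224, 224)              # stand-in for a preprocessed image batch

lrp = LRP(model)
relevance = lrp.attribute(x, target=243)    # 243: arbitrary ImageNet class index
heatmap = relevance.sum(dim=1).squeeze(0)   # aggregate relevance over color channels
```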
Rule choice complexity is a recurring issue. The flexibility of LRP, with multiple rules and parameters such as gamma and alpha, means practitioners must understand which combinations work for their architecture. Composite recommendations exist but require expertise. Choosing rules poorly can produce misleading heatmaps that look plausible but do not faithfully reflect the model's reasoning.
Architecture compatibility has been a persistent challenge. Skip connections, batch normalization, attention mechanisms, and gating structures all require careful handling. LRP for transformer architectures, including BERT and the vision transformer (ViT), is an active research area; standard LRP rules do not directly apply to attention layers and modifications are needed.
Sanity check failures were highlighted by Adebayo and colleagues in 2018, who tested several attribution methods including some LRP rules with experiments such as randomizing model parameters and labels. Some rules, particularly those resembling guided backpropagation, produced similar-looking heatmaps before and after randomization, suggesting they were partly insensitive to the trained model itself. These findings prompted the LRP community to refine rule choices and develop quantitative evaluation tools such as Quantus [8].
Finally, LRP is less popular than integrated gradients and SHAP in mainstream machine learning literature, especially in American venues and for transformer-based models. This is partly historical, partly because IG and SHAP are tightly integrated with widely used industry libraries, and partly because IG has cleaner axiomatic guarantees that map well to high-stakes applications.
Research on LRP has continued to evolve, particularly in response to the rise of transformer architectures. Hila Chefer, Shir Gur, and Lior Wolf introduced "Transformer Interpretability Beyond Attention Visualization" in 2021, combining attention rollout with LRP-like propagation through the residual stream and layer normalizations. Their follow-up work "Generic Attention-model Explainability" in 2021 generalized the method to multimodal transformers and other attention-based architectures [9].
AttnLRP by Reduan Achtibat and colleagues, presented at ICML 2024, provides an efficient and faithful LRP rule for attention layers and brings LRP performance on transformer models closer to other attribution baselines [10]. Concept relevance propagation (CRP) by Achtibat and colleagues extends the LRP framework to attribute relevance to internal concepts learned by the network rather than just input features. Quantitative evaluation has become a major theme; the Quantus library [8] and benchmark datasets have made it possible to compare attribution methods systematically rather than only through visual inspection.
LRP emerged from a tightly connected research community in Berlin centered on TU Berlin and the Fraunhofer Heinrich Hertz Institute (HHI). The senior figure most associated with the program is Klaus-Robert Müller, a long-standing professor at TU Berlin and one of the leading machine learning researchers in Europe.
Wojciech Samek leads the AI department at Fraunhofer HHI and has been a continuing co-author on LRP papers since 2015. Grégoire Montavon, affiliated with TU Berlin, has driven much of the theoretical work on LRP including the deep Taylor decomposition framework. Sebastian Bach was the first author of the original 2015 paper, Alexander Binder contributed substantially to early LRP development and applications, and Sebastian Lapuschkin led the Clever Hans work and has continued to develop LRP for new architectures. Frederick Klauschen, a co-author of the original paper, is a pathologist at Charité and represents the medical side of the collaboration.