See also: explainable AI, feature importance, LIME, permutation feature importance
SHAP (SHapley Additive exPlanations) is a unified framework for interpreting individual predictions of machine learning models. Developed by Scott Lundberg and Su-In Lee at the University of Washington and first published at NeurIPS 2017, SHAP assigns each input feature a numerical value representing its contribution to a specific prediction. These values are computed using Shapley values, a concept from cooperative game theory introduced by Lloyd Shapley in 1953. SHAP provides a unified approach to interpreting model predictions by connecting several existing explanation methods (including LIME, DeepLIFT, layerwise relevance propagation, and classic Shapley regression values) under a single theoretical framework.
The central question SHAP answers is: for a given prediction, how much did each feature push the prediction away from the average (baseline) prediction? By grounding feature attributions in Shapley values, SHAP is the only additive feature attribution method that simultaneously satisfies three desirable properties: local accuracy, missingness, and consistency. This theoretical guarantee, combined with fast model-specific algorithms (most importantly TreeSHAP for tree ensembles), has made SHAP the de facto standard tool for tabular machine learning interpretability and one of the most widely cited methods in explainable AI.
The open-source Python library shap, which has accumulated more than 25,000 stars on GitHub, provides implementations of several SHAP estimation algorithms (KernelSHAP, TreeSHAP, DeepSHAP, LinearSHAP, PermutationSHAP, and PartitionSHAP) along with a rich set of visualization tools. As of 2026 the package supports Python 3.11 and later and is maintained by a community of contributors after originally being authored by Lundberg.
Imagine you and your friends bake a cake together. One friend brought the flour, another brought eggs, and another brought the sugar. Now you want to figure out how much each friend's ingredient helped make the cake taste good. You could try baking the cake without flour to see how much worse it gets, then try without eggs, and so on. But that is not quite fair, because some ingredients work better together. So instead, you try every possible combination of ingredients and figure out, on average, how much each one helped. That average contribution is what SHAP calculates for each feature in a machine learning prediction.
Lloyd Shapley introduced the Shapley value in 1953, building on his Princeton University doctoral dissertation, "Additive and Non-additive Set Functions." The value itself was published as "A Value for n-Person Games" in Contributions to the Theory of Games, Volume II. Shapley's goal was to answer a fundamental question in cooperative game theory: when a group of players work together and generate some total payoff, how should that payoff be fairly divided?
Shapley proved that there is exactly one division scheme satisfying a small set of fairness axioms. For this and related contributions (most notably the Gale-Shapley deferred acceptance algorithm for stable matching), Shapley shared the 2012 Nobel Memorial Prize in Economic Sciences with Alvin Roth.
The idea of using Shapley values to attribute model outputs to inputs predates SHAP. Strumbelj and Kononenko (2010, 2014) used sampling approximations to explain individual predictions. Lipovetsky and Conklin (2001) used Shapley values to decompose R-squared in linear regression. Owen (2014) connected Shapley values to the Sobol' indices from global sensitivity analysis. None of these earlier methods unified the various existing attribution techniques or provided fast algorithms for modern tree ensembles and deep networks.
The SHAP framework was introduced by Scott Lundberg and Su-In Lee in a 2017 NeurIPS paper titled "A Unified Approach to Interpreting Model Predictions" (arXiv:1705.07874). The paper introduced three key ideas: a class of additive feature attribution methods that includes LIME, DeepLIFT, classic Shapley regression values, and others; a uniqueness theorem proving that within this class only Shapley values satisfy the three properties of local accuracy, missingness, and consistency; and KernelSHAP, a model-agnostic algorithm for estimating Shapley values via weighted linear regression.
A second key paper, "Consistent Individualized Feature Attribution for Tree Ensembles" by Lundberg, Erion, and Lee (arXiv:1802.03888, 2018), introduced TreeSHAP, an exact polynomial-time algorithm for tree ensembles. The 2020 follow-up "From Local Explanations to Global Understanding with Explainable AI for Trees," published in Nature Machine Intelligence with co-authors including Hugh Chen, Alex DeGrave, Jordan Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, and Nisha Bansal, extended TreeSHAP with interaction values and global aggregation methods and applied SHAP to clinical risk prediction in anesthesia hypoxemia.
The shap Python package was released in 2017 alongside the NeurIPS paper and grew rapidly. By 2025 the original NeurIPS paper had accumulated more than 30,000 citations on Google Scholar, ranking among the most cited machine learning papers of the late 2010s. Lundberg moved from Microsoft Research to industry roles while continuing to maintain the library. Active maintenance later transitioned to the broader open-source community as Lundberg shifted focus.
In a cooperative game with a set N of n players and a value function v that maps each coalition (subset) S of players to a real-valued payoff v(S), the Shapley value for player i is:
phi_i(v) = sum over all S subset of N \ {i}: [ |S|! * (n - |S| - 1)! / n! ] * [ v(S union {i}) - v(S) ]
The term v(S union {i}) - v(S) is the marginal contribution of player i to coalition S. The weighting factor |S|! * (n - |S| - 1)! / n! corresponds to the probability that coalition S forms before player i arrives in a uniformly random permutation of all players. In other words, the Shapley value is the average marginal contribution of a player across all possible orderings in which the coalition could have been assembled.
A more compact way of writing the same quantity uses permutations directly. Let pi be a uniformly random permutation of the players, and let Pre(pi, i) denote the set of players appearing before i in pi. Then phi_i = E[v(Pre(pi, i) union {i}) - v(Pre(pi, i))], where the expectation is over uniformly random permutations.
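To make the formula concrete, the following is a minimal sketch that computes exact Shapley values for a tiny three-player game by direct enumeration of coalitions, echoing the cake example above. The payoff numbers are invented purely for illustration.

```python
# Minimal sketch: exact Shapley values for a 3-player game by enumerating all
# coalitions, mirroring the weighted-sum formula above. Payoffs are invented.
from itertools import combinations
from math import factorial

players = ["flour", "eggs", "sugar"]

payoffs = {
    frozenset(): 0, frozenset({"flour"}): 2, frozenset({"eggs"}): 1,
    frozenset({"sugar"}): 1, frozenset({"flour", "eggs"}): 5,
    frozenset({"flour", "sugar"}): 4, frozenset({"eggs", "sugar"}): 2,
    frozenset(players): 8,
}

def v(coalition):
    return payoffs[frozenset(coalition)]

def shapley(i):
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for size in range(n):                      # |S| ranges over 0 .. n-1
        for S in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            total += weight * (v(set(S) | {i}) - v(S))   # marginal contribution
    return total

phi = {p: shapley(p) for p in players}
print(phi)
print(sum(phi.values()), v(players) - v(()))   # efficiency: sums to v(N) - v(empty)
```

Whatever payoffs are plugged in, the computed values satisfy the efficiency axiom by construction: they sum to the value of the grand coalition minus the value of the empty coalition.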
Shapley proved that the Shapley value is the unique solution satisfying four axioms:
| Axiom | Description |
|---|---|
| Efficiency | The contributions of all players sum to the total value of the grand coalition: sum of phi_i = v(N) - v(empty set) |
| Symmetry | If two players contribute equally to every coalition (v(S union {i}) = v(S union {j}) for all S not containing i or j), they receive equal Shapley values |
| Dummy (Null player) | A player who adds zero marginal contribution to every coalition (v(S union {i}) = v(S) for all S) receives a Shapley value of zero |
| Additivity (Linearity) | For two games v and w defined on the same player set, the Shapley value of the combined game v + w equals the sum of the individual Shapley values: phi_i(v + w) = phi_i(v) + phi_i(w) |
The uniqueness result is striking. Once you accept these four axioms as desirable, the explicit weighted-sum formula is the only possible answer.
In the context of machine learning, the players are input features and the game is a single prediction. The value function v(S) is typically defined as the expected model output when only the features in coalition S are known and the remaining features are marginalized out. The Shapley value for each feature then quantifies its average marginal contribution to the prediction, providing a principled way to distribute credit for the prediction among the input features.
This maps cleanly onto common questions a practitioner asks about a single prediction. "How much did the patient's age push their predicted readmission risk up or down?" becomes "What is the Shapley value of the age feature for this prediction?"
Lundberg and Lee (2017) defined a class of explanation models called additive feature attribution methods. An explanation model g for an original model f is a linear function of binary variables:
g(z') = phi_0 + sum from j=1 to M of phi_j * z_j'
where z' is a vector in {0, 1}^M (a coalition vector indicating which features are present or absent), M is the number of features, and phi_j is the attribution for feature j. The term phi_0 is the base value, typically equal to the expected model output E[f(X)].
When all features are present (all z_j' = 1), the explanation model produces:
g(x) = phi_0 + sum from j=1 to M of phi_j
which, under the local accuracy property, equals the model's actual prediction f(x).
This class of explanation methods is broader than it first appears. Lundberg and Lee showed that LIME, DeepLIFT, layerwise relevance propagation (LRP), and classic Shapley regression values are all special cases of additive feature attribution methods. The differences lie in how each method computes the phi values.
Lundberg and Lee proved that within the class of additive feature attribution methods, there is a unique set of values phi_0, phi_1, ..., phi_M satisfying three properties simultaneously.
1. Local accuracy. The sum of feature attributions plus the base value equals the model prediction for the instance being explained:
f(x) = phi_0 + sum from j=1 to M of phi_j
This corresponds to the efficiency axiom from Shapley value theory. It ensures that the explanation fully accounts for the model's output rather than leaving an unexplained residual.
2. Missingness. If a feature is not present in the simplified input (z_j' = 0), its attribution must be zero:
z_j' = 0 implies phi_j = 0
This is a consistency constraint ensuring that features not available for a prediction are not assigned any credit.
3. Consistency. If a model changes so that a feature's marginal contribution increases or stays the same for every possible coalition, then that feature's SHAP value must not decrease. Formally, for two models f and f', if for all coalitions S not containing feature j:
f'(S union {j}) - f'(S) >= f(S union {j}) - f(S)
then phi_j(f') >= phi_j(f).
Consistency ensures that a feature that becomes more influential in a new model is never assigned lower importance. Lundberg and Lee showed that several popular explanation methods (including standard Shapley regression values, LIME, and DeepLIFT) are additive feature attribution methods but do not all satisfy these three properties simultaneously. The Shapley value is the only solution that does.
The SHAP values for a model f and instance x are defined as the Shapley values of a conditional expectation game. The value function for a coalition S of features is:
v(S) = E[f(X) | X_S = x_S]
where X_S denotes the subset of features in S, fixed to their values in x, and the expectation is taken over the remaining features according to their conditional distribution. In practice many implementations use the marginal distribution instead of the conditional, treating absent features as drawn independently from a background dataset. The choice between marginal and conditional value functions is one of the most consequential modeling decisions in SHAP, with significant downstream effects on which features receive nonzero attribution. See the discussion of conditional vs. marginal SHAP later in this article.
The SHAP framework includes several estimation algorithms tailored to different model classes. Each makes different trade-offs between speed, generality, and exactness.
| Algorithm | Target model | Approach | Exact or approximate |
|---|---|---|---|
| KernelSHAP | Any (model-agnostic) | Weighted linear regression with SHAP kernel | Approximate |
| TreeSHAP | Tree ensembles (random forest, XGBoost, LightGBM, CatBoost) | Recursive tree traversal | Exact |
| DeepSHAP | Neural networks | Modified DeepLIFT backpropagation | Approximate |
| GradientSHAP | Differentiable models | Expected gradients integration | Approximate |
| LinearSHAP | Linear models | Closed-form analytical | Exact |
| PermutationSHAP | Any (model-agnostic) | Antithetic permutation sampling | Approximate (efficiency exact) |
| PartitionSHAP | Any (model-agnostic) | Hierarchical Owen values over feature partition | Approximate |
| Sampling SHAP | Any (model-agnostic) | Monte Carlo over permutations | Approximate |
KernelSHAP is the original model-agnostic algorithm proposed by Lundberg and Lee for estimating SHAP values. It works with any model that can produce predictions but is computationally more expensive than model-specific alternatives.
The algorithm operates in five steps:
Sample coalitions. Generate K random coalition vectors z_k' from {0, 1}^M, where each vector indicates which features are included (1) or excluded (0).
Map to feature space. For each coalition z_k', construct a corresponding input in the original feature space using a mapping function h_x. Features marked as present (1) take their values from the instance x; features marked as absent (0) are replaced with values from randomly sampled data points (approximating the marginal distribution).
Get predictions. Evaluate the model on each mapped input to obtain f(h_x(z_k')).
Weight coalitions. Assign each coalition a weight using the SHAP kernel:
pi_x(z') = (M - 1) / [ C(M, |z'|) * |z'| * (M - |z'|) ]
where C(M, |z'|) is the binomial coefficient and |z'| is the number of features present. This kernel assigns the highest weights to coalitions that are very small or very large, because those provide the most information about individual feature contributions. Coalitions of size 0 and size M are explicitly enforced through constraints rather than sampled.
Fit the weighted regression. Estimate the phi values by solving the weighted least-squares problem

minimize over phi: sum over z' in Z: [ f(h_x(z')) - g(z') ]^2 * pi_x(z')
The resulting regression coefficients are the estimated SHAP values.
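The following is a minimal sketch of the procedure, enumerating all non-trivial coalitions rather than sampling them (feasible only for small M) and drawing absent features from a background dataset (the marginal value function). The names predict, x, and background are placeholders; the library's KernelSHAP adds coalition sampling, regularization, and batching on top of this core idea.

```python
# Minimal KernelSHAP sketch following the five steps above. Assumes x is a 1-D
# numpy array and background is a 2-D numpy array of shape (B, M).
import itertools
import numpy as np
from math import comb

def kernel_shap(predict, x, background):
    M = len(x)
    base = predict(background).mean()              # phi_0 = E[f(X)]
    fx = predict(x.reshape(1, -1))[0]

    coalitions, weights, preds = [], [], []
    # Step 1: enumerate coalitions of size 1..M-1 (sizes 0 and M are handled
    # by the constraints below, matching the infinite kernel weights there).
    for size in range(1, M):
        for subset in itertools.combinations(range(M), size):
            cols = list(subset)
            z = np.zeros(M)
            z[cols] = 1.0
            coalitions.append(z)
            # Step 4: SHAP kernel weight.
            weights.append((M - 1) / (comb(M, size) * size * (M - size)))
            # Steps 2-3: absent features come from the background (marginal).
            masked = background.copy()
            masked[:, cols] = x[cols]
            preds.append(predict(masked).mean())

    Z = np.array(coalitions)
    w = np.array(weights)
    y = np.array(preds) - base

    # Step 5: weighted least squares with the efficiency constraint
    # sum(phi) = f(x) - E[f(X)], imposed by eliminating the last coefficient.
    ZT = Z[:, :-1] - Z[:, [-1]]
    yT = y - Z[:, -1] * (fx - base)
    W = np.diag(w)
    phi_rest = np.linalg.solve(ZT.T @ W @ ZT, ZT.T @ W @ yT)
    phi_last = (fx - base) - phi_rest.sum()
    return base, np.append(phi_rest, phi_last)
```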
Exact computation of Shapley values requires evaluating 2^M coalitions, which is exponential in the number of features. KernelSHAP approximates the values by sampling a subset of coalitions and fitting the weighted regression, making it tractable for models with many features. However, it remains slow for large-scale applications because each coalition requires a model evaluation and many samples are needed for stable estimates. Variance in the estimates means that running KernelSHAP twice on the same instance can produce slightly different SHAP values.
Lundberg and Lee showed that LIME (Local Interpretable Model-agnostic Explanations) by Ribeiro, Singh, and Guestrin (2016) is itself an additive feature attribution method. LIME fits a local linear model around the instance being explained using perturbed samples, but it uses a different kernel function and does not guarantee the Shapley properties. KernelSHAP can be understood as a specific configuration of LIME that uses the SHAP kernel and a particular regularization setup, which yields Shapley values as the unique optimal solution. In effect, SHAP is what you get when LIME is forced to satisfy the Shapley axioms.
TreeSHAP is a fast, exact algorithm for computing SHAP values for tree-based models, including decision trees, random forests, gradient boosting machines (XGBoost, LightGBM, CatBoost), and other tree ensemble methods. It was introduced by Lundberg, Erion, and Lee in 2018 and refined in the 2020 Nature Machine Intelligence paper.
TreeSHAP exploits the structure of decision trees to compute SHAP values in polynomial time rather than exponential time. Instead of enumerating all 2^M feature subsets, the algorithm recursively traverses the tree and tracks which feature coalitions lead to which leaf predictions. At each internal node, the algorithm maintains a running account of how many coalitions include or exclude the splitting feature, allowing it to accumulate Shapley values as it walks through the tree.
For an ensemble of trees, the SHAP values are computed independently for each tree and then combined (averaged for random forests, summed for gradient boosting).
TreeSHAP reduces the complexity from O(T * L * 2^M) for exact KernelSHAP applied to a tree ensemble to O(T * L * D^2), where T is the number of trees, L is the maximum number of leaves, and D is the maximum tree depth. Since D is typically much smaller than M (the number of features), this represents a dramatic speedup. For a 1000-tree gradient boosting model with 100 features and depth 6, TreeSHAP can typically compute exact SHAP values for tens of thousands of instances in seconds, while exact KernelSHAP on the same model would be infeasible.
TreeSHAP has two variants that differ in how they handle absent features:
| Variant | Value function | Feature independence assumption | Axiom compliance |
|---|---|---|---|
| Interventional | E[f(X) \| do(X_S = x_S)] using a background dataset | Treats absent features as independent draws from the background | Satisfies all Shapley axioms including dummy |
| Path-dependent (conditional) | E[f(X) \| X_S = x_S] estimated from tree structure | Uses path frequencies in the tree as a proxy for the conditional distribution | May violate the dummy axiom for correlated features |
The interventional variant computes standard Shapley values by treating absent features as independent of present features. It requires a background dataset and is slower but axiomatically clean. The path-dependent variant is faster (no background dataset required) and follows the tree's branching structure to estimate conditional expectations, but can assign nonzero SHAP values to features the model does not use when those features are correlated with features the model does use.
The SHAP library exposes both via the feature_perturbation argument of TreeExplainer, with values "interventional" and "tree_path_dependent". The choice has been a source of confusion and debate, particularly after Janzing, Minorics, and Bloebaum (2020) argued that interventional Shapley values are the only causally correct version (see Critiques below).
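A hedged usage sketch of the two variants with an XGBoost model follows; X_train, y_train, and X_test are placeholder pandas objects, not part of the library.

```python
# Hedged sketch: both TreeExplainer variants on an XGBoost model. X_train,
# y_train, and X_test are placeholders and not defined here.
import shap
import xgboost

model = xgboost.XGBRegressor(n_estimators=200, max_depth=4).fit(X_train, y_train)

# Interventional: requires a background dataset; satisfies the dummy axiom.
expl_int = shap.TreeExplainer(
    model,
    data=X_train.sample(200, random_state=0),
    feature_perturbation="interventional",
)

# Path-dependent: no background needed; uses the tree's own cover statistics.
expl_path = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")

shap_values = expl_int.shap_values(X_test)   # shape (n_instances, n_features)
```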
TreeSHAP also supports the computation of SHAP interaction values, which decompose each prediction into main effects and pairwise interaction effects. The interaction value for features i and j is:
phi_ij = sum over S not containing i or j: [ |S|! * (M - |S| - 2)! / (2 * (M - 1)!) ] * delta_ij(S)
where delta_ij(S) = f(S union {i, j}) - f(S union {i}) - f(S union {j}) + f(S). The main effects phi_ii and the interaction effects phi_ij sum to the overall SHAP value for feature i: phi_i = phi_ii + sum over j != i of phi_ij. Interaction values let analysts ask not only "how much did age contribute" but also "how much of age's contribution depended on income."
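Continuing the sketch above, interaction values can be requested from the same TreeExplainer; the row sums of each instance's interaction matrix should recover the ordinary SHAP values, as the identity above states.

```python
# SHAP interaction values from the path-dependent explainer defined above.
import numpy as np

inter = expl_path.shap_interaction_values(X_test)   # shape (n, M, M)
phi = expl_path.shap_values(X_test)                  # shape (n, M)
print(np.abs(inter.sum(axis=2) - phi).max())         # should be near zero
```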
DeepSHAP is an algorithm for estimating SHAP values for deep learning models, including neural networks built with TensorFlow, Keras, and PyTorch. It builds on a connection between SHAP and DeepLIFT, an earlier attribution method by Shrikumar, Greenside, and Kundaje (2017).
DeepLIFT assigns attribution scores by comparing each neuron's activation to a reference activation and propagating contribution scores backward through the network using custom rules. Lundberg and Lee showed that the per-neuron attribution rules in DeepLIFT can be chosen to approximate Shapley values.
DeepSHAP extends DeepLIFT in two ways. First, instead of a single reference input, DeepSHAP uses a distribution of background samples, averaging the DeepLIFT attributions over multiple reference points. This better approximates the expectation over absent features. Second, DeepSHAP uses Shapley equations to linearize nonlinear operations such as max, softmax, and element-wise products, improving the quality of the approximation for complex architectures.
The modified backpropagation rules define multipliers at each layer that relate input differences to output differences, analogous to how gradients flow backward through the network. These multipliers follow a chain rule through the layers, enabling efficient attribution computation in a single backward pass.
DeepSHAP computes approximate, not exact, SHAP values. The approximation quality depends on the network architecture and the choice of background samples. For architectures with complex interactions between layers, the layer-wise decomposition may not perfectly correspond to true Shapley values. KernelSHAP can be used as a model-agnostic alternative for deep learning models when higher accuracy is needed, though at much greater computational cost.
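A hedged usage sketch of DeepSHAP through shap.DeepExplainer for a Keras or PyTorch model: model, X_train, and X_test are placeholders, and a background of roughly 100 samples is a common choice.

```python
# Hedged sketch: DeepSHAP via shap.DeepExplainer. model, X_train, and X_test
# are placeholders; ~100 background samples is a typical choice.
import numpy as np
import shap

background = X_train[np.random.choice(len(X_train), 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X_test[:10])   # attributions per model output
```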
GradientSHAP estimates SHAP values by integrating gradients along a path from a baseline input to the actual input, similar in spirit to Integrated Gradients (Sundararajan, Taly, and Yan, 2017). Where Integrated Gradients uses a single baseline (often the zero vector), GradientSHAP averages over many baselines drawn from a background distribution, which converts the gradient-based attribution into an estimator of expected SHAP values.
GradientSHAP is faster than KernelSHAP for differentiable models and avoids the path-dependent quirks of DeepSHAP. It is sometimes called "expected gradients" and is implemented in the shap.GradientExplainer class.
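shap.GradientExplainer exposes a similar interface; the sketch below reuses the placeholder names from the DeepSHAP sketch above and averages gradient-based attributions over baselines drawn from the background.

```python
# Hedged sketch: GradientSHAP (expected gradients) via shap.GradientExplainer.
import shap

explainer = shap.GradientExplainer(model, background)
shap_values = explainer.shap_values(X_test[:10], nsamples=200)  # more samples, less variance
```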
For a linear model f(x) = beta_0 + sum of beta_j * x_j, the SHAP value for feature j is phi_j = beta_j * (x_j - E[X_j]). No iterative estimation is needed. LinearSHAP is the trivial case of SHAP that recovers a familiar quantity from regression analysis: the contribution of each feature to a prediction relative to its mean. When features are correlated, shap.LinearExplainer offers a conditional variant via feature_perturbation="correlation_dependent" that accounts for the covariance structure.
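A self-contained sketch of the closed form on synthetic data follows; the final check verifies local accuracy (base value plus attributions equals the prediction).

```python
# LinearSHAP in closed form: phi_j = beta_j * (x_j - E[X_j]).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)
x = X[0]
phi = model.coef_ * (x - X.mean(axis=0))      # per-feature SHAP values
base = model.predict(X).mean()                # phi_0 = E[f(X)]
assert np.isclose(base + phi.sum(), model.predict(x.reshape(1, -1))[0])
```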
PermutationSHAP generates random feature orderings, builds coalitions progressively, and estimates marginal contributions by comparing predictions with and without each feature. The method uses antithetic sampling, processing each permutation both forward and backward, to reduce variance. It has the advantage that the SHAP values always sum exactly to f(x) - E[f(X)] regardless of the number of samples, satisfying the efficiency property exactly even with finite samples.
PermutationSHAP is the default model-agnostic explainer in newer versions of the shap library, replacing KernelSHAP for many use cases. It tends to be more sample-efficient and avoids some of the numerical conditioning issues that can arise when fitting the weighted regression in KernelSHAP.
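A minimal from-scratch sketch of the idea, with placeholder names (predict, x, background); the library's PermutationExplainer adds maskers, batching, and convergence diagnostics. Because the marginal contributions along each permutation telescope, the returned values sum exactly to f(x) - E[f(X)].

```python
# Sketch of permutation-based SHAP estimation with antithetic sampling.
# Assumes x is a 1-D numpy array and background is a 2-D array of shape (B, M).
import numpy as np

def permutation_shap(predict, x, background, n_permutations=10, seed=None):
    rng = np.random.default_rng(seed)
    M = len(x)
    phi = np.zeros(M)
    for _ in range(n_permutations):
        order = rng.permutation(M)
        for perm in (order, order[::-1]):      # antithetic: forward and reverse
            masked = background.copy()         # start with no features known
            prev = predict(masked).mean()
            for j in perm:                     # reveal features one at a time
                masked[:, j] = x[j]
                curr = predict(masked).mean()
                phi[j] += curr - prev          # marginal contribution of j
                prev = curr
    return phi / (2 * n_permutations)
```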
PartitionSHAP computes Owen values over a hierarchical partition of features. The partition can be supplied by the user (for example, grouping pixels of an image into regions) or learned from feature correlations. PartitionSHAP is particularly useful for high-dimensional inputs like images and text, where attributing importance to individual pixels or tokens is less interpretable than attributing it to regions or phrases.
Jilei Yang of LinkedIn proposed Fast TreeSHAP in a 2021 paper (arXiv:2109.09847), accelerating TreeSHAP by reducing redundant computations. Fast TreeSHAP v1 achieves a 1.5x speedup at the same memory cost; Fast TreeSHAP v2 achieves a 2.5x speedup at slightly higher memory cost. Both enable parallel multi-core computing and share an API with TreeSHAP in shap. The implementation is open-sourced as the linkedin/FastTreeSHAP repository.
SHAP values are local in nature: each set of phi values explains a single prediction. Aggregating SHAP values across many predictions yields global summaries. Three common aggregations are the mean absolute SHAP value per feature (a single global importance score), the full distribution of per-instance SHAP values for each feature (visualized as a beeswarm summary plot), and dependence curves that plot a feature's SHAP values against its raw values.
Global summaries derived from SHAP have a useful property absent from many traditional importance measures: they are guaranteed to be consistent. If model A relies on a feature more than model B for every coalition, the mean absolute SHAP for that feature in A is at least as large as in B.
The SHAP library provides a suite of visualization tools that have become standard in machine learning practice. These visualizations operate at both the individual prediction level (local explanations) and the dataset level (global explanations).
The waterfall plot explains a single prediction by showing how each feature's SHAP value pushes the prediction from the base value (expected model output) to the final prediction. It starts at the bottom with phi_0 = E[f(X)] and shows each feature's contribution as a colored bar: red bars push the prediction higher, blue bars push it lower. Features are ordered by the magnitude of their SHAP values. The waterfall structure visually reinforces the additive nature of SHAP, since the positive and negative contributions stack up to produce the final prediction at the top.
The force plot presents the same information as the waterfall plot in a more compact, horizontal format. Positive SHAP values (features pushing the prediction up) appear on one side and negative SHAP values (features pushing it down) appear on the other, as if competing against each other. Force plots can also be stacked vertically for multiple instances, creating an interactive visualization that reveals patterns across a dataset. When instances are ordered by prediction value or similarity, the stacked force plot shows how feature effects shift across the data distribution.
The summary plot provides a global view of feature importance and effects. For each feature, it plots every instance's SHAP value as a point along the horizontal axis, with features ordered vertically by mean absolute SHAP value. Points are colored by the feature's actual value (typically red for high values, blue for low). The resulting beeswarm pattern reveals which features matter most (wider horizontal spread), the direction of effect (whether high feature values increase or decrease predictions), and the distribution of effects across the dataset.
The beeswarm plot is among the most cited visualizations in applied machine learning papers. It compresses a large amount of information into a compact figure and quickly answers the questions a reader is most likely to ask.
The bar plot is a simpler global importance summary. It displays the mean absolute SHAP value for each feature, averaged across all instances. This single number per feature measures the average magnitude of each feature's contribution, regardless of direction. It serves as a SHAP-based analog of traditional feature importance rankings but is grounded in a theoretically sound attribution framework.
The SHAP dependence plot shows the relationship between a feature's value (x-axis) and its SHAP value (y-axis) across all instances. Each dot represents one data point. The plot reveals nonlinear relationships, thresholds, and saturation effects that the model has learned. It can be enhanced by coloring points according to a second feature's value, which highlights interaction effects. For example, if the effect of age on a prediction depends on income, this interaction will appear as distinct color patterns in the dependence plot.
Dependence plots are particularly valuable for detecting model artifacts. A dependence plot that shows wild scatter rather than a coherent trend often indicates feature interactions, data quality issues, or model overfitting on small subsets of the data.
The decision plot shows how a model arrives at a prediction by tracing the cumulative sum of SHAP values from the base value to the final prediction. Features are listed vertically (typically ordered by importance), and the cumulative contribution is traced as a line from bottom to top. When multiple instances are plotted together, crossing or diverging lines reveal where different predictions start to differ, making it useful for comparing groups of predictions or identifying outliers.
The heatmap plot displays SHAP values for many instances at once, with rows for features and columns for instances. When instances are clustered by prediction or SHAP-vector similarity, the heatmap reveals subgroups the model treats differently, which is useful for fairness audits and detecting subpopulations where the model diverges from the global pattern.
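A hedged sketch of the corresponding plotting calls using the Explanation-object API: explainer and X_test are placeholders, and the feature names "age" and "income" are illustrative.

```python
# Sketch of the plotting calls described above; names are placeholders.
import shap

explanation = explainer(X_test)                     # a shap.Explanation object

shap.plots.waterfall(explanation[0])                # one prediction, waterfall
shap.plots.force(explanation[0])                    # same prediction, force plot
shap.plots.beeswarm(explanation)                    # global summary (beeswarm)
shap.plots.bar(explanation)                         # mean |SHAP| per feature
shap.plots.scatter(explanation[:, "age"],           # dependence plot for "age",
                   color=explanation[:, "income"])  #   colored by "income"
shap.plots.heatmap(explanation)                     # instances-by-features heatmap
```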
SHAP and LIME are two of the most widely used model-agnostic explanation methods. While both explain individual predictions, they differ in theoretical foundation, stability, and scope.
| Aspect | SHAP | LIME |
|---|---|---|
| Theoretical basis | Shapley values from cooperative game theory | Local surrogate model (weighted linear regression) |
| Axioms satisfied | Local accuracy, missingness, consistency | No formal guarantees |
| Stability | Deterministic for exact methods (TreeSHAP, LinearSHAP); KernelSHAP has sampling variance that shrinks as more coalitions are evaluated | Can produce different explanations on repeated runs due to random perturbations |
| Global interpretability | Yes; SHAP values aggregate naturally to global summaries | Primarily local; global aggregation is ad hoc |
| Computational cost | Expensive for KernelSHAP; fast for TreeSHAP and LinearSHAP | Generally faster for a single explanation |
| Model-specific speedups | TreeSHAP, DeepSHAP, LinearSHAP | None; always model-agnostic |
| Feature interactions | Supported (SHAP interaction values) | Not natively supported |
| Visualization ecosystem | Extensive built-in visualizations | Limited; typically custom plots |
| Ease of understanding | Requires understanding of Shapley values | Conceptually simpler (local linear approximation) |
In practice, SHAP tends to be preferred when consistency, stability, and theoretical rigor are priorities, especially for tree-based models where TreeSHAP makes computation efficient. LIME may be preferred for quick, one-off explanations where speed matters more than formal guarantees, or when the audience benefits from LIME's more intuitive local-linear framing.
Permutation feature importance is a global, model-agnostic measure of feature importance. It works by shuffling the values of one feature and measuring how much the model's loss increases. Important features cause large drops in performance when shuffled.
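For contrast with SHAP, here is a brief sketch of permutation importance via scikit-learn's permutation_importance; model, X_valid, and y_valid are placeholders.

```python
# Global permutation importance: shuffle one feature at a time and measure how
# much the score degrades. Names are placeholders.
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_valid, y_valid,
                                n_repeats=10, random_state=0)
print(result.importances_mean)   # one global score per feature
```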
| Aspect | SHAP | Permutation importance |
|---|---|---|
| Granularity | Per-prediction (local) and aggregable (global) | Global only |
| Output | Signed contribution for each feature for each prediction | Single nonnegative score per feature |
| Cost | Fast for tree models (TreeSHAP); expensive for KernelSHAP | One model evaluation per (feature, permutation, instance) tuple |
| Sensitivity to correlation | Depends on marginal vs. conditional choice | Inflates importance for correlated features (each can be shuffled separately) |
| Use case | Per-prediction debugging, regulatory explanation, feature interaction analysis | Quick global ranking, feature engineering decisions |
Permutation importance is simpler and faster to compute when only a global ranking is needed. SHAP is more nuanced and supports per-prediction explanations, which is essential in regulated domains where individual decisions must be justified.
Integrated Gradients (Sundararajan, Taly, and Yan, 2017) is a gradient-based attribution method for differentiable models. It satisfies two axioms (Sensitivity and Implementation Invariance) and is widely used for interpreting deep neural networks, including image classifiers and increasingly large language models.
| Aspect | SHAP | Integrated Gradients |
|---|---|---|
| Foundation | Cooperative game theory (Shapley values) | Path integral of gradients from baseline to input |
| Axioms | Local accuracy, missingness, consistency | Sensitivity, implementation invariance |
| Model class | Any (with model-specific variants) | Differentiable only |
| Baseline | Background distribution | Single baseline (e.g., zero, blurred image) |
| Cost | TreeSHAP fast for trees; KernelSHAP slow for general models | Forward and backward passes along a discretized integration path (typically 20 to 100 steps) |
| Image and language interpretability | Less common; PartitionSHAP for tokens | Standard tool for vision and LLM attribution |
For differentiable models, GradientSHAP / Expected Gradients can be viewed as a Shapley-respecting extension of Integrated Gradients that averages over a distribution of baselines. Sundararajan and Najmi (2020) gave a careful axiomatic comparison and introduced "Baseline Shapley" as a third alternative.
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Exact Shapley | O(2^M * T_f) | O(M) | Exponential in M; infeasible beyond about 20 features |
| KernelSHAP | O(K * T_f) | O(K * M) | K sampled coalitions; T_f one prediction; K must be large for stability |
| TreeSHAP | O(T * L * D^2) | O(T * L * D) | T trees, L leaves, D depth |
| Fast TreeSHAP v1 | TreeSHAP / 1.5 | same as TreeSHAP | Drops redundant subtree visits |
| Fast TreeSHAP v2 | TreeSHAP / 2.5 | higher | Caches subtree summaries |
| DeepSHAP | O(B * T_bp) | O(model size) | B background samples; T_bp one backward pass |
| GradientSHAP | O(B * S * T_bp) | O(B + model size) | S integration steps |
| LinearSHAP | O(M) | O(M) | Analytical |
| PermutationSHAP | O(P * M * T_f) | O(M) | P permutations (default around 10) |
For tree models, TreeSHAP makes SHAP practical at the scale of millions of instances and hundreds of features. For deep learning, DeepSHAP and GradientSHAP balance accuracy and speed. For truly model-agnostic explanations, KernelSHAP or PermutationSHAP are available but slow for large feature sets or expensive models. For very large models like LLMs, even one forward pass is costly, motivating token-level alternatives such as PartitionSHAP or mechanistic interpretability methods.
SHAP has been adopted across numerous fields where understanding model predictions is important for trust, regulation, or scientific insight.
SHAP is widely used to explain predictions from clinical risk models, diagnostic classifiers, and drug response predictors. Lundberg et al. (2018, 2020) applied TreeSHAP to a gradient-boosted tree model for anesthesia hypoxemia risk, demonstrating how SHAP values could help clinicians understand which patient characteristics (BMI, age, procedure type, time-varying vital signs) contributed to elevated risk during a surgical procedure. The 2020 Nature Machine Intelligence paper showed that SHAP-based monitoring could detect rising hypoxemia risk earlier than the model's raw probability output, because the SHAP decomposition highlights when previously stable features start contributing.
The method has also been applied to predicting hospital readmission, sepsis onset, ICU mortality, disease diagnosis from imaging, drug response prediction, and treatment effect heterogeneity. In each case the per-prediction explanation is what makes SHAP attractive: clinicians want to know not only the risk score but why a particular patient's score is what it is.
In financial services, regulatory requirements often mandate that automated decisions be explainable. The EU General Data Protection Regulation (GDPR) provides a much-debated "right to explanation" via Article 22, and the U.S. Equal Credit Opportunity Act (ECOA) requires lenders to provide adverse-action notices specifying why a credit application was denied. SHAP provides feature-level explanations for credit scoring models, fraud detection systems, and algorithmic trading strategies. SHAP values help identify which financial indicators (income, debt-to-income ratio, payment history, recent credit inquiries) drive individual credit decisions.
Researchers have built interpretable credit scorecards directly from SHAP values, mapping continuous SHAP contributions to integer point scores reminiscent of traditional credit scoring tables. A 2024 Risks journal study examined SHAP stability in credit risk management and found that SHAP can become significantly less consistent as class imbalance increases, raising concerns about its reliability for severely imbalanced fraud datasets unless carefully validated.
SHAP is applied in fraud detection pipelines to explain why a particular transaction was flagged as suspicious. Summary plots and force plots help investigators understand which transaction features (amount, time, location, merchant category, device fingerprint, behavioral signals) contributed most to the fraud score, allowing analysts to prioritize investigations and refine detection rules. A 2025 review in Artificial Intelligence Review on model-agnostic explainable AI methods in finance identified SHAP and LIME as the dominant tools, with SHAP preferred when consistency across explanations matters.
Insurers use SHAP to explain individual premium calculations and claims decisions. The combination of TreeSHAP with gradient-boosted models is particularly common because gradient boosting dominates insurance modeling and TreeSHAP provides exact attributions in seconds.
Microsoft has integrated SHAP into its responsible-AI tooling and ran SHAP-based interpretability research within Microsoft Research while Lundberg was on staff. Banks, health insurers, manufacturing firms, logistics companies, and cloud providers have published SHAP case studies. LinkedIn maintains the Fast TreeSHAP open-source project for tree-model deployments at scale.
In NLP, SHAP can explain text classification, sentiment analysis, and named entity recognition predictions by attributing importance to individual words or tokens. PartitionSHAP and KernelSHAP are typically used because text inputs do not naturally fit the tree or deep learning-specific methods. The SHAP library includes specialized support for text data through its masker and tokenizer classes.
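A hedged sketch of token-level SHAP for a Hugging Face sentiment pipeline, following the pattern in the shap documentation; the example sentence is illustrative and the output label name ("POSITIVE") depends on the underlying model.

```python
# Sketch: token-level SHAP for a transformers sentiment pipeline.
import shap
import transformers

classifier = transformers.pipeline("sentiment-analysis", return_all_scores=True)
explainer = shap.Explainer(classifier)
explanation = explainer(["The film was slow at first but ultimately rewarding."])
shap.plots.text(explanation[0, :, "POSITIVE"])   # highlight per-token attributions
```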
SHAP has been used to interpret models predicting air quality, wildfire risk, crop yield, and climate variables. In these domains, understanding feature contributions helps researchers validate that models are learning physically meaningful relationships rather than spurious correlations. SHAP values are also used in causal-inference adjacent workflows where the analyst wants to rank predictors before designing follow-up experiments.
SHAP is less commonly used for LLM interpretability than gradient-based methods like Integrated Gradients or mechanistic interpretability tools. The cost of a single LLM forward pass and the high token dimensionality make even PermutationSHAP impractical for explaining a generated response token by token. A 2024 line of research (e.g., Pelaez et al., arXiv:2409.00079) instead uses LLMs to translate raw SHAP outputs into natural-language explanations. SHAP remains the tool of choice when the underlying model is a tabular tree ensemble rather than the LLM itself.
SHAP has been the subject of significant critical literature. The key debates are theoretical, not implementation-level: they concern what Shapley values mean in the context of a learned model and whether they answer the question users actually care about.
The most important debate concerns the value function v(S). Two natural choices give different SHAP values, and the choice affects whether the dummy axiom holds.
Conditional Shapley uses v(S) = E[f(X) | X_S = x_S], the expected prediction conditioned on the observed values of the present features. This respects the data distribution: the model is only ever evaluated on plausible inputs. But it can violate the dummy axiom. A feature that the model genuinely does not use can still receive a nonzero SHAP value if it is correlated with a feature the model does use.
Interventional Shapley uses v(S) = E[f(X) | do(X_S = x_S)], the expected prediction when we intervene to set the present features and let the absent features vary independently from a background dataset. This satisfies the dummy axiom but evaluates the model on inputs that may be implausible or impossible (for example, pairing a very tall height with an implausibly low body weight).
Janzing, Minorics, and Bloebaum (AISTATS 2020, "Feature relevance quantification in explainable AI: A causal problem") argued forcefully that the interventional version is the only correct one, framing the choice in causal terms via Pearl's do-calculus. They argued that the SHAP package's defaults at the time, which used path-dependent (conditional-style) attribution for TreeSHAP, were causally incorrect. The SHAP library has since added clearer support for the interventional variant via the feature_perturbation="interventional" option.
Heskes, Sijben, Bucur, and Claassen (NeurIPS 2020, "Causal Shapley Values") proposed a third option: causal Shapley values that respect a known causal graph and separate direct from indirect effects. These require structural assumptions beyond what SHAP normally needs but offer more meaningful attributions when a causal model is available.
Sundararajan and Najmi (ICML 2020, "The Many Shapley Values for Model Explanation") catalogued at least three distinct ways to operationalize Shapley values for model explanation: Conditional Expectation Shapley (CES), Interventional Shapley, and Baseline Shapley (BShap), each with different axiomatic properties. They argued that the uniqueness theorem from Shapley (1953) does not apply to model explanation because the choice of value function is a modeling decision, not given by the problem. They advocated for Baseline Shapley as a principled choice with its own uniqueness result and contrasted it with Integrated Gradients.
This paper undercuts a common framing of SHAP as "the unique correct attribution method." The uniqueness is conditional on a particular value function; different value functions give different Shapley values, and the choice involves trade-offs.
Kumar, Venkatasubramanian, Scheidegger, and Friedler (ICML 2020, "Problems with Shapley-value-based explanations as feature importance measures") argued that mathematical problems arise when Shapley values are used as feature importance measures and that fixes to those problems introduce additional complexity, including the need for causal reasoning. Drawing on additional literature, they argued that Shapley values do not provide explanations that suit human-centered goals of explainability. The paper is one of the most cited theoretical critiques of SHAP.
Slack, Hilgard, Jia, Singh, and Lakkaraju (AIES 2020, "Fooling LIME and SHAP") demonstrated a scaffolding technique that hides the biases of a classifier from post hoc explanation methods. Because both LIME and KernelSHAP rely on perturbed inputs that fall outside the natural data distribution, an adversary can construct a model that behaves one way on real inputs and another way on the perturbed inputs the explainer evaluates. The scaffolded model can be arbitrarily biased (the authors demonstrate a racist classifier built on the COMPAS recidivism dataset) yet the SHAP and LIME explanations look innocuous. The attack does not work against TreeSHAP (which uses the model's own tree structure) but it does work against any model-agnostic perturbation-based method.
This result has practical implications for regulatory use of SHAP: an adversary who knows that SHAP will be used for audit can engineer the model to evade it. Defenses include using interventional variants where possible, validating explanations on held-out manipulated inputs, and combining post hoc explanations with intrinsic interpretability.
SHAP values explain how features contribute to a model's prediction, not how features relate to the real-world outcome. A feature can receive a high SHAP value because the model relies on it, even if the feature is a spurious correlate of the true causal mechanism. SHAP does not distinguish between causal and correlational relationships in the underlying data-generating process. This is a feature of the framing (SHAP is about the model, not the world) but is often misunderstood by stakeholders who interpret SHAP rankings as if they were causal effect sizes.
When features are highly multicollinear, SHAP may assign a large attribution to one of the correlated features and near-zero attribution to the others, even if all are genuinely relevant. This can mislead users into thinking certain features are unimportant when they are actually informative but redundant with another feature. The behavior is a property of the Shapley value (credit must be split among interchangeable players) and not a bug, but practitioners often find it counterintuitive.
The base value phi_0 (typically E[f(X)]) represents the prediction when no features are known. The choice of background dataset used to estimate this expectation can influence the resulting SHAP values. Different background datasets (the full training set, a subsample, or a specific reference group) can produce different explanations. Best practice is to choose the background to match the question being asked. To explain a prediction "relative to typical patients," use the training set; to explain it "relative to healthy patients," use a healthy subset.
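A short illustration of how the background sets the reference point: the same tree model explained against two different backgrounds produces different base values, and hence different attributions. model, X_train, and healthy_mask are placeholders.

```python
# Sketch: the background dataset determines phi_0 and the reference frame.
import shap

expl_all = shap.TreeExplainer(model, data=X_train,
                              feature_perturbation="interventional")
expl_healthy = shap.TreeExplainer(model, data=X_train[healthy_mask],
                                  feature_perturbation="interventional")

print(expl_all.expected_value, expl_healthy.expected_value)   # different phi_0
```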
When the number of features is very large (thousands or more), individual SHAP values become difficult to interpret and visualize. PartitionSHAP and Owen-value-based hierarchical approaches address this by attributing importance to groups of features defined by a partition tree, but they introduce additional modeling choices about how to construct the partition.
A 2024 study in the Risks journal on credit-card default models found that SHAP-based feature rankings became significantly less stable as class imbalance increased, with random initialization alone producing variation in rankings for moderately important features. The takeaway is that SHAP rankings should be validated with bootstrap or cross-validation in domains with severe imbalance.
The primary implementation of SHAP is the open-source Python package shap, originally authored by Scott Lundberg and now community-maintained. It is available via pip (pip install shap) and conda (conda install -c conda-forge shap). As of 2026 the package supports Python 3.11 and later. Recent releases include 0.49 and 0.50 in late 2025 and 0.51 in early 2026. The package's GitHub repository has more than 25,000 stars, and it remains one of the most active interpretability projects in open source.
SHAP support is also integrated directly into several popular machine learning libraries.
| Library | SHAP support |
|---|---|
| XGBoost | Built-in TreeSHAP via model.predict(data, pred_contribs=True) |
| LightGBM | Built-in TreeSHAP support via model.predict(data, pred_contrib=True) |
| CatBoost | Built-in SHAP value computation via get_feature_importance(type="ShapValues") |
| scikit-learn | Compatible via shap.TreeExplainer for tree models, shap.LinearExplainer for linear models, shap.KernelExplainer for arbitrary estimators |
| H2O | SHAP support for tree-based H2O models via the Python and R APIs |
| PiML | Includes SHAP as an explanation module for interpretable ML pipelines |
| FastTreeSHAP (LinkedIn) | Drop-in replacement for TreeExplainer with 1.5x to 2.5x speedups |
| Microsoft InterpretML | Bundles SHAP alongside EBM and other interpretability tools |
| Captum (PyTorch) | Implements GradientSHAP and DeepLIFT-SHAP for PyTorch models |
In R, the shapviz and fastshap packages provide SHAP value computation and visualization, and the xgboost R package includes native SHAP support. Julia has the ShapML.jl package for Monte Carlo Shapley estimation.
A few practical guidelines have emerged from years of SHAP usage in industry:

- Use TreeSHAP whenever the model is a tree ensemble; there is rarely a reason to use KernelSHAP on a tree model.
- Prefer the interventional variant of TreeSHAP if the dummy axiom matters or if the explanations will be presented as causal claims. The path-dependent variant is faster but can attribute importance to unused features that are correlated with used ones.
- Match the background dataset to the question being asked, since the background defines what "typical" means in phi_0.
- Be skeptical of multicollinear features: when two features carry the same information, SHAP can split the credit arbitrarily between them.
- Validate stability by running SHAP on bootstrapped or cross-validated samples, especially when class imbalance is severe.
- Avoid equating SHAP rank with causal importance, since SHAP attributes credit within the model and the model may rely on features that are not causally meaningful.
- For deep models, prefer GradientSHAP or DeepSHAP over KernelSHAP.
- For very large models such as LLMs and large transformers, consider mechanistic interpretability or attention-based methods rather than SHAP, because the per-instance cost is usually prohibitive.
| Year | Paper | Contribution |
|---|---|---|
| 1953 | Shapley, "A Value for n-Person Games" | Introduced Shapley values |
| 2010 | Strumbelj and Kononenko | Earlier sampling-based Shapley explanations |
| 2014 | Owen, "Sobol' Indices and Shapley Value" | Connected Shapley values to sensitivity analysis |
| 2016 | Ribeiro, Singh, and Guestrin, "Why Should I Trust You?" | Introduced LIME, which SHAP later unified |
| 2017 | Sundararajan, Taly, and Yan | Introduced Integrated Gradients |
| 2017 | Shrikumar, Greenside, and Kundaje | Introduced DeepLIFT, foundation for DeepSHAP |
| 2017 | Lundberg and Lee, "A Unified Approach to Interpreting Model Predictions" | Introduced SHAP, KernelSHAP, uniqueness theorem |
| 2018 | Lundberg, Erion, and Lee | Introduced TreeSHAP |
| 2020 | Lundberg et al., Nature Machine Intelligence | Extended TreeSHAP with interaction values and global views |
| 2020 | Janzing, Minorics, and Bloebaum | Argued interventional Shapley is causally correct |
| 2020 | Sundararajan and Najmi | Catalogued multiple Shapley operationalizations; Baseline Shapley |
| 2020 | Kumar et al. | Critique of SHAP as a feature importance measure |
| 2020 | Heskes, Sijben, Bucur, Claassen | Introduced causal Shapley values |
| 2020 | Slack et al., "Fooling LIME and SHAP" | Adversarial vulnerabilities of explanation methods |
| 2021 | Yang, "Fast TreeSHAP" | Accelerated TreeSHAP, LinkedIn open source |
| 2021 | Covert, Lundberg, and Lee | Unified framework for removal-based explanations |
| 2024 | Pelaez et al. | Used LLMs to translate SHAP outputs into natural language |