See also: Explainable artificial intelligence, Feature importance
SHAP (SHapley Additive exPlanations) is a framework for explaining individual predictions of machine learning models. Developed by Scott Lundberg and Su-In Lee and first published in 2017, SHAP assigns each input feature a numerical value representing its contribution to a specific prediction. These values are computed using Shapley values, a concept from cooperative game theory introduced by Lloyd Shapley in 1953. SHAP provides a unified approach to interpreting model predictions by connecting several existing explanation methods (including LIME, DeepLIFT, and classic Shapley regression values) under a single theoretical framework.
The central question SHAP answers is: for a given prediction, how much did each feature push the prediction away from the average (baseline) prediction? By grounding feature attributions in Shapley values, SHAP is the only additive feature attribution method that simultaneously satisfies three desirable properties: local accuracy, missingness, and consistency. This theoretical guarantee has made SHAP one of the most widely adopted tools in explainable AI.
The open-source Python library shap, which has accumulated over 25,000 stars on GitHub, provides implementations of several SHAP estimation algorithms (KernelSHAP, TreeSHAP, DeepSHAP, and others) along with a rich set of visualization tools.
Imagine you and your friends bake a cake together. One friend brought the flour, another brought eggs, and another brought the sugar. Now you want to figure out how much each friend's ingredient helped make the cake taste good. You could try baking the cake without flour to see how much worse it gets, then try without eggs, and so on. But that is not quite fair, because some ingredients work better together. So instead, you try every possible combination of ingredients and figure out, on average, how much each one helped. That average contribution is what SHAP calculates for each feature in a machine learning prediction.
Lloyd Shapley introduced the Shapley value in 1953 as part of his doctoral dissertation at Princeton University, titled "Additive and Non-additive Set Functions." The work was published in the collection Contributions to the Theory of Games, Volume II. Shapley's goal was to answer a fundamental question in cooperative game theory: when a group of players work together and generate some total payoff, how should that payoff be fairly divided among the players?
Shapley proved that there is exactly one division scheme satisfying a set of fairness axioms. For this and related contributions to game theory, Shapley was awarded the Nobel Memorial Prize in Economic Sciences in 2012 (shared with Alvin Roth).
In a cooperative game with a set N of n players and a value function v that maps each coalition (subset) S of players to a real-valued payoff v(S), the Shapley value for player i is:
phi_i(v) = sum over all S subset of N not containing i: [ |S|! * (n - |S| - 1)! / n! ] * [ v(S union {i}) - v(S) ]
The term v(S union {i}) - v(S) is the marginal contribution of player i to coalition S. The weighting factor |S|! * (n - |S| - 1)! / n! corresponds to the probability of coalition S forming before player i arrives in a uniformly random permutation of all players. In other words, the Shapley value is the average marginal contribution of a player across all possible orderings in which the coalition could have been assembled.
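For small games the formula can be evaluated directly. The sketch below (plain Python, with a hypothetical three-player toy game) enumerates every coalition S not containing player i and accumulates the weighted marginal contributions:

```python
import itertools
from math import factorial

def shapley_values(v, n):
    """Exact Shapley values for an n-player cooperative game.

    v: value function mapping a frozenset of players {0, ..., n-1} to a payoff.
    Returns phi, where phi[i] is the average marginal contribution of player i,
    each coalition S weighted by |S|! * (n - |S| - 1)! / n!.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi

# Toy 3-player game: each player contributes 1, and players 0 and 1
# earn a bonus of 1 when both are present (an interaction effect).
def v(S):
    return len(S) + (1.0 if 0 in S and 1 in S else 0.0)
```

For this game `shapley_values(v, 3)` returns `[1.5, 1.5, 1.0]`: players 0 and 1 split the interaction bonus equally (symmetry), and the values sum to v(N) - v(empty set) = 4 (efficiency), illustrating the axioms listed below.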
Shapley proved that the Shapley value is the unique solution satisfying four axioms:
| Axiom | Description |
|---|---|
| Efficiency | The contributions of all players sum to the total value of the grand coalition: sum of phi_i = v(N) - v(empty set) |
| Symmetry | If two players contribute equally to every coalition, they receive equal Shapley values |
| Dummy (Null player) | A player who adds zero marginal contribution to every coalition receives a Shapley value of zero |
| Additivity (Linearity) | For two games v and w, the Shapley value of the combined game v + w equals the sum of the individual Shapley values |
In the context of machine learning, the "players" are input features, and the "game" is a single prediction. The value function v(S) is typically defined as the expected model output when only the features in coalition S are known and the remaining features are marginalized out. The Shapley value for each feature then quantifies its average marginal contribution to the prediction, providing a principled way to distribute credit for the prediction among the input features.
Lundberg and Lee (2017) defined a class of explanation models called additive feature attribution methods. An explanation model g for an original model f is a linear function of binary variables:
g(z') = phi_0 + sum from j=1 to M of phi_j * z_j'
where z' is a vector in {0, 1}^M (a coalition vector indicating which features are "present" or "absent"), M is the number of features, and phi_j is the attribution for feature j. The term phi_0 is the base value, typically equal to the expected model output E[f(X)].
When all features are present (all z_j' = 1), the explanation model produces:
g(x) = phi_0 + sum from j=1 to M of phi_j
which, under the local accuracy property, equals the model's actual prediction f(x).
Lundberg and Lee proved that within the class of additive feature attribution methods, there is a unique set of values phi_0, phi_1, ..., phi_M satisfying three properties simultaneously:
1. Local accuracy. The sum of feature attributions plus the base value equals the model prediction for the instance being explained:
f(x) = phi_0 + sum from j=1 to M of phi_j
This corresponds to the efficiency axiom from Shapley value theory. It ensures that the explanation fully accounts for the model's output.
2. Missingness. If a feature is not present in the simplified input (z_j' = 0), its attribution must be zero:
z_j' = 0 implies phi_j = 0
This is a consistency constraint ensuring that features not available for a prediction are not assigned any credit.
3. Consistency. If a model changes so that a feature's marginal contribution increases or stays the same for every possible coalition, then that feature's SHAP value must not decrease. Formally, let v and v' be the value functions induced by two models f and f'. If, for all coalitions S not containing feature j:
v'(S union {j}) - v'(S) >= v(S union {j}) - v(S)
then phi_j(f') >= phi_j(f).
Consistency ensures that a feature that becomes more influential in a new model is never assigned lower importance. Lundberg and Lee showed that several popular explanation methods (including standard Shapley regression values, LIME, and DeepLIFT) are additive feature attribution methods but do not all satisfy these three properties simultaneously. The Shapley value is the only solution that does.
The SHAP values for a model f and instance x are defined as the Shapley values of a conditional expectation game. The value function for a coalition S of features is:
v(S) = E[f(X) | X_S = x_S]
where X_S denotes the subset of features in S, fixed to their values in x, and the expectation is taken over the remaining features according to their conditional distribution (or, in practice, often their marginal distribution). The SHAP value for feature j is then the Shapley value phi_j(v) computed from this value function.
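In practice the expectation is usually approximated by fixing the features in S to their values in x and drawing the remaining features from a background dataset (the marginal approximation). A minimal sketch of this estimator, with illustrative names:

```python
import random

def value_function(f, x, S, background, n_samples=100, seed=0):
    """Estimate v(S) = E[f(X) | X_S = x_S] under the marginal approximation.

    Features in coalition S are fixed to their values in x; the remaining
    features are filled in from randomly chosen background rows.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        row = list(rng.choice(background))  # draw a background row
        for j in S:
            row[j] = x[j]                   # overwrite features known to be present
        total += f(row)
    return total / n_samples
```

With S equal to the full feature set this returns f(x) exactly; with S empty it estimates the base value E[f(X)] over the background data.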
KernelSHAP is the original model-agnostic algorithm proposed by Lundberg and Lee for estimating SHAP values. It works with any model that can produce predictions but is computationally more expensive than model-specific alternatives.
KernelSHAP operates in five steps:
1. Sample coalitions. Generate K random coalition vectors z_k' from {0, 1}^M, where each vector indicates which features are included (1) or excluded (0).
2. Map to feature space. For each coalition z_k', construct a corresponding input in the original feature space using a mapping function h_x. Features marked as present (1) take their values from the instance x; features marked as absent (0) are replaced with values from randomly sampled data points (approximating the marginal distribution).
3. Get predictions. Evaluate the model on each mapped input to obtain f(h_x(z_k')).
4. Weight coalitions. Assign each coalition a weight using the SHAP kernel:
pi_x(z') = (M - 1) / [ C(M, |z'|) * |z'| * (M - |z'|) ]
where C(M, |z'|) is the binomial coefficient and |z'| is the number of features present. This kernel assigns the highest weights to coalitions that are very small or very large, because those provide the most information about individual feature contributions.
5. Fit a weighted linear model. Solve the weighted least squares problem
minimize over phi: sum over z' in Z: [ f(h_x(z')) - g(z') ]^2 * pi_x(z')
The resulting regression coefficients phi_1, ..., phi_M are the estimated SHAP values.
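When the feature count is small enough to enumerate every coalition, the weighted regression can be solved in closed form and recovers exact Shapley values. The following sketch (plain Python, three features, hypothetical function names) applies the SHAP kernel weights and enforces local accuracy by eliminating the last coefficient:

```python
import itertools
from math import comb

def kernel_shap_exact(v):
    """KernelSHAP for a 3-player game v (frozenset -> float), full enumeration.

    Enumerates all coalitions with 0 < |S| < 3, weights them with the SHAP
    kernel, and solves the 2x2 weighted least squares by Cramer's rule.
    Local accuracy is enforced by substituting phi_2 = total - phi_0 - phi_1.
    """
    M = 3
    phi0 = v(frozenset())                    # base value v(empty set)
    total = v(frozenset(range(M))) - phi0    # f(x) minus base value
    ata = [[0.0, 0.0], [0.0, 0.0]]
    atb = [0.0, 0.0]
    for r in range(1, M):
        w = (M - 1) / (comb(M, r) * r * (M - r))   # SHAP kernel weight
        for S in itertools.combinations(range(M), r):
            z = [1.0 if j in S else 0.0 for j in range(M)]
            # design row after eliminating phi_2 via the efficiency constraint
            a = [z[0] - z[2], z[1] - z[2]]
            b = v(frozenset(S)) - phi0 - z[2] * total
            for p in range(2):
                for q in range(2):
                    ata[p][q] += w * a[p] * a[q]
                atb[p] += w * a[p] * b
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    phi_a = (atb[0] * ata[1][1] - atb[1] * ata[0][1]) / det
    phi_b = (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det
    return [phi_a, phi_b, total - phi_a - phi_b]
```

For a toy game v(S) = len(S) plus a bonus of 1 when players 0 and 1 are both present, this returns [1.5, 1.5, 1.0], matching the exact Shapley values. The library additionally subsamples coalitions when 2^M is too large.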
Exact computation of Shapley values requires evaluating 2^M coalitions, which is exponential in the number of features. KernelSHAP approximates the values by sampling a subset of coalitions and fitting the weighted regression, making it tractable for models with many features. However, it remains slow for large-scale applications because each coalition requires a model evaluation, and many samples are needed for stable estimates. The computational cost scales as O(K * T_f), where K is the number of sampled coalitions and T_f is the cost of a single model prediction.
Lundberg and Lee showed that LIME (Local Interpretable Model-agnostic Explanations) by Ribeiro et al. (2016) is itself an additive feature attribution method. LIME fits a local linear model around the instance being explained using perturbed samples, but it uses a different kernel function and does not guarantee the Shapley properties. KernelSHAP can be understood as a specific configuration of LIME that uses the SHAP kernel and a particular regularization setup, which yields Shapley values as the unique optimal solution.
TreeSHAP is a fast, exact algorithm for computing SHAP values for tree-based models, including decision trees, random forests, gradient boosting machines (XGBoost, LightGBM, CatBoost), and other tree ensemble methods. It was introduced by Lundberg, Erion, and Lee in 2018 and later expanded upon in a 2020 Nature Machine Intelligence paper.
TreeSHAP exploits the structure of decision trees to compute SHAP values in polynomial time rather than exponential time. Instead of enumerating all 2^M feature subsets, the algorithm recursively traverses the tree and tracks which feature coalitions lead to which leaf predictions. At each internal node, the algorithm maintains a running account of how many coalitions include or exclude the splitting feature, allowing it to accumulate Shapley values as it walks through the tree.
For an ensemble of trees, the SHAP values are computed independently for each tree and then combined (e.g., averaged for random forests or summed for gradient boosting).
TreeSHAP reduces the complexity from O(T * L * 2^M) for exact KernelSHAP applied to a tree ensemble to O(T * L * D^2), where T is the number of trees, L is the maximum number of leaves, and D is the maximum tree depth. Since D is typically much smaller than M (the number of features), this represents a dramatic speedup.
TreeSHAP has two variants:
| Variant | Value function | Feature independence assumption | Axiom compliance |
|---|---|---|---|
| Interventional | E[f(X) \| do(X_S = x_S)] | Assumes features are independent when marginalizing | Satisfies all Shapley axioms |
| Path-dependent (conditional) | E[f(X) \| X_S = x_S] | Uses the data distribution within tree paths | May violate the dummy axiom for correlated features |
The interventional variant computes standard Shapley values by treating absent features as independent of present features. The path-dependent variant follows the tree's branching structure to estimate conditional expectations, which can be more faithful to the tree's actual decision process but may assign nonzero SHAP values to features the model does not use (when those features are correlated with features the model does use).
TreeSHAP also supports the computation of SHAP interaction values, which decompose each prediction into main effects and pairwise interaction effects. The interaction value for features i and j is:
phi_ij = sum over S not containing i or j: [ |S|! * (M - |S| - 2)! / (2 * (M - 1)!) ] * delta_ij(S)
where delta_ij(S) = f(S union {i, j}) - f(S union {i}) - f(S union {j}) + f(S). The main effects phi_ii and the interaction effects phi_ij sum to the overall SHAP value for feature i: phi_i = phi_ii + sum over j != i of phi_ij.
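The decomposition can be checked numerically on toy games. The sketch below (hypothetical helper, plain Python) computes the off-diagonal interaction values from the formula above and the main effects phi_ii as each feature's Shapley value minus its interactions:

```python
import itertools
from math import factorial

def shap_interaction_values(v, M):
    """SHAP interaction values for an M-player game v (frozenset -> float).

    Off-diagonal phi[i][j] follows the interaction formula; the diagonal
    phi[i][i] is phi_i minus the sum of i's pairwise interactions.
    """
    players = range(M)

    def shapley(i):
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                w = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                total += w * (v(S | {i}) - v(S))
        return total

    phi = [[0.0] * M for _ in range(M)]
    for i, j in itertools.combinations(players, 2):
        others = [p for p in players if p not in (i, j)]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                w = factorial(len(S)) * factorial(M - len(S) - 2) / (2 * factorial(M - 1))
                delta = v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S)
                phi[i][j] += w * delta
        phi[j][i] = phi[i][j]          # interaction values are symmetric
    for i in players:
        phi[i][i] = shapley(i) - sum(phi[i][j] for j in players if j != i)
    return phi
```

For a two-player game that pays 1 only when both players join, the entire Shapley value of each player (0.5) is attributed to the interaction phi_01 = 0.5, and both main effects are zero.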
DeepSHAP is an algorithm for estimating SHAP values for deep learning models, including neural networks built with TensorFlow, Keras, and PyTorch. It builds on a connection between SHAP and DeepLIFT, an earlier attribution method by Shrikumar, Greenside, and Kundaje (2017).
DeepLIFT assigns attribution scores by comparing each neuron's activation to a reference activation and propagating contribution scores backward through the network using custom rules. Lundberg and Lee showed that the per-neuron attribution rules in DeepLIFT can be chosen to approximate Shapley values.
DeepSHAP extends DeepLIFT in two ways:
1. Instead of a single reference input, DeepSHAP uses a distribution of background samples, averaging the DeepLIFT attributions over multiple reference points. This better approximates the expectation over absent features.
2. DeepSHAP uses Shapley equations to linearize nonlinear operations such as max, softmax, and element-wise products, improving the quality of the approximation for complex architectures.
The modified backpropagation rules define "multipliers" at each layer that relate input differences to output differences, analogous to how gradients flow backward through the network. These multipliers follow a chain rule through the layers, enabling efficient attribution computation in a single backward pass.
DeepSHAP computes approximate, not exact, SHAP values. The approximation quality depends on the network architecture and the choice of background samples. For architectures with complex interactions between layers, the layer-wise decomposition may not perfectly correspond to true Shapley values. KernelSHAP can be used as a model-agnostic alternative for deep learning models when higher accuracy is needed, though at much greater computational cost.
Beyond KernelSHAP, TreeSHAP, and DeepSHAP, the SHAP framework includes several additional algorithms:
| Method | Target model type | Approach | Exact or approximate |
|---|---|---|---|
| KernelSHAP | Any (model-agnostic) | Weighted linear regression with SHAP kernel | Approximate |
| TreeSHAP | Tree ensembles | Recursive tree traversal | Exact |
| DeepSHAP | Neural networks | Modified DeepLIFT backpropagation | Approximate |
| LinearSHAP | Linear models | Analytical computation from model weights | Exact |
| Permutation SHAP | Any (model-agnostic) | Antithetic permutation sampling | Approximate (satisfies efficiency exactly) |
| Partition SHAP | Any (model-agnostic) | Hierarchical feature partitioning (Owen values) | Approximate |
| Sampling SHAP | Any (model-agnostic) | Monte Carlo sampling of permutations | Approximate |
LinearSHAP is particularly simple: for a linear model f(x) = beta_0 + sum of beta_j * x_j, the SHAP value for feature j is phi_j = beta_j * (x_j - E[X_j]). No iterative estimation is needed.
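A minimal sketch of this computation (illustrative function name; assumes feature independence, with E[X_j] estimated from a background sample):

```python
def linear_shap(beta0, beta, x, background):
    """SHAP values for a linear model f(x) = beta0 + sum_j beta[j] * x[j].

    Under feature independence, phi_j = beta[j] * (x[j] - E[X_j]),
    with E[X_j] estimated as the column mean of the background data.
    """
    M = len(beta)
    means = [sum(row[j] for row in background) / len(background) for j in range(M)]
    return [beta[j] * (x[j] - means[j]) for j in range(M)]
```

By construction the values sum to f(x) - E[f(X)], since for a linear model E[f(X)] = beta0 + sum_j beta[j] * E[X_j].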
Permutation SHAP generates random feature orderings, builds coalitions progressively, and estimates marginal contributions by comparing predictions with and without each feature. The method uses antithetic sampling (processing each permutation both forward and backward) to reduce variance. It has the advantage that the SHAP values always sum exactly to f(x) - E[f(X)] regardless of the number of samples.
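A compact sketch of the idea (plain Python, illustrative names; for simplicity a single background row stands in for the background dataset the library averages over):

```python
import random

def permutation_shap(f, x, background, n_permutations=10, seed=0):
    """Permutation SHAP with antithetic sampling.

    f: prediction function over a feature list; x: instance to explain;
    background: one reference row supplying "absent" feature values.
    Each random ordering is walked forward and backward; the marginal
    contribution of introducing each feature is accumulated and averaged.
    Because each walk telescopes from f(background) to f(x), the returned
    values sum exactly to f(x) - f(background) for any sample size.
    """
    rng = random.Random(seed)
    M = len(x)
    phi = [0.0] * M
    for _ in range(n_permutations):
        order = list(range(M))
        rng.shuffle(order)
        for perm in (order, order[::-1]):        # antithetic pair
            current = list(background)
            prev = f(current)
            for j in perm:
                current[j] = x[j]                # introduce feature j
                cur = f(current)
                phi[j] += cur - prev             # marginal contribution
                prev = cur
    return [p / (2 * n_permutations) for p in phi]
```

For f(x) = x_0 * x_1 + x_2 with a zero background, the antithetic pairing splits the interaction term exactly in half between features 0 and 1 on every pair of walks, so the estimate is exact here even with few permutations.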
The SHAP library provides a suite of visualization tools that have become standard in machine learning practice. These visualizations operate at both the individual prediction level (local explanations) and the dataset level (global explanations).
The waterfall plot explains a single prediction by showing how each feature's SHAP value pushes the prediction from the base value (expected model output) to the final prediction. It starts at the bottom with phi_0 = E[f(X)] and shows each feature's contribution as a colored bar: red bars push the prediction higher, and blue bars push it lower. Features are ordered by the magnitude of their SHAP values. The waterfall structure visually reinforces the additive nature of SHAP: the positive and negative contributions stack up to produce the final prediction at the top.
The force plot presents the same information as the waterfall plot in a more compact, horizontal format. Positive SHAP values (features pushing the prediction up) appear on one side and negative SHAP values (features pushing it down) appear on the other, as if competing against each other. Force plots can also be stacked vertically for multiple instances, creating an interactive visualization that reveals patterns across a dataset. When instances are ordered by prediction value or similarity, the stacked force plot shows how feature effects shift across the data distribution.
The summary plot provides a global view of feature importance and effects. For each feature, it plots every instance's SHAP value as a point along the horizontal axis, with features ordered vertically by mean absolute SHAP value (importance). Points are colored by the feature's actual value (e.g., red for high, blue for low). This produces a "beeswarm" pattern that reveals the importance ranking of features, the spread of each feature's effect across the dataset, and the direction of each effect (for example, whether high feature values tend to push predictions up or down).
The bar plot is a simpler global importance summary. It displays the mean absolute SHAP value for each feature, averaged across all instances. This single number per feature measures the average magnitude of each feature's contribution, regardless of direction. It serves as a SHAP-based analog of traditional feature importance rankings but is grounded in a theoretically sound attribution framework.
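The underlying aggregation is straightforward; a minimal sketch (illustrative helper name, operating on a precomputed matrix of SHAP values):

```python
def mean_abs_shap(shap_matrix):
    """Per-feature global importance: mean absolute SHAP value across instances.

    shap_matrix: one row per instance, one column per feature.
    Returns one importance score per feature, as shown in the bar plot.
    """
    n = len(shap_matrix)
    return [sum(abs(row[j]) for row in shap_matrix) / n
            for j in range(len(shap_matrix[0]))]
```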
The SHAP dependence plot shows the relationship between a feature's value (x-axis) and its SHAP value (y-axis) across all instances. Each dot represents one data point. The plot reveals nonlinear relationships, thresholds, and saturation effects that the model has learned. It can be enhanced by coloring points according to a second feature's value, which highlights interaction effects. For example, if the effect of "age" on a prediction depends on "income," this interaction will appear as distinct color patterns in the dependence plot.
The decision plot shows how a model arrives at a prediction by tracing the cumulative sum of SHAP values from the base value to the final prediction. Features are listed vertically (typically ordered by importance), and the cumulative contribution is traced as a line from bottom to top. When multiple instances are plotted together, crossing or diverging lines reveal where different predictions start to differ, making it useful for comparing groups of predictions or identifying outliers.
SHAP and LIME are two of the most widely used model-agnostic explanation methods. While both explain individual predictions, they differ in theoretical foundation, stability, and scope.
| Aspect | SHAP | LIME |
|---|---|---|
| Theoretical basis | Shapley values from cooperative game theory | Local surrogate model (weighted linear regression) |
| Axioms satisfied | Local accuracy, missingness, consistency | No formal guarantees |
| Stability | Deterministic for exact methods (TreeSHAP, LinearSHAP); KernelSHAP has sampling variance that shrinks as more coalitions are sampled | Can produce different explanations on repeated runs due to random perturbations |
| Global interpretability | Yes; SHAP values aggregate naturally to global summaries | Primarily local; global aggregation is ad hoc |
| Computational cost | Expensive for KernelSHAP; fast for TreeSHAP and LinearSHAP | Generally faster for a single explanation |
| Model-specific speedups | TreeSHAP, DeepSHAP, LinearSHAP | None; always model-agnostic |
| Feature interactions | Supported (SHAP interaction values) | Not natively supported |
| Visualization ecosystem | Extensive built-in visualizations | Limited; typically custom plots |
| Ease of understanding | Requires understanding of Shapley values | Conceptually simpler (local linear approximation) |
In practice, SHAP tends to be preferred when consistency, stability, and theoretical rigor are priorities, especially for tree-based models where TreeSHAP makes computation efficient. LIME may be preferred for quick, one-off explanations where speed matters more than formal guarantees, or when the audience benefits from LIME's more intuitive local-linear framing.
SHAP has been adopted across numerous fields where understanding model predictions is important for trust, regulation, or scientific insight.
SHAP is widely used to explain predictions from clinical risk models, diagnostic classifiers, and drug response predictors. Lundberg et al. (2020) applied TreeSHAP to explain predictions of a gradient-boosted tree model for anesthesia hypoxemia risk, demonstrating how SHAP values could help clinicians understand which patient characteristics (e.g., BMI, age, procedure type) contributed to elevated risk. The method has also been applied to explain models predicting hospital readmission, disease diagnosis, and treatment outcomes.
In financial services, regulatory requirements (such as the EU's General Data Protection Regulation and the Equal Credit Opportunity Act in the United States) often mandate that automated decisions be explainable. SHAP provides feature-level explanations for credit scoring models, fraud detection systems, and algorithmic trading strategies. SHAP values help identify which financial indicators (income, debt ratio, payment history) drive individual credit decisions.
SHAP is applied in fraud detection pipelines to explain why a particular transaction was flagged as suspicious. Summary plots and force plots help investigators understand which transaction features (amount, time, location, merchant category) contributed most to the fraud score, allowing analysts to prioritize investigations and refine detection rules.
In NLP, SHAP can explain text classification, sentiment analysis, and named entity recognition predictions by attributing importance to individual words or tokens. KernelSHAP and Partition SHAP are typically used because text inputs do not naturally fit the tree or deep learning-specific methods. The SHAP library includes specialized support for text data through its masker and tokenizer classes.
SHAP has been used to interpret models predicting air quality, wildfire risk, crop yield, and climate variables. In these domains, understanding feature contributions helps researchers validate that models are learning physically meaningful relationships rather than spurious correlations.
The computational cost of SHAP varies dramatically depending on the algorithm used:
| Algorithm | Time complexity | Space complexity | Notes |
|---|---|---|---|
| Exact Shapley values | O(2^M * T_f) | O(M) | Exponential in number of features M; infeasible beyond approximately 20 features |
| KernelSHAP | O(K * T_f) | O(K * M) | K = number of sampled coalitions; T_f = cost of one prediction; K must be large for stability |
| TreeSHAP | O(T * L * D^2) | O(T * L * D) | T = trees, L = max leaves, D = max depth; polynomial and fast |
| DeepSHAP | O(B * T_bp) | O(model size) | B = background samples; T_bp = cost of one backward pass; roughly B times a single backprop |
| LinearSHAP | O(M) | O(M) | Analytical; trivially fast |
| Permutation SHAP | O(P * M * T_f) | O(M) | P = number of permutations (default around 10) |
For tree-based models, TreeSHAP makes SHAP practical even for datasets with millions of instances and hundreds of features. For deep learning models, DeepSHAP provides a reasonable balance between accuracy and speed. For truly model-agnostic explanations, KernelSHAP or Permutation SHAP are available but may be slow for large feature sets or expensive models.
Jilei Yang (2021) proposed Fast TreeSHAP, which further accelerates TreeSHAP by reducing redundant computations, achieving speedups of 1.5x to 20x over the original TreeSHAP implementation depending on the model structure.
KernelSHAP and Permutation SHAP require many model evaluations per explained instance. For models that are expensive to evaluate (large neural networks, complex ensembles), computing SHAP values for every instance in a test set can be prohibitively slow.
The standard SHAP formulation (interventional, using marginal distributions) assumes that features are independent when simulating absent features. When features are correlated, this means the algorithm evaluates the model on data points that may be unrealistic or impossible in practice. For example, if height and weight are positively correlated, the marginal approach might evaluate the model on a combination of very tall height and very low weight. This can lead to misleading attributions. The conditional approach addresses this but introduces other problems (see below).
Using conditional expectations E[f(X) | X_S = x_S] instead of marginal expectations respects the data distribution but can violate the dummy axiom: a feature that the model does not use may still receive a nonzero SHAP value if it is correlated with a feature the model does use. There is ongoing debate in the research community about which approach is more appropriate, and the choice can affect interpretation.
SHAP values explain how features contribute to a model's prediction, not how features relate to the real-world outcome. A feature can receive a high SHAP value because the model relies on it, even if the feature is a spurious correlate of the true causal mechanism. SHAP does not distinguish between causal and correlational relationships.
The base value phi_0 (typically E[f(X)]) represents the prediction when no features are known. The choice of background dataset used to estimate this expectation can influence the resulting SHAP values. Different background datasets (the full training set, a subsample, or a specific reference group) can produce different explanations.
Slack et al. (2020) demonstrated that it is possible to construct models that intentionally produce misleading SHAP explanations. A model can be designed to behave differently on the realistic data points used for prediction than on the unrealistic combinations generated during the SHAP estimation process, effectively hiding biased behavior from SHAP-based audits.
When the number of features is very large (thousands or more), individual SHAP values become difficult to interpret and visualize. Feature grouping or hierarchical SHAP approaches (such as Partition SHAP with Owen values) have been proposed to address this, but they introduce additional modeling choices.
When features are highly multicollinear, SHAP may assign a large attribution to one of the correlated features and near-zero attribution to the others, even if all are genuinely relevant. This can mislead users into thinking certain features are unimportant when they are actually informative but redundant with another feature.
The primary implementation of SHAP is the open-source Python package shap, originally authored by Scott Lundberg and now community-maintained. It is available via pip (pip install shap) and conda (conda install -c conda-forge shap). The package supports Python 3.11 and later.
SHAP support is also integrated directly into several popular machine learning libraries:
| Library | SHAP support |
|---|---|
| XGBoost | Built-in TreeSHAP via model.predict(data, pred_contribs=True) |
| LightGBM | Built-in TreeSHAP support |
| CatBoost | Built-in SHAP value computation |
| scikit-learn | Compatible via shap.TreeExplainer for tree models |
| PiML | Includes SHAP as an explanation module |
In R, the shapviz and fastshap packages provide SHAP value computation and visualization, and the xgboost R package includes native SHAP support.
The development of SHAP has been documented across several papers:
| Year | Paper | Contribution |
|---|---|---|
| 1953 | Shapley, "A Value for n-Person Games" | Introduced Shapley values in cooperative game theory |
| 2016 | Ribeiro et al., "Why Should I Trust You?" | Introduced LIME, which SHAP later unified |
| 2017 | Lundberg and Lee, "A Unified Approach to Interpreting Model Predictions" | Introduced SHAP framework, KernelSHAP, proved uniqueness theorem |
| 2017 | Shrikumar et al., "Learning Important Features Through Propagating Activation Differences" | Introduced DeepLIFT, foundation for DeepSHAP |
| 2018 | Lundberg, Erion, and Lee, "Consistent Individualized Feature Attribution for Tree Ensembles" | Introduced TreeSHAP |
| 2020 | Lundberg et al., "From Local Explanations to Global Understanding with Explainable AI for Trees" | Extended TreeSHAP with interaction values, global visualizations; published in Nature Machine Intelligence |
| 2020 | Slack et al., "Fooling LIME and SHAP" | Demonstrated adversarial vulnerabilities of explanation methods |
| 2021 | Yang, "Fast TreeSHAP" | Accelerated TreeSHAP computation |