See also: Machine learning terms
Variable importance, also called feature importance, is a score assigned to each input variable of a predictive model that measures how much that variable contributes to the model's output. The score gives a ranking of inputs from most influential to least influential. It is one of the oldest and most heavily used tools in explainable AI, used by data scientists and statisticians to interpret models, select features, debug data pipelines, and produce regulatory explanations for high-stakes decisions.
The term covers a large family of methods. Some are model specific (such as the impurity reduction reported by a random forest or the absolute coefficients of a linear model), and others are model agnostic (such as permutation importance, SHAP and LIME). They differ in what they measure, how they compute the measurement, and what assumptions they make about the data. A naive practitioner who reads a single bar chart and concludes that the most important feature is the most causally relevant has likely made a mistake. Importance values reflect the model's behaviour on a particular dataset, not the underlying physics of the world.
There is no single canonical use case. Practitioners reach for variable importance in several different settings.
| Use case | What the score answers | Example |
|---|---|---|
| Model interpretation | Which inputs drive predictions in this model | Showing a credit risk model leans heavily on debt-to-income ratio |
| Feature engineering | Which inputs to keep, drop, or transform | Removing low-importance columns to shrink a tabular pipeline |
| Debugging | Whether the model relies on something it should not | Detecting that an image classifier latched onto a watermark |
| Regulatory explanation | What factors led to a specific decision | Explaining a denied loan under GDPR or the EU AI Act |
| Scientific discovery | Which variables warrant further study | Highlighting genes whose expression correlates with disease state |
| Fairness audit | Whether protected attributes leak through proxies | Spotting that ZIP code is acting as a stand-in for race |
These goals overlap, but they are not identical. A score that is fine for ranking columns by predictive power is not necessarily a good causal explanation, and a metric that works on the training set may give a different answer on held-out data. Choosing a method that fits the goal is half the work.
Variable importance methods fall into two broad camps. Model-specific methods exploit the structure of a particular learner (a tree, a linear model, a neural network) to compute importance cheaply. Model-agnostic methods treat the trained model as a black box and probe it with carefully chosen inputs. Each camp has trade-offs.
| Family | Examples | Pros | Cons |
|---|---|---|---|
| Linear coefficients | Absolute standardised coefficients, t-statistics, LASSO selection | Direct interpretation, essentially free | Requires comparable scales, misleading under multicollinearity |
| Tree impurity (MDI) | sklearn feature_importances_, R randomForest::importance | Free during training | Biased toward high-cardinality features, computed on training data |
| Tree gain or split count | XGBoost gain, weight, cover | Cheap, integrates with boosted ensembles | Different metrics give different rankings |
| Permutation importance | sklearn.inspection.permutation_importance | Model agnostic, uses real loss | Misleading with correlated features, costly for large data |
| Shapley values | SHAP, TreeSHAP, KernelSHAP | Strong axiomatic basis, local and global | Expensive in general, choice of background matters |
| Local linear surrogates | LIME, Anchors | Works on any model, easy to grasp | Stochastic, sensitive to neighbourhood definition |
| Counterfactuals | Wachter counterfactuals, DiCE | Fits regulatory framing | Many valid counterfactuals exist, no single ranking |
| Gradient attributions | Saliency, Integrated Gradients, DeepLIFT, Grad-CAM | Designed for differentiable models | Sensitive to baseline, gradient saturation |
For a linear model with response y = w_1 x_1 + ... + w_d x_d + b, the absolute value of each coefficient |w_j| is the standard variable importance. The interpretation is direct: |w_j| measures how much the prediction changes when x_j changes by one unit, holding the other inputs fixed. The score makes sense only if the inputs are on comparable scales, so practitioners standardise the features (subtracting the mean and dividing by the standard deviation) before reading off importances.
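A minimal sketch of the standardise-then-rank recipe, using scikit-learn on synthetic data (the dataset and model here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic regression data; which features carry signal is unknown to the ranking step
X, y = make_regression(n_samples=500, n_features=6, n_informative=3, random_state=0)

# Standardise so coefficient magnitudes are comparable across features
X_std = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_std, y)

# |w_j| on standardised inputs is the usual linear-model importance score
importance = np.abs(model.coef_)
for j in np.argsort(importance)[::-1]:
    print(f"x_{j}: {importance[j]:.3f}")
```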
Regularised linear models add a second layer of importance signal. LASSO (L1 regularisation) drives many coefficients to exactly zero, so the surviving features form a selected subset. Ridge regression (L2) keeps all features but shrinks correlated ones together. Elastic net interpolates between the two. The shrinkage path itself, plotted as features enter the active set as the regularisation parameter decreases, is sometimes used as an importance ranking.
The coefficient interpretation breaks down when features are highly correlated. Two near-duplicate columns can split the credit and end up with small coefficients each, even though the underlying signal is strong. Variance inflation factors and condition numbers help diagnose multicollinearity before trusting coefficient magnitudes.
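The variance inflation factor check can be run with statsmodels; a small sketch on deliberately near-duplicate columns (the data and the 5-10 threshold in the comment are a common rule of thumb, not a result from the text above):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
# Two near-duplicate columns plus one independent column (illustrative data)
a = rng.normal(size=500)
X = pd.DataFrame({"a": a,
                  "a_copy": a + rng.normal(scale=0.1, size=500),
                  "b": rng.normal(size=500)})

X_const = add_constant(X)  # VIF is conventionally computed with an intercept column
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
    index=X_const.columns,
).drop("const")
# Rule of thumb: values above roughly 5-10 flag problematic multicollinearity
print(vif.sort_values(ascending=False))
```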
Methods built on trees are everywhere because tabular data is everywhere. The two dominant scores are mean decrease in impurity and gain.
Mean decrease in impurity (MDI), the default for sklearn's RandomForestClassifier.feature_importances_ and GradientBoostingRegressor.feature_importances_, sums the reduction in Gini impurity (or entropy or mean squared error) at every split that uses a given feature, weighted by the fraction of training samples reaching that node. The values are normalised across all features to sum to one. MDI is essentially free because the impurity reductions are already computed during tree induction, which is one reason it became the default.
The MDI score has a well documented bias. Carolin Strobl and colleagues showed in 2007 that random forest variable importance measures are unreliable when potential predictors vary in their scale or number of categories. High-cardinality features (such as ZIP code, user ID, or any categorical with many levels) and continuous features with many unique values have more candidate split points than low-cardinality features, so they are more likely to be picked at random and end up with inflated MDI scores. Bootstrap sampling with replacement compounds the bias. The recommended fixes include conditional inference forests with unbiased split selection, holding out validation data for the importance calculation, and switching to permutation importance as a sanity check.
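The bias is easy to reproduce. A sketch (synthetic data, illustrative) that appends a pure-noise, high-cardinality column to a random forest and compares its MDI score with its permutation importance on held-out data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
# Append a pure-noise column with many unique values (a stand-in for a user ID)
X = np.column_stack([X, rng.integers(0, 2000, size=2000)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# MDI (training-time impurity reductions) tends to inflate the noise column
print("MDI:", model.feature_importances_.round(3))

# Permutation importance on held-out data typically scores it near zero
perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print("Permutation:", perm.importances_mean.round(3))
```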
Tree based libraries report several alternative importance scores, each measuring a slightly different thing.
| Score | What it counts | Where it shows up |
|---|---|---|
| MDI / mean decrease in Gini | Total impurity reduction, weighted by samples | sklearn feature_importances_, randomForest importance(type=2) |
| Mean decrease in accuracy (MDA) | Drop in OOB accuracy after permuting the feature | randomForest importance(type=1) |
| Gain | Sum of loss reduction at every split using the feature | XGBoost (default for sklearn API), LightGBM split_gain |
| Weight (or split count) | Number of times the feature is used to split | XGBoost weight, LightGBM splits |
| Cover | Number (or sum of Hessians) of training samples touched by splits on the feature | XGBoost cover, total_cover |
XGBoost makes the choice of metric explicit. The library exposes gain, weight, cover, total_gain, and total_cover. Gain is usually the most informative, weight rewards features that are used as splits often even when each split improves the loss only a little, and cover weighs by the number of training samples affected. The three metrics can disagree dramatically on the same model, which is why practitioners now treat XGBoost importance as a starting point rather than a definitive answer and follow up with SHAP.
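A sketch of pulling the different metrics from one fitted model, assuming the xgboost package and its scikit-learn wrapper:

```python
import xgboost as xgb
from sklearn.datasets import make_regression

# Illustrative data and model
X, y = make_regression(n_samples=1000, n_features=8, random_state=0)
model = xgb.XGBRegressor(n_estimators=100, random_state=0).fit(X, y)

# The same trees, scored three different ways; the rankings often disagree
booster = model.get_booster()
for metric in ("gain", "weight", "cover"):
    print(metric, booster.get_score(importance_type=metric))
```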
Permutation importance was introduced by Leo Breiman in his 2001 paper Random Forests. To measure the importance of feature j, take a fitted model, shuffle the column for j across rows, recompute the model's prediction performance, and record the drop. A large drop means the model relied heavily on j; a small drop means the model could shrug off the loss of that signal. The procedure is fully model agnostic because it only needs to call the model's prediction function.
scikit-learn implements this through sklearn.inspection.permutation_importance. The function repeats the shuffle several times (typically n_repeats=10 or higher) and reports both the mean and the standard deviation of the importance, which lets users distinguish features whose scores are reliably above zero from those indistinguishable from random noise.
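A minimal sketch of the call on a held-out split (dataset and model choices are illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Shuffle each column 30 times on the held-out set and record the drop in score
result = permutation_importance(model, X_te, y_te, n_repeats=30, random_state=0)
for name, mean, std in zip(X.columns, result.importances_mean, result.importances_std):
    print(f"{name:>6}: {mean:.3f} +/- {std:.3f}")
```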
Fisher, Rudin, and Dominici generalised the idea in 2018 with model class reliance, which uses permutation to measure how much any model in a hypothesis class would have to rely on a feature to achieve a given accuracy. The framework formalises the intuition that a single trained model's importance score is one estimate of an underlying quantity, not the quantity itself.
Permutation importance has a known failure mode. When two features are strongly correlated, permuting one does not break the model, because the other still carries the same information. The model's score barely drops, so both features get a low permutation importance even though the joint signal is critical. Strobl and colleagues proposed conditional permutation importance in 2008, where the column is permuted within strata defined by correlated features, breaking the marginal but preserving the joint. The conditional version is harder to compute and depends on choosing the right stratification, but it is the cleanest fix when correlations cannot be ignored.
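A rough sketch of the conditional idea, not the Strobl et al. procedure verbatim: permute the column only within quantile bins of a correlated feature, so the marginal signal is broken while the joint structure is approximately preserved (the binning scheme and the helper name are illustrative choices):

```python
import numpy as np
import pandas as pd

def conditional_permutation(df, target_col, condition_col, n_bins=10, seed=0):
    """Shuffle target_col within quantile bins of condition_col (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    bins = pd.qcut(df[condition_col], q=n_bins, duplicates="drop")
    for _, idx in df.groupby(bins, observed=True).groups.items():
        out.loc[idx, target_col] = rng.permutation(out.loc[idx, target_col].to_numpy())
    return out

# The permuted frame is then scored with the fitted model, exactly as in ordinary
# permutation importance, and the drop in performance is recorded per feature.
```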
Shapley values come from cooperative game theory. They divide the total payout of a coalition of players among individual players in the unique way that satisfies efficiency, symmetry, dummy, and additivity axioms. Scott Lundberg and Su-In Lee, in their 2017 NeurIPS paper A Unified Approach to Interpreting Model Predictions, applied this idea to model explanations. They treat each feature as a player and the model's prediction (relative to a baseline) as the payout, then assign each feature its Shapley value. The resulting framework, SHAP, turned out to unify several earlier explainers (LIME, DeepLIFT, layer-wise relevance propagation) under one set of axioms and proved that a unique solution exists in this class.
The naive computation is exponential in the number of features because the Shapley value averages contributions over all subsets. Practical implementations approximate. KernelSHAP regresses a weighted local linear model over sampled coalitions, similar in spirit to LIME but with the Shapley weighting kernel. DeepSHAP adapts DeepLIFT for neural networks. The most successful is TreeSHAP, presented by Lundberg, Erion, and Lee in 2018 and expanded in their 2020 Nature Machine Intelligence paper From local explanations to global understanding with explainable AI for trees. TreeSHAP exploits the tree structure to compute exact Shapley values for tree ensembles in polynomial time, reducing the cost from O(T L 2^M) to O(T L D^2), where T is the number of trees, L the maximum number of leaves, D the maximum depth, and M the number of features. The same paper introduces SHAP interaction values, which separate main effects from pairwise interactions, and SHAP summary plots, which combine local and global views in a single chart.
SHAP values are local: each prediction has its own attribution. Aggregating local SHAP values across a dataset (taking the mean absolute SHAP value per feature) gives a global importance ranking that respects the same axioms.
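In code the aggregation is a one-liner; the SHAP matrix below is a random placeholder standing in for the output of an explainer such as the TreeExplainer shown further down:

```python
import numpy as np

# Placeholder matrix of per-instance SHAP values, shape (n_samples, n_features);
# in practice this comes from a call such as shap.TreeExplainer(model).shap_values(X_test)
shap_values = np.random.default_rng(0).normal(size=(100, 8))

# Mean absolute SHAP value per feature gives the global importance ranking
global_importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(global_importance)[::-1]  # most important feature first
print(ranking, global_importance[ranking].round(3))
```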
The SHAP framework also brought a long-running theoretical debate to the surface. The original paper used conditional expectations to define the value of a coalition (what does the model predict given the features in the coalition, integrating over the rest in a way that respects the data distribution). Dominik Janzing, Lenon Minorics, and Patrick Bloebaum argued in their 2020 paper Feature relevance quantification in explainable AI: A causal problem that this conflates correlation with causation. They show that unconditional (interventional) expectations match Pearl's interventional semantics and avoid attributing importance to a feature that has no causal effect on the prediction but happens to correlate with one that does. The two definitions agree when features are independent and diverge when they are correlated, which is the more interesting case in practice. Modern SHAP implementations expose both modes; users have to pick.
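The shap library exposes the choice through the feature_perturbation argument of TreeExplainer; a sketch assuming a fitted tree model and training frame as in the worked example further down (argument spellings should be checked against the installed release):

```python
import shap

# Conditional / path-dependent expectations: uses the trees' own cover statistics
explainer_cond = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")

# Interventional expectations: breaks feature dependence against a background sample
background = X_train.sample(200, random_state=0)
explainer_int = shap.TreeExplainer(model, data=background,
                                   feature_perturbation="interventional")

# With correlated features the two attributions can differ substantially
```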
LIME (Local Interpretable Model-agnostic Explanations) was introduced by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin at KDD 2016. The recipe is short: pick a prediction to explain, sample perturbed instances around it, weight the samples by proximity, fit a sparse linear model on those weighted samples, and present the linear coefficients as the explanation. LIME treats the original model as a black box and assumes that any complex decision boundary looks linear when zoomed in close enough.
LIME's strength is interpretability. A linear model with five non-zero terms is easy to read. Its weakness is stability. Small changes in the perturbation kernel, the sampling seed, or the choice of distance metric can produce different explanations for the same instance, which makes it hard to use LIME for high-stakes decisions without ensembling or careful tuning.
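A minimal sketch with the lime package on tabular data (dataset, model, and the number of reported terms are illustrative):

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
feature_names = load_diabetes().feature_names
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=feature_names, mode="regression")

# Explain one prediction: perturb around the instance, fit a weighted sparse
# linear model, and report its largest coefficients
exp = explainer.explain_instance(X[0], model.predict, num_features=5)
print(exp.as_list())
```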
Anchors, introduced by the same authors at AAAI 2018, replaces the linear surrogate with a high-precision rule. An anchor is an if-then condition on the inputs (for example, if income > 50k AND age < 35) that locks in the prediction with a stated probability across the perturbed neighbourhood. The result is a sufficient explanation rather than a continuous attribution, which fits cases where users want to understand which feature values guarantee the outcome.
Counterfactual explanations approach the problem differently. Instead of asking how each feature contributed, they ask for the smallest change to the input that would flip the prediction. Sandra Wachter, Brent Mittelstadt, and Chris Russell introduced unconditional counterfactual explanations in 2017 specifically as a way to satisfy the GDPR's right to explanation requirements without revealing model internals. A loan denial counterfactual might say that if your annual income had been 5,000 higher, the model would have approved the loan, which is actionable in a way that a SHAP plot sometimes is not.
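Libraries such as DiCE automate the search, but the idea fits in a few lines. A toy sketch, not the Wachter et al. optimisation, that increases a single feature (a hypothetical income column index) until a binary prediction flips:

```python
import numpy as np

def smallest_flip(model, x, feature_idx, step=100.0, max_steps=1000):
    """Increase one feature until the predicted class flips (toy sketch)."""
    original = model.predict(x.reshape(1, -1))[0]
    candidate = x.copy()
    for i in range(1, max_steps + 1):
        candidate[feature_idx] = x[feature_idx] + i * step
        if model.predict(candidate.reshape(1, -1))[0] != original:
            return candidate, i * step  # the counterfactual input and the change needed
    return None, None  # no flip found within the search range
```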
Neural networks are differentiable, which opens a different family of attribution methods. The simplest is the gradient of the output with respect to the input (the saliency map), but raw gradients have known problems: they saturate when activations are flat, they can give zero attribution to features the network clearly uses, and they fail certain natural axioms.
Integrated Gradients, proposed by Mukund Sundararajan, Ankur Taly, and Qiqi Yan at ICML 2017 in Axiomatic Attribution for Deep Networks, addresses these problems. The method picks a baseline (often the all-zero input or a blurred version of the image) and integrates the gradient along a straight line from the baseline to the actual input. The integral satisfies two axioms the authors call sensitivity and implementation invariance, which other gradient methods violate. In practice the integral is approximated with a Riemann sum over twenty to a few hundred steps, and the resulting attribution sums to the difference between the prediction at the input and the prediction at the baseline.
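With Captum, the PyTorch attribution library listed in the tooling table below, the method is a few lines; the toy network, the zero baseline, and the step count are illustrative choices:

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy two-class differentiable model (illustrative)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

x = torch.randn(1, 10)
baseline = torch.zeros_like(x)  # the all-zero baseline; other choices are valid

ig = IntegratedGradients(model)
# Riemann-sum approximation of the path integral from baseline to input, for class 1
attributions, delta = ig.attribute(x, baselines=baseline, target=1,
                                   n_steps=50, return_convergence_delta=True)

# Completeness: attributions sum (approximately) to f(x) - f(baseline) for the target
print(attributions.sum().item(),
      (model(x)[0, 1] - model(baseline)[0, 1]).item())
```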
DeepLIFT, by Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje at ICML 2017, computes attributions in a single backward pass by comparing each neuron's activation to a reference activation and propagating contribution scores through the network. DeepLIFT separates positive and negative contributions, which can reveal patterns that gradient-based methods miss.
Grad-CAM, by Ramprasaath Selvaraju and colleagues at ICCV 2017, is the standard tool for explaining convolutional networks. It uses the gradients flowing into the last convolutional layer to produce a coarse heat map highlighting the image regions that contributed to a class prediction. Grad-CAM is widely used in medical imaging because the resulting maps line up well with what a radiologist would point to.
Attention weights inside transformer models are sometimes presented as a free explanation, since attention values are already produced during the forward pass. Sarthak Jain and Byron Wallace showed in their 2019 paper Attention is not Explanation that attention weights can often be replaced with very different distributions while keeping the prediction unchanged; Sarah Wiegreffe and Yuval Pinter's 2019 EMNLP rebuttal Attention is not not Explanation pushed back on parts of that argument, but the upshot of the exchange is that attention weights should not be treated as faithful attributions without further validation.
Variable importance is easy to compute and easy to misread. The literature is full of cautionary results.
| Pitfall | What goes wrong | Mitigation |
|---|---|---|
| MDI bias | High-cardinality features inflate scores | Use permutation importance on held-out data, or unbiased trees |
| Permutation with correlated features | Importance is shared across correlated columns and looks weak | Conditional permutation, group features, or look at SHAP interactions |
| Training set scoring | Importance reflects memorisation rather than generalisation | Always compute on a validation or test set |
| Single-method reliance | Different methods give different rankings | Compute several scores and look for agreement |
| Causal vs predictive confusion | High importance does not mean the variable causes the outcome | Use causal inference, do-calculus, or randomised experiments for causal claims |
| Local vs global confusion | The most important feature on average may be irrelevant for an individual prediction | Look at SHAP local explanations, not just the global summary |
| Stochastic methods | LIME and KernelSHAP have run-to-run variance | Average across many seeds, report variance |
| Ignoring interactions | A single number per feature loses signal when features interact | Plot SHAP interaction values, ICE curves, or PDPs |
| Baseline choice for IG | Different baselines yield different attributions | Try several baselines, average, document the choice |
Janzing's critique deserves repeating because it cuts deep. A SHAP value can attribute importance to a feature that has no influence on the model's prediction function whenever that feature is correlated with one that does. The fix (interventional Shapley values) is mathematically clean but breaks the data distribution, which sometimes leads to predictions on impossible inputs. There is no escape from making the trade-off explicit.
A workflow that captures what experienced practitioners actually do when asked to explain a tabular model: start with the model's built-in scores as a cheap first pass, verify them with permutation importance on held-out data, follow up with SHAP for local and global views, and treat disagreement between methods as a prompt to investigate correlations and interactions rather than as noise.
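A cheap agreement check along those lines (placeholder numbers; in practice the two vectors would be, say, MDI and permutation importance for the same features):

```python
import numpy as np
from scipy.stats import spearmanr

# Two importance vectors for the same features (placeholders here; in practice
# e.g. model.feature_importances_ and permutation_importance(...).importances_mean)
mdi = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
perm = np.array([0.35, 0.05, 0.30, 0.20, 0.10])

rho, _ = spearmanr(mdi, perm)
print(f"rank agreement (Spearman rho): {rho:.2f}")  # low rho = investigate further
```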
Most of the methods discussed have battle-tested open source implementations.
| Library | Methods | Notes |
|---|---|---|
| scikit-learn | feature_importances_, permutation_importance | Built into every tree-based estimator |
| SHAP (Python) | KernelSHAP, TreeSHAP, DeepSHAP, GradientSHAP, LinearSHAP | Reference implementation by Scott Lundberg |
| LIME (Python) | Tabular, text, and image explainers | Original implementation by Marco Tulio Ribeiro |
| Captum (PyTorch) | Integrated Gradients, DeepLIFT, GradientSHAP, Saliency, Grad-CAM | Maintained by Meta AI |
| Alibi Explain | Anchors, counterfactuals, integrated gradients | Maintained by Seldon |
| InterpretML | Explainable Boosting Machines plus wrappers around SHAP and LIME | From Microsoft Research |
| H2O | MOJO scoring with SHAP for GBM, GLM, RF | Production-friendly Java runtime |
| XGBoost / LightGBM / CatBoost | Built-in gain, weight, cover plus integrated SHAP | SHAP support is native, not a wrapper |
| dalex | Permutation importance, partial dependence, SHAP, ceteris paribus profiles | R and Python |
| ELI5 | Permutation importance, formatted explanations | Python |
A minimal SHAP example for a gradient boosting model in scikit-learn looks like this.
```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Fit a boosted-tree regressor on the California housing data
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# TreeSHAP computes exact Shapley values for the tree ensemble
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# One row per feature, one dot per test sample, coloured by the feature's value
shap.summary_plot(shap_values, X_test)
```
The summary plot shows each feature on a row, with one dot per test sample positioned by its SHAP value and coloured by the feature's value. This single chart reveals magnitude, direction, and interaction structure at once, which is why it has become the default global explanation in tabular data science.
Feature importance grew up in the world of small tabular models. Two trends are reshaping it.
The first is regulation. Article 22 of the General Data Protection Regulation gives EU citizens the right not to be subject to decisions based solely on automated processing, and Article 15(1)(h) requires meaningful information about the logic involved. The EU AI Act, which became law in 2024 and is rolling out through 2026, adds Article 86, the right to obtain from the deployer clear and meaningful explanations of the role of a high-risk AI system in a decision and the main elements of the decision taken. SHAP and counterfactual explanations are the two methods most cited in compliance discussions, because both produce per-instance explanations that satisfy the meaningful information requirement without forcing the company to reveal the entire model. Insurance, lending, hiring, and healthcare deployments increasingly bundle SHAP explanations with every prediction.
The second is mechanistic interpretability for large language models. Feature importance over input tokens is shallow; modern LLM interpretability digs into the network's internal representations. Anthropic's 2023 paper Towards Monosemanticity by Trenton Bricken and colleagues used sparse autoencoders to decompose a one-layer transformer into 131,072 monosemantic features that humans rated as interpretable about 70 percent of the time. The 2024 follow-up Scaling Monosemanticity by Adly Templeton and colleagues scaled the technique to Claude 3 Sonnet and discovered features for concepts ranging from the Golden Gate Bridge to insecure code. These methods are not labelled feature importance, but they answer the same underlying question (what does this model attend to?) at a much finer grain than input-level scores can. For tabular data, classical importance still rules. For language models, sparse autoencoders and circuit analysis are the new front line.
Variable importance is a tool, not a theorem. Several limitations apply across methods.
A single number per feature loses interaction information. If two features only matter when they appear together (think latitude and longitude), neither will look important in isolation. SHAP interaction values, partial dependence plots, and individual conditional expectation curves recover some of the lost structure.
Local and global importance are different. The feature with the highest mean importance across a dataset may be irrelevant for explaining one specific prediction, and vice versa. Both views matter.
Importance is not causation. Judea Pearl has argued for years that statistical methods alone cannot answer causal questions; you need a causal model. A feature can have high importance because it really drives the outcome, because it acts as a proxy for the true cause, or because the data collection introduced a confounder. Distinguishing the cases requires either a randomised experiment or a causal model with assumptions you are willing to defend.
Finally, importance scores depend on the data the model was trained on. Move the model to a different population, and the importances can shift even if the model's parameters are unchanged. Practitioners working across domains should recompute importance on data drawn from the deployment population, not the training set.
Imagine you are baking a cake and want to know which ingredient matters most for how good it tastes. You could bake the cake the normal way and ask people to score it. Then you bake it again, but secretly swap the sugar for sand. People hate it, so sugar must be very important. You bake again, this time leaving out one tablespoon of vanilla. People barely notice, so vanilla is less important. Doing this for every ingredient gives you a ranking from most important to least important. That is what variable importance does for a machine learning model. The model is the recipe, the ingredients are the input columns, and the ranking tells you which inputs the model leaned on the most when it made its predictions.