See also: Machine learning terms
Variable importance, also called feature importance, is a score assigned to each input variable of a predictive model that measures how much that variable contributes to the model's output. The score gives a ranking of inputs from most influential to least influential. It is one of the oldest and most heavily used tools in explainable AI, used by data scientists and statisticians to interpret models, select features, debug data pipelines, and produce regulatory explanations for high-stakes decisions.
The term covers a large family of methods. Some are model specific (such as the impurity reduction reported by a random forest or the absolute coefficients of a linear model), and others are model agnostic (such as permutation importance, SHAP and LIME). They differ in what they measure, how they compute the measurement, and what assumptions they make about the data. A naive practitioner who reads a single bar chart and concludes that the most important feature is the most causally relevant has likely made a mistake. Importance values reflect the model's behaviour on a particular dataset, not the underlying physics of the world.
There is no single canonical use case. Practitioners reach for variable importance in several different settings.
| Use case | What the score answers | Example |
|---|---|---|
| Model interpretation | Which inputs drive predictions in this model | Showing a credit risk model leans heavily on debt-to-income ratio |
| Feature engineering | Which inputs to keep, drop, or transform | Removing low-importance columns to shrink a tabular pipeline |
| Debugging | Whether the model relies on something it should not | Detecting that an image classifier latched onto a watermark |
| Regulatory explanation | What factors led to a specific decision | Explaining a denied loan under GDPR or the EU AI Act |
| Scientific discovery | Which variables warrant further study | Highlighting genes whose expression correlates with disease state |
| Fairness audit | Whether protected attributes leak through proxies | Spotting that ZIP code is acting as a stand-in for race |
These goals overlap, but they are not identical. A score that is fine for ranking columns by predictive power is not necessarily a good causal explanation, and a metric that works on the training set may give a different answer on held-out data. Choosing a method that fits the goal is half the work.
Variable importance methods fall into two broad camps. Model-specific methods exploit the structure of a particular learner (a tree, a linear model, a neural network) to compute importance cheaply. Model-agnostic methods treat the trained model as a black box and probe it with carefully chosen inputs. Each camp has trade-offs.
| Family | Examples | Pros | Cons |
|---|---|---|---|
| Linear coefficients | Absolute standardised coefficients, t-statistics, LASSO selection | Direct interpretation, essentially free | Requires comparable scales, misleading under multicollinearity |
| Tree impurity (MDI) | sklearn feature_importances_, R randomForest::importance | Free during training | Biased toward high-cardinality features, computed on training data |
| Tree gain or split count | XGBoost gain, weight, cover | Cheap, integrates with boosted ensembles | Different metrics give different rankings |
| Permutation importance | sklearn.inspection.permutation_importance | Model agnostic, uses real loss | Misleading with correlated features, costly for large data |
| Shapley values | SHAP, TreeSHAP, KernelSHAP | Strong axiomatic basis, local and global | Expensive in general, choice of background matters |
| Local linear surrogates | LIME, Anchors | Works on any model, easy to grasp | Stochastic, sensitive to neighbourhood definition |
| Counterfactuals | Wachter counterfactuals, DiCE | Fits regulatory framing | Many valid counterfactuals exist, no single ranking |
| Gradient attributions | Saliency, Integrated Gradients, DeepLIFT, Grad-CAM | Designed for differentiable models | Sensitive to baseline, gradient saturation |
For a linear model with response y = w_1 x_1 + ... + w_d x_d + b, the absolute value of each coefficient |w_j| is the standard variable importance. The interpretation is direct: |w_j| measures how much the prediction changes when x_j changes by one unit, holding the other inputs fixed. The score makes sense only if the inputs are on comparable scales, so practitioners standardise the features (subtracting the mean and dividing by the standard deviation) before reading off importances.
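A minimal sketch of the standardise-then-rank recipe, using scikit-learn on synthetic data (the dataset and model here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic regression data; which features carry signal is unknown to the ranking step
X, y = make_regression(n_samples=500, n_features=6, n_informative=3, random_state=0)

# Standardise so coefficient magnitudes are comparable across features
X_std = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_std, y)

# |w_j| on standardised inputs is the usual linear-model importance score
importance = np.abs(model.coef_)
for j in np.argsort(importance)[::-1]:
    print(f"x_{j}: {importance[j]:.3f}")
```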
Regularised linear models add a second layer of importance signal. LASSO (L1 regularisation) drives many coefficients to exactly zero, so the surviving features form a selected subset. Ridge regression (L2) keeps all features but shrinks correlated ones together. Elastic net interpolates between the two. The shrinkage path itself, plotted as features enter the active set as the regularisation parameter decreases, is sometimes used as an importance ranking.
The coefficient interpretation breaks down when features are highly correlated. Two near-duplicate columns can split the credit and end up with small coefficients each, even though the underlying signal is strong. Variance inflation factors and condition numbers help diagnose multicollinearity before trusting coefficient magnitudes.
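The variance inflation factor check can be run with statsmodels; a small sketch on deliberately near-duplicate columns (the data and the 5-10 threshold in the comment are a common rule of thumb, not a result from the text above):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
# Two near-duplicate columns plus one independent column (illustrative data)
a = rng.normal(size=500)
X = pd.DataFrame({"a": a,
                  "a_copy": a + rng.normal(scale=0.1, size=500),
                  "b": rng.normal(size=500)})

X_const = add_constant(X)  # VIF is conventionally computed with an intercept column
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
    index=X_const.columns,
).drop("const")
# Rule of thumb: values above roughly 5-10 flag problematic multicollinearity
print(vif.sort_values(ascending=False))
```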
Methods built on trees are everywhere because tabular data is everywhere. The two dominant scores are mean decrease in impurity and gain.
Mean decrease in impurity (MDI), the default for sklearn's RandomForestClassifier.feature_importances_ and GradientBoostingRegressor.feature_importances_, sums the reduction in Gini impurity (or entropy or mean squared error) at every split that uses a given feature, weighted by the fraction of training samples reaching that node. The values are normalised across all features to sum to one. MDI is essentially free because the impurity reductions are already computed during tree induction, which is one reason it became the default.
The MDI score has a well documented bias. Carolin Strobl and colleagues showed in 2007 that random forest variable importance measures are unreliable when potential predictors vary in their scale or number of categories. High-cardinality features (such as ZIP code, user ID, or any categorical with many levels) and continuous features with many unique values have more candidate split points than low-cardinality features, so they are more likely to be picked at random and end up with inflated MDI scores. Bootstrap sampling with replacement compounds the bias. The recommended fixes include conditional inference forests with unbiased split selection, holding out validation data for the importance calculation, and switching to permutation importance as a sanity check.
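The bias is easy to reproduce. A sketch (synthetic data, illustrative) that appends a pure-noise, high-cardinality column to a random forest and compares its MDI score with its permutation importance on held-out data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
# Append a pure-noise column with many unique values (a stand-in for a user ID)
X = np.column_stack([X, rng.integers(0, 2000, size=2000)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# MDI (training-time impurity reductions) tends to inflate the noise column
print("MDI:", model.feature_importances_.round(3))

# Permutation importance on held-out data typically scores it near zero
perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print("Permutation:", perm.importances_mean.round(3))
```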
Tree based libraries report several alternative importance scores, each measuring a slightly different thing.
| Score | What it counts | Where it shows up |
|---|---|---|
| MDI / mean decrease in Gini | Total impurity reduction, weighted by samples | sklearn feature_importances_, randomForest importance(type=2) |
| Mean decrease in accuracy (MDA) | Drop in OOB accuracy after permuting the feature | randomForest importance(type=1) |
| Gain | Sum of loss reduction at every split using the feature | XGBoost (default for sklearn API), LightGBM split_gain |
| Weight (or split count) | Number of times the feature is used to split | XGBoost weight, LightGBM splits |
| Cover | Number (or sum of Hessians) of training samples touched by splits on the feature | XGBoost cover, total_cover |
XGBoost makes the choice of metric explicit. The library exposes gain, weight, cover, total_gain, and total_cover. Gain is usually the most informative, weight rewards features that are used as splits often even when each split improves the loss only a little, and cover weighs by the number of training samples affected. The three metrics can disagree dramatically on the same model, which is why practitioners now treat XGBoost importance as a starting point rather than a definitive answer and follow up with SHAP.
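A sketch of pulling the different metrics from one fitted model, assuming the xgboost package and its scikit-learn wrapper:

```python
import xgboost as xgb
from sklearn.datasets import make_regression

# Illustrative data and model
X, y = make_regression(n_samples=1000, n_features=8, random_state=0)
model = xgb.XGBRegressor(n_estimators=100, random_state=0).fit(X, y)

# The same trees, scored three different ways; the rankings often disagree
booster = model.get_booster()
for metric in ("gain", "weight", "cover"):
    print(metric, booster.get_score(importance_type=metric))
```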
Permutation importance was introduced by Leo Breiman in his 2001 paper Random Forests. To measure the importance of feature j, take a fitted model, shuffle the column for j across rows, recompute the model's prediction performance, and record the drop. A large drop means the model relied heavily on j; a small drop means the model could shrug off the loss of that signal. The procedure is fully model agnostic because it only needs to call the model's prediction function.
scikit-learn implements this through sklearn.inspection.permutation_importance. The function repeats the shuffle several times (typically n_repeats=10 or higher) and reports both the mean and the standard deviation of the importance, which lets users distinguish features whose scores are reliably above zero from those indistinguishable from random noise.
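A minimal sketch of the call on a held-out split (dataset and model choices are illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Shuffle each column 30 times on the held-out set and record the drop in score
result = permutation_importance(model, X_te, y_te, n_repeats=30, random_state=0)
for name, mean, std in zip(X.columns, result.importances_mean, result.importances_std):
    print(f"{name:>6}: {mean:.3f} +/- {std:.3f}")
```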
Fisher, Rudin, and Dominici generalised the idea in 2018 with model class reliance, which uses permutation to measure how much any model in a hypothesis class would have to rely on a feature to achieve a given accuracy. The framework formalises the intuition that a single trained model's importance score is one estimate of an underlying quantity, not the quantity itself.
Permutation importance has a known failure mode. When two features are strongly correlated, permuting one does not break the model, because the other still carries the same information. The model's score barely drops, so both features get a low permutation importance even though the joint signal is critical. Strobl and colleagues proposed conditional permutation importance in 2008, where the column is permuted within strata defined by correlated features, breaking the marginal but preserving the joint. The conditional version is harder to compute and depends on choosing the right stratification, but it is the cleanest fix when correlations cannot be ignored.
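A rough sketch of the conditional idea, not the Strobl et al. procedure verbatim: permute the column only within quantile bins of a correlated feature, so the marginal signal is broken while the joint structure is approximately preserved (the binning scheme and the helper name are illustrative choices):

```python
import numpy as np
import pandas as pd

def conditional_permutation(df, target_col, condition_col, n_bins=10, seed=0):
    """Shuffle target_col within quantile bins of condition_col (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    bins = pd.qcut(df[condition_col], q=n_bins, duplicates="drop")
    for _, idx in df.groupby(bins, observed=True).groups.items():
        out.loc[idx, target_col] = rng.permutation(out.loc[idx, target_col].to_numpy())
    return out

# The permuted frame is then scored with the fitted model, exactly as in ordinary
# permutation importance, and the drop in performance is recorded per feature.
```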
Shapley values come from cooperative game theory. They divide the total payout of a coalition of players among individual players in the unique way that satisfies efficiency, symmetry, dummy, and additivity axioms. Scott Lundberg and Su-In Lee, in their 2017 NeurIPS paper A Unified Approach to Interpreting Model Predictions, applied this idea to model explanations. They treat each feature as a player and the model's prediction (relative to a baseline) as the payout, then assign each feature its Shapley value. The resulting framework, SHAP, turned out to unify several earlier explainers (LIME, DeepLIFT, layer-wise relevance propagation) under one set of axioms and proved that a unique solution exists in this class.
The naive computation is exponential in the number of features because the Shapley value averages contributions over all subsets. Practical implementations approximate. KernelSHAP regresses a weighted local linear model over sampled coalitions, similar in spirit to LIME but with the Shapley weighting kernel. DeepSHAP adapts DeepLIFT for neural networks. The most successful is TreeSHAP, presented by Lundberg, Erion, and Lee in 2018 and expanded in their 2020 Nature Machine Intelligence paper From local explanations to global understanding with explainable AI for trees. TreeSHAP exploits the tree structure to compute exact Shapley values for tree ensembles in polynomial time, reducing the cost from O(T L 2^M) to O(T L D^2), where T is the number of trees, L the maximum number of leaves, D the maximum depth, and M the number of features. The same paper introduces SHAP interaction values, which separate main effects from pairwise interactions, and SHAP summary plots, which combine local and global views in a single chart.
SHAP values are local: each prediction has its own attribution. Aggregating local SHAP values across a dataset (taking the mean absolute SHAP value per feature) gives a global importance ranking that respects the same axioms.
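In code the aggregation is a one-liner; the SHAP matrix below is a random placeholder standing in for the output of an explainer such as the TreeExplainer shown further down:

```python
import numpy as np

# Placeholder matrix of per-instance SHAP values, shape (n_samples, n_features);
# in practice this comes from a call such as shap.TreeExplainer(model).shap_values(X_test)
shap_values = np.random.default_rng(0).normal(size=(100, 8))

# Mean absolute SHAP value per feature gives the global importance ranking
global_importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(global_importance)[::-1]  # most important feature first
print(ranking, global_importance[ranking].round(3))
```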
The SHAP framework also brought a long-running theoretical debate to the surface. The original paper used conditional expectations to define the value of a coalition (what does the model predict given the features in the coalition, integrating over the rest in a way that respects the data distribution). Dominik Janzing, Lenon Minorics, and Patrick Bloebaum argued in their 2020 paper Feature relevance quantification in explainable AI: A causal problem that this conflates correlation with causation. They show that unconditional (interventional) expectations match Pearl's interventional semantics and avoid attributing importance to a feature that has no causal effect on the prediction but happens to correlate with one that does. The two definitions agree when features are independent and diverge when they are correlated, which is the more interesting case in practice. Modern SHAP implementations expose both modes; users have to pick.
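The shap library exposes the choice through the feature_perturbation argument of TreeExplainer; a sketch assuming a fitted tree model and training frame as in the worked example further down (argument spellings should be checked against the installed release):

```python
import shap

# Conditional / path-dependent expectations: uses the trees' own cover statistics
explainer_cond = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")

# Interventional expectations: breaks feature dependence against a background sample
background = X_train.sample(200, random_state=0)
explainer_int = shap.TreeExplainer(model, data=background,
                                   feature_perturbation="interventional")

# With correlated features the two attributions can differ substantially
```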
LIME (Local Interpretable Model-agnostic Explanations) was introduced by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin at KDD 2016. The recipe is short: pick a prediction to explain, sample perturbed instances around it, weight the samples by proximity, fit a sparse linear model on those weighted samples, and present the linear coefficients as the explanation. LIME treats the original model as a black box and assumes that any complex decision boundary looks linear when zoomed in close enough.
LIME's strength is interpretability. A linear model with five non-zero terms is easy to read. Its weakness is stability. Small changes in the perturbation kernel, the sampling seed, or the choice of distance metric can produce different explanations for the same instance, which makes it hard to use LIME for high-stakes decisions without ensembling or careful tuning.
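A minimal sketch with the lime package on tabular data (dataset, model, and the number of reported terms are illustrative):

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
feature_names = load_diabetes().feature_names
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=feature_names, mode="regression")

# Explain one prediction: perturb around the instance, fit a weighted sparse
# linear model, and report its largest coefficients
exp = explainer.explain_instance(X[0], model.predict, num_features=5)
print(exp.as_list())
```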
Anchors, introduced by the same authors at AAAI 2018, replaces the linear surrogate with a high-precision rule. An anchor is an if-then condition on the inputs (for example, if income > 50k AND age < 35) that locks in the prediction with a stated probability across the perturbed neighbourhood. The result is a sufficient explanation rather than a continuous attribution, which fits cases where users want to understand which feature values guarantee the outcome.
Counterfactual explanations approach the problem differently. Instead of asking how each feature contributed, they ask for the smallest change to the input that would flip the prediction. Sandra Wachter, Brent Mittelstadt, and Chris Russell introduced unconditional counterfactual explanations in 2017 specifically as a way to satisfy the GDPR's right to explanation requirements without revealing model internals. A loan denial counterfactual might say that if your annual income had been 5,000 higher, the model would have approved the loan, which is actionable in a way that a SHAP plot sometimes is not.
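Libraries such as DiCE automate the search, but the idea fits in a few lines. A toy sketch, not the Wachter et al. optimisation, that increases a single feature (a hypothetical income column index) until a binary prediction flips:

```python
import numpy as np

def smallest_flip(model, x, feature_idx, step=100.0, max_steps=1000):
    """Increase one feature until the predicted class flips (toy sketch)."""
    original = model.predict(x.reshape(1, -1))[0]
    candidate = x.copy()
    for i in range(1, max_steps + 1):
        candidate[feature_idx] = x[feature_idx] + i * step
        if model.predict(candidate.reshape(1, -1))[0] != original:
            return candidate, i * step  # the counterfactual input and the change needed
    return None, None  # no flip found within the search range
```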
Neural networks are differentiable, which opens a different family of attribution methods. The simplest is the gradient of the output with respect to the input (the saliency map), but raw gradients have known problems: they saturate when activations are flat, they can give zero attribution to features the network clearly uses, and they fail certain natural axioms.
Integrated Gradients, proposed by Mukund Sundararajan, Ankur Taly, and Qiqi Yan at ICML 2017 in Axiomatic Attribution for Deep Networks, addresses these problems. The method picks a baseline (often the all-zero input or a blurred version of the image) and integrates the gradient along a straight line from the baseline to the actual input. The integral satisfies two axioms the authors call sensitivity and implementation invariance, which other gradient methods violate. In practice the integral is approximated with a Riemann sum over twenty to a few hundred steps, and the resulting attribution sums to the difference between the prediction at the input and the prediction at the baseline.
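With Captum, the PyTorch attribution library listed in the tooling table below, the method is a few lines; the toy network, the zero baseline, and the step count are illustrative choices:

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy two-class differentiable model (illustrative)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

x = torch.randn(1, 10)
baseline = torch.zeros_like(x)  # the all-zero baseline; other choices are valid

ig = IntegratedGradients(model)
# Riemann-sum approximation of the path integral from baseline to input, for class 1
attributions, delta = ig.attribute(x, baselines=baseline, target=1,
                                   n_steps=50, return_convergence_delta=True)

# Completeness: attributions sum (approximately) to f(x) - f(baseline) for the target
print(attributions.sum().item(),
      (model(x)[0, 1] - model(baseline)[0, 1]).item())
```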
DeepLIFT, by Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje at ICML 2017, computes attributions in a single backward pass by comparing each neuron's activation to a reference activation and propagating contribution scores through the network. DeepLIFT separates positive and negative contributions, which can reveal patterns that gradient-based methods miss.
Grad-CAM, by Ramprasaath Selvaraju and colleagues at ICCV 2017, is the standard tool for explaining convolutional networks. It uses the gradients flowing into the last convolutional layer to produce a coarse heat map highlighting the image regions that contributed to a class prediction. Grad-CAM is widely used in medical imaging because the resulting maps line up well with what a radiologist would point to.
Attention weights inside transformer models are sometimes presented as a free explanation, since attention values are already produced during the forward pass. Sarthak Jain and Byron Wallace showed in their 2019 paper Attention is not Explanation that attention weights can often be replaced with very different distributions while keeping the prediction unchanged; Sarah Wiegreffe and Yuval Pinter's 2019 EMNLP rebuttal Attention is not not Explanation pushed back on parts of that argument, but the upshot of the exchange is that attention weights should not be treated as faithful attributions without further validation.
Variable importance is easy to compute and easy to misread. The literature is full of cautionary results.
| Pitfall | What goes wrong | Mitigation |
|---|---|---|
| MDI bias | High-cardinality features inflate scores | Use permutation importance on held-out data, or unbiased trees |
| Permutation with correlated features | Importance is shared across correlated columns and looks weak | Conditional permutation, group features, or look at SHAP interactions |
| Training set scoring | Importance reflects memorisation rather than generalisation | Always compute on a validation or test set |
| Single-method reliance | Different methods give different rankings | Compute several scores and look for agreement |
| Causal vs predictive confusion | High importance does not mean the variable causes the outcome | Use causal inference, do-calculus, or randomised experiments for causal claims |
| Local vs global confusion | The most important feature on average may be irrelevant for an individual prediction | Look at SHAP local explanations, not just the global summary |
| Stochastic methods | LIME and KernelSHAP have run-to-run variance | Average across many seeds, report variance |
| Ignoring interactions | A single number per feature loses signal when features interact | Plot SHAP interaction values, ICE curves, or PDPs |
| Baseline choice for IG | Different baselines yield different attributions | Try several baselines, average, document the choice |
Janzing's critique deserves repeating because it cuts deep. A SHAP value can attribute importance to a feature that has no influence on the model's prediction function whenever that feature is correlated with one that does. The fix (interventional Shapley values) is mathematically clean but breaks the data distribution, which sometimes leads to predictions on impossible inputs. There is no escape from making the trade-off explicit.
A workflow that captures what experienced practitioners actually do when asked to explain a tabular model: start with the model's built-in scores as a cheap first pass, verify them with permutation importance on held-out data, follow up with SHAP for local and global views, and treat disagreement between methods as a prompt to investigate correlations and interactions rather than as noise.
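A cheap agreement check along those lines (placeholder numbers; in practice the two vectors would be, say, MDI and permutation importance for the same features):

```python
import numpy as np
from scipy.stats import spearmanr

# Two importance vectors for the same features (placeholders here; in practice
# e.g. model.feature_importances_ and permutation_importance(...).importances_mean)
mdi = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
perm = np.array([0.35, 0.05, 0.30, 0.20, 0.10])

rho, _ = spearmanr(mdi, perm)
print(f"rank agreement (Spearman rho): {rho:.2f}")  # low rho = investigate further
```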
Most of the methods discussed have battle-tested open source implementations.
| Library | Methods | Notes |
|---|---|---|
| scikit-learn | feature_importances_, permutation_importance | Built into every tree-based estimator |
| SHAP (Python) | KernelSHAP, TreeSHAP, DeepSHAP, GradientSHAP, LinearSHAP | Reference implementation by Scott Lundberg |
| LIME (Python) | Tabular, text, and image explainers | Original implementation by Marco Tulio Ribeiro |
| Captum (PyTorch) | Integrated Gradients, DeepLIFT, GradientSHAP, Saliency, Grad-CAM | Maintained by Meta AI |
| Alibi Explain | Anchors, counterfactuals, integrated gradients | Maintained by Seldon |
| InterpretML | Explainable Boosting Machines plus wrappers around SHAP and LIME | From Microsoft Research |
| H2O | MOJO scoring with SHAP for GBM, GLM, RF | Production-friendly Java runtime |
| XGBoost / LightGBM / CatBoost | Built-in gain, weight, cover plus integrated SHAP | SHAP support is native, not a wrapper |
| dalex | Permutation importance, partial dependence, SHAP, ceteris paribus profiles | R and Python |
| ELI5 | Permutation importance, formatted explanations | Python |
A minimal SHAP example for a gradient boosting model in scikit-learn looks like this.
```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Fit a boosted-tree regressor on the California housing data
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# TreeSHAP computes exact Shapley values for the tree ensemble
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# One row per feature, one dot per test sample, coloured by the feature's value
shap.summary_plot(shap_values, X_test)
```
The summary plot shows each feature on a row, with one dot per test sample positioned by its SHAP value and coloured by the feature's value. This single chart reveals magnitude, direction, and interaction structure at once, which is why it has become the default global explanation in tabular data science.
Feature importance grew up in the world of small tabular models. Two trends are reshaping it.
The first is regulation. Article 22 of the General Data Protection Regulation gives EU citizens the right not to be subject to decisions based solely on automated processing, and Article 15(1)(h) requires meaningful information about the logic involved. The EU AI Act, which became law in 2024 and is rolling out through 2026, adds Article 86, the right to obtain from the deployer clear and meaningful explanations of the role of a high-risk AI system in a decision and the main elements of the decision taken. SHAP and counterfactual explanations are the two methods most cited in compliance discussions, because both produce per-instance explanations that satisfy the meaningful information requirement without forcing the company to reveal the entire model. Insurance, lending, hiring, and healthcare deployments increasingly bundle SHAP explanations with every prediction.
The second is mechanistic interpretability for large language models. Feature importance over input tokens is shallow; modern LLM interpretability digs into the network's internal representations. Anthropic's 2023 paper Towards Monosemanticity by Trenton Bricken and colleagues used sparse autoencoders to decompose a one-layer transformer into 131,072 monosemantic features that humans rated as interpretable about 70 percent of the time. The 2024 follow-up Scaling Monosemanticity by Adly Templeton and colleagues scaled the technique to Claude 3 Sonnet and discovered features for concepts ranging from the Golden Gate Bridge to insecure code. These methods are not labelled feature importance, but they answer the same underlying question (what does this model attend to?) at a much finer grain than input-level scores can. For tabular data, classical importance still rules. For language models, sparse autoencoders and circuit analysis are the new front line.
Variable importance is a tool, not a theorem. Several limitations apply across methods.
A single number per feature loses interaction information. If two features only matter when they appear together (think latitude and longitude), neither will look important in isolation. SHAP interaction values, partial dependence plots, and individual conditional expectation curves recover some of the lost structure.
Local and global importance are different. The feature with the highest mean importance across a dataset may be irrelevant for explaining one specific prediction, and vice versa. Both views matter.
Importance is not causation. Judea Pearl has argued for years that statistical methods alone cannot answer causal questions; you need a causal model. A feature can have high importance because it really drives the outcome, because it acts as a proxy for the true cause, or because the data collection introduced a confounder. Distinguishing the cases requires either a randomised experiment or a causal model with assumptions you are willing to defend.
Finally, importance scores depend on the data the model was trained on. Move the model to a different population, and the importances can shift even if the model's parameters are unchanged. Practitioners working across domains should recompute importance on data drawn from the deployment population, not the training set.
Imagine you are baking a cake and want to know which ingredient matters most for how good it tastes. You could bake the cake the normal way and ask people to score it. Then you bake it again, but secretly swap the sugar for sand. People hate it, so sugar must be very important. You bake again, this time leaving out one tablespoon of vanilla. People barely notice, so vanilla is less important. Doing this for every ingredient gives you a ranking from most important to least important. That is what variable importance does for a machine learning model. The model is the recipe, the ingredients are the input columns, and the ranking tells you which inputs the model leaned on the most when it made its predictions.