LIME (Local Interpretable Model-Agnostic Explanations) is a technique for explaining individual predictions of any black-box machine learning classifier or regressor by approximating the model locally with a simple, interpretable surrogate. It was introduced by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin in the 2016 paper "Why Should I Trust You?: Explaining the Predictions of Any Classifier," presented at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). Within a few years it became one of the most widely cited tools in explainable AI, helping practitioners debug models, audit fairness, and satisfy regulatory demands for algorithmic transparency. The method works by perturbing the input around a single prediction, recording how the model's output shifts, and fitting a sparse linear model (or another simple surrogate) to those perturbations. Because LIME treats the underlying model as a black box and only needs query access to its inputs and outputs, it is genuinely model-agnostic and applies to neural networks, random forests, gradient-boosted trees, support vector machines, and any other predictor.
Imagine a magic box that looks at a photo and says "cat" or "dog." You want to know why the box said "cat." So you start covering up bits of the photo with paper. When you cover the ears and the box suddenly says "dog," you learn that the ears were carrying the "cat" answer. LIME does roughly the same thing. It hides or changes small parts of the input, watches how the answer wobbles, then writes down which parts moved the needle the most.
As models grew from logistic regressions with a handful of features into deep neural networks with millions of parameters and stacked tree ensembles with thousands of estimators, they also grew opaque. A model can hit 99 percent accuracy on a held-out test set and still be making decisions for completely wrong reasons. The 2016 paper opens with the now-famous example of a 20 Newsgroups random forest that separated "Christianity" from "Atheism" posts at 92.4 percent accuracy by latching onto email headers like "Posting-Host" and "NNTP-Posting-Host" rather than the religious content of the messages. Without an explanation tool, that data leakage would have shipped to production.
Three pressures made local explanations valuable around the time LIME appeared:

- Practitioners needed to debug models whose held-out accuracy concealed reliance on spurious features, as in the 20 Newsgroups example above.
- Fairness auditing required knowing which features drove individual decisions, not just aggregate performance metrics.
- Regulators increasingly demanded algorithmic transparency, including explanations of individual automated decisions.
Before LIME, most interpretability tools were either model-specific (reading the weights of a logistic regression, inspecting decision paths in a decision tree) or only global (averaged variable importances across the whole dataset). LIME staked out the middle ground: instance-level explanations that work on any model.
The paper was written while Ribeiro was a PhD student at the University of Washington, advised by Carlos Guestrin and collaborating with Sameer Singh, who is now a professor at UC Irvine. Ribeiro later joined Microsoft Research and then Microsoft AI in Seattle, where he continued to publish on model debugging, behavioral testing (the CheckList paper at ACL 2020), and interactive ML.
The paper was uploaded to arXiv on 16 February 2016 as preprint 1602.04938 and presented at KDD 2016 in San Francisco that August, where it received the Audience Appreciation Award. By the mid 2020s it had accumulated over 15,000 citations on Google Scholar, placing it among the most cited papers in machine learning interpretability.
The acronym encodes three design goals.
| Property | Meaning |
|---|---|
| Local | Explanations describe the model's behavior in a small neighborhood around one prediction, not the whole input space. A black-box model may behave very differently in different regions, and local fidelity is far more achievable than global fidelity. |
| Interpretable | The explanation must be readable by a human. LIME uses inherently interpretable representations: binary indicators for word presence, binary masks over image superpixels, discretized bins for tabular features. |
| Model-agnostic | LIME never inspects gradients, weights, or architecture. It only queries the model with arbitrary inputs and reads the outputs, so the same code can explain a logistic regression, a random forest, or a 175-billion parameter language model. |
The core procedure has five steps:

1. Select the instance x whose prediction f(x) is to be explained.
2. Generate perturbed samples around x by hiding or changing small parts of the input.
3. Query the black-box model for its output on each perturbed sample.
4. Weight each sample by its proximity to x, so nearby samples count more.
5. Fit a sparse, interpretable surrogate (typically a linear model) to the weighted samples and report its weights as the explanation.
The interpretable representation used inside LIME is usually a binary vector. For text, each dimension is one if the word is present and zero if removed. For images, each dimension is one if a superpixel is shown and zero if grayed out. For tabular data, each dimension corresponds to a discretized bin. The surrogate model lives in this binary space, but the predictions it tries to match come from the model evaluated on the actual perturbed input.
| Data type | Interpretable representation | Perturbation method | Distance metric |
|---|---|---|---|
| Tabular | Discretized feature bins, often quartiles for continuous features and original categories for categorical ones | Sample new values from a normal distribution fitted to the per-feature mean and standard deviation of the training data, or sample from the empirical distribution | Euclidean distance on the scaled feature vector |
| Text | Binary vector indicating presence or absence of each token in the original document | Randomly drop tokens from the text, replacing them with an empty string or a placeholder | Cosine distance, or one minus the fraction of tokens retained |
| Image | Binary vector indicating whether each superpixel is on or off | Segment the image into superpixels using quickshift (default), SLIC, or felzenszwalb from scikit-image, then randomly turn superpixels off by replacing their pixels with a fixed color (gray) or the mean RGB value | Cosine distance on the binary superpixel vector |
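To make the text row concrete, the sketch below (the perturb_text helper is illustrative, not part of the lime package) shows how a binary interpretable vector maps to the perturbed document that is actually sent to the black-box model:

```python
import numpy as np

def perturb_text(tokens, mask):
    """Map a binary interpretable vector to a perturbed document.

    tokens: list of words in the original document
    mask:   0/1 array; 1 keeps a token, 0 removes it
    """
    return " ".join(t for t, m in zip(tokens, mask) if m == 1)

tokens = "this movie was a complete waste of time".split()
rng = np.random.default_rng(0)

# Each row is one interpretable sample z'; the joined string is the
# corresponding perturbed input z on which the model is queried.
masks = rng.integers(0, 2, size=(3, len(tokens)))
for mask in masks:
    print(mask, "->", repr(perturb_text(tokens, mask)))
```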
LIME poses the explanation as an optimization problem that balances local fidelity against simplicity:
explanation(x) = argmin_{g in G} L(f, g, pi_x) + Omega(g)
The terms are:

- f is the black-box model and x the instance being explained.
- g is a candidate surrogate drawn from a class G of interpretable models, such as sparse linear models.
- pi_x is a proximity kernel that concentrates weight on samples near x, making the fit local.
- L(f, g, pi_x) measures how unfaithfully g approximates f in the neighborhood defined by pi_x.
- Omega(g) penalizes the complexity of g, for example the number of nonzero weights in a linear surrogate.
For sparse linear surrogates, the loss is locally weighted squared error:
L(f, g, pi_x) = sum_{z, z' in Z} pi_x(z) * (f(z) - g(z'))^2
Here z is a perturbed sample in the original feature space, z' is its binary interpretable representation, and Z is the set of generated perturbations.
The proximity kernel is exponential:
pi_x(z) = exp(-D(x, z)^2 / sigma^2)
where D is a distance function (Euclidean for tabular, cosine for text and images) and sigma is a kernel width. The reference Python implementation defaults to sigma = 0.75 * sqrt(number_of_features), a heuristic with no theoretical justification that has become one of the most criticized aspects of the package.
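A minimal sketch of this kernel with the package's default width:

```python
import numpy as np

def exponential_kernel(distances, sigma):
    # pi_x(z) = exp(-D(x, z)^2 / sigma^2)
    return np.exp(-(distances ** 2) / sigma ** 2)

d = 10                         # number of features in a hypothetical dataset
sigma = 0.75 * np.sqrt(d)      # the reference implementation's default width
print(exponential_kernel(np.array([0.0, 1.0, 5.0]), sigma))
# Samples at distance 0 get weight 1; weights decay smoothly with distance.
```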
To keep the surrogate readable, LIME restricts it to at most K nonzero features. The paper's K-LASSO procedure runs a LASSO regression with a decreasing regularization parameter and stops once K features have entered the model (the regularization path). The reference implementation also supports forward selection and picking the K features with the largest absolute weights from a non-sparse fit, choosing among these strategies automatically by default.
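The sketch below reproduces this selection step with scikit-learn; it follows the general K-LASSO recipe (walk the regularization path on proximity-weighted data, then refit on the selected features) rather than mirroring the package's exact code path:

```python
import numpy as np
from sklearn.linear_model import Ridge, lars_path

def fit_sparse_surrogate(Z, y, weights, K):
    """Fit a K-sparse local surrogate.

    Z:       (n_samples, d) binary interpretable representations z'
    y:       (n_samples,) black-box outputs f(z)
    weights: (n_samples,) proximity weights pi_x(z)
    K:       maximum number of nonzero features in the explanation
    """
    # Rescale so that ordinary least squares on (Zw, yw) equals
    # weighted least squares on (Z, y).
    sw = np.sqrt(weights)
    Zw = Z * sw[:, None]
    yw = y * sw

    # Walk the LASSO regularization path until K features are active.
    _, _, coefs = lars_path(Zw, yw, method="lasso")
    active = np.flatnonzero(coefs[:, -1])
    for col in range(coefs.shape[1]):
        candidate = np.flatnonzero(coefs[:, col])
        if len(candidate) >= K:
            active = candidate
            break

    # Refit a lightly regularized weighted model on the selected features.
    surrogate = Ridge(alpha=1e-3)
    surrogate.fit(Z[:, active], y, sample_weight=weights)
    return active, surrogate.coef_
```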
The official Python package ships three classes that wrap the same core algorithm.
| Variant | Class | Typical use case | Notes |
|---|---|---|---|
| LIME-Tabular | LimeTabularExplainer | Credit scoring, fraud, churn, medical risk | Requires the training dataset to fit per-feature statistics. Categorical features are passed via categorical_features and categorical_names. |
| LIME-Text | LimeTextExplainer | Sentiment analysis, spam detection, topic classification | Tokenizes the document with a regex by default and explains in terms of word presence. Includes a built-in HTML visualizer that highlights influential words inline. |
| LIME-Image | LimeImageExplainer | Image classification debugging, medical imaging | Uses scikit-image segmentation (quickshift by default; SLIC or felzenszwalb optional) to define superpixels, then perturbs by graying out subsets. |
A fourth construct in the same library is SP-LIME (submodular pick), which selects a small set of instances whose individual explanations together give a global picture.
Individual LIME explanations are local. Practitioners often need a global view, but generating an explanation for every instance is impractical and most explanations would be redundant. SP-LIME frames instance selection as a submodular optimization problem.
SP-LIME builds an explanation matrix W of size n by d', where n is the number of candidate instances and d' is the number of interpretable features. Entry W_ij is the absolute weight that feature j received in the LIME explanation for instance i:
W_ij = |w_{g_i, j}|
A global importance vector I summarizes how representative each feature is across explanations:
I_j = sqrt(sum_{i=1}^{n} W_ij)
The coverage of a candidate set V of selected instances is the total importance of features that appear in at least one of those instances' explanations:
c(V, W, I) = sum_{j=1}^{d'} 1_{[exists i in V : W_ij > 0]} * I_j
The pick problem is to choose at most B instances that maximize coverage:
Pick(W, I) = argmax_{V, |V| <= B} c(V, W, I)
Maximizing coverage is NP-hard, but the function c is monotone submodular, so the standard greedy algorithm gives a (1 - 1/e), or roughly 63 percent, approximation guarantee. The procedure is:

1. Start with an empty set V.
2. Repeatedly add the candidate instance whose inclusion gives the largest marginal gain in coverage c(V, W, I).
3. Stop once |V| reaches the budget B and return V, as sketched below.
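A compact sketch of the greedy pick, assuming the explanation matrix W has already been filled in by running LIME once per candidate instance:

```python
import numpy as np

def submodular_pick(W, budget):
    """Greedy SP-LIME: choose `budget` instances maximizing coverage.

    W: (n_instances, d_features) matrix of absolute explanation weights.
    """
    importance = np.sqrt(W.sum(axis=0))          # I_j = sqrt(sum_i W_ij)
    selected, covered = [], np.zeros(W.shape[1], dtype=bool)
    for _ in range(budget):
        # Marginal gain of each candidate: total importance of features it
        # would newly cover; already-selected instances are excluded.
        gains = [
            importance[(~covered) & (W[i] > 0)].sum() if i not in selected else -1.0
            for i in range(W.shape[0])
        ]
        best = int(np.argmax(gains))
        selected.append(best)
        covered |= W[best] > 0
    return selected
```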
In the original paper, SP-LIME outperformed random sampling for small budgets. Trust assessments based on SP-LIME-selected examples were a much better predictor of the model's true generalization than the same number of randomly chosen explanations.
The husky-versus-wolf experiment is the canonical illustration of why local explanations matter. Ribeiro and his coauthors trained a logistic regression on a hand-curated set of 20 images: every wolf photo had snow in the background, every husky photo did not. The classifier achieved high accuracy on a held-out test set drawn from the same distribution. LIME explanations, however, revealed that the classifier had learned almost nothing about the animals themselves; it predicted "wolf" whenever the lower portion of the image was snowy and "husky" otherwise.
The authors then ran a small user study with 27 graduate students who had taken at least one machine learning course. Without LIME, 10 of 27 trusted the classifier; only 12 of 27 mentioned snow as a likely feature. After being shown LIME explanations, only 3 of 27 still trusted the classifier and 25 of 27 identified snow as the cue the model was relying on. The experiment is now a standard teaching example for spurious correlations, dataset bias, and the difference between accuracy and validity.
In the same paper, a random forest trained on the 20 Newsgroups corpus to distinguish Christianity posts from Atheism posts hit 92.4 percent test accuracy. LIME explanations highlighted features such as "Posting-Host," "Re," "NNTP," and "there," almost none of which carry semantic meaning about religion. The classifier was exploiting metadata from email headers that happened to correlate with class labels in this particular collection. Stripped of that metadata, the model's accuracy collapsed. The example became a textbook case of leakage in academic NLP datasets.
SHAP (SHapley Additive exPlanations), introduced by Scott Lundberg and Su-In Lee in 2017, is the most direct successor in the same family of local, model-agnostic explanation methods. The two share design goals but differ in foundations and trade-offs.
| Aspect | LIME | SHAP |
|---|---|---|
| Theoretical basis | Locally weighted surrogate model fit to perturbations | Shapley values from cooperative game theory |
| Explanation scope | Local only (SP-LIME bolts on a global view) | Local, with principled global aggregations |
| Consistency | Not guaranteed; reruns can yield different attributions | Satisfies local accuracy, missingness, and consistency axioms |
| Speed | Fast for low-dimensional inputs; per-instance cost grows with the number of perturbations | Exact computation is exponential in features; KernelSHAP is similar to LIME in cost; TreeSHAP is polynomial for tree ensembles |
| Feature interactions | Not captured by the linear surrogate | Captured via Shapley interaction values |
| Additivity | Coefficients do not have to sum to the prediction | Attributions sum exactly to f(x) minus the expected value |
| Hyperparameter sensitivity | Highly sensitive to kernel width, sample count, and K | Fewer hyperparameters; more deterministic outputs |
| Model-specific optimizations | None | TreeSHAP, DeepSHAP, LinearSHAP |
| Typical use today | Quick, exploratory, or low-stakes explanations | Default for production tabular interpretability |
The original SHAP paper showed that LIME, DeepLIFT, and a few other methods can be unified as special cases of additive feature attribution methods, and that LIME's choice of weighting function and loss does not satisfy the Shapley axioms. In practice, SHAP has displaced LIME as the default for tabular interpretability in industry tooling, while LIME survives mainly for text and image use cases or as a quick first pass.
| Method | Family | Model access | Output | Strength | Weakness |
|---|---|---|---|---|---|
| LIME | Local surrogate | Black box (predict only) | Sparse linear weights over interpretable features | Works on any model, fast for low dimensions, easy to read | Unstable, sensitive to kernel width and sampling |
| SHAP | Shapley values | Black box (KernelSHAP) or model-specific (TreeSHAP, DeepSHAP) | Additive attributions per feature | Theoretical guarantees, additive | Slow in the model-agnostic case |
| Integrated Gradients | Gradient based | White box, requires differentiable model | Per-feature attribution along a baseline-to-input path | Satisfies completeness and sensitivity axioms, fast | Needs gradients, requires a baseline choice |
| Grad-CAM | Activation map | White box, convolutional networks | Heatmap over feature map locations | Fast, visually intuitive for CNNs | Limited to convolutional architectures |
| Permutation importance | Global feature importance | Black box | Drop in performance when a feature is shuffled | Simple, model-agnostic, global | No instance-level view, mishandles correlated features |
| Anchors | Rule based | Black box | IF-THEN rule with coverage and precision | Explicit boundaries, easy to verify | Coarse for high-dimensional inputs |
For smooth, differentiable models, theoretical analyses suggest a link between perturbation-based and gradient-based methods: in the limit of small, heavily weighted neighborhoods, LIME's local linear coefficients approximate the model's gradient, so its superpixel attribution and the sum of integrated gradients over the same superpixel can agree in expectation.
LIME relies on random perturbation sampling, so two runs on the same instance can yield different attributions. Empirical studies have shown that explanations for two nearly identical inputs can disagree on which features matter most. This nondeterminism is one of the strongest objections to using LIME as evidence in regulatory or auditing settings.
The kernel width sigma controls how local the explanation is, and there is no principled way to pick it. Different values can produce contradictory explanations for the same instance. Christoph Molnar, author of the open-source book Interpretable Machine Learning, calls this "the biggest problem with LIME" and advises practitioners to try several kernel widths and check the explanations for consistency. The default value of 0.75 * sqrt(d) is a heuristic with no clear theoretical basis.
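In that spirit, a simple consistency check is to sweep the kernel width and compare the top features; the sketch below assumes an X_train array and a fitted model with a predict_proba method, as in the tabular example later in this article:

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

d = X_train.shape[1]  # X_train and model are assumed to exist
for scale in [0.5, 0.75, 1.0, 2.0]:
    explainer = LimeTabularExplainer(
        X_train,
        mode='classification',
        kernel_width=scale * np.sqrt(d),  # vary around the 0.75 default
    )
    exp = explainer.explain_instance(
        X_train[0], model.predict_proba, num_features=5,
    )
    print(f"scale={scale:.2f}:", [feat for feat, _ in exp.as_list()])
# If the top features disagree across widths, the explanation is fragile.
```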
The default tabular perturbation samples each feature independently from a normal distribution. This ignores correlations between features and routinely produces unrealistic synthetic samples (a 30 kg adult who is 2 meters tall, a credit card transaction with a negative amount, an image where pixel intensities are wildly inconsistent). The black-box model is then queried on inputs from a distribution it was never trained on, and the resulting explanations may not reflect anything the model actually does on real data.
LIME's surrogate is usually a sparse linear model. If the black-box model is highly nonlinear in the local neighborhood, no linear approximation can be faithful. Interaction effects between features are invisible by construction.
Because perturbations are generated synthetically rather than drawn from the data manifold, the black-box model is evaluated on inputs that may be out of distribution. For high-dimensional images or long documents, the perturbed inputs are often visibly broken (an image with random patches grayed out, a sentence with random words removed), and the model's behavior on these is not necessarily indicative of its behavior on real inputs.
In 2020, Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju showed at the AAAI/ACM Conference on AI, Ethics, and Society that an attacker can construct an adversarial classifier that behaves discriminatorily on real data and benignly on the perturbations LIME generates. The trick is a scaffolding model that detects whether the input is real or synthetic, returning the biased prediction in the former case and an innocuous one in the latter. LIME (and to a lesser extent SHAP) then reports a clean explanation while the deployed model continues to discriminate. The paper used the COMPAS recidivism dataset and showed that explanations could be made arbitrary while accuracy on real data was unchanged.
Beyond the kernel width, LIME requires the user to set the number of features K, the number of perturbations, the distance metric, and (for images) the segmentation algorithm and its parameters. Each of these choices changes the output. There is little practical guidance and almost no automated tuning support in the reference package.
A typical LIME explanation queries the model thousands of times. For large language models or expensive computer vision pipelines, this is slow and costly. Batching helps but does not change the asymptotic cost.
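A common mitigation is to wrap the predict function so that perturbations are scored in fixed-size mini-batches; a generic sketch in which model_predict and the batch size are placeholders:

```python
import numpy as np

def batched_predict_fn(model_predict, batch_size=256):
    """Wrap a predict function so LIME's perturbations are scored in chunks.

    Useful when the model cannot score thousands of perturbed inputs in a
    single call without exhausting memory; the asymptotic cost is unchanged.
    """
    def predict(inputs):
        outputs = [
            model_predict(inputs[i:i + batch_size])
            for i in range(0, len(inputs), batch_size)
        ]
        return np.concatenate(outputs, axis=0)
    return predict

# Usage: explainer.explain_instance(x, batched_predict_fn(model.predict_proba))
```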
The limitations above have driven a long line of follow-up work, much of it from the same research community.
| Method | Year | Improvement over LIME |
|---|---|---|
| SHAP | 2017 | Theoretically grounded Shapley attributions with consistency, local accuracy, and missingness guarantees |
| Anchors | 2018 | Same authors as LIME; produces high-precision IF-THEN rules with explicit coverage instead of linear surrogates |
| DLIME | 2019 | Replaces random perturbation with hierarchical clustering plus k-nearest neighbors, producing deterministic explanations on healthcare datasets |
| BayLIME | 2021 | Adds Bayesian inference over surrogate weights to give uncertainty intervals on each attribution |
| OptiLIME | 2021 | Automates kernel width selection by optimizing for stability of the explanation |
| GLIME | 2023 | Generalizes LIME under a unified framework that recovers KernelSHAP and other variants |
| US-LIME | 2024 | Replaces random perturbation with uncertainty sampling to improve fidelity on tabular data |
| BMB-LIME | 2024 | Extends the surrogate beyond linear models to handle local nonlinearity and uncertainty |
Anchors, introduced by Ribeiro, Singh, and Guestrin at AAAI 2018, is the most direct successor and is often described by its authors as a remedy for several LIME failure modes. Instead of a linear approximation that holds locally with no stated boundary, anchors produce conditional rules of the form "IF word 'excellent' is present AND word 'awful' is absent THEN positive sentiment, with precision 0.97 and coverage 0.18." The rule explicitly states the conditions under which it holds and the fraction of inputs it covers, so users know when to apply it.
The reference Python package (lime on PyPI, marcotcr/lime on GitHub) is maintained by Ribeiro and exposes LimeTabularExplainer, LimeTextExplainer, LimeImageExplainer, and submodular_pick. It is open source under the BSD-2-Clause license.
| Package | Language | Description |
|---|---|---|
| lime | Python | The original implementation. Install with pip install lime. Ships LimeTabularExplainer, LimeTextExplainer, LimeImageExplainer, and submodular_pick. |
| lime (R) | R | R port maintained by Thomas Lin Pedersen, available on CRAN. |
| iml | R | Christoph Molnar's general interpretability package; includes a LIME variant called LocalModel. |
| DALEX | R, Python | Model exploration toolkit from Przemyslaw Biecek's group; wraps LIME alongside breakdown, ceteris paribus, and other methods. |
| InterpretML | Python | Microsoft's interpretability toolkit; includes LIME alongside SHAP, explainable boosting machines, partial dependence, and Morris sensitivity analysis. |
| eli5 | Python | Library for debugging and explaining classifiers; supports LIME-style explanations for text classifiers. |
| alibi | Python | Seldon's interpretability library; includes Anchors, integrated gradients, counterfactuals, and a LIME-compatible interface. |
| Captum | Python | PyTorch interpretability toolkit; primarily gradient-based, but also ships Lime and KernelShap implementations along with perturbation baselines such as Occlusion. |
| Skater | Python | Older interpretability toolkit (now mostly unmaintained) that bundled LIME, partial dependence, and feature importance. |
A minimal example using LIME with scikit-learn:
```python
import lime
import lime.lime_tabular
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Train a black-box model to explain.
iris = load_iris()
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(iris.data, iris.target)

# The tabular explainer needs the training data to fit per-feature statistics.
explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=iris.data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    mode='classification',
)

# Explain a single prediction using 5000 perturbations.
exp = explainer.explain_instance(
    data_row=iris.data[0],
    predict_fn=rf.predict_proba,
    num_features=4,
    num_samples=5000,
)
exp.show_in_notebook()
```
For text classification, LimeTextExplainer takes a callable that maps a list of strings to a probability matrix:
```python
from lime.lime_text import LimeTextExplainer
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# train_texts, train_labels, and test_texts are assumed to be defined.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=['neg', 'pos'])
exp = explainer.explain_instance(
    test_texts[0],
    pipeline.predict_proba,
    num_features=10,
)
exp.show_in_notebook()
```
For images, LimeImageExplainer accepts a callable returning class probabilities for a batch of images:
```python
from lime.lime_image import LimeImageExplainer
from skimage.segmentation import slic

# `image` and `model` are assumed to be defined; model.predict must return
# class probabilities for a batch of images.
explainer = LimeImageExplainer()
exp = explainer.explain_instance(
    image,
    classifier_fn=model.predict,
    top_labels=5,
    hide_color=0,              # fill hidden superpixels with this value
    num_samples=1000,
    segmentation_fn=lambda x: slic(x, n_segments=50, compactness=10),
)

# Overlay the top positive superpixels for the highest-probability class.
image_with_mask, mask = exp.get_image_and_mask(
    label=exp.top_labels[0], positive_only=True, num_features=5,
)
```
LIME has been applied to chest X-ray classifiers, retinal screening models, histopathology models, and Alzheimer's disease detection from MRI. Clinicians use the highlighted regions to check whether the model attends to clinically relevant anatomy or to artifacts such as scanner watermarks, image borders, or device-specific noise. The deterministic variant DLIME was developed specifically because the instability of standard LIME explanations made them hard to deploy in computer-aided diagnosis. A 2024 review indexed in PubMed Central catalogs dozens of LIME applications in Alzheimer's research alone.
The landmark Esteva et al. 2017 Nature paper on dermatologist-level skin cancer classification used t-SNE to visualize learned features rather than LIME, but follow-up dermatology research has applied LIME to similar models to understand which lesion regions and color patterns contribute to malignancy predictions. Stieler and colleagues, for example, embedded the dermatological ABCD rule (asymmetry, border, color, diameter) inside LIME to align machine explanations with clinical heuristics.
Banks use LIME to explain individual credit decisions, both internally for risk officers and externally for adverse-action notices required under regulations such as the US Equal Credit Opportunity Act. It is also used for transaction-level explanations in fraud detection and anti-money-laundering systems, where investigators need to understand why a particular transaction was flagged.
Researchers have used LIME to probe risk-assessment tools such as COMPAS for racial bias. The Slack et al. 2020 attack on LIME used the COMPAS dataset directly, both as a demonstration that biased classifiers can hide behind LIME explanations and as a warning to auditors who rely on LIME alone.
Beyond the husky-wolf example, LIME has been used to find spurious cues in image classifiers, such as background context (grass for cows, sand for camels), watermarks left over from web scraping, capture-device artifacts in medical imaging, and hospital-specific tags burnt into X-ray corners. A widely shared analysis of a pneumonia classifier showed that the model had learned to detect the portable X-ray scanner used in intensive care units rather than pneumonia itself; LIME-style superpixel masks were among the tools used to make this visible.
LIME-Text is used to debug sentiment classifiers, spam filters, toxicity detectors, and clinical NLP systems. A common finding is that classifiers trained on user-generated text rely heavily on stylistic markers (punctuation, capitalization, specific function words) rather than content. Recent work has applied LIME to transformer-based models such as BERT, although as model size grows the alignment between LIME explanations and the model's internal computations weakens, suggesting that LIME captures input-level salience rather than mechanistic reasoning in large models.
The interpretability landscape has shifted since 2016. For tabular data, SHAP has largely displaced LIME because of its theoretical guarantees and its highly optimized TreeSHAP implementation. For image classifiers, gradient-based methods such as Grad-CAM, Integrated Gradients, and SmoothGrad are usually faster and produce smoother heatmaps than LIME's superpixel masks; LIME-Image is most useful when the model is genuinely a black box and gradients are unavailable. For text classifiers and especially for transformer language models, LIME-Text still works but is one of several options that includes attention attribution, integrated gradients on token embeddings, and feature ablation.
A different paradigm, mechanistic interpretability, has emerged for understanding large language models. Where LIME asks "which input features moved this prediction," mechanistic interpretability asks "which internal circuits implement this behavior." Anthropic's work on monosemantic features in Claude, Neel Nanda's TransformerLens library, and the broader sparse autoencoder literature pursue questions about mid-network computation that post-hoc methods such as LIME are not designed to answer. The two paradigms address different questions and are complementary rather than competing.
Research from 2024 examined how LIME explanations vary with LLM size and found that bigger models do not produce more plausible LIME explanations, suggesting the local linear surrogate captures something other than the model's internal reasoning as the model grows.
Practitioners who continue to use LIME generally follow a small set of rules of thumb:

- Rerun each explanation several times (or fix the random seed) and confirm that the top features are stable, as in the check sketched below.
- Try several kernel widths and distrust explanations whose feature rankings or signs flip between them.
- Increase the number of perturbation samples until explanations stop changing, especially for high-dimensional inputs.
- Inspect whether the perturbed samples are plausible for the domain before trusting tabular explanations.
- Treat LIME as an exploratory debugging tool rather than audit-grade evidence.
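A minimal sketch of the stability check from the first rule, where explain_once stands in for any callable that produces one explanation as a list of (feature, weight) pairs:

```python
import numpy as np

def stability_check(explain_once, n_runs=10, top_k=5):
    """Mean pairwise Jaccard similarity of top-k feature sets across reruns.

    explain_once: e.g. lambda: explainer.explain_instance(
        x, model.predict_proba, num_features=top_k).as_list()
    """
    runs = [{feat for feat, _ in explain_once()[:top_k]} for _ in range(n_runs)]
    sims = [
        len(a & b) / len(a | b)
        for i, a in enumerate(runs)
        for b in runs[i + 1:]
    ]
    return float(np.mean(sims))

# Values near 1.0 indicate stable explanations; low values suggest more
# perturbation samples or a different kernel width are needed.
```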
The original LIME paper had accumulated more than 15,000 citations on Google Scholar by the mid 2020s and is one of the most cited papers in the explainable AI literature. The Python package on GitHub has more than 11,000 stars and remains widely installed, although development has slowed since the early 2020s. LIME helped establish post-hoc local explanation as a mainstream subfield of machine learning research and influenced regulatory thinking about algorithmic transparency in the EU and elsewhere. It is taught in essentially every introductory interpretability course and ships as a default option in major MLOps platforms.
Its most lasting contribution may be conceptual rather than algorithmic. The husky-versus-wolf example reframed the question of model trust from "is the test accuracy high" to "is the model attending to the right features," and it made the answer accessible to non-machine-learning experts. The follow-up literature, including SHAP, Anchors, and the entire model-agnostic explanation toolkit, builds on the framing LIME introduced.