# TransformerLens

> Source: https://aiwiki.ai/wiki/transformerlens
> Updated: 2026-06-25
> Categories: Developer Tools, Interpretability, Open Source AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**TransformerLens** is an open-source Python library for the [mechanistic interpretability](/wiki/mechanistic_interpretability) of GPT-style language models. It loads a pretrained transformer such as [GPT-2](/wiki/gpt-2), exposes every internal activation through a network of named hook points, and lets researchers cache, inspect, edit, ablate, and replace those intermediate computations during a forward pass. The library was created by interpretability researcher Neel Nanda and originally released under the name EasyTransformer; it is distributed under the MIT License and, as of 2026, is maintained by Bryce Meyer and Jonah Larson through the TransformerLensOrg GitHub organization, where it has accumulated roughly 3,600 stars and over 600 forks.[^1][^2]

TransformerLens has become the de facto standard tooling for [mechanistic interpretability](/wiki/mechanistic_interpretability) research, and the hook-point naming conventions it introduced now appear in a large fraction of published interpretability papers and blog posts. The library's own README states that "the goal of mechanistic interpretability is to take a trained model and reverse engineer the algorithms the model learned during training from its weights," and describes itself simply as "a library for doing mechanistic interpretability of GPT-2 Style language models."[^1] Its `HookedTransformer` class (and, from version 3.0, the `TransformerBridge` adapter) wraps a [transformer](/wiki/transformer) language model with hook points that expose every residual stream addition, attention pattern, MLP activation, and projection in the model, so that built-in methods such as `run_with_cache` and `run_with_hooks` can perform causal interventions, [activation patching](/wiki/activation_patching), direct logit attribution, and other interpretability techniques.[^1][^3]

## What is TransformerLens used for?

TransformerLens is used to reverse-engineer the internal algorithms of [transformer](/wiki/transformer) language models, a research program known as [mechanistic interpretability](/wiki/mechanistic_interpretability). In practice this means loading an open-weight model, running text through it, and then reading or rewriting the model's intermediate tensors to test hypotheses about how a behavior is computed. Typical workflows include locating the attention circuit responsible for a task (for example the indirect-object-identification circuit in GPT-2 Small), identifying [induction heads](/wiki/induction_heads) that implement in-context copying, performing activation patching to measure which components are causally necessary for a prediction, and decomposing the model's output logits into the contributions of individual heads and layers. Because the library exposes a stable, named hook for every interesting tensor, these interventions can be expressed in a few lines of code without modifying the underlying model.[^1][^8][^9]

## History and origins

### EasyTransformer (2022)

The project that became TransformerLens began life in 2022 as **EasyTransformer**, a [PyTorch](/wiki/pytorch) reimplementation of GPT-style transformers tailored for interpretability work. Neel Nanda, who had previously worked on the interpretability team at [Anthropic](/wiki/anthropic), created EasyTransformer because existing open-source tooling did not expose model internals in a form convenient for mechanistic analysis. A fork of an early EasyTransformer code base was used by Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt at Redwood Research for their 2022 paper "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small." A snapshot of that fork remains available on GitHub under the `redwoodresearch/Easy-Transformer` repository, which explicitly notes that the project is a one-time code release and recommends that interested researchers use TransformerLens instead.[^4][^5]

### When was TransformerLens released?

The library was renamed **TransformerLens** ahead of its first stable release on the Python Package Index. Version 1.0.0 was published to PyPI on 16 January 2023 with Neel Nanda listed as the author and the MIT License attached.[^6] The recommended academic citation for the library names Neel Nanda and Joseph Bloom as authors and uses 2022 as the project year, reflecting the EasyTransformer origin date even though the rename and first tagged release occurred in early 2023. The exact BibTeX entry on the documentation site reads `@misc{nanda2022transformerlens, title = {TransformerLens}, author = {Neel Nanda and Joseph Bloom}, year = {2022}, ...}`.[^7]

### Transition to TransformerLensOrg

After Nanda joined Google DeepMind, day-to-day maintenance of the library moved to a dedicated GitHub organization, `TransformerLensOrg`. The README now lists the project as "Maintained by Bryce Meyer and Jonah Larson," who coordinate releases, code review, and roadmap discussions as of the v3.x series, while the PyPI metadata lists the TransformerLensOrg organization as the maintainer contact.[^1][^2]

## How does TransformerLens work?

TransformerLens replaces standard transformer modules with versions instrumented for interpretability. The core idea is to insert a small no-op module called a `HookPoint` at every place inside the network where an interpretability researcher might want to read or write an intermediate tensor.

### HookPoint and HookedRootModule

A `HookPoint` is implemented as a PyTorch module whose `forward` method is the identity function: it returns its input unchanged. Its purpose is to provide a stable, named location in the model graph at which forward and backward hooks can be attached. Because every interesting tensor (the embeddings, the queries, keys, values, attention scores and patterns, attention output, MLP pre- and post-activations, and the residual stream additions for each block) flows through a dedicated `HookPoint`, an interpretability researcher can refer to any of these tensors by name and intervene on it without modifying the underlying model code.[^8]
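The identity-plus-hooks idea can be illustrated without any deep-learning framework. The sketch below is a toy and not the library's implementation: `ToyHookPoint`, `toy_block`, and the hook name `toy.hook_mid` are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Callable, Dict, List, Optional

Activation = List[float]
# A hook sees the activation and the hook point it fired at. Returning None
# leaves the activation unchanged; returning a list replaces it.
HookFn = Callable[[Activation, "ToyHookPoint"], Optional[Activation]]


class ToyHookPoint:
    """Hypothetical stand-in for a hook point: the identity unless hooks are attached."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hooks: List[HookFn] = []

    def add_hook(self, fn: HookFn) -> None:
        self.hooks.append(fn)

    def remove_hooks(self) -> None:
        self.hooks.clear()

    def __call__(self, x: Activation) -> Activation:
        for fn in self.hooks:
            result = fn(x, self)
            if result is not None:
                x = result
        return x


hook_mid = ToyHookPoint("toy.hook_mid")


def toy_block(x: Activation) -> Activation:
    """Two-step computation with a named hook point between the steps."""
    doubled = [2.0 * v for v in x]
    doubled = hook_mid(doubled)  # intermediate value can be read or rewritten here
    return [v + 1.0 for v in doubled]


baseline = toy_block([1.0, 2.0])  # no hooks attached: the hook point is a no-op

seen: Dict[str, Activation] = {}


def record(x: Activation, hook: ToyHookPoint) -> None:
    seen[hook.name] = list(x)  # read-only hook: returns None


def zero_ablate(x: Activation, hook: ToyHookPoint) -> Activation:
    return [0.0 for _ in x]  # write hook: replaces the activation


hook_mid.add_hook(record)
hook_mid.add_hook(zero_ablate)
ablated = toy_block([1.0, 2.0])

hook_mid.remove_hooks()
restored = toy_block([1.0, 2.0])  # behaviour returns to the baseline
```

The surrounding computation in `toy_block` never changes; only the hooks attached at the named location do, which is the property that lets interventions be written without editing model code.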

`HookedRootModule` is the parent class shared by the library's instrumented models. It provides utilities for registering hooks across the whole model, removing them again at the end of a context, and managing nesting so that hooks attached at one level can be cleared without disturbing hooks at another level. Each registered hook is wrapped in a `LensHandle` that records metadata about whether the hook is permanent and at which context level it was registered.[^8]

### HookedTransformer

The flagship class historically exposed by the library is `HookedTransformer`. It is a from-scratch PyTorch implementation of a generic decoder-only transformer that exposes hook points at all of the locations described above. When a pretrained model is loaded via `HookedTransformer.from_pretrained(...)`, TransformerLens downloads the Hugging Face weights for the requested model and rewrites them into the library's standardized parameter layout. This "weight standardization" step is what allows GPT-2, Pythia, Llama, Gemma, Mistral, and many other architectures to share the same interpretability API, but it also means that initial model loading with TransformerLens can take noticeably longer than loading the same model directly through Hugging Face Transformers.[^1][^9]

### ActivationCache

Cached activations are returned wrapped in an `ActivationCache` object. The cache behaves like a dictionary keyed by hook name, but it also provides convenience methods that are common in interpretability work: stacking activations across layers into a single tensor, applying layer norm folding, decomposing the residual stream into per-head contributions, computing direct logit attributions, and slicing along the batch or position dimensions. These helpers reflect a deliberate design choice to make common analysis patterns one-line operations rather than ad-hoc tensor manipulation.[^9]
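The decomposition helpers rest on a simple linearity property: the residual stream is a sum of component outputs, so a logit read off along a fixed direction splits into one term per component. The following is a minimal sketch of that arithmetic under simplifying assumptions (the final layer norm is ignored, plain lists stand in for tensors, and all names and numbers are hypothetical); it does not use the `ActivationCache` API.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical outputs that three components write into a 3-dimensional
# residual stream at one token position. Keys loosely imitate hook-style names.
component_outputs: Dict[str, Vector] = {
    "embed": [1.0, 0.0, 0.5],
    "blocks.0.attn": [0.0, 2.0, -0.5],
    "blocks.0.mlp": [-1.0, 1.0, 1.0],
}

# The residual stream is the elementwise sum of everything written to it.
residual: Vector = [sum(vals) for vals in zip(*component_outputs.values())]

# Hypothetical unembedding direction for one token of interest.
logit_direction: Vector = [0.5, 1.0, 2.0]

# Direct logit attribution: project each component's output onto the direction.
attribution: Dict[str, float] = {
    name: dot(out, logit_direction) for name, out in component_outputs.items()
}

# Because the dot product is linear, the per-component terms add up to the logit.
total_logit = dot(residual, logit_direction)
```

In this toy setting the per-component values sum exactly to `total_logit`. In a real model the final layer norm has to be accounted for before the same decomposition applies.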

### TransformerBridge (v3.0+)

Version 3.0 of the library, released in April 2026, introduced a new abstraction called **TransformerBridge**. Instead of re-implementing every architecture from scratch inside TransformerLens, the bridge wraps a Hugging Face model in place and attaches HookPoints to its existing modules through adapters. The stated goal of this change is to expand model coverage dramatically while reducing the maintenance burden of keeping bespoke implementations in sync with upstream model releases. According to the README, TransformerBridge is the recommended 3.0 path and "supports 9,000+ models across 50+ architecture families"; `HookedTransformer` is deprecated as of 3.0 but remains available through a compatibility layer for the duration of the 3.x branch.[^1][^10]

## Which models does TransformerLens support?

TransformerLens supports a wide range of open-weight language models, most of them decoder-only. The library exposes a list called `OFFICIAL_MODEL_NAMES` that enumerates the model identifiers it can load directly, and it also accepts arbitrary Hugging Face checkpoints that share an architecture with one of those families. Documented supported families include:[^1][^11]

- **GPT-2** (all four sizes: small, medium, large, xl), the canonical example used throughout the documentation; see [GPT-2](/wiki/gpt-2).
- **Pythia** (EleutherAI), with checkpoints ranging from 14M to 12B parameters; widely used because EleutherAI publishes hundreds of training checkpoints, enabling developmental interpretability research.
- **LLaMA**, **Llama 2**, **Llama 3 / 3.1 / 3.2 / 3.3** (Meta).
- **Mistral** and **Mixtral** (Mistral AI).
- **Gemma**, **Gemma 2**, **Gemma 3** ([Google DeepMind](/wiki/google_deepmind)).
- **Qwen**, **Qwen2**, **Qwen3** (Alibaba).
- **Phi** and **Phi-3** (Microsoft).
- **GPT-J**, **GPT-Neo**, **GPT-NeoX** (EleutherAI).
- **BLOOM**, **OPT**, **CodeLlama**.
- Encoder and encoder-decoder models such as **BERT** and **T5**, and audio models such as **HuBERT** and **Wav2Vec2** through more recent bridges.
- Experimental support for state-space models, including **Mamba-1** and **Mamba-2**.

The TransformerLensOrg release notes for the v3.0 transition describe the TransformerBridge mechanism as expanding compatibility from roughly 200 directly supported models to approximately 9,000 models when including all sub-variants on Hugging Face, although the documentation cautions that only a subset are formally verified.[^10]

## What are the key TransformerLens APIs?

The two methods most often used in TransformerLens workflows are `run_with_cache` and `run_with_hooks`. These are inherited from `HookedRootModule` and exposed on both `HookedTransformer` and `TransformerBridge`.

### run_with_cache

`run_with_cache(input)` runs a forward pass and returns a tuple of `(logits, ActivationCache)`. Every tensor that flows through a `HookPoint` is recorded under its hook name. A minimal example, drawn from the library's quick-start documentation, looks like this:[^1]

```python
from transformer_lens.model_bridge import TransformerBridge

bridge = TransformerBridge.boot_transformers("gpt2", device="cpu")
logits, cache = bridge.run_with_cache("Hello World")
```

The returned `cache` can be queried with names such as `cache["blocks.5.attn.hook_pattern"]` to obtain the attention pattern at layer 5, or with helpers such as `cache.stack_head_results()` to assemble a tensor containing the contribution of every attention head across every layer.[^9]

### run_with_hooks

`run_with_hooks(input, fwd_hooks=[...], bwd_hooks=[...])` runs a forward (and optionally backward) pass while temporarily attaching user-supplied hook functions. Each hook receives the activation tensor as its first argument and the corresponding `HookPoint` object as a `hook` keyword argument, and may return a modified tensor that replaces the original activation. By default, hooks are removed when the call returns, leaving the model in its original state. This API is the primary mechanism for [activation patching](/wiki/activation_patching), zero or mean ablations, and other causal interventions.[^8]
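The activation-patching workflow that this API enables can be sketched with a toy model. The example below is an illustration under stated assumptions, not TransformerLens code: `toy_forward`, the site names `hook_a` and `hook_b`, and the inputs are hypothetical, and the toy passes the site name to each hook rather than a `HookPoint` object.

```python
from typing import Callable, Dict, List, Optional, Tuple

Activation = List[float]
HookFn = Callable[[Activation, str], Optional[Activation]]


def toy_forward(
    x: Activation,
    fwd_hooks: Optional[List[Tuple[str, HookFn]]] = None,
) -> float:
    """Hypothetical two-site model; hooks apply only for the duration of this call."""
    hooks: Dict[str, List[HookFn]] = {}
    for name, fn in fwd_hooks or []:
        hooks.setdefault(name, []).append(fn)

    def site(name: str, act: Activation) -> Activation:
        for fn in hooks.get(name, []):
            out = fn(act, name)
            if out is not None:
                act = out
        return act

    a = site("hook_a", [v * v for v in x])    # first intermediate activation
    b = site("hook_b", [v + 1.0 for v in a])  # second intermediate activation
    return sum(b)                             # scalar standing in for a logit


clean_input, corrupted_input = [1.0, 2.0], [0.0, 0.0]

# 1. Run the clean input and cache the activation at the site of interest.
clean_cache: Dict[str, Activation] = {}


def save(act: Activation, name: str) -> None:
    clean_cache[name] = list(act)


clean_out = toy_forward(clean_input, fwd_hooks=[("hook_a", save)])

# 2. Run the corrupted input with no intervention as a baseline.
corrupted_out = toy_forward(corrupted_input)


# 3. Re-run the corrupted input, overwriting one site with its clean value.
def patch_from_clean(act: Activation, name: str) -> Activation:
    return clean_cache[name]


patched_out = toy_forward(corrupted_input, fwd_hooks=[("hook_a", patch_from_clean)])

# Fraction of the clean-vs-corrupted gap recovered by patching this site.
recovered = (patched_out - corrupted_out) / (clean_out - corrupted_out)
```

Here patching `hook_a` recovers the entire gap because everything downstream depends on that site alone; in a real model the recovered fraction varies by component, which is what makes the measurement informative.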

### Additional utilities

Beyond the two headline methods, TransformerLens exposes a number of utilities that are heavily used in research code:

- `to_tokens` and `to_str_tokens`, which convert text into token IDs and into the corresponding per-token strings, supporting per-token interpretability.
- `tokens_to_residual_directions`, which maps vocabulary tokens to their unembedding directions in the residual stream so that researchers can decompose logit attributions along those directions.
- Layer-norm "folding" that absorbs the affine parameters of layer normalization into adjacent linear layers, simplifying analyses that treat layer norm as a fixed rescaling.
- Stateful generation utilities that combine cache reuse with hook-based interventions.[^1][^9]
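The layer-norm folding item can be made concrete with a short derivation. This is a sketch of the affine-absorption step only, written in a column-vector convention chosen for illustration; the symbols $W$, $b$, $\gamma$, $\beta$, and $\epsilon$ are notation introduced here, and the library's own routine may perform additional steps. Write layer normalization as a parameter-free normalization followed by an elementwise affine map:

$$
\hat{x} = \frac{x - \operatorname{mean}(x)}{\sqrt{\operatorname{var}(x) + \epsilon}}, \qquad \operatorname{LN}(x) = \gamma \odot \hat{x} + \beta .
$$

If the next layer is linear, $y = W \operatorname{LN}(x) + b$, then

$$
y = W(\gamma \odot \hat{x} + \beta) + b = \bigl(W \operatorname{diag}(\gamma)\bigr)\hat{x} + (W\beta + b),
$$

so defining $W' = W \operatorname{diag}(\gamma)$ and $b' = W\beta + b$ gives $y = W'\hat{x} + b'$. The learned parameters $\gamma$ and $\beta$ disappear into the adjacent linear layer, and what remains of layer norm is the centering and rescaling in $\hat{x}$.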

## Notable research using TransformerLens

The library's official documentation maintains a gallery of papers that use it. Representative examples include:[^12]

- **"Progress Measures for Grokking via Mechanistic Interpretability"** (ICLR Spotlight, 2023) by Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith and Jacob Steinhardt, which reverse-engineered the algorithm a small transformer learns to perform modular addition.
- **"Towards Automated Circuit Discovery for Mechanistic Interpretability"** (Conmy et al., 2023), which introduced the ACDC algorithm for automatically identifying minimal circuits responsible for a given behavior and was implemented on top of TransformerLens.
- **"Finding Neurons in a Haystack: Case Studies with Sparse Probing"** (Gurnee, Nanda, Pauly, Harvey, Troitskii, Bertsimas, 2023), which used TransformerLens to study how specific concepts are encoded across MLP neurons.
- **"A Toy Model of Universality"** (Chughtai, Chan, Nanda, ICML 2023), which examined whether networks trained on the same task converge to the same internal algorithm.
- **"Actually, Othello-GPT Has A Linear Emergent World Representation"** by Neel Nanda, which reanalyzed Kenneth Li and colleagues' Othello-GPT model and used TransformerLens activations to argue for a linear representation of the board state.
- **"Eliciting Latent Predictions from Transformers with the Tuned Lens"** (Belrose, Furman, Smith, Halawi, Ostrovsky, McKinney, Biderman, Steinhardt, 2023), which extended the logit-lens technique by training affine probes on intermediate residual streams.
- **"N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models"** (Foote, Nanda, Kran, Konstas, Barez, ICLR RTML Workshop 2023).
- **"A circuit for Python docstrings in a 4-layer attention-only transformer"** (Heimersheim and Janiak), a community write-up published on the Alignment Forum.

The "Interpretability in the Wild" paper on indirect object identification, which originally relied on the EasyTransformer fork, is one of the most widely cited demonstrations of the circuit-analysis style of work that TransformerLens enables. Its tutorial-style replication using `run_with_hooks` activation patching is one of the canonical examples taught in the official documentation.[^4][^12]

Outside the gallery, TransformerLens has been used as the modeling layer for replications of "Locating and Editing Factual Associations in GPT" (the ROME paper by Meng, Bau, Andonian, and Belinkov), as well as in feature-level circuit work and in studies of refusal and steering behaviors. The original ROME implementation was a separate code base, but a number of follow-up papers and reproductions have reimplemented its causal tracing procedure on top of TransformerLens because the library's hook system makes the required activation surgery straightforward.[^13]

## How does TransformerLens compare to nnsight and Captum?

Several other libraries occupy adjacent positions in the interpretability tooling landscape. TransformerLens is most often compared to **nnsight** and **Captum**.

### nnsight and NDIF

[nnsight](/wiki/nnsight), developed by the National Deep Inference Fabric (NDIF) project based at Northeastern University, operates on arbitrary PyTorch networks rather than only on transformers. Its design centres on building a serializable "intervention graph" that can be sent to a remote machine where the target model is already resident in GPU memory; this is the mechanism by which NDIF makes very large open-weight models (such as Llama 3.1 405B) available to outside researchers. The 2024 paper "NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals" explicitly compares the two libraries and concludes that TransformerLens provides a more ergonomic interface for the specific class of models it supports, with named HookPoints and a richer cache abstraction, while nnsight is more general and integrates directly with the unmodified Hugging Face model. The same paper notes that TransformerLens model loading is approximately three times slower than the alternatives it benchmarks, attributing the difference to TransformerLens's weight-standardization preprocessing. The paper's overall recommendation is that the libraries are complementary: TransformerLens for rapid exploration on smaller models, nnsight when exact Hugging Face behaviour or very large model access is required.[^14]

### Captum

**Captum**, developed by Meta, is a general-purpose interpretability library for PyTorch. It implements feature-attribution methods such as integrated gradients, saliency maps, SmoothGrad, layer-wise relevance propagation, and Shapley value sampling, along with evaluation metrics for those methods. Captum is broader in scope than TransformerLens but does not provide an analogue to the named hook-point graph that makes circuit-style work convenient: it is oriented toward attribution and importance scoring rather than toward causal interventions and circuit discovery on transformer internals.[^15]

### SAELens

Although not a direct alternative, **SAELens** is a closely related library in the same ecosystem. SAELens trains and serves [sparse autoencoders](/wiki/sparse_autoencoder) on the activations of language models, and historically integrated with TransformerLens through a class called `HookedSAETransformer`. TransformerLens v2.0 removed that class and migrated the corresponding functionality to SAELens itself, formalizing the division of responsibilities: TransformerLens handles transformer instrumentation, SAELens handles sparse-feature analysis. SAELens is maintained by Joseph Bloom and collaborators, and it reuses TransformerLens hook names to identify the sites at which SAEs are trained.[^1][^16]

## Community and ecosystem

### ARENA

The largest organized teaching context for TransformerLens is the **Alignment Research Engineer Accelerator (ARENA)**, an in-person and online curriculum led by Callum McDougall. Chapter 1 of the ARENA curriculum, "Transformer Interpretability," walks students through building a transformer from scratch in PyTorch, then transitions to TransformerLens for circuit analysis, including locating [induction heads](/wiki/induction_heads) in a two-layer model and reproducing the indirect-object-identification circuit in GPT-2 Small. Later chapters cover feature superposition, sparse autoencoders (via SAELens), and activation-vector steering. All ARENA materials are released freely on GitHub.[^17]

### AI Safety Camp and other community programs

TransformerLens is also widely used in the project portfolio of the **AI Safety Camp (AISC)**, an annual remote research program. AISC 2024 ran multiple mechanistic-interpretability streams whose project summaries explicitly reference TransformerLens as the default tool for accessing model internals, alongside related programs and reading lists organized through Apart Research and the Alignment Forum.[^18]

### Documentation, tutorials, and demos

The official documentation site at `transformerlensorg.github.io/TransformerLens` hosts a substantial collection of Jupyter-notebook demos, including a "Main Demo" walking through model loading, caching, and hook installation; demos for specific architectures such as Llama and BERT; and a curated "Getting Started in Mechanistic Interpretability" page that links to Neel Nanda's "200 Concrete Open Problems," his glossary, and his paper-walkthrough video channel. The same page advertises the field as one with a low barrier to entry and emphasises that, in the maintainers' view, low participation rather than technical difficulty explains many of the discipline's unsolved problems.[^9][^19]

### Related interpretability infrastructure

TransformerLens is frequently combined with tooling for visualising and sharing the features it helps researchers discover. **Neuronpedia**, a web platform for browsing sparse-autoencoder features, ingests SAEs trained with SAELens and indexes them by the TransformerLens hook names at which they were trained, so a feature discovered in a notebook can be linked to its public Neuronpedia entry with no additional metadata. The library is also commonly used in conjunction with the open-source **circuitsvis** package for inline visualisations of attention patterns inside Jupyter notebooks, and with [attribution graphs](/wiki/attribution_graphs) constructed via causal patching of cached activations between forward passes.[^16]

## Is TransformerLens open source?

Yes. TransformerLens is distributed under the **MIT License**, as recorded both in the GitHub repository and in the PyPI metadata for the package. The MIT terms permit unrestricted use, modification, and redistribution provided that the copyright notice is preserved.[^1][^6]

Governance is informal but follows a maintainer-led model. Bryce Meyer and Jonah Larson review pull requests and tag releases on behalf of the TransformerLensOrg organization, while broader design discussions happen on GitHub issues and on the mechanistic interpretability Slack community linked from the README. Neel Nanda continues to participate as the project's original author and to publish research that uses the library, but is no longer the day-to-day maintainer.[^1][^2]

### Release cadence

Tagged releases on PyPI follow semantic versioning. Major version transitions to date include version 1.0.0 in January 2023, version 2.0 (which removed `HookedSAETransformer` and required Python 3.10 or newer), and version 3.0 in April 2026 (which introduced `TransformerBridge`). As of mid-2026, the repository's GitHub statistics list approximately 3,600 stars and more than 600 forks.[^1][^2][^10]

The contributing guidelines invite outside pull requests to add new model adapters, expand the demo notebooks, and improve documentation. Adding a new architecture in the v3.x line typically involves writing an adapter that maps the upstream model's modules to the TransformerLens hook-point naming convention and registering the resulting bridge with the loader. Test coverage is run automatically via GitHub Actions, and recent maintenance work has focused on tightening quantisation handling, hardening the continuous-integration pipeline, and pinning dependencies in response to specific upstream security advisories.[^10]

## ELI5: What does TransformerLens do?

Imagine an AI language model as a huge machine with thousands of tiny gears turning inside it as it reads a sentence. Normally you only see what comes out of the machine, not what the gears are doing. TransformerLens is like fitting the machine with little glass windows (called hook points) at every gear, so you can watch each one, write down what it is doing (caching), and even reach in and nudge a gear to see how the answer changes (patching and ablation). Researchers use these windows to figure out the hidden "recipes" the model taught itself, which is the whole point of mechanistic interpretability.[^1][^8]

## See also

- [Mechanistic interpretability](/wiki/mechanistic_interpretability)
- [Activation patching](/wiki/activation_patching)
- [Induction heads](/wiki/induction_heads)
- [Sparse autoencoder](/wiki/sparse_autoencoder)
- [nnsight](/wiki/nnsight)

## References

[^1]: TransformerLensOrg. "TransformerLens: A library for mechanistic interpretability of GPT-style language models." GitHub README. https://github.com/TransformerLensOrg/TransformerLens. Accessed 2026-06-25.

[^2]: Python Package Index. "transformer-lens" project page. https://pypi.org/project/transformer-lens/. Accessed 2026-06-25.

[^3]: Open Source Initiative. "The MIT License." https://opensource.org/license/mit. Accessed 2026-06-25.

[^4]: Redwood Research. "Easy-Transformer" repository README. https://github.com/redwoodresearch/Easy-Transformer. Accessed 2026-06-25.

[^5]: Wang, Kevin; Variengien, Alexandre; Conmy, Arthur; Shlegeris, Buck; Steinhardt, Jacob. "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small." arXiv:2211.00593, 2022. https://arxiv.org/abs/2211.00593. Accessed 2026-06-25.

[^6]: Python Package Index. "transformer-lens 1.0.0" release page. https://pypi.org/project/transformer-lens/1.0.0/. Accessed 2026-06-25.

[^7]: TransformerLens Documentation. "Citation." https://transformerlensorg.github.io/TransformerLens/content/citation.html. Accessed 2026-06-25.

[^8]: TransformerLens Documentation. "transformer_lens.hook_points module." https://transformerlensorg.github.io/TransformerLens/generated/code/transformer_lens.hook_points.html. Accessed 2026-06-25.

[^9]: TransformerLens Documentation. "Main Demo Notebook." https://transformerlensorg.github.io/TransformerLens/generated/demos/Main_Demo.html. Accessed 2026-06-25.

[^10]: TransformerLensOrg. "TransformerLens releases page." https://github.com/TransformerLensOrg/TransformerLens/releases. Accessed 2026-06-25.

[^11]: TransformerLens Documentation. "Model Properties Table." https://transformerlensorg.github.io/TransformerLens/generated/model_properties_table.html. Accessed 2026-06-25.

[^12]: TransformerLens Documentation. "Gallery." https://transformerlensorg.github.io/TransformerLens/content/gallery.html. Accessed 2026-06-25.

[^13]: Meng, Kevin; Bau, David; Andonian, Alex; Belinkov, Yonatan. "Locating and Editing Factual Associations in GPT." Advances in Neural Information Processing Systems, 2022. https://arxiv.org/abs/2202.05262. Accessed 2026-06-25.

[^14]: Fiotto-Kaufman, Jaden; et al. "NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals." arXiv:2407.14561, 2024. https://arxiv.org/abs/2407.14561. Accessed 2026-06-25.

[^15]: Kokhlikyan, Narine; et al. "Captum: A unified and generic model interpretability library for PyTorch." arXiv:2009.07896, 2020. https://arxiv.org/abs/2009.07896. Accessed 2026-06-25.

[^16]: Bloom, Joseph; et al. "SAELens" repository. https://github.com/decoderesearch/SAELens. Accessed 2026-06-25.

[^17]: McDougall, Callum. "ARENA 3.0" repository. https://github.com/callummcdougall/ARENA_3.0. Accessed 2026-06-25.

[^18]: AI Safety Camp. "AISC 2024 Project Summaries." Alignment Forum. https://www.alignmentforum.org/posts/npkvZG67hRvBneoQ9/aisc-2024-project-summaries-1. Accessed 2026-06-25.

[^19]: TransformerLens Documentation. "Getting Started in Mechanistic Interpretability." https://transformerlensorg.github.io/TransformerLens/content/getting_started_mech_interp.html. Accessed 2026-06-25.

