nnsight

Updated Jul 23, 2026

nnsight is an open-source Python library for the interpretation and intervention of deep learning models, developed by the Bau Lab at Northeastern University. It extends PyTorch with a deferred-execution model that lets researchers inspect, modify, save, and differentiate through any intermediate activation, gradient, or parameter of a neural network using a unified API. The same code can run locally on small models or be sent to remote infrastructure for very large open-weight foundation models, in particular through the National Deep Inference Fabric (NDIF), a service funded by the U.S. National Science Foundation that hosts models such as Llama 3.1 405B and DeepSeek-R1 for academic research.^[1]^[2]^[3]

nnsight and NDIF were introduced jointly in the paper "NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals," presented at the International Conference on Learning Representations (ICLR) 2025.^[4]^[5] The library is released under the MIT License and is maintained on GitHub under the ndif-team/nnsight repository.^[6]

Background and motivation

Research on the internals of large neural networks, which is broadly known as mechanistic interpretability, historically required two things that were difficult to combine: full white-box access to a model's weights and activations, and enough computing capacity to actually run that model. Commercial application programming interfaces from major model providers offer scale but expose only black-box inputs and outputs, with no way to read or write intermediate activations. Conversely, open-weight models can be inspected freely, but the largest of them require expensive multi-GPU hardware that is unavailable to most academic groups.^[4]^[7]

The nnsight authors documented this gap empirically by surveying 184 interpretability papers. They reported that 60.6 percent of the work published since February 2023 was still being carried out on models that score below 40 percent on the Massive Multitask Language Understanding (MMLU) benchmark, even though much larger open-weight models had been released during that period.^[4] The stated goal of nnsight together with NDIF is to close this gap by providing a single library that gives transparent access to model internals whether the user is running a 124-million-parameter model locally or a 405-billion-parameter model on shared infrastructure.^[4]^[5]

The project is led by David Bau, an Assistant Professor of Computer Science at Northeastern University's Khoury College of Computer Sciences, whose earlier work includes Network Dissection and the ROME method for editing factual associations in language models.^[8] Co-principal investigators on NDIF include Byron Wallace, Arjun Guha, Jonathan Bell, and Carla Brodley, all at Northeastern University.^[9]

Architecture

Tracing context

The central abstraction in nnsight is the tracing context, entered by calling a model's .trace() method in a Python with statement. Inside this context, attribute access on the wrapped model does not immediately invoke a forward pass. Instead, it constructs nodes in an intervention graph, a bipartite directed acyclic graph of variable nodes and apply nodes that represents the experiment the researcher wants to run.^[4]^[10] When the context exits, the intervention graph is interleaved with the model's native forward pass: PyTorch hooks fire at the appropriate modules and either read values out, write modified values in, or both.^[4]

Each submodule of the wrapped model is recursively wrapped in an Envoy object, which exposes .input and .output attributes. These attributes are not concrete tensors; they are Proxy values, placeholders that stand in for activations that will exist only when the forward pass actually runs.^[4]^[10] To persist a value beyond the tracing context, a user calls .save() on the relevant proxy. This explicit save model gives nnsight precise control over which intermediate tensors to retain, which is important when intervening on very large models where naive caching would exhaust memory.^[10]

A minimal example illustrates the pattern:

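from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2", device_map="auto")

with model.trace("The Eiffel Tower is in"):
    # Output of the last transformer block; element 0 is the hidden-state tensor.
    hidden = model.transformer.h[-1].output[0].save()
    # The language-model head produces the logits tensor.
    logits = model.lm_head.output.save()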

Here hidden and logits become available as concrete tensors after the with block returns.^[6]^[10]

Lazy execution and proxy operations

Operations on proxies, such as arithmetic, indexing, or torch function calls, themselves return new proxies. This lazy execution model means an entire experimental procedure can be described as a graph that is compiled and then executed in a single forward pass.^[4]^[11] During graph construction, nnsight uses PyTorch's FakeTensor mechanism to validate shapes and dtypes without allocating real memory, which surfaces errors early.^[7]
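The deferred pattern described above can be illustrated with a toy sketch in plain Python. It is not nnsight's implementation: the Graph and Node classes and all of their methods are hypothetical names chosen for this example, and ordinary Python lists stand in for tensors. The point is only that operators on a placeholder record work instead of performing it, and that values exist once the graph is run.

```python
import operator


class Node:
    """One recorded operation. Nothing is computed until Graph.run() is called."""

    def __init__(self, graph, fn, args, label):
        self.graph, self.fn, self.args, self.label = graph, fn, args, label
        self.saved = False
        self.value = None
        graph.nodes.append(self)  # creation order is a valid execution order

    def _record(self, fn, label, *others):
        return Node(self.graph, fn, (self, *others), label)

    def __add__(self, other):
        return self._record(operator.add, "add", other)

    def __mul__(self, other):
        return self._record(operator.mul, "mul", other)

    def __getitem__(self, index):
        return self._record(operator.getitem, "getitem", index)

    def save(self):
        # Mark this node so its value is kept after the run.
        self.saved = True
        return self


class Graph:
    def __init__(self):
        self.nodes = []

    def placeholder(self, name):
        # fn=None marks a value that is supplied only at run time.
        return Node(self, None, (name,), f"input:{name}")

    def run(self, **inputs):
        values = {}
        for node in self.nodes:
            if node.fn is None:
                values[node] = inputs[node.args[0]]
            else:
                args = [values[a] if isinstance(a, Node) else a for a in node.args]
                values[node] = node.fn(*args)
            if node.saved:
                node.value = values[node]


graph = Graph()
hidden = graph.placeholder("hidden")
result = (hidden[0] * 2 + 1).save()  # records three operations, computes nothing
labels = [n.label for n in graph.nodes]
value_before = result.value          # still None: the graph has not run
graph.run(hidden=[10, 20])           # the recorded operations execute here
print(labels, value_before, result.value)
```

Only the node marked with save() keeps its value after the run, which mirrors the explicit-save idea: intermediate results are discarded unless the experiment asks for them.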

The deferred-execution design distinguishes nnsight from libraries that rely on imperative hooks attached to running models. Because the graph is a first-class object, it can be optimized, for example by dead-node elimination, serialized for transport, or rebuilt on a different machine, which is what enables nnsight's remote-execution mode.^[4]^[7]

Worker threads and hook synchronization

Internally, the code inside a tracing context is not simply transformed into a static graph. The user's block is extracted at the Python abstract syntax tree level, compiled into a function, and executed in a worker thread. When that thread reads a proxy attribute such as .output, it blocks until the model's actual forward pass reaches the corresponding module, at which point a PyTorch forward hook fires and hands the live tensor to the worker. Conversely, when the user assigns a new value back to a proxy attribute, the hook intercepts the running activation and substitutes the modified tensor before computation continues.^[21] The combination of worker threads and hooks is what lets nnsight present an imperative, Pythonic interface while still controlling execution at a fine-grained level, and it is the bridge between the lazy intervention graph and the eager forward pass.^[21]
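The hand-off between a worker thread and the forward pass can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not nnsight's code: Mediator, run_with_interventions, and experiment are hypothetical names, plain numbers stand in for tensors, a list of functions stands in for the model, and requests are assumed to arrive in layer order.

```python
import queue
import threading


class Mediator:
    """Hypothetical channel between the user's worker thread and the forward pass."""

    def __init__(self):
        self.requests = queue.Queue()   # worker -> forward pass
        self.responses = queue.Queue()  # forward pass -> worker

    def get(self, layer):
        # Blocks the worker until the forward pass reaches `layer`.
        self.requests.put(("get", layer, None))
        return self.responses.get()

    def set(self, layer, value):
        # Asks the forward pass to replace the output of `layer`.
        self.requests.put(("set", layer, value))
        self.responses.get()

    def done(self):
        self.requests.put(("done", None, None))


def run_with_interventions(layers, x, user_fn):
    med, results = Mediator(), {}

    def worker_main():
        try:
            user_fn(med, results)
        finally:
            med.done()  # always release the forward pass

    worker = threading.Thread(target=worker_main, daemon=True)
    worker.start()
    pending = med.requests.get()
    for i, layer in enumerate(layers):
        x = layer(x)
        # "Hook" for layer i: serve every request that targets this layer.
        while pending[0] != "done" and pending[1] == i:
            kind, _, value = pending
            if kind == "get":
                med.responses.put(x)
            else:
                x = value
                med.responses.put(None)
            pending = med.requests.get()
    if pending[0] != "done":
        raise RuntimeError("request for a layer that was never reached")
    worker.join()
    return x, results


def experiment(med, results):
    results["layer0"] = med.get(0)  # read the output of layer 0
    h = med.get(1)                  # read the output of layer 1 ...
    med.set(1, h + 5)               # ... and write a modified value back


layers = [lambda v: v + 1, lambda v: v * 10, lambda v: v - 3]
output, results = run_with_interventions(layers, 2, experiment)
print(output, results)
```

Without the intervention the toy model maps 2 to 27; with the write at layer 1 it returns 32, because the last layer sees 35 instead of 30.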

Intervention graph internals

The intervention graph itself is a bipartite directed acyclic graph: variable nodes stand for values, and apply nodes stand for operations. Connections between the original computation graph and the intervention graph are mediated by getters, which read activations out, and setters, which write modified activations back; the acyclic constraint prevents the cycles that would otherwise arise if a setter's output fed an upstream getter. The graph supports incremental optimizations such as dead-node elimination, where unused branches are pruned before execution, and it can be serialized to a custom JSON format for transport to a remote backend.^[4]^[11]
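Dead-node elimination and serialization can be sketched on a toy graph. The dictionary layout, the node names, and the eliminate_dead_nodes helper below are assumptions made for this illustration; they do not reproduce nnsight's data structures or the JSON format that NDIF actually uses.

```python
import json


def eliminate_dead_nodes(graph):
    """Keep only the nodes that a saved value or a setter depends on."""
    live = set()
    stack = [
        name for name, node in graph.items()
        if node["saved"] or node["op"] == "setter"
    ]
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        # Follow arguments that refer to other nodes in the graph.
        stack.extend(
            arg for arg in graph[name]["args"]
            if isinstance(arg, str) and arg in graph
        )
    return {name: node for name, node in graph.items() if name in live}


# Hypothetical experiment: scale one layer's output, write it back, save the logits.
graph = {
    "h5": {"op": "getter", "args": ["layers.5.output"], "saved": False},
    "scaled": {"op": "mul", "args": ["h5", 2.0], "saved": False},
    "unused": {"op": "add", "args": ["h5", 1.0], "saved": False},
    "write": {"op": "setter", "args": ["layers.5.output", "scaled"], "saved": False},
    "logits": {"op": "getter", "args": ["lm_head.output"], "saved": True},
}

pruned = eliminate_dead_nodes(graph)
payload = json.dumps(pruned, sort_keys=True)  # text that could be sent to a server
rebuilt = json.loads(payload)                 # the same graph, rebuilt elsewhere
print(sorted(pruned))
```

The "unused" node is dropped because nothing saved or written depends on it, and the pruned graph survives a round trip through JSON unchanged.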

Invoker context and batched interventions

When a single tracing context needs to operate on multiple inputs, for example to perform activation patching where an activation from one prompt is substituted into the forward pass of another, nnsight provides nested invoker contexts:

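with model.trace() as tracer:
    # First prompt: record the layer-5 hidden states.
    with tracer.invoke("Paris is in"):
        clean_hidden = model.transformer.h[5].output[0].save()
    # Second prompt: overwrite its layer-5 hidden states with the recorded ones.
    with tracer.invoke("Berlin is in"):
        model.transformer.h[5].output[0][:] = clean_hidden
        patched_logits = model.lm_head.output.save()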

Each invoker block describes the value of .input and .output proxies for that specific prompt; the underlying engine batches the invocations into a single forward pass while keeping the per-prompt computations causally separated.^[11]
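The idea of batching several prompts while patching only one of them can be shown with a toy model in plain Python. This is a conceptual sketch, not nnsight's batching engine: forward_batched is a hypothetical helper, each "prompt" is a single number, and each "layer" is a simple function.

```python
def forward_batched(layers, batch, patch_layer=None, source_row=0, target_row=1):
    """Run every row through the layers together. At patch_layer, copy the
    source row's activation into the target row; other rows are untouched."""
    acts = list(batch)
    for i, layer in enumerate(layers):
        acts = [layer(a) for a in acts]
        if i == patch_layer:
            acts[target_row] = acts[source_row]
    return acts


layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]

unpatched = forward_batched(layers, [1, 5])
patched = forward_batched(layers, [1, 5], patch_layer=0)
print(unpatched, patched)
```

Row 0 produces the same output in both runs, while row 1 changes once the activation from row 0 is substituted after the first layer. That is the causal separation the invoker blocks are meant to preserve.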

National Deep Inference Fabric

The National Deep Inference Fabric (NDIF) is a shared inference service that executes nnsight requests on hosted models too large for typical researcher hardware. NDIF was established through a U.S. National Science Foundation award of approximately 9 million dollars (Award IIS-2408455) announced on May 2, 2024, with David Bau as principal investigator and Northeastern University as the awardee institution.^[3]^[12]^[13] Compute for the service runs on the Delta and DeltaAI systems at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign.^[2]^[12]

System architecture

Architecturally, an nnsight client serializes its intervention graph to a custom JSON-based format and transmits it to an NDIF frontend server. The backend hosts one or more model instances, each pinned to a dedicated set of GPU nodes. For the largest models, weights are sharded across many GPUs using tensor parallelism, and the frontend uses the Ray framework's Global Control Service to route incoming requests to the appropriate head node.^[4]^[7] Because interventions execute server side rather than requiring activations to round-trip back to the client, NDIF avoids the prohibitive bandwidth cost that would otherwise dominate remote interpretability workloads.^[4]
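A back-of-the-envelope calculation gives a sense of the bandwidth involved. The hidden size and layer count below are the published dimensions of Llama 3.1 405B; the prompt length and the bfloat16 activation format are assumptions chosen for illustration, and the figures ignore attention and MLP internals.

```python
# Published Llama 3.1 405B dimensions; the remaining values are assumptions.
HIDDEN_SIZE = 16384
NUM_LAYERS = 126
BYTES_PER_VALUE = 2   # assumes bfloat16 activations
SEQ_LEN = 2048        # assumed prompt length in tokens

per_layer = SEQ_LEN * HIDDEN_SIZE * BYTES_PER_VALUE   # one layer's hidden states
all_layers = per_layer * NUM_LAYERS                   # hidden states of every layer
one_vector = HIDDEN_SIZE * BYTES_PER_VALUE            # a single saved token vector

print(f"one layer, full prompt : {per_layer / 2**20:.0f} MiB")
print(f"all layers, full prompt: {all_layers / 2**30:.3f} GiB")
print(f"one saved token vector : {one_vector / 2**10:.0f} KiB")
```

Under these assumptions, shipping every layer's hidden states for one prompt would move almost 8 GiB, whereas returning only a saved vector moves 32 KiB. Running the intervention next to the model means only the explicitly saved values have to cross the network.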

To transition a script from local to remote execution, the user typically only needs to set remote=True on the trace call; the rest of the API is identical.^[7]^[11]

Performance and benchmarks

The paper's benchmarks show that NDIF's break-even point against running a model on a local high-performance computing allocation occurs at roughly three billion parameters, beyond which avoiding repeated model-loading overhead more than compensates for the constant communication latency.^[4] NDIF and the Petals decentralized inference system perform similarly on plain inference, but NDIF is substantially faster for interventions because hidden states do not have to be transferred between client and server.^[4] In direct comparisons against TransformerLens, pyvene, and baukit on GPT-2 XL, Gemma 7B, and Llama 3.1 8B, nnsight produces comparable runtimes across activation patching and attribution patching tasks; TransformerLens's standardization preprocessing makes its weight-loading step roughly three times slower than that of the other three libraries.^[4]

Access and governance

Access to NDIF is granted via a free API key obtained through registration at login.ndif.us, with applications required for the very largest models such as Llama 3.1 405B.^[14] NDIF is run in partnership with the Public Interest Technology University Network (PIT-UN), a coalition of 63 universities and colleges that supports outreach and training, and is overseen by an external advisory board described in the project's about page as spanning machine learning, humanities, technology policy, and supercomputing.^[9] The grant also funds workforce development activities, including training of graduate students and dissemination of educational materials to disciplines affected by large-scale AI, as part of the same NSF program that established the fabric.^[13]

API and usage examples

The two primary user-facing classes are NNsight, which wraps an arbitrary PyTorch nn.Module, and LanguageModel, which adds tokenizer integration and convenience accessors for transformer-style models loaded from the Hugging Face Transformers library.^[6]^[10] Version 0.6, released February 26, 2026, added VisionLanguageModel (for models such as LLaVA that integrate image preprocessing alongside tokenization) and DiffusionModel (which supports both U-Net-based architectures such as Stable Diffusion and transformer-based architectures such as FLUX.1 and the Diffusion Transformer).^[15]

Saving activations

from nnsight import LanguageModel

model = LanguageModel("meta-llama/Meta-Llama-3.1-8B")

with model.trace("The capital of France is"):
    layer_5 = model.model.layers[5].output[0].save()
    final = model.output.save()

Modifying activations

Assignment to a proxy attribute writes a new value during the forward pass:

with model.trace("The capital of France is"):
    model.model.layers[5].output[0][:] = 0
    zeroed_out = model.output.save()

This pattern underlies activation patching, ablation studies, and steering-vector experiments.^[4]^[11]

Computing gradients with respect to intermediate values

Because the intervention graph is differentiable, the .grad attribute of a proxy can be accessed and saved in the same way as an activation, which makes attribution-style analyses such as attribution patching straightforward.^[4]^[11]
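Attribution patching replaces many patched forward passes with a first-order estimate: the effect of swapping in a clean activation is approximated by the activation difference multiplied by the gradient of the metric. The sketch below checks that estimate on a toy metric in plain Python. The metric, its weights, and all function names are hypothetical, and the gradient is written out by hand rather than obtained from a framework.

```python
import math


def metric(acts, weights):
    # Toy downstream metric: a weighted sum of tanh units.
    return sum(w * math.tanh(a) for w, a in zip(weights, acts))


def metric_grad(acts, weights):
    # Analytic gradient of the metric with respect to each activation.
    return [w * (1.0 - math.tanh(a) ** 2) for w, a in zip(weights, acts)]


def attribution_patching(clean, corrupt, weights):
    # First-order estimate: (clean - corrupt) * d(metric)/d(activation) at corrupt.
    grad = metric_grad(corrupt, weights)
    return [(c - k) * g for c, k, g in zip(clean, corrupt, grad)]


def activation_patching(clean, corrupt, weights):
    # Exact effect: patch one component at a time and re-evaluate the metric.
    base = metric(corrupt, weights)
    effects = []
    for i in range(len(corrupt)):
        patched = list(corrupt)
        patched[i] = clean[i]
        effects.append(metric(patched, weights) - base)
    return effects


clean = [0.10, -0.20, 0.30]
corrupt = [0.12, -0.25, 0.28]
weights = [1.0, -2.0, 0.5]

estimate = attribution_patching(clean, corrupt, weights)
exact = activation_patching(clean, corrupt, weights)
print(estimate)
print(exact)
```

The estimate needs one gradient evaluation, while the exact version needs one extra evaluation per component. The two agree closely here because the clean and corrupted activations are close together; the approximation degrades as they move apart.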

Remote execution

from nnsight import LanguageModel

llama = LanguageModel("meta-llama/Meta-Llama-3.1-405B-Instruct")

with llama.trace("Interpretability is", remote=True):
    layer_50 = llama.model.layers[50].output[0].save()

Only the remote=True flag distinguishes this from a local trace; the model is never downloaded to the client.^[7]^[11]

Sparse autoencoder and adapter support

Version 0.6 added the ability to upload sparse autoencoder and LoRA adapter modules with a remote request so that feature-level analysis or low-rank adaptations can be performed against NDIF-hosted models without downloading the underlying weights. nnsight serializes the user's code and rebuilds it on the server, even when that code comes from packages that are not installed on NDIF.^[15]

Other features

The library supports generation-time interventions including early stopping, token streaming via vLLM, cross-prompt intervention through nested invoker blocks, and model editing through assignment to module parameters; vLLM integration in 0.6 covers single-GPU, tensor-parallel, Ray-distributed, and multi-node configurations.^[6]^[15]

Supported models

Locally, nnsight can wrap any PyTorch nn.Module, and LanguageModel accepts any causal or seq2seq model loadable through Hugging Face Transformers, including GPT-2 and the Llama 3.1, Gemma, and Qwen families.^[4]^[6] On the remote side, NDIF's hosted catalog includes multiple sizes of Llama 3.1 (with Llama 3.1 405B available subject to application approval) and DeepSeek-R1 models, with the current list published at nnsight.net/status/.^[14] Version 0.6 adds first-class support for vision-language models such as LLaVA and for diffusion models built on either U-Nets (Stable Diffusion) or transformer backbones (FLUX, DiT).^[15]

Comparison to other libraries

nnsight is one of several libraries used in mechanistic interpretability research. The most frequently cited comparison is to TransformerLens, an earlier library focused specifically on decoder-only transformer language models. The nnsight paper benchmarks both libraries alongside pyvene and baukit on GPT-2 XL, Gemma 7B, and Llama 3.1 8B, and reports that all four frameworks achieve comparable runtimes for activation patching and attribution patching, although TransformerLens loads weights roughly three times more slowly than the alternatives because it preprocesses them into a standardized form.^[4]

The principal architectural differences are:

Scope. nnsight wraps any PyTorch module, so it has been used on language models, diffusion models, and vision-language models alike, whereas TransformerLens targets transformer language models specifically.^[4]^[15]
Execution model. nnsight's tracing context is lazy and produces a first-class intervention graph; TransformerLens uses eager hooks. The lazy graph is what permits serialization and remote execution.^[4]
Remote inference. nnsight is the only widely used interpretability library that integrates with a hosted backend (NDIF) for large open-weight models such as Llama 3.1 405B.^[4]^[7]
Ergonomics. TransformerLens offers a richer set of named hook points and a built-in activation cache that some researchers find more convenient for small models; nnsight requires users to know the underlying module structure of the wrapped model.^[16] The companion library nnterp, also from the Bau Lab, adds a TransformerLens-style standardized interface on top of nnsight for common transformer architectures.^[16]^[17]

Notable research and use cases

Although nnsight is a recent library, it has been adopted in interpretability research published since 2024. Examples documented in the project's own paper and in subsequent literature include:

The original NNsight and NDIF paper's case studies, which include activation patching on the Indirect Object Identification task in GPT-2, attribution patching with gradients through the intervention graph, and large-model experiments on Llama 3.1 70B and 405B.^[4]
"Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders," which released 256 sparse autoencoders trained per layer and sublayer of Llama 3.1 8B and provides training, interpretation, and visualization tooling intended to interoperate with the nnsight and NDIF stack.^[18]
An NNsight implementation of AtP* (Kramar et al., 2024), an efficient attribution-patching variant for circuit discovery in language models, made publicly available as a reference implementation.^[19]
"RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching," which uses interpretation tooling in the nnsight family to identify and evaluate circuits.^[20]
nnterp, a companion library that builds on nnsight to provide a standardized TransformerLens-like interface across transformer architectures so that interpretability methods can be written once and applied to GPT-2, Llama, Gemma, or Qwen without per-model reimplementation.^[17]

Other lines of research from the Bau Lab and collaborators that have used nnsight or are closely associated with it include the analysis of function vectors in language models, the study of how fine-tuning enhances pre-existing mechanisms, and the linearity of relation decoding in transformers, all presented at ICLR 2024.^[8] The library's authors also documented its applicability beyond text models in the 0.6 release, where it was demonstrated on vision-language and diffusion architectures, opening interpretability research on these classes to the same workflow used for language models.^[15]

The infrastructural argument for nnsight is supported by a quantitative literature survey reported in the project's paper. By analyzing 184 interpretability publications, the authors found that the majority of recent work has continued to use models substantially smaller than the open-weight state of the art, which they attribute to barriers around access to compute and white-box internals rather than to a lack of interesting research questions at larger scales.^[4]

Funding, license, and governance

NDIF is funded by NSF Award IIS-2408455, awarded to Northeastern University with David Bau as principal investigator and announced on May 2, 2024.^[3]^[13] The award is approximately 9 million U.S. dollars over its term, and a parallel NSF grant funds interdisciplinary research on the societal impacts of generative AI using the same fabric.^[3]^[13]

The nnsight library itself is distributed under the MIT License and developed in the open on GitHub at ndif-team/nnsight, with releases also published to the Python Package Index as nnsight.^[6] The current version as of May 2026 is 0.7.0, released on May 5, 2026, with 0.6 having added vision-language and diffusion model support, vLLM-based generation, remote sparse autoencoder and LoRA adapter execution, and significant performance improvements over the previous 0.5 series. Benchmarks reported with the 0.6 release showed approximately 3.9 times faster empty traces and 2.4 times faster traces with multiple saves compared to version 0.5.15, due to changes such as trace caching, persistent process mounting, and more selective copying of Python globals.^[15]

Team

Day-to-day engineering is led by Jaden Fiotto-Kaufman as principal software engineer, with Michael Ripa and Adam Belfki as research software engineers and Emma Bortz as technical outreach manager. Open-source community engagement is coordinated by co-principal investigator Jonathan Bell; outreach and training are coordinated by co-principal investigator Carla Brodley in conjunction with the Public Interest Technology University Network.^[9]

Authorship of the ICLR 2025 paper

The ICLR 2025 publication lists twenty authors: Jaden Fried Fiotto-Kaufman, Alexander Russell Loftus, Eric Todd, Jannik Brinkmann, Koyena Pal, Dmitrii Troitskii, Michael Ripa, Adam Belfki, Can Rager, Caden Juang, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Nikhil Prakash, Carla E. Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, and David Bau. The paper was accepted as a poster at ICLR 2025 and the preprint, arXiv:2407.14561, has gone through four revisions between July 2024 and April 2025.^[4]^[5]

Open development

Issues, discussions, and contributions are coordinated through the ndif-team/nnsight GitHub repository, and the parallel ndif-team/ndif repository hosts the server-side code that performs deep inference and serves nnsight requests remotely.^[6]^[22] The project's documentation, tutorials, and announcements are published at nnsight.net, and an instance status page at nnsight.net/status/ lists the models currently hosted by NDIF along with their availability.^[1]^[14]

References

[1] nnsight project website, "NNsight." https://nnsight.net/ Accessed 2026-05-19.
[2] National Deep Inference Fabric, "NSF National Deep Inference Fabric." https://ndif.us/fabric.html Accessed 2026-05-19.
[3] U.S. National Science Foundation, "New NSF grant targets large language models and generative AI, exploring how they work and implications for societal impacts." https://www.nsf.gov/news/new-nsf-grant-targets-large-language-models-generative-ai Accessed 2026-05-19.
[4] Fiotto-Kaufman, J., Loftus, A. R., Todd, E., Brinkmann, J., Pal, K., Troitskii, D., Ripa, M., Belfki, A., Rager, C., Juang, C., Mueller, A., Marks, S., Sen Sharma, A., Lucchetti, F., Prakash, N., Brodley, C., Guha, A., Bell, J., Wallace, B. C., and Bau, D., "NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals," arXiv:2407.14561. https://arxiv.org/abs/2407.14561 Accessed 2026-05-19.
[5] OpenReview, "NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals" (ICLR 2025 Conference, Poster). https://openreview.net/forum?id=MxbEiFRf39 Accessed 2026-05-19.
[6] ndif-team, "nnsight" GitHub repository. https://github.com/ndif-team/nnsight Accessed 2026-05-19.
[7] arXiv full text of "NNsight and NDIF: Democratizing Access to Foundation Model Internals" (HTML version v1). https://arxiv.org/html/2407.14561v1 Accessed 2026-05-19.
[8] Bau Lab, "David Bau: Interpretation of Deep Networks." https://baulab.info/ Accessed 2026-05-19.
[9] National Deep Inference Fabric, "About NDIF." https://ndif.us/about.html Accessed 2026-05-19.
[10] nnsight documentation, "Walkthrough." https://nnsight.net/notebooks/tutorials/walkthrough/ Accessed 2026-05-19.
[11] arXiv full text of "NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals" (HTML version v4). https://arxiv.org/html/2407.14561v4 Accessed 2026-05-19.
[12] HPCwire, "New NSF Grant to Establish National Deep Inference Fabric, Enhancing US Research on AI Transparency." https://www.hpcwire.com/off-the-wire/new-nsf-grant-to-establish-national-deep-inference-fabric-enhancing-us-research-on-ai-transparency/ Accessed 2026-05-19.
[13] Northeastern University, "NSF funds groundbreaking research project led by Northeastern to 'democratize' artificial intelligence." https://news.northeastern.edu/2024/05/02/nsf-funds-democratizing-ai-research/ Accessed 2026-05-19.
[14] National Deep Inference Fabric, "Getting Started." https://ndif.us/start.html Accessed 2026-05-19.
[15] nnsight blog, "Introducing NNsight 0.6," February 26, 2026. https://nnsight.net/blog/2026/02/26/introducing-nnsight-06/ Accessed 2026-05-19.
[16] Learn Mechanistic Interpretability, "nnsight and nnterp." https://learnmechinterp.com/topics/nnsight-and-nnterp/ Accessed 2026-05-19.
[17] Bricout, C., et al., "nnterp: A Standardized Interface for Mechanistic Interpretability of Transformers," arXiv:2511.14465. https://arxiv.org/abs/2511.14465 Accessed 2026-05-19.
[18] He, Z., et al., "Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders," arXiv:2410.20526. https://arxiv.org/abs/2410.20526 Accessed 2026-05-19.
[19] koayon, "atp_star: PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)." https://github.com/koayon/atp_star Accessed 2026-05-19.
[20] "RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching," arXiv:2508.21258. https://arxiv.org/abs/2508.21258 Accessed 2026-05-19.
[21] Learn Mechanistic Interpretability, "nnsight and nnterp" (description of nnsight's worker-thread and hook-based deferred execution scheme). https://learnmechinterp.com/topics/nnsight-and-nnterp/ Accessed 2026-05-19.
[22] ndif-team, "ndif" GitHub repository (NDIF server, which performs deep inference and serves nnsight requests remotely). https://github.com/ndif-team/ndif Accessed 2026-05-19.
