nnsight
Last reviewed
May 19, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,336 words
nnsight is an open-source Python library for the interpretation and intervention of deep learning models, developed by the Bau Lab at Northeastern University. It extends PyTorch with a deferred-execution model that lets researchers inspect, modify, save, and differentiate through any intermediate activation, gradient, or parameter of a neural network using a unified API. The same code can run locally on small models or be sent to remote infrastructure for very large open-weight foundation models, in particular through the National Deep Inference Fabric (NDIF), a service funded by the U.S. National Science Foundation that hosts models such as Llama 3.1 405B and DeepSeek-R1 for academic research.[^1][^2][^3]
nnsight and NDIF were introduced jointly in the paper "NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals," presented at the International Conference on Learning Representations (ICLR) 2025.[^4][^5] The library is released under the MIT License and is maintained on GitHub under the ndif-team/nnsight repository.[^6]
Research on the internals of large neural networks, which is broadly known as mechanistic interpretability, historically required two things that were difficult to combine: full white-box access to a model's weights and activations, and enough computing capacity to actually run that model. Commercial application programming interfaces from major model providers offer scale but expose only black-box inputs and outputs, with no way to read or write intermediate activations. Conversely, open-weight models can be inspected freely, but the largest of them require expensive multi-GPU hardware that is unavailable to most academic groups.[^4][^7]
The nnsight authors documented this gap empirically by surveying 184 interpretability papers. They reported that 60.6 percent of the work published since February 2023 was still being carried out on models scoring below 40 percent on the Massive Multitask Language Understanding (MMLU) benchmark, even though much larger open-weight models had been released during that period.[^4] The stated goal of nnsight together with NDIF is to close this gap by providing a single library that gives transparent access to model internals whether the user is running a 124-million-parameter model locally or a 405-billion-parameter model on shared infrastructure.[^4][^5]
The project is led by David Bau, an Assistant Professor of Computer Science at Northeastern University's Khoury College of Computer Sciences, whose earlier work includes Network Dissection and the ROME method for editing factual associations in language models.[^8] Co-principal investigators on NDIF include Byron Wallace, Arjun Guha, Jonathan Bell, and Carla Brodley, all at Northeastern University.[^9]
The central abstraction in nnsight is the tracing context, entered with a Python with statement on a model's .trace() method. Inside this context, attribute access on the wrapped model does not immediately invoke a forward pass. Instead, it constructs nodes in an intervention graph, a bipartite directed acyclic graph of variable nodes and apply nodes that represents the experiment the researcher wants to run.[^4][^10] When the context exits, the intervention graph is interleaved with the model's native forward pass: PyTorch hooks fire at the appropriate modules and either read values out, write modified values in, or both.[^4]
Each submodule of the wrapped model is recursively wrapped in an Envoy object, which exposes .input and .output attributes. These attributes are not concrete tensors; they are Proxy values, placeholders that stand in for activations that will exist only when the forward pass actually runs.[^4][^10] To persist a value beyond the tracing context, a user calls .save() on the relevant proxy. This explicit save model gives nnsight precise control over which intermediate tensors to retain, which is important when intervening on very large models where naive caching would exhaust memory.[^10]
A minimal example illustrates the pattern:
from nnsight import LanguageModel
model = LanguageModel("openai-community/gpt2", device_map="auto")
with model.trace("The Eiffel Tower is in"):
    # Proxies only; nothing is computed until the with block exits.
    hidden = model.transformer.h[-1].output[0].save()
    # .save() marks a value to be kept after the trace completes.
    logits = model.output.save()
Here hidden and logits become available as concrete tensors once the with block exits.[^6][^10]
Operations on proxies, such as arithmetic, indexing, or torch function calls, themselves return new proxies. This lazy execution model means an entire experimental procedure can be described as a graph that is compiled and then executed in a single forward pass.[^4][^11] During graph construction, nnsight uses PyTorch's FakeTensor mechanism to validate shapes and dtypes without allocating real memory, which surfaces errors early.[^7]
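The lazy pattern described above can be sketched in plain Python. This is a toy illustration rather than nnsight's implementation: the `Proxy` and `Graph` classes, the `placeholder` method, and the two supported operators are all hypothetical names chosen for the example. Each operation on a proxy records a new node instead of computing anything, and only values marked with `save()` are retained after the graph runs.

```python
import operator


class Proxy:
    """Placeholder for a value that exists only once the graph is executed."""

    def __init__(self, graph, op, args):
        self.graph = graph
        self.op = op        # callable, or None for an input placeholder
        self.args = args    # tuple of Proxy objects and/or plain constants
        self.saved = False
        self.value = None   # filled in after execution, and only if saved
        graph.nodes.append(self)

    def __add__(self, other):
        # Arithmetic returns a new proxy instead of a concrete result.
        return Proxy(self.graph, operator.add, (self, other))

    def __mul__(self, other):
        return Proxy(self.graph, operator.mul, (self, other))

    def save(self):
        self.saved = True
        return self


class Graph:
    def __init__(self):
        self.nodes = []

    def placeholder(self, name):
        return Proxy(self, None, (name,))

    def run(self, **inputs):
        results = {}
        # Nodes were appended in dependency order, so one pass suffices.
        for node in self.nodes:
            if node.op is None:
                results[id(node)] = inputs[node.args[0]]
            else:
                args = [results[id(a)] if isinstance(a, Proxy) else a
                        for a in node.args]
                results[id(node)] = node.op(*args)
            if node.saved:
                node.value = results[id(node)]


graph = Graph()
hidden = graph.placeholder("hidden")
scaled = (hidden * 2 + 1).save()
assert scaled.value is None   # nothing has been computed yet
graph.run(hidden=10)
print(scaled.value)           # 21
```

Because `hidden` was never saved, its value is discarded after the run, which mirrors the explicit-save behaviour described earlier.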
The deferred-execution design distinguishes nnsight from libraries that rely on imperative hooks attached to running models. Because the graph is a first-class object, it can be optimized, for example by dead-node elimination, serialized for transport, or rebuilt on a different machine, which is what enables nnsight's remote-execution mode.[^4][^7]
Internally, the code inside a tracing context is not simply transformed into a static graph. The user's block is extracted at the Python abstract syntax tree level, compiled into a function, and executed in a worker thread. When that thread reads a proxy attribute such as .output, it blocks until the model's actual forward pass reaches the corresponding module, at which point a PyTorch forward hook fires and hands the live tensor to the worker. Conversely, when the user assigns a new value back to a proxy attribute, the hook intercepts the running activation and substitutes the modified tensor before computation continues.[^21] The combination of worker threads and hooks is what lets nnsight present an imperative, Pythonic interface while still controlling execution at a fine-grained level, and it is the bridge between the lazy intervention graph and the eager forward pass.[^21]
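The handoff between a worker thread and a forward hook can be imitated with the standard library alone. The sketch below is a simplified model of the mechanism, not nnsight's code: `ToyModel`, its `hooks` dictionary, and `run_with_intervention` are hypothetical, and a pair of queues stands in for whatever synchronization the library actually uses.

```python
import queue
import threading


class ToyModel:
    """Three 'layers' that each add one; a hook may replace a layer's output."""

    def __init__(self):
        self.hooks = {}  # layer index -> callable(output) -> output

    def forward(self, x):
        for i in range(3):
            x = x + 1
            if i in self.hooks:
                x = self.hooks[i](x)
        return x


def run_with_intervention(model, layer, user_fn, x):
    to_worker = queue.Queue()    # hook -> worker: the live activation
    from_worker = queue.Queue()  # worker -> hook: the (possibly modified) value

    def worker():
        activation = to_worker.get()           # blocks until the hook fires
        from_worker.put(user_fn(activation))   # hand the replacement back

    def hook(output):
        to_worker.put(output)
        return from_worker.get()               # forward pass waits for the worker

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    model.hooks[layer] = hook
    result = model.forward(x)
    thread.join()
    del model.hooks[layer]
    return result


baseline = ToyModel().forward(0)
patched = run_with_intervention(ToyModel(), 1, lambda a: a * 10, 0)
print(baseline, patched)  # 3 21
```

The user-supplied function reads like ordinary imperative code, yet it only runs at the moment the forward pass reaches layer 1, which is the property the paragraph above describes.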
The intervention graph itself is a bipartite directed acyclic graph: variable nodes stand for values, and apply nodes stand for operations. Connections between the original computation graph and the intervention graph are mediated by getters, which read activations out, and setters, which write modified activations back; the acyclic constraint prevents the cycles that would otherwise arise if a setter's output fed an upstream getter. The graph supports incremental optimizations such as dead-node elimination, where unused branches are pruned before execution, and it can be serialized to a custom JSON format for transport to a remote backend.[^4][^11]
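Dead-node elimination and serialization can be illustrated on a small dictionary-based graph. The node names, the `op` labels, and the JSON layout below are invented for this sketch and should not be read as nnsight's actual wire format. The pruning step keeps only nodes reachable backwards from the roots, which here means setters or saved values.

```python
import json

# Each node: name -> {"op": label, "args": [names of upstream nodes]}
graph = {
    "h5": {"op": "getter", "args": []},
    "scaled": {"op": "mul2", "args": ["h5"]},
    "unused": {"op": "neg", "args": ["h5"]},      # nothing consumes this node
    "write": {"op": "setter", "args": ["scaled"]},
}


def eliminate_dead_nodes(graph, roots):
    """Return a copy of graph containing only nodes the roots depend on."""
    live = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(graph[name]["args"])
    return {name: node for name, node in graph.items() if name in live}


pruned = eliminate_dead_nodes(graph, roots=["write"])
payload = json.dumps(pruned, sort_keys=True)   # what a client might transmit
restored = json.loads(payload)                 # what a server might rebuild
print(sorted(restored))  # ['h5', 'scaled', 'write']
```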
When a single tracing context needs to operate on multiple inputs, for example to perform activation patching where an activation from one prompt is substituted into the forward pass of another, nnsight provides nested invoker contexts:
with model.trace() as tracer:
    # First invocation: read the layer-5 activation from the clean prompt.
    with tracer.invoke("Paris is in"):
        clean_hidden = model.transformer.h[5].output[0].save()
    # Second invocation: overwrite the same activation for the other prompt.
    with tracer.invoke("Berlin is in"):
        model.transformer.h[5].output[0][:] = clean_hidden
        patched_logits = model.output.save()
Each invoker block describes the value of .input and .output proxies for that specific prompt; the underlying engine batches the invocations into a single forward pass while keeping the per-prompt computations causally separated.[^11]
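The idea of batching two prompts into one pass while patching across them can be shown with a deliberately tiny stand-in model. Everything here is hypothetical: the `forward` function, its three arithmetic "layers", and the use of one float per prompt in place of a hidden-state tensor. Each row is computed independently except at the patched layer, where the activation from the source row is copied into the destination row.

```python
def forward(batch, patch_layer=None, src_row=0, dst_row=1):
    """Run a toy 3-layer model on a batch holding one float per prompt."""
    acts = list(batch)
    for layer in range(3):
        # Per-row computation: rows never interact inside a layer.
        acts = [a * 2 + layer for a in acts]
        if layer == patch_layer:
            # The only cross-row step: copy the source activation over.
            acts[dst_row] = acts[src_row]
    return acts


baseline = forward([1.0, 5.0])
patched = forward([1.0, 5.0], patch_layer=1)
print(baseline)  # [12.0, 44.0]
print(patched)   # [12.0, 12.0]
```

After the patch, the second row follows the same downstream computation as the first, so any difference between the baseline and patched outputs for that row is attributable to the patched layer's activation.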
The National Deep Inference Fabric (NDIF) is a shared inference service that executes nnsight requests on hosted models too large for typical researcher hardware. NDIF was established through a U.S. National Science Foundation award of approximately 9 million dollars (Award IIS-2408455) announced on May 2, 2024, with David Bau as principal investigator and Northeastern University as the awardee institution.[^3][^12][^13] Compute for the service runs on the Delta and DeltaAI systems at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign.[^2][^12]
Architecturally, an nnsight client serializes its intervention graph to a custom JSON-based format and transmits it to an NDIF frontend server. The backend hosts one or more model instances, each pinned to a dedicated set of GPU nodes. For the largest models, weights are sharded across many GPUs using tensor parallelism, and the frontend uses the Ray framework's Global Control Service to route incoming requests to the appropriate head node.[^4][^7] Because interventions execute server side rather than requiring activations to round-trip back to the client, NDIF avoids the prohibitive bandwidth cost that would otherwise dominate remote interpretability workloads.[^4]
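A rough calculation suggests why shipping activations back to the client would be costly. The helper below is a back-of-the-envelope sketch; the prompt length, hidden size, layer count, and 2-byte precision are assumed values chosen to resemble a 405B-scale model, not measurements from NDIF.

```python
def activation_bytes(num_tokens, hidden_size, num_layers, bytes_per_value=2):
    """Bytes needed to send one hidden-state tensor per layer to a client."""
    return num_tokens * hidden_size * num_layers * bytes_per_value


# Assumed, illustrative dimensions for a single 1,024-token prompt.
total = activation_bytes(num_tokens=1024, hidden_size=16384, num_layers=126)
print(f"{total / 2**30:.2f} GiB")  # 3.94 GiB
```

Under these assumptions a single prompt would produce several gibibytes of activations, whereas an intervention executed on the server only needs to return the values the user explicitly saved.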
To transition a script from local to remote execution, the user typically only needs to set remote=True on the trace call; the rest of the API is identical.[^7][^11]
The paper's benchmarks show that NDIF's break-even point against running a model on a local high-performance computing allocation occurs at roughly three billion parameters, beyond which avoiding repeated model-loading overhead more than compensates for the constant communication latency.[^4] NDIF performs similarly to the Petals decentralized inference system for plain inference but is substantially faster for interventions, because hidden states do not have to be transferred between client and server.[^4] In direct comparisons against TransformerLens, pyvene, and baukit on GPT-2 XL, Gemma 7B, and Llama 3.1 8B, nnsight produces comparable runtimes across activation patching and attribution patching tasks; TransformerLens's standardization preprocessing makes its weight-loading step roughly three times slower than that of the other three libraries.[^4]
Access to NDIF is granted via a free API key obtained through registration at login.ndif.us, with applications required for the very largest models such as Llama 3.1 405B.[^14] NDIF is run in partnership with the Public Interest Technology University Network (PIT-UN), a coalition of 63 universities and colleges that supports outreach and training, and is overseen by an external advisory board described in the project's about page as spanning machine learning, humanities, technology policy, and supercomputing.[^9] The grant also funds workforce development activities, including training of graduate students and dissemination of educational materials to disciplines affected by large-scale AI, as part of the same NSF program that established the fabric.[^13]
The two primary user-facing classes are NNsight, which wraps an arbitrary PyTorch nn.Module, and LanguageModel, which adds tokenizer integration and convenience accessors for transformer-style models loaded from the Hugging Face Transformers library.[^6][^10] Version 0.6, released February 26, 2026, added VisionLanguageModel (for models such as LLaVA that integrate image preprocessing alongside tokenization) and DiffusionModel (which supports both U-Net-based architectures such as Stable Diffusion and transformer-based architectures such as FLUX.1 and the Diffusion Transformer).[^15]
from nnsight import LanguageModel
model = LanguageModel("meta-llama/Meta-Llama-3.1-8B")
with model.trace("The capital of France is"):
    # Read an intermediate activation and the final model output.
    layer_5 = model.model.layers[5].output[0].save()
    final = model.output.save()
Assignment to a proxy attribute writes a new value during the forward pass:
with model.trace("The capital of France is"):
    # In-place assignment zeroes the activation before later layers see it.
    model.model.layers[5].output[0][:] = 0
    zeroed_out = model.output.save()
This pattern underlies activation patching, ablation studies, and steering-vector experiments.[^4][^11]
Because the intervention graph is differentiable, proxies can have .grad attached and saved like activations, which makes attribution-style analyses such as attribution patching straightforward.[^4][^11]
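Attribution patching approximates the effect of an activation patch with a first-order Taylor expansion: the change in a metric is estimated as the dot product of the activation difference with the metric's gradient at the corrupted activation. The toy below uses a hand-written linear metric and its analytic gradient, both hypothetical, so that the estimate can be checked against the exact effect without any autograd library.

```python
def metric(a):
    """Toy scalar metric of a 2-dimensional activation."""
    return 3.0 * a[0] - 2.0 * a[1]


def grad_metric(a):
    """Analytic gradient of the toy metric (constant, since it is linear)."""
    return [3.0, -2.0]


clean = [1.0, 4.0]
corrupt = [0.5, 1.0]

# First-order estimate of the effect of patching clean into corrupt:
# (a_clean - a_corrupt) . d(metric)/d(a), with the gradient taken at corrupt.
estimate = sum((c - k) * g
               for c, k, g in zip(clean, corrupt, grad_metric(corrupt)))

# Exact effect, which would normally require an extra forward pass.
exact = metric(clean) - metric(corrupt)
print(estimate, exact)  # -4.5 -4.5
```

The two numbers coincide here only because the metric is linear; for a real network the estimate is an approximation whose appeal is that a single backward pass yields gradients for many candidate activations at once.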
from nnsight import LanguageModel
llama = LanguageModel("meta-llama/Meta-Llama-3.1-405B-Instruct")
# remote=True sends the request to NDIF instead of running locally.
with llama.trace("Interpretability is", remote=True):
    layer_50 = llama.model.layers[50].output[0].save()
Only the remote=True flag distinguishes this from a local trace; the model is never downloaded to the client.[^7][^11]
Version 0.6 added the ability to upload sparse autoencoder and LoRA adapter modules with a remote request so that feature-level analysis or low-rank adaptations can be performed against NDIF-hosted models without downloading the underlying weights. nnsight serializes user code and rebuilds it on the server, even for packages not installed on NDIF.[^15]
The library supports generation-time interventions including early stopping, token streaming via vLLM, cross-prompt intervention through nested invoker blocks, and model editing through assignment to module parameters; vLLM integration in 0.6 covers single-GPU, tensor-parallel, Ray-distributed, and multi-node configurations.[^6][^15]
Locally, nnsight can wrap any PyTorch nn.Module, and LanguageModel accepts any causal or seq2seq model loadable through Hugging Face Transformers, including GPT-2, the various sizes of Llama 3.1, Gemma, and Qwen families.[^4][^6] On the remote side, NDIF's hosted catalog includes multiple sizes of Llama 3.1 (with Llama 3.1 405B available subject to application approval) and DeepSeek-R1 models, with the current list published at nnsight.net/status/.[^14] Version 0.6 adds first-class support for vision-language models such as LLaVA and for diffusion models built on either U-Nets (Stable Diffusion) or transformer backbones (FLUX, DiT).[^15]
nnsight is one of several libraries used in mechanistic interpretability research. The most frequently cited comparison is to TransformerLens, an earlier library focused specifically on decoder-only transformer language models. The nnsight paper benchmarks both libraries alongside pyvene and baukit on GPT-2 XL, Gemma 7B, and Llama 3.1 8B, and reports that all four frameworks achieve comparable runtimes for activation patching and attribution patching, although TransformerLens loads weights roughly three times more slowly than the alternatives because it preprocesses them into a standardized form.[^4]
The principal architectural differences are:
nnterp, also from the Bau Lab, adds a TransformerLens-style standardized interface on top of nnsight for common transformer architectures.[^16][^17]
Although nnsight is a recent library, it has been adopted in interpretability research published since 2024. Examples documented in the project's own paper and in subsequent literature include:
nnterp, a companion library that builds on nnsight to provide a standardized TransformerLens-like interface across transformer architectures so that interpretability methods can be written once and applied to GPT-2, Llama, Gemma, or Qwen without per-model reimplementation.[^17]
Other lines of research from the Bau Lab and collaborators that have used nnsight or are closely associated with it include the analysis of function vectors in language models, the study of how fine-tuning enhances pre-existing mechanisms, and the linearity of relation decoding in transformers, all presented at ICLR 2024.[^8] The library's authors also documented its applicability beyond text models in the 0.6 release, where it was demonstrated on vision-language and diffusion architectures, opening interpretability research on these classes to the same workflow used for language models.[^15]
The infrastructural argument for nnsight is supported by a quantitative literature survey reported in the project's paper. By analyzing 184 interpretability publications, the authors found that the majority of recent work has continued to use models substantially smaller than the open-weight state of the art, which they attribute to barriers around access to compute and white-box internals rather than to a lack of interesting research questions at larger scales.[^4]
NDIF is funded by NSF Award IIS-2408455, awarded to Northeastern University with David Bau as principal investigator and announced on May 2, 2024.[^3][^13] The award is approximately 9 million U.S. dollars over its term, and a parallel NSF grant funds interdisciplinary research on the societal impacts of generative AI using the same fabric.[^3][^13]
The nnsight library itself is distributed under the MIT License and developed in the open on GitHub at ndif-team/nnsight, with releases also published to the Python Package Index as nnsight.[^6] The current version as of May 2026 is 0.7.0, released on May 5, 2026, with 0.6 having added vision-language and diffusion model support, vLLM-based generation, remote sparse autoencoder and LoRA adapter execution, and significant performance improvements over the previous 0.5 series. Benchmarks reported with the 0.6 release showed approximately 3.9 times faster empty traces and 2.4 times faster traces with multiple saves compared to version 0.5.15, due to changes such as trace caching, persistent process mounting, and more selective copying of Python globals.[^15]
Day-to-day engineering is led by Jaden Fiotto-Kaufman as principal software engineer, with Michael Ripa and Adam Belfki as research software engineers and Emma Bortz as technical outreach manager. Open-source community engagement is coordinated by co-principal investigator Jonathan Bell; outreach and training are coordinated by co-principal investigator Carla Brodley in conjunction with the Public Interest Technology University Network.[^9]
The ICLR 2025 publication lists twenty authors: Jaden Fried Fiotto-Kaufman, Alexander Russell Loftus, Eric Todd, Jannik Brinkmann, Koyena Pal, Dmitrii Troitskii, Michael Ripa, Adam Belfki, Can Rager, Caden Juang, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Nikhil Prakash, Carla E. Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, and David Bau. The paper was accepted as a poster at ICLR 2025 and the preprint, arXiv:2407.14561, has gone through four revisions between July 2024 and April 2025.[^4][^5]
Issues, discussions, and contributions are coordinated through the ndif-team/nnsight GitHub repository, and the parallel ndif-team/ndif repository hosts the server-side code that performs deep inference and serves nnsight requests remotely.[^6][^22] The project's documentation, tutorials, and announcements are published at nnsight.net, and an instance status page at nnsight.net/status/ lists the models currently hosted by NDIF along with their availability.[^1][^14]