Christopher Olah

Updated Jun 23, 2026

Christopher Olah (commonly Chris Olah) is a Canadian machine learning researcher, a co-founder of Anthropic, and the researcher most often credited with founding mechanistic interpretability, the subfield that reverse-engineers the internal computations of trained neural networks.^[1]^[2]^[3]^[4] The term "mechanistic interpretability" itself was popularized by Olah and first used publicly in the Distill Circuits thread, a series of articles published between March 2020 and April 2021.^[5]^[27] He leads the interpretability team at Anthropic, where his group used sparse autoencoders to extract on the order of 34 million interpretable features from the production language model Claude 3 Sonnet in 2024, work that landed him on TIME magazine's TIME100 AI list that September.^[37]^[4] Olah does not hold an undergraduate degree: he left the University of Toronto after one year and received a $100,000 Thiel Fellowship in 2012, entering research through self-study, blogging, and mentorship.^[9]^[10]

Before Anthropic, Olah worked at Google Brain (roughly 2014 to 2018) and led the interpretability team at OpenAI (2018 to late 2021), where he initiated the Circuits research thread.^[5]^[6] He is also a co-founder and former editor-in-chief of the open-science journal Distill, which launched in March 2017 and went on indefinite hiatus in July 2021.^[7]^[8]


Nationality	Canadian^[1]
Education	The Abelard School, Toronto (AP National Scholar, 2010); brief enrollment at University of Toronto; no university degree^[11]^[9]
Known for	Mechanistic interpretability; Distill journal; DeepDream; Circuits thread; sparse-autoencoder feature extraction^[3]^[5]^[12]
Notable employers	Google Brain (2014 to 2018); OpenAI (2018 to 2021); Anthropic (2021 to present)^[5]^[2]
Notable awards	Thiel Fellowship (2012); TIME100 AI (2024)^[9]^[4]
Personal site	colah.github.io^[1]

Early life and education

Olah grew up in Toronto, Canada, and attended The Abelard School, a small private high school. He graduated in July 2010 and was named an Advanced Placement National Scholar, having completed six AP-level subjects.^[11] As a teenager he became involved in the local technology community, joining the hacklab.to hackerspace in 2009; the space gave him early exposure to programming, electronics, and self-directed projects.^[11]

In a 2020 blog post titled "Do I Need to Go to University?", Olah recounts that he attended the University of Toronto for a single year. He audited courses there as a high school student and took some advanced coursework during his year of formal enrollment, but did not complete a degree.^[9] He has since written publicly that "I've been somewhat successful as a researcher without an undergraduate degree or PhD," while cautioning that the path is risky for most people and that, for almost everyone who writes to him for advice, the right choice is to attend university.^[9] In a podcast interview with the career-advice organization 80,000 Hours, Olah described leaving school in part to support an acquaintance who faced what he describes as false terrorism charges related to hobby chemistry equipment; he spent roughly two years on that case, which he later judged "altruistic, but not especially effective."^[10]

After this period, Olah's career was redirected by his selection as a Thiel Fellow in 2012. The Thiel Fellowship, run by venture investor Peter Thiel's foundation, awards a $100,000 grant to young people under 20 who agree to skip or leave university in order to pursue research or entrepreneurship.^[9]^[10] His initial Thiel project focused on three-dimensional printing and open-source CAD software, but he pivoted to machine learning around 2013 after attending a seminar series on neural networks led by physicist and science writer Michael Nielsen.^[10] Nielsen subsequently became an informal mentor and collaborator. Olah has also credited Yoshua Bengio, who invited him to visit Mila in Montreal in 2013, with helping arrange a path into academic research; Bengio extended a PhD offer that Olah ultimately declined.^[10]^[9]

Career

Google Brain (2014 to 2018)

Olah joined Google Brain in Mountain View as a research intern around 2013 to 2014, with senior researchers Jeff Dean and Greg Corrado supporting his hiring despite his lack of a formal degree.^[10] He converted to a full-time researcher and spent roughly four years on the team. His Google Brain work centered on visualization and interpretation of trained convolutional networks, with a particular focus on the InceptionV1 image classifier.^[5]^[13]

In June 2015, Google researchers Alexander Mordvintsev, Olah, and Mike Tyka published "Inceptionism: Going Deeper into Neural Networks" on the Google Research Blog, describing a procedure that iteratively modifies an input image to amplify whatever features a chosen layer of a classifier responds to.^[13] The resulting hallucinatory images, including dogs and eyes appearing in clouds and landscapes, were released as the open-source DeepDream code on 1 July 2015 and became one of the first viral demonstrations of feature visualization.^[13]^[14] Olah was the second-named author on the original blog post.^[13]
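The core loop of the procedure, repeatedly nudging the input in whatever direction increases a layer's activations, can be sketched in a few lines. This is a toy illustration under loud assumptions: the "layer" is a hypothetical fixed 3x2 weight matrix followed by a ReLU rather than a trained image classifier, and the names `W`, `layer_energy`, and `dream_step` are invented for this sketch. It is not the released DeepDream code.

```python
import math

# Hypothetical stand-in for one layer of a classifier: 3 units reading a 2-d input.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

def activations(x):
    """ReLU(W x) for the toy layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def layer_energy(x):
    """Objective to amplify: half the squared norm of the layer's activations."""
    return 0.5 * sum(a * a for a in activations(x))

def dream_step(x, step_size=0.1):
    """One gradient-ascent step on the input (the weights stay fixed)."""
    a = activations(x)
    # The gradient of 0.5 * sum(relu(W x)^2) with respect to x is W^T relu(W x).
    grad = [sum(W[j][i] * a[j] for j in range(len(W))) for i in range(len(x))]
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    return [xi + step_size * g / norm for xi, g in zip(x, grad)]

x0 = [0.1, 0.2]
x = x0
for _ in range(20):
    x = dream_step(x)

energy_before = layer_energy(x0)
energy_after = layer_energy(x)
```

The key design point is that the optimisation variable is the input rather than the weights, so each step makes the input look more like whatever the chosen layer already responds to.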

Building on this line of work, Olah and collaborators produced a series of long-form articles on neural-network visualization, including "Feature Visualization" (Distill, 7 November 2017) and "The Building Blocks of Interpretability" (Distill, 6 March 2018, with Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye, and Alexander Mordvintsev).^[15]^[16] The latter introduced techniques for combining feature visualization with attribution methods, and was accompanied by the open-source Lucid library.^[15] In March 2019, the same group, together with Zan Armstrong and others at OpenAI and Google, published "Activation Atlas" in Distill, which used feature inversion across millions of activations to produce a navigable map of features a vision network has learned.^[17]

Across these years Olah also wrote prolifically on his personal site, colah.github.io. The post "Neural Networks, Manifolds, and Topology" (April 2014) framed deep classifiers in terms of homeomorphisms of input space.^[18] "Visualizing MNIST: An Exploration of Dimensionality Reduction" (October 2014) introduced a wide audience to t-SNE and related techniques.^[19] "Understanding LSTM Networks" (27 August 2015) became one of the most cited explanatory blog posts in machine learning; according to Olah's Google Scholar profile the post has accrued more than three thousand citations and remains assigned reading in many graduate ML courses.^[20]^[21]

Founding of Distill (2017 to 2021)

On 20 March 2017, Olah and Shan Carter, then both at Google Brain, announced the launch of Distill, "a new open science journal and ecosystem supporting human understanding of machine learning."^[7]^[22] Distill was developed in collaboration with Google, OpenAI, DeepMind, and YC Research, and operated as a peer-reviewed venue dedicated to articles that use interactive visualizations and explorable explanations rather than the standard static PDF format.^[8]^[7] The editors-in-chief were Carter, Olah, and Arvind Satyanarayan of MIT.^[8] An accompanying initiative, the Distill Prize for Clarity in Machine Learning, was endowed at $125,000 by Olah, Greg Brockman, Jeff Dean, DeepMind, and the Open Philanthropy Project; Olah personally contributed $25,000.^[23]

Distill published several of Olah's most influential articles, including "Research Debt" (2017) with Carter, which argued that the accumulation of un-distilled complexity in technical fields is a major and underappreciated cost to research progress.^[24] The journal published 39 peer-reviewed articles between 2016 and 2021, on topics ranging from attention mechanisms to graph neural networks.^[8]^[25]

On 2 July 2021, Olah, Nick Cammarata, Sam Greydanus, and Janelle Tam announced the "Distill Hiatus," explaining that the volunteer editorial team had experienced sustained burnout from the tension between mentoring submitters, running peer review, and authoring their own articles, and that Distill would pause publishing for at least one year; the pause has since become indefinite.^[26]^[8] Olah used the same post to commit to continuing the Circuits research line on a successor venue.^[26]

OpenAI: the Circuits thread (2018 to 2021)

Around 2018 Olah moved from Google Brain to OpenAI in San Francisco, where he founded and led a team initially called Clarity and later renamed Circuits, dedicated to mechanistic interpretability of vision and multimodal networks.^[10]^[27] The flagship output was the Circuits research thread, published as a series of invited Distill articles starting with "Zoom In: An Introduction to Circuits" by Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter on 10 March 2020.^[5]^[27] "Zoom In" laid out three speculative claims that have since defined the field: that neural-network features are meaningful units of analysis, that the connections between features form interpretable circuits, and that important features and circuits are universal across models.^[5]

The Circuits thread continued through 2020 and 2021 with articles such as "An Overview of Early Vision in InceptionV1," "Curve Detectors," "Curve Circuits," "Naturally Occurring Equivariance in Neural Networks," and "Understanding RL Vision," each focused on reverse-engineering a specific sub-circuit of trained models.^[5] In March 2021, the same group published "Multimodal Neurons in Artificial Neural Networks" in Distill, identifying neurons in OpenAI's CLIP model that respond to a single concept across photographs, drawings, and rendered text.^[28]

According to Olah's own account in a 2021 podcast interview, he left OpenAI to help start a new AI-safety-focused lab; that lab was Anthropic.^[10]

Anthropic and Transformer Circuits (2021 to present)

Olah is one of eight co-founders of Anthropic, an AI safety company founded in 2021, according to public records and his Anthropic profile, by Dario Amodei, Daniela Amodei, Jared Kaplan, Jack Clark, Tom Brown, Sam McCandlish, Ben Mann, and Olah, all formerly at OpenAI.^[29]^[2] At Anthropic he leads the interpretability team; press coverage also describes him as the company's head of interpretability research.^[4]^[30]

Olah's work at Anthropic is published primarily on the Transformer Circuits Thread (transformer-circuits.pub), which Anthropic positioned as a continuation of the Circuits agenda from OpenAI. The first major release, "A Mathematical Framework for Transformer Circuits" (22 December 2021) by Nelson Elhage, Neel Nanda, Catherine Olsson, and others, with Olah as a senior author, introduced a tensor-product decomposition of attention and identified two-layer attention-only transformers as the simplest setting in which to study transformer circuits.^[31] In a follow-up paper, "In-context Learning and Induction Heads" (8 March 2022), Olsson, Elhage, Nanda, and colleagues, with Olah as a senior author, defined induction heads as attention heads that complete patterns of the form [A][B] ... [A] -> [B] and presented six lines of evidence that such heads are a major mechanism behind in-context learning in transformer models of many sizes.^[32]
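The [A][B] ... [A] -> [B] rule can be written out as an ordinary function to make the behaviour concrete. This is only a behavioural sketch in plain Python: the function name `induction_predict` and the example tokens are invented here, and the paper studies learned attention heads that approximate this rule inside a transformer, not a lookup loop.

```python
def induction_predict(tokens):
    """Predict the next token by the rule [A][B] ... [A] -> [B].

    Looks back for the most recent earlier occurrence of the final token
    and returns the token that followed it, or None if there is none.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

prediction = induction_predict(["the", "cat", "sat", "on", "the"])
```

Here the final token "the" appeared earlier followed by "cat", so the rule predicts "cat". The rule depends only on the context window and not on anything memorised during training, which is what ties it to in-context learning.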

In September 2022, Olah and a large team of Anthropic researchers and external collaborators (including Martin Wattenberg of Harvard) published "Toy Models of Superposition," which formalized the long-standing observation that individual neurons in trained networks often respond to multiple unrelated features.^[33]^[34] The paper introduces a small ReLU network trained on sparse synthetic data in which superposition (the storage of more features than dimensions) can be exhibited cleanly, identifies a phase transition between non-superposed and superposed regimes, and links the geometry of the resulting feature arrangements to uniform polytopes.^[33]^[12] The paper has since become a standard reference and is the conceptual basis for much of the sparse autoencoder work that followed.^[34]

In October 2023, the Anthropic interpretability team published "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning," with Trenton Bricken and Adly Templeton as lead authors and Olah as a senior author.^[35]^[36] The paper showed that training a sparse autoencoder on the MLP activations of a one-layer transformer could decompose a 512-neuron layer into thousands of sparse, more interpretable features such as detectors for DNA sequences, HTTP requests, legal language, and Hebrew text.^[35] This established sparse autoencoders, also known as sparse dictionary learning, as a viable method for finding more interpretable directions in language-model activation space, and is the basis for the Towards Monosemanticity thread on the wiki.^[35]
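A sparse autoencoder of the general kind described here can be sketched as an encoder, a ReLU, a decoder, and a loss that trades reconstruction error against an L1 sparsity penalty. The sketch below is a generic, untrained illustration: the weights are hand-set, the dimensions (2 inputs, 4 dictionary features) are chosen only for readability, and the function names and the `l1_coeff` value are assumptions of this example. It does not reproduce the paper's architecture details or training setup.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode activations x into sparse features, then reconstruct x."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    features = relu(pre)  # non-negative feature activations
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon

def sae_loss(x, features, recon, l1_coeff):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1

# Over-complete dictionary: 2 input dimensions, 4 features (hand-set, untrained).
w_enc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # 4 x 2
w_dec = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]       # 2 x 4
b_enc = [0.0, 0.0, 0.0, 0.0]
b_dec = [0.0, 0.0]

x = [0.5, -2.0]
features, recon = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
loss = sae_loss(x, features, recon, l1_coeff=0.01)
```

With these weights only two of the four features are active for the example input and the reconstruction is exact, so the loss consists entirely of the sparsity term. In training, the L1 coefficient controls how strongly the autoencoder is pushed toward using few features per input.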

The 2024 sequel, "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet" (published 21 May 2024 on transformer-circuits.pub), applied the same method to a production model, Claude 3 Sonnet. Working from the residual stream at a middle layer, it extracted on the order of 34 million features, including features for the Golden Gate Bridge, sycophantic praise, security vulnerabilities, and several abstract concepts that could be used to steer model behaviour.^[37]^[4] The paper, alongside the public "Golden Gate Claude" demonstration that followed, brought interpretability research into mainstream press coverage. TIME magazine subsequently named Olah to its TIME100 AI list in September 2024, citing the Scaling Monosemanticity work as a key reason.^[4]

Anthropic's interpretability program continued to expand through 2024 and 2025. In March 2025, Jack Lindsey, Wes Gurnee, and colleagues, with Olah and Joshua Batson among the senior authors, published "Circuit Tracing: Revealing Computational Graphs in Language Models" and the accompanying case study "On the Biology of a Large Language Model," which used cross-layer transcoders and backward Jacobian tracing to construct attribution graphs that follow multi-step reasoning, planning, and hallucination-suppression behaviour through Claude 3.5 Haiku.^[38] Anthropic open-sourced an associated circuit-tracing library in May 2025 with an interactive frontend hosted on Neuronpedia.^[39]

In March 2025, Anthropic announced a Series E funding round of $3.5 billion at a $61.5 billion post-money valuation, with the company specifically citing deepening its research in mechanistic interpretability and alignment as one of four planned uses of the funds; Olah's team is the locus of that work.^[40] The company's valuation climbed sharply over the following year: Anthropic closed a roughly $30 billion Series G round on 12 February 2026 at a $380 billion post-money valuation, led by GIC and Coatue, more than doubling the $183 billion figure it had carried only months earlier.^[41]^[43]

On 25 May 2026, Olah was one of the lay speakers at the Vatican launch of Pope Leo XIV's first encyclical, "Magnifica Humanitas," on safeguarding the human person in the time of artificial intelligence; the event marked the first time a co-founder of an AI company had spoken at the launch of a papal encyclical.^[41]^[42] In his remarks, Olah argued that AI companies cannot be trusted to govern themselves, stating that "no matter how sincerely any of us intend to do the right thing ... we will always be influenced by those incentives," and called for external oversight from critics, scholars, and governments.^[44]

Research contributions

What is Chris Olah known for?

Olah's defining contribution is helping establish mechanistic interpretability as a distinct research subfield with shared concepts (features, circuits, universality), shared methodology (feature visualization, attribution, dictionary learning, circuit tracing), and shared empirical conjectures.^[3]^[5] The Circuits thread (2020 and after) introduced the working hypothesis that trained vision networks can be decomposed into a graph of interpretable features and the connections between them.^[5] The Transformer Circuits Thread extended the same framing to attention-only and full transformer language models.^[31]^[32]

The "Zoom In" article in particular laid out three foundational claims that have organised the agenda of the field ever since. The first, the features hypothesis, is that features are the fundamental unit of neural-network computation and can be rigorously studied. The second, the circuits hypothesis, is that features are connected by weights that themselves form interpretable circuits implementing identifiable algorithms. The third, the universality hypothesis, is that analogous features and circuits arise across different architectures and training runs.^[5] Subsequent work by Olah and others has produced empirical support for all three claims in vision networks (for instance, curve detectors in InceptionV1) and increasingly in language models (for instance, induction heads in two-layer transformers).^[5]^[32]

Why does interpretability matter to Olah?

In his framing of why interpretability matters, Olah has consistently linked the research to AI safety. In the TIME100 AI profile he argued that genuine understanding of these systems could let researchers "go and say when these models are actually safe, or whether they just appear safe."^[4] The Transformer Circuits Thread describes Anthropic's interpretability programme as "an effort to reverse engineer the algorithms learned by neural networks into human-understandable algorithms," with the longer-term goal of providing audit-grade evidence about model behaviour.^[31]

Superposition and sparse dictionary learning

A second strand of contributions concerns superposition, the phenomenon in which neural networks encode more features than they have neurons by representing those features as overlapping non-orthogonal directions in activation space.^[33] "Toy Models of Superposition" (2022) made the phenomenon mathematically tractable in a small setting, and the 2023 to 2024 monosemanticity papers used sparse autoencoders to recover monosemantic feature directions from real models.^[33]^[35]^[37] Together these papers shaped the dominant empirical methodology used by interpretability groups outside Anthropic, including at Google DeepMind, EleutherAI, and academic labs.^[36]

The 2022 "Toy Models" paper was particularly influential because it gave a clean experimental setting in which the trade-off between feature interference and feature count could be controlled by the sparsity of the underlying data. The paper showed a phase transition between two regimes. In the first, the network simply stores the most important features in orthogonal directions. In the second, it packs many features into overlapping directions, accepting some interference because the inputs are sparse enough that interfering features are rarely active at the same time.^[12]^[33] It also drew a connection to the geometry of uniform polytopes: the optimal arrangements of features in a low-dimensional bottleneck turn out to be configurations such as pentagons, tetrahedra, and other regular shapes.^[33]
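The pentagon case can be checked by hand with a small sketch of a ReLU toy model of the form x' = ReLU(W^T W x + b), storing five features in two dimensions. The weights below are set by hand to a regular pentagon rather than learned, and the bias value is chosen by hand for illustration; the names `directions` and `reconstruct` are assumptions of this example.

```python
import math

N_FEATURES, N_DIMS = 5, 2

# Column k of W is the 2-d direction assigned to feature k: a regular pentagon.
directions = [(math.cos(2 * math.pi * k / N_FEATURES),
               math.sin(2 * math.pi * k / N_FEATURES))
              for k in range(N_FEATURES)]

def reconstruct(x, bias):
    """Toy model x' = ReLU(W^T W x + b), with W given by `directions`."""
    # Compress five feature values into two hidden numbers: h = W x.
    h = [sum(d[axis] * xi for d, xi in zip(directions, x)) for axis in range(N_DIMS)]
    # Decompress with the transposed weights, add the bias, apply ReLU.
    return [max(0.0, d[0] * h[0] + d[1] * h[1] + bias) for d in directions]

one_hot = [1.0, 0.0, 0.0, 0.0, 0.0]  # sparse input: only feature 0 is active
no_bias = reconstruct(one_hot, bias=0.0)
with_bias = reconstruct(one_hot, bias=-math.cos(2 * math.pi / 5))
```

Without a bias, the two neighbouring features pick up spurious activation of about 0.31 (the cosine of 72 degrees), while the ReLU already zeroes the negative interference on the two far features. A small negative bias removes the remaining interference at the cost of shrinking the true feature's output to about 0.69, which is tolerable only because sparse inputs rarely activate several features at once.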

"Towards Monosemanticity" (2023) operationalised these ideas by training an over-complete sparse autoencoder on the residual stream of a one-layer transformer, decomposing a 512-dimensional layer into roughly four thousand sparse, more interpretable features such as detectors for DNA sequences, HTTP requests, legal language, and Hebrew text.^[35] Anthropic researchers reported that human raters could assign single-concept descriptions to a clear majority of the features, in contrast to the polysemantic and largely uninterpretable individual neurons of the underlying transformer.^[35]^[36] The paper also introduced practical training advice and the notion of "feature splitting," in which scaling the autoencoder dictionary causes a single broad feature to split into several more specific features, supporting an interpretation of features as a hierarchical concept space.^[35]

How far does the method scale? "Scaling Monosemanticity" (2024) showed that the same technique could be applied to a production model, Claude 3 Sonnet, extracting up to 34 million features from the residual stream of a middle layer.^[37] The paper reports features that activate for highly specific stimuli, ranging from the Golden Gate Bridge to bugs in code, and demonstrates that clamping a feature to an unusually high value can cause the model to manifest the associated concept in its outputs.^[37] An accompanying live demonstration in May 2024, "Golden Gate Claude," in which Anthropic deployed a version of Claude 3 Sonnet with the Golden Gate Bridge feature pinned on, brought the work into mainstream news coverage and was cited in TIME's 2024 selection of Olah.^[4]
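Clamping can be illustrated with a toy decoder: pin one feature's activation to a large value and compare the decoded activation vector with the baseline. Everything here is hypothetical, including the 3-feature dictionary, the values, and the function names. In the real experiment the modified activations are fed back into the language model, which this sketch does not attempt.

```python
def decode(features, w_dec):
    """Map feature activations back to activation space (one row per output dimension)."""
    return [sum(w * f for w, f in zip(row, features)) for row in w_dec]

def clamp_feature(features, index, value):
    """Return a copy of `features` with one entry pinned to `value`."""
    steered = list(features)
    steered[index] = value
    return steered

# Hypothetical 3-feature dictionary over a 2-d activation space.
w_dec = [[1.0, 0.0, 0.5],
         [0.0, 1.0, 0.5]]
features = [0.2, 0.0, 0.1]

baseline = decode(features, w_dec)
steered = decode(clamp_feature(features, index=2, value=10.0), w_dec)
shift = [s - b for s, b in zip(steered, baseline)]
```

The resulting shift is the clamped feature's decoder direction scaled by the change in its activation, so clamping amounts to adding a large multiple of one dictionary direction to the model's internal state.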

Visualization and feature attribution

Olah's earlier work, particularly DeepDream (2015), "Feature Visualization" (2017), "The Building Blocks of Interpretability" (2018), and "Activation Atlas" (2019), established the visual language and tooling that the interpretability field still uses.^[13]^[15]^[17] These articles are also notable for their unusual format: long, interactive, image-heavy explorable explanations published on Distill rather than in conference proceedings.^[7]

Distillation and research communication

Through Distill and "Research Debt" (2017), Olah and his collaborators argued that clearer, better-illustrated technical writing is itself a research contribution rather than a side activity, and built tooling and a prize to encourage that style of work.^[24]^[23] Olah has cited research communication as central to his career and has identified the intersection of machine learning and drawing as the niche in which he aimed to be most effective.^[10]

"Research Debt" introduced a vocabulary that has since been adopted by parts of the broader research community. Olah and Carter define the central term plainly: "Research debt is the accumulation of missing interpretive labor," the build-up of poorly explained concepts, bad notation, and missing intermediate explanations that makes new entrants slower to onboard and harder to integrate.^[24] Distillation, in Olah's framing, is the opposite process: clearing that debt by producing high-quality, often interactive explanations that compress and clarify earlier work.^[24] The Distill Prize for Clarity in Machine Learning was created to reward such work directly, with grants of up to $10,000 per recipient drawn from the journal's $125,000 endowment.^[23]

Olah's own pedagogical posts on colah.github.io have, in aggregate, served as informal textbooks for cohorts of ML practitioners. "Understanding LSTM Networks" in particular is assigned in graduate machine-learning courses at universities including MIT and Stanford and has been republished or mirrored in many forms.^[11]^[20] Olah has been explicit that he sees this writing as part of his research output rather than ancillary to it.^[9]^[24]

Long-term agenda

Olah has described his long-term goal as building "a kind of microscope" for neural networks: tools and concepts powerful enough that researchers can routinely look inside a trained model and read off its algorithms.^[5]^[36] He has framed this as an alternative to purely behavioural evaluation of AI systems, arguing that behaviour alone cannot distinguish models that are aligned from models that merely appear aligned under the available test set.^[4]^[6] The framing has propagated through Anthropic's public communications; the company's 2025 Series E announcement names mechanistic interpretability and alignment as one of four investment priorities, and Anthropic CEO Dario Amodei has cited interpretability research as a central reason Anthropic exists.^[40]^[29]

When did Chris Olah work where? (timeline)

Period	Role and affiliation	Highlights
2010	Graduated The Abelard School, Toronto (AP National Scholar)^[11]	Joined hacklab.to (2009); audited university courses
2012	Thiel Fellow ($100,000 grant)^[9]^[10]	Left University of Toronto after one year; no degree
2014 to 2018	Researcher, Google Brain^[5]^[13]	DeepDream (2015); Understanding LSTM Networks (2015); Feature Visualization (2017)
2017 to 2021	Co-founder and editor-in-chief, Distill^[7]^[8]	Research Debt (2017); 39 peer-reviewed articles; hiatus July 2021
2018 to 2021	Lead, Clarity/Circuits team, OpenAI^[10]^[27]	Zoom In: An Introduction to Circuits (2020)
2021 to present	Co-founder and interpretability lead, Anthropic^[2]^[4]	Toy Models of Superposition (2022); Towards/Scaling Monosemanticity (2023 to 2024)

Awards and recognition

Thiel Fellowship, 2012, awarded by the Thiel Foundation to recipients under 20 with a grant of $100,000 to pursue work outside the university system.^[9]^[10]
TIME100 AI, 2024, naming Olah as one of the 100 most influential people in artificial intelligence; the citation highlighted the Scaling Monosemanticity work at Anthropic.^[4]
Invited speaker, launch of Pope Leo XIV's encyclical "Magnifica Humanitas" on artificial intelligence at the Vatican Synod Hall, 25 May 2026, the first time an AI-company co-founder has appeared in that role.^[41]^[42]
Co-funded and co-administered the Distill Prize for Clarity in Machine Learning (from 2018), an annual award for outstanding research communication.^[23]

According to his Google Scholar profile, Olah's published work had accumulated more than 110,000 citations and an h-index of 53 as of early 2026, with the "Understanding LSTM Networks" blog post alone cited more than 3,600 times.^[20]

Selected publications

The following list emphasises articles in which Olah was a principal author or led the research direction. Dates refer to the public release date.

Olah, C., "Neural Networks, Manifolds, and Topology," colah.github.io, April 2014.^[18]
Olah, C., "Visualizing MNIST: An Exploration of Dimensionality Reduction," colah.github.io, October 2014.^[19]
Mordvintsev, A., Olah, C., and Tyka, M., "Inceptionism: Going Deeper into Neural Networks," Google Research Blog, June 2015. Open-source DeepDream code released 1 July 2015.^[13]
Olah, C., "Understanding LSTM Networks," colah.github.io, 27 August 2015.^[20]
Olah, C., and Carter, S., "Research Debt," Distill, 27 March 2017.^[24]
Olah, C., Mordvintsev, A., and Schubert, L., "Feature Visualization," Distill, 7 November 2017.^[14]^[15]
Olah, C., Satyanarayan, A., Johnson, I., Carter, S., Schubert, L., Ye, K., and Mordvintsev, A., "The Building Blocks of Interpretability," Distill, 6 March 2018.^[15]
Carter, S., Armstrong, Z., Schubert, L., Johnson, I., and Olah, C., "Activation Atlas," Distill, 6 March 2019.^[17]
Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M., and Carter, S., "Zoom In: An Introduction to Circuits," Distill, 10 March 2020.^[5]
Goh, G., Cammarata, N., Voss, C., Carter, S., Petrov, M., Schubert, L., Radford, A., and Olah, C., "Multimodal Neurons in Artificial Neural Networks," Distill, 4 March 2021.^[28]
Elhage, N., Nanda, N., Olsson, C., et al., with Olah, C. as senior author, "A Mathematical Framework for Transformer Circuits," Transformer Circuits Thread, 22 December 2021.^[31]
Olsson, C., Elhage, N., Nanda, N., et al., with Olah, C. as senior author, "In-context Learning and Induction Heads," Transformer Circuits Thread, 8 March 2022; arXiv:2209.11895.^[32]
Elhage, N., Hume, T., Olsson, C., et al., and Olah, C., "Toy Models of Superposition," Transformer Circuits Thread, September 2022; arXiv:2209.10652.^[33]^[12]
Bricken, T., Templeton, A., et al., with Olah, C. as senior author, "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning," Transformer Circuits Thread, 5 October 2023.^[35]
Templeton, A., Conerly, T., Marcus, J., et al., with Olah, C. as senior author, "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet," Transformer Circuits Thread, 21 May 2024.^[37]
Lindsey, J., Gurnee, W., Ameisen, E., et al., with Olah, C. and Batson, J. as senior authors, "On the Biology of a Large Language Model," Transformer Circuits Thread, March 2025.^[38]

Selected blog posts

Although Olah is not primarily an academic author, his personal site colah.github.io has been an unusually influential venue for ML pedagogy. Posts referenced in his own CV and cited in subsequent research include "Calculus on Computational Graphs: Backpropagation" (August 2015), "Conv Nets: A Modular Perspective" (July 2014), "Attention and Augmented Recurrent Neural Networks" (with Shan Carter, Distill, September 2016), "Visualizing Representations: Deep Learning and Human Beings" (January 2015), and "Do I Need to Go to University?" (May 2020).^[1]^[9]^[25]^[18]^[19]^[20]

References

1. Olah, C., "About," colah.github.io, undated. https://colah.github.io/about.html. Accessed 2026-05-25.
2. Anthropic, "Anthropic raises Series E at $61.5B post-money valuation," anthropic.com, 2025-03-03. https://www.anthropic.com/news/anthropic-raises-series-e-at-usd61-5b-post-money-valuation. Accessed 2026-05-25.
3. Wikipedia contributors, "Chris Olah," en.wikipedia.org, undated. https://en.wikipedia.org/wiki/Chris_Olah. Accessed 2026-05-25.
4. TIME magazine, "Chris Olah: The 100 Most Influential People in AI 2024," time.com, 2024-09-05. https://time.com/collections/time100-ai-2024/7012873/chris-olah/. Accessed 2026-05-25.
5. Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M., and Carter, S., "Zoom In: An Introduction to Circuits," Distill, 2020-03-10. https://distill.pub/2020/circuits/zoom-in/. Accessed 2026-05-25.
6. 80,000 Hours, "Chris Olah on what the hell is going on inside neural networks," podcast episode, 80000hours.org, 2021-08-04. https://80000hours.org/podcast/episodes/chris-olah-interpretability-research/. Accessed 2026-05-25.
7. Carter, S., and Olah, C., "Distill: Supporting Clarity in Machine Learning," Google Research Blog, 2017-03-20. https://research.google/blog/distill-supporting-clarity-in-machine-learning/. Accessed 2026-05-25.
8. Wikipedia contributors, "Distill (journal)," en.wikipedia.org, undated. https://en.wikipedia.org/wiki/Distill_(journal). Accessed 2026-05-25.
9. Olah, C., "Do I Need to Go to University?", colah.github.io, 2020-05. https://colah.github.io/posts/2020-05-University/. Accessed 2026-05-25.
10. 80,000 Hours, "Chris Olah on working at top AI labs without an undergrad degree," podcast episode, 80000hours.org, 2021-01-04. https://80000hours.org/podcast/episodes/chris-olah-unconventional-career-path/. Accessed 2026-05-25.
11. Grokipedia, "Chris Olah," grokipedia.com, undated. https://grokipedia.com/page/Chris_Olah. Accessed 2026-05-25.
12. Elhage, N., Hume, T., Olsson, C., et al., "Toy Models of Superposition," arXiv:2209.10652, 2022-09-21. https://arxiv.org/abs/2209.10652. Accessed 2026-05-25.
13. Mordvintsev, A., Olah, C., and Tyka, M., "Inceptionism: Going Deeper into Neural Networks," Google Research Blog, 2015-06-17. https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html. Accessed 2026-05-25.
14. Know Your Meme, "Google DeepDream," knowyourmeme.com, undated. https://knowyourmeme.com/memes/subcultures/google-deepdream. Accessed 2026-05-25.
15. Olah, C., Satyanarayan, A., Johnson, I., Carter, S., Schubert, L., Ye, K., and Mordvintsev, A., "The Building Blocks of Interpretability," Distill, 2018-03-06. https://distill.pub/2018/building-blocks/. Accessed 2026-05-25.
16. Google Open Source Blog, "The Building Blocks of Interpretability," opensource.googleblog.com, 2018-03-06. https://opensource.googleblog.com/2018/03/the-building-blocks-of-interpretability.html. Accessed 2026-05-25.
17. Carter, S., Armstrong, Z., Schubert, L., Johnson, I., and Olah, C., "Activation Atlas," Distill, 2019-03-06. https://distill.pub/2019/activation-atlas/. Accessed 2026-05-25.
18. Olah, C., "Neural Networks, Manifolds, and Topology," colah.github.io, 2014-04. https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/. Accessed 2026-05-25.
19. Olah, C., "Visualizing MNIST: An Exploration of Dimensionality Reduction," colah.github.io, 2014-10-09. https://colah.github.io/posts/2014-10-Visualizing-MNIST/. Accessed 2026-05-25.
20. Olah, C., "Understanding LSTM Networks," colah.github.io, 2015-08-27. https://colah.github.io/posts/2015-08-Understanding-LSTMs/. Accessed 2026-05-25.
21. Google Scholar, "Christopher Olah," scholar.google.com, undated. https://scholar.google.com/citations?user=6dskOSUAAAAJ&hl=en. Accessed 2026-05-25.
22. Olah, C., "Distill," colah.github.io, 2017-03. https://colah.github.io/posts/2017-03-Distill/. Accessed 2026-05-25.
23. Distill, "Distill Prize for Clarity in Machine Learning," distill.pub, undated. https://distill.pub/prize/. Accessed 2026-05-25.
24. Olah, C., and Carter, S., "Research Debt," Distill, 2017-03-27. https://distill.pub/2017/research-debt/. Accessed 2026-05-25.
25. Distill, "Latest articles about machine learning," distill.pub, undated. https://distill.pub/. Accessed 2026-05-25.
26. Olah, C., Cammarata, N., Greydanus, S., and Tam, J., "Distill Hiatus," Distill, 2021-07-02. https://distill.pub/2021/distill-hiatus/. Accessed 2026-05-25.
27. Distill, "Thread: Circuits," distill.pub, 2020-2021. https://distill.pub/2020/circuits/. Accessed 2026-05-25.
28. Goh, G., Cammarata, N., Voss, C., Carter, S., Petrov, M., Schubert, L., Radford, A., and Olah, C., "Multimodal Neurons in Artificial Neural Networks," Distill, 2021-03-04. https://distill.pub/2021/multimodal-neurons/. Accessed 2026-05-25.
29. Wikipedia contributors, "Anthropic," en.wikipedia.org, undated. https://en.wikipedia.org/wiki/Anthropic. Accessed 2026-05-25.
30. PBS NewsHour, "Pope Leo XIV to launch his first encyclical, a document on artificial intelligence, with Anthropic's co-founder," pbs.org, 2026-05-18. https://www.pbs.org/newshour/world/pope-leo-xiv-to-launch-his-first-encylical-a-document-on-artificial-intelligence-with-anthropics-co-founder. Accessed 2026-05-25.
Elhage, N., Nanda, N., Olsson, C., Henighan, T., et al., "A Mathematical Framework for Transformer Circuits," Transformer Circuits Thread, 2021-12-22. https://transformer-circuits.pub/2021/framework/index.html. Accessed 2026-05-25. ↩
Olsson, C., Elhage, N., Nanda, N., et al., "In-context Learning and Induction Heads," arXiv:2209.11895, 2022-09-24. https://arxiv.org/abs/2209.11895. Accessed 2026-05-25. ↩
Anthropic, "Toy Models of Superposition," anthropic.com, 2022-09-14. https://www.anthropic.com/research/toy-models-of-superposition. Accessed 2026-05-25. ↩
Elhage, N., et al., "Toy Models of Superposition," Transformer Circuits Thread, 2022-09. https://transformer-circuits.pub/2022/toy_model/index.html. Accessed 2026-05-25. ↩
Bricken, T., Templeton, A., Batson, J., et al., "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning," Transformer Circuits Thread, 2023-10-05. https://transformer-circuits.pub/2023/monosemantic-features. Accessed 2026-05-25. ↩
Anthropic, "The engineering challenges of scaling interpretability," anthropic.com, 2024. https://www.anthropic.com/research/engineering-challenges-interpretability. Accessed 2026-05-25. ↩
Templeton, A., Conerly, T., Marcus, J., et al., "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet," Transformer Circuits Thread, 2024-05-21. https://transformer-circuits.pub/2024/scaling-monosemanticity/. Accessed 2026-05-25. ↩
Lindsey, J., Gurnee, W., Ameisen, E., et al., "On the Biology of a Large Language Model," Transformer Circuits Thread, 2025-03. https://transformer-circuits.pub/2025/attribution-graphs/biology.html. Accessed 2026-05-25. ↩
InfoQ, "Anthropic Open-Sources Tool to Trace the Thoughts of Large Language Models," infoq.com, 2025-06. https://www.infoq.com/news/2025/06/anthropic-circuit-tracing/. Accessed 2026-05-25. ↩
Anthropic, "Anthropic raises Series E at $61.5B post-money valuation," anthropic.com, 2025-03-03. https://www.anthropic.com/news/anthropic-raises-series-e-at-usd61-5b-post-money-valuation. Accessed 2026-05-25. ↩
TheNextWeb, "Pope Leo XIV to launch AI encyclical Magnifica Humanitas on 25 May with Anthropic co-founder Christopher Olah," thenextweb.com, 2026-05. https://thenextweb.com/news/pope-leo-encyclical-magnifica-humanitas-anthropic-olah. Accessed 2026-05-25. ↩
Vatican News, "Pope Leo XIV's first encyclical Magnifica humanitas to be published May 25," vaticannews.va, 2026-05. https://www.vaticannews.va/en/pope/news/2026-05/pope-leo-xiv-first-encyclical-magnifica-humanitas.html. Accessed 2026-05-25. ↩
CNBC, "Anthropic signs term sheet for $10 billion funding round at $350 billion valuation," cnbc.com, 2026-01-07. https://www.cnbc.com/2026/01/07/anthropic-funding-term-sheet-valuation.html. Accessed 2026-06-23. ↩
Fortune, "Who is Chris Olah? The atheist Anthropic cofounder the Pope chose to sit beside him at the Vatican," fortune.com, 2026-06-03. https://fortune.com/2026/06/03/who-chris-olah-anthropic-cofounder-atheist-pope-leo-vatican-ai/. Accessed 2026-06-23. ↩




Related Articles

Attribution Graphs

Crosscoder

Golden Gate Claude

Towards Monosemanticity

Scaling Monosemanticity

On the Biology of a Large Language Model
