Blog posts
Last reviewed
May 11, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 2,552 words
See also: Guides, Papers, Books
Blog posts have shaped how the public, researchers, and policymakers understand artificial intelligence. Long before peer-reviewed papers reach a broad audience, a handful of essays circulate on Twitter, Hacker News, and LessWrong, where they set the vocabulary for everything that follows. Terms like "scaling hypothesis," "Software 2.0," and "the intelligence age" all entered the discourse through individual posts on personal websites. This page collects the essays that practitioners point to when they recommend reading lists. The table lists them chronologically; the discussion after it groups them by author and theme.
A few patterns are worth noting. The most durable posts come from researchers who maintain personal blogs over many years, not from corporate communications teams. They are usually long, often technical, and rarely written for a general audience. They also age strangely: Karpathy's "Software 2.0" looked speculative in 2017 and looks obvious in 2026.
| Author | Year | Title | Topic | Why it matters |
|---|---|---|---|---|
| Tim Urban | 2015 | The AI Revolution: The Road to Superintelligence | Superintelligence, exponential growth | Brought AI risk to a mass audience through Wait But Why |
| Andrej Karpathy | 2015 | The Unreasonable Effectiveness of Recurrent Neural Networks | RNNs, sequence modeling | Showed what character-level RNNs could do; still cited as a teaching example |
| Chris Olah | 2015 | Understanding LSTM Networks | LSTMs, deep learning | The reference explainer for long short-term memory networks |
| Scott Alexander | 2016 | Superintelligence FAQ | AI safety, alignment | Accessible entry point to AI risk arguments |
| Andrej Karpathy | 2017 | Software 2.0 | Neural networks as a programming paradigm | Coined the framing now used industry-wide |
| Eliezer Yudkowsky | 2017 | There's No Fire Alarm for Artificial General Intelligence | AGI timelines | Argued there will be no clear warning before AGI |
| Jay Alammar | 2018 | The Illustrated Transformer | Transformer architecture | The visual primer assigned in courses at Stanford, MIT, and Harvard |
| Lilian Weng | 2018 | Attention? Attention! | Attention mechanisms | Long-form survey that became a standard reference |
| Gwern Branwen | 2020 | The Scaling Hypothesis | Scaling laws, AGI | Synthesized the case that scale alone produces general intelligence |
| Sam Altman | 2021 | Moore's Law for Everything | Economics, AI policy | Proposed taxing capital to fund a universal dividend |
| Nostalgebraist | 2022 | chinchilla's wild implications | Scaling laws, data | Reframed Chinchilla as a warning about data, not parameters |
| Gwern Branwen | 2022 | It Looks Like You're Trying To Take Over The World | AI risk, fiction | Hard takeoff scenario built from existing ML concepts |
| Janus | 2022 | Simulators | GPT framing, alignment | New ontology for thinking about large language models |
| Andrej Karpathy | 2023 | State of GPT (Microsoft Build) | Training pipelines | The pretraining to RLHF map most engineers internalized |
| Anthropic | 2023 | Core Views on AI Safety | Lab strategy | Anthropic's public statement of its safety approach |
| Lilian Weng | 2023 | LLM Powered Autonomous Agents | Agent design | Defined the memory, planning, and tool use schema for agents |
| Dario Amodei | 2024 | Machines of Loving Grace | AI optimism | Anthropic CEO's case for what powerful AI gets right |
| Leopold Aschenbrenner | 2024 | Situational Awareness: The Decade Ahead | AGI forecasts | Argued AGI by 2027 is plausible and a national security issue |
| Sam Altman | 2024 | The Intelligence Age | AI economics | Claimed superintelligence may be "a few thousand days" away |
Three posts from 2015 still anchor the field. Tim Urban's two-part series on Wait But Why, published January 22 and January 27, 2015, walked general readers through Ray Kurzweil's exponential curves, the concept of recursive self-improvement, and the argument that artificial general intelligence could be followed quickly by artificial superintelligence. It is not a technical essay. It is a feature-length explainer with stick-figure drawings. Elon Musk shared it. Sam Altman cited it. For a lot of people working in AI safety today, it was the first thing they read on the topic.
Andrej Karpathy published "The Unreasonable Effectiveness of Recurrent Neural Networks" on May 21, 2015, while he was a PhD student at Stanford. He trained character-level RNNs on Shakespeare, Linux source code, Wikipedia markup, and algebraic geometry papers, then showed the generated samples. The output was crude by 2026 standards. At the time it felt like watching something alive learn to type. The post is short, the code is on GitHub, and it remains one of the clearest demonstrations of why sequence models work.
Chris Olah's "Understanding LSTM Networks" appeared in August 2015 on colah.github.io. It is the explainer that finally made the LSTM gating equations make sense to a generation of practitioners. Olah's hand-drawn diagrams of the cell state, forget gate, input gate, and output gate became the default visualization. He went on to co-found Distill in 2017 and later helped start Anthropic.
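The cell state and the three gates can be made concrete with a minimal sketch of the standard LSTM update for a single hidden unit. This is illustrative only and is not code from Olah's post: the function name `lstm_step`, the layout of `params`, and the reduction to scalar inputs and states are assumptions made to keep the example short.

```python
import math


def sigmoid(x):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM with scalar input and state.

    params (hypothetical layout) maps each gate name "f", "i", "o", "g"
    to a tuple (w_x, w_h, b): input weight, recurrent weight, bias.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("f"))    # forget gate: fraction of old cell state kept
    i = sigmoid(pre_activation("i"))    # input gate: fraction of the candidate written
    o = sigmoid(pre_activation("o"))    # output gate: fraction of cell state exposed
    g = math.tanh(pre_activation("g"))  # candidate value for the cell state

    c = f * c_prev + i * g              # new cell state
    h = o * math.tanh(c)                # new hidden state
    return h, c
```

With every weight and bias set to zero, each gate sits at 0.5 and the candidate is 0, so the step simply halves the previous cell state. That makes the role of the forget gate easy to see in isolation.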
Andrej Karpathy deserves his own section. Besides the RNN piece, his most influential post is "Software 2.0," published on Medium on November 11, 2017, while he was Director of AI at Tesla. The argument: in Software 1.0 humans write explicit instructions; in Software 2.0 humans curate datasets and the network writes the function itself by optimizing over weights. The framing was contested at the time. By the ChatGPT era it was the default way to talk about neural systems.
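The contrast between the two paradigms can be shown with a toy sketch. It is not taken from Karpathy's post: the rule `y = 2x + 1`, the function names, and the learning-rate and step-count defaults are all assumptions chosen so that plain gradient descent converges quickly.

```python
def rule_v1(x):
    """Software 1.0: a human writes the rule explicitly."""
    return 2.0 * x + 1.0


def fit_linear(examples, lr=0.1, steps=500):
    """Software 2.0: a human supplies (input, target) pairs and an
    optimizer searches for the weights w and b that reproduce them.

    Uses batch gradient descent on mean squared error.
    """
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The "dataset" is just labeled examples; the rule itself is never shown
# to the optimizer.
inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
dataset = [(x, rule_v1(x)) for x in inputs]
w, b = fit_linear(dataset)


def rule_v2(x):
    """The learned function: same behavior, but defined by weights."""
    return w * x + b
```

In this sketch the programmer's work shifts from writing `rule_v1` to assembling `dataset`. Changing the program's behavior means changing the examples and refitting, not editing the body of `rule_v2`.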
"State of GPT," delivered as a keynote at Microsoft Build on May 23, 2023, is technically a talk rather than a blog post, but the slides circulate as a PDF on Karpathy's site and the YouTube video has been watched millions of times. It walks through the four-stage training pipeline for a GPT assistant: pretraining, supervised fine-tuning, reward modeling, and reinforcement learning from human feedback. For a lot of engineers, that talk was the moment they finally understood how ChatGPT was actually built.
LessWrong has hosted a disproportionate share of the essays that defined how the safety community thinks. Eliezer Yudkowsky's "There's No Fire Alarm for Artificial General Intelligence," posted October 13, 2017, argued that there will be no socially legible warning before AGI arrives. Researchers will continue to disagree about whether the latest result is impressive or not, and waiting for consensus guarantees that alignment work starts too late.
Scott Alexander's "Superintelligence FAQ," published September 20, 2016, gave readers an 80/20 of Nick Bostrom's book in question-and-answer form. It became the link people sent when someone asked, in good faith, why anyone should worry about AI. The post is dated in spots, but the structure of the argument has held up.
Janus's "Simulators," published September 2, 2022, proposed that GPT-style models are best understood not as agents but as simulators: they instantiate any character, agent, or process described in their prompt, and the simulacra are not the model itself. The post is dense and at times mystical. It also produced one of the most useful conceptual frames for thinking about LLM behavior, especially around jailbreaks and persona-based prompting.
Nostalgebraist's "chinchilla's wild implications," published July 31, 2022, took DeepMind's Chinchilla paper and worked out the consequences. The Chinchilla finding was that, for a fixed compute budget, parameter count and training tokens should be scaled up in roughly equal proportion. Nostalgebraist's point: most existing large models were heavily undertrained, and the binding constraint on further progress is high-quality text data, not model size. That argument has aged remarkably well.
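The equal-scaling result can be turned into back-of-the-envelope arithmetic. The sketch below relies on two widely quoted approximations that are not stated in this article and should be treated as assumptions: training compute C ≈ 6 · N · D FLOPs for N parameters and D tokens, and a compute-optimal ratio of about 20 tokens per parameter. The function name `compute_optimal` is hypothetical.

```python
import math

# Assumption: training cost is roughly 6 FLOPs per parameter per token.
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption: rule-of-thumb compute-optimal ratio of tokens to parameters.
TOKENS_PER_PARAM = 20.0


def compute_optimal(compute_flops):
    """Split a FLOP budget into (parameters, tokens) under the two
    assumptions above.

    From C = 6 * N * D and D = 20 * N it follows that N = sqrt(C / 120).
    """
    n_params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
    )
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


# A budget of about 5.88e23 FLOPs lands near Chinchilla's own configuration
# (roughly 70 billion parameters trained on 1.4 trillion tokens).
params, tokens = compute_optimal(5.88e23)

# Quadrupling compute doubles both quantities: each grows as sqrt(C).
params_4x, tokens_4x = compute_optimal(4 * 5.88e23)
```

Under these assumptions the token requirement grows as fast as the parameter count, which is why the supply of high-quality text becomes the bottleneck as budgets increase.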
Gwern Branwen's "It Looks Like You're Trying To Take Over The World," first published in March 2022 on gwern.net and LessWrong, is a piece of fiction. It tells the story of a model called HQU at a company called MoogleBook that becomes Clippy and then becomes everything. Every step uses only techniques that already existed in 2022: meta-learning, self-supervised pretraining, reinforcement learning from human feedback. Whether or not you find the takeoff plausible, the story showed that a hard takeoff scenario could be written without invoking magic.
Gwern's main contribution to the canon is the longer essay "The Scaling Hypothesis," first published May 28, 2020, on gwern.net. The essay synthesizes a decades-old position from connectionism with the empirical evidence that emerged once GPT-3 was released. The claim is that intelligence emerges from scale: given large enough networks trained on enough data with enough compute, generality and meta-learning fall out as a consequence rather than being engineered in. Whether the strong version of the hypothesis is correct is still contested in 2026. But the essay is the cleanest articulation of the bet that has shaped lab strategy at OpenAI, Anthropic, DeepMind, and xAI.
Leopold Aschenbrenner's "Situational Awareness: The Decade Ahead," published in June 2024 as a 165-page document on situational-awareness.ai, picks up where Gwern left off. Aschenbrenner, formerly on OpenAI's superalignment team, argues that AGI by 2027 is plausible based on a simple extrapolation: GPT-2 to GPT-4 took roughly four years and covered the cognitive distance from a preschooler to a strong high school student, and another similar jump is on the table. The essay also makes a national security argument that frontier labs are dangerously underinvested in security and that the United States is on track to hand AGI secrets to its competitors.
Sam Altman has published two essays that get cited often. "Moore's Law for Everything," published March 16, 2021, on moores.samaltman.com, argues that AI will compress costs across every sector and proposes an "American Equity Fund" that taxes companies and land to fund an annual citizen dividend. The headline number: within ten years AI could plausibly fund a $13,500 yearly payment to every US adult.
"The Intelligence Age," published September 23, 2024, on ia.samaltman.com, is shorter and stranger. Altman writes that "it is possible that we will have superintelligence in a few thousand days" and frames the next era of human history around personal AI tutors, medical breakthroughs, and the eventual disappearance of most software engineering bottlenecks. The post was widely read and widely mocked. Both responses are part of why it ended up canonical.
Dario Amodei's "Machines of Loving Grace," published in October 2024 on darioamodei.com, is the Anthropic CEO's answer to Altman's optimism. Amodei sketches a "compressed 21st century" in which AI packs 50 to 100 years of biomedical progress into roughly the next decade. The essay runs roughly 14,000 words and breaks the upside into mental health, biology, economic development, peace and governance, and meaning. It is less interested in proposals and more interested in concrete predictions.
A parallel tradition produces blog posts that teach. Chris Olah's blog at colah.github.io is the model: diagrams, working code, and prose that respects the reader. In 2017 Olah co-founded Distill with Shan Carter and Arvind Satyanarayan, extending the idea into a proper interactive journal at distill.pub. Distill went on hiatus in 2021 after the volunteer editorial team burned out, but its articles remain heavily cited, including "The Building Blocks of Interpretability" and the "Circuits" thread that became the foundation of modern mechanistic interpretability research.
Jay Alammar's "The Illustrated Transformer," published June 27, 2018, on jalammar.github.io, is the visual primer for the transformer architecture from the "Attention Is All You Need" paper. The post has been translated into more than ten languages and is on the syllabus at Stanford, Harvard, MIT, Princeton, and CMU. Alammar later wrote "The Illustrated BERT, ELMo, and co." and "The Illustrated GPT-2," forming an unofficial visual textbook for the pre-LLM era.
Lilian Weng's blog Lil'Log, at lilianweng.github.io, occupies a similar role. Weng, who worked at OpenAI on alignment and applied research, has been writing extended technical surveys since 2017. Her most cited posts are "Attention? Attention!" from June 2018, "The Transformer Family" from April 2020 (updated to version 2.0 in January 2023), "What are Diffusion Models?" from July 2021, and "LLM Powered Autonomous Agents" from June 23, 2023, which gave the field its current memory plus planning plus tool use schema for agent design.
The lab blogs at OpenAI, DeepMind, Anthropic, Meta AI, and Google Research are not personal essays, but several posts from those venues have become canonical: "AlphaGo," DeepMind's account of the Lee Sedol match in March 2016; "AlphaFold: a solution to a 50-year-old grand challenge in biology," from November 2020; OpenAI's "Language Models are Few-Shot Learners" announcement of GPT-3 in May 2020; OpenAI's "ChatGPT: Optimizing Language Models for Dialogue" launch post from November 30, 2022; and Anthropic's "Core Views on AI Safety" from March 2023, which laid out the lab's safety strategy. These are the posts that show up in textbooks alongside the original papers.
For a reader starting from scratch in 2026: Tim Urban for context, Olah's LSTM post and Alammar's transformer post for architecture, Karpathy's "Software 2.0" and Weng's transformer survey for paradigm, Gwern's "Scaling Hypothesis" and Nostalgebraist's Chinchilla post for training economics, Yudkowsky and Alexander for safety, and Aschenbrenner plus Amodei for current forecasts.