Blog posts

Updated Jul 16, 2026

See also: Guides, Papers, Books

Blog posts have shaped how the public, researchers, and policymakers understand artificial intelligence. Long before peer-reviewed papers reach a broad audience, a handful of essays circulate on Twitter, Hacker News, and LessWrong, where they set the vocabulary for everything that follows. Terms like "scaling hypothesis," "Software 2.0," and "the intelligence age" all entered the discourse through individual blog posts. This page collects the essays that practitioners point to when they recommend reading lists. The table below lists them by year; the sections that follow group them by tradition.

A few patterns are worth noting. The most durable posts come from researchers who maintain personal blogs over many years, not from corporate communications teams. They are usually long, often technical, and rarely written for a general audience. They also age strangely: Karpathy's "Software 2.0" looked speculative in 2017 and looks obvious in 2026.^[6]

the canon

Author	Year	Title	Topic	Why it matters
Tim Urban	2015	The AI Revolution: The Road to Superintelligence	Superintelligence, exponential growth	Brought AI risk to a mass audience through Wait But Why^[1]
Andrej Karpathy	2015	The Unreasonable Effectiveness of Recurrent Neural Networks	RNNs, sequence modeling	Showed what character-level RNNs could do; still cited as a teaching example^[3]
Chris Olah	2015	Understanding LSTM Networks	LSTMs, deep learning	The reference explainer for long short-term memory networks^[4]
Scott Alexander	2016	Superintelligence FAQ	AI safety, alignment	Accessible entry point to AI risk arguments^[5]
Andrej Karpathy	2017	Software 2.0	Neural networks as a programming paradigm	Coined the framing now used industry-wide^[6]
Eliezer Yudkowsky	2017	There's No Fire Alarm for Artificial General Intelligence	AGI timelines	Argued there will be no clear warning before AGI^[7]
Jay Alammar	2018	The Illustrated Transformer	Transformer architecture	The visual primer assigned in courses at Stanford, MIT, and Harvard^[8]
Lilian Weng	2018	Attention? Attention!	Attention mechanisms	Long-form survey that became a standard reference^[9]
Gwern Branwen	2020	The Scaling Hypothesis	Scaling laws, AGI	Synthesized the case that scale alone produces general intelligence^[12]
Sam Altman	2021	Moore's Law for Everything	Economics, AI policy	Proposed taxing capital to fund a universal dividend^[13]
Nostalgebraist	2022	chinchilla's wild implications	Scaling laws, data	Reframed Chinchilla as a warning about data, not parameters^[14]
Gwern Branwen	2022	It Looks Like You're Trying To Take Over The World	AI risk, fiction	Hard takeoff scenario built from existing ML concepts^[15]
Janus	2022	Simulators	GPT framing, alignment	New ontology for thinking about large language models^[16]
Andrej Karpathy	2023	State of GPT (Microsoft Build)	Training pipelines	The pretraining-to-RLHF map most engineers internalized^[17]
Anthropic	2023	Core Views on AI Safety	Lab strategy	Anthropic's public statement of its safety approach^[18]
Lilian Weng	2023	LLM Powered Autonomous Agents	Agent design	Defined the memory, planning, and tool-use schema for agents^[19]
Dario Amodei	2024	Machines of Loving Grace	AI optimism	Anthropic CEO's case for what could go right with powerful AI^[21]
Leopold Aschenbrenner	2024	Situational Awareness: The Decade Ahead	AGI forecasts	Argued AGI by 2027 is plausible and a national security issue^[20]
Sam Altman	2024	The Intelligence Age	AI economics	Claimed superintelligence may be "a few thousand days" away^[22]

the 2015 wave

Three posts from 2015 still anchor the field. Tim Urban's two-part series on Wait But Why, published January 22 and January 27, 2015,^[1]^[2] walked general readers through Ray Kurzweil's exponential curves, the concept of recursive self-improvement, and the argument that artificial general intelligence could be followed quickly by artificial superintelligence. It is not a technical essay. It is a feature-length explainer with stick-figure drawings. Elon Musk shared it. Sam Altman cited it. For a lot of people working in AI safety today, it was the first thing they read on the topic.

Andrej Karpathy published "The Unreasonable Effectiveness of Recurrent Neural Networks" on May 21, 2015, while he was a PhD student at Stanford.^[3] He trained character-level RNNs on Shakespeare, Linux source code, Wikipedia markup, and algebraic geometry papers, then showed the generated samples. The output was crude by 2026 standards. At the time it felt like watching something alive learn to type. The post is short, the code is on GitHub, and it remains one of the clearest demonstrations of why sequence models work.

Chris Olah's "Understanding LSTM Networks" appeared in August 2015 on colah.github.io.^[4] It is the explainer that finally made the LSTM gating equations make sense to a generation of practitioners. Olah's hand-drawn diagrams of the cell state, forget gate, input gate, and output gate became the default visualization. He went on to co-found Distill in 2017 and later helped start Anthropic.
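
For readers who have not seen them, the gating equations that the post explains can be written out as follows. This is the standard LSTM formulation. The notation is an assumption of this sketch, since the article does not reproduce the equations: $\sigma$ is the logistic sigmoid, $[h_{t-1}, x_t]$ is the previous hidden state concatenated with the current input, and $\odot$ is elementwise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{candidate values} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state}
\end{aligned}
```

The cell state $C_t$ is the running memory: the forget gate decides how much of the old state to keep, and the input gate decides how much of the new candidate to write.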

karpathy's posts

Andrej Karpathy deserves his own section. Besides the RNN piece, his most influential post is "Software 2.0," published on Medium on November 11, 2017, while he was Director of AI at Tesla.^[6] The argument: in Software 1.0 humans write explicit instructions; in Software 2.0 humans curate datasets and the network writes the function itself by optimizing over weights. The framing was contested at the time. By the ChatGPT era it was the default way to talk about neural systems.
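
A toy contrast may make the distinction concrete. The sketch below is not from Karpathy's post. It is a minimal illustration that assumes a one-variable linear model, with hypothetical names (`fahrenheit_v1`, `fit_linear`). The first function is written by hand. The second recovers the same rule from examples by gradient descent on two weights.

```python
# Software 1.0: a human writes the rule explicitly.
def fahrenheit_v1(celsius: float) -> float:
    return celsius * 9.0 / 5.0 + 32.0


# Software 2.0 (toy version): a human supplies examples and an optimizer
# searches for weights w, b that make w * x + b fit them.
def fit_linear(examples, steps=5000, lr=0.004):
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in examples:
            err = (w * x + b) - y          # prediction error on one example
            grad_w += 2.0 * err * x / n    # d(mean squared error)/dw
            grad_b += 2.0 * err / n        # d(mean squared error)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The "dataset" here is generated from the hand-written rule purely for
# illustration; in practice the examples would be collected, not computed.
examples = [(c, fahrenheit_v1(c)) for c in (-20.0, -10.0, 0.0, 10.0, 20.0)]
w, b = fit_linear(examples)
print(f"learned rule: f = {w:.3f} * c + {b:.3f}")
```

The learning rate and step count were picked so that this particular dataset converges; they are not general recommendations.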

"State of GPT," delivered as a keynote at Microsoft Build on May 23, 2023,^[17] is technically a talk rather than a blog post, but the slides circulate as a PDF on Karpathy's site and the YouTube video has been watched millions of times. It walks through the four stage training pipeline for a GPT assistant: pretraining, supervised fine tuning, reward modeling, and reinforcement learning from human feedback. For a lot of engineers, that talk was the moment they finally understood how ChatGPT was actually built.

lesswrong and the safety canon

LessWrong has hosted a disproportionate share of the essays that defined how the safety community thinks. Eliezer Yudkowsky's "There's No Fire Alarm for Artificial General Intelligence," posted October 13, 2017,^[7] argued that there will be no socially legible warning before AGI arrives. Researchers will keep disagreeing about whether the latest result is impressive, so waiting for consensus guarantees that alignment work starts too late.

Scott Alexander's "Superintelligence FAQ," published September 20, 2016,^[5] gave readers a condensed version of Nick Bostrom's book in question-and-answer form. It became the link people sent when someone asked, in good faith, why anyone should worry about AI. The post is dated in spots, but the structure of the argument has held up.

Janus's "Simulators," published September 2, 2022,^[16] proposed that GPT-style models are best understood not as agents but as simulators: they instantiate any character, agent, or process described in their prompt, and the simulacra are not the model itself. The post is dense and at times mystical. It also produced one of the most useful conceptual frames for thinking about LLM behavior, especially around jailbreaks and persona-based prompting.

Nostalgebraist's "chinchilla's wild implications," published July 31, 2022,^[14] took DeepMind's Chinchilla paper and worked out the consequences. The Chinchilla finding was that for a fixed compute budget, parameters and tokens should scale roughly equally. Nostalgebraist's point: most existing large models were heavily undertrained, and the binding constraint on further progress is high-quality text data, not model size. That argument has aged remarkably well.
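
The arithmetic behind "scale roughly equally" can be sketched in a few lines. The two constants below are assumptions, not figures from the article: training compute is approximated as 6 FLOPs per parameter per token, and the compute-optimal ratio is taken as the commonly quoted rule of thumb of about 20 tokens per parameter. The function names are hypothetical. The 70 billion parameter, 1.4 trillion token configuration is the one reported for Chinchilla itself.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # assumption: C ~ 6 * N * D
TOKENS_PER_PARAM = 20.0      # assumption: rule-of-thumb compute-optimal ratio


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for N parameters and D tokens."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Split a compute budget into (parameters, tokens) at the assumed ratio."""
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


# Chinchilla's reported configuration: 70B parameters, 1.4T tokens.
budget = training_flops(70e9, 1.4e12)
n, d = compute_optimal(budget)

# Quadrupling compute doubles both parameters and tokens under these assumptions.
n4, d4 = compute_optimal(4 * budget)

# A 175B-parameter model trained on 300B tokens sits far below the assumed ratio.
undertrained_ratio = 300e9 / 175e9

print(f"optimal at Chinchilla budget: {n:.3g} params, {d:.3g} tokens")
print(f"at 4x compute: {n4 / n:.2f}x params, {d4 / d:.2f}x tokens")
print(f"tokens per parameter at 175B/300B: {undertrained_ratio:.2f}")
```

Under these assumptions the token requirement grows as fast as the parameter count, which is why the supply of text becomes the constraint the post emphasizes.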

Gwern Branwen's "It Looks Like You're Trying To Take Over The World," first published in March 2022 on gwern.net and LessWrong,^[15] is a piece of fiction. It tells the story of a model called HQU at a company called MoogleBook that becomes Clippy and then becomes everything. Every step uses only techniques that already existed in 2022: meta-learning, self-supervised pretraining, reinforcement learning from human feedback. Whether or not you find the takeoff plausible, the story showed that a hard takeoff scenario could be written without invoking magic.

the scaling hypothesis tradition

Gwern's main contribution to the canon is the longer essay "The Scaling Hypothesis," first published May 28, 2020, on gwern.net.^[12] The essay synthesizes a decades-old position from connectionism with the empirical evidence that emerged once GPT-3 was released. The claim is that intelligence emerges from scale: given large enough networks trained on enough data with enough compute, generality and meta-learning fall out as a consequence rather than something engineered in. Whether the strong version of the hypothesis is correct is still contested in 2026. But the essay is the cleanest articulation of the bet that has shaped lab strategy at OpenAI, Anthropic, DeepMind, and xAI.

Leopold Aschenbrenner's "Situational Awareness: The Decade Ahead," published in June 2024 as a 165-page document on situational-awareness.ai,^[20] picks up where Gwern left off. Aschenbrenner, formerly on OpenAI's superalignment team, argues that AGI by 2027 is plausible based on a simple extrapolation: GPT-2 to GPT-4 took roughly four years and covered the cognitive distance from a preschooler to a strong high school student, and another similar jump is on the table. The essay also makes a national security argument: frontier labs are dangerously underinvested in security, and the United States is on track to hand AGI secrets to its competitors.

sam altman's manifestos

Sam Altman has published two essays that get cited often. "Moore's Law for Everything," published March 16, 2021, on moores.samaltman.com,^[13] argues that AI will compress costs across every sector and proposes an "American Equity Fund" that taxes companies and land to fund an annual citizen dividend. The headline number: within ten years the fund could plausibly pay every US adult about $13,500 a year.

"The Intelligence Age," published September 23, 2024, on ia.samaltman.com,^[22] is shorter and stranger. Altman writes that "it is possible that we will have superintelligence in a few thousand days" and frames the next era of human history around personal AI tutors, medical breakthroughs, and the eventual disappearance of most software engineering bottlenecks. The post was widely read and widely mocked. Both responses are part of why it ended up canonical.

Dario Amodei's "Machines of Loving Grace," published in October 2024 on darioamodei.com,^[21] is the Anthropic CEO's answer to Altman's optimism. Amodei sketches a "compressed 21st century" in which AI delivers 50 to 100 years of biomedical progress in 5 to 10 years. The essay runs roughly 14,000 words and breaks the upside into mental health, biology, economic development, peace and governance, and meaning. It is less interested in proposals and more interested in concrete predictions.

the explainer tradition

A parallel tradition produces blog posts that teach. Chris Olah's blog at colah.github.io is the model: diagrams, working code, and prose that respects the reader. In 2017 Olah co-founded Distill with Shan Carter and Arvind Satyanarayan, extending the idea into an interactive journal at distill.pub.^[10] Distill went on hiatus in 2021 after the volunteer editorial team burned out,^[11] but its articles remain heavily cited, including "The Building Blocks of Interpretability" and the "Circuits" thread that became the foundation of modern mechanistic interpretability research.

Jay Alammar's "The Illustrated Transformer," published June 27, 2018, on jalammar.github.io,^[8] is the visual primer for the transformer architecture from the "Attention Is All You Need" paper. The post has been translated into more than ten languages and is on the syllabus at Stanford, Harvard, MIT, Princeton, and CMU. Alammar later wrote "The Illustrated BERT, ELMo, and co." and "The Illustrated GPT-2," forming an unofficial visual textbook for the pre-LLM era.

Lilian Weng's blog Lil'Log, at lilianweng.github.io, occupies a similar role. Weng, who worked at OpenAI on alignment and applied research, has been writing extended technical surveys since 2017. Her most cited posts are "Attention? Attention!" from June 2018,^[9] "The Transformer Family" from April 2020 (updated to version 2.0 in January 2023), "What are Diffusion Models?" from July 2021, and "LLM Powered Autonomous Agents" from June 23, 2023,^[19] which gave the field its current memory, planning, and tool-use schema for agent design.
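
The memory, planning, and tool-use split can be illustrated with a very small sketch. None of this code comes from Weng's post: the `Agent` class, `toy_planner`, and the two tools are hypothetical stand-ins, and the planner is a fixed function where a real system would call a language model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # Planning: turns a task (plus past memory) into a list of (tool, input) steps.
    planner: Callable[[str, list[str]], list[tuple[str, str]]]
    # Tool use: named functions the agent is allowed to call.
    tools: dict[str, Callable[[str], str]]
    # Memory: a running log of what the agent has done so far.
    memory: list[str] = field(default_factory=list)

    def run(self, task: str) -> list[str]:
        plan = self.planner(task, self.memory)
        results = []
        for tool_name, tool_input in plan:
            output = self.tools[tool_name](tool_input)
            self.memory.append(f"{tool_name}({tool_input!r}) -> {output!r}")
            results.append(output)
        return results


def toy_planner(task: str, memory: list[str]) -> list[tuple[str, str]]:
    # Hypothetical stand-in for an LLM call: always plans the same two steps.
    return [("word_count", task), ("shout", task)]


TOOLS = {
    "word_count": lambda text: str(len(text.split())),
    "shout": lambda text: text.upper(),
}

agent = Agent(planner=toy_planner, tools=TOOLS)
print(agent.run("hello agent world"))
print(agent.memory)
```

Keeping the three parts as separate fields means each one can be swapped independently, for example replacing the list-based memory with a retrieval store without touching the planner.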

corporate research blogs

The lab blogs at OpenAI, DeepMind, Anthropic, Meta AI, and Google Research are not personal essays, but several posts from those venues have become canonical: "AlphaGo," DeepMind's account of the Lee Sedol match in March 2016; "AlphaFold: a solution to a 50-year-old grand challenge in biology," from November 2020; OpenAI's "Language Models are Few-Shot Learners" announcement of GPT-3 in May 2020; OpenAI's "ChatGPT: Optimizing Language Models for Dialogue" launch post from November 30, 2022; and Anthropic's "Core Views on AI Safety" from March 2023, which laid out the lab's safety strategy.^[18] These are the posts that show up in textbooks alongside the original papers.

reading order

For a reader starting from scratch in 2026: Tim Urban for context, Olah's LSTM post and Alammar's transformer post for architecture, Karpathy's "Software 2.0" and Weng's transformer survey for paradigm, Gwern's "Scaling Hypothesis" and Nostalgebraist's Chinchilla post for training economics, Yudkowsky and Alexander for safety, and Aschenbrenner plus Amodei for current forecasts.

references

1. Tim Urban, "The AI Revolution: The Road to Superintelligence," Wait But Why, January 22, 2015. https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html
2. Tim Urban, "The AI Revolution: Our Immortality or Extinction," Wait But Why, January 27, 2015. https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html
3. Andrej Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks," karpathy.github.io, May 21, 2015. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
4. Chris Olah, "Understanding LSTM Networks," colah.github.io, August 27, 2015. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
5. Scott Alexander, "Superintelligence FAQ," LessWrong, September 20, 2016. https://www.lesswrong.com/posts/LTtNXM9shNM9AC2mp/superintelligence-faq
6. Andrej Karpathy, "Software 2.0," Medium, November 11, 2017. https://karpathy.medium.com/software-2-0-a64152b37c35
7. Eliezer Yudkowsky, "There's No Fire Alarm for Artificial General Intelligence," LessWrong, October 13, 2017. https://www.lesswrong.com/posts/BEtzRE2M5m9YEAQpX/there-s-no-fire-alarm-for-artificial-general-intelligence
8. Jay Alammar, "The Illustrated Transformer," jalammar.github.io, June 27, 2018. https://jalammar.github.io/illustrated-transformer/
9. Lilian Weng, "Attention? Attention!," Lil'Log, June 24, 2018. https://lilianweng.github.io/posts/2018-06-24-attention/
10. Distill, "Distill: a journal for understanding machine learning," launched March 2017. https://distill.pub/about/
11. Distill, "Distill Hiatus," distill.pub, July 2, 2021. https://distill.pub/2021/distill-hiatus/
12. Gwern Branwen, "The Scaling Hypothesis," gwern.net, May 28, 2020. https://gwern.net/scaling-hypothesis
13. Sam Altman, "Moore's Law for Everything," moores.samaltman.com, March 16, 2021. https://moores.samaltman.com/
14. Nostalgebraist, "chinchilla's wild implications," LessWrong, July 31, 2022. https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications
15. Gwern Branwen, "It Looks Like You're Trying To Take Over The World," gwern.net, March 2022. https://gwern.net/fiction/clippy
16. Janus, "Simulators," LessWrong, September 2, 2022. https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
17. Andrej Karpathy, "State of GPT," Microsoft Build keynote, May 23, 2023. https://karpathy.ai/stateofgpt.pdf
18. Anthropic, "Core Views on AI Safety: When, Why, What, and How," anthropic.com, March 8, 2023. https://www.anthropic.com/news/core-views-on-ai-safety
19. Lilian Weng, "LLM Powered Autonomous Agents," Lil'Log, June 23, 2023. https://lilianweng.github.io/posts/2023-06-23-agent/
20. Leopold Aschenbrenner, "Situational Awareness: The Decade Ahead," situational-awareness.ai, June 2024. https://situational-awareness.ai/
21. Dario Amodei, "Machines of Loving Grace," darioamodei.com, October 2024. https://www.darioamodei.com/essay/machines-of-loving-grace
22. Sam Altman, "The Intelligence Age," ia.samaltman.com, September 23, 2024. https://ia.samaltman.com/
23. Roon, "Text Is the Universal Interface," Scale AI Blog, September 2022. https://scale.com/blog/text-universal-interface
