Andrej Karpathy (born October 23, 1986) is a Slovak-Canadian computer scientist, AI researcher, educator, and entrepreneur. He is one of the most recognized figures in the artificial intelligence community, known for his ability to explain complex deep learning concepts in accessible terms and for his influential roles at OpenAI and Tesla. Karpathy was a founding member of OpenAI in 2015, served as Tesla's Senior Director of AI and head of Autopilot Vision from 2017 to 2022, briefly returned to OpenAI in 2023, and has since worked independently as an educator, content creator, and founder of Eureka Labs, an AI-native education startup [1].
Karpathy's academic contributions include co-creating the Stanford course CS231n: Convolutional Neural Networks for Visual Recognition, which became one of the most popular computer science courses at the university and introduced thousands of students to deep learning. His open-source implementations of language models, particularly minGPT, nanoGPT, and llm.c, have been widely used as educational tools. In February 2025, he coined the term "vibe coding" to describe a new style of programming in which developers rely on AI assistants to generate code from natural language descriptions rather than writing it manually. The term entered mainstream vocabulary rapidly and was named the Collins English Dictionary Word of the Year for 2025 [2]. In 2024, TIME magazine named Karpathy to its list of the 100 Most Influential People in AI [3].
Andrej Karpathy was born on October 23, 1986, in Bratislava, Czechoslovakia (now Slovakia). His family moved to Toronto, Canada, when he was 15 years old. He completed his secondary education in Toronto and went on to study at the University of Toronto, where he earned a bachelor's degree in Computer Science and Physics, with a minor in Mathematics, in 2009. While an undergraduate at the University of Toronto, Karpathy attended Geoffrey Hinton's class and participated in Hinton's reading groups, an experience that first exposed him to neural networks and deep learning and proved formative for his career [1][4].
Karpathy continued his graduate studies at the University of British Columbia (UBC), where he received a master's degree in Computer Science in 2011. His master's research, supervised by Michiel van de Panne, focused on curriculum learning for physically simulated characters. The work explored how simulated agents could acquire complex motor skills through staged, incremental learning, drawing inspiration from how humans and animals develop physical abilities in nature. His thesis, titled "Staged Learning of Agile Motor Skills," applied these techniques to planar characters learning skills such as hopping, flipping, rolling, and continuous acrobatic movements [5].
Karpathy then moved to Stanford University for his PhD, which he completed in 2015 under the supervision of Fei-Fei Li, one of the leading researchers in computer vision and the creator of the ImageNet dataset. His doctoral thesis, titled "Connecting Images and Natural Language," focused on the intersection of computer vision and natural language processing, developing deep learning models that could generate natural language descriptions of images and video content. The thesis synthesized several lines of research into scalable neural network architectures for processing visual-linguistic data, including image captioning, dense captioning, and video understanding [6].
| Milestone | Year | Details |
|---|---|---|
| Born | 1986 | Bratislava, Czechoslovakia (now Slovakia) |
| Moved to Canada | ~2001 | Settled in Toronto |
| BSc Computer Science and Physics (minor in Mathematics), University of Toronto | 2009 | Attended Geoffrey Hinton's class and reading groups |
| MSc Computer Science, University of British Columbia | 2011 | Advisor: Michiel van de Panne; thesis on physically simulated characters |
| PhD Computer Science, Stanford University | 2015 | Advisor: Fei-Fei Li; thesis: "Connecting Images and Natural Language" |
During his PhD at Stanford, Karpathy designed and became the lead instructor of CS231n, a course on convolutional neural networks for visual recognition. The course was one of the first dedicated deep learning courses offered at a major university, and it quickly grew from approximately 150 students in its first offering in 2015 to over 750 students by 2017, making it one of the largest classes at Stanford [6].
CS231n's lecture videos, assignments, and notes were made freely available online, where they reached an audience far beyond Stanford's campus. The course became a de facto entry point into deep learning for an entire generation of AI practitioners and is frequently cited as a formative influence by researchers and engineers working in the field today. The course emphasized building intuition for how neural networks learn, with assignments that required students to implement core components (backpropagation, convolutional layers, batch normalization) from scratch rather than relying on high-level frameworks. Karpathy co-designed the course alongside Fei-Fei Li, and lecture videos have been viewed more than 800,000 times online [3][6].
Karpathy's academic research focused on connecting visual and linguistic understanding. His publication record spans computer vision, video classification, image captioning, and recurrent neural networks. Key papers include:
Large-Scale Video Classification with Convolutional Neural Networks (CVPR 2014). Karpathy, along with George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Fei-Fei Li, presented one of the first large-scale studies applying CNNs to video classification. The paper introduced the Sports-1M dataset, comprising over 1.1 million YouTube videos across 487 sports categories. The work explored multiple temporal fusion strategies for extending CNNs to process video data and demonstrated significant improvements over feature-based baselines. The paper has received thousands of citations and became a foundational reference in video understanding research [7].
Deep Visual-Semantic Alignments for Generating Image Descriptions (CVPR 2015). This paper presented a model that could generate natural language descriptions of image regions by aligning fragments of sentences with the visual content they describe. Using a joint embedding space and a multimodal RNN decoder, the system achieved strong results on image-sentence retrieval and image captioning benchmarks including Flickr30k and MS COCO [8].
ImageNet Large Scale Visual Recognition Challenge (International Journal of Computer Vision, 2015). Co-authored with Olga Russakovsky, Jia Deng, and others from the ImageNet team, this paper provided a comprehensive description of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) benchmark. Karpathy contributed analysis comparing human performance with CNN performance on ImageNet classification. His experiments revealed that humans struggled disproportionately with fine-grained recognition categories (such as distinguishing between more than 120 dog breeds), and that overall human error and top CNN error rates were converging. This paper has been cited over 30,000 times and remains one of the most influential papers in the history of computer vision [9].
Visualizing and Understanding Recurrent Neural Networks (2015). This work provided tools for interpreting what recurrent neural networks learn about the structure of text, opening a window into the internal representations of sequence models [10].
DenseCap: Fully Convolutional Localization Networks for Dense Captioning (CVPR 2016). Co-authored with Justin Johnson and Fei-Fei Li, this paper introduced the dense captioning task, which requires a model to both localize and describe salient regions in images using natural language. The proposed Fully Convolutional Localization Network (FCLN) could process an image in a single forward pass, required no external region proposals, and was trained end-to-end. The system was evaluated on the Visual Genome dataset, which contains 94,000 images and 4.1 million region-grounded captions [11].
| Paper | Venue | Year | Key Contribution |
|---|---|---|---|
| Large-Scale Video Classification with CNNs | CVPR | 2014 | Sports-1M dataset; temporal fusion strategies for video CNNs |
| Deep Visual-Semantic Alignments | CVPR | 2015 | Image captioning via joint visual-linguistic embeddings |
| ImageNet Large Scale Visual Recognition Challenge | IJCV | 2015 | Benchmark description; human vs. CNN performance analysis |
| Visualizing and Understanding RNNs | arXiv | 2015 | Interpretability tools for recurrent neural networks |
| DenseCap | CVPR | 2016 | Dense captioning with fully convolutional localization |
In May 2015, Karpathy published "The Unreasonable Effectiveness of Recurrent Neural Networks" on his personal blog. The post became one of the most widely read introductions to RNNs in the deep learning community. It demonstrated how character-level language models trained on various text corpora could learn to generate Shakespeare, Wikipedia articles, LaTeX source code, and Linux kernel code. The post accompanied the release of char-rnn, an open-source implementation of multi-layer LSTM character-level language models written in Torch (Lua). The char-rnn repository allowed users to train character-level models on any text dataset and sample new text that mimicked the style and structure of the training data. The project became one of the early viral open-source deep learning tools, inspiring numerous reimplementations and adaptations [12].
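char-rnn itself is a multi-layer LSTM written in Torch, but the underlying idea of character-level language modeling can be illustrated with a far simpler stand-in. The sketch below is a hedged, hypothetical example (not Karpathy's code): a count-based bigram model that learns which character tends to follow which, then samples new text one character at a time.

```python
import random

# Illustration only: char-rnn uses multi-layer LSTMs, not a bigram table.
# This shows the character-level modeling loop in its simplest form.

def train_bigram(text):
    """Count character-to-character transitions in the training text."""
    counts = {}
    for prev, nxt in zip(text, text[1:]):
        counts.setdefault(prev, {})
        counts[prev][nxt] = counts[prev].get(nxt, 0) + 1
    return counts

def sample(counts, start, length, seed=0):
    """Generate text by repeatedly sampling the next character."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "to be or not to be that is the question"
model = train_bigram(corpus)
print(sample(model, "t", 20, seed=1))
```

An LSTM replaces the bigram table with a learned hidden state, letting the model condition on much longer context, which is what allowed char-rnn to pick up the structure of LaTeX or C source rather than just letter frequencies.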
Karpathy was among the founding members of OpenAI when the organization was announced in December 2015. The founding team included Sam Altman, Elon Musk, Ilya Sutskever, Greg Brockman, Wojciech Zaremba, John Schulman, and several others. At OpenAI, Karpathy contributed to research on generative models, reinforcement learning, and the early explorations of language modeling that would eventually lead to the GPT series of models. His time at OpenAI overlapped with a period of foundational research at the organization, when the team was relatively small and focused on publishing open research [1].
In June 2017, Karpathy left OpenAI to join Tesla as Director of AI, reporting directly to Elon Musk. He was subsequently promoted to Senior Director of AI and head of the Autopilot Vision team. At Tesla, his responsibilities encompassed the full stack of the company's autonomous driving AI: neural network architecture design, training infrastructure, data labeling operations, and the deployment of Autopilot and Full Self-Driving (FSD) features to Tesla's global fleet of vehicles [13].
One of Karpathy's most significant and controversial contributions at Tesla was championing the "vision-only" approach to autonomous driving. While most other companies developing self-driving technology (including Waymo, Cruise, and most academic research groups) relied on a combination of cameras, lidar, radar, and high-definition maps, Tesla under Karpathy's technical leadership pursued a strategy that used cameras as the primary sensor, with neural networks interpreting the visual data directly. In 2021, Tesla went further and removed radar from its newer vehicles, relying entirely on camera-based perception [13].
Karpathy argued that the human visual system demonstrates that cameras alone provide sufficient information for driving, and that the key challenge was building neural networks capable of extracting the necessary information from image data. This approach required building massive data pipelines, which Karpathy referred to as the "data engine": a system for automatically identifying edge cases in the fleet's driving data, labeling them, and using them to retrain and improve the neural networks in a continuous feedback loop [13].
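The "data engine" loop can be sketched in miniature. The code below is a hedged toy, not Tesla's pipeline: all function names are illustrative, and a one-dimensional threshold classifier stands in for fleet-scale neural networks, purely to show the deploy, mine, label, retrain cycle.

```python
# Toy sketch of a "data engine" loop. Everything here is a stand-in:
# the real system mined edge cases from fleet video and retrained deep nets.

def predict(threshold, x):
    return x > threshold

def mine_edge_cases(threshold, stream, labels):
    """Find examples the currently deployed model gets wrong."""
    return [(x, y) for x, y in zip(stream, labels) if predict(threshold, x) != y]

def retrain(dataset):
    """'Train' by picking the threshold that minimizes errors on the dataset."""
    candidates = sorted(x for x, _ in dataset)
    return min(candidates,
               key=lambda t: sum(predict(t, x) != y for x, y in dataset))

# One turn of the loop: deploy -> mine hard cases -> label -> retrain.
dataset = [(0.1, False), (0.9, True)]
threshold = retrain(dataset)
stream, labels = [0.2, 0.5, 0.8], [False, True, True]
hard = mine_edge_cases(threshold, stream, labels)
dataset += hard            # labeled edge cases join the training set
threshold = retrain(dataset)
```

The essential property is the feedback: the deployed model's own failures select the next batch of training data, so each iteration concentrates labeling effort on the rarest, hardest cases.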
The vision-only approach generated significant debate within the autonomous driving community. Critics argued that relying solely on cameras introduced unnecessary risk, particularly in adverse conditions (rain, fog, glare) where cameras perform poorly. Supporters pointed to the cost advantages and the theoretical sufficiency of visual information. As of 2026, Tesla continues to use the vision-centric approach for its FSD system, though the company has reintroduced radar on some models [13].
Karpathy delivered technical presentations at Tesla's AI Day events in 2021 and 2022, providing unusually detailed looks at the company's neural network architectures, training infrastructure, and data pipeline. These presentations were widely watched and discussed in the AI community, as they offered a rare window into the engineering of a production-scale AI system processing data from millions of vehicles on the road. His explanations of Tesla's multi-camera "BEV" (bird's-eye view) neural network architecture and the auto-labeling pipeline were particularly well received [13].
Karpathy announced his departure from Tesla in July 2022, posting on Twitter (now X) that "it's been a great pleasure to help Tesla towards its goals over the last 5 years and a difficult decision to part ways." He indicated that he wanted to spend time on personal projects, including education and content creation [14].
After leaving Tesla, Karpathy entered a period of independent work focused primarily on education and open-source software. During this time, he produced several notable projects:
minGPT and nanoGPT. Karpathy released minGPT (2022) and its successor nanoGPT (2023), minimal implementations of the GPT architecture in PyTorch. These projects distilled the core ideas behind Transformer-based language models into clean, readable codebases of a few hundred lines. nanoGPT, in particular, became extremely popular on GitHub, earning tens of thousands of stars. It was designed to be simple enough for a single person to understand completely while still being capable of training a functional language model on consumer hardware. The nanoGPT repository could reproduce GPT-2 (124M parameters) on OpenWebText, running on a single 8xA100 40GB node in about four days [15].
YouTube channel. Karpathy launched a YouTube channel focused on deep learning education. His video series "Neural Networks: Zero to Hero" walked viewers through building neural networks from scratch, starting with basic gradient computation and building up to a full implementation of a GPT-style language model. His longer-form videos, including "Deep Dive into LLMs like ChatGPT" (3 hours 31 minutes, released February 2025) and "How I Use LLMs" (2 hours 11 minutes), attracted large audiences. As of early 2026, his YouTube channel has over 1 million subscribers [16].
On February 9, 2023, Karpathy announced that he was returning to OpenAI. His second stint at the company lasted roughly one year. He worked on research projects related to GPT-4 and other efforts, though the specific details of his work during this period were not fully disclosed publicly [1].
Karpathy departed OpenAI for the second time on February 13, 2024. In a post on X, he wrote: "Nothing 'happened' and it's not a result of any particular event, issue or drama. Actually, being at OpenAI over the last ~year has been really great." He described his departure as a natural decision to pursue personal projects and expressed no ill will toward the organization [17].
Since leaving OpenAI in February 2024, Karpathy has focused on building Eureka Labs and expanding his presence as a public educator and thought leader in AI.
On July 16, 2024, Karpathy announced the founding of Eureka Labs, a San Francisco-based AI-native education startup. The company's central premise is that AI can fundamentally transform education by providing every student with the equivalent of a world-class personal tutor. Eureka Labs' first planned course is "LLM101n: Let's Build a Storyteller," an undergraduate-level course in which students build a working language model from scratch, guided by an AI teaching assistant [18].
Karpathy has described his vision for Eureka Labs as creating a new kind of educational institution where the core course content is designed by human domain experts but the delivery, exercises, feedback, and personalization are handled by AI. He sees this as addressing the fundamental bottleneck in education: the scarcity of excellent teachers. The startup plans to run both digital and physical cohorts of students going through the materials together [18].
On February 2, 2025, Karpathy posted on X: "There's a new kind of coding I call 'vibe coding,' where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good" [2].
The term described an approach to software development in which the programmer provides high-level natural language instructions to an AI coding assistant and accepts the generated code without carefully reviewing or understanding every line. Karpathy noted that he used voice input (via SuperWhisper) to describe what he wanted, accepted all suggestions from the AI, and re-prompted when something looked off rather than manually debugging. He later reflected that the tweet was a casual "shower of thoughts" post that he simply fired off, and that after 17 years on Twitter he still could not predict which posts would go viral [2].
The concept struck a nerve. Within weeks, "vibe coding" had become one of the most discussed terms in the technology world. The post was viewed over 4.5 million times. It inspired debate about the future of software engineering, the role of code literacy, and whether AI-assisted development represented a democratization of programming or a dangerous abdication of understanding. In November 2025, Collins English Dictionary named "vibe coding" its Word of the Year for 2025, defining it as "the use of artificial intelligence prompted by natural language to write computer code" [19].
By early 2026, Karpathy himself noted that the concept had already evolved. As language models grew more capable, what had initially seemed like a casual, experimental approach was becoming a standard professional workflow, though typically with more oversight and scrutiny than the original "vibe coding" philosophy implied. Karpathy suggested that the era of pure vibe coding was already giving way to more structured forms of AI-assisted development [20].
Karpathy's open-source work is distinctive for its pedagogical intent. Rather than building production-ready software, he deliberately creates minimal implementations that sacrifice features for clarity, making it possible for learners to read and understand entire systems.
| Project | Year | Language | Description | GitHub Stars |
|---|---|---|---|---|
| char-rnn | 2015 | Lua (Torch) | Multi-layer LSTM character-level language models | 11k+ |
| micrograd | 2020 | Python | Scalar-valued autograd engine and neural network library (~150 lines) | 15k+ |
| minGPT | 2022 | Python (PyTorch) | Minimal GPT implementation (~300 lines) | 20k+ |
| nanoGPT | 2023 | Python (PyTorch) | Simplified GPT training codebase; reproduces GPT-2 (124M) | 40k+ |
| minbpe | 2024 | Python | Minimal Byte Pair Encoding tokenizer implementation | 12k+ |
| llm.c | 2024 | C/CUDA | GPT-2 training in ~1,000 lines of C with no dependencies | 27k+ |
| nanochat | 2025 | Python (PyTorch) | Full-stack ChatGPT clone pipeline; train for ~$100 in ~4 hours | 8k+ |
| microgpt | 2026 | Python | Single-file 200-line GPT: tokenizer, autograd, training, inference, no dependencies | New |
| autoresearch | 2026 | Python | Autonomous AI agent loop for running ML experiments (~630 lines) | New |
Released in 2020, micrograd implements a complete scalar-valued autograd engine and a small neural network library in roughly 150 lines of Python, making it possible for beginners to understand the entire mechanism of backpropagation by reading a single file. The project deliberately operates at the scalar level rather than using tensors, prioritizing pedagogical clarity over computational efficiency [21].
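The mechanism micrograd teaches can be condensed further still. The following is a hedged sketch in the same spirit, not micrograd's actual code: a scalar `Value` records its inputs and local derivatives so that `backward()` can apply the chain rule. (micrograd itself walks the graph in topological order; this version recurses, which is correct on small graphs but less efficient.)

```python
# Sketch of scalar reverse-mode autograd, in the spirit of micrograd.
# Not the actual micrograd code; micrograd uses a topological sort
# instead of recursion in backward().

class Value:
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._local_grads = local_grads  # d(out)/d(child) for each child

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self, grad=1.0):
        # Chain rule: accumulate the upstream gradient scaled by each
        # local derivative, then push it down to the inputs.
        self.grad += grad
        for child, local in zip(self._children, self._local_grads):
            child.backward(grad * local)

# d(a*b + a)/da = b + 1 = 4, d(a*b + a)/db = a = 2
a, b = Value(2.0), Value(3.0)
out = a * b + a
out.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Reading a file like this end to end is precisely the pedagogical experience micrograd aims for: every gradient in a neural network is produced by nothing more than this accumulation rule applied node by node.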
In 2024, Karpathy released llm.c, an implementation of GPT-2 training in approximately 1,000 lines of C code with no external dependencies beyond the C standard library and CUDA. The project demonstrated that the core computation of training a large language model, while typically wrapped in complex framework code, is fundamentally straightforward. The C implementation matched the output of the equivalent PyTorch reference code, making it clear that the essential algorithm could be expressed without a deep learning framework [22].
Released in early 2024, minbpe provides a minimal, clean implementation of the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. The repository implements three tokenizer variants and supports training a vocabulary from text, encoding text to tokens, and decoding tokens back to text. Like Karpathy's other projects, minbpe is designed primarily for educational use [23].
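The core BPE training step that minbpe implements is compact enough to sketch directly. The code below is a hedged illustration of the algorithm, not minbpe's own source: starting from raw UTF-8 bytes, it repeatedly finds the most frequent adjacent token pair and merges it into a new token.

```python
# Sketch of the BPE training loop (illustrative; not minbpe's code).

def most_frequent_pair(ids):
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return max(counts, key=counts.get)

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    ids = list(text.encode("utf-8"))   # start from raw bytes, as minbpe does
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        new_id = 256 + step            # new tokens start after the 256 bytes
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return ids, merges

ids, merges = train_bpe("aaabdaaabac", 2)
```

Encoding new text then replays the learned `merges` in order, and decoding expands each merged token back into its constituent bytes; the three tokenizer variants in minbpe differ mainly in how the input is split before these merges are applied.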
Released in October 2025, nanochat is the spiritual successor to nanoGPT. While nanoGPT covered only pretraining, nanochat provides a full-stack pipeline covering tokenization, base pretraining, mid-training on chat and tool-use data, supervised fine-tuning, optional reinforcement learning on GSM8K, evaluation, and serving through both a CLI and a ChatGPT-like web UI. Karpathy described it as a way to build "the best ChatGPT that $100 can buy," with training taking roughly four hours on an 8xH100 GPU node [24].
In March 2026, Karpathy released autoresearch, a 630-line Python script that implements an autonomous AI agent loop for running machine learning experiments. The system gives an AI agent a small LLM training setup and lets it experiment autonomously: modifying code, training for a short period, checking whether results improved, keeping or discarding changes, and repeating the cycle. In one test, the agent processed approximately 700 autonomous changes over two days and found roughly 20 additive improvements that transferred to larger models, reducing the "Time to GPT-2" leaderboard metric from 2.02 hours to 1.80 hours (an 11% efficiency gain). Karpathy described the project as a glimpse of how AI research labs will operate in the future, stating that the goal is "not to emulate a single PhD student" but "to emulate a research community of them" [25].
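The keep-or-discard cycle described above is, at its core, a hill-climbing loop. The sketch below is a hedged toy, not the autoresearch script: `propose_change` and `run_experiment` are stand-ins for the agent's real actions (editing training code and launching a short LLM training run), with a synthetic metric in place of a real training loss.

```python
import random

# Toy version of an autonomous experiment loop: propose a change, run a
# short experiment, keep the change only if the metric improved.
# propose_change / run_experiment are stand-ins for the agent's real work.

def propose_change(config, rng):
    """Stand-in for the agent editing the training setup."""
    new = dict(config)
    new["lr"] = config["lr"] * rng.choice([0.5, 0.9, 1.1, 2.0])
    return new

def run_experiment(config):
    """Stand-in metric: lower is better, with an optimum at lr = 0.01."""
    return abs(config["lr"] - 0.01)

def research_loop(config, steps, seed=0):
    rng = random.Random(seed)
    best_score = run_experiment(config)
    for _ in range(steps):
        candidate = propose_change(config, rng)
        score = run_experiment(candidate)
        if score < best_score:          # keep improvements, discard the rest
            config, best_score = candidate, score
    return config, best_score

config, score = research_loop({"lr": 0.1}, steps=200)
```

The difference in the real system is what fills these two slots: an LLM agent makes open-ended code edits rather than tweaking one number, and each evaluation is an actual training run, which is why finding ~20 keepable improvements required ~700 attempts.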
Karpathy occupies an unusual position in the AI community: he is both a world-class practitioner, having led AI teams at two of the field's most prominent organizations, and one of its most effective educators. His ability to distill complex technical concepts into clear explanations, whether in blog posts, YouTube videos, or open-source code, has made him enormously influential.
| Medium | Reach | Notable works |
|---|---|---|
| Stanford CS231n | Thousands of in-person students; millions of online viewers | Lecture videos, assignments, course notes |
| Blog (karpathy.github.io) | Widely read in AI community | "The Unreasonable Effectiveness of Recurrent Neural Networks" (2015) |
| YouTube | 1 million+ subscribers | "Neural Networks: Zero to Hero," "Deep Dive into LLMs like ChatGPT," "How I Use LLMs" |
| GitHub | Tens of thousands of stars across projects | char-rnn, micrograd, minGPT, nanoGPT, minbpe, llm.c, nanochat, microgpt, autoresearch |
| X (Twitter) | ~1.9 million followers | Coined "vibe coding"; regular commentary on AI developments |
Karpathy has articulated an influential framework for understanding the evolution of software through three paradigms. "Software 1.0" refers to traditional code written explicitly by humans. "Software 2.0" is the paradigm in which neural networks are trained on data, with the "code" (the network's weights) being learned rather than written. Karpathy coined this term in a 2017 blog post that generated significant discussion in the AI community. "Software 3.0" extends this to AI systems guided by natural language prompts, where the model itself becomes the runtime and the prompt becomes the program [26].
This framework has been widely adopted in discussions about the future of programming and the role of AI in software development.
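The three paradigms can be made concrete with a toy task. The example below is a hedged illustration, not from Karpathy's writing: a spam check written three ways, where the Software 3.0 case is shown only as a prompt string (no model is called).

```python
# Illustrative contrast of the three paradigms, using a toy spam check.

# Software 1.0: the logic is written explicitly by a human.
def is_spam_v1(text):
    return "free money" in text.lower()

# Software 2.0: the "logic" is a parameter learned from labeled data.
def fit_threshold(examples):
    # learn how many '!' characters signal spam, by minimizing training error
    return min(range(5), key=lambda t: sum(
        (x.count("!") > t) != label for x, label in examples))

def is_spam_v2(text, threshold):
    return text.count("!") > threshold

# Software 3.0: the "program" is a natural-language prompt; the LLM is the
# runtime that executes it.
prompt = "Is the following message spam? Answer yes or no: {message}"

examples = [("hi there", False), ("WIN!!! now!!!", True), ("meeting at 3", False)]
t = fit_threshold(examples)
```

Moving from 1.0 to 2.0 to 3.0, the human contribution shifts from writing rules, to curating data, to specifying intent in natural language.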
| Period | Role | Organization |
|---|---|---|
| 2015-2017 | Founding member, research scientist | OpenAI |
| 2017-2022 | Senior Director of AI, head of Autopilot Vision | Tesla |
| 2022-2023 | Independent educator, open-source developer | Independent |
| 2023-2024 | Researcher | OpenAI (second stint) |
| 2024-present | Founder | Eureka Labs |
| 2024-present | Educator and content creator | Independent (YouTube, GitHub, blog) |
Karpathy has expressed nuanced views on the trajectory of AI. In a 2025 podcast interview, he estimated that artificial general intelligence (AGI) is likely still roughly a decade away, pushing back against more aggressive timelines predicted by some of his peers. He has argued that while current large language models are impressive, they lack key capabilities (robust reasoning, genuine understanding, reliable factual knowledge) that would be needed for AGI [27].
On AI safety, Karpathy has taken a pragmatic rather than alarmist position. He has emphasized the importance of understanding what AI systems are actually doing (interpretability) and building robust engineering practices around AI deployment, rather than focusing primarily on speculative long-term risks [27].
In June 2025, commenting on claims that self-driving technology was a solved problem, Karpathy cautioned against such conclusions, warning that the gap between impressive demos and reliable, fully autonomous operation in all conditions remained substantial [28].
His influence extends well beyond his formal roles. Through his educational content, open-source projects, and public commentary, Karpathy has shaped how an entire generation of developers and researchers thinks about deep learning, language models, and the practice of AI engineering.
Karpathy lives in the San Francisco Bay Area. He is active on X (formerly Twitter) and YouTube, where his posts on AI topics regularly reach millions of people. He has described himself as someone who learns best by building things from scratch, a philosophy that pervades both his educational approach and his personal research projects [1].