Alec Radford


Updated Jul 11, 2026


Alec Radford is an American machine learning researcher who was the first author of the papers introducing the DCGAN generative adversarial network, the original Generative Pre-Trained Transformer (GPT-1), GPT-2, the CLIP vision-language model, and the Whisper speech recognition system, and a co-author of GPT-3, DALL-E, and the Proximal Policy Optimization (PPO) reinforcement learning algorithm.^[1]^[2]^[3]^[4]^[11]^[15]^[16]^[29] Widely regarded as one of the most influential individual contributors to modern generative AI, he worked at OpenAI from 2016 until late 2024; when his departure became public, chief executive Sam Altman called him "a genius at the level of einstein" and said it was "hard to overstate how much alec radford has contributed to the field, and how much of everyone's current progress traces back to his work."^[7]^[25]

A co-founder of the Boston-based startup Indico Data Solutions while he was an undergraduate at Franklin W. Olin College of Engineering, Radford left Indico in 2016 to join OpenAI, where he spent roughly eight years as a research scientist before leaving in December 2024 to pursue independent research.^[5]^[6]^[7] He keeps an unusually low public profile for someone of his influence, and much of his best-known code has been released openly under the GitHub handle Newmu.^[9] Since 2025 he has served as an adviser to Thinking Machines Lab, the AI startup founded by former OpenAI chief technology officer Mira Murati, and in April 2026 he co-released Talkie, a 13-billion-parameter language model trained only on text published before 1931.^[26]^[27]^[28]

Where did Alec Radford grow up and study?

Public reporting indicates that Radford grew up in Texas, attended Cistercian Preparatory School in Irving (graduating in 2011), and earned the rank of Eagle Scout.^[5] He then enrolled at Franklin W. Olin College of Engineering, a small undergraduate engineering school of roughly 400 students located in Needham, Massachusetts, just outside Boston.^[5]^[6] At Olin he met fellow students Slater Victoroff, Diana Yuan, and Madison May, with whom he would later co-found a machine learning startup.^[5]^[6] Radford did not complete his undergraduate degree on a normal schedule; he left Olin in August 2014 to work full time on the company.^[5]

By this point Radford had also built up a substantial open source footprint, publishing code under the GitHub handle Newmu, which remains his primary code hosting account.^[9] (He is unrelated to the statistician Radford Neal, despite the similarity of names; his handle is Newmu, not @radfordneal.)^[9]

What was Indico Data Solutions?

In 2012, while still an undergraduate at Olin, Radford co-founded Indico Data Solutions with Victoroff, Yuan, and May in a college dormitory.^[5]^[6] The company, generally known as Indico Data, aimed to commoditize machine learning APIs for sentiment analysis, text classification, and image tagging at a time when most production teams lacked in-house deep learning expertise.^[6] Indico raised early seed capital from General Catalyst's Rough Draft Ventures program in spring 2013 and was accepted into the Techstars Boston accelerator in August 2014, with a $3 million seed round closing later that year.^[5]^[10]

Within the small Indico research team, Radford and engineer Luke Metz (who joined in 2015) produced one of the most influential open source implementations of generative adversarial networks of the period. With Soumith Chintala at Facebook AI Research, they co-authored the paper "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," posted to arXiv in November 2015, which introduced the DCGAN architecture and supplied the now widely used training recipes (strided convolutions, batch normalization, LeakyReLU, the Adam optimizer with a specific learning rate) for stable image GAN training.^[11] The accompanying reference implementation lives in Radford's Newmu/dcgan_code repository, which remains one of the most starred GAN code releases of the mid-2010s.^[9]
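As a rough illustration of the "strided convolutions" part of that recipe, the sketch below computes how a DCGAN-style generator can grow a 4×4 feature map into a 64×64 image with four stride-2 transposed convolutions. The kernel size of 4 and padding of 1 are assumptions borrowed from common reimplementations rather than details given in this article, and the function names are hypothetical. The optimizer and activation constants are the values reported in the DCGAN paper.

```python
def transposed_conv_output_size(size, kernel=4, stride=2, padding=1):
    """Output height/width of a transposed convolution.

    Assumes no dilation and no extra output padding.
    """
    return (size - 1) * stride - 2 * padding + kernel


def generator_feature_map_sizes(start=4, n_upsampling_layers=4):
    """Spatial size after each stride-2 upsampling layer of the generator."""
    sizes = [start]
    for _ in range(n_upsampling_layers):
        sizes.append(transposed_conv_output_size(sizes[-1]))
    return sizes


# Training settings reported in the DCGAN paper.
ADAM_LEARNING_RATE = 0.0002
ADAM_BETA1 = 0.5
LEAKY_RELU_SLOPE = 0.2  # used in the discriminator

sizes = generator_feature_map_sizes()
print(" -> ".join(f"{s}x{s}" for s in sizes))
```

With these assumed settings each layer exactly doubles the spatial resolution, which is why no pooling or fixed upsampling layers are needed.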

According to the Boston Globe's later reporting on the company, Radford left Indico for OpenAI in the Bay Area shortly after an April 2016 keynote in which Nvidia chief executive Jensen Huang showed GAN-generated faces that audiences and press initially attributed to Yann LeCun's lab rather than to the Indico researchers, an episode Victoroff publicly described as a turning point.^[6] Indico itself continued operating as a Boston-based document intelligence company after Radford's departure, with Victoroff staying on for years in a succession of roles.^[6]

When did Alec Radford join OpenAI?

Radford joined OpenAI in 2016, becoming part of the small early research team in San Francisco.^[5]^[6]^[7] His GitHub bio publicly lists his affiliation as @openai and his location as San Francisco, California.^[9]

His first widely cited OpenAI publication as first author, with co-authors Rafal Jozefowicz and Ilya Sutskever, was "Learning to Generate Reviews and Discovering Sentiment," posted to arXiv on 5 April 2017.^[12] The paper trained a 4,096-unit multiplicative LSTM on roughly 82 million Amazon product reviews and reported that a single neuron in the model had spontaneously become a sentiment detector, achieving state of the art on the binary Stanford Sentiment Treebank task without explicit sentiment labels.^[12] The result, often referred to as the "sentiment neuron," became one of OpenAI's early high-profile demonstrations that scale plus next-token prediction on a large enough text corpus could elicit semantically meaningful internal structure without any task-specific supervision.^[12] Sutskever and others have since cited this finding as foreshadowing the direction subsequently pursued with the GPT series.^[12]

What did GPT-1 introduce?

On 11 June 2018, OpenAI published the paper "Improving Language Understanding by Generative Pre-Training," with Radford as first author and Karthik Narasimhan, Tim Salimans, and Sutskever as co-authors.^[1] The paper is the original GPT (Generative Pre-Trained Transformer) paper, retrospectively known as GPT-1.^[1] It proposed a two-stage recipe: first, generatively pre-train a Transformer decoder language model (12 layers, 768 hidden size, ~117M parameters) on the BookCorpus dataset by maximizing next-token likelihood; second, attach a small task-specific classification head and fine-tune the entire model discriminatively on each downstream task.^[1] The paper reported that this single architecture, with minimal task-specific machinery, improved the prior state of the art on 9 of 12 evaluated natural language understanding benchmarks.^[1] It was the first of a sequence of OpenAI papers that established the pre-train then adapt paradigm for generative pre-trained transformer models as a serious alternative to the bespoke per-task architectures common at the time.^[1]

OpenAI accompanied the paper with the blog post "Improving language understanding with unsupervised learning," dated 11 June 2018, which framed the work as part of a broader effort to use unsupervised learning on internet-scale text as a foundation for downstream language tasks.^[13]

GPT-1's specific design decisions, several of which Radford carried forward to subsequent models, included: a unidirectional (left-to-right) decoder-only Transformer rather than the encoder-decoder architecture used by sequence-to-sequence machine translation systems of the period; byte pair encoding for tokenization; the use of a learned position embedding rather than the sinusoidal positional encoding from the original Transformer paper; and an auxiliary language modeling loss retained during fine-tuning to act as a regularizer.^[1] The choice of an autoregressive decoder-only architecture, rather than the masked-language-model encoder architecture adopted by BERT later that year, was the architectural fork that defined the subsequent GPT trajectory.^[1]

What did GPT-2 demonstrate?

In February 2019 Radford was again first author on the GPT-2 paper "Language Models are Unsupervised Multitask Learners," co-authored with Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Sutskever.^[2] The paper trained a family of Transformer language models up to 1.5 billion parameters on a new dataset called WebText, constructed by following all outbound links from Reddit submissions with at least three karma and extracting roughly 8 million documents (about 40 GB of text).^[2] Its central empirical claim was that a sufficiently large language model trained purely on next-token prediction could, in a zero-shot setting and without any task-specific fine-tuning, achieve state of the art on 7 of 8 evaluated language modeling benchmarks and perform competitively on tasks such as reading comprehension, summarization, translation, and question answering.^[2] The paper popularized the framing of natural language tasks as conditioning on an appropriate textual prompt, which subsequently became standard practice for GPT-3 and later large language models.^[2]^[14]
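The idea of specifying a task purely through the prompt can be sketched in a few lines. The GPT-2 paper induced summarization by appending "TL;DR:" to an article and induced translation by conditioning on example pairs written as "english sentence = french sentence". The helper names and the exact whitespace below are illustrative assumptions, not the paper's code.

```python
def summarization_prompt(article):
    """Append the 'TL;DR:' cue so that the continuation reads as a summary."""
    return f"{article}\nTL;DR:"


def translation_prompt(example_pairs, source_sentence):
    """Show a few 'source = target' pairs, then leave the last target blank."""
    lines = [f"{source} = {target}" for source, target in example_pairs]
    lines.append(f"{source_sentence} =")
    return "\n".join(lines)


prompt = translation_prompt(
    [("good morning", "bonjour"), ("thank you", "merci")],
    "see you tomorrow",
)
print(prompt)
```

In both cases the language model is used unchanged: the task is selected only by the text it is asked to continue.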

OpenAI's accompanying blog post "Better Language Models and Their Implications," published on 14 February 2019, announced GPT-2 and laid out a staged release strategy that initially withheld the full 1.5B-parameter model on safety grounds. Only the smallest model (124M parameters) was released at first, followed in stages by larger variants such as the 355M model for community study while a decision on full release was deferred.^[14] The full 1.5B GPT-2 model was released in November 2019 after additional study of misuse risks.^[14] The staged release approach itself became a frequently cited case study in responsible-disclosure debates around generative models.^[14]

Architecturally, GPT-2 used the same decoder-only Transformer recipe as GPT-1 with several refinements: layer normalization was moved to the input of each sub-block, an additional layer normalization was added after the final self-attention block, the vocabulary was expanded, the context length was increased from 512 to 1024 tokens, and the batch size was scaled to 512.^[2] The largest configuration had 48 layers, a hidden size of 1600, and approximately 1.5 billion parameters, an order of magnitude larger than GPT-1.^[2] The accompanying technical report introduced the now-standard practice of presenting "zero-shot" downstream task results as a primary evaluation axis for large language models, a methodological choice that recurs throughout the subsequent GPT line and became the de facto evaluation paradigm for GPT-3 and later open and commercial models.^[2]
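The parameter counts quoted for GPT-1 and GPT-2 can be roughly reproduced from the layer count and hidden size. The sketch below uses the common approximation of about 12·d² weights per Transformer block (4·d² for the attention projections plus 8·d² for a feed-forward layer with 4× expansion) and ignores biases and layer normalization. The vocabulary sizes are assumptions taken from the public model releases, not figures stated in this article.

```python
def approx_decoder_params(n_layers, d_model, vocab_size, n_positions):
    """Rough parameter count for a GPT-style decoder-only Transformer.

    Ignores biases and layer-norm parameters, and assumes the output
    projection shares its weights with the token embedding.
    """
    attention = 4 * d_model * d_model      # query, key, value, output projections
    feed_forward = 8 * d_model * d_model   # two linear layers with 4x expansion
    blocks = n_layers * (attention + feed_forward)
    embeddings = (vocab_size + n_positions) * d_model
    return blocks + embeddings


# Layers, hidden sizes and context lengths follow the article text;
# the vocabulary sizes (40478 and 50257) are assumed values.
gpt1 = approx_decoder_params(n_layers=12, d_model=768, vocab_size=40478, n_positions=512)
gpt2_xl = approx_decoder_params(n_layers=48, d_model=1600, vocab_size=50257, n_positions=1024)

print(f"GPT-1 estimate: {gpt1 / 1e6:.0f}M parameters")
print(f"GPT-2 (largest) estimate: {gpt2_xl / 1e9:.2f}B parameters")
print(f"Ratio: {gpt2_xl / gpt1:.1f}x")
```

Under these assumptions the estimates land close to the quoted ~117M and ~1.5B figures, and the ratio is consistent with the "order of magnitude" description.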

What else did Radford work on at OpenAI?

Beyond the sentiment neuron, Radford was a co-author of the 2016 NeurIPS paper "Improved Techniques for Training GANs," with Tim Salimans as first author and Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, and Xi Chen as co-authors.^[21] That paper introduced several training techniques (including feature matching, minibatch discrimination, and a now standard semi-supervised learning formulation) that built on Radford's earlier DCGAN work and produced state of the art semi-supervised classification results on MNIST, CIFAR-10, and SVHN.^[21]

Radford was also a co-author of the 2017 paper "Proximal Policy Optimization Algorithms," with John Schulman as first author and Filip Wolski, Prafulla Dhariwal, and Oleg Klimov as co-authors, posted to arXiv on 20 July 2017.^[29] The paper introduced Proximal Policy Optimization (PPO), a policy-gradient reinforcement learning method that keeps most of the stability of trust region policy optimization while being much simpler to implement, and it later became a standard optimization algorithm for reinforcement learning from human feedback, the technique used to fine-tune InstructGPT and ChatGPT.^[29] PPO ranks among Radford's most-cited papers.^[29]^[30]
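The simplicity attributed to PPO comes largely from its clipped surrogate objective, which limits how far a single update can move the policy without solving a constrained optimization problem. The following is a minimal sketch of that objective in plain Python, assuming per-sample log-probabilities and advantage estimates are already available; the function name is hypothetical and the clip range of 0.2 is the value used in the paper's experiments.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    new_logps / old_logps: log-probabilities of the taken actions under
    the current policy and the policy that collected the data.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio r
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum makes the objective a pessimistic bound:
        # moving the ratio outside the clip range yields no extra gain.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A sample whose probability rose by 50% with positive advantage is
# credited only up to the clip boundary.
value = ppo_clip_objective([math.log(1.5)], [0.0], [1.0])
print(value)
```

In practice this objective is combined with a value-function loss and often an entropy bonus, and it is optimized with minibatch gradient ascent over several epochs per batch of collected data.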

Image GPT

In mid-2020 Radford was second author on "Generative Pretraining From Pixels" (Image GPT, also known as iGPT), with Mark Chen as first author and co-authors Rewon Child, Jeffrey Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, and Sutskever.^[22] The paper applied a GPT-2-style autoregressive Transformer directly to sequences of low-resolution image pixels, with no built-in image-specific inductive bias such as 2D convolutions, and showed that the resulting model learned strong image representations as measured by linear probing and fine-tuning.^[22] The work served as an early proof-of-concept that the same scale-then-predict-the-next-token recipe used for language could transfer to other modalities and helped motivate later multimodal generative systems at the lab.^[22]

GPT-3 and DALL-E

In May 2020 OpenAI published "Language Models are Few-Shot Learners," the GPT-3 paper, with Tom B. Brown as first author and a long list of co-authors.^[15] Radford is listed among the paper's authors, contributing to the wider OpenAI effort that scaled the GPT-2 recipe by roughly two orders of magnitude in parameter count, training a 175-billion-parameter dense decoder Transformer that demonstrated in-context, few-shot learning across a wide range of tasks.^[15] The first authorship and most of the systems work moved to other team members; Radford's authorship signals continued involvement in the line of work but no longer as the central lead.^[15]

He was also a co-author of DALL-E, the original text-to-image system announced by OpenAI in early 2021. The associated paper "Zero-Shot Text-to-Image Generation," with Aditya Ramesh as first author and Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Mark Chen, and Sutskever among the co-authors, used a discrete VAE plus a decoder-only Transformer to generate images from text prompts and explicitly relied on CLIP for re-ranking the candidate outputs.^[16] Radford's contribution sits in the line of work connecting the autoregressive Transformer modeling techniques developed for the GPT series with the contrastive learning machinery developed alongside CLIP.^[16] Public reporting on his career credits him as a contributor to the DALL-E family of systems rather than as the first author.^[7]^[16]

Music generation

Radford was also a co-author of "Jukebox: A Generative Model for Music," with Prafulla Dhariwal as first author and Heewoo Jun, Christine Payne, Jong Wook Kim, and Sutskever as additional co-authors, posted to arXiv in April 2020.^[23] Jukebox extended OpenAI's generative-pretraining program to the audio domain, using a multi-scale VQ-VAE to compress raw audio into discrete tokens and then training autoregressive Transformers on those tokens to generate raw-audio music samples with singing, conditioned on artist, genre, and lyrics.^[23] The work foreshadowed Whisper's later use of large-scale weakly supervised audio data and is one of several earlier audio efforts at OpenAI that preceded the speech recognition push that produced Whisper.^[23]

What is CLIP?

Radford was first author of "Learning Transferable Visual Models From Natural Language Supervision," posted to arXiv on 26 February 2021 as 2103.00020, with co-authors Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Sutskever.^[3] The paper introduced CLIP (Contrastive Language-Image Pre-training), a dual-encoder model trained on 400 million (image, caption) pairs collected from the internet using a contrastive objective: given a batch of N image-text pairs, the model is trained to maximize the cosine similarity of the N matched image and text embeddings and to minimize the similarity of the N² − N unmatched pairs.^[3] At inference time, a textual prompt such as "a photo of a {label}" produces an embedding that can be compared against an image embedding to perform zero-shot image classification without any task-specific training.^[3] The paper reported that CLIP matched ResNet-50 ImageNet accuracy in a zero-shot setting (i.e., without ever training on ImageNet labels) and generalized robustly across more than thirty visual benchmarks including OCR, action recognition, and fine-grained classification.^[3] The work appeared in the Proceedings of the 38th International Conference on Machine Learning (PMLR 139) in 2021.^[3]
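The contrastive objective described above can be sketched as a symmetric cross-entropy over a matrix of scaled cosine similarities, in which the diagonal entries correspond to the matched pairs. The version below is a minimal pure-Python illustration on toy vectors, not the training code: the function names are hypothetical, and the temperature is fixed at 0.07 for simplicity, whereas the paper treats it as a learned parameter.

```python
import math


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over N matched (image, text) pairs."""
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Each image should pick out its own text (rows) and vice versa (columns).
    image_loss = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_loss = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_loss + text_loss) / 2


aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is near zero when every image embedding is closest to its own caption and grows when the pairing is scrambled, which is the signal that pulls matched pairs together and pushes the N² − N unmatched pairs apart.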

CLIP became one of the most widely adopted vision-language backbones of the 2020s and a foundational component of many subsequent text-to-image and multimodal systems, including DALL-E 2 and Stable Diffusion.^[3]

Methodologically, CLIP combined two relatively well-understood ideas, contrastive representation learning and large-scale weak supervision from web text-image pairs, and pushed both well past previously demonstrated scales. The paper's training set, called WIT (WebImageText), comprised 400 million internet-sourced (image, caption) pairs filtered around a query list intended to balance domains and concepts.^[3] On the model side, the authors trained two families of image encoder, one ResNet-based and one based on the Vision Transformer, alongside a Transformer text encoder, with the embedding from each modality projected to a shared multimodal space.^[3] The largest model, ViT-L/14 at 336 pixel input resolution, was the configuration most widely deployed in downstream applications.^[3] The paper also included a careful discussion of prompt engineering, demonstrating that embedding the textual class label inside a sentence template (such as "a photo of a {label}") improved zero-shot accuracy substantially over using the bare class word.^[3]
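Zero-shot classification with a prompt template can be illustrated with a small sketch: each class label is wrapped in the template, encoded as text, and compared with the image embedding by cosine similarity. The embeddings and the `toy_encode_text` function below are hypothetical stand-ins for trained encoders, used only so that the example runs on its own.

```python
import math

TEMPLATE = "a photo of a {label}"


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, labels, encode_text):
    """Return the label whose templated prompt best matches the image."""
    prompts = [TEMPLATE.format(label=label) for label in labels]
    scores = [cosine(image_embedding, encode_text(p)) for p in prompts]
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best], scores


# Hypothetical toy vectors standing in for a trained text encoder.
TOY_TEXT_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}


def toy_encode_text(prompt):
    return TOY_TEXT_EMBEDDINGS[prompt]


toy_image_embedding = [0.8, 0.2, 0.1]  # pretend output of the image encoder
label, scores = zero_shot_classify(
    toy_image_embedding, ["dog", "cat", "car"], toy_encode_text
)
print(label)
```

Because the set of labels is supplied only at inference time, the same pair of encoders can be pointed at a new classification task by changing the list of prompts.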

What is Whisper?

Radford was again first author of "Robust Speech Recognition via Large-Scale Weak Supervision," posted to arXiv on 6 December 2022 as 2212.04356, with co-authors Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Sutskever.^[4] OpenAI had publicly released the corresponding system, called Whisper, earlier, on 21 September 2022.^[17]^[24] Whisper is a Transformer encoder-decoder automatic speech recognition system trained on 680,000 hours of multilingual audio paired with weakly supervised transcripts gathered from the web, of which about one third is non-English speech.^[4]^[24] The model is trained jointly to perform multilingual speech recognition, speech translation into English, and language identification within a single sequence-to-sequence Transformer, conditioned on special tokens that select among these tasks.^[4]^[24]

The paper reported that, despite never being fine-tuned on individual benchmark training sets, Whisper approached the accuracy of fully supervised state-of-the-art models on a wide range of speech recognition benchmarks in a zero-shot transfer setting and was substantially more robust to accents, background noise, and domain shift.^[4] OpenAI released the model weights and inference code under the MIT license at github.com/openai/whisper, which has made Whisper one of the most widely deployed open-source ASR systems.^[17]^[24] OpenAI subsequently released improved checkpoints, including Whisper Large V2 on 8 December 2022 and Whisper Large V3 at OpenAI DevDay in November 2023.^[24]

In the Whisper paper, Radford and his co-authors made an explicit design argument that the field of speech recognition had over-fit to small high-quality benchmarks: a fully supervised model trained on a single carefully curated dataset could attain a very low word error rate on that dataset's held-out split while still failing badly on out-of-distribution audio.^[4] Their proposed alternative, weakly supervised training on hundreds of thousands of hours of heterogeneous web audio with noisy transcripts, traded a small loss in in-distribution accuracy on benchmarks like LibriSpeech for substantial gains in cross-domain robustness.^[4] The model family released spanned configurations from approximately 39 million parameters (Whisper Tiny) to approximately 1.55 billion parameters (Whisper Large), supporting 99 languages and a single joint task interface conditioned on tokens such as <|transcribe|>, <|translate|>, and language tags.^[4]^[24]
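The joint task interface can be pictured as a short prefix of special tokens placed at the start of the decoder's input. The sketch below builds such a prefix as a list of strings. The `<|startoftranscript|>` and `<|notimestamps|>` tokens follow the format described in the Whisper paper rather than this article, and the function name is hypothetical; the real system operates on token IDs produced by its tokenizer.

```python
VALID_TASKS = ("transcribe", "translate")


def build_decoder_prefix(language, task, timestamps=False):
    """Special-token prefix that tells the decoder which task to perform.

    language: a language code such as "en" or "fr".
    task: "transcribe" (same-language text) or "translate" (into English).
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    prefix = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        prefix.append("<|notimestamps|>")
    return prefix


print(build_decoder_prefix("fr", "translate"))
print(build_decoder_prefix("en", "transcribe", timestamps=True))
```

Switching between transcription and translation therefore requires no change to the model weights, only a different task token in the prefix.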

Major papers

Year	Paper	Role	Venue / identifier
2015	"Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (DCGAN)	First author	arXiv:1511.06434 ^[11]
2016	"Improved Techniques for Training GANs"	Co-author	NeurIPS 2016; arXiv:1606.03498 ^[21]
2017	"Learning to Generate Reviews and Discovering Sentiment"	First author	arXiv:1704.01444 ^[12]
2017	"Proximal Policy Optimization Algorithms" (PPO)	Co-author	arXiv:1707.06347 ^[29]
2018	"Improving Language Understanding by Generative Pre-Training" (GPT-1)	First author	OpenAI technical report, 11 Jun 2018 ^[1]
2019	"Language Models are Unsupervised Multitask Learners" (GPT-2)	First author	OpenAI technical report, Feb 2019 ^[2]
2020	"Jukebox: A Generative Model for Music"	Co-author	arXiv:2005.00341 ^[23]
2020	"Generative Pretraining From Pixels" (Image GPT)	Co-author	ICML 2020 ^[22]
2020	"Language Models are Few-Shot Learners" (GPT-3)	Co-author	arXiv:2005.14165 ^[15]
2021	"Learning Transferable Visual Models From Natural Language Supervision" (CLIP)	First author	arXiv:2103.00020; ICML 2021 ^[3]
2022	"Robust Speech Recognition via Large-Scale Weak Supervision" (Whisper)	First author	arXiv:2212.04356 ^[4]

Public profile and recognition

Radford keeps an unusually low public profile relative to the impact of his work. He maintains an X (formerly Twitter) account under the handle @AlecRad but posts infrequently, and his most active public output remains his GitHub profile at github.com/Newmu.^[9]^[18] As of July 2026 the papers he has authored or co-authored had accumulated more than 387,000 citations on Google Scholar, where his most-cited works include "Language Models are Few-Shot Learners" (GPT-3) and the CLIP paper, each cited more than 60,000 times.^[30] In an OpenAI press statement quoted in coverage of his departure, OpenAI research director Mark Chen said the company "deeply respect and appreciate Alec and his contributions, and we look forward to continuing our collaboration as he explores independent research."^[7] Andrej Karpathy, an OpenAI co-founder, has on several occasions credited Radford as one of the most important individual contributors to the early GPT line of work, although those mentions are scattered across public talks and social media posts rather than given in formal interviews.^[19]

The strongest public assessment came from Sam Altman, who wrote on X on 27 December 2024 that Radford "is extremely modest and isn't as well-known as he should be, though i'm certain he doesn't care," and that he was "so looking forward to collaborating with him as an independent researcher."^[25] The 2024 reporting by The Information that broke the news of Radford's planned departure described him as "an OpenAI researcher who helped develop some of its most important artificial intelligence."^[7]

Why did Alec Radford leave OpenAI?

In December 2024, The Information reported that Radford had told OpenAI colleagues he intended to leave the company to pursue research independently, while noting that he planned to continue collaborating with OpenAI and with other AI laboratories.^[7] His exit came during a period of high-profile departures from OpenAI's research and executive ranks across 2024, and several former colleagues, including former chief research officer Bob McGrew, subsequently regrouped around new ventures.^[8]^[26] In March 2025, TechCrunch, which described Radford as "a researcher who helped develop many of OpenAI's key AI technologies," reported that he had received a subpoena dated 25 February 2025 in the copyright litigation known as "in re OpenAI ChatGPT Litigation," brought by authors including Paul Tremblay, Sarah Silverman, and Michael Chabon, who allege that OpenAI used their copyrighted books to train its models without permission.^[20]

What is Alec Radford working on now?

Since leaving OpenAI, Radford has worked as an independent researcher while continuing to collaborate with other laboratories.^[7] In April 2025 TechCrunch reported that he had been named an adviser to Thinking Machines Lab, the AI startup founded by former OpenAI chief technology officer Mira Murati, joining at the same time as former OpenAI chief research officer Bob McGrew; the report described Radford as "a former OpenAI researcher behind many of the company's more transformative innovations."^[26]

In late April 2026, Radford, Nick Levine, and David Duvenaud of the University of Toronto released Talkie (also styled talkie-1930), an open-weight 13-billion-parameter language model trained on roughly 260 billion tokens of English text published before 1 January 1931.^[27]^[28] The cutoff was chosen so that the training material, drawn from books, newspapers, scientific journals, patents, and case law, was entirely out of United States copyright: works first published in the country in 1930 entered the public domain on 1 January 2026.^[27]^[28] Released under the Apache 2.0 license on Hugging Face as separate base and chat checkpoints, Talkie was framed as a research tool for studying how a model reasons about a world it has no data about, and its makers said they intended to scale it toward a GPT-3-scale successor later in 2026.^[27]^[28] The project reflects the same recipe Radford helped establish, large-scale generative pre-training on a big text corpus, applied here to a deliberately historical dataset.^[27]

Significance

Across roughly eight years at OpenAI, Radford was the first author of the papers that introduced four of the lab's most consequential publicly released systems: GPT-1, GPT-2, the CLIP vision-language model, and the Whisper speech recognition model; he was also a co-author of GPT-3.^[1]^[2]^[3]^[4]^[15] These lines of work, together with DALL-E (on which he was a contributor), are widely credited with establishing the modern paradigm of large-scale generative pre-training applied across text, image, and audio modalities.^[7]^[16] His co-authorship of PPO adds a further thread, one that runs through the reinforcement learning stage now used to align chat models.^[29] Trade coverage of his departure frequently called him the "father of GPT," and Sam Altman publicly credited him as a generational research talent to whose work much of the field's progress traces back.^[8]^[25]

Limitations of public information

Because Radford grants few interviews, does not maintain an extensive personal website, and rarely publishes long-form essays or talks, much of the secondary literature about him relies on a small set of primary sources: the paper author lists themselves, his GitHub profile, OpenAI's official research blog posts, the Boston Globe's 2023 feature on Indico, and a small number of news reports about his 2024 departure and later projects.^[1]^[2]^[3]^[4]^[5]^[6]^[7]^[9]^[13]^[14]^[17]^[27] Specific personal details such as his current residence and his exact role titles at OpenAI year by year remain sparsely documented in publicly verifiable sources and are therefore omitted here.

Related work

The systems Radford led or co-led sit inside a broader research lineage that includes the underlying Transformer architecture introduced in Attention Is All You Need, the bidirectional pre-training approach taken by BERT as a contemporaneous alternative to GPT, and the scaling trajectory that produced GPT-3 and subsequent OpenAI ChatGPT-class models.^[1]^[2]^[15] In the vision-language area his CLIP paper is closely related to other contrastive learning approaches to representation learning, and in speech his Whisper paper sits among large-scale speech recognition efforts that paired Transformer encoder-decoders with massive weakly labeled web audio corpora.^[3]^[4]

References

1. Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, "Improving Language Understanding by Generative Pre-Training", OpenAI technical report, 2018-06-11. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf. Accessed 2026-05-21.
2. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, "Language Models are Unsupervised Multitask Learners", OpenAI technical report, 2019-02-14. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf. Accessed 2026-05-21.
3. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever, "Learning Transferable Visual Models From Natural Language Supervision", arXiv:2103.00020, 2021-02-26. https://arxiv.org/abs/2103.00020. Accessed 2026-05-21.
4. Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever, "Robust Speech Recognition via Large-Scale Weak Supervision", arXiv:2212.04356, 2022-12-06. https://arxiv.org/abs/2212.04356. Accessed 2026-05-21.
5. Wikipedia contributors, "Alec Radford", Wikipedia, 2026. https://en.wikipedia.org/wiki/Alec_Radford. Accessed 2026-07-12.
6. Hiawatha Bray, "How a couple of Olin College students helped spark the AI chatbot revolution", The Boston Globe, 2023-06-10. https://www.bostonglobe.com/2023/06/10/business/how-couple-olin-college-students-helped-spark-ai-chatbot-revolution/. Accessed 2026-05-21.
7. Erin Woo, "Senior OpenAI Researcher Radford Departs", The Information, 2024-12-19. https://www.theinformation.com/briefings/senior-openai-researcher-radford-departs. Accessed 2026-05-21.
8. AIBase News, "The Departure of the Father of GPT Shakes the AI Community: OpenAI's Legendary Researcher Radford Moves to Independent Research", AIBase, 2024-12-20. https://news.aibase.com/news/14126. Accessed 2026-05-21.
9. Alec Radford, "Newmu (Alec Radford) GitHub profile", GitHub. https://github.com/Newmu. Accessed 2026-05-21.
10. Indico Data Solutions, "College Startup Indico Announces $3 million in Funding", PRWeb / Crowdfund Insider, 2014-11. https://www.crowdfundinsider.com/2014/11/56126-massachusetts-college-startup-indico-announces-3-million-funding/. Accessed 2026-05-21.
11. Alec Radford, Luke Metz, Soumith Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", arXiv:1511.06434, 2015-11-19. https://arxiv.org/abs/1511.06434. Accessed 2026-05-21.
12. Alec Radford, Rafal Jozefowicz, Ilya Sutskever, "Learning to Generate Reviews and Discovering Sentiment", arXiv:1704.01444, 2017-04-05. https://arxiv.org/abs/1704.01444. Accessed 2026-05-21.
13. OpenAI, "Improving language understanding with unsupervised learning", OpenAI blog, 2018-06-11. https://openai.com/index/language-unsupervised/. Accessed 2026-05-21.
14. OpenAI, "Better Language Models and Their Implications", OpenAI blog, 2019-02-14. https://openai.com/index/better-language-models/. Accessed 2026-05-21.
15. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al., "Language Models are Few-Shot Learners", arXiv:2005.14165, 2020-05-28. https://arxiv.org/abs/2005.14165. Accessed 2026-05-21.
16. Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever, "Zero-Shot Text-to-Image Generation" (DALL-E paper), arXiv:2102.12092, 2021-02-24. https://arxiv.org/abs/2102.12092. Accessed 2026-05-21.
17. OpenAI, "Introducing Whisper", OpenAI blog, 2022-09-21. https://openai.com/index/whisper/. Accessed 2026-05-21.
18. Alec Radford, "@AlecRad on X (Twitter)", X / Twitter. https://x.com/alecrad. Accessed 2026-05-21.
19. Surf AI, "Alec Radford: The Quiet Force Behind GPT, CLIP, and DALL-E", asksurf.ai, 2025. https://asksurf.ai/pulse/en/alec-radford-quiet-force-behind-gpt-clip-dalle. Accessed 2026-05-21.
20. Kyle Wiggers, "Key ex-OpenAI researcher subpoenaed in AI copyright case", TechCrunch, 2025-03-04. https://techcrunch.com/2025/03/04/key-ex-openai-researcher-subpoenaed-in-ai-copyright-case/. Accessed 2026-05-21.
21. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, "Improved Techniques for Training GANs", arXiv:1606.03498 / NeurIPS 2016, 2016-06-10. https://arxiv.org/abs/1606.03498. Accessed 2026-05-21.
22. Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever, "Generative Pretraining From Pixels", Proceedings of the 37th International Conference on Machine Learning (ICML 2020), 2020. https://proceedings.mlr.press/v119/chen20s.html. Accessed 2026-05-21.
23. Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever, "Jukebox: A Generative Model for Music", arXiv:2005.00341, 2020-04-30. https://arxiv.org/abs/2005.00341. Accessed 2026-05-21.
24. Wikipedia contributors, "Whisper (speech recognition system)", Wikipedia, 2026. https://en.wikipedia.org/wiki/Whisper_(speech_recognition_system). Accessed 2026-05-21.
25. Sam Altman (@sama), post on X, 2024-12-27. https://x.com/sama/status/1872666383210971560. Accessed 2026-07-12.
26. Kyle Wiggers, "Mira Murati's AI startup gains prominent ex-OpenAI advisers", TechCrunch, 2025-04-08. https://techcrunch.com/2025/04/08/mira-muratis-ai-startup-gains-prominent-ex-openai-advisers/. Accessed 2026-07-12.
27. Simon Willison, "Introducing talkie: a 13B vintage language model from 1930", simonwillison.net, 2026-04-28. https://simonwillison.net/2026/Apr/28/talkie/. Accessed 2026-07-12.
28. Nick Levine, David Duvenaud, Alec Radford, "talkie: a 13B vintage language model from 1930", talkie-lm.com, 2026. https://talkie-lm.com/. Accessed 2026-07-12.
29. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, "Proximal Policy Optimization Algorithms", arXiv:1707.06347, 2017-07-20. https://arxiv.org/abs/1707.06347. Accessed 2026-07-12.
30. Google Scholar, "Alec Radford", author profile. https://scholar.google.com/citations?user=dOad5HoAAAAJ. Accessed 2026-07-12.
