Aditya Ramesh

Updated Jun 7, 2026

Aditya Ramesh is an artificial intelligence researcher at OpenAI, where he serves as a vice president of research and is best known as the creator of DALL-E, the text-to-image model that helped launch the modern wave of generative image synthesis. He was the lead author of the original DALL-E in 2021 and of DALL-E 2 in 2022, and he led the team that built DALL-E 3 in 2023. Ramesh later assembled and led the group behind Sora, OpenAI's text-to-video model, and as of 2026 he heads an internal research program called Worldsim and the company's robotics division. ^[1]^[8]

Early life and education

Ramesh is of Indian origin. ^[11] He studied at New York University, where he earned a bachelor's degree; he does not hold a graduate degree. During his final undergraduate years he worked on research projects in the laboratory of Yann LeCun, the Turing Award winner and deep learning pioneer who had built NYU's machine learning research group. ^[2]

According to LeCun, Ramesh had intended to pursue a PhD after graduating. Instead he took a summer internship at OpenAI, and the laboratory decided to keep him on. ^[2] That choice placed Ramesh inside OpenAI during the period when the organization was scaling up the transformer-based generative models that would come to define its research agenda. His path, from undergraduate intern to vice president of research without a doctorate, later became a frequently cited example of how the field rewarded hands-on model building over formal credentials. ^[2]

Creating DALL-E

Ramesh is best known as the inventor of DALL-E, which OpenAI introduced in January 2021. The system generated images from natural language descriptions and could combine unrelated ideas, such as "an armchair in the shape of an avocado," in coherent and often unexpected ways. Its name is a portmanteau of the Surrealist painter Salvador Dalí and the animated Pixar robot WALL-E, chosen to evoke a merger of art and technology. ^[11]^[1]

The first DALL-E was a 12-billion-parameter autoregressive transformer that treated text and image tokens as a single stream of data, extending to vision the generative pretraining approach OpenAI had already applied to language. The model leaned on a separate OpenAI system, CLIP, to rank its candidate images by how well they matched the prompt. Ramesh was the lead author of the accompanying paper, "Zero-Shot Text-to-Image Generation," presented at the International Conference on Machine Learning in 2021. His co-authors included Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and OpenAI co-founder Ilya Sutskever. ^[3] The work built on OpenAI's earlier generative pretraining research, including Image GPT, which had shown that transformers trained to predict pixels could learn strong image representations.
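The two ideas in the paragraph above, a single token stream and ranking candidates against the prompt, can be illustrated with a minimal sketch. Everything here is a toy assumption rather than OpenAI's code: the function names `build_stream` and `rerank`, the id-offset scheme for keeping text and image ids apart, the vocabulary size constant, and the three-dimensional embeddings are all hypothetical stand-ins for what a real tokenizer and CLIP encoder would produce.

```python
import math

# Illustrative constant only: an assumed text vocabulary size used to keep
# text ids and image ids in disjoint ranges within one shared sequence.
TEXT_VOCAB_SIZE = 16384


def build_stream(text_tokens, image_tokens, text_vocab_size=TEXT_VOCAB_SIZE):
    """Concatenate text and image token ids into one sequence.

    Image ids are shifted past the text vocabulary (a hypothetical scheme)
    so an autoregressive model could treat both as one stream of data.
    """
    return list(text_tokens) + [text_vocab_size + t for t in image_tokens]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank(text_embedding, candidates):
    """Order candidate images best-first by similarity to the prompt.

    `candidates` is a list of (name, image_embedding) pairs; in a real
    system the embeddings would come from a model such as CLIP.
    """
    scored = [(cosine_similarity(text_embedding, emb), name)
              for name, emb in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


stream = build_stream([5, 7], [0, 3])

prompt_embedding = [1.0, 0.0, 0.0]
candidates = [
    ("candidate_a", [0.0, 1.0, 0.0]),  # orthogonal to the prompt
    ("candidate_b", [0.9, 0.1, 0.0]),  # nearly aligned with the prompt
    ("candidate_c", [0.5, 0.5, 0.5]),  # partially aligned
]
ranked = rerank(prompt_embedding, candidates)
print(stream)
print(ranked)
```

The sketch only shows the shape of the computation: generation produces several candidates, and a separate scoring model picks the ones that best match the text.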

In an interview marking the model's second anniversary, Ramesh said the team had been drawn to text-to-image generation because language can describe almost any situation: "We felt like text-to-image generation was interesting because as humans, we're able to construct a sentence to describe any situation." He added that he had expected the technology to matter but was surprised by how quickly it reached a wide audience. ^[1]

DALL-E 2 and DALL-E 3

Ramesh led the follow-up, DALL-E 2, unveiled in April 2022, and was the lead author of its paper, "Hierarchical Text-Conditional Image Generation with CLIP Latents," written with Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. ^[4] Rather than a single autoregressive model, DALL-E 2 used a two-stage design: a prior that produces a CLIP image embedding from a text caption, followed by a diffusion decoder that turns that embedding into an image. The move to diffusion brought higher resolution and greater photorealism, and it introduced editing features such as inpainting, outpainting, and image variations that became standard in later tools. OpenAI released DALL-E 2 cautiously, behind a waitlist and content filters, before opening it more broadly later in 2022. Ramesh has described DALL-E as a "creative co-pilot" for artists, comparable to how OpenAI's Codex assists programmers. ^[1]
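The data flow of the two-stage design can be sketched in a few lines. This is a structural illustration only: `toy_prior`, `toy_decoder`, and `generate` are hypothetical names, the "embedding" is a short pseudo-random vector seeded by the caption, and the decoder's loop merely gestures at iterative refinement from noise. It is not a diffusion model and does not reflect OpenAI's implementation.

```python
import random
from typing import List

Embedding = List[float]
Image = List[float]  # a toy "image": just a short list of numbers


def toy_prior(caption: str, dim: int = 4) -> Embedding:
    """Stage 1 stand-in: map a caption to an image-embedding-like vector.

    A real prior is a learned model; here the caption just seeds a
    deterministic pseudo-random vector.
    """
    rng = random.Random(caption)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def toy_decoder(embedding: Embedding, steps: int = 10, seed: int = 0) -> Image:
    """Stage 2 stand-in: start from noise and refine it step by step.

    Each step moves the sample halfway toward a target set by the
    embedding. This only mimics the outline of conditioning on an
    embedding while denoising.
    """
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in embedding]
    for _ in range(steps):
        image = [x + 0.5 * (target - x) for x, target in zip(image, embedding)]
    return image


def generate(caption: str) -> Image:
    """Compose the two stages: caption -> embedding -> image."""
    return toy_decoder(toy_prior(caption))


caption = "an armchair in the shape of an avocado"
embedding = toy_prior(caption)
image = generate(caption)
print(len(embedding), len(image))
```

The point of the split is that the decoder never sees the caption directly in this sketch; it only receives the embedding produced by the first stage.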

In September 2023, OpenAI announced DALL-E 3, which Ramesh led. Built natively into ChatGPT, it followed prompts far more faithfully than its predecessors and let users refine images through conversation. It began rolling out to ChatGPT Plus and Enterprise subscribers in October 2023, and OpenAI allowed artists to opt their work out of future training. ^[5]

Model	Public debut	Paper	Architecture	Notable advances
DALL-E	January 2021	Zero-Shot Text-to-Image Generation	12B-parameter autoregressive transformer over text and image tokens	First broad demonstration of free-form text-to-image generation
DALL-E 2	April 2022	Hierarchical Text-Conditional Image Generation with CLIP Latents	CLIP prior plus diffusion decoder	Higher resolution and photorealism; inpainting, outpainting, variations
DALL-E 3	September 2023	DALL-E 3 system card	Diffusion model integrated with ChatGPT	Much stronger prompt following; conversational image creation

Sora and world simulation

After the DALL-E series, Ramesh turned from still images to video. He built the team that created Sora, OpenAI's text-to-video model, and the product experience at sora.com. ^[8] Within OpenAI the effort sat inside a World Simulation group that Ramesh headed as vice president of research, and he was one of Sora's public leaders alongside researchers Tim Brooks and Bill Peebles. ^[7]

OpenAI unveiled Sora as a research preview on February 15, 2024, releasing sample clips and a technical report titled "Video generation models as world simulators." The model could generate high-definition video up to about one minute long from a text prompt, and OpenAI presented it not merely as a video generator but as an early step toward systems that learn an implicit simulation of the physical world. ^[6] OpenAI later opened a public version, Sora Turbo, through sora.com in December 2024. ^[8]

Worldsim and robotics

By 2025 and 2026 Ramesh's focus had shifted from generating pixels to acting in the physical world. He leads an internal research program called Worldsim, which uses powerful world models in the lineage of Sora to simulate reality in enough detail that simulated experience can substitute for scarce real-world data. OpenAI has argued that this approach helps overcome the data bottleneck that has long slowed robot learning. ^[8]^[10]

On May 31, 2026, OpenAI announced a dedicated robotics division that grew out of the Worldsim program, with Ramesh leading it. ^[9]^[10] Chief executive Sam Altman framed the move as OpenAI's world simulation research evolving into OpenAI Robotics. The stated near-term goal is to build robots that support skilled workers constructing infrastructure, with a longer-term ambition of general-purpose personal robots in widespread use. The relaunch marked OpenAI's return to robotics after it had disbanded an earlier robotics team around 2020 to 2021, and it followed the end of OpenAI's collaboration with the humanoid robot startup Figure. The division began hiring across areas such as actuator design, simulation realism, and large-scale data collection. ^[9]^[10]

Research themes

A consistent thread runs through Ramesh's work: building generative models that internalize a usable model of the world. DALL-E learned to render scenes described in language; Sora extended that idea to motion and time, which OpenAI explicitly described as world simulation; and Worldsim aims to push the same modeling power toward agents that perceive and act. ^[6]^[8] His career also illustrates a broader pattern at OpenAI, in which capabilities first demonstrated in one modality, text, were carried into images, then video, and ultimately toward embodied systems.

Selected publications

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever. "Zero-Shot Text-to-Image Generation." ICML 2021. ^[3]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. "Hierarchical Text-Conditional Image Generation with CLIP Latents." 2022. ^[4]

References

1. "Two years after DALL-E debut, its inventor is 'surprised' by impact." VentureBeat, 2023. https://venturebeat.com/ai/two-years-after-dall-e-debut-its-inventor-is-surprised-by-impact
2. Yann LeCun, post on X, December 2022. https://x.com/ylecun/status/1605450677806895104
3. Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever. "Zero-Shot Text-to-Image Generation." arXiv:2102.12092, ICML 2021. https://arxiv.org/abs/2102.12092
4. Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. "Hierarchical Text-Conditional Image Generation with CLIP Latents." arXiv:2204.06125, April 2022. https://arxiv.org/abs/2204.06125
5. "OpenAI unveils DALL-E 3, allows artists to opt out of training." TechCrunch, September 20, 2023. https://techcrunch.com/2023/09/20/openai-unveils-dall-e-3-allows-artists-to-opt-out-of-training/
6. "OpenAI teases an amazing new generative video model called Sora." MIT Technology Review, February 15, 2024. https://www.technologyreview.com/2024/02/15/1088401/openai-amazing-new-generative-ai-video-model-sora/
7. "No Priors Ep. 61: OpenAI's Sora Leaders Aditya Ramesh, Tim Brooks and Bill Peebles." No Priors podcast, 2024. https://www.youtube.com/watch?v=reMnn6bV_fI
8. Aditya Ramesh, personal website, accessed June 2026. http://adityaramesh.com/
9. "OpenAI Pivots Directly Into Hardware, Launching Internal Robotics Division." Humanoids Daily, 2026. https://www.humanoidsdaily.com/news/openai-pivots-directly-into-hardware-launching-internal-robotics-division
10. "OpenAI Relaunches Robotics Division, Altman Targets Personal Robot for Everyone." MLQ.ai, 2026. https://mlq.ai/news/openai-relaunches-robotics-division-altman-targets-personal-robot-for-everyone/
11. "Aditya Ramesh: The Inventor Of AI Text-To-Visual Tool DALL-E Has Indian Origins." Homegrown. https://homegrown.co.in/homegrown-voices/dall-e-aditya-rameshs-ai-powered-brainchild-is-reshaping-visual-expression
