Quoc V. Le

Updated Jun 24, 2026

Quoc Viet Le (Vietnamese: Lê Viết Quốc, born 1982) is a Vietnamese-American computer scientist and a Distinguished Scientist and Google Fellow at Google DeepMind. He is a founding member of the Google Brain team (which merged into Google DeepMind in 2023), and his Google Scholar profile lists more than 456,000 citations and an h-index of 150, making him one of the most-cited researchers in machine learning.^[1]^[2]^[23] His research has shaped large-scale deep learning across vision, language, and dialog: he co-authored the sequence-to-sequence learning paper with Ilya Sutskever and Oriol Vinyals in 2014, introduced Paragraph Vector (often called "doc2vec") with Tomas Mikolov, led the 2012 "cat neuron" demonstration of large-scale unsupervised feature learning with Jeff Dean and Andrew Ng, and initiated the line of work on Neural Architecture Search and AutoML that culminated in EfficientNet.^[1]^[3]^[4]^[5]^[6]^[7] He later contributed to the Meena and LaMDA conversational models that seeded Google's Bard and Gemini product lines, and he advised the Gemini Deep Think system that reached a gold-medal score at the 2025 International Mathematical Olympiad.^[1]^[8]^[9]^[24]

Early life and education

Le was born in 1982 in Hương Thủy, a town in the Thừa Thiên Huế province of central Vietnam, and attended Quốc Học Huế High School before moving to Australia in 2004.^[1]^[10] In a 2014 profile, MIT Technology Review described his childhood home as without electricity and noted that he spent much of his time reading about inventions in a nearby library; around age fourteen he decided that the most useful invention he could pursue would be a machine intelligent enough to invent things on its own.^[11] That ambition (building a machine that itself invents) is a thread that recurs in his later research direction toward automated machine learning.^[16]

He completed a Bachelor of Software Engineering with First Class Honours at the Australian National University (ANU) between 2004 and 2007, where his undergraduate research involved kernel methods in machine learning under Alex Smola.^[1]^[10] He then moved to the United States in 2007 to enroll in the PhD program in computer science at Stanford University, where his advisor was Andrew Ng. He defended his thesis in 2013.^[1]^[10]

Le's MIT Technology Review entry summarises his Stanford strategy succinctly: he proposed building neural networks roughly one hundred times larger than the largest contemporary models and feeding them far more data, an approach that he and Andrew Ng carried into the early Google Brain effort.^[11] During his doctoral years he also contributed to several papers on independent component analysis, locally connected sparse autoencoders, and feature learning at scale that prefigured the 2012 cat-neuron result.^[4]^[12]

Career

Google Brain (2011 to 2023)

In 2011, Le joined Google as one of the founding members of the Google Brain project, alongside Andrew Ng, Jeff Dean, and Greg Corrado.^[1] The early team's mandate was to study what would be possible if neural networks were given access to the kind of distributed CPU and (later) accelerator resources only a hyperscaler could provide; Le's specific role was to bring the large-scale unsupervised feature learning recipe he had developed at Stanford into Google's internal DistBelief training infrastructure.^[11]^[16] Over the following decade he led or co-led several of the team's best-known publications, and he became a recognisable internal figure for proposing research directions (such as NAS and AutoML) that combined ambition for full automation with concrete benchmark wins.^[16]

Google DeepMind (2023 to present)

Following Alphabet's April 2023 merger of Google Brain with DeepMind to form Google DeepMind, Le became a Distinguished Scientist and Google Fellow within the combined research organisation.^[1]^[2] His public Google Research profile continues to list his research interests as "algorithms and theory, machine intelligence, machine perception, [and] natural language processing", and aggregates roughly 130 publications across his Google and earlier academic affiliations.^[2] He has remained based in California and continues to publish across vision, language, dialog, and reasoning topics; recent work on his profile page includes contributions to instruction tuning (the Flan Collection), LaMDA, and AlphaGeometry.^[2] He has increasingly focused on reasoning, serving as an advisor on the Gemini Deep Think effort that achieved an official gold-medal score at the 2025 International Mathematical Olympiad.^[24]

Major research contributions

What was the 2012 "cat neuron" experiment?

Le was first author of "Building high-level features using large scale unsupervised learning", presented at ICML 2012 with Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg Corrado, Jeff Dean, and Andrew Ng.^[4]^[12] The paper trained a nine-layer locally connected sparse autoencoder with pooling and local contrast normalisation on a dataset of about ten million 200x200 pixel images sampled from YouTube; the network had roughly one billion parameters and was trained for three days on 1,000 machines (around 16,000 CPU cores) using model parallelism and asynchronous SGD.^[4]^[12]

Despite receiving no labels, individual neurons in the trained model became selective for high-level concepts including human faces, human bodies, and cat faces (the last becoming the popular shorthand "cat neuron"); the team also reported a roughly 70% relative improvement over the prior state of the art on 20,000-class ImageNet classification when the learned features were fine-tuned with labels.^[4]^[12] The architecture combined three ideas common in computational neuroscience and earlier vision work (local receptive fields, pooling, and local contrast normalisation) with the new ingredient of asynchronous, model-parallel SGD across a CPU cluster, an engineering decision that allowed the model to digest more data per wall-clock hour than any prior unsupervised vision network.^[4]^[12] The work is widely credited with kick-starting industrial interest in deep learning at scale, and it received the ICML Test of Time Honorable Mention in 2022.^[1] The "cat-detecting neural network" attracted unusually broad media coverage at the time, including in the New York Times, and helped move the term deep learning from academic circles into mainstream technology reporting.^[11]^[16]

Distributed Representations of Sentences and Documents (Paragraph Vector, 2014)

At ICML 2014, Le and Tomas Mikolov published "Distributed Representations of Sentences and Documents", which introduced the Paragraph Vector model widely known in open-source implementations as doc2vec.^[5]^[13] The model extends word2vec to variable-length pieces of text by introducing a per-document vector that, together with surrounding word vectors, is trained to predict words in the document. Two variants are described: a distributed-memory version (PV-DM), which is analogous to the continuous-bag-of-words formulation, and a distributed bag-of-words version (PV-DBOW), which ignores word order in the context. The authors argued that bag-of-words representations both lose word order and ignore word semantics; their unsupervised method addresses both, and they reported state-of-the-art results at the time on several text classification and sentiment analysis benchmarks, including the Stanford Sentiment Treebank and the IMDB review dataset.^[5]^[13] Paragraph Vector remains a foundational embedding method for dense document representations and is the conceptual ancestor of later sentence-level encoders.^[5]^[13]
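The PV-DM prediction step described above can be sketched in a few lines. This is a minimal illustration, assuming the averaging variant (the paper also allows concatenation), a toy five-word vocabulary, random untrained vectors, and a full softmax; the names `pv_dm_predict`, `doc_vecs`, and `output_weights` are hypothetical and do not come from any released implementation.

```python
import math
import random

def pv_dm_predict(doc_vec, context_vecs, output_weights):
    """Average the document vector with the context word vectors, then
    apply a softmax over the vocabulary to score the next word."""
    vectors = [doc_vec] + context_vecs
    dim = len(doc_vec)
    hidden = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in output_weights]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
vocab = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary (assumption)
dim = 4

def rand_vec():
    return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

word_vecs = {w: rand_vec() for w in vocab}
doc_vecs = {"doc0": rand_vec()}               # one trainable vector per document
output_weights = [rand_vec() for _ in vocab]  # one output row per vocabulary word

# Probability of each vocabulary word following the context "the cat" in doc0.
probs = pv_dm_predict(doc_vecs["doc0"], [word_vecs["the"], word_vecs["cat"]], output_weights)
```

During training, the prediction error would be backpropagated into the document vector as well as the word vectors, which is what lets the per-document vector act as a memory of the text's topic.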

What is sequence to sequence learning (2014)?

With Ilya Sutskever and Oriol Vinyals, Le co-authored "Sequence to Sequence Learning with Neural Networks", presented at NIPS (now NeurIPS) 2014.^[3]^[14] The paper proposed using a multi-layer LSTM to encode the input sequence to a fixed-dimensional vector and a second LSTM to decode the target sequence; on the WMT'14 English to French machine translation task, the model reached 34.8 BLEU when used directly and 36.5 BLEU when used to rerank 1,000 SMT hypotheses, surpassing the 33.3 BLEU of the phrase-based statistical baseline.^[3]^[14] A key empirical trick was reversing the source sentence, which the authors explained as introducing short-term dependencies that improved optimisation.^[3]^[14] The encoder-decoder formulation generalised well beyond translation: the same template was quickly adapted to summarisation, question answering, dialog, parsing, image captioning, and many other structured prediction tasks, and it remains the conceptual backbone of modern encoder-decoder Transformer models such as T5.^[3]^[14] The paper is among Le's most-cited works, with more than 32,000 citations recorded on Google Scholar.^[23]
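The source-reversal trick is simple enough to show directly. The sketch below is illustrative only, assuming pre-tokenised sentences and a hypothetical helper name `make_training_pair`; it is not the authors' preprocessing code.

```python
def make_training_pair(source_tokens, target_tokens, eos="<EOS>"):
    """Reverse only the source sentence; the target keeps its order."""
    encoder_input = list(reversed(source_tokens))
    decoder_target = list(target_tokens) + [eos]
    return encoder_input, decoder_target

enc, dec = make_training_pair(["a", "b", "c"], ["x", "y", "z"])
print(enc)  # ['c', 'b', 'a']
print(dec)  # ['x', 'y', 'z', '<EOS>']
```

After reversal, the first source token ("a") sits immediately before the first target token ("x") in the unrolled sequence, which is the short-term dependency the authors pointed to.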

The paper became a foundational reference for end-to-end neural machine translation, conversation modelling, and many other sequence prediction tasks, and was honoured with the NeurIPS Test of Time Award in 2024.^[1]

Neural Architecture Search (2016 to 2017) and AutoML

In November 2016, Barret Zoph and Le posted "Neural Architecture Search with Reinforcement Learning", which used an RNN controller trained with reinforcement learning to generate descriptions of child neural networks and to maximise the validation accuracy of the resulting architectures.^[6]^[15] On CIFAR-10 the discovered convolutional architecture reached a 3.65% test error, and on Penn Treebank language modelling the system discovered a recurrent cell that outperformed standard LSTM; the technique transferred to character-level modelling as well.^[6]^[15] This neural architecture search (NAS) line of work seeded Google's broader AutoML program and later AutoML Cloud products.^[16] Le and his collaborators (notably Zoph, Vasudevan, and Shlens) followed up with NASNet (which transferred a cell discovered on CIFAR to ImageNet) and ENAS (Efficient Neural Architecture Search) with parameter sharing to make the search dramatically cheaper.^[16] A separate strand, AutoML-Zero (2020), pushed automation even further by attempting to discover entire learning algorithms (not just architectures) from primitive operations.^[16]
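The reinforcement-learning loop behind NAS can be illustrated with a deliberately simplified sketch. It swaps the paper's RNN controller for a single categorical policy over one decision (a filter size), and it replaces child-network training with a hypothetical lookup table `toy_reward`; only the REINFORCE update with a moving-average baseline is meant to mirror the technique.

```python
import math
import random

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, action, advantage, lr):
    """One REINFORCE update for a categorical policy: the gradient of
    log p(action) with respect to logit k is (1[k == action] - p_k)."""
    probs = softmax(logits)
    return [
        logit + lr * advantage * ((1.0 if k == action else 0.0) - p)
        for k, (logit, p) in enumerate(zip(logits, probs))
    ]

def toy_reward(filter_size):
    """Hypothetical stand-in for a child network's validation accuracy."""
    return {1: 0.60, 3: 0.90, 5: 0.80, 7: 0.70}[filter_size]

def search(steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    choices = [1, 3, 5, 7]
    logits = [0.0] * len(choices)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(choices)), weights=probs)[0]
        reward = toy_reward(choices[action])
        logits = reinforce_step(logits, action, reward - baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return dict(zip(choices, softmax(logits)))

policy = search()  # maps each filter size to its final sampling probability
```

In the real system every reward evaluation means training a child network, which is why the search cost was dominated by compute rather than by the controller update itself.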

EfficientNet (2019)

At ICML 2019 in Long Beach, Mingxing Tan and Le published "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks".^[7]^[17] The paper proposes a compound scaling rule that uniformly grows network depth, width, and input resolution using a single coefficient, rather than scaling each dimension independently. Combined with a NAS-discovered baseline (EfficientNet-B0), the family scales up to EfficientNet-B7, which reached 84.3% top-1 accuracy on ImageNet while being 8.4x smaller and 6.1x faster at inference than the best prior convolutional neural network.^[7]^[17] With more than 39,000 Google Scholar citations, EfficientNet is Le's single most-cited paper.^[23] In 2021 Tan and Le followed up with EfficientNetV2, which retrained the family with training-aware NAS and progressive learning to achieve substantially faster training and improved parameter efficiency, addressing several practical complaints that the original EfficientNet's FLOP counts did not translate cleanly into wall-clock speedups on contemporary accelerators.^[17] EfficientNet's compound-scaling philosophy also influenced later vision and detection backbones such as EfficientDet.^[17]
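The compound scaling rule can be made concrete with a short calculation. The sketch below assumes the base coefficients reported in the EfficientNet paper for the B0 baseline (depth 1.2, width 1.1, resolution 1.15), which this article does not itself state, and it assumes that FLOPs grow linearly with depth and quadratically with width and resolution; the function name `compound_scale` is hypothetical.

```python
# Base coefficients as reported in the EfficientNet paper (assumption here).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution

def compound_scale(phi):
    """Return the multipliers applied to the baseline network for a given
    compound coefficient phi, plus the approximate FLOPs multiplier."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
        "flops": (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi,
    }

for phi in range(4):
    scaled = compound_scale(phi)
    print(phi, {name: round(value, 3) for name, value in scaled.items()})
```

Because the product of depth, squared width, and squared resolution is close to 2 (about 1.92 with these values), each unit increase in phi roughly doubles the compute budget while keeping the three dimensions in balance.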

AutoAugment (2018)

Le also co-authored "AutoAugment: Learning Augmentation Policies from Data" with Ekin D. Cubuk, Barret Zoph, Dandelion Mane, and Vijay Vasudevan, in which a search procedure discovers data augmentation policies (sub-policies of paired image operations such as translation, rotation, or shearing, each with sampled probability and magnitude).^[18] The key insight is to treat data augmentation as a hyperparameter problem and to use a search procedure (analogous to NAS but over augmentation operations) to optimise validation accuracy on the dataset of interest, with policies that transfer between datasets.^[18] AutoAugment achieved state-of-the-art results at the time on CIFAR-10, CIFAR-100, SVHN, and ImageNet without additional training data, and inspired a family of follow-up methods (RandAugment, Population-Based Augmentation, and others) that simplify the search procedure while retaining the core benefit of learned augmentation.^[18]
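The structure of a sub-policy (paired operations, each with a probability and a magnitude) can be sketched as follows. This is a toy illustration on a tiny grayscale image stored as nested lists; the operations `translate_x` and `invert` are simplified stand-ins written for this example, not the paper's implementation, and magnitudes are assumed to be non-negative integers.

```python
import random

def translate_x(image, magnitude):
    """Shift each row right by `magnitude` pixels, padding with zeros."""
    width = len(image[0])
    shift = min(magnitude, width)
    return [[0] * shift + row[: width - shift] for row in image]

def invert(image, magnitude):
    """Invert 8-bit pixel values; the magnitude argument is unused."""
    return [[255 - px for px in row] for row in image]

def apply_sub_policy(image, sub_policy, rng):
    """Apply each (operation, probability, magnitude) triple in order."""
    for operation, probability, magnitude in sub_policy:
        if rng.random() < probability:
            image = operation(image, magnitude)
    return image

# A hypothetical sub-policy: always translate by 1 pixel, then always invert.
sub_policy = [(translate_x, 1.0, 1), (invert, 1.0, 0)]
image = [[10, 20, 30], [40, 50, 60]]
augmented = apply_sub_policy(image, sub_policy, random.Random(0))
print(augmented)  # [[255, 245, 235], [255, 215, 205]]
```

The search procedure then treats the choice of operations, probabilities, and magnitudes as the quantities to optimise against validation accuracy.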

Meena (2020) and LaMDA (2022)

In January 2020, Google announced Meena, a 2.6 billion parameter neural conversational model trained end-to-end on social-media conversations; Le co-authored the underlying paper "Towards a Human-like Open-Domain Chatbot" with Daniel Adiwardana, Minh-Thang Luong, David R. So, and others, which introduced the Sensibleness and Specificity Average (SSA) metric.^[8] Meena reported SSA scores closer to human level than any prior end-to-end open-domain chatbot of comparable evaluation, and it positioned end-to-end neural dialog (rather than retrieval or rule-based systems) as the dominant paradigm at Google.^[8]
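As a rough illustration of how an SSA score is computed, the sketch below averages the fraction of responses labelled sensible with the fraction labelled specific. It assumes the convention that a response judged not sensible is also counted as not specific; the label data and the function name `ssa` are hypothetical.

```python
def ssa(labels):
    """labels: list of (sensible, specific) booleans, one per response."""
    n = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / n
    # A response only counts as specific if it was also judged sensible.
    specificity = sum(1 for sensible, specific in labels if sensible and specific) / n
    return (sensibleness + specificity) / 2

# Hypothetical crowdworker labels for four chatbot responses.
labels = [(True, True), (True, False), (False, False), (True, True)]
print(ssa(labels))  # 0.625
```

Here three of four responses are sensible (0.75) and two of four are specific (0.5), so the average is 0.625.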

Meena evolved into LaMDA (Language Model for Dialog Applications), unveiled by Google at I/O in May 2021 and described in the January 2022 arXiv paper "LaMDA: Language Models for Dialog Applications", on which Le is one of the listed authors.^[9]^[19] LaMDA is a family of dialog-specialised transformer models up to 137 billion parameters, pre-trained on roughly 1.56 trillion words and fine-tuned for safety and factual grounding (the latter by enabling consultation of an information retrieval system, a translator, and a calculator).^[19] The paper evaluates models at 2B, 8B, and 137B parameter scales and reports that fine-tuning on annotated crowdworker conversations and providing tool access materially improve both groundedness and safety metrics beyond what scaling alone delivers.^[19] The LaMDA work directly seeded Google's Bard conversational product, which was announced on 6 February 2023; Bard was later renamed and superseded by Gemini.^[9]

Instruction tuning: FLAN (2021 to 2022)

Le is a co-author of "Finetuned Language Models Are Zero-Shot Learners", presented at ICLR 2022, which introduced FLAN: a 137B parameter pretrained language model fine-tuned on more than 60 NLP tasks expressed as natural-language instructions.^[20] The instruction-tuned model substantially improved zero-shot performance on unseen tasks and outperformed GPT-3 175B on 20 of 25 evaluated tasks, helping to establish instruction tuning as a standard ingredient of modern LLM training.^[20] Follow-up work on the Flan Collection extended these ideas to a much wider set of templates and combined task formats, with Flan-T5 reporting substantial gains over earlier instruction-tuned baselines across held-out benchmarks.^[2]

Chain-of-Thought Prompting (2022)

Le was also a co-author of "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (NeurIPS 2022), with Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, and Denny Zhou.^[21] The paper showed that prompting large language models with a few exemplars containing explicit intermediate reasoning steps substantially improves arithmetic, commonsense, and symbolic reasoning, and that this ability emerges with sufficient model scale.^[21] On math word problems such as GSM8K, chain-of-thought prompting more than doubled the accuracy of PaLM-540B over standard few-shot prompting at the time, and the technique became a widely used baseline for subsequent reasoning research, including self-consistency, tree-of-thought, and large-scale reinforcement learning from reasoning traces.^[21] With more than 35,000 Google Scholar citations, it is Le's second most-cited paper and a direct methodological precursor to the reasoning models he later advised.^[23]^[24]
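The difference between standard few-shot prompting and chain-of-thought prompting lies only in how the exemplars are written. The sketch below builds both prompt styles from the same exemplar; the word problem, the "Q:/A:" layout, and the helper name `build_prompt` are illustrative assumptions, not text taken from the paper.

```python
def build_prompt(exemplars, question, with_reasoning=True):
    """Format few-shot exemplars, optionally including the worked reasoning."""
    parts = []
    for ex in exemplars:
        final = f"The answer is {ex['answer']}."
        answer = f"{ex['reasoning']} {final}" if with_reasoning else final
        parts.append(f"Q: {ex['question']}\nA: {answer}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# A hypothetical exemplar written for this example.
exemplars = [{
    "question": "A shelf holds 3 boxes with 4 books each. 2 books are removed. How many books remain?",
    "reasoning": "3 boxes of 4 books is 12 books. 12 - 2 = 10.",
    "answer": "10",
}]
question = "A crate holds 5 bags with 6 apples each. 4 apples are eaten. How many apples remain?"

standard_prompt = build_prompt(exemplars, question, with_reasoning=False)
cot_prompt = build_prompt(exemplars, question, with_reasoning=True)
print(cot_prompt)
```

Both prompts end with an open "A:" for the model to complete; with the chain-of-thought version the model tends to imitate the intermediate steps before stating its final answer.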

AlphaGeometry (2024)

In January 2024, Le was a co-author (with Trieu H. Trinh, Yuhuai Wu, He He, and Thang Luong) on "Solving olympiad geometry without human demonstrations", published in Nature.^[22] The system, AlphaGeometry, is a neuro-symbolic theorem prover that pairs a language model trained on millions of synthesised theorems with a symbolic deduction engine; on a benchmark of 30 recent olympiad geometry problems it solved 25, approaching the level of an average International Mathematical Olympiad gold medallist and outperforming the previous best system, which solved 10.^[22] Notably, the model was trained entirely on synthetic data (the team enumerated millions of premises, ran the symbolic engine to produce proofs, and then trained the language model to suggest constructions that the engine could not derive on its own), demonstrating an approach to mathematical reasoning that does not require human-curated proof corpora.^[22] AlphaGeometry's proofs are human-readable, and the system also rediscovered a generalisation of a translated IMO 2004 theorem.^[22] A successor system, AlphaGeometry 2, was announced in 2024 and described in a paper in 2025.^[22]

How did Quoc Le contribute to Gemini's reasoning models?

Le's recent work centres on getting large models to reason, a thread that runs from chain-of-thought prompting and AlphaGeometry into Google's Gemini reasoning systems. On 21 July 2025, Google DeepMind announced that an advanced version of Gemini with "Deep Think" had earned an official gold-medal score at the 2025 International Mathematical Olympiad (IMO), solving 5 of the 6 problems for 35 of a possible 42 points.^[24] Unlike the 2024 AlphaProof and AlphaGeometry 2 systems, which required human experts to first translate problems into formal languages such as Lean and ran for two to three days, the 2025 model, in the words of the announcement, "operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions, all within the 4.5-hour competition time limit."^[24]

The DeepMind announcement lists Le among the project's advisors: "This effort was advised by Quoc Le and Pushmeet Kohli", with Thang Luong (Le's AlphaGeometry collaborator) leading the overall technical direction.^[24] The result extends Le's pattern of pairing large pretrained models with explicit reasoning and verification, and it places his earlier chain-of-thought and neuro-symbolic work in a direct lineage with the reasoning capabilities now shipped in Gemini.^[21]^[22]^[24]

Selected papers

Year	Title	Co-authors (selected)	Venue
2012	Building high-level features using large scale unsupervised learning	Ranzato, Monga, Devin, Chen, Corrado, Dean, Ng	ICML 2012 (arXiv:1112.6209)^[4]^[12]
2014	Distributed Representations of Sentences and Documents (Paragraph Vector)	Mikolov	ICML 2014 (arXiv:1405.4053)^[5]^[13]
2014	Sequence to Sequence Learning with Neural Networks	Sutskever, Vinyals	NIPS 2014 (arXiv:1409.3215)^[3]^[14]
2016	Neural Architecture Search with Reinforcement Learning	Zoph	ICLR 2017 (arXiv:1611.01578)^[6]^[15]
2018	AutoAugment: Learning Augmentation Policies from Data	Cubuk, Zoph, Mane, Vasudevan	CVPR 2019 (arXiv:1805.09501)^[18]
2019	EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks	Tan	ICML 2019 (arXiv:1905.11946)^[7]^[17]
2020	Towards a Human-like Open-Domain Chatbot (Meena)	Adiwardana, Luong, So et al.	arXiv:2001.09977^[8]
2021	Finetuned Language Models Are Zero-Shot Learners (FLAN)	Wei, Bosma, Zhao et al.	ICLR 2022 (arXiv:2109.01652)^[20]
2022	Chain-of-Thought Prompting Elicits Reasoning in LLMs	Wei, Wang, Schuurmans et al.	NeurIPS 2022 (arXiv:2201.11903)^[21]
2022	LaMDA: Language Models for Dialog Applications	Thoppilan, De Freitas, Hall, Shazeer et al.	arXiv:2201.08239^[19]
2024	Solving olympiad geometry without human demonstrations (AlphaGeometry)	Trinh, Wu, He, Luong	Nature^[22]

Influence and applications

The body of work that Le has led or co-led underpins much of the modern landscape of deep learning systems. The 2012 cat-neuron paper and its broader large-scale unsupervised feature learning agenda are widely cited as catalysts for industrial investment in deep learning infrastructure such as the early Google Brain DistBelief system, of which Le and his co-authors were among the principal users.^[4]^[12] It also helped popularise the idea that scale (in data, parameters, and compute) was a first-class lever for representation quality, an argument that subsequent waves of language modelling work would extend to its logical conclusion.^[4]^[12] The 2014 sequence-to-sequence paper, with its encoder-decoder LSTM framing, became one of the canonical templates for neural machine translation and, more generally, for the broader paradigm of sequence-to-sequence modelling that pre-dates and underlies the Transformer era.^[3]^[14] Paragraph Vector remains a baseline reference for document embedding work alongside its sibling word2vec.^[5]^[13]

The NAS and EfficientNet lines have had practical impact on production computer-vision models: EfficientNet became a popular backbone in academic and industrial vision pipelines because of its accuracy-efficiency Pareto frontier on ImageNet, and variants (EfficientNet-Lite, EfficientNetV2, EfficientDet) shipped in on-device and cloud detection products.^[7]^[17] AutoAugment normalised the use of learned, rather than hand-designed, augmentation policies, and the broader AutoML program that grew out of Le's NAS research informed Google Cloud AutoML offerings in vision, language, and tabular learning.^[18]^[16]

Le's later work on conversational language models is reflected in shipping products. Meena and LaMDA were the direct precursors to Google's Bard chatbot announced on 6 February 2023, which Google subsequently rebranded as Gemini.^[9] FLAN-style instruction tuning has become a standard recipe in post-training for both Google's own model families and many open-source LLMs, with Flan-T5 and related checkpoints widely used as instruction-tuned baselines in the research community.^[20] Chain-of-thought prompting helped redefine how researchers and practitioners elicit reasoning behaviour from large pretrained models, and forms part of the methodological foundation for the modern wave of reasoning-focused LLMs, including the Gemini Deep Think system Le advised.^[21]^[24]

Across his career, Le has consistently combined three ingredients that have come to define modern industrial AI research: ambitious framing of the problem (often in terms of automating something previously thought to require human expertise), aggressive use of scale and compute, and a focus on empirical benchmarks as the arbiter of progress.^[4]^[6]^[11]^[16]^[19] The fact that two of his papers (the 2012 unsupervised features paper and the 2014 sequence-to-sequence paper) received test-of-time recognition a decade after publication reflects how durable those ingredients have proven.^[1]

Recognition

Le's Google Scholar profile records more than 456,000 total citations and an h-index of 150 as of 2026, placing him among the most-cited authors in artificial intelligence; his single most-cited paper is EfficientNet (more than 39,000 citations), followed by Chain-of-Thought Prompting and Sequence to Sequence Learning.^[23]

2014: Named to MIT Technology Review's "Innovators Under 35" list.^[11]
2022: ICML Test of Time Honorable Mention for "Building high-level features using large scale unsupervised learning" (2012).^[1]
2022: Alumni Laureate, Australian National University School of Computing.^[1]
2024: NeurIPS Test of Time Award for "Sequence to Sequence Learning with Neural Networks" (2014).^[1]

He also serves as a senior advisor to AI for Vietnam, a non-profit initiative focused on adapting AI research and applications to Vietnam's linguistic and cultural context.^[10]

Research themes

A few cross-cutting themes connect Le's individual papers across the past fifteen years.

Scale as a first-class design choice. From the 2012 unsupervised feature learning network through Meena, LaMDA, and FLAN, Le's projects have consistently treated parameter count, dataset size, and compute budget as primary design levers rather than afterthoughts. The 16,000-core training run that surfaced the cat neuron was, at the time, one of the largest single-model training runs in industry; LaMDA later trained a 137 billion parameter model on 1.56 trillion words of dialog and web text.^[4]^[12]^[19] The empirical observation that scaling delivers qualitative as well as quantitative gains is interwoven with much of his work.^[4]^[11]^[19]

Automation of model design. A second theme is the gradual transfer of design decisions from researchers to learning algorithms themselves. Sequence-to-sequence replaced bespoke pipelines for translation with a single, end-to-end neural model; NAS replaced hand-designed convolutional cells with learned ones; AutoAugment replaced hand-tuned augmentation pipelines with learned policies; and AutoML-Zero attempted to learn entire learning algorithms.^[3]^[6]^[16]^[18] The cumulative effect is an arc from "engineers handcraft features" toward "models, evaluated on objective metrics, search for everything".

Reasoning and grounding in language models. From chain-of-thought prompting through AlphaGeometry and the Gemini Deep Think system, Le's recent work has focused on equipping large pretrained models with mechanisms for explicit reasoning, tool use, and grounding, often through hybrid neuro-symbolic approaches.^[21]^[22]^[24] These directions are consistent with the broader research trajectory of Google DeepMind toward AI systems that solve structured problems rather than only producing fluent text.^[22]^[24]

Critical perspectives

Some of Le's most influential contributions have also been the subject of methodological debate within the research community.

NAS compute cost. Early reinforcement-learning-based Neural Architecture Search approaches like the original Zoph and Le method require very large compute budgets, since each architecture proposal must be partially trained to obtain a reward signal; the original Zoph and Le experiments reportedly trained on hundreds of GPUs for several days. This cost prompted a body of follow-up research (such as ENAS, DARTS, and proxy-based NAS) seeking to reduce it.^[6]^[15]^[16]
EfficientNet scaling on real hardware. While EfficientNet reports strong FLOPs-to-accuracy trade-offs on ImageNet, practitioners have reported that the depthwise convolution and squeeze-and-excitation operations used in the family are not always as friendly to GPU/TPU memory bandwidth as raw FLOP counts suggest; Tan and Le themselves addressed several of these issues in EfficientNetV2 (ICML 2021), which uses training-aware NAS and progressive learning to deliver more wall-clock-friendly speedups.^[17]
LaMDA and dialog safety. LaMDA gained substantial public attention in 2022 when a Google engineer publicly claimed it was sentient, a claim Google and outside researchers rejected; the episode highlighted persistent challenges around anthropomorphisation, factual grounding, and safety in large dialog models, all of which the LaMDA paper itself flags as motivating fine-tuning objectives.^[9]^[19]
Reproducibility and resource asymmetry. A more general critique that applies to several of Le's flagship results (the cat neuron, NAS, EfficientNet at scale, LaMDA) is that they rely on compute budgets that few academic groups can match, which has pushed subsequent academic work on these problems toward efficiency, distillation, or proxy benchmarks.^[4]^[6]^[15]^[19]

See also

Andrew Ng, Le's PhD advisor and Google Brain co-founder.
Jeff Dean, Google Brain co-founder and co-author on the 2012 unsupervised-features paper.
Ilya Sutskever, co-author of the 2014 Sequence to Sequence paper.
Noam Shazeer, co-author of the LaMDA paper and a longtime Google Brain collaborator.
Google Brain and Google DeepMind, Le's institutional homes.

References

1. Wikipedia contributors, "Quoc V. Le", Wikipedia, 2025-11-15. https://en.wikipedia.org/wiki/Quoc_V._Le. Accessed 2026-06-24.
2. Google Research, "Quoc V. Le", Google Research staff page, 2025. https://research.google/people/quocle/. Accessed 2026-06-24.
3. Sutskever, Vinyals, Le, "Sequence to Sequence Learning with Neural Networks", arXiv:1409.3215, 2014-09-10. https://arxiv.org/abs/1409.3215. Accessed 2026-06-24.
4. Le, Ranzato, Monga, Devin, Chen, Corrado, Dean, Ng, "Building high-level features using large scale unsupervised learning", arXiv:1112.6209, 2011-12-29. https://arxiv.org/abs/1112.6209. Accessed 2026-06-24.
5. Le, Mikolov, "Distributed Representations of Sentences and Documents", arXiv:1405.4053, 2014-05-16. https://arxiv.org/abs/1405.4053. Accessed 2026-06-24.
6. Zoph, Le, "Neural Architecture Search with Reinforcement Learning", arXiv:1611.01578, 2016-11-05. https://arxiv.org/abs/1611.01578. Accessed 2026-06-24.
7. Tan, Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks", arXiv:1905.11946, 2019-05-28. https://arxiv.org/abs/1905.11946. Accessed 2026-06-24.
8. Adiwardana, Luong, So, et al., "Towards a Human-like Open-Domain Chatbot (Meena)", arXiv:2001.09977, 2020-01-27. https://arxiv.org/abs/2001.09977. Accessed 2026-06-24.
9. Wikipedia contributors, "LaMDA", Wikipedia, 2025-12-10. https://en.wikipedia.org/wiki/LaMDA. Accessed 2026-06-24.
10. AI for Vietnam Foundation, "Dr. Quoc V. Le, Senior Advisor", aiforvietnam.org, 2024. https://www.aiforvietnam.org/dr-quoc-le-senior-advisor/. Accessed 2026-06-24.
11. MIT Technology Review, "Quoc Le, Innovators Under 35", MIT Technology Review, 2014. https://www.technologyreview.com/innovator/quoc-le/. Accessed 2026-06-24.
12. Google Research, "Building high-level features using large scale unsupervised learning", research.google publication page, 2012. https://research.google/pubs/pub38115/. Accessed 2026-06-24.
13. Proceedings of Machine Learning Research, "Distributed Representations of Sentences and Documents", PMLR v32 (ICML 2014). http://proceedings.mlr.press/v32/le14.html. Accessed 2026-06-24.
14. NIPS Proceedings, "Sequence to Sequence Learning with Neural Networks", NIPS 2014 paper page. https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks. Accessed 2026-06-24.
15. Google Research, "Neural Architecture Search with Reinforcement Learning", research.google publication page, 2016. https://research.google/pubs/neural-architecture-search-with-reinforcement-learning/. Accessed 2026-06-24.
16. History of Data Science, "Quoc V. Le: Fast, Furious and Automatic", History of Data Science profile, 2021. https://www.historyofdatascience.com/quoc-v-le-fast-furious-and-automatic/. Accessed 2026-06-24.
17. Google Research, "EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling", Google Research blog, 2019-05-29. https://research.google/blog/efficientnet-improving-accuracy-and-efficiency-through-automl-and-model-scaling/. Accessed 2026-06-24.
18. Cubuk, Zoph, Mane, Vasudevan, Le, "AutoAugment: Learning Augmentation Policies from Data", arXiv:1805.09501, 2018-05-24. https://arxiv.org/abs/1805.09501. Accessed 2026-06-24.
19. Thoppilan, De Freitas, Hall, Shazeer, et al., "LaMDA: Language Models for Dialog Applications", arXiv:2201.08239, 2022-01-20. https://arxiv.org/abs/2201.08239. Accessed 2026-06-24.
20. Wei, Bosma, Zhao, Guu, Yu, Lester, Du, Dai, Le, "Finetuned Language Models Are Zero-Shot Learners (FLAN)", arXiv:2109.01652, 2021-09-03. https://arxiv.org/abs/2109.01652. Accessed 2026-06-24.
21. Wei, Wang, Schuurmans, Bosma, Ichter, Xia, Chi, Le, Zhou, "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models", arXiv:2201.11903, 2022-01-28. https://arxiv.org/abs/2201.11903. Accessed 2026-06-24.
22. Trinh, Wu, Le, He, Luong, "Solving olympiad geometry without human demonstrations", Nature, 2024-01-17. https://www.nature.com/articles/s41586-023-06747-5. Accessed 2026-06-24.
23. Google Scholar, "Quoc V. Le", citation profile (456,236 citations, h-index 150, top paper EfficientNet 39,233 citations), 2026. https://scholar.google.com/citations?user=vfT6-XIAAAAJ&hl=en. Accessed 2026-06-24.
24. Google DeepMind, "Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad", 2025-07-21. https://deepmind.google/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/. Accessed 2026-06-24.
