Oriol Vinyals
Last reviewed: May 21, 2026
Sources: No citations yet
Review status: Needs citations
Revision: v1 · 2,600 words
Oriol Vinyals (born 1983, Sabadell, Catalonia, Spain) is a Spanish machine learning researcher who serves as Vice President of Research at Google DeepMind and co-technical lead of the Gemini family of multimodal foundation models.[1][2] He is a co-author of the 2014 paper "Sequence to Sequence Learning with Neural Networks" that helped popularize neural encoder-decoder architectures for machine translation, and he led the AlphaStar project that achieved Grandmaster level in StarCraft II, reported on the cover of Nature in 2019.[3][4] Before joining DeepMind he was a researcher in the Google Brain team, and across his career he has contributed to seq2seq, Pointer Networks, Matching Networks, image captioning, knowledge distillation, AlphaStar, AlphaFold, AlphaCode, Flamingo, and the Gemini series.[1][2] His Google Scholar profile lists more than 430,000 citations and an h-index of 114 as of 2026.[2]
| Item | Details |
|---|---|
| Born | 1983, Sabadell, Catalonia, Spain[5] |
| Nationality | Spanish[5] |
| Education | Telecommunications engineering and mathematics, Universitat Politècnica de Catalunya (CFIS); MS in computer science, UC San Diego; PhD in EECS, UC Berkeley (2013)[5][6] |
| Doctoral advisor | Nelson Morgan (UC Berkeley EECS)[6] |
| Dissertation | "Beyond Deep Learning: Scalable Methods and Models for Learning" (defended December 12, 2013)[6] |
| Employer | Google DeepMind (Vice President of Research; co-technical lead, Gemini)[1][7] |
| Notable works | Seq2Seq (2014); Show and Tell (2015); Pointer Networks (2015); Matching Networks (2016); AlphaStar (2019); Flamingo (2022); Gemini (2023 onward)[2][3][4][8][9][10][11] |
| Honorary doctorate | Universitat Politècnica de Catalunya, conferred 26 November 2025[5] |
Vinyals was born in 1983 in Sabadell, a city in the Vallès Occidental county of Catalonia, Spain.[5] He studied a double degree in telecommunications engineering and mathematics at the Universitat Politècnica de Catalunya (UPC) in Barcelona, enrolling in the interdisciplinary CFIS programme together with coursework at the Barcelona School of Telecommunications Engineering (ETSETB) and the School of Mathematics and Statistics (FME); he completed his undergraduate thesis at Carnegie Mellon University in 2006.[5]
He then moved to the United States. He earned a master's degree in computer science at the University of California, San Diego, before transferring to the University of California, Berkeley for doctoral work in Electrical Engineering and Computer Sciences (EECS).[1][5] His dissertation, "Beyond Deep Learning: Scalable Methods and Models for Learning," was advised by speech researcher Nelson Morgan and was filed in late 2013.[6] The thesis focused on faster and more robust optimization for deep architectures, simpler deep models, theoretical bounds against shallow sparse-coding networks, and applications to acoustic modeling and object recognition.[6] During his PhD he held research internships at Microsoft Research and Google.[5]
After defending his PhD, Vinyals joined the Google Brain team in Mountain View as a research scientist focused on deep learning for text, speech and vision.[1][5] In his Brain years he became one of the most visible figures in early sequence modeling. His Google Research profile lists him as principal scientist and team lead of the Deep Learning group, and notes that his early Brain work fed directly into TensorFlow, Google Translate, text-to-speech, and speech recognition products.[1]
He moved to DeepMind in London in 2016, continuing as a research scientist and rising to Vice President of Research and head of the Deep Learning group.[5] At DeepMind he led the AlphaStar effort on StarCraft II from roughly 2017 to 2019 and was a senior contributor to AlphaCode, AlphaFold, and Flamingo.[2][4][10] Following the April 2023 merger of Google Brain and DeepMind into Google DeepMind, he became co-technical lead of the Gemini project alongside Noam Shazeer and Jeff Dean, reporting to Google DeepMind chief executive Demis Hassabis.[5][7]
He was named one of MIT Technology Review's "35 Innovators Under 35" in 2016 for the seq2seq line of work, and he served as program chair for the International Conference on Learning Representations (ICLR) in 2017 and 2018.[1] On 26 November 2025 the Universitat Politècnica de Catalunya conferred on him an honorary doctorate in a ceremony held at the Vèrtex building on the North Diagonal Campus; he delivered a masterclass titled "From AI to AGI: The Quest for True Intelligence" two days later.[5]
Vinyals, with Ilya Sutskever and Quoc V. Le, published "Sequence to Sequence Learning with Neural Networks" at NeurIPS 2014.[3] The paper showed that one multilayer LSTM could encode a variable-length input sequence into a fixed-dimensional vector and a second LSTM could decode the output sequence from that vector, with the whole system trained end-to-end by backpropagation.[3] On the WMT'14 English-to-French translation benchmark the model reached a BLEU score of 34.8 on the entire test set, surpassing a phrase-based SMT baseline at 33.3, and reached 36.5 BLEU when used to rerank the SMT system's n-best list.[3] The paper also reported that reversing the order of the source sentence markedly improved learning, because it introduced many short-term dependencies between source and target tokens.[3] On Google Scholar the paper has accumulated more than 32,000 citations.[2] The seq2seq formulation became the template for neural machine translation and, later, for many tasks that frame generation as producing an output sequence conditioned on an input.
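The effect of source reversal on dependency length can be illustrated with a small calculation. The sketch below is not code from the paper: it assumes a toy setting in which source and target have the same length and align one-to-one in order, and the function name `alignment_lags` is hypothetical. It counts how many time steps separate each source token from its aligned target token when the source is read first and the target is decoded afterwards.

```python
def alignment_lags(n, reverse_source):
    """Time steps between source token i and target token i.

    Assumes (for illustration only) an n-token source read first, an
    n-token target decoded afterwards, and a monotone one-to-one alignment.
    """
    lags = []
    for i in range(n):
        # Step at which source token i is read by the encoder.
        src_pos = (n - 1 - i) if reverse_source else i
        # Step at which target token i is produced by the decoder.
        tgt_pos = n + i
        lags.append(tgt_pos - src_pos)
    return lags


forward = alignment_lags(4, reverse_source=False)
flipped = alignment_lags(4, reverse_source=True)
print("forward:", forward, "mean", sum(forward) / len(forward))
print("reversed:", flipped, "mean", sum(flipped) / len(flipped))
```

Under these assumptions the mean lag is the same in both orderings, but reversal brings the first source and target tokens next to each other, which is the kind of short-term dependency the paragraph above refers to.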
In November 2014 Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan posted "Show and Tell: A Neural Image Caption Generator," presented at CVPR 2015.[8] The system combined a convolutional neural network image encoder pretrained on ImageNet with an LSTM language decoder, trained to maximize the likelihood of the reference caption.[8] The model achieved BLEU-1 scores of 59 on Pascal (against a prior best of 25) and 66 on Flickr30k, and a BLEU-4 of 27.7 on the COCO captioning benchmark.[8] Show and Tell is often cited as the canonical early example of using a sequence decoder conditioned on a perception encoder, a recipe that recurs in modern multimodal systems.
With Meire Fortunato and Navdeep Jaitly, Vinyals introduced "Pointer Networks" at NeurIPS 2015.[9] Where standard attention blends encoder hidden states into a context vector, the Pointer Network uses attention as a pointer that selects an element of the input sequence as the output, accommodating variable-sized output dictionaries.[9] The authors demonstrated the architecture on three combinatorial problems: planar convex hulls, Delaunay triangulations, and the symmetric planar Travelling Salesman Problem.[9] Pointer Networks have since been re-used in extractive summarization, code generation, and neural combinatorial optimization.
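The pointer mechanism can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses an additive attention score over encoder states and treats the resulting softmax directly as the output distribution, so the number of output classes always equals the input length. The weight values, the toy encoder states, and the function names are all made up for the example.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def pointer_distribution(encoder_states, decoder_state, W1, W2, v):
    """Score each input position j as v . tanh(W1 e_j + W2 d).

    The softmax over these scores is itself the output: a distribution
    over input positions rather than over a fixed vocabulary.
    """
    proj_d = matvec(W2, decoder_state)
    scores = []
    for e in encoder_states:
        proj_e = matvec(W1, e)
        hidden = [math.tanh(a + b) for a, b in zip(proj_e, proj_d)]
        scores.append(sum(vi * h for vi, h in zip(v, hidden)))
    return softmax(scores)


# Hypothetical toy parameters (identity projections) and decoder state.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
decoder_state = [0.5, -0.5]

short_input = [[1.0, 0.0], [0.0, 1.0]]
long_input = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

for states in (short_input, long_input):
    probs = pointer_distribution(states, decoder_state, W1, W2, v)
    chosen = max(range(len(probs)), key=probs.__getitem__)
    print(f"{len(states)} inputs -> {len(probs)} output classes, pointing at index {chosen}")
```

The same parameters handle both the two-element and the four-element input, which is the sense in which the output dictionary is variable-sized.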
In 2016 Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra published "Matching Networks for One Shot Learning" at NeurIPS.[11] The system learns an embedding together with a nearest-neighbor-style classifier that maps a small labeled support set and an unlabeled query to a label without fine-tuning.[11] Training proceeds in episodes that mimic test-time conditions, each built from a small support set and a batch of queries, an idea that is now standard in few-shot learning. The paper reported improvements in one-shot accuracy on miniImageNet and Omniglot over previous baselines and helped open the meta-learning research line that followed.[11]
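A rough sketch of the classification step follows. It assumes the embeddings have already been computed (the learned embedding networks are omitted) and uses a softmax over cosine similarities to weight the support labels; the function names and the toy vectors are illustrative, not taken from the paper's code.

```python
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def matching_predict(query, support):
    """Return a label distribution for `query`.

    `support` is a list of (embedding, label) pairs. Each support label is
    weighted by a softmax over cosine similarities to the query, so no
    parameters are updated when a new support set arrives.
    """
    sims = [cosine(query, emb) for emb, _ in support]
    m = max(sims)
    weights = [math.exp(s - m) for s in sims]
    total = sum(weights)
    label_probs = defaultdict(float)
    for w, (_, label) in zip(weights, support):
        label_probs[label] += w / total
    return dict(label_probs)


# Hypothetical one-shot, two-way episode with pre-computed 2-D embeddings.
support_set = [([1.0, 0.0], "cat"), ([0.0, 1.0], "dog")]
prediction = matching_predict([0.9, 0.1], support_set)
print(max(prediction, key=prediction.get), prediction)
```

Swapping in a different support set changes the set of candidate labels without any retraining, which is what "without fine-tuning" means in this setting.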
Vinyals co-authored, with Geoffrey Hinton and Jeff Dean, "Distilling the Knowledge in a Neural Network," which formalized knowledge distillation as training a smaller student model to match the softened output probabilities of a larger teacher.[2] The work, listed among Vinyals's most-cited papers with more than 32,000 citations, became the template for compressing neural models for deployment.[2]
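The "softened" probabilities come from dividing the logits by a temperature before the softmax. The sketch below shows only that soft-target term, as a cross-entropy between teacher and student distributions at the same temperature; the logits are made up, the function names are hypothetical, and the combination with a hard-label loss used in practice is left out.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / T; a larger T gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy of the student's softened outputs against the teacher's."""
    teacher_probs = softmax_with_temperature(teacher_logits, temperature)
    student_probs = softmax_with_temperature(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


# Hypothetical logits for a three-class problem.
teacher = [6.0, 2.0, 1.0]
student = [3.0, 2.5, 0.5]

print("teacher at T=1:", softmax_with_temperature(teacher, 1.0))
print("teacher at T=4:", softmax_with_temperature(teacher, 4.0))
print("soft-target loss at T=4:", soft_target_loss(student, teacher, 4.0))
```

At the higher temperature the teacher's distribution puts visibly more mass on the non-argmax classes, and that relative ranking of wrong answers is the extra signal the student is trained to match.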
Vinyals led the AlphaStar effort at DeepMind, which set out to play the real-time strategy game StarCraft II at human-professional level.[4] AlphaStar combined supervised learning from human replays with multi-agent reinforcement learning in a "league" where new agents trained against a diverse population, including specialized exploiter agents designed to expose weaknesses.[4] The system played the full game over the live online ladder for all three races (Protoss, Terran, Zerg) and reached Grandmaster level, placing within the top 0.2% of human players.[4] The result, "Grandmaster level in StarCraft II using multi-agent reinforcement learning," appeared as the cover article of Nature on 30 October 2019, with Vinyals as first and corresponding author.[4]
In April 2022 Vinyals was a senior contributor on "Flamingo: a Visual Language Model for Few-Shot Learning," led by Jean-Baptiste Alayrac and colleagues at DeepMind.[10] Flamingo bridged a frozen pretrained vision encoder and a frozen large language model (Chinchilla, with up to 70B parameters) using a Perceiver-style resampler and gated cross-attention layers, producing an 80B-parameter visual language model capable of handling arbitrarily interleaved images, videos, and text.[10] The model set few-shot state-of-the-art results on captioning, visual question answering, and video understanding benchmarks at the time, often beating prior fully-finetuned task-specific models with only a handful of in-context examples.[10] Flamingo is widely regarded as a key precursor of the multimodal language models that followed.
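The role of the gate in a gated cross-attention layer can be illustrated with a stripped-down sketch. It assumes a tanh gate on a learnable scalar initialized to zero, which is how the Flamingo paper describes it to the best of this article's knowledge; the attention here is single-head with no learned projections, and the Perceiver-style resampler, feed-forward sublayers, and all numeric values are omitted or invented for the example.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_vec, visual_vecs):
    """Single-head dot-product attention from one text vector to visual vectors.

    Learned query/key/value projections are omitted (a simplification).
    """
    d = len(text_vec)
    scores = [
        sum(t * v for t, v in zip(text_vec, vis)) / math.sqrt(d)
        for vis in visual_vecs
    ]
    weights = softmax(scores)
    return [sum(w * vis[k] for w, vis in zip(weights, visual_vecs)) for k in range(d)]


def gated_cross_attention(text_vec, visual_vecs, alpha):
    """Residual update text + tanh(alpha) * attention(text, visual)."""
    attended = cross_attention(text_vec, visual_vecs)
    gate = math.tanh(alpha)
    return [t + gate * a for t, a in zip(text_vec, attended)]


# Hypothetical 2-D features for one text token and two visual tokens.
text_token = [0.2, -0.4]
visual_tokens = [[1.0, 0.0], [0.0, 1.0]]

print("alpha = 0.0:", gated_cross_attention(text_token, visual_tokens, alpha=0.0))
print("alpha = 1.0:", gated_cross_attention(text_token, visual_tokens, alpha=1.0))
```

With the gate at zero the layer passes the text representation through unchanged, so the frozen language model's behavior is preserved at the start of training and visual information is mixed in only as the gate opens.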
Following the merger of Google Brain and DeepMind into Google DeepMind in April 2023, Vinyals was named co-technical lead of Gemini, Google's family of natively multimodal models, with Noam Shazeer and Jeff Dean.[5][7] The Gemini 1.0 generation, comprising Ultra, Pro and Nano variants, was announced by Sundar Pichai and Demis Hassabis on 6 December 2023; the models were positioned as natively multimodal across text, code, audio, image, and video.[12]
Shortly after the launch, Vinyals publicly addressed criticism that Google's "Hands-on with Gemini" demonstration video had been heavily edited, posting on X that "all the user prompts and outputs in the video are real, shortened for brevity," and explaining that the clip was meant to inspire developers about what could be built on Gemini's API rather than to depict a real-time interaction.[13] He has continued to communicate Gemini progress publicly via talks and the Google DeepMind: The Podcast series, including episodes on Gemini 2.0 and agentic AI.[7]
Vinyals has a long pattern of co-authorship with a small group of senior deep learning researchers. On the seq2seq paper he worked with Ilya Sutskever and Quoc V. Le.[3] On knowledge distillation he worked with Geoffrey Hinton and Jeff Dean.[2] His Gemini co-leadership is shared with Jeff Dean and Noam Shazeer, with Demis Hassabis as DeepMind chief.[5][7] On the image captioning side his co-authors included Samy Bengio, Alexander Toshev, and Dumitru Erhan.[8] AlphaStar was a large team effort; on the corresponding Nature paper Vinyals shares authorship with Igor Babuschkin, David Silver and many others.[4]
The Google Scholar profile maintained at Vinyals's verified Google email lists him at Google DeepMind, with an h-index of 114 (101 since 2021), an i10-index of 207, and total citations of 430,981 (327,923 since 2021) as of mid-2026.[2] Among his most-cited works the profile lists, in descending order: TensorFlow (2015), AlphaFold (2021), Distilling the Knowledge in a Neural Network (2015), Sequence to Sequence Learning with Neural Networks (2014), Representation Learning with Contrastive Predictive Coding (2018), Gemini (2023), and Flamingo (2022).[2]
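For readers unfamiliar with these metrics, the sketch below computes them from a list of per-paper citation counts using the standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the i10-index is the number of papers with at least 10 citations. The counts in the example are invented and are not Vinyals's actual figures.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Made-up citation counts for a hypothetical six-paper profile.
example_counts = [120, 45, 12, 9, 4, 1]
print("h-index:", h_index(example_counts))
print("i10-index:", i10_index(example_counts))
```

Because both metrics depend only on per-paper thresholds, a handful of very highly cited papers raises the total citation count far more than it raises either index.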
Beyond publishing papers, Vinyals speaks publicly about his research. He served as program chair for ICLR in 2017 and 2018 and as area chair for NeurIPS and ICML.[1] He has been profiled in popular and trade press, including the IEEE Signal Processing Society's "Industry Leaders" series, which covers his UPC background and dissertation work on speech recognition.[14] On the Google DeepMind podcast he has discussed the evolution of Gemini and agentic AI.[7] He maintains an active presence on X under the handle @OriolVinyalsML, where he discusses model releases and pre-training versus post-training tradeoffs.[13]
Vinyals's career arc maps cleanly onto the deep learning era. The seq2seq paper helped end the dominance of phrase-based statistical translation and supplied the encoder-decoder template that, refined with attention by Bahdanau and others and then with the Transformer, underpins most modern language and multimodal systems.[3] Show and Tell and Flamingo connect computer vision to language generation through the same encoder-decoder pattern, and the resulting line of vision-language work directly preceded today's multimodal models.[8][10] Pointer Networks and Matching Networks broadened what neural sequence models could do, covering combinatorial outputs and few-shot classification.[9][11] AlphaStar was a landmark demonstration that multi-agent reinforcement learning could reach professional human level in a real-time strategy game with imperfect information.[4] As co-technical lead of Gemini he is now responsible for translating these research threads into a commercial multimodal model family.[5][7]
The most prominent public controversy around Vinyals concerns Google's "Hands-on with Gemini" promotional video accompanying the 6 December 2023 Gemini 1.0 launch, which critics argued misleadingly implied real-time conversational interaction with images and video.[13] Vinyals personally addressed the criticism on X, conceding that the user prompts and outputs were "shortened for brevity" and that the video used still images plus text rather than the implied real-time video and voice interface; he framed the demo as inspiration for developers rather than a faithful recording.[13] The episode prompted broader media discussion of whether the Gemini launch had been over-hyped.