Niki Parmar
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,512 words
Niki Parmar is an Indian-American artificial intelligence researcher and entrepreneur best known as one of the eight co-authors of the 2017 paper "Attention Is All You Need", which introduced the transformer architecture that underpins modern large language models.[1][2] She spent more than six years at Google, first in applied research and then at Google Brain, where she contributed implementation work on the original Transformer codebase and the Tensor2Tensor research library, and led follow-up research on self-attention for image generation and computer vision.[1][3][4] After leaving Google in late 2021, she co-founded Adept AI in 2022 with Ashish Vaswani and David Luan, then co-founded Essential AI with Vaswani in 2023 after the two departed Adept.[5][6][7] In December 2024 she joined Anthropic as a member of technical staff and contributed to the Claude 3.7 Sonnet release.[8][9]
Parmar grew up in Pune, in the Indian state of Maharashtra, and developed an early interest in mathematics and computer science.[10][31] After completing secondary school she took a gap year because she did not gain admission to the Indian Institutes of Technology through the joint entrance examination, and then enrolled at Pune Institute of Computer Technology (PICT), an engineering college affiliated with Savitribai Phule Pune University.[31] She graduated from PICT with a Bachelor of Engineering in Information Technology.[10][11]
During her undergraduate years Parmar began working through massive open online courses on machine learning and artificial intelligence, including the early Coursera and Stanford courses taught by Andrew Ng and Peter Norvig, which she has repeatedly cited as her introduction to the field.[1][10][31] She has described herself as a largely self-taught coder before formal graduate study, building small projects on her own alongside her engineering coursework.[31]
After completing her undergraduate degree, Parmar moved to the United States to pursue graduate study at the University of Southern California in Los Angeles, where she earned a Master of Science in Computer Science.[11][12] She has publicly discussed the financial difficulty of arranging student loans for her US education.[31] At USC she joined the Computational Social Science Lab led by Morteza Dehghani, an associate professor of psychology and computer science, and worked on applying natural language processing methods to analyse behavioural dynamics on social media platforms.[1][12] She graduated from USC in 2015.[12]
It was at USC that Parmar first met Ashish Vaswani, then a PhD student at the affiliated Information Sciences Institute, who delivered a guest lecture on neural networks; the two became frequent collaborators and would later be co-authors on the Transformer paper and co-founders of two companies.[12] Dehghani, her USC adviser, was later quoted by the school as saying that he had known Parmar and Vaswani would do "amazing things" together but had not imagined they would change the field.[12]
Parmar joined Google as a software engineer in 2015 after completing her master's degree, working on end-to-end deep learning systems for natural language processing problems.[1][11] Reporting by FourWeekMBA and the Pune Pulse profile describes her joining the team at age 24 and being the sole member without a PhD.[11][31] Her initial projects sat within Google Search and focused on applied questions such as text similarity and question answering, including the development of learned embeddings that could be transferred across downstream tasks.[1] During this period she also worked on big-data and distributed-computing problems, which she has described as her introduction to large-scale production engineering.[1]
In October 2017 Parmar moved to Google Brain as a research software engineer, joining the Mountain View team that had recently produced the Transformer architecture.[1][11] Her research interests at Google Brain centred on self-attention and inductive biases in deep learning, and she worked across both natural language processing and computer vision, with later interests extending to self-supervised learning and the use of machine learning for biology.[1] She remained at Google Brain until November 2021, a tenure of slightly over six years across the two parts of Google.[1][11]
The paper "Attention Is All You Need" was submitted to arXiv on 12 June 2017 and presented later that year at the Conference on Neural Information Processing Systems (NeurIPS) in Long Beach, California.[2][13] All eight authors, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin, were listed as equal contributors in a footnote on the first page, with each author's role broken out individually.[2][14]
Parmar's contribution is described in that footnote as designing, implementing, tuning and evaluating model variants in the original codebase and in Tensor2Tensor, the open-source library that became the reference implementation of the Transformer.[14] Her work spanned a large number of experimental runs across architectural choices including the number of layers, attention heads, feed-forward dimensions, positional encodings and regularisation schemes that were ultimately reported in the paper.[1][14] At the time of submission her listed affiliation was Google Research, with the contact address nikip@google.com.[14]
The original Transformer was evaluated on WMT 2014 English-to-German and English-to-French machine translation, where it set new state-of-the-art BLEU scores while training in a fraction of the wall-clock time required by previous recurrent and convolutional sequence models.[13][15] Subsequent revisions of the paper, the most recent uploaded to arXiv in August 2023, made small corrections to the original presentation but preserved the eight-author list and the overall architecture description.[13]
The paper became one of the most cited works in the history of computer science. As of 2025 it was reported among the ten most-cited papers of the twenty-first century, with more than 170,000 citations in academic databases.[2] The Transformer architecture introduced in the paper became the basis for modern large language models including the GPT series, BERT, and Claude, as well as for vision systems, speech recognition models and music generation systems.[2][15] By 2026, Parmar's Google Scholar profile recorded the Transformer paper at more than 258,000 citations, accounting for the majority of her aggregate citation count of more than 281,000.[3]
In February 2018 Parmar was the first author of "Image Transformer", a paper that adapted the self-attention mechanism to the autoregressive generation of images.[16][17] Co-authored with Vaswani, Uszkoreit, Kaiser, Shazeer, Alexander Ku and Dustin Tran, the paper formulated image generation as a sequence modelling task with a tractable likelihood and restricted attention to local neighbourhoods of pixels so that larger images could be processed than would be feasible with global attention.[16][17] On ImageNet the model improved the best previously published negative log-likelihood from 3.83 to 3.77 bits per dimension.[17] The paper was presented at the 35th International Conference on Machine Learning (ICML) in Stockholm in July 2018.[16]
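The effect of restricting attention to a local neighbourhood can be illustrated with a minimal sketch. The Python below is a simplified one-dimensional, causal sliding window and is not the paper's exact attention layout; the function names `local_self_attention` and `attended_pairs`, and the window sizes used with them, are hypothetical choices made for illustration.

```python
import math


def local_self_attention(queries, keys, values, window):
    """Causal self-attention in which each position attends only to itself
    and the `window - 1` positions immediately before it (illustrative)."""
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        start = max(0, i - window + 1)
        # Scaled dot-product scores against the local neighbourhood only.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(start, i + 1)
        ]
        # Numerically stable softmax over the neighbourhood.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors in the neighbourhood.
        outputs.append([
            sum(w * values[start + k][d] for k, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs


def attended_pairs(length, window=None):
    """Count the query-key pairs scored under causal attention.
    With window=None every earlier position is visible (global attention)."""
    if window is None:
        window = length
    return sum(min(i + 1, window) for i in range(length))
```

Comparing `attended_pairs(3072)` with `attended_pairs(3072, window=256)` shows why the restriction matters for a 32 by 32 RGB image flattened to 3,072 positions: the global count grows quadratically with sequence length, while the local count grows roughly linearly.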
Image Transformer began a sustained line of work in which Parmar applied self-attention to computer vision. In 2019 she was a co-author on "Stand-Alone Self-Attention in Vision Models", presented at NeurIPS, which showed that self-attention could replace convolutions entirely in image classification and object detection backbones based on ResNet.[18] In 2021 she was a co-author on two further papers, both presented at CVPR: "Scaling Local Self-Attention for Parameter Efficient Visual Backbones", which introduced the HaloNet family of attention-based image backbones, and "Bottleneck Transformers for Visual Recognition", which proposed replacing the spatial convolutions in the final ResNet bottleneck blocks with global self-attention.[19][20] BoTNet, the architecture proposed in the latter paper, reached 44.4% mask AP and 49.7% box AP on the COCO instance-segmentation benchmark within the Mask R-CNN framework.[20]
Parmar was also a named author on the 2018 paper "Tensor2Tensor for Neural Machine Translation", which described the open-source Tensor2Tensor library that Google Brain used to support its sequence-modelling research.[21] The Tensor2Tensor authors, including Vaswani, Samy Bengio, Eugene Brevdo, François Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Kaiser, Nal Kalchbrenner, Parmar, Ryan Sepassi, Shazeer and Uszkoreit, presented it as a single training and inference framework that included reference implementations of the Transformer along with translation, summarisation, language modelling and image generation models.[21] Parmar has said in interviews that much of the experimentation she ran during the original Transformer project was implemented inside the Tensor2Tensor codebase, and that the library was central to her later work on the Image Transformer and on follow-up vision models.[1][14]
In 2020 Parmar was a co-author on the Interspeech paper "Conformer: Convolution-augmented Transformer for Speech Recognition", which proposed combining self-attention layers with convolution modules to capture both global and local dependencies in audio.[22][23] On the LibriSpeech benchmark the Conformer reached word error rates of 2.1% on the test-clean split and 4.3% on test-other without a language model, and 1.9% and 3.9% with an external language model, improving over previous Transformer-only and CNN-only models.[22][23] The Conformer architecture was subsequently adopted as the default backbone for several production speech recognition systems.[22]
In a 2020 interview published on Medium, Parmar described her research approach as oriented around finding inductive biases that make deep models efficient on natural data, and around the use of self-attention as a flexible substrate for sequence, image and graph problems.[1] She listed generative models, two- and three-dimensional vision and self-supervised representation learning as her then-current focus areas, and pointed to the application of machine learning to biology, including protein and molecular modelling, as an emerging direction she found particularly exciting.[1] She has also cited the original AlexNet result and Geoffrey Hinton's subsequent work on neural networks for vision as formative influences during her transition from applied search to deep learning research.[1]
Parmar left Google Brain in November 2021. Shortly afterwards she co-founded Adept AI Labs, headquartered in San Francisco, alongside chief executive David Luan and chief scientist Ashish Vaswani; she served as chief technology officer.[5][24] Luan was previously head of engineering at OpenAI and led Google's large-model program, while Vaswani and Parmar brought their Transformer research background from Google Brain.[5] Adept emerged from stealth on 26 April 2022 with a US$65 million Series A round led by Greylock and Addition, with additional participation from Root Ventures and angel investors including Andrej Karpathy, Jaan Tallinn and Chris Ré.[5][24]
Adept positioned itself as a research and product lab focused on what it described as useful general intelligence: rather than building general-purpose chatbots, it set out to train neural networks that could operate existing software, including tools such as Airtable, Photoshop, Tableau and Twilio, through natural-language commands.[5][24] The company demonstrated a transformer-based action model that read screenshots and produced sequences of keyboard, mouse and API actions to complete computer tasks.[24]
Adept's pitch attracted heavy attention from venture investors at a moment when generative AI was beginning to shift from research curiosity to product category. TechCrunch's report on the Series A noted that the round was unusually large for an early-stage AI lab and that the founding team's research credentials, with Transformer authors among them, were a central part of the pitch.[24] Fortune covered Adept's launch alongside other "digital assistant" startups attempting to embed AI directly inside existing software workflows.[32]
Parmar's tenure at Adept was relatively short. Public LinkedIn records and contemporaneous reporting indicate that she and Vaswani both left Adept in November 2022, around seven months after the company's launch.[6] Reporting by The Information at the time noted that two of the Adept co-founders had departed suddenly to start another company.[6] Adept itself continued operating under Luan and other remaining co-founders, raised an additional US$350 million Series B round in March 2023, and ran until June 2024, when Amazon hired Luan and several other Adept executives into a new Artificial General Intelligence team and licensed parts of the company's technology.[25][26][33] This transaction, sometimes described as a reverse acqui-hire, did not directly involve Parmar, who had already departed. Investors in Adept were subsequently made whole using funds paid by Amazon, and about twenty employees remained at Adept under new chief executive Zach Brock.[25][26]
In May 2023 Parmar and Vaswani co-founded Essential AI, also headquartered in San Francisco, and the company raised an US$8.3 million seed round led by Thrive Capital around the same time.[7][27] Essential AI emerged from stealth on 11 December 2023 with a US$56.5 million Series A led by March Capital, with participation from AMD, Franklin Venture Partners, Google, KB Investment, NVIDIA and Thrive Capital.[27][28] Bringing the company's total capital raised to around US$64.8 million, the round drew strategic investment from three of the largest chip and platform companies in the artificial intelligence industry.[27][28]
Essential AI's announced mission was to build what it called the "enterprise brain": full-stack artificial intelligence products that learn to automate workflows and complete tasks that span multiple business applications.[27][28] Vaswani served as chief executive while Parmar held a co-founder role and led aspects of the technical effort.[27] The Series A capital was earmarked for team expansion and for training large language models specialised on enterprise reasoning, code and document-handling tasks.[28]
In its first two years the company released open datasets and model weights, including the essential-web-v1.0 corpus of around 24 trillion tokens of organised web text, which reached the top of the Hugging Face dataset rankings, and Rnj-1, a language model released in base and instruction-tuned versions in late 2025. It also published research on optimisation techniques, including a study of the practical efficiency of the Muon optimiser for pretraining.[29] The company positions its model releases as contributions to open-source frontier AI for science, technology, engineering and mathematics tasks.[29]
Parmar departed Essential AI in late 2024. According to subsequent reporting, she joined Anthropic in December 2024 and announced the move publicly in February 2025; Vaswani continued as chief executive of Essential AI after her departure.[8][9]
At Anthropic, Parmar joined as a member of technical staff working on research and model development.[8][9] In her public announcement she said she had contributed to the development of Claude 3.7 Sonnet, the hybrid reasoning model that Anthropic released in February 2025 and that the company has described as one of its most consequential releases for coding and complex task performance.[8][9] Industry coverage characterised her hire alongside several other senior technical hires from Adept, Instagram, Workday and You.com as part of a broader pattern of senior engineers and researchers joining Anthropic as members of technical staff rather than as named executives.[9] AnalyticsIndiaMag noted that her arrival, taken together with the company's hiring of other co-founders and CTOs, signalled Anthropic's continuing ability to attract researchers from the top of the field as it scaled up its model and product organisations.[8]
Parmar is the only woman among the eight original co-authors of the Transformer paper, and several profiles, including Open The Magazine's cover story on the "frontier ladies of AI" and the 100 Women in AI directory, have highlighted her career as part of a broader discussion about representation in foundational machine-learning research.[34][35] She has spoken at industry events including the ScaleUp:AI conference about the trajectory of research from foundational papers to product, and has been profiled in Indian and US technology press as one of the field's most influential researchers of South Asian background.[31][36]
Parmar's published research spans three broad themes: the original Transformer architecture and its implementation, the extension of self-attention to images and other modalities, and self-supervised representation learning.
In the Transformer family, beyond the foundational 2017 paper, she contributed to Tensor2Tensor and to subsequent works that explored variants of self-attention for different domains. The Image Transformer (2018) introduced the use of local 2D self-attention windows for autoregressive image generation, and the Stand-Alone Self-Attention paper (2019) was among the first to show that attention could fully replace convolutions in vision backbones.[16][18] HaloNet (2021) and BoTNet (2021) further developed these ideas into competitive image classification, detection and segmentation systems.[19][20]
In speech recognition, the Conformer (2020) combined convolution with self-attention and set new accuracy records on LibriSpeech, becoming a standard component in subsequent speech models.[22] In natural language processing, Parmar co-authored "Corpora Generation for Grammatical Error Correction" at NAACL-HLT 2019, on building large training corpora for grammar correction.[30]
More recent work covered by her DBLP record includes "Decoder Denoising Pretraining for Semantic Segmentation", published in Transactions on Machine Learning Research and at the CVPR workshops in 2022, and a guest editorial introduction to a 2023 special section of IEEE Transactions on Pattern Analysis and Machine Intelligence on Transformer models in vision.[30]
As of mid-2026, Parmar's Google Scholar profile listed total citations exceeding 281,000, an h-index of 50 and an i10-index of 66, with the Transformer paper alone accounting for over 258,000 of those citations.[3]