# Bryan Catanzaro

> Source: https://aiwiki.ai/wiki/bryan_catanzaro
> Updated: 2026-06-28
> Categories: People
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

Bryan Catanzaro is an American computer scientist who serves as vice president of Applied Deep Learning Research at [NVIDIA](/wiki/nvidia), a research organization he founded in 2016 and has led since, applying [deep learning](/wiki/deep_learning) to language, graphics, speech, and chip design [1][2]. He is best known for originating [cuDNN](/wiki/cudnn), the GPU library that sits underneath most modern deep learning frameworks, and for co-creating [Megatron-LM](/wiki/megatron), one of the most widely used systems for training very large [language models](/wiki/large_language_model) [3][4]. Earlier in his career he helped build [Baidu](/wiki/baidu)'s end-to-end speech recognition systems, and as of 2026 he is one of three vice presidents leading NVIDIA's Nemotron open model program, an effort that involves more than 500 technical staff [5][6].

## Who is Bryan Catanzaro?

Bryan Catanzaro is a deep learning systems researcher and NVIDIA executive. He completed his PhD in electrical engineering and computer sciences at the [University of California, Berkeley](/wiki/uc_berkeley), beginning the program in 2005 and graduating in 2011 [1][7]. At Berkeley he worked in the group of Kurt Keutzer, focusing on parallel computing, programming languages, and machine learning [7]. His dissertation, "Compilation Techniques for Embedded Data Parallel Languages," produced Copperhead, a data parallel language embedded in Python together with a compiler that mapped high-level array operations onto [GPU](/wiki/gpu) hardware [7]. After a senior research role at Baidu, he joined NVIDIA as vice president of applied deep learning research in November 2016; the Berkeley EECS department described his charge as building "next generation systems for training and deploying deep learning" [2].

He had been drawn to graphics processors as an engine for general computation early on. Catanzaro began programming in [CUDA](/wiki/cuda) as a graduate student around 2006, and in 2008 he and his collaborators published one of the first demonstrations of GPU-accelerated machine learning, a paper on fast [support vector machine](/wiki/support_vector_machine) training and classification on graphics processors [1][5]. That combination, rigorous parallel programming applied to learning algorithms, would run through the rest of his career.

## What is cuDNN, and how did Bryan Catanzaro create it?

After finishing his doctorate, Catanzaro joined NVIDIA Research [1]. While there he collaborated with [Andrew Ng](/wiki/andrew_ng)'s group at Stanford on "Deep learning with COTS HPC systems," a 2013 paper showing that a cluster of commodity GPU servers could train deep networks that had previously needed thousands of CPU machines [8]. It was an early signal that GPUs would become the default hardware for deep learning.

The contribution that made his name came out of a side project. In his own account, "I had been working on a little library for neural network computation on the GPU. Nvidia decided to productize this little research prototype, and it became our neural network library, cuDNN" [1]. Released in 2014, cuDNN packaged optimized implementations of the operations that dominate neural network training, such as convolutions, into a [BLAS](/wiki/blas)-like interface that framework authors could call directly [3]. It was quickly integrated into Caffe and later into the other major frameworks, and it remains a core dependency of the deep learning software stack. Catanzaro is a co-author of the paper that introduced it, "cuDNN: Efficient Primitives for Deep Learning" [3].

## What did Bryan Catanzaro do at Baidu?

Around 2014 Catanzaro left NVIDIA to join the Silicon Valley AI Lab at Baidu, recruited by Andrew Ng and Adam Coates [2][5]. He has described the move as a once-in-a-lifetime chance to learn how to do applied AI at scale [5]. As a senior researcher there he built systems for training and deploying end-to-end [speech recognition](/wiki/speech_recognition), and he is a co-author of both Deep Speech and Deep Speech 2, the lab's influential papers that replaced hand-engineered speech pipelines with a single neural network trained directly on audio [9][10]. The Deep Speech 2 team included [Dario Amodei](/wiki/dario_amodei), later the chief executive of [Anthropic](/wiki/anthropic) [5][10].

To make that training practical, Catanzaro released Warp-CTC, an open-source, GPU-accelerated implementation of the connectionist temporal classification loss used to train recognizers on unaligned audio [6]. The library applied the same data-parallel thinking that had defined his Berkeley work, and he has cited speech as the setting where he saw firsthand how much raw compute end-to-end learning could absorb [6].
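For readers unfamiliar with the loss that Warp-CTC accelerates, the following is a minimal pure-Python sketch of the standard CTC forward recursion, which sums the probability of every frame-level alignment that collapses to a given label sequence. It is not Warp-CTC's code or API: the function name `ctc_probability`, the choice of index 0 for the blank symbol, and the toy probabilities are illustrative assumptions, and a practical implementation would work in log space and in parallel on the GPU.

```python
def ctc_probability(probs, target, blank=0):
    """Probability that per-frame distributions `probs` emit `target` under CTC.

    probs:  list of T frames, each a list of per-symbol probabilities.
    target: list of label indices, without blanks.
    Hypothetical helper for illustration; uses plain probabilities, not logs.
    """
    # Interleave blanks around the labels: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s] = total probability of all partial alignments that end
    # at position s of the extended sequence after the current frame
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


# Two frames, two symbols (0 = blank, 1 = "a"), uniform probabilities.
# The alignments "aa", "a_" and "_a" all collapse to "a".
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(ctc_probability(uniform, [1]))  # 0.75
```

Because no frame-level alignment is supplied, the recursion is what lets a recognizer train on audio paired only with a transcript; the cost grows with both the number of frames and the transcript length, which is why a GPU implementation matters at scale.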

## What is Bryan Catanzaro known for at NVIDIA?

In 2016 [Jensen Huang](/wiki/jensen_huang) invited Catanzaro back to NVIDIA to start a new applied research lab, and he returned as, initially, its only member [2][5]. The group, Applied Deep Learning Research, grew to dozens of scientists organized around a handful of application areas: computer graphics and vision, speech and audio, natural language processing, and chip design [1]. Catanzaro has described a preference for research at the boundaries between established fields, where he believes the best opportunities lie [1].

The table below summarizes projects he has originated or co-authored across his career.

| Project | Year | Role and significance |
| --- | --- | --- |
| Copperhead | 2010 | PhD work: a data parallel language embedded in Python with a GPU compiler [7] |
| cuDNN | 2014 | Originated the prototype NVIDIA productized into its core deep learning library [3] |
| Deep Speech / Deep Speech 2 | 2014 to 2015 | Co-author of Baidu's end-to-end speech recognition systems [9][10] |
| pix2pixHD / vid2vid | 2017 to 2018 | Co-author of NVIDIA's high-resolution image and video synthesis with GANs [13][14] |
| WaveGlow | 2018 | Co-author of a flow-based neural vocoder for speech synthesis [12] |
| Megatron-LM | 2019 | Co-author of NVIDIA's large-model training framework using model parallelism [4] |
| DLSS | 2020 | Team helped create the deep-learning game-rendering technique [2] |
| Nemotron | 2025 to 2026 | One of three VPs leading NVIDIA's open model, dataset, and recipe initiative [5][15] |

### What is Megatron-LM?

Catanzaro's team is responsible for Megatron-LM, a framework for training [transformer](/wiki/transformer) language models far larger than a single GPU's memory can hold. The 2019 paper "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism" introduced a simple form of tensor, or intra-layer, model parallelism that splits each layer's matrices across many GPUs [4]. A 2021 follow-up combined tensor parallelism with pipeline parallelism across nodes and conventional data parallelism, a recipe the authors showed could scale toward trillion-parameter models; they estimated that a [GPT-3](/wiki/gpt_3)-sized 175-billion-parameter model could be trained on 1,024 NVIDIA [A100](/wiki/a100) GPUs in about a month [11]. Megatron-LM became one of the standard tools for large-model training and forms part of NVIDIA's NeMo software stack.
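The tensor-parallel idea can be illustrated with a small single-process simulation. The sketch below splits the first weight matrix of a two-layer MLP by columns and the second by rows across hypothetical "devices," then sums the partial outputs, which is the job an all-reduce does on real hardware. Every name here (`matmul`, `tensor_parallel_mlp`, and so on) is an assumption made for illustration, ReLU stands in for the GeLU typically used in transformer MLPs, and none of this is Megatron-LM's actual API.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Elementwise nonlinearity (a stand-in for GeLU in this sketch)."""
    return [[max(0.0, v) for v in row] for row in m]


def split_columns(m, parts):
    """Split a matrix into `parts` equal blocks of columns."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]


def split_rows(m, parts):
    """Split a matrix into `parts` equal blocks of rows."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]


def mlp(x, a, b):
    """Reference computation on a single device: relu(x @ a) @ b."""
    return matmul(relu(matmul(x, a)), b)


def tensor_parallel_mlp(x, a, b, devices):
    """Same result, with `a` sharded by columns and `b` sharded by rows."""
    a_shards = split_columns(a, devices)
    b_shards = split_rows(b, devices)
    # Each simulated device needs only its own shards; because the
    # nonlinearity is elementwise, no communication is required here.
    partials = [matmul(relu(matmul(x, a_i)), b_i)
                for a_i, b_i in zip(a_shards, b_shards)]
    # Summing the partial outputs plays the role of the all-reduce.
    out = partials[0]
    for p in partials[1:]:
        out = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(out, p)]
    return out


x = [[1.0, -2.0], [0.5, 3.0]]                            # 2 x 2 input
a = [[1.0, -1.0, 2.0, 0.0], [0.0, 1.0, -1.0, 3.0]]       # 2 x 4 weights
b = [[1.0, 0.0], [2.0, 1.0], [0.0, -1.0], [1.0, 1.0]]    # 4 x 2 weights

print(tensor_parallel_mlp(x, a, b, devices=2) == mlp(x, a, b))  # True
```

Splitting the first matrix by columns is what allows the nonlinearity to be applied locally on each shard; splitting it by rows instead would require a synchronization before the activation.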

### How did Bryan Catanzaro contribute to DLSS and generative media?

Catanzaro has contributed to a string of generative media papers from NVIDIA. He is a co-author of WaveGlow, a flow-based network that synthesizes speech audio in a single pass without autoregression [12]; of pix2pixHD, which generated high-resolution images from semantic label maps using conditional [generative adversarial networks](/wiki/generative_adversarial_network) [13]; and of vid2vid, which extended that idea to photorealistic video-to-video synthesis [14].

His team also helped create Deep Learning Super Sampling, or [DLSS](/wiki/dlss), the technique that reconstructs high-resolution game frames from lower-resolution renders using a trained network [2]. In 2022 Catanzaro said that in GPU-heavy games running DLSS 3, seven of every eight pixels on the screen, or 87.5 percent, are generated by a neural network rather than traditionally rendered [19]. With the multi-frame generation introduced in DLSS 4 in 2025, NVIDIA puts the figure at 15 of every 16 pixels, roughly 94 percent, produced by AI, which lets the GPU render a scene far more power-efficiently than brute-force rasterization [5][20].
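Both fractions can be reproduced with simple arithmetic under two assumptions that are not stated in the sources cited above: that super resolution renders one quarter of the output pixels (half resolution along each axis), and that frame generation adds one AI-generated frame per rendered frame in DLSS 3 and three in DLSS 4. The sketch below, with a hypothetical function name, works through that calculation.

```python
from fractions import Fraction


def ai_generated_fraction(rendered_pixel_share, generated_frames_per_rendered):
    """Share of displayed pixels produced by a network rather than rendered.

    rendered_pixel_share: fraction of pixels rendered within a rendered frame
                          (assumed 1/4 here, i.e. half resolution per axis).
    generated_frames_per_rendered: AI-generated frames shown for each
                          rendered frame (assumed 1 for DLSS 3, 3 for DLSS 4).
    """
    rendered_frame_share = Fraction(1, 1 + generated_frames_per_rendered)
    return 1 - Fraction(rendered_pixel_share) * rendered_frame_share


dlss3 = ai_generated_fraction(Fraction(1, 4), 1)
dlss4 = ai_generated_fraction(Fraction(1, 4), 3)
print(dlss3, float(dlss3))  # 7/8 0.875
print(dlss4, float(dlss4))  # 15/16 0.9375
```

Under these assumptions only 1/8 of displayed pixels are conventionally rendered in the first case and 1/16 in the second; other quality modes or frame-generation settings would give different shares.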

### What is NVIDIA Nemotron?

As of 2026 Catanzaro is one of three vice presidents leading NVIDIA's Nemotron initiative, an effort to release open models, datasets, and training recipes rather than weights alone [5][15]. He has described the program's scope plainly: "Nemotron is not just a model. What we're trying to do with Nemotron is to support openly developed AI" [5]. By his account the broader effort involves more than 500 full-time technical staff, and he frames the business rationale around the ecosystem: "We know that it's in our interest to help the ecosystem grow, because it creates opportunity for us" [5]. NVIDIA released Nemotron Nano v2, a nine-billion-parameter hybrid state-space model that it called roughly six times faster than similarly sized models, in August 2025 together with most of its pretraining data, and followed with the Nemotron 3 generation [16]. Catanzaro gave the opening address at the Nemotron Summit during the NeurIPS conference and presented the ecosystem at NVIDIA's GTC 2026 event [5][15]. He has also been candid about the constraints of the work, noting in 2026 that even NVIDIA's own research teams have to compete for scarce GPUs [17].

## What is Bryan Catanzaro recognized for?

Catanzaro is recognized less for a single prize than for the unusual reach of his work into everyday practice. cuDNN sits underneath essentially every major deep learning framework, Megatron-LM is a standard system for training frontier models, and the DLSS technology his group helped build generates most of the pixels gamers see on screen in titles that use it [3][4][5]. His research papers, spanning parallel computing, speech, generative models, and large-scale training, are among the most cited in the field [18]. He is a frequent keynote speaker at industry and academic venues, including NVIDIA's GTC and workshops at NeurIPS, and a regular voice in technical media on the direction of deep learning systems [5][15]. Within NVIDIA he is one of the senior leaders most closely associated with translating research into the company's commercial AI platforms [2].

## References

1. deeplearning.ai, "Working AI: At the Office with VP of Applied Deep Learning Research Bryan Catanzaro." https://www.deeplearning.ai/blog/working-ai-at-the-office-with-vp-of-applied-deep-learning-research-bryan-catanzaro
2. EECS at Berkeley, "Bryan Catanzaro joins NVIDIA as Vice President of applied deep learning," November 2016. https://eecs.berkeley.edu/news/2016/11/bryan-catanzaro-joins-nvidia-vice-president-applied-deep-learning
3. Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer, "cuDNN: Efficient Primitives for Deep Learning," arXiv:1410.0759, 2014. https://arxiv.org/abs/1410.0759
4. Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro, "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism," arXiv:1909.08053, 2019. https://arxiv.org/abs/1909.08053
5. Nathan Lambert, "Why NVIDIA builds open models with Bryan Catanzaro," Interconnects, 2026. https://www.interconnects.ai/p/why-nvidia-builds-open-models-with
6. Data Innovation, "5 Q's for Bryan Catanzaro, Senior Researcher at Baidu's Silicon Valley Artificial Intelligence Lab," February 2016. https://datainnovation.org/2016/02/5-qs-for-bryan-catanzaro-senior-researcher-at-baidus-silicon-valley-artificial-intelligence-lab/
7. Bryan Catanzaro, "Compilation Techniques for Embedded Data Parallel Languages," PhD dissertation, University of California, Berkeley, 2011. https://escholarship.org/uc/item/6c02679n
8. Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Y. Ng, Bryan Catanzaro, "Deep learning with COTS HPC systems," ICML 2013. https://proceedings.mlr.press/v28/coates13.html
9. Awni Hannun et al., "Deep Speech: Scaling up end-to-end speech recognition," arXiv:1412.5567, 2014. https://arxiv.org/abs/1412.5567
10. Dario Amodei et al., "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin," arXiv:1512.02595, 2015. https://arxiv.org/abs/1512.02595
11. Deepak Narayanan et al., "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM," arXiv:2104.04473, 2021. https://arxiv.org/abs/2104.04473
12. Ryan Prenger, Rafael Valle, Bryan Catanzaro, "WaveGlow: A Flow-based Generative Network for Speech Synthesis," arXiv:1811.00002, 2018. https://arxiv.org/abs/1811.00002
13. Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro, "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs" (pix2pixHD), arXiv:1711.11585, 2017. https://arxiv.org/abs/1711.11585
14. Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro, "Video-to-Video Synthesis" (vid2vid), arXiv:1808.06601, 2018. https://arxiv.org/abs/1808.06601
15. NVIDIA Blog, "Open Secret: How NVIDIA Nemotron Models, Datasets and Techniques Fuel AI Development." https://blogs.nvidia.com/blog/nemotron-open-source-ai/
16. Bryan Catanzaro (@ctnzr), post on X announcing NVIDIA Nemotron Nano v2, August 2025. https://x.com/ctnzr/status/1957504768156561413
17. Fortune, "Even Nvidia's own research teams can't get enough GPUs," April 9, 2026. https://fortune.com/2026/04/09/nvidia-gpu-shortage-impacts-even-nvidias-own-research-teams-bryan-catanzaro-eye-on-ai/
18. Bryan Catanzaro, Google Scholar profile. https://scholar.google.com/citations?user=UZ6kI2AAAAAJ&hl=en
19. IEEE Spectrum, "New AI Speeds Computer Graphics by Up to 5x," 2022. https://spectrum.ieee.org/ai-graphics-neural-rendering
20. AEC Magazine, "Nvidia DLSS 4 uses AI to boost frame rates in viz software," 2025. https://aecmag.com/visualisation/nvidia-dlss-4-uses-ai-to-boost-frame-rates-in-viz-software/