Jürgen Schmidhuber
Last reviewed
May 31, 2026
Sources
14 citations
Review status
Source-backed
Revision
v1 · 2,482 words
Jürgen Schmidhuber is a German computer scientist known for foundational work in deep learning and recurrent neural networks. He is most often associated with long short-term memory (LSTM), the recurrent architecture he developed in the 1990s with his student Sepp Hochreiter, which became the dominant model for sequence tasks such as speech recognition and machine translation in the years before the transformer. [1][2] His research also covers the vanishing gradient problem, artificial curiosity and intrinsic motivation, the Gödel machine, meta-learning, and fast weights, a 1992 mechanism that he and colleagues later connected to the attention used in transformers. [1][6]
Schmidhuber has spent most of his career at the Dalle Molle Institute for Artificial Intelligence Research (IDSIA) in Switzerland, where he has been scientific director since 1995. [3] He was a professor of artificial intelligence at Università della Svizzera italiana (USI) in Lugano from 2009 to 2021, and in October 2021 he became director of the Artificial Intelligence Initiative at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. [4][5] He co-founded the company NNAISENSE in 2014. [4] Alongside the technical work, he is widely known for publicly disputing how credit for several deep-learning ideas has been assigned, including disagreements with Geoffrey Hinton, Yann LeCun, and Yoshua Bengio. Those claims are his own, and several of the researchers he names dispute them. [7][8][9]
Schmidhuber was born on 17 January 1963 in Munich, in what was then West Germany. [1] He studied at the Technical University of Munich (TUM), where he received a diploma in computer science and mathematics in 1987. His diploma thesis dealt with self-referential learning systems and carried the title "Evolutionary principles in self-referential learning, or on learning how to learn." [1] He completed his PhD at TUM in 1991, advised by Wilfried Brauer and Klaus Schulten, and his habilitation followed in 1993. [1][5]
He has said that he set his long-term goal as a teenager: to build a machine that could learn and improve on its own and eventually become more capable than himself, and then to retire. [10] That aim has stayed close to the center of how he describes his research, from early work on learning algorithms through later writing on self-improving systems.
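The contribution most closely tied to Schmidhuber's name is LSTM. The vanishing gradient problem, which makes standard recurrent networks struggle to learn dependencies across long time spans, was identified and analyzed by Hochreiter in his 1991 diploma thesis, supervised by Schmidhuber. [1] When an error signal is propagated backward through time in such a network, it is multiplied at every step by a factor that depends on the recurrent weights and the derivative of the activation function. If those factors stay below one, the signal shrinks exponentially with the length of the time lag; if they stay above one, it can blow up. Hochreiter and Schmidhuber then designed an architecture meant to keep the error signal from decaying as it flows backward through time. The central idea is a memory cell whose state feeds back to itself through a connection of fixed weight 1.0, the "constant error carousel," so that error flowing back along it is neither shrunk nor amplified, together with multiplicative gate units that learn when to let information in and out. The result is a network that can bridge time lags of more than 1000 discrete steps. [2]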
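A toy scalar calculation, under simplifying assumptions chosen only for illustration, shows the contrast between a plain recurrent unit and the constant error carousel. For a plain unit h_t = tanh(w·h_{t-1} + x_t), the gradient ∂h_T/∂h_0 is a product of T factors w·(1 − h_t²). For an additive memory cell c_t = c_{t-1} + i_t·g_t, the self-connection has weight 1.0, so ∂c_t/∂c_{t-1} = 1 at every step. The function names, the weight w = 0.9, and the input sequences below are hypothetical choices, not details taken from the 1997 paper.

```python
import math


def plain_rnn_gradient(w: float, inputs: list[float]) -> float:
    """Return dh_T/dh_0 for the scalar recurrence h_t = tanh(w * h_{t-1} + x_t), with h_0 = 0."""
    h = 0.0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: dh_t/dh_{t-1} = w * tanh'(w * h_{t-1} + x_t) = w * (1 - h_t**2)
        grad *= w * (1.0 - h * h)
    return grad


def cec_gradient(input_gates: list[float], candidates: list[float]) -> tuple[float, float]:
    """Return (c_T, dc_T/dc_0) for a simplified additive memory cell c_t = c_{t-1} + i_t * g_t."""
    c = 0.0
    grad = 1.0
    for i_t, g_t in zip(input_gates, candidates):
        # The input gate i_t scales how much of the candidate value g_t is written.
        c = c + i_t * g_t
        # The self-connection has fixed weight 1.0, so dc_t/dc_{t-1} = 1 at every step.
        grad *= 1.0
    return c, grad


T = 100
plain = plain_rnn_gradient(0.9, [0.0] * T)
print(plain)  # 0.9**100, roughly 2.66e-05
# Write 0.5 at the first step, then keep the input gate closed for the remaining 99 steps.
stored, carried = cec_gradient([1.0] + [0.0] * (T - 1), [0.5] * T)
print(stored, carried)  # 0.5 1.0
```

With zero inputs each backward step of the plain unit multiplies the gradient by 0.9, so after 100 steps only about 0.003 percent of it survives, while the carousel passes it back unchanged and still holds the value written at the first step. The real LSTM cell also has an output gate and squashing functions, which this sketch leaves out.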
The work appeared in stages. The term LSTM was introduced in a 1995 technical report, and the most cited version was published by Hochreiter and Schmidhuber in the journal Neural Computation in 1997. [2] Later refinements added pieces that are now treated as standard. Felix Gers, Schmidhuber, and Fred Cummins added the forget gate around 2000, and a 2005 paper with Alex Graves trained LSTM with full backpropagation through time. [1] In 2006 Graves, Schmidhuber, and colleagues introduced connectionist temporal classification (CTC), a loss function that made it practical to train recurrent networks for sequence labeling without pre-segmented data, which mattered a great deal for handwriting and speech recognition. [1]
The significance of LSTM grew through the 2010s. Once large datasets and faster hardware arrived, LSTM and its variants became the default for natural language processing, speech recognition, and machine translation. Google, Microsoft, Apple, Amazon, and Facebook deployed it in products such as voice assistants and translation systems. [1][10] For roughly half a decade it was the workhorse of applied sequence modeling, until the transformer architecture, introduced in 2017, displaced it for many large-scale tasks. The basic problem LSTM was built to solve, learning from order and context over long spans, remains central, and the architecture is still used and studied. Hochreiter and collaborators later proposed an extended version called xLSTM in 2024.
Beyond LSTM, Schmidhuber's research spans several lines that recur in modern systems.
The vanishing gradient analysis itself is treated as a landmark result in understanding why deep and recurrent networks were hard to train, and it set up much of the later work on architectures and optimization that made gradient descent viable at depth. [1][9]
From around 1990 he developed what he calls artificial curiosity, a framework in which an agent is rewarded for exploring situations that surprise its own predictive model. [1] The setup pits two networks against each other. One, a controller, generates outputs or actions; the other, a world model, tries to predict their consequences. The world model is trained to minimize its prediction error, while the controller is rewarded for producing situations in which that error is high. Schmidhuber presents this as an early form of the adversarial, zero-sum training that later appeared in generative adversarial networks, and as connected to a 1991 method he called predictability minimization. [11] He also connects the same family of ideas to world models, where an agent learns a compact internal simulation of its environment that it can plan or dream within.
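The reward signal in such a setup can be sketched in a few lines, assuming a one-dimensional world, a linear online predictor, and a deterministic environment rule next_obs = obs + action; all three, and the names `OnlinePredictor` and `curiosity_reward`, are hypothetical simplifications rather than Schmidhuber's formulation. The controller's intrinsic reward is the world model's squared prediction error on a transition, measured just before the model learns from it, so a transition stops being rewarding once it has become predictable.

```python
class OnlinePredictor:
    """Hypothetical world model: predicts the next observation linearly from (observation, action)."""

    def __init__(self, lr: float = 0.2) -> None:
        self.w_obs = 0.0
        self.w_act = 0.0
        self.bias = 0.0
        self.lr = lr

    def predict(self, obs: float, action: float) -> float:
        return self.w_obs * obs + self.w_act * action + self.bias

    def learn(self, obs: float, action: float, next_obs: float) -> float:
        """Take one gradient-descent step on the squared error; return the error before the step."""
        err = next_obs - self.predict(obs, action)
        # The world model's objective: reduce its own surprise.
        self.w_obs += self.lr * err * obs
        self.w_act += self.lr * err * action
        self.bias += self.lr * err
        return err * err


def curiosity_reward(model: OnlinePredictor, obs: float, action: float, next_obs: float) -> float:
    """Intrinsic reward for the controller: the world model's surprise at this transition."""
    return model.learn(obs, action, next_obs)


model = OnlinePredictor()
# Hypothetical deterministic environment: next_obs = obs + action. Repeat one transition 20 times.
rewards = [curiosity_reward(model, 1.0, 1.0, 2.0) for _ in range(20)]
print(rewards[0], rewards[-1])  # the first visit earns 4.0; the twentieth earns almost nothing
```

In the full framework the controller would itself be trained to maximize this reward, which pushes it toward transitions the world model cannot yet predict; that controller is omitted here to keep the sketch short.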
In 1992 he described a "fast weights" controller, a network that rapidly reprograms the weights of a second network through outer-product updates rather than carrying state in activations. [1] In 2021 he and colleagues argued that this scheme is mathematically equivalent to the linear, unnormalized form of attention found in some transformer variants, and used that to frame the 1992 work as a precursor to attention. [6] Other researchers treat the connection as real but partial, since the transformer that reshaped the field uses softmax attention trained at a scale the early work did not reach.
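The claimed equivalence can be checked numerically. The sketch below assumes keys, values, and queries are supplied directly (in the 1992 scheme the controlling network would produce them) and uses hypothetical function names. Writing each step's outer product v_t k_tᵀ into a fast weight matrix W and reading it out with W q_t gives exactly the outputs of causal linear attention without softmax or normalization, because W_t q_t equals the sum over i ≤ t of v_i (k_i · q_t).

```python
def outer_add(W: list[list[float]], v: list[float], k: list[float]) -> None:
    """In place: W += v k^T, where W has len(v) rows and len(k) columns."""
    for r in range(len(v)):
        for c in range(len(k)):
            W[r][c] += v[r] * k[c]


def matvec(W: list[list[float]], q: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, q)) for row in W]


def fast_weight_outputs(keys, values, queries):
    """Fast weight memory: write with an outer product each step, then read with the query."""
    W = [[0.0] * len(keys[0]) for _ in range(len(values[0]))]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        outer_add(W, v, k)           # write: W_t = W_{t-1} + v_t k_t^T
        outputs.append(matvec(W, q))  # read:  y_t = W_t q_t
    return outputs


def linear_attention_outputs(keys, values, queries):
    """Causal, unnormalized linear attention: y_t = sum over i <= t of v_i * (k_i . q_t)."""
    outputs = []
    for t, q in enumerate(queries):
        y = [0.0] * len(values[0])
        for i in range(t + 1):
            score = sum(a * b for a, b in zip(keys[i], q))  # no softmax, no normalization
            for r in range(len(y)):
                y[r] += values[i][r] * score
        outputs.append(y)
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fast = fast_weight_outputs(keys, values, queries)
attn = linear_attention_outputs(keys, values, queries)
print(fast)  # [[1.0, 2.0], [3.0, 4.0], [14.0, 18.0]]
print(attn)  # same values
```

The fast weight version keeps a fixed-size matrix and updates it once per step, while the attention version revisits every earlier key at each step; the outputs agree only because no softmax is applied, which is the qualification noted in the paragraph above.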
The Gödel machine, proposed in the 2000s, is a theoretical design for a self-referential program that can rewrite any part of its own code, including its learning algorithm, once it has proved that the rewrite is beneficial. [1] It is a mathematical construction rather than a built system, and it sits alongside his broader writing on meta-learning and "learning to learn," a theme that runs back to his 1987 diploma thesis.
He has also worked on very deep feedforward networks. With Dan Cireşan he trained GPU-based convolutional networks that won image-recognition contests starting in 2011, including a 2011 traffic-sign recognition benchmark on which the system's accuracy exceeded that of human test subjects. [1] In 2015 his group introduced Highway Networks, a way to train feedforward networks hundreds of layers deep using gating units that learn, for each layer, how much of the input to transform and how much to carry through unchanged. The idea is closely related to residual connections, which were introduced the same year and later became standard. [1] Ideas from his lab also touch neural architecture search, the automated design of network structures.
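A highway layer computes y = H(x)·T(x) + x·(1 − T(x)) elementwise, where H is an ordinary nonlinear transform and T is a sigmoid "transform gate." The sketch below is a minimal pure-Python version that assumes tanh for H, square weight matrices so input and output sizes match, and hand-picked weights and gate biases; none of these values come from the 2015 paper. It shows the two extremes of the gate: biased toward carrying, even a stack of 50 layers leaves the input nearly unchanged, and biased toward transforming, the layer behaves like a plain tanh layer.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(W: list[list[float]], b: list[float], x: list[float]) -> list[float]:
    """Affine map W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)), elementwise; W_H and W_T must be square."""
    h = [math.tanh(z) for z in dense(W_H, b_H, x)]  # transform H(x)
    t = [sigmoid(z) for z in dense(W_T, b_T, x)]    # transform gate T(x), each entry in (0, 1)
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


x = [0.5, -0.25]
W_H = [[1.0, 2.0], [-1.0, 0.5]]
b_H = [0.0, 0.0]
W_T = [[0.0, 0.0], [0.0, 0.0]]

# Gate biased toward "carry": after 50 stacked layers the output is still very close to x.
y = x
for _ in range(50):
    y = highway_layer(y, W_H, b_H, W_T, [-20.0, -20.0])
print(y)

# Gate biased toward "transform": the layer acts like an ordinary tanh layer.
z = highway_layer(x, W_H, b_H, W_T, [20.0, 20.0])
print(z)
```

When the gate is mostly closed, each layer's Jacobian is close to the identity, which is the property that lets signals and gradients pass through very deep stacks. A residual connection drops the gate and simply computes H(x) + x.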
Schmidhuber's research home for three decades has been IDSIA, founded in 1988 and named after the Dalle Molle Foundation. The institute is affiliated with USI and the University of Applied Sciences and Arts of Southern Switzerland (SUPSI). He has been its scientific director since 1995, while Luca Maria Gambardella served as the institute's director until 2020. [3] Several researchers who trained or worked there went on to prominent roles in the field, including Alex Graves, Daan Wierstra, and Faustino Gomez.
He held a professorship at TUM from 2004 to 2009, then moved to USI in Lugano as professor of artificial intelligence, a position he held from 2009 to 2021 and continued afterward in an adjunct capacity. [1][5] In 2014 he co-founded NNAISENSE, a Lugano-based company aimed at general-purpose AI and industrial applications, with former students and colleagues including Faustino Gomez, who became chief executive. [4][12]
In September 2021 KAUST announced his appointment as director of its Artificial Intelligence Initiative, and he joined on 1 October 2021. [4][5] At KAUST he has continued as a professor of computer science and serves as co-chair of the university's Center of Excellence for Generative AI, while retaining his role at IDSIA. [5] In interviews he has framed Saudi Arabia's AI investment as a chance for a new period of scientific growth, and he has argued for open and widely available AI, summarized in his slogan "AI for all." [13]
Schmidhuber is well known for arguing in public that several ideas now central to deep learning were introduced earlier than the commonly cited sources, often by researchers who go uncredited, and in some cases by his own group. These are his positions, stated in essays on his website, in talks, and on social media. A number of the people he names disagree, and the dispute is itself frequently discussed in coverage of the field. [7][8][9]
The most prominent thread concerns the 2018 Turing Award, given to Hinton, Bengio, and LeCun for deep learning. Schmidhuber has written that these three "heavily cite each other" while failing to credit earlier pioneers, and he maintains detailed lists of what he calls misattributed results. [8][9] He also published a critique of the 2019 Honda Prize awarded to Hinton, identifying what he described as several misleading attributions of credit. In these writings he traces parts of the field's foundations to earlier authors: he notes that backpropagation has roots in work from the 1960s and a 1970 implementation, and he points to Alexey Ivakhnenko's networks of the 1960s as early deep learning. [9]
On generative adversarial networks, Schmidhuber argues that the 2014 GAN paper, whose authors include Ian Goodfellow and Bengio, restates his 1990 artificial curiosity principle and is closely related to his 1991 predictability minimization, and that it should have cited that earlier work and acknowledged the connection. [11] Goodfellow has responded that he does not see the two as the same method, though he later revised the paper to cite Schmidhuber and to spell out the differences. [11] Schmidhuber has continued to press the point, including objecting publicly when the GAN paper received a NeurIPS Test of Time award in 2024.
On the vanishing gradient problem, he credits Hochreiter's 1991 analysis as the first, and has noted that a 1994 paper by Bengio and colleagues on the same difficulty did not cite it. [1][9]
Other researchers have characterized the pattern differently. In a statement to The New York Times, LeCun wrote that Schmidhuber "is manically obsessed with recognition and keeps claiming credit he doesn't deserve for many, many things," and that he "systematically" stands up at the end of talks to claim credit, "generally not in a justified manner." [7] Schmidhuber has replied that such statements come without specific examples. Hinton, in a 2020 Reddit comment about the broader argument, wrote that he had never claimed to have invented backpropagation, attributing the influential 1986 formulation to David Rumelhart and describing his own contribution as showing that the method could learn useful internal representations. [9] The exchanges have become well enough known that researchers sometimes use "to be schmidhubered" as informal shorthand for having a result's originality publicly questioned. [7]
The Wikipedia biography of Schmidhuber has at times carried a notice that a major contributor may have a close connection to the subject, which is worth keeping in mind when weighing online accounts of these debates. The fairest summary is that Schmidhuber and his lab produced a substantial body of early work, that the chronology of several ideas is genuinely tangled, and that the question of how much credit any one group is owed remains contested rather than settled. [7][9]
Schmidhuber's standing in the field rests largely on the long-run influence of LSTM and his other recurrent-network research. [1][2] He received the Helmholtz Award of the International Neural Network Society in 2013, and the Neural Networks Pioneer Award of the IEEE Computational Intelligence Society in 2016, cited for "pioneering contributions to deep learning and neural networks." [14] He is a member of the European Academy of Sciences and Arts. [1]
Coverage of his work often reaches for grand labels. A 2016 New York Times profile was headlined "When A.I. Matures, It May Call Jürgen Schmidhuber 'Dad,'" and various outlets have called him a father of modern AI or of generative AI. [7][10] Schmidhuber himself usually pushes the credit further back, naming earlier researchers as the real originators of deep learning. He was not among the recipients of the 2018 Turing Award or the 2024 Nobel Prizes connected to machine learning, an omission that he and some commentators have flagged given the reach of his early work. [8] He has also taken contrarian public positions on AI risk, declining to sign a 2023 letter calling for a pause on large AI experiments and arguing that much of the alarm around existential risk from AI is overstated. [13]
| Field | Detail |
|---|---|
| Full name | Jürgen Schmidhuber |
| Born | 17 January 1963, Munich, West Germany |
| Nationality | German |
| Fields | Artificial intelligence, deep learning, recurrent neural networks |
| Education | TUM, diploma 1987; PhD 1991; habilitation 1993 |
| Doctoral advisors | Wilfried Brauer, Klaus Schulten |
| Known for | LSTM; vanishing gradient analysis; artificial curiosity; Gödel machine; fast weights |
| IDSIA | Scientific director since 1995 |
| USI Lugano | Professor of AI, 2009 to 2021 (adjunct afterward) |
| TUM | Professor, 2004 to 2009 |
| KAUST | Director, AI Initiative, since October 2021 |
| Company | Co-founder of NNAISENSE (2014) |
| Notable students and collaborators | Sepp Hochreiter, Alex Graves, Faustino Gomez, Daan Wierstra |
| Awards | INNS Helmholtz Award (2013); IEEE Neural Networks Pioneer Award (2016) |