Jürgen Schmidhuber

Updated Jun 25, 2026

Jürgen Schmidhuber (anglicized Jurgen Schmidhuber, born 17 January 1963) is a German computer scientist best known as a co-inventor of long short-term memory (LSTM), the recurrent neural network architecture he developed in the 1990s with his student Sepp Hochreiter. ^[1]^[2] The Hochreiter and Schmidhuber LSTM paper, published in the journal Neural Computation in 1997, became the most cited neural network paper of the 20th century and has surpassed 80,000 citations. LSTM powered speech recognition and machine translation at Google, Apple, Amazon, Microsoft, and Facebook in the years before the transformer. ^[1]^[2]^[15]

Beyond LSTM, his research covers the vanishing gradient problem, artificial curiosity and intrinsic motivation, the Gödel machine, meta-learning, and a 1992 mechanism called fast weights that he and others later connected to the attention used in transformers. ^[1]^[6] Schmidhuber has spent most of his career as scientific director of the Dalle Molle Institute for Artificial Intelligence Research (IDSIA) in Switzerland, a role he has held since 1995, and since October 2021 he has directed the Artificial Intelligence Initiative at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. ^[3]^[4]^[5] He co-founded the company NNAISENSE in 2014. ^[4] He is also widely known for publicly disputing the credit assigned for several deep-learning ideas, including disagreements with Geoffrey Hinton, Yann LeCun, and Yoshua Bengio; those claims are his own, and several of the researchers he names dispute them. ^[7]^[8]^[9]

Who is Jürgen Schmidhuber?

Schmidhuber was born on 17 January 1963 in Munich, in what was then West Germany. ^[1] He studied at the Technical University of Munich (TUM), where he received a diploma in computer science and mathematics in 1987. His diploma thesis dealt with self-referential learning systems and carried the title "Evolutionary principles in self-referential learning, or on learning how to learn." ^[1] He completed his PhD at TUM in 1991, advised by Wilfried Brauer and Klaus Schulten, and his habilitation followed in 1993. ^[1]^[5]

He has said that he set his long-term goal as a teenager: to build a machine that could learn and improve on its own and eventually become more capable than himself, and then to retire. ^[10] That aim has stayed close to the center of how he describes his research, from early work on learning algorithms through later writing on self-improving systems.

What is long short-term memory (LSTM)?

The contribution most closely tied to Schmidhuber's name is LSTM. The vanishing gradient problem, which makes standard recurrent networks struggle to learn dependencies across long time spans, was identified and analyzed by Hochreiter in his 1991 diploma thesis, supervised by Schmidhuber. ^[1] The two then designed an architecture meant to keep an error signal from decaying as it flows backward through time. The central idea is a memory cell protected by a "constant error carousel," with multiplicative gate units that learn when to let information in and out. The result is a network that can bridge time lags of more than 1000 discrete steps. ^[2]
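The gating idea described above can be illustrated with a minimal scalar sketch in Python. It is not the authors' implementation: the function names, the scalar state, and the hand-set gate pre-activations are assumptions made for illustration, and a real LSTM learns its gate weights over vectors. The point is only that the cell state is carried forward with a fixed weight of 1, so a stored value survives many steps while the input gate stays closed.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used for the gate units."""
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(cell, content, in_gate_pre, out_gate_pre):
    """One step of a scalar memory cell with an input gate and an output gate.

    The gate pre-activations are passed in directly (hypothetical values);
    in a trained network they would be computed from learned weights.
    """
    input_gate = sigmoid(in_gate_pre)    # how much new content to admit
    output_gate = sigmoid(out_gate_pre)  # how much of the cell to expose
    # Constant error carousel: the old state is carried over with weight 1.0,
    # so d(new_cell)/d(cell) is exactly 1 and the error signal does not decay.
    new_cell = cell + input_gate * math.tanh(content)
    hidden = output_gate * math.tanh(new_cell)
    return new_cell, hidden


def hold_value(steps=1000):
    """Write one value, then keep the input gate closed for `steps` steps."""
    cell, _ = memory_cell_step(0.0, 2.0, 20.0, -20.0)   # input gate open: write
    stored = cell
    for _ in range(steps):
        # Input gate closed: distracting content barely touches the cell.
        cell, _ = memory_cell_step(cell, -2.0, -20.0, -20.0)
    return stored, cell
```

Calling `hold_value(1000)` returns the value right after the write and the value 1000 steps later; in this sketch the two agree to within a small tolerance because the closed gate admits almost nothing.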

The work appeared in stages. The term LSTM was introduced in a 1995 technical report, and the most cited version was published by Hochreiter and Schmidhuber in the journal Neural Computation in 1997. ^[2] That paper became the most cited neural network paper of the 20th century, with more than 80,000 citations as of the mid-2020s. ^[15] Later refinements added pieces that are now treated as standard. Felix Gers, Schmidhuber, and Fred Cummins added the forget gate around 2000, and a 2005 paper with Alex Graves trained LSTM with full backpropagation through time. ^[1] A related 2006 method, connectionist temporal classification, made it practical to train recurrent networks for sequence labeling without pre-segmented data, which mattered a great deal for handwriting and speech. ^[1]

The significance of LSTM grew through the 2010s. Once large datasets and faster hardware arrived, LSTM and its variants became the default for natural language processing, speech recognition, and machine translation. Google, Microsoft, Apple, Amazon, and Facebook deployed it in products such as voice assistants and translation systems. ^[1]^[10] For roughly half a decade it was the workhorse of applied sequence modeling, until the transformer architecture, introduced in 2017, displaced it for many large-scale tasks. The basic problem LSTM was built to solve, learning from order and context over long spans, remains central, and the architecture is still used and studied. Hochreiter and collaborators later proposed an extended version called xLSTM in 2024.

What else did Schmidhuber invent?

Beyond LSTM, Schmidhuber's research spans several lines that recur in modern systems.

The vanishing gradient analysis itself is treated as a landmark result in understanding why deep and recurrent networks were hard to train, and it set up much of the later work on architectures and optimization that made gradient descent viable at depth. ^[1]^[9]
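A small numeric sketch can make the difficulty concrete. It assumes a one-unit recurrent network with update h_t = tanh(w * h_{t-1}), which is a simplification chosen here for illustration and not the setting of the original analysis. The gradient of the final state with respect to the initial state is a product of one factor per time step, and when each factor is below 1 the product shrinks geometrically.

```python
import math


def gradient_through_time(w, steps, h0=0.5):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1}).

    `w` and `h0` are hypothetical illustration values.
    """
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
    return grad
```

With `w = 0.9`, every factor is at most 0.9, so the gradient after 100 steps is bounded by 0.9 raised to the 100th power, which is far too small to drive learning about an event 100 steps in the past.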

From around 1990 he developed what he calls artificial curiosity, a framework in which an agent is rewarded for exploring situations that surprise its own predictive model. ^[1] The setup pits two networks against each other: one generates outputs or actions, the other tries to predict the consequences, and each works against the other's objective. Schmidhuber presents this as an early form of the adversarial, zero-sum training that later appeared in generative adversarial networks, and as connected to a 1991 method he called predictability minimization. ^[11] He also connects the same family of ideas to world models, where an agent learns a compact internal simulation of its environment that it can plan or dream within.
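The reward signal at the heart of this framework can be sketched in a few lines of Python. This is a deliberately reduced illustration, not Schmidhuber's formulation: the class name, the table-based predictor, and the learning rate are all assumptions, and the original work uses neural networks for both the controller and the predictor. It shows only that an intrinsic reward equal to the predictor's squared error fades as a situation becomes predictable, which pushes an agent toward what it has not yet learned.

```python
class Predictor:
    """Tiny world model: one running estimate per discrete action (hypothetical)."""

    def __init__(self, n_actions, lr=0.5):
        self.estimates = [0.0] * n_actions
        self.lr = lr

    def surprise_and_update(self, action, outcome):
        """Return the intrinsic reward for this observation, then learn from it."""
        error = outcome - self.estimates[action]
        self.estimates[action] += self.lr * error  # move the estimate toward the outcome
        return error * error                       # reward = squared prediction error


def curiosity_rewards(outcome, repeats):
    """Intrinsic rewards from repeating one action whose outcome never changes."""
    model = Predictor(n_actions=1)
    return [model.surprise_and_update(0, outcome) for _ in range(repeats)]
```

Repeating the same action yields a strictly shrinking sequence of rewards, so a reward-maximizing controller in this sketch would eventually have to try something else to stay surprised.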

In 1992 he described a "fast weights" controller, a network that rapidly reprograms the weights of a second network through outer-product updates rather than carrying state in activations. ^[1] In 2021 he and colleagues argued that this scheme is mathematically equivalent to the linear, unnormalized form of attention found in some transformer variants, and used that to frame the 1992 work as a precursor to attention. ^[6] Other researchers treat the connection as real but partial, since the transformer that reshaped the field uses softmax attention trained at a scale the early work did not reach.
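The claimed equivalence can be checked numerically with a short sketch. The function names and the plain-list representation are assumptions made here for illustration, and the sketch covers only the unnormalized linear case discussed above, not softmax attention. Writing each key-value pair into a weight matrix as an outer product and then multiplying by a query gives the same vector as summing the values weighted by key-query dot products.

```python
def outer_update(W, value, key):
    """Add the outer product value * key^T to the fast weight matrix W in place."""
    for r, v in enumerate(value):
        for c, k in enumerate(key):
            W[r][c] += v * k


def fast_weight_read(W, query):
    """Apply the fast weight matrix to a query vector."""
    return [sum(w * q for w, q in zip(row, query)) for row in W]


def linear_attention(keys, values, query):
    """Unnormalized linear attention: sum over i of (k_i . q) * v_i."""
    out = [0.0] * len(values[0])
    for key, value in zip(keys, values):
        score = sum(k * q for k, q in zip(key, query))
        for d, v in enumerate(value):
            out[d] += score * v
    return out


def compare(keys, values, query):
    """Return both results so they can be checked against each other."""
    W = [[0.0] * len(keys[0]) for _ in range(len(values[0]))]
    for key, value in zip(keys, values):
        outer_update(W, value, key)  # "program" the fast weights
    return fast_weight_read(W, query), linear_attention(keys, values, query)
```

For example, `compare([[1.0, 0.0], [0.5, 2.0]], [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]], [2.0, 1.0])` returns two vectors that match. The fast weight view stores a fixed-size matrix, while the attention view keeps every key and value; that difference in memory is one reason the linear form is studied.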

The Gödel machine, first proposed in 2003, is a theoretical design for a self-referential program that can rewrite any part of its own code, including its learning algorithm, once it has proved that the rewrite is beneficial. ^[1] It is a mathematical construction rather than a built system, and it sits alongside his broader writing on meta-learning and "learning to learn," a theme that runs back to his 1987 diploma thesis.

He has also worked on very deep feedforward networks. With Dan Cireşan he trained GPU-based convolutional networks that won image-recognition contests starting in 2011, including a 2011 traffic-sign benchmark where the system reported better-than-human accuracy. ^[1] In 2015 his group introduced Highway Networks, a way to train feedforward networks hundreds of layers deep using learned gating, which is closely related to the residual connections that became standard the same year. ^[1] Ideas from his lab also touch neural architecture search, the automated design of network structures.
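The learned gating in a highway layer can be sketched as follows. This is a scalar simplification with hand-picked weights, both assumptions for illustration; the published layers operate on vectors with trained parameters. A transform gate T blends a nonlinear transform H(x) with the unchanged input, y = T(x) * H(x) + (1 - T(x)) * x, so a gate biased toward zero lets the signal pass through many layers almost untouched.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-x))


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Scalar highway layer: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = math.tanh(w_h * x + b_h)   # nonlinear transform H(x)
    t = sigmoid(w_t * x + b_t)     # transform gate T(x), between 0 and 1
    return t * h + (1.0 - t) * x   # carry the input where the gate is closed


def stack_highway(x, depth, b_t):
    """Pass x through `depth` identical layers (weights are hypothetical)."""
    for _ in range(depth):
        x = highway_layer(x, w_h=1.5, b_h=0.1, w_t=1.0, b_t=b_t)
    return x
```

With a strongly negative gate bias such as `b_t = -20.0`, `stack_highway(0.3, 100, -20.0)` stays very close to 0.3, while a strongly positive bias turns each layer into a plain nonlinear transform. A residual connection corresponds roughly to fixing the carry path open instead of learning the gate.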

Where has Schmidhuber worked?

Schmidhuber's research home for three decades has been IDSIA, founded in 1988 and named after the Dalle Molle Foundation. The institute is affiliated with the Università della Svizzera italiana (USI) and the University of Applied Sciences and Arts of Southern Switzerland (SUPSI). He has been its scientific director since 1995; Luca Maria Gambardella served as the institute's director until 2020. ^[3] Several researchers who trained or worked there went on to prominent roles in the field, including Alex Graves, Daan Wierstra, and Faustino Gomez.

He held a professorship at TUM from 2004 to 2009, then moved to USI in Lugano as professor of artificial intelligence, a position he held from 2009 to 2021 and continued afterward in an adjunct capacity. ^[1]^[5] In 2014 he co-founded NNAISENSE, a Lugano-based company aimed at general-purpose AI and industrial applications, with former students and colleagues including Faustino Gomez, who became chief executive. ^[4]^[12]

In September 2021 KAUST announced his appointment as director of its Artificial Intelligence Initiative, and he joined on 1 October 2021. ^[4]^[5] At KAUST he is a professor of computer science and co-chair of the university's Center of Excellence for Generative AI, and he retains his role at IDSIA. ^[5] In interviews he has framed Saudi Arabia's AI investment as a chance for a new period of scientific growth, and he has argued for open and widely available AI, summarized in his slogan "AI for all." ^[13]

Why is Schmidhuber known for credit disputes?

Schmidhuber is well known for arguing in public that several ideas now central to deep learning were introduced earlier than the commonly cited sources, often by researchers who go uncredited, and in some cases by his own group. These are his positions, stated in essays on his website, in talks, and on social media. A number of the people he names disagree, and the dispute is itself frequently discussed in coverage of the field. ^[7]^[8]^[9] In a 2016 New York Times interview he summarized his complaint plainly: "Certain researchers in my field have acted as if they invented something, although it was invented by other people whom they did not even mention." ^[7] His guiding principle, as he states it, is that "the inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it." ^[8]

One early flashpoint was the 2015 Nature survey "Deep Learning" by LeCun, Bengio, and Hinton. Schmidhuber published a rebuttal titled "Critique of Paper by 'Deep Learning Conspiracy' (Nature 521 p 436)," arguing that the survey failed to credit earlier pioneers, including Alexey Ivakhnenko, whose 1960s networks he describes as the first working deep learning. ^[8]^[9] The same year he published "Deep Learning in Neural Networks: An Overview" in the journal Neural Networks, a long survey that traces what he regards as the true chronology of the field. ^[1]

The most prominent thread concerns the 2018 Turing Award, given to Hinton, Bengio, and LeCun for deep learning. Schmidhuber has written that these three "heavily cite each other" while failing to credit earlier pioneers, and he maintains detailed lists of what he calls misattributed results. ^[8]^[9] He also published a critique of the 2019 Honda Prize awarded to Hinton, identifying what he described as several misleading attributions of credit, and he traces parts of the foundations to earlier authors, noting that backpropagation has roots in work from the 1960s and a 1970 implementation, and pointing to Ivakhnenko's 1960s networks as early deep learning. ^[9]

On generative adversarial networks, Schmidhuber argues that the 2014 GAN paper, whose authors include Ian Goodfellow and Bengio, restates his 1990 artificial curiosity principle and is closely related to his 1991 predictability minimization, and that it should have cited that earlier work and been named in relation to it. ^[11] Goodfellow has responded that he does not see the two as the same method, though he later revised his paper to cite Schmidhuber and to spell out the differences. Goodfellow has also noted that Schmidhuber reviewed the 2014 paper and at that time asked only for a citation to predictability minimization, not to artificial curiosity. ^[11] Schmidhuber has continued to press the point. When the GAN paper received a NeurIPS Test of Time award in 2024, he objected publicly on social media, writing: "Yet another award for plagiarism." ^[16]

On the vanishing gradient problem, he credits Hochreiter's 1991 analysis as the first, and has noted that a 1994 paper by Bengio and colleagues on the same difficulty did not cite it. ^[1]^[9]

Other researchers have characterized the pattern differently. In a statement to The New York Times, LeCun wrote that Schmidhuber "is manically obsessed with recognition and keeps claiming credit he doesn't deserve for many, many things," and that "it causes him to systematically stand up at the end of every talk and claim credit for what was just presented, generally not in a justified manner." ^[7] Schmidhuber has replied that such statements come without specific examples. Hinton, in a 2020 Reddit comment about the broader argument, wrote that he had never claimed to have invented backpropagation, attributing the influential 1986 formulation to David Rumelhart and describing his own contribution as showing that the method could learn useful internal representations. ^[9] The exchanges have become well enough known that researchers sometimes use "to be schmidhubered" as informal shorthand for having a result's originality publicly questioned. ^[7]

The Wikipedia biography of Schmidhuber has at times carried a notice that a major contributor may have a close connection to the subject, which is worth keeping in mind when weighing online accounts of these debates. The fairest summary is that Schmidhuber and his lab produced a substantial body of early work, that the chronology of several ideas is genuinely tangled, and that the question of how much credit any one group is owed remains contested rather than settled. ^[7]^[9]

What awards has Schmidhuber received?

Schmidhuber's standing in the field rests largely on the long-run influence of LSTM and his other recurrent-network research. ^[1]^[2] He received the Helmholtz Award of the International Neural Network Society in 2013, and the Neural Networks Pioneer Award of the IEEE Computational Intelligence Society in 2016, cited for "pioneering contributions to deep learning and neural networks." ^[14] He is a member of the European Academy of Sciences and Arts. ^[1]

Coverage of his work often reaches for grand labels. A 2016 New York Times profile was headlined "When A.I. Matures, It May Call Jürgen Schmidhuber 'Dad,'" and various outlets have called him a father of modern AI or of generative AI. ^[7]^[10] Schmidhuber himself usually pushes the credit further back, naming earlier researchers as the real originators of deep learning. He was not among the recipients of the 2018 Turing Award or the 2024 Nobel Prizes connected to machine learning, an omission that he and some commentators have flagged given the reach of his early work. ^[8] He has also taken contrarian public positions on AI risk, declining to sign a 2023 letter calling for a pause on large AI experiments and arguing that much of the alarm around existential risk from AI is overstated. ^[13]

Facts

Field	Detail
Full name	Jürgen Schmidhuber
Born	17 January 1963, Munich, West Germany
Nationality	German
Fields	Artificial intelligence, deep learning, recurrent neural networks
Education	TUM, diploma 1987; PhD 1991; habilitation 1993
Doctoral advisors	Wilfried Brauer, Klaus Schulten
Known for	LSTM; vanishing gradient analysis; artificial curiosity; Gödel machine; fast weights
LSTM paper	Hochreiter & Schmidhuber, Neural Computation, 1997; 80,000+ citations
IDSIA	Scientific director since 1995
USI Lugano	Professor of AI, 2009 to 2021 (adjunct afterward)
TUM	Professor, 2004 to 2009
KAUST	Director, AI Initiative, since October 2021
Company	Co-founder of NNAISENSE (2014)
Notable students	Sepp Hochreiter, Alex Graves, Faustino Gomez, Daan Wierstra
Awards	INNS Helmholtz Award (2013); IEEE Neural Networks Pioneer Award (2016)

References

[1] Jürgen Schmidhuber. Wikipedia. https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber
[2] Hochreiter, S. and Schmidhuber, J. "Long Short-Term Memory." Neural Computation, vol. 9, no. 8, 1997, pp. 1735 to 1780. https://direct.mit.edu/neco/article/9/8/1735/6109/Long-Short-Term-Memory
[3] Dalle Molle Institute for Artificial Intelligence (IDSIA USI-SUPSI). https://idsia.usi-supsi.ch/
[4] "Schmidhuber named Director of KAUST AI Initiative." KAUST News, 2021. https://www.kaust.edu.sa/en/news/schmidhuber-named-director-of-kaust-ai-initiative
[5] "Jürgen Schmidhuber, Professor, Computer Science." KAUST CEMSE profile. https://cemse.kaust.edu.sa/profiles/jurgen-schmidhuber
[6] Schlag, I., Irie, K. and Schmidhuber, J. "Linear Transformers Are Secretly Fast Weight Programmers." 2021. https://arxiv.org/abs/2102.11174
[7] Markoff, J. "When A.I. Matures, It May Call Jürgen Schmidhuber 'Dad.'" The New York Times, 27 November 2016. https://www.nytimes.com/2016/11/27/technology/artificial-intelligence-pioneer-jurgen-schmidhuber-overlooked.html
[8] Schmidhuber, J. "Critique of Paper by 'Deep Learning Conspiracy' (Nature 521 p 436)." people.idsia.ch, June 2015. https://people.idsia.ch/~juergen/deep-learning-conspiracy.html
[9] "Who Invented Backpropagation? Hinton Says He Didn't, but His Work Made It Popular." Synced, 23 April 2020. https://syncedreview.com/2020/04/23/who-invented-backpropagation-hinton-says-he-didnt-but-his-work-made-it-popular/
[10] "Jürgen Schmidhuber: Towards Human-Level Artificial Intelligence." History of Data Science. https://www.historyofdatascience.com/jurgen-schmidhuber/
[11] Schmidhuber, J. "Generative Adversarial Networks Are Special Cases of Artificial Curiosity (1990) and Also Closely Related to Predictability Minimization (1991)." Neural Networks, 2020. https://arxiv.org/abs/1906.04493
[12] "Alma Mundi Ventures Invests in AI Startup Nnaisense." PR Newswire, 2017. https://www.prnewswire.com/news-releases/alma-mundi-ventures-invests-in-ai-startup-nnaisense-the-pioneers-of-very-deep-learning-300391576.html
[13] "Why one of the world's major AI pioneers is betting big on Saudi Arabia." Rest of World, 2025. https://restofworld.org/2025/juergen-schmidhuber-ai-saudi-arabia-tech/
[14] "2016 IEEE CIS Neural Networks Pioneer Award Goes to Jürgen Schmidhuber." people.idsia.ch. https://people.idsia.ch/~juergen/nnpioneeraward.html
[15] "Deep Learning: Our Miraculous Year 1990-1991." people.idsia.ch (overview of LSTM impact and citation record). https://people.idsia.ch/~juergen/deep-learning-miraculous-year-1990-1991.html
[16] Schmidhuber, J. (@SchmidhuberAI), post on the 2014 GAN paper receiving the NeurIPS 2024 Test of Time Award. X (Twitter), December 2024. https://x.com/SchmidhuberAI/status/1867621020808728875
