Sepp Hochreiter
Last reviewed
Sources
15 citations
Review status
Source-backed
Revision
v1 · 1,699 words
Sepp Hochreiter (born February 14, 1967) is a German computer scientist regarded as one of the founders of modern deep learning. As a student at the Technical University of Munich, he identified the vanishing gradient problem that had blocked the training of recurrent neural networks, and in 1997 he and Jürgen Schmidhuber published Long Short-Term Memory (LSTM), the architecture that solved it. LSTM became one of the most widely used and most cited methods in the history of machine learning, eventually powering a generation of products in speech recognition and translation [1][2]. Since 2006 Hochreiter has been a professor at Johannes Kepler University Linz (JKU) in Austria, where he heads the Institute for Machine Learning and the LIT AI Lab [1]. In December 2023 he co-founded NXAI, a Linz-based company built around xLSTM, an extended recurrent architecture positioned as a European alternative to the transformer [3][13].
Hochreiter was born on February 14, 1967, in Mühldorf am Inn, in the German state of Bavaria [1]. He studied computer science at the Technical University of Munich (TUM), where in 1991 he completed his Diplom, the German equivalent of a master's degree, with a thesis titled "Untersuchungen zu dynamischen neuronalen Netzen" ("Investigations into dynamic neural networks"), supervised by Jürgen Schmidhuber [1][4]. The thesis is now recognized as a landmark in the field. In it Hochreiter analyzed why gradient-based training fails on long sequences, formally describing what became known as the vanishing and exploding gradient problem [4][5]. Because the error signal that flows backward through a recurrent network is rescaled at every time step, it shrinks or grows exponentially with the number of steps, and such networks were effectively unable to learn dependencies spanning many time steps.
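The exponential behavior described above can be illustrated with a deliberately simplified sketch. It assumes a scalar linear recurrence h_t = w * h_{t-1}, which is not the setting analyzed in the thesis; the function name `backprop_factor` and the weight values are chosen purely for illustration.

```python
def backprop_factor(w: float, steps: int) -> float:
    """Return d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}.

    Illustrative assumption: a one-unit linear network. Real recurrent
    networks multiply by a Jacobian matrix at each step instead of a scalar.
    """
    factor = 1.0
    for _ in range(steps):
        # Each step back in time rescales the error signal by the same weight.
        factor *= w
    return factor


if __name__ == "__main__":
    for w in (0.9, 1.0, 1.1):
        print(f"w={w}: factor after 100 steps = {backprop_factor(w, 100):.3e}")
```

With a weight slightly below 1 the factor collapses toward zero over 100 steps, and with a weight slightly above 1 it grows by several orders of magnitude. Only a weight of exactly 1 leaves the signal unchanged.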
He remained at TUM for his doctorate, which he completed in 1999 with a dissertation on generalization in low-complexity neural networks ("Generalisierung bei neuronalen Netzen geringer Komplexität"). His formal doctoral advisor was Wilfried Brauer, although he continued to collaborate closely with Schmidhuber during this period [1][6]. The common shorthand that Hochreiter took his PhD under Schmidhuber is a simplification: Schmidhuber supervised the 1991 diploma thesis, while Brauer was the official advisor of record for the 1999 doctorate.
The solution Hochreiter had begun sketching in 1991 was fully developed and published in 1997, when he and Schmidhuber introduced Long Short-Term Memory in the journal Neural Computation [2]. LSTM is a recurrent neural network in which a memory cell is protected by multiplicative gates, including an input gate and an output gate (a forget gate was added in later work by Felix Gers and colleagues). The gates let the network store information over long intervals and retrieve or discard it on demand, which keeps the error signal from vanishing as it propagates back through time.
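A minimal sketch of the gating idea, assuming a single scalar memory cell with only an input gate and an output gate (no forget gate), may help make this concrete. The parameter names (`w_i`, `u_i`, `b_i` and so on) and the two example parameter sets are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar memory cell with input and output gates only."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    # The cell state is carried forward additively, so the error signal
    # flowing back through c is not rescaled at every step.
    c = c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical parameters: a strongly negative input-gate bias keeps the
# gate closed, a strongly positive one keeps it open.
CLOSED_INPUT_GATE = {
    "w_i": 0.0, "u_i": 0.0, "b_i": -100.0,
    "w_o": 0.0, "u_o": 0.0, "b_o": 100.0,
    "w_g": 1.0, "u_g": 0.0, "b_g": 0.0,
}
OPEN_INPUT_GATE = dict(CLOSED_INPUT_GATE, b_i=100.0)

if __name__ == "__main__":
    h, c = 0.0, 1.0
    for _ in range(50):
        h, c = lstm_step(1.0, h, c, CLOSED_INPUT_GATE)
    print("cell state after 50 steps with the input gate closed:", c)
```

With the input gate closed, the stored value survives an arbitrary number of steps. With it open, the cell accumulates the candidate value. Without a forget gate the cell can only add to its contents, which is the limitation the later forget-gate extension addressed.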
For several years the paper attracted little notice, in part because the computing power needed to exploit it did not yet exist. It then became the workhorse of sequence modeling. By the 2010s LSTM underpinned Google Translate, Apple's Siri and predictive keyboard, Amazon's Alexa, and a wide range of speech recognition and handwriting systems, placing the architecture on a very large number of consumer devices [1][15]. The 1997 paper is frequently described as the most cited neural network paper of the twentieth century [1]. LSTM was also part of the technical lineage that led to the large language models of the 2020s, although the transformer largely displaced recurrent designs after 2017. In 2021 Hochreiter received the IEEE Computational Intelligence Society Neural Networks Pioneer Award for the work [1].
After his doctorate Hochreiter held research positions at the Technical University of Berlin and at the University of Colorado Boulder before returning to a German-speaking institution [1]. In 2006 he was appointed professor at Johannes Kepler University Linz, where he founded and led the Institute of Bioinformatics. Over the following decade his group applied machine learning to genomics and drug discovery while continuing core deep learning research [1].
In 2017 Hochreiter became head of the newly created LIT AI Lab, part of the Linz Institute of Technology, and in 2018 his chair was reorganized as the Institute for Machine Learning, which he continues to direct [1][3]. The lab grew into one of the larger academic deep learning groups in Europe and trained many of the researchers who would later join NXAI.
Hochreiter's research spans the theory and practice of deep learning as well as computational biology. Several of his contributions have become standard tools.
| Method or paper | Year | Significance |
|---|---|---|
| Vanishing gradient analysis (Diplom thesis) | 1991 | First detailed account of why gradient-trained RNNs fail to learn long-range dependencies [4] |
| Long Short-Term Memory | 1997 | Gated recurrent memory cell; foundational sequence-learning architecture [2] |
| FABIA | 2010 | Factor-analysis biclustering for gene expression data [1] |
| DeepTox | 2014 | Won the NIH Tox21 computational toxicity-prediction challenge [7] |
| Self-Normalizing Networks (SELU) | 2017 | Self-normalizing activations for very deep feedforward networks [8] |
| Hopfield Networks is All You Need | 2020 | Linked transformer attention to modern Hopfield networks [9] |
| xLSTM | 2024 | Extended LSTM; recurrent alternative to the transformer [10] |
| xLSTM 7B | 2025 | 7B-parameter recurrent LLM built for efficient inference [11] |
In bioinformatics, his group built methods such as FABIA for biclustering of gene expression data and FARMS for analyzing microarrays. In 2014 their DeepTox system won the United States NIH Tox21 Data Challenge, then the largest public benchmark for computational toxicity prediction, taking the overall grand challenge and most of its subtasks [7]. DeepTox was an early demonstration that deep neural networks could outperform established cheminformatics methods at predicting whether molecules are toxic, work directly relevant to drug discovery.
In 2017 Hochreiter and colleagues, led by Günter Klambauer, introduced self-normalizing neural networks and the scaled exponential linear unit (SELU) activation, which pushes neuron activations toward zero mean and unit variance automatically and so allows very deep feedforward networks to be trained without batch normalization [8]. In 2020 his group published "Hopfield Networks is All You Need," showing that the attention mechanism at the heart of transformers is mathematically equivalent to the update rule of a modern Hopfield network with continuous states. The result connected associative memory to the dominant architecture of the day and gave a new way to analyze how transformer layers behave [9].
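Both ideas are compact enough to sketch in a few lines. The example below assumes the commonly published SELU constants and a plain-list representation of stored patterns; the function names `selu` and `hopfield_retrieve`, and the inverse temperature parameter `beta`, are illustrative choices rather than the papers' code.

```python
import math

# Commonly published SELU constants (rounded to double precision).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit applied to one value."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


def hopfield_retrieve(patterns, query, beta=1.0):
    """One update of a continuous modern Hopfield network.

    The new state is a softmax-weighted average of the stored patterns,
    which has the same form as attention with keys and values both set
    to the stored patterns.
    """
    scores = [beta * sum(p_j * q_j for p_j, q_j in zip(p, query)) for p in patterns]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * p[j] for w, p in zip(weights, patterns)) for j in range(len(query))]


if __name__ == "__main__":
    print(selu(1.0), selu(-1.0))
    print(hopfield_retrieve([[1.0, 0.0], [0.0, 1.0]], [0.9, 0.1], beta=20.0))
```

A large `beta` makes the softmax sharply peaked, so a noisy query snaps to the nearest stored pattern in a single update. A small `beta` blends the patterns instead.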
In December 2023 Hochreiter co-founded NXAI GmbH in Linz together with the company builder Netural X and PIERER Digital Holding, in close cooperation with JKU [3][13]. He serves as co-founder and chief scientist. The company's stated mission is to commercialize research from his lab and to build what it calls "AI made in Europe," and it announced plans to raise on the order of 100 million euros to scale the effort [3][13].
NXAI is built around xLSTM (Extended Long Short-Term Memory), an attempt to modernize the recurrent approach so it can compete with transformers at large scale. The architecture was described in a paper published in May 2024, led by Maximilian Beck and Korbinian Pöppel with Hochreiter as senior author [10]. xLSTM adds exponential gating with stabilization and introduces two new memory cell variants in place of the original LSTM cell: sLSTM, which retains a scalar memory but adds a new memory-mixing mechanism, and mLSTM, which uses a matrix memory with a covariance update rule and is fully parallelizable during training. Stacked into residual blocks, xLSTM scales linearly with sequence length and uses roughly constant memory during inference, in contrast to the quadratic cost of transformer attention [10].
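The recurrent form of the matrix memory can be sketched roughly as follows. This is a simplified illustration and not NXAI's implementation: it assumes the gate pre-activations `i_pre` and `f_pre` are supplied directly, uses a sigmoid forget gate, and omits the output gate, key scaling, learned projections, and normalization layers. The function names are hypothetical.

```python
import math


def init_state(d: int):
    """Empty matrix memory C (d x d), normalizer n (d), stabilizer m (scalar)."""
    return ([[0.0] * d for _ in range(d)], [0.0] * d, 0.0)


def mlstm_step(state, q, k, v, i_pre, f_pre):
    """One recurrent step of a simplified mLSTM-style cell."""
    C, n, m = state
    d = len(k)
    log_f = -math.log1p(math.exp(-f_pre))  # log of a sigmoid forget gate
    # Stabilization: track a running maximum so the exponentials stay bounded.
    m_new = max(log_f + m, i_pre)
    i_gate = math.exp(i_pre - m_new)
    f_gate = math.exp(log_f + m - m_new)
    # Covariance-style update: decay the old memory, add the outer product v k^T.
    C_new = [[f_gate * C[r][c] + i_gate * v[r] * k[c] for c in range(d)]
             for r in range(d)]
    n_new = [f_gate * n[c] + i_gate * k[c] for c in range(d)]
    # Read out with the query, normalized by the accumulated key mass.
    numerator = [sum(C_new[r][c] * q[c] for c in range(d)) for r in range(d)]
    denominator = max(abs(sum(n_new[c] * q[c] for c in range(d))), 1.0)
    h = [value / denominator for value in numerator]
    return (C_new, n_new, m_new), h


if __name__ == "__main__":
    state = init_state(2)
    state, h = mlstm_step(state, q=[1.0, 0.0], k=[1.0, 0.0], v=[2.0, 3.0],
                          i_pre=0.0, f_pre=0.0)
    print(h)
```

The state passed between steps has a fixed size regardless of how many tokens have been processed, which is the source of the constant-memory inference and linear scaling described above. Transformer attention, by contrast, keeps every past key and value.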
In March 2025 the team released xLSTM 7B, a 7-billion-parameter language model trained on 2.3 trillion tokens [11][12]. NXAI reported that it matched comparable transformer and Mamba models on standard benchmarks while delivering the highest generation throughput and the lowest memory footprint among models of its size, and it released the weights, training code, and inference kernels openly on GitHub and Hugging Face [11][12]. The company has emphasized efficiency-driven uses where linear scaling matters most, including robotics, automotive systems, and industrial time series. In 2025 it introduced TiRex, a 35-million-parameter xLSTM time series forecasting model that topped international zero-shot forecasting leaderboards despite being far smaller than competing models from Google, Amazon, and Salesforce [14].
Hochreiter has become a prominent advocate for an independent European path in AI. He argues that Europe should compete not by matching the United States and China on raw compute and model size but by building more efficient architectures that, as he puts it, learn what to forget, and he has framed xLSTM as a chance to challenge transformer dominance from the heart of Europe [3][15]. As of 2026 he remains a professor at JKU Linz and chief scientist of NXAI [3][12].