# Sepp Hochreiter

> Source: https://aiwiki.ai/wiki/sepp_hochreiter
> Updated: 2026-06-08
> Categories: People
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

Sepp Hochreiter (born February 14, 1967) is a German computer scientist regarded as one of the founders of modern [deep learning](/wiki/deep_learning). As a student at the [Technical University of Munich](/wiki/technical_university_of_munich), he identified the [vanishing gradient problem](/wiki/vanishing_gradient_problem) that had blocked the training of [recurrent neural networks](/wiki/recurrent_neural_network), and in 1997 he and [Jürgen Schmidhuber](/wiki/jurgen_schmidhuber) published [Long Short-Term Memory](/wiki/lstm) (LSTM), the architecture that solved it. LSTM became one of the most widely used and most cited methods in the history of [machine learning](/wiki/machine_learning), eventually powering a generation of products in speech recognition and translation [1][2]. Since 2006 Hochreiter has been a professor at [Johannes Kepler University Linz](/wiki/johannes_kepler_university_linz) (JKU) in Austria, where he heads the Institute for Machine Learning and the LIT AI Lab [1]. In December 2023 he co-founded [NXAI](/wiki/nxai), a Linz-based company built around xLSTM, an extended recurrent architecture positioned as a European alternative to the [transformer](/wiki/transformer) [3][13].

## Education

Hochreiter was born on February 14, 1967, in Mühldorf am Inn, in the German state of Bavaria [1]. He studied computer science at the Technical University of Munich (TUM), where in 1991 he completed his Diplom, the German equivalent of a master's degree, with a thesis titled "Untersuchungen zu dynamischen neuronalen Netzen" ("Investigations into dynamic neural networks"), supervised by Jürgen Schmidhuber [1][4]. The thesis is now recognized as a landmark in the field. In it Hochreiter analyzed why gradient-based training fails on long sequences, formally describing what became known as the vanishing and exploding gradient problem [4][5]. Because the error signal that flows backward through a recurrent network either shrinks or grows exponentially at each step, such networks were effectively unable to learn dependencies spanning many time steps.
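The exponential shrinking or growth of the error signal can be illustrated with a toy calculation. The sketch below is not taken from the thesis: it assumes the simplest possible case, a scalar linear recurrence `h_t = weight * h_{t-1}`, where the backpropagated signal is multiplied by the same `weight` at every step. The function name `backprop_factor` is hypothetical.

```python
def backprop_factor(weight, steps):
    """Factor by which an error signal is scaled after `steps` backward steps.

    Assumes a scalar linear recurrence h_t = weight * h_{t-1}, so each
    backward step multiplies the signal by `weight`.
    """
    factor = 1.0
    for _ in range(steps):
        factor *= weight
    return factor


# A per-step factor below 1 makes the signal vanish over 50 steps ...
shrinking = backprop_factor(0.5, 50)
# ... while a per-step factor above 1 makes it explode.
growing = backprop_factor(1.5, 50)

print(f"weight 0.5, 50 steps: {shrinking:.3e}")
print(f"weight 1.5, 50 steps: {growing:.3e}")
```

In a real recurrent network the per-step factor is a Jacobian matrix that also depends on the activation function, but the same compounding applies.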

He remained at TUM for his doctorate, which he completed in 1999 with a dissertation on generalization in low-complexity neural networks ("Generalisierung bei neuronalen Netzen geringer Komplexität"). His formal doctoral advisor was Wilfried Brauer, although he continued to collaborate closely with Schmidhuber during this period [1][6]. The common shorthand that Hochreiter took his PhD under Schmidhuber is a simplification: Schmidhuber supervised the 1991 diploma thesis, while Brauer was the official advisor of record for the 1999 doctorate.

## LSTM

The solution Hochreiter had begun sketching in 1991 was fully developed and published in 1997, when he and Schmidhuber introduced Long Short-Term Memory in the journal Neural Computation [2]. LSTM is a recurrent neural network in which a memory cell is protected by multiplicative gates, including an input gate and an output gate (a forget gate was added in later work by Felix Gers and colleagues). The gates let the network store information over long intervals and retrieve or discard it on demand, which keeps the error signal from vanishing as it propagates back through time.
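The gating idea can be sketched for a single scalar memory cell. This is a simplified illustration, not the formulation in the 1997 paper: the gate values are passed in directly as numbers between 0 and 1 instead of being computed by learned weights, there is no forget gate, and the name `lstm_cell_step` is hypothetical.

```python
import math


def lstm_cell_step(c_prev, candidate, input_gate, output_gate):
    """One step of a toy scalar memory cell with input and output gates.

    The cell state is carried forward with a fixed factor of 1, so a closed
    input gate (0.0) leaves it unchanged however many steps pass.
    """
    c = c_prev + input_gate * candidate   # write only when the input gate opens
    h = output_gate * math.tanh(c)        # expose only when the output gate opens
    return c, h


c = 0.0
# Store a value: input gate open, output gate closed.
c, h = lstm_cell_step(c, candidate=0.8, input_gate=1.0, output_gate=0.0)
# Hold it for 1000 steps while ignoring distracting inputs.
for _ in range(1000):
    c, h = lstm_cell_step(c, candidate=-0.3, input_gate=0.0, output_gate=0.0)
# Retrieve it: output gate open.
c, h = lstm_cell_step(c, candidate=0.0, input_gate=0.0, output_gate=1.0)
print(c, h)
```

Because the state is carried forward with a factor of exactly 1 while the input gate is closed, the derivative of the cell state with respect to its previous value is also 1, which is why the error signal along this path neither shrinks nor grows.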

For several years the paper attracted little notice, in part because the computing power needed to exploit it did not yet exist. It then became the workhorse of sequence modeling. By the 2010s LSTM underpinned [Google](/wiki/google) Translate, Apple's Siri and predictive keyboard, Amazon's Alexa, and a wide range of [speech recognition](/wiki/speech_recognition) and handwriting systems, placing the architecture on a very large number of consumer devices [1][15]. The 1997 paper is frequently described as the most cited [neural network](/wiki/neural_network) paper of the twentieth century [1]. LSTM was also part of the technical lineage that led to the [large language models](/wiki/large_language_model) of the 2020s, before the transformer largely displaced recurrent designs after 2017. In 2021 Hochreiter received the IEEE Computational Intelligence Society Neural Networks Pioneer Award for the work [1].

## Academic career

After his doctorate Hochreiter held research positions at the Technical University of Berlin and at the University of Colorado Boulder [1]. In 2006 he was appointed professor at Johannes Kepler University Linz in Austria, where he founded and led the Institute of Bioinformatics. Over the following decade his group applied machine learning to genomics and drug discovery while continuing core deep learning research [1].

In 2017 Hochreiter became head of the newly created LIT AI Lab, part of the Linz Institute of Technology, and in 2018 his chair was reorganized as the Institute for Machine Learning, which he continues to direct [1][3]. The lab grew into one of the larger academic deep learning groups in Europe and trained many of the researchers who would later join NXAI.

## Research

Hochreiter's research spans the theory and practice of deep learning as well as computational biology. Several of his contributions have become standard tools.

| Method or paper | Year | Significance |
| --- | --- | --- |
| Vanishing gradient analysis (Diplom thesis) | 1991 | First detailed account of why RNNs cannot learn long-range dependencies [4] |
| Long Short-Term Memory | 1997 | Gated recurrent memory cell; foundational sequence-learning architecture [2] |
| FABIA | 2010 | Factor-analysis biclustering for gene expression data [1] |
| DeepTox | 2014 | Won the NIH Tox21 computational toxicity-prediction challenge [7] |
| Self-Normalizing Networks (SELU) | 2017 | Self-normalizing activations for very deep feedforward networks [8] |
| Hopfield Networks is All You Need | 2020 | Linked transformer attention to modern Hopfield networks [9] |
| xLSTM | 2024 | Extended LSTM; recurrent alternative to the transformer [10] |
| xLSTM 7B | 2025 | 7B-parameter recurrent LLM built for efficient inference [11] |

In [bioinformatics](/wiki/bioinformatics), his group built methods such as FABIA for biclustering of gene expression data and FARMS for analyzing microarrays. In 2014 their DeepTox system won the United States NIH Tox21 Data Challenge, then the largest public benchmark for computational toxicity prediction, taking the overall grand challenge and most of its subtasks [7]. DeepTox was an early demonstration that deep neural networks could outperform established cheminformatics methods at predicting whether molecules are toxic, work directly relevant to [drug discovery](/wiki/drug_discovery).

In 2017 Hochreiter and colleagues, led by Günter Klambauer, introduced self-normalizing neural networks and the scaled exponential linear unit (SELU) activation, which pushes neuron activations toward zero mean and unit variance automatically and so allows very deep feedforward networks to be trained without [batch normalization](/wiki/batch_normalization) [8]. In 2020 his group published "Hopfield Networks is All You Need," showing that the [attention mechanism](/wiki/attention_mechanism) at the heart of transformers is mathematically equivalent to the update rule of a modern [Hopfield network](/wiki/hopfield_network) with continuous states. The result connected associative memory to the dominant architecture of the day and gave a new way to analyze how transformer layers behave [9].
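The correspondence between the Hopfield update and attention can be made concrete with a small sketch. It assumes the update takes the form "new state = softmax-weighted average of the stored patterns", with the weights given by scaled dot products between the query and each pattern. That is the shape of attention when the stored patterns serve as both keys and values; in a transformer, keys and values are separate learned projections. The function names, the toy patterns, and the value of `beta` are illustrative choices, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_update(patterns, query, beta=1.0):
    """One retrieval step: attention with stored patterns as keys and values."""
    # Similarity of the query to every stored pattern (scaled dot product).
    scores = [beta * sum(p_j * q_j for p_j, q_j in zip(p, query)) for p in patterns]
    weights = softmax(scores)
    # New state: weighted average of the stored patterns.
    return [sum(w * p[j] for w, p in zip(weights, patterns)) for j in range(len(query))]


patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy = [0.9, 0.2, -0.1]  # corrupted version of the first pattern
retrieved = hopfield_update(patterns, noisy, beta=8.0)
print([round(x, 3) for x in retrieved])
```

With a large `beta` the softmax concentrates on the closest stored pattern, so a single update pulls the noisy query almost all the way back to it.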

## NXAI and xLSTM

In December 2023 Hochreiter co-founded NXAI GmbH in Linz together with the company builder Netural X and PIERER Digital Holding, in close cooperation with JKU [3][13]. He serves as co-founder and chief scientist. The company's stated mission is to commercialize research from his lab and to build what it calls "AI made in Europe," and it announced plans to raise on the order of 100 million euros to scale the effort [3][13].

NXAI is built around xLSTM (Extended Long Short-Term Memory), an attempt to modernize the recurrent approach so it can compete with transformers at large scale. The architecture was described in a paper published in May 2024, led by Maximilian Beck and Korbinian Pöppel with Hochreiter as senior author [10]. xLSTM adds exponential gating with stabilization and introduces two memory-cell variants: sLSTM, which keeps the scalar memory of the original LSTM but adds a new memory-mixing mechanism, and mLSTM, which uses a matrix memory with a covariance update rule and is fully parallelizable during training. Stacked into residual blocks, xLSTM has compute that scales linearly with sequence length and uses roughly constant memory during inference, in contrast to the quadratic cost of transformer attention [10].
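A recurrent step with a matrix memory and stabilized exponential gating can be sketched as follows. This is a heavily simplified illustration loosely modeled on the mLSTM description, not NXAI's implementation: it omits the learned projections, the output gate, and multi-head structure. The sigmoid forget gate and the lower bound of 1 in the normalizer are assumptions of this sketch, and the name `mlstm_step` is hypothetical.

```python
import math


def mlstm_step(state, q, k, v, i_pre, f_pre):
    """One recurrent step of a toy matrix-memory cell.

    state = (C, n, m): matrix memory, normalizer vector, log-scale stabilizer.
    i_pre and f_pre are gate pre-activations; the input gate is exponential.
    """
    C, n, m = state
    log_f = -math.log1p(math.exp(-f_pre))   # log of a sigmoid forget gate
    m_new = max(log_f + m, i_pre)           # running maximum in log space
    i_gate = math.exp(i_pre - m_new)        # stabilized exponential input gate
    f_gate = math.exp(log_f + m - m_new)    # forget gate rescaled to match
    d = len(k)
    # Covariance-style update: decay the old memory, add the outer product v k^T.
    C_new = [[f_gate * C[r][c] + i_gate * v[r] * k[c] for c in range(d)]
             for r in range(len(v))]
    n_new = [f_gate * n[c] + i_gate * k[c] for c in range(d)]
    # Read out with the query, normalized by the accumulated key mass.
    denom = max(abs(sum(n_new[c] * q[c] for c in range(d))), 1.0)
    h = [sum(C_new[r][c] * q[c] for c in range(d)) / denom for r in range(len(v))]
    return (C_new, n_new, m_new), h


zero_state = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], 0.0)
state, h1 = mlstm_step(zero_state, q=[1.0, 0.0], k=[1.0, 0.0], v=[2.0, 3.0],
                       i_pre=0.0, f_pre=0.0)
state, h2 = mlstm_step(state, q=[0.0, 1.0], k=[0.0, 1.0], v=[5.0, 7.0],
                       i_pre=0.0, f_pre=0.0)
# A huge input-gate pre-activation would overflow a naive exp(1000.0);
# subtracting the running maximum keeps every exponent at or below zero.
_, h_big = mlstm_step(zero_state, q=[1.0, 0.0], k=[1.0, 0.0], v=[2.0, 3.0],
                      i_pre=1000.0, f_pre=0.0)
print(h1, h2, h_big)
```

The state holds a fixed-size matrix, a vector, and a scalar regardless of how many tokens have been processed, which is the sense in which inference memory stays constant and per-token cost does not grow with sequence length.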

In March 2025 the team released xLSTM 7B, a 7-billion-parameter language model trained on 2.3 trillion tokens [11][12]. NXAI reported that it matched comparable transformer and [Mamba](/wiki/mamba) models on standard benchmarks while delivering the highest generation throughput and the lowest memory footprint among models of its size, and it released the weights, training code, and inference kernels openly on GitHub and Hugging Face [11][12]. The company has emphasized efficiency-driven uses where linear scaling matters most, including robotics, automotive systems, and industrial time series. In 2025 it introduced TiRex, a 35-million-parameter xLSTM [time series forecasting](/wiki/time_series_forecasting) model that topped international zero-shot forecasting leaderboards despite being far smaller than competing models from Google, Amazon, and Salesforce [14].

Hochreiter has become a prominent advocate for an independent European path in AI. He argues that Europe should compete not by matching the United States and China on raw compute and model size but by building more efficient architectures that, as he puts it, learn what to forget, and he has framed xLSTM as a chance to challenge transformer dominance from the heart of Europe [3][15]. As of 2026 he remains a professor at JKU Linz and chief scientist of NXAI [3][12].

## References

1. "Sepp Hochreiter." Wikipedia. https://en.wikipedia.org/wiki/Sepp_Hochreiter
2. S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory." Neural Computation 9(8):1735-1780, 1997. https://direct.mit.edu/neco/article/9/8/1735/6109/Long-Short-Term-Memory
3. "AI Made in Europe: Entrepreneurial Reinforcement for Leading Researcher Sepp Hochreiter and His xLSTM." JKU Linz, December 2023. https://www.jku.at/en/research/research-at-the-jku/research-news/detail/news/ai-made-in-europe-spitzenforscher-sepp-hochreiter-und-sein-xlstm-erhalten-unternehmerische-verstaerkung-fuer-europaeisches-large-language-model/
4. S. Hochreiter, "Untersuchungen zu dynamischen neuronalen Netzen." Diploma thesis, Technical University of Munich, 1991.
5. J. Schmidhuber, "Deep Learning: Our Miraculous Year 1990-1991." arXiv:2005.05744, 2020. https://arxiv.org/abs/2005.05744
6. "Josef Hochreiter." Mathematics Genealogy Project. https://www.genealogy.math.ndsu.nodak.edu/id.php?id=220038
7. A. Mayr, G. Klambauer, T. Unterthiner, S. Hochreiter, "DeepTox: Toxicity Prediction using Deep Learning." Frontiers in Environmental Science, 2016. https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2015.00080/full
8. G. Klambauer, T. Unterthiner, A. Mayr, S. Hochreiter, "Self-Normalizing Neural Networks." arXiv:1706.02515, 2017 (NeurIPS 2017). https://arxiv.org/abs/1706.02515
9. H. Ramsauer et al. (incl. S. Hochreiter), "Hopfield Networks is All You Need." arXiv:2008.02217, 2020. https://arxiv.org/abs/2008.02217
10. M. Beck, K. Pöppel, M. Spanring, A. Auer, et al. (incl. S. Hochreiter), "xLSTM: Extended Long Short-Term Memory." arXiv:2405.04517, 2024 (NeurIPS 2024). https://arxiv.org/abs/2405.04517
11. M. Beck et al., "xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference." arXiv:2503.13427, 2025 (ICML 2025). https://arxiv.org/abs/2503.13427
12. "xLSTM 7B: NXAI releases its new xLSTM 7B model." NXAI, 2025. https://www.nx-ai.com/en/news/xlstm-7b-nxai-releases-its-new-xlstm-7b-model
13. "Start-up NXAI founded by Sepp Hochreiter aims for EUR 100m funding round." MA Insights, 2024. https://www.mainsights.io/ma-news/start-up-nxai-founded-by-sepp-hochreiter-aims-for-eur-100m-funding-round
14. "AI Forecasting Breakthrough: TiRex Sets New Global Standard with IT:U Research Involvement." IT:U, 2025. https://it-u.at/en/news/ai-forecasting-breakthrough-tirex-sets-new-global-standard-with-itu-research-involvement/
15. "AI pioneer wants Europe to forge its own nimbler way forward." The Japan Times, March 15, 2025. https://www.japantimes.co.jp/business/2025/03/15/tech/ai-europe-new-approach/

