Apertus
Last reviewed
Jun 8, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,575 words
Apertus is a fully open, multilingual large language model developed in Switzerland and released on September 2, 2025, by EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS) under the Swiss AI Initiative [1][2]. Its name is Latin for "open," reflecting a design goal of complete transparency: the project released not only the model weights but also the full training data composition, source code, training recipe, evaluation suites, and intermediate checkpoints [2][3]. Available in 8 billion and 70 billion parameter sizes and trained on roughly 15 trillion tokens spanning more than 1,800 languages, Apertus is positioned as a public-good, sovereign AI effort and as one of the most transparent fully open models built at its scale. Compliance with data-protection and copyright norms is built into its data pipeline [1][3].
Apertus is a decoder-only transformer language model released in two variants, Apertus-8B and Apertus-70B, each available in a base (pretrained) version and an instruction-tuned version [4][5]. Unlike many releases that publish only model weights, Apertus follows a strict open-science standard: every scientific artifact from its development cycle, including the data-preparation scripts, the training and evaluation code, intermediate training checkpoints, and the full documentation of the data mixture, was released under a permissive license [3]. The accompanying technical report is titled "Apertus: Democratizing Open and Compliant LLMs for Global Language Environments" [3].
The model is distinguished by three design priorities: broad multilingual coverage, including under-represented languages; full reproducibility and transparency; and legal and ethical compliance by design, with the data pipeline filtered to respect website opt-out signals and to remove personal and other undesired content before training begins [1][3]. About 40 percent of the pretraining data is non-English, a substantially higher share than in many comparable models [1][2]. Apertus was made available at launch through the Swiss telecommunications operator Swisscom, the Hugging Face platform, and the Public AI network [1][2].
Apertus was produced by the Swiss AI Initiative, a national collaboration whose three core institutional partners are EPFL (École Polytechnique Fédérale de Lausanne), ETH Zurich, and CSCS, the Swiss National Supercomputing Centre [1][2]. Imanol Schlag served as the technical lead for the project at ETH Zurich [6]. The initiative frames the model as a contribution to digital sovereignty and to open science, aiming to give researchers, developers, and the public a transparent foundation model that is not controlled by a single private company [1][2].
Training was carried out on the "Alps" supercomputer operated by CSCS in Lugano, one of the most powerful systems in Europe for AI research [1][7]. The 70B model was trained across 4,096 NVIDIA GH200 Grace Hopper superchips using the Megatron-LM framework, and the overall effort consumed more than 10 million GPU hours [1][4]. After release, the broader Public AI network coordinated additional inference capacity, allocating over 115,000 GPU hours across 20 clusters in five or more countries, with contributions from partners including AWS, Exoscale, AI Singapore, Cudo Compute, CSCS, and NCI Australia [2].
The defining feature of Apertus is the combination of full openness with regulatory compliance. The model is released under the permissive Apache 2.0 license, which allows use in education and research as well as broad societal and commercial applications [4][5]. Beyond the weights, the team published the complete recipe needed to reproduce the model: data-preparation and filtering code, training code, evaluation harnesses, and the intermediate checkpoints generated during pretraining [3].
On the compliance side, Apertus was trained exclusively on openly available data and was developed with attention to Swiss data-protection law, Swiss copyright law, and the transparency obligations of the EU AI Act [1][3]. The data pipeline retroactively respects machine-readable opt-out requests encoded in robots.txt, meaning that content from websites that signal they do not wish to be used for training is excluded even when those signals were added after the data was first collected [1][3]. The pipeline also filters out non-permissive, toxic, and personally identifiable content before training [3]. This compliance-by-design approach is one reason the project is cited as a reference point for how a frontier-scale model can be built while honoring opt-outs and removing personal data.
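The retroactive opt-out idea can be illustrated with a minimal sketch: previously crawled documents are re-checked against each site's current robots.txt, and anything the site now disallows is dropped. This is not the Apertus pipeline itself. The function names, the `ExampleTrainingBot` user agent, the example hosts, and the rule that a host without a known robots.txt counts as not opted out are all assumptions made for illustration.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def filter_retroactively(documents, current_robots, user_agent):
    """Keep only documents that the host's *current* robots.txt still allows.

    documents      : list of dicts with a "url" key (already-crawled data)
    current_robots : dict mapping host -> latest robots.txt text
    """
    kept = []
    for doc in documents:
        host = urlparse(doc["url"]).netloc
        robots_txt = current_robots.get(host)
        # Assumption: a host with no known robots.txt has not opted out.
        if robots_txt is None or allowed_by_robots(robots_txt, user_agent, doc["url"]):
            kept.append(doc)
    return kept


# Hypothetical crawl snapshot and hypothetical present-day robots.txt files.
crawl = [
    {"url": "https://open.example/article", "crawled": "2023-01"},
    {"url": "https://optout.example/post", "crawled": "2023-01"},
    {"url": "https://norobots.example/page", "crawled": "2023-01"},
]
current_robots = {
    "open.example": "User-agent: *\nAllow: /",
    # Opt-out added after the crawl date; it still removes the old document.
    "optout.example": "User-agent: ExampleTrainingBot\nDisallow: /",
}

kept = filter_retroactively(crawl, current_robots, "ExampleTrainingBot")
kept_urls = [doc["url"] for doc in kept]
print(kept_urls)
```

The key design point is that the check uses the latest robots.txt rather than the one in force at crawl time, so a later opt-out applies to earlier snapshots as well.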
To further limit privacy and copyright risk at the modeling level, Apertus is trained with the Goldfish objective, a modified language-modeling loss that computes the training loss on only a randomly selected subset of tokens in each sequence. By masking a fraction of tokens from the loss, the Goldfish objective strongly suppresses verbatim memorization and recall of training data while largely preserving downstream task performance [3][8].
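As a rough illustration of the idea, the sketch below averages the next-token negative log-likelihood over only the tokens that a mask keeps. It assumes a simple independent random mask that drops each token with probability 1/k. The parameter `k`, the function names, and the example probabilities are hypothetical, and the actual selection rule and drop rate used for Apertus are not stated here.

```python
import math
import random


def goldfish_mask(num_tokens: int, k: int, rng: random.Random) -> list:
    """Return a keep-mask: True means the token contributes to the loss.

    Assumption: each token is dropped independently with probability 1/k.
    """
    return [rng.random() >= 1.0 / k for _ in range(num_tokens)]


def masked_nll(token_log_probs, mask) -> float:
    """Mean negative log-likelihood over kept tokens only.

    token_log_probs : log-probability the model gave to each correct token
    mask            : keep-mask of the same length
    """
    kept = [-lp for lp, keep in zip(token_log_probs, mask) if keep]
    if not kept:
        return 0.0  # nothing kept in this sequence, so no loss signal
    return sum(kept) / len(kept)


# Hypothetical per-token probabilities assigned to the correct next token.
log_probs = [math.log(p) for p in (0.5, 0.1, 0.25, 0.5)]

# A fixed mask makes the effect easy to see: the second token is ignored,
# so the model receives no gradient pushing it to reproduce that token.
fixed_mask = [True, False, True, True]
loss = masked_nll(log_probs, fixed_mask)

# A random mask with drop probability 1/4.
random_mask = goldfish_mask(len(log_probs), k=4, rng=random.Random(0))
print(loss, random_mask)
```

Because some tokens in every sequence never enter the loss, the model cannot be trained to reproduce the whole sequence exactly, which is the intuition behind the reduced verbatim recall described above.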
Apertus uses a decoder-only transformer architecture with several deliberate engineering choices aimed at training stability at scale. It employs pre-normalization with RMSNorm and query-key normalization, and it introduces xIELU, a learnable activation function used in place of more conventional choices [3][4]. Optimization uses the AdEMAMix optimizer together with a Warmup-Stable-Decay learning-rate schedule, a combination the team reports enabled effective training of the 70B model on up to 4,096 GPUs [4][8]. The Apertus-70B configuration uses 80 transformer layers, and both models support a context length of up to 65,536 tokens [4][8]. Pretraining proceeded over 15 trillion tokens using a staged data curriculum that progressively mixed web, code, and mathematics data [4][8].
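Two of these components are simple enough to sketch in a few lines: RMSNorm applied to queries and keys before the attention dot product (query-key normalization), and a Warmup-Stable-Decay learning-rate schedule. The code below is a generic illustration, not the Apertus implementation. The linear warmup and linear decay shapes, the `eps` value, the 1/sqrt(d) scaling after normalization, the omission of learned gains, and all step counts are assumptions.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale a vector so its root-mean-square is approximately 1."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def qk_norm_score(q, k):
    """Attention logit with RMSNorm applied to the query and key first.

    Normalizing q and k bounds the logit magnitude regardless of how large
    the raw activations grow, which is the stability motivation for QK-norm.
    """
    qn, kn = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))


def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Warmup-Stable-Decay schedule (assumed linear warmup and linear decay)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps  # warmup phase
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr  # stable phase: constant learning rate
    frac = min((step - decay_start) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * frac  # decay phase


# Scaling the query by 10 leaves the normalized attention logit almost unchanged.
q, k = [3.0, 4.0, 0.0, 1.0], [1.0, 2.0, 2.0, 0.0]
print(qk_norm_score(q, k), qk_norm_score([10 * v for v in q], k))

# Hypothetical schedule: 100 steps, 10 warmup steps, 20 decay steps.
print([wsd_lr(s, 100, 1.0, 10, 20) for s in (0, 9, 50, 90, 100)])
```

One practical appeal of a Warmup-Stable-Decay schedule is that the long constant phase does not depend on the total step count, so a run can be extended or branched into a decay phase from an intermediate checkpoint.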
Post-training consisted of supervised fine-tuning followed by alignment. The instruction-tuned variants were aligned using QRPO (a preference-optimization method) to produce the chat-capable Instruct models [4]. All models were trained in bfloat16 precision [4][5].
| Specification | Detail |
|---|---|
| Developer | Swiss AI Initiative (EPFL, ETH Zurich, CSCS) |
| Release date | September 2, 2025 |
| Model type | Decoder-only transformer language model |
| Sizes | 8B and 70B parameters (70B reported as 71B) |
| Variants | Base and Instruct, for each size |
| Training tokens | ~15 trillion |
| Languages | Over 1,800 (1,811 natively supported) |
| Non-English data share | ~40% |
| Context length | 65,536 tokens |
| Activation function | xIELU |
| Normalization | RMSNorm with QK-normalization |
| Optimizer | AdEMAMix (Warmup-Stable-Decay schedule) |
| Training objective | Goldfish loss (memorization-suppressing) |
| Alignment method | Supervised fine-tuning + QRPO |
| Precision | bfloat16 |
| Hardware | 4,096 NVIDIA GH200 (Alps supercomputer, CSCS) |
| Framework | Megatron-LM |
| License | Apache 2.0 |
| Openness | Weights, data recipe, code, checkpoints, evals |
Apertus is built for multilingual AI, with the developers reporting native support for 1,811 languages, including many that are under-represented in mainstream models, such as Swiss German and Romansh [1][4]. The emphasis on broad coverage is reflected in concrete results: the technical report cites Apertus-70B translating from German to Romansh with a BLEU score of 27.8, ahead of Llama-3.3-70B at 21.6 [9].
On standard English-language and multilingual benchmark suites, Apertus is competitive with open peers at the same scale rather than a leader on every metric. According to figures published on the model cards, Apertus-70B records an average pretraining-evaluation score of about 67.5 percent, comparable to Llama 3.1 70B at 67.3 percent and below Qwen2.5-72B at 69.8 percent [4]. The 8B model reports an average around 65.8 percent across its evaluation tasks, with per-benchmark results including ARC at 72.7 percent, HellaSwag at 59.8 percent, WinoGrande at 70.6 percent, PIQA at 79.8 percent, XCOPA at 66.5 percent, and XNLI at 45.2 percent [5]. These numbers should be read as the developers' reported results; independent evaluations have produced their own findings, and Apertus is generally characterized as a solid open model whose primary distinction is transparency and multilingual breadth rather than topping leaderboards [10].
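As a quick consistency check on the 8B figures quoted above, the short sketch below takes the unweighted mean of the six listed benchmark scores. Treating the reported average as a plain mean over exactly these six tasks is an assumption; the model card may aggregate differently.

```python
# Per-benchmark scores for Apertus-8B as quoted in the text (percent).
scores_8b = {
    "ARC": 72.7,
    "HellaSwag": 59.8,
    "WinoGrande": 70.6,
    "PIQA": 79.8,
    "XCOPA": 66.5,
    "XNLI": 45.2,
}

# Assumption: the reported average is an unweighted mean of these six tasks.
average_8b = sum(scores_8b.values()) / len(scores_8b)
print(round(average_8b, 1))
```

Under that assumption, the mean of the six scores rounds to the "around 65.8 percent" figure given for the 8B model.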
Apertus is widely cited as a milestone for European and Swiss sovereign AI and for the open-science approach to building foundation models. By releasing the full stack (weights, data composition, code, recipes, and checkpoints) under a permissive license, it sets a high bar for what "fully open" means, going beyond the open-weight-only releases that are common in the field and aligning closely with the philosophy of efforts such as Allen AI's OLMo [3][10]. Its compliance-by-design data pipeline, which honors robots.txt opt-outs retroactively and strips personal and copyrighted material, is frequently held up as a model for how to build large language models in line with the EU AI Act and European data-protection norms [1][3].
The project sits alongside other European and sovereign open-language-model initiatives, including the EU-funded EuroLLM and the German Teuken model, as part of a broader push to ensure that the continent has transparent, locally governed AI infrastructure rather than depending solely on models from a handful of large foreign companies [10]. As a public-good model trained on public supercomputing infrastructure and released for unrestricted research and commercial use, Apertus is presented by its creators as a demonstration that frontier-scale, multilingual AI can be built openly, reproducibly, and in compliance with the law [1][2].