Apertus


Updated Jun 8, 2026


Apertus is a fully open, multilingual large language model developed in Switzerland and released on September 2, 2025, by EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS) under the Swiss AI Initiative ^[1]^[2]. Its name is Latin for "open," reflecting a design goal of complete transparency: the project released not only the open weights but also the full training data composition, source code, training recipe, evaluation suites, and intermediate checkpoints ^[2]^[3]. Available in 8 billion and 70 billion parameter sizes and trained on roughly 15 trillion tokens spanning more than 1,800 languages, Apertus is positioned as a public-good, sovereign AI effort and as one of the most transparent fully open models built at its scale, and its data pipeline is designed to comply with data-protection and copyright norms ^[1]^[3].

Overview

Apertus is a decoder-only transformer language model released in two variants, Apertus-8B and Apertus-70B, each available in a base (pretrained) version and an instruction-tuned version ^[4]^[5]. Unlike many releases that publish only model weights, Apertus follows a strict open-science standard: every scientific artifact from its development cycle, including the data-preparation scripts, the training and evaluation code, intermediate training checkpoints, and the full documentation of the data mixture, was released under a permissive license ^[3]. The accompanying technical report is titled "Apertus: Democratizing Open and Compliant LLMs for Global Language Environments" ^[3].

The model is distinguished by three design priorities: broad multilingual coverage, including under-represented languages; full reproducibility and transparency; and legal and ethical compliance by design, with the data pipeline filtered to respect website opt-out signals and to remove personal and other undesired content before training begins ^[1]^[3]. About 40 percent of the pretraining data is non-English, a substantially higher share than in many comparable models ^[1]^[2]. Apertus was made available at launch through the Swiss telecommunications operator Swisscom, the Hugging Face platform, and the Public AI network ^[1]^[2].

Developers

Apertus was produced by the Swiss AI Initiative, a national collaboration whose three core institutional partners are EPFL (École Polytechnique Fédérale de Lausanne), ETH Zurich, and CSCS, the Swiss National Supercomputing Centre ^[1]^[2]. Imanol Schlag served as the technical lead for the project at ETH Zurich ^[6]. The initiative frames the model as a contribution to digital sovereignty and to open science, aiming to give researchers, developers, and the public a transparent foundation model that is not controlled by a single private company ^[1]^[2].

Training was carried out on the "Alps" supercomputer operated by CSCS in Lugano, one of the most powerful systems in Europe for AI research ^[1]^[7]. The 70B model was trained across 4,096 NVIDIA GH200 Grace Hopper superchips using the Megatron-LM framework, and the overall effort consumed more than 10 million GPU hours ^[1]^[4]. After release, the broader Public AI network coordinated additional inference capacity, allocating over 115,000 GPU hours across 20 clusters in five or more countries, with contributions from partners including AWS, Exoscale, AI Singapore, Cudo Compute, CSCS, and NCI Australia ^[2].

Full openness and compliance

The defining feature of Apertus is the combination of full openness with regulatory compliance. The model is released under the permissive Apache 2.0 license, which allows use in education and research as well as broad societal and commercial applications ^[4]^[5]. Beyond the weights, the team published the complete recipe needed to reproduce the model: data-preparation and filtering code, training code, evaluation harnesses, and the intermediate checkpoints generated during pretraining ^[3].

On the compliance side, Apertus was trained exclusively on openly available data and was developed with attention to Swiss data-protection law, Swiss copyright law, and the transparency obligations of the EU AI Act ^[1]^[3]. The data pipeline retroactively respects machine-readable opt-out requests encoded in robots.txt, meaning that content from websites that signal they do not wish to be used for training is excluded even when those signals were added after the data was first collected ^[1]^[3]. The pipeline also filters out non-permissive, toxic, and personally identifiable content before training ^[3]. This compliance-by-design approach is one reason the project is cited as a reference point for how a frontier-scale model can be built while honoring opt-outs and removing personal data.
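The retroactive opt-out check described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the crawler name `ExampleTrainingBot`, the document dictionaries, and the choice to treat a host with no robots.txt record as permitted are all assumptions made for illustration. The idea is simply to re-evaluate previously crawled URLs against the robots.txt rules as they stand today.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name, used only for illustration.
CRAWLER_AGENT = "ExampleTrainingBot"


def allowed_by_current_robots(robots_txt: str, url: str,
                              agent: str = CRAWLER_AGENT) -> bool:
    """Return True if the current robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def filter_crawl(documents, current_robots_by_host):
    """Keep only previously crawled documents whose host still permits the crawler.

    Assumption: a host with no robots.txt record is treated as permitted.
    """
    kept = []
    for doc in documents:
        robots_txt = current_robots_by_host.get(doc["host"], "")
        if allowed_by_current_robots(robots_txt, doc["url"]):
            kept.append(doc)
    return kept


if __name__ == "__main__":
    crawled = [
        {"host": "a.example", "url": "https://a.example/page"},
        {"host": "b.example", "url": "https://b.example/page"},
    ]
    # a.example added an opt-out after the original crawl; b.example did not.
    robots_today = {
        "a.example": "User-agent: ExampleTrainingBot\nDisallow: /",
        "b.example": "User-agent: *\nAllow: /",
    }
    print([d["host"] for d in filter_crawl(crawled, robots_today)])
```

Because the check runs against today's rules rather than those in force at crawl time, a site that opts out later is still dropped from the training set.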

To further limit privacy and copyright risk at the modeling level, Apertus is trained with the Goldfish objective, a modified language-modeling loss that computes the training loss on only a randomly selected subset of tokens in each sequence. By masking a fraction of tokens from the loss, the Goldfish objective strongly suppresses verbatim memorization and recall of training data while largely preserving downstream task performance ^[3]^[8].
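A rough sketch of the idea follows, assuming a seeded pseudo-random mask and an illustrative drop rate of one token in `k`. The function name, the value of `k`, and the masking rule are assumptions; the mask selection actually used for Apertus may differ (for example, it could be deterministic). The sketch only shows how excluding positions from the loss changes which tokens contribute a training signal.

```python
import math
import random


def goldfish_loss(token_log_probs, k=4, seed=0):
    """Mean negative log-likelihood over a random subset of token positions.

    token_log_probs: log-probability the model assigned to each correct next token.
    k: on average one token in k is excluded from the loss (illustrative value).
    Returns (loss, keep_mask).
    """
    rng = random.Random(seed)
    # True means the position contributes to the loss; False means it is masked out.
    keep_mask = [rng.random() >= 1.0 / k for _ in token_log_probs]
    kept_nll = [-lp for lp, keep in zip(token_log_probs, keep_mask) if keep]
    if not kept_nll:
        return 0.0, keep_mask
    return sum(kept_nll) / len(kept_nll), keep_mask


if __name__ == "__main__":
    # Toy sequence: the model assigns probability 0.5 to every correct token.
    log_probs = [math.log(0.5)] * 1000
    loss, mask = goldfish_loss(log_probs, k=4)
    print(f"loss={loss:.4f}, masked positions={mask.count(False)}")
```

Since masked positions never receive a gradient, the model is never directly trained to reproduce a full passage token for token, which is the intuition behind the reduced verbatim recall.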

Architecture and training

Apertus uses a decoder-only transformer architecture with several deliberate engineering choices aimed at training stability at scale. It employs pre-normalization with RMSNorm and query-key normalization, and it introduces xIELU, a learnable activation function used in place of more conventional choices ^[3]^[4]. Optimization uses the AdEMAMix optimizer together with a Warmup-Stable-Decay learning-rate schedule, a combination the team reports enabled effective training of the 70B model on up to 4,096 GPUs ^[4]^[8]. The Apertus-70B configuration uses 80 transformer layers, and both models support a context length of up to 65,536 tokens ^[4]^[8]. Pretraining proceeded over 15 trillion tokens using a staged data curriculum that progressively mixed web, code, and mathematics data ^[4]^[8].
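The Warmup-Stable-Decay schedule mentioned above can be sketched as a simple function of the training step. This is a generic illustration, not the Apertus configuration: the function name, the linear warmup and linear decay shapes, and all step counts and learning rates below are assumptions.

```python
def wsd_learning_rate(step, total_steps, peak_lr, warmup_steps, decay_steps,
                      min_lr=0.0):
    """Warmup-Stable-Decay: ramp up, hold constant, then decay at the end.

    Linear warmup and linear decay are assumptions of this sketch.
    """
    if step < warmup_steps:
        # Warmup phase: increase linearly toward the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if decay_steps <= 0 or step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: move linearly from peak_lr down to min_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    # Illustrative values only.
    for s in (0, 99, 500, 900, 1000):
        lr = wsd_learning_rate(s, total_steps=1000, peak_lr=1e-3,
                               warmup_steps=100, decay_steps=200)
        print(s, lr)
```

One practical appeal of this shape is that the long constant phase does not depend on the total step count, so a run can be extended or a decay branched off from an intermediate checkpoint without restarting the schedule.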

Post-training consisted of supervised fine-tuning followed by alignment. The instruction-tuned variants were aligned using QRPO (Quantile Reward Policy Optimization, a preference-optimization method) to produce the chat-capable Instruct models ^[4]. All models were trained in bfloat16 precision ^[4]^[5].

Specification	Detail
Developer	Swiss AI Initiative (EPFL, ETH Zurich, CSCS)
Release date	September 2, 2025
Model type	Decoder-only transformer language model
Sizes	8B and 70B parameters (70B reported as 71B)
Variants	Base and Instruct, for each size
Training tokens	~15 trillion
Languages	Over 1,800 (1,811 natively supported)
Non-English data share	~40%
Context length	65,536 tokens
Activation function	xIELU
Normalization	RMSNorm with QK-normalization
Optimizer	AdEMAMix (Warmup-Stable-Decay schedule)
Training objective	Goldfish loss (memorization-suppressing)
Alignment method	Supervised fine-tuning + QRPO
Precision	bfloat16
Hardware	4,096 NVIDIA GH200 (Alps supercomputer, CSCS)
Framework	Megatron-LM
License	Apache 2.0
Openness	Weights, data recipe, code, checkpoints, evals

Performance and languages

Apertus is built for multilingual AI, with the developers reporting native support for 1,811 languages, including many that are under-represented in mainstream models, such as Swiss German and Romansh ^[1]^[4]. The emphasis on broad coverage is reflected in concrete results: the technical report cites Apertus-70B translating from German to Romansh with a BLEU score of 27.8, ahead of Llama-3.3-70B at 21.6 ^[9].

On standard English-language and multilingual benchmark suites, Apertus is competitive with open peers at the same scale rather than a leader on every metric. According to figures published on the model cards, Apertus-70B records an average pretraining-evaluation score of about 67.5 percent, comparable to Llama 3.1 70B at 67.3 percent and below Qwen2.5-72B at 69.8 percent ^[4]. The 8B model reports an average around 65.8 percent across its evaluation tasks, with per-benchmark results including ARC at 72.7 percent, HellaSwag at 59.8 percent, WinoGrande at 70.6 percent, PIQA at 79.8 percent, XCOPA at 66.5 percent, and XNLI at 45.2 percent ^[5]. These numbers should be read as the developers' reported results; independent evaluations have produced their own findings, and Apertus is generally characterized as a solid open model whose primary distinction is transparency and multilingual breadth rather than topping leaderboards ^[10].
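As a quick consistency check on the 8B figures quoted above, the unweighted mean of the six listed benchmark scores can be computed directly. Whether the model card's average is taken over exactly these six tasks, and with equal weights, is an assumption of this sketch.

```python
# Per-benchmark scores for Apertus-8B as quoted in the text (percent).
scores = {
    "ARC": 72.7,
    "HellaSwag": 59.8,
    "WinoGrande": 70.6,
    "PIQA": 79.8,
    "XCOPA": 66.5,
    "XNLI": 45.2,
}

# Unweighted mean across the listed tasks.
mean_score = sum(scores.values()) / len(scores)
print(f"{mean_score:.1f}")  # prints 65.8
```

The result matches the reported average of about 65.8 percent, which suggests the headline number is a plain mean over these tasks.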

Significance

Apertus is widely cited as a milestone for European and Swiss sovereign AI and for the open-science approach to building foundation models. By releasing the full stack (weights, data composition, code, recipes, and checkpoints) under a permissive license, it sets a high bar for what "fully open" means, going beyond the open-weight-only releases that are common in the field and aligning closely with the philosophy of efforts such as the Allen Institute for AI's OLMo ^[3]^[10]. Its compliance-by-design data pipeline, which honors robots.txt opt-outs retroactively and strips personal and copyrighted material, is frequently held up as a model for how to build large language models in line with the EU AI Act and European data-protection norms ^[1]^[3].

The project sits alongside other European and sovereign open-language-model initiatives, including the EU-funded EuroLLM and the German Teuken model, as part of a broader push to ensure that the continent has transparent, locally governed AI infrastructure rather than depending solely on models from a handful of large foreign companies ^[10]. As a public-good model trained on public supercomputing infrastructure and released for unrestricted research and commercial use, Apertus is presented by its creators as a demonstration that frontier-scale, multilingual AI can be built openly, reproducibly, and in compliance with the law ^[1]^[2].

References

1. ETH Zurich. "Apertus: a fully open, transparent, multilingual language model." September 2, 2025. https://ethz.ch/en/news-and-events/eth-news/news/2025/09/press-release-apertus-a-fully-open-transparent-multilingual-language-model.html
2. Public AI. "With love, from Switzerland." https://publicai.co/stories/apertus
3. Hernandez-Cano, A., et al. "Apertus: Democratizing Open and Compliant LLMs for Global Language Environments." arXiv:2509.14233. https://arxiv.org/abs/2509.14233
4. Hugging Face. "swiss-ai/Apertus-70B-2509 (model card)." https://huggingface.co/swiss-ai/Apertus-70B-2509
5. Hugging Face. "swiss-ai/Apertus-8B-2509 (model card)." https://huggingface.co/swiss-ai/Apertus-8B-2509
6. Apertus.ai. "Apertus: Open, Multilingual LLM." https://apertus.ai/en/blog/swiss-ai-apertus-models-release/
7. CSCS. "Alps supercomputer." https://www.cscs.ch/computers/alps
8. swiss-ai. "Apertus Tech Report repository." GitHub. https://github.com/swiss-ai/apertus-tech-report
9. heise online. "Apertus tested: How the multilingual AI model performs." https://www.heise.de/en/background/Apertus-tested-How-the-multilingual-AI-model-performs-10645644.html
10. effektiv. "Apertus is here: What can the Swiss LLM really do?" https://www.effektiv.ch/en/blog/apertus-release
