DolphinGemma
Last reviewed
Jun 3, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,199 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,199 words
Add missing citations, update stale details, or suggest a clearer explanation.
DolphinGemma is an audio language model developed by Google to help scientists analyze the vocalizations of wild dolphins. Built on the same research that underpins Google's Gemma family of open models, it was announced on April 14, 2025, which is observed as National Dolphin Day in the United States.[1][2] The project is a collaboration between Google, the Wild Dolphin Project (WDP), and researchers at the Georgia Institute of Technology, and it draws on decades of recordings of Atlantic spotted dolphins. Rather than claiming to translate dolphin speech, the model is designed to find structure in dolphin sounds and predict which sound is likely to follow a given sequence, much as a text model predicts the next word.[1]
Dolphins produce a rich repertoire of sounds, and researchers have long suspected that this repertoire carries social information. Atlantic spotted dolphins (Stenella frontalis) use signature whistles that appear to function like names, burst-pulse "squawks" during conflict, and click "buzzes" during courtship or while chasing sharks.[1] Cataloging these sounds and correlating them with behavior is slow, manual work, and the sheer volume of acoustic data is difficult for humans to sift through. The premise behind DolphinGemma is that a model trained on the structure of human language can be repurposed to surface patterns in a non-human acoustic system.[2]
The model is part of Google's broader Gemma effort, the lightweight open models that share technology with the Gemini family. Work on the project has been associated with Google DeepMind, and one of its named contributors is Dr. Thad Starner, who holds appointments at both Google DeepMind and Georgia Tech.[3] On the field-research side, the work is led by Dr. Denise Herzing, founder and research director of the Wild Dolphin Project.[3]
The Wild Dolphin Project supplies the data that makes the model possible. Founded in 1985, the WDP runs what it describes as the world's longest-running underwater dolphin research project, focused on a specific community of Atlantic spotted dolphins in the Bahamas.[1] Over roughly four decades, researchers have built a labeled archive of underwater audio and video, organized by individual dolphin, by behavioral context, and across generations of the same community.[1][2]
That archive is what gives DolphinGemma a usable training signal. Because the sounds are paired with observed behavior, the WDP corpus lets the model learn associations between particular vocal patterns and the situations in which they occur, instead of treating the audio as an undifferentiated stream.[1] Georgia Tech contributes the engineering and the field hardware, while Google supplies the model architecture and the underlying audio technology.[2]
DolphinGemma is, in Google's framing, strictly audio-in and audio-out. It does not deal in words or images. It takes in sequences of natural dolphin sounds, identifies recurring patterns and structure, and predicts the most likely subsequent sound in the sequence.[1] The same next-token prediction that lets a language model continue a sentence is applied here to a stream of dolphin vocalizations.
Two pieces of Google audio technology make this work. The model uses the SoundStream tokenizer to turn raw dolphin sounds into a compact, machine-readable representation, and those tokens are then processed by a model architecture suited to long, complex sequences.[1][4] The result is a relatively small model of roughly 400 million parameters.[1][2] That compact size is deliberate: it lets the model run on consumer hardware in the field rather than on a server.
| Attribute | Detail |
|---|---|
| Developer | Google, with the Wild Dolphin Project and Georgia Tech |
| Announced | April 14, 2025 (National Dolphin Day) |
| Base | Gemma open models |
| Parameters | ~400 million |
| Modality | Audio-in, audio-out |
| Audio tokenizer | SoundStream |
| Training data | Wild Dolphin Project archive (Atlantic spotted dolphins, since 1985) |
| Field hardware | Google Pixel phones |
The roughly 400-million-parameter size means DolphinGemma can run directly on the Google Pixel phones the WDP already carries into the water, removing the need for bulky, power-hungry custom equipment.[1][2] Earlier field work used a Pixel 6 to run the model for real-time listening and pattern recognition. For the 2025 research season, the team planned to move to a setup built around the Pixel 9, housed in a waterproof rig, which adds speaker and microphone functions and enough processing headroom to run deep-learning models and template-matching algorithms at the same time.[1][4]
Running on the phone in the field matters for a second reason: it feeds into an interactive system called CHAT, short for Cetacean Hearing Augmentation Telemetry, an underwater computer interface developed by the WDP with Georgia Tech.[1][5] In CHAT experiments, researchers play novel synthetic whistles that are assigned to objects the dolphins like, such as sargassum seaweed or a scarf, then watch to see whether the dolphins mimic those whistles to request the items. DolphinGemma's predictive abilities are meant to help the system recognize a dolphin's mimic quickly and relay it to a researcher through bone-conducting headphones, supporting a basic two-way exchange rather than a translation of natural dolphin language.[1][5]
The stated goal of DolphinGemma is to accelerate the search for structure in dolphin communication and, in the longer term, to support research into a shared, simplified vocabulary through CHAT.[1] Both Google and outside coverage have been careful to qualify what this means. The model finds and predicts patterns; it does not decode or translate a dolphin "language," and the project does not claim that dolphins have language in the human sense.[1] Even with machine assistance, interpreting the meaning behind clustered sound types remains the central unsolved problem, and the CHAT vocabulary built up over years of work is still very small.[3]
There are hard physical limits as well. Humans underwater struggle to localize where a sound comes from, and some dolphin sounds fall outside the range of human hearing, which is part of why automated analysis is attractive in the first place.[3] DolphinGemma was trained on Atlantic spotted dolphins specifically, so applying it to other cetaceans such as bottlenose or spinner dolphins is expected to require fine-tuning on those species' sounds.[1][2] Popular headlines describing the project as "talking to dolphins" therefore run ahead of what the model actually does.
Google said it intends to release DolphinGemma as an open model, with availability planned for the summer of 2025.[1][2] The open release is framed as a way to let researchers studying other dolphin populations and other cetacean species adapt the model to their own acoustic data, building on the open-model approach of the broader Gemma 3 lineup.[1][2] As described at announcement, the model was still in active development, with the open weights and the next field season both positioned as upcoming milestones rather than completed work.[4]