CodeGemma
Last reviewed
Jun 3, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,200 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,200 words
Add missing citations, update stale details, or suggest a clearer explanation.
CodeGemma is a family of open code-generation models that Google released in April 2024, built on the first generation of its lightweight Gemma models. The collection targets practical software-development tasks: completing code as a developer types, generating functions or whole blocks from a prompt, and answering programming questions in a chat setting. The models share the architecture, tokenizer, and much of the research lineage of Gemma, which itself draws on the same technology used to build Google's Gemini models, but they are further trained on large volumes of source code so that they specialize in programming.[1][2]
Google DeepMind introduced the original Gemma models in February 2024 as text-only, decoder-only transformers released with open weights. CodeGemma extended that line two months later as part of a wave of task-specific Gemma derivatives. Google announced it on April 9, 2024, alongside RecurrentGemma, a separate variant that swaps the standard transformer for a recurrent architecture to improve memory efficiency.[1][3] The two were positioned as the first members of an expanding Gemma ecosystem aimed at developers and researchers, a pattern Google continued later with the vision-language model PaliGemma and the second-generation Gemma 2.
CodeGemma was a point release rather than a single static drop. Version 1.0 shipped on April 9, 2024, and an updated version 1.1, which refreshed the 2B pretrained model and the 7B instruction-tuned model, followed on May 3, 2024.[2] A technical report, credited to the "CodeGemma Team" and listing more than two dozen contributors, was posted to arXiv in June 2024.[4]
The release comprised three checkpoints. The 2B and 7B pretrained models are base models suited to code infilling and completion, while the 7B instruction-tuned model is meant for conversational use. Parameter counts refer to the underlying Gemma sizes; the 7B model has roughly 9 billion total parameters with about 7 billion active in its core, reflecting Gemma's embedding layout.[5]
| Model | Parameters | Type | Primary use |
|---|---|---|---|
| codegemma-2b | 2B | Pretrained | Fast code infilling and open-ended generation in latency-sensitive settings |
| codegemma-7b | 7B | Pretrained | Code completion and code generation |
| codegemma-7b-it | 7B | Instruction-tuned | Code chat and instruction following |
All three inherit Gemma's context window of 8,192 tokens and were trained in bfloat16 precision on Google TPU hardware using JAX.[5] The 2B model was designed for speed: Google reported that it delivers state-of-the-art completion quality among comparable small models while running close to twice as fast as competing options at inference, a gain it attributes largely to architectural choices carried over from Gemma.[1][4]
CodeGemma starts from Gemma and continues pretraining on a code-heavy corpus. The 2B models were trained almost entirely on code, while the 7B models used a mixture of roughly 80 percent code and 20 percent natural language so they would retain general language and reasoning skills.[2][4] The version 1.0 pretrained models saw an additional 500 billion tokens, drawn primarily from publicly available code repositories, mathematics, and web documents; the 2B version 1.1 model was trained on around 1 trillion tokens.[2][5]
The defining technique is fill-in-the-middle (FIM) training. Rather than only predicting the next token from left to right, FIM teaches the model to generate a missing middle section given both the preceding prefix and the following suffix, which mirrors how completion works inside an editor where code exists on both sides of the cursor. CodeGemma was trained with a FIM rate of 80 percent in most checkpoints, rising to 90 percent in the 2B version 1.1 model, and it learned both the prefix-suffix-middle (PSM) and suffix-prefix-middle (SPM) orderings.[2][4] Special control tokens (<|fim_prefix|>, <|fim_suffix|>, <|fim_middle|>, and <|file_separator|>) delimit the segments and mark file boundaries.[2]
The training report describes refinements beyond a basic FIM setup. Documents were split into prefix, middle, and suffix so that the suffix tends to begin at a syntactically natural point rather than an arbitrary character offset, and training examples were assembled at the repository level using dependency-graph and unit-test-based packing so that related source files appear together. The intent was to give the model better cross-file context, closer to how real projects are structured.[4]
The instruction-tuned 7B model was fine-tuned on top of the 7B base using synthetic code instructions plus mathematics datasets, including grade-school and competition-level problem sets, to strengthen logical and mathematical reasoning. It is intended for chat about code, programming, and math topics.[2][4] Across the family, Google reports support for common languages such as Python, JavaScript, Java, and C++, among others.[1][3]
Google evaluated CodeGemma on standard code and reasoning benchmarks. On the Python subset of HumanEval (pass@1) and MBPP, the larger models clearly outperform the 2B model, and instruction tuning lifts HumanEval further. The figures below are the headline pass@1 results from the model card and technical report.[4][5]
| Model | HumanEval | MBPP | GSM8K | MATH |
|---|---|---|---|---|
| codegemma-2b | 31.1 | 43.6 | n/a | n/a |
| codegemma-7b | 44.5 | 56.2 | 44.2 | 19.9 |
| codegemma-7b-it | 56.1 | 54.2 | 41.2 | 20.9 |
The version 1.1 update raised several of these scores; Google reported the instruction-tuned 7B model reaching about 60.4 on HumanEval and 47.3 on GSM8K after the refresh.[2] On infilling-specific tests derived from HumanEval, the pretrained models do well on single-line completion (around 76 to 78 percent for the 2B and 7B base models) with multi-line completion markedly harder.[2][5] Google's framing emphasized that CodeGemma matches the code performance of other open models trained on far larger code corpora while preserving stronger natural language understanding than typical code-only models, a trade-off the team highlighted as a benefit of building on Gemma.[4] Independent benchmark leaderboards, such as the BigCode Models Leaderboard, also tracked the models for cross-language comparison.[2]
CodeGemma was released with open weights through several channels. The checkpoints are hosted on the Hugging Face Hub (google/codegemma-2b, google/codegemma-7b, and google/codegemma-7b-it), on Kaggle, and in the Vertex AI Model Garden, and Google also made them available through NVIDIA NIM microservices.[1][2] Weights are distributed in bfloat16 with support for the Transformers library, and the small footprint of the 2B model (roughly 6 GB in float16) makes it runnable on consumer hardware, while the 7B model fits on a single GPU with quantization.[2]
Use of the models is governed by the Gemma Terms of Use rather than a standard open-source license. Access on Kaggle and Hugging Face requires accepting Google's license consent, and the terms include a prohibited-use policy; Google described the terms accompanying the release as offering more flexibility for commercial use and distribution than earlier restrictions.[1][3] CodeGemma predates Google's later shift of some Gemma-family weights toward more permissive licensing, so its conditions follow the original Gemma framework.