AI watermarking is the practice of embedding detectable signals into content generated by artificial intelligence systems, enabling the identification and verification of AI-generated text, images, audio, and video. These signals, or watermarks, are designed to be imperceptible to human observers but reliably detectable by specialized algorithms. As generative AI systems become capable of producing content that is indistinguishable from human-created work, watermarking has emerged as a key technical approach for maintaining transparency, preventing misuse, and supporting content authenticity. AI watermarking is now the subject of active regulation, with the EU AI Act mandating technical marking of AI-generated content and the United States pursuing watermarking standards through executive action and the National Institute of Standards and Technology (NIST) [1][2].
The need for AI watermarking arises from the convergence of several trends. First, the quality of AI-generated content has improved to the point where humans often cannot distinguish it from authentic content. Large language models such as GPT-4 and Claude produce text that reads naturally, while image generators like Stable Diffusion, DALL-E, and Midjourney create photorealistic images. This capability creates risks for misinformation, fraud, impersonation, and the erosion of trust in digital media.
Second, the volume of AI-generated content on the internet is growing rapidly. Estimates suggest that a significant and increasing fraction of online text and images is now machine-generated. Without reliable identification mechanisms, it becomes difficult for individuals, platforms, and regulators to distinguish authentic content from synthetic content.
Third, the rise of deepfakes (AI-generated videos and audio that convincingly depict real people saying or doing things they never did) has created urgent demand for technical solutions that can identify manipulated media. High-profile incidents involving deepfake videos of politicians, celebrities, and business executives have demonstrated the potential for AI-generated content to cause real-world harm.
These pressures have driven research into watermarking techniques that can be applied at the point of content generation, creating an embedded provenance signal that persists through subsequent distribution, editing, and sharing.
AI watermarking techniques vary significantly depending on the type of content being watermarked. Each modality presents distinct technical challenges and opportunities.
Text watermarking is the process of embedding a detectable statistical signal into text generated by a language model. Because text is a discrete, symbolic medium (consisting of tokens drawn from a vocabulary), watermarking text requires fundamentally different approaches than watermarking continuous media like images or audio.
The core challenge is that any given sentence can be rephrased in many equivalent ways, making text watermarks vulnerable to paraphrasing. Additionally, short text passages contain less information and are therefore harder to watermark reliably than long ones.
Image watermarking embeds signals in the pixel data of AI-generated images. Because images are high-dimensional continuous signals, there is substantial capacity for embedding watermark information without visibly affecting the image. Image watermarks can be designed to survive common transformations such as cropping, resizing, compression, and color adjustment.
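As a toy illustration of this pixel-level capacity, the classical least-significant-bit (LSB) technique hides one payload bit per pixel value while changing each value by at most 1. Production systems such as SynthID do not work this way (an LSB signal does not survive compression or filtering, precisely the robustness requirement noted above); the sketch only shows why high-dimensional pixel data offers so much room for imperceptible embedding:

```python
def embed_bits(pixels: list[int], bits: list[int]) -> list[int]:
    """Hide one payload bit in the least significant bit of each 8-bit
    pixel value; each value changes by at most 1, which is invisible."""
    out = pixels[:]
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def extract_bits(pixels: list[int], n: int) -> list[int]:
    """Read the payload back out of the first n pixel values."""
    return [p & 1 for p in pixels[:n]]

pixels = [200, 13, 255, 0, 128, 77, 34, 90]   # one row of grayscale values
marked = embed_bits(pixels, [1, 0, 1, 1])
assert extract_bits(marked, 4) == [1, 0, 1, 1]
# No pixel moved by more than one intensity level:
assert all(abs(a - b) <= 1 for a, b in zip(pixels, marked))
```

A single JPEG re-encode would scramble these low-order bits, which is why modern schemes spread the signal across many pixels in transform or latent space instead.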
Audio watermarking embeds inaudible signals in the waveform of AI-generated audio, including synthetic speech and AI-generated music. Video watermarking extends image watermarking to the temporal dimension, embedding signals across video frames. Both modalities must contend with lossy compression (such as MP3 for audio or H.264 for video) that can destroy weakly embedded signals.
| Content Type | Watermark Medium | Key Challenge | Robustness Requirements |
|---|---|---|---|
| Text | Statistical bias in token selection | Paraphrasing and rewriting | Survive synonym substitution, back-translation |
| Image | Pixel-level perturbations | Visual imperceptibility | Survive cropping, compression, filtering |
| Audio | Frequency-domain embedding | Auditory imperceptibility | Survive compression, noise, speed changes |
| Video | Frame-level pixel perturbations | Temporal consistency | Survive re-encoding, frame rate changes |
The most influential text watermarking approach was introduced by John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein in their 2023 paper "A Watermark for Large Language Models," presented at ICML 2023 [3]. This method, commonly called the "green list" method, has become a foundational reference for subsequent text watermarking research.
The method operates during the text generation process, modifying how the language model selects tokens. Before each token is generated, the algorithm uses a hash function (seeded by the preceding token or tokens) to partition the model's vocabulary into two sets: a "green list" and a "red list." The green list typically contains a fraction gamma (a tunable parameter, often set to 0.5) of the vocabulary.
During generation, the algorithm adds a constant delta to the logits (pre-softmax scores) of all green-list tokens. This "soft" promotion makes green-list tokens slightly more likely to be selected without completely excluding red-list tokens. The result is that watermarked text contains a statistically higher proportion of green-list tokens than would occur by chance, creating a detectable signal.
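The generation-time modification can be sketched as follows. This is a simplified illustration of the scheme, not the authors' reference implementation: the hash construction, seeding, and sampling details are stand-ins.

```python
import hashlib
import math
import random

def green_list(prev_token_id: int, secret_key: str,
               vocab_size: int, gamma: float) -> set[int]:
    """Pseudorandomly partition the vocabulary, seeded by the secret key
    and the preceding token, keeping a gamma fraction as the green list."""
    seed = int.from_bytes(
        hashlib.sha256(f"{secret_key}:{prev_token_id}".encode()).digest()[:8],
        "big")
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def watermarked_sample(logits: list[float], prev_token_id: int,
                       secret_key: str, gamma: float = 0.5,
                       delta: float = 2.0) -> int:
    """Soft watermark: add delta to green-list logits, then sample
    from the resulting softmax distribution."""
    green = green_list(prev_token_id, secret_key, len(logits), gamma)
    boosted = [l + delta if i in green else l for i, l in enumerate(logits)]
    m = max(boosted)                       # stabilize the softmax
    weights = [math.exp(l - m) for l in boosted]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]
```

Because the boost is additive rather than a hard constraint, a red-list token with a very high logit can still be chosen, which is what preserves quality on low-entropy continuations.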
To detect the watermark, the detector uses the same hash function and the same secret key to determine which tokens in a given text are green-list tokens. It then calculates the proportion of green-list tokens and computes a z-score measuring how far this proportion deviates from the expected proportion under the null hypothesis (no watermark). If the z-score exceeds a predetermined threshold, the text is classified as watermarked.
The detection algorithm does not require access to the language model itself, its parameters, or its API. It only requires knowledge of the secret key used to generate the green/red list partition. This is a significant practical advantage, as it allows third-party detection without the cooperation of the model provider.
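Detection mirrors generation: recompute each position's green list from the secret key, count the hits, and test them against chance. A minimal sketch, with the caveat that the `green_list` partition must exactly match the one used at generation time (the hash construction here is illustrative):

```python
import hashlib
import math
import random

def green_list(prev_token_id: int, secret_key: str,
               vocab_size: int, gamma: float) -> set[int]:
    """Must reproduce the exact partition used at generation time."""
    seed = int.from_bytes(
        hashlib.sha256(f"{secret_key}:{prev_token_id}".encode()).digest()[:8],
        "big")
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def detect_watermark(token_ids: list[int], secret_key: str, vocab_size: int,
                     gamma: float = 0.5, z_threshold: float = 4.0):
    """One-sided z-test against the null hypothesis that each token lands
    on its green list with probability gamma (i.e., no watermark)."""
    tests = len(token_ids) - 1
    if tests <= 0:
        return 0.0, False
    hits = sum(
        cur in green_list(prev, secret_key, vocab_size, gamma)
        for prev, cur in zip(token_ids, token_ids[1:]))
    z = (hits - gamma * tests) / math.sqrt(tests * gamma * (1 - gamma))
    return z, z > z_threshold
```

Note that the model itself never appears in this function: only the token sequence, the key, and the vocabulary size are needed, which is the third-party detection property described above.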
The green list method depends on two primary hyperparameters [3]: gamma, the fraction of the vocabulary assigned to the green list, and delta, the constant added to green-list logits during generation.
The fundamental tradeoff in text watermarking is between watermark strength (detectability) and text quality. A stronger watermark is easier to detect but may introduce perceptible artifacts such as unusual word choices, repetitive phrasing, or subtle shifts in style. Kirchenbauer et al. demonstrated that with appropriate hyperparameter settings, the watermark could be embedded with negligible impact on text quality as measured by perplexity and human evaluation [3].
The authors derived an information-theoretic framework for analyzing the sensitivity of the watermark. The detectability of the watermark depends on the length of the text (longer texts are easier to watermark), the entropy of the language model's output distribution (higher entropy provides more room for watermark embedding), and the hyperparameter settings. For text generated with high certainty (low entropy), such as factual statements or code, the watermark is weaker because the model has less freedom to choose green-list tokens without degrading output quality.
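The entropy effect can be seen in a toy softmax calculation: with a flat (high-entropy) distribution, a logit boost of delta = 2 pushes most of the probability mass onto green tokens, while a peaked (low-entropy) distribution barely moves. The numbers below are illustrative, not from the paper:

```python
import math

def green_mass(logits: list[float], green_idx: set[int], delta: float) -> float:
    """Total probability of sampling a green-list token after the delta boost."""
    boosted = [l + delta if i in green_idx else l for i, l in enumerate(logits)]
    m = max(boosted)
    exps = [math.exp(l - m) for l in boosted]
    return sum(exps[i] for i in green_idx) / sum(exps)

green = {0, 1}                       # half of a toy 4-token vocabulary
flat = [0.0, 0.0, 0.0, 0.0]          # high entropy: every token equally likely
peaked = [0.0, 0.0, 10.0, 0.0]       # low entropy: one red token dominates

print(green_mass(flat, green, 2.0))    # ~0.88: strong green bias, clear signal
print(green_mass(peaked, green, 2.0))  # <0.001: boost barely registers
```

In the peaked case the model is effectively forced to emit the dominant red token, so that position contributes almost nothing to the detection statistic.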
| Parameter | Effect on Watermark Strength | Effect on Text Quality | Typical Setting |
|---|---|---|---|
| Gamma (green list fraction) | Lower gamma = stronger signal | Lower gamma = more constrained | 0.25 to 0.5 |
| Delta (logit boost) | Higher delta = stronger signal | Higher delta = less natural text | 1.0 to 2.0 |
| Text length | Longer = more detectable | No direct effect | 200+ tokens for reliable detection |
| Output entropy | Higher entropy = stronger signal | No direct effect | Depends on prompt |
Image watermarking for AI-generated content builds on decades of research in digital watermarking for copyright protection, but adapts these techniques to the specific requirements and opportunities of generative AI systems.
SynthID is an AI watermarking system developed by Google DeepMind that embeds imperceptible watermarks directly into AI-generated content at the moment of creation [4]. Originally launched for images generated by Google's Imagen model, SynthID has since been extended to cover text, audio, and video.
For images, SynthID embeds a watermark by modifying the image generation process itself, rather than post-processing the finished image. This integration into the generation pipeline allows the watermark to be more robust and less perceptible than post-hoc watermarking methods. The watermark is designed to survive common image transformations including cropping, resizing, application of filters, and lossy compression (such as JPEG encoding).
For text, SynthID adapts the token probability adjustment approach (similar in concept to the Kirchenbauer method) to embed watermarks during generation by large language models. Google has integrated SynthID text watermarking into its Gemini models.
For audio generated through Google's Lyria music generation model and NotebookLM's podcast feature, SynthID embeds inaudible watermarks that persist through common audio modifications such as noise addition, MP3 compression, and speed changes.
SynthID is deployed across Google's consumer-facing generative AI products. As of 2025, every image created through Google's primary generative AI services receives SynthID watermarks automatically. Google has also launched a SynthID Detector portal for journalists and media professionals to verify whether content contains SynthID watermarks [4].
Beyond SynthID, several other approaches to image watermarking have been applied to AI-generated content, ranging from classical post-hoc embedding techniques adapted from copyright watermarking to methods integrated directly into the generative model's sampling process.
While watermarking embeds signals in the content itself, the Coalition for Content Provenance and Authenticity (C2PA) takes a complementary approach based on metadata and cryptographic signatures [5].
C2PA provides an open technical standard for attaching verifiable provenance information to digital content. When content is created or modified using a C2PA-enabled tool, a cryptographic manifest (called a Content Credential) is generated and attached to the file. This manifest records how the content was created, including the tool or device used and whether a generative AI model was involved, along with a history of subsequent edits and modifications.
The Content Credential is cryptographically signed, making it tamper-evident. If someone modifies the content or the attached metadata after signing, the signature verification will fail, indicating that alterations have been made.
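The tamper-evidence property can be illustrated with a simplified sketch. C2PA itself uses X.509 certificate-based public-key signatures; the HMAC below is a symmetric stand-in (and the manifest fields are invented) that only demonstrates how any post-signing modification breaks verification:

```python
import hashlib
import hmac
import json

def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a canonically serialized manifest. Real C2PA uses public-key
    signatures; HMAC here illustrates tamper evidence only."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)

key = b"issuer-secret"
manifest = {"claim_generator": "example-tool/1.0",       # hypothetical fields
            "assertions": [{"action": "created", "ai_generated": True}]}
sig = sign_manifest(manifest, key)

assert verify_manifest(manifest, sig, key)           # untouched: verifies
manifest["assertions"][0]["ai_generated"] = False    # tamper with the record
assert not verify_manifest(manifest, sig, key)       # verification now fails
```

The asymmetric scheme used in practice adds one crucial property the sketch lacks: anyone can verify with the issuer's public certificate, while only the issuer can sign.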
When content is created from multiple sources (for example, an AI-generated image composited with a photograph), C2PA records each source as an "ingredient" with its own provenance information. This creates a tree of provenance, similar to a family tree, that can trace the content's history back to its original creation [5].
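The ingredient tree can be modeled as a simple recursive structure. The field names here are illustrative, not the C2PA manifest format:

```python
from dataclasses import dataclass, field

@dataclass
class Ingredient:
    """One node in a provenance tree: a piece of content plus the
    ingredients it was derived from."""
    title: str
    generator: str
    ingredients: list["Ingredient"] = field(default_factory=list)

    def lineage(self, depth: int = 0) -> list[str]:
        """Flatten the tree into an indented history, final output first."""
        lines = ["  " * depth + f"{self.title} ({self.generator})"]
        for ing in self.ingredients:
            lines.extend(ing.lineage(depth + 1))
        return lines

photo = Ingredient("beach.jpg", "camera-firmware")
ai_sky = Ingredient("sky.png", "image-model")
composite = Ingredient("final.png", "editor", [photo, ai_sky])
print("\n".join(composite.lineage()))
```

Walking the tree lets a verifier answer questions such as "does any ancestor of this composite carry an AI-generated assertion?" without re-inspecting the pixels.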
C2PA has gained broad industry support. Founding members include Adobe, Arm, Intel, Microsoft, and the BBC. Google has integrated C2PA with SynthID, so that AI-generated images from Google services receive both embedded watermarks and C2PA metadata [6]. The C2PA specification is on track to become an ISO international standard, and the W3C is examining it for adoption at the browser level, which would allow web browsers to display provenance information natively.
The primary limitation of C2PA and similar metadata-based standards is that metadata can be stripped from files during common operations such as taking screenshots, copying and pasting, uploading to social media platforms, or converting file formats. Unlike embedded watermarks, which survive in the content itself, metadata is external to the content and can be easily removed, either intentionally or inadvertently. This is why the current consensus favors a multilayered approach combining embedded watermarks with metadata-based provenance [7].
| Approach | Signal Location | Survives Screenshots? | Survives Re-encoding? | Tamper-Evident? | Requires Cooperation? |
|---|---|---|---|---|---|
| Embedded watermark (SynthID) | In the content (pixels, tokens) | Yes | Usually | No (signal may be degraded) | Only at generation time |
| Metadata (C2PA) | In file metadata | No | Depends on format | Yes (cryptographic signature) | At every step of the chain |
| Combined approach | Both | Partially | Partially | Yes | At generation; verified at each step |
Governments and regulatory bodies have begun mandating AI watermarking and content labeling, driven by concerns about misinformation, deepfakes, and the erosion of trust in digital media.
The EU AI Act, which entered into force in 2024, includes specific transparency requirements for AI-generated content under Article 50 [7]. These obligations become legally binding on August 2, 2026, and include requirements that providers of generative AI systems mark their outputs in a machine-readable format so that they are detectable as artificially generated, and that deployers disclose deepfakes and AI-generated text published to inform the public on matters of public interest.
In December 2025, the European Commission published the first draft of the Code of Practice on marking and labeling AI-generated content, which establishes technical standards for compliance [8]. The Code takes the position that no single marking technique is currently sufficient and proposes a multilayered approach combining embedded watermarks with metadata-based provenance information.
A second draft was published in March 2026, with the final version expected by June 2026. Although signing the Code is voluntary, regulatory authorities have indicated they will use it as the de facto standard for assessing compliance [8].
On October 30, 2023, President Biden signed Executive Order 14110 on the "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence" [2]. The order defines watermarking as "the act of embedding information, which is typically difficult to remove, into outputs created by AI, including into outputs such as photos, videos, audio clips, or text, for the purposes of verifying the authenticity of the output or the identity or characteristics of its provenance, modifications, or conveyance."
The executive order directed the Department of Commerce to develop guidance for content authentication and watermarking, and required the Secretary of Commerce to submit a report identifying standards, tools, methods, and practices for authenticating content and tracking its provenance, labeling synthetic content, and detecting synthetic content. NIST subsequently published guidance document AI 100-4 on synthetic content identification [9].
The Trump administration, which took office in January 2025, revoked Executive Order 14110 in its early days. However, much of the technical work initiated under the order, including the NIST guidance on synthetic content, continues to influence industry practices and standards development.
China's Interim Measures for the Management of Generative AI Services, effective August 2023, require providers of generative AI to watermark AI-generated content. Several other jurisdictions, including Canada, the United Kingdom, and Australia, are developing or considering similar requirements.
| Jurisdiction | Regulation | Watermarking Requirement | Effective Date |
|---|---|---|---|
| European Union | EU AI Act, Article 50 | Machine-readable marking of AI-generated content | August 2, 2026 |
| United States | Executive Order 14110 (revoked) | Guidance for watermarking and provenance | Directed 2023; EO revoked Jan 2025 |
| United States | NIST AI 100-4 | Technical guidance on synthetic content | 2024 (ongoing) |
| China | Interim Measures for Generative AI | Watermarking of AI-generated content | August 15, 2023 |
Despite significant progress, AI watermarking faces several fundamental challenges that limit its current effectiveness.
The central tension in watermarking is between making the watermark strong enough to survive transformations (robustness) and keeping it invisible to human observers (imperceptibility). A stronger watermark is easier to detect but may visibly degrade content quality. A more subtle watermark preserves quality but may be destroyed by common operations like compression, cropping, or reformatting. This tradeoff was identified as the most active research area at the first ICLR Watermarking Workshop in 2025 [10].
Text watermarks are particularly vulnerable to paraphrasing attacks, where an adversary rewrites the watermarked text to remove the statistical signal while preserving the semantic content. Paraphrasing can be done manually, through machine translation (translating to another language and back), or by using another language model to rephrase the text.
The Self-Information Rewrite Attack (SIRA), published in 2025, demonstrated a particularly effective paraphrasing attack that achieves nearly 100% success rates against seven recent watermarking methods at a cost of approximately $0.88 per million tokens [11]. SIRA exploits the design of watermarking systems by using self-information (a measure of token predictability) to identify which tokens are likely watermark-carrying tokens and selectively rewriting those tokens.
Research has also shown that both model-based and score-based detection methods can exhibit near-zero true positive rates under strong paraphrasing and text-mixing attacks [12]. Statistical watermarks can be reverse-engineered and removed with approximately 85% success rates by analyzing patterns in the model's outputs.
Watermarking short text passages (tweets, comments, short messages) is inherently difficult because the statistical signal has fewer tokens to accumulate in. Reliable detection typically requires 200 or more tokens, making watermarking impractical for many common forms of communication.
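The length dependence follows directly from the z-score statistic: the expected score grows with the square root of the token count. A quick calculation, assuming for illustration a watermark that raises green-token usage from the chance rate of 50% to 70%:

```python
import math

def expected_z(tokens: int, green_rate: float, gamma: float = 0.5) -> float:
    """Expected detection z-score for a text of `tokens` tokens in which a
    fraction `green_rate` land on the green list (gamma expected by chance)."""
    return (green_rate - gamma) * math.sqrt(tokens / (gamma * (1 - gamma)))

for n in (20, 50, 200, 500):
    print(n, round(expected_z(n, 0.7), 2))
```

At a detection threshold of z = 4, a 20-token tweet (expected z of about 1.8) is essentially undetectable, while a 200-token passage (expected z above 5.5) clears the bar comfortably, which is the origin of the 200-token rule of thumb.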
Current text watermarking systems suffer from relatively high false positive rates, with some systems reporting rates of 15 to 20 percent [10]. A false positive occurs when human-written text is incorrectly identified as watermarked. High false positive rates undermine the utility of watermarking for content moderation, legal evidence, and academic integrity applications.
Watermarking can only be applied at the point of generation, by the model provider. Open-source models that users can run locally, such as Llama and Mistral, can be modified to remove watermarking code. This means that watermarking is primarily effective for models served through APIs by commercial providers, and does not cover the growing ecosystem of open-source generative AI.
Modern content creation workflows often involve multiple AI systems. An image might be generated by one model, edited by another, and have text added by a third. When content passes through multiple AI systems, attributing the final output to a specific model or determining which parts are AI-generated becomes significantly more complex. Multi-model attribution remains a largely unsolved problem [10].
Beyond paraphrasing, adversaries can attempt to remove watermarks through various technical means, including image processing operations (noise addition, re-encoding, style transfer), audio processing (resampling, filtering), and targeted attacks that specifically identify and destroy the watermark signal. A 2025 breakthrough found that multiple watermarks can coexist within the same media, leading to research on "watermark ensembling" that combines multiple techniques for enhanced resilience [10].
Several promising research directions are addressing the current limitations of AI watermarking.
Researchers have developed watermarking methods with provable robustness guarantees. Zhao et al. (2023) proposed a provably robust watermarking scheme for AI-generated text that provides mathematical guarantees about the watermark's survivability under certain classes of attacks [13]. More recent work at USENIX Security 2025 extended provable robustness to multi-bit watermarks that can encode richer information (such as model identity or generation timestamp) while maintaining formal security guarantees [14].
A line of research aims to create watermarks that introduce zero distortion to the output distribution. These methods achieve watermarking by modifying the random number generation used during sampling rather than modifying the output probabilities. In principle, distortion-free watermarks produce text that is statistically identical to unwatermarked text, avoiding any quality degradation. However, they tend to be less robust to attacks than distortion-based methods.
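One known construction of this kind uses the Gumbel-max trick: derive pseudorandom uniforms from the secret key and the recent context, then select the token maximizing u_i^(1/p_i). A minimal sketch, with a toy hash-based pseudorandom function standing in for the keyed PRF and context scheme a real system would use:

```python
import hashlib
import math

def keyed_uniforms(context: tuple, key: str, vocab_size: int) -> list[float]:
    """Pseudorandom uniforms in (0, 1), one per vocabulary entry, derived
    deterministically from the secret key and the recent context."""
    us = []
    for i in range(vocab_size):
        h = hashlib.sha256(f"{key}:{context}:{i}".encode()).digest()
        us.append((int.from_bytes(h[:8], "big") + 1) / (2 ** 64 + 2))
    return us

def gumbel_sample(probs: list[float], context: tuple, key: str) -> int:
    """Pick argmax_i u_i^(1/p_i), computed in log space for stability.
    Marginalized over random keys, the chosen token follows `probs`
    exactly, so the output distribution is unchanged (distortion-free);
    fixing the key makes the choices detectable to a key holder."""
    us = keyed_uniforms(context, key, len(probs))
    return max(range(len(probs)),
               key=lambda i: (math.log(us[i]) / probs[i]
                              if probs[i] > 0 else -math.inf))
```

Detection then checks whether the chosen tokens systematically received high u values under the key, rather than counting green-list hits.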
The discovery that multiple watermarks can coexist in the same piece of content has opened up the possibility of watermark ensembling, where several complementary watermarking techniques are applied simultaneously. If an adversary removes one watermark, others may survive. This approach improves overall robustness at the cost of increased complexity in both embedding and detection.
As AI systems increasingly generate multimodal content (text with images, video with audio narration), research is exploring watermarking approaches that work consistently across modalities and can verify the provenance of composite content.
AI watermarking is one of two main approaches to identifying AI-generated content. The other is post-hoc AI content detection, which attempts to classify content as human-generated or AI-generated based on statistical properties of the content itself, without any pre-embedded signal.
Post-hoc detectors (such as GPTZero, ZeroGPT, and OpenAI's classifier) analyze features like perplexity, burstiness, and statistical patterns to make their determination. However, these detectors have significant limitations: they produce high error rates (both false positives and false negatives), they are easily fooled by paraphrasing, and their accuracy degrades as AI-generated text improves in quality.
Watermarking has a fundamental advantage over post-hoc detection: because the signal is deliberately embedded, it can be made arbitrarily strong (subject to quality constraints) and its statistical properties are known exactly, allowing precise calculation of false positive and false negative rates. The disadvantage is that watermarking requires cooperation from the model provider and can only identify content from watermarked models.
The two approaches are complementary. Watermarking provides high-confidence identification for content from cooperating providers, while post-hoc detection provides a fallback for content from non-cooperating sources. Together, they form a more complete identification ecosystem than either approach alone.
AI watermarking has an important connection to the problem of model collapse. Model collapse occurs when generative AI models are trained on data that includes outputs from previous AI models, leading to progressive degradation of output quality and diversity. Watermarking can help mitigate model collapse by enabling training pipeline operators to identify and filter out AI-generated content from their training data, preserving the integrity of training datasets and ensuring that models continue to learn primarily from human-generated data [15].
As of early 2026, AI watermarking is at an inflection point between research and widespread deployment.
On the deployment front, Google's SynthID is the most broadly deployed watermarking system, integrated across Google's generative AI products for text, images, audio, and video. Other major AI companies, including OpenAI and Meta, have announced or deployed their own watermarking systems, though details of implementation vary. The C2PA standard has achieved broad industry adoption, with major technology companies, camera manufacturers, and media organizations implementing Content Credentials.
Regulatory pressure is intensifying. The EU AI Act's transparency requirements become enforceable in August 2026, creating a firm deadline for compliance. The European Commission's Code of Practice on AI content marking is being finalized, establishing technical standards that will likely influence global practice. While the US Executive Order was revoked, the technical work at NIST continues, and industry momentum toward watermarking has not reversed.
On the research front, the field is actively grappling with the limitations of current methods. The robustness-imperceptibility tradeoff remains the central challenge. Text watermarking is particularly vulnerable to paraphrasing attacks, and current false positive rates are too high for many applications. However, advances in provably robust watermarks, watermark ensembling, and integration with metadata-based provenance standards offer paths toward more reliable systems.
The first ICLR Watermarking Workshop, held in 2025, reflected the growing maturity and urgency of the field, bringing together researchers from cryptography, machine learning, signal processing, and policy to address the interdisciplinary challenges of AI content identification [10].
The fundamental question facing the field is whether watermarking can scale to meet the challenge of an internet increasingly filled with AI-generated content. Current systems work well within controlled environments (such as a single provider's ecosystem) but face significant challenges in the open, adversarial environment of the wider internet. Addressing this challenge will require continued advances in watermarking technology, broader industry adoption, regulatory enforcement, and the development of shared standards and infrastructure for content provenance.