General Data Protection Regulation (GDPR)
Last reviewed
Apr 30, 2026
Sources
30 citations
Review status
Source-backed
Revision
v1 · 5,343 words
The General Data Protection Regulation (GDPR), formally Regulation (EU) 2016/679, is the European Union law governing how personal data of individuals in the EU and European Economic Area (EEA) is collected, stored, used, transferred, and deleted. Approved by the European Parliament on 14 April 2016 and signed on 27 April 2016, it entered into force on 24 May 2016 and became directly applicable in all EU member states on 25 May 2018, replacing the 1995 Data Protection Directive (95/46/EC). Because it is a regulation rather than a directive, it does not require transposition into national law and has identical legal effect in every member state from the same date.
The GDPR has become the most consequential modern privacy statute, both because of its broad extraterritorial reach (it can apply to companies anywhere in the world if they target or monitor people in the EU) and because of its enforcement teeth: fines can reach 20 million euros or 4 percent of a company's total worldwide annual turnover from the preceding financial year, whichever is higher. For artificial intelligence and machine learning, GDPR matters enormously. Almost every meaningful AI system trained on data about real people, whether a recommender model, a credit-scoring algorithm, a facial recognition tool, or a large language model trained on web-scraped text, is at least partly governed by it. The regulation predates the modern generative AI era by several years, but its principles, lawful bases, and Article 22 protections against solely automated decisions are now the primary EU legal lens through which AI training data, model behaviour, and user rights are evaluated.
The European Commission proposed a comprehensive data protection reform in January 2012. After more than four years of negotiation, the European Parliament adopted the final text on 14 April 2016, and the regulation was signed on 27 April 2016, the date it formally bears. It was published in the Official Journal of the European Union on 4 May 2016 (OJ L 119) and entered into force twenty days later, on 24 May 2016. A two-year transition period allowed controllers and processors to prepare; the regulation became fully applicable on 25 May 2018.
GDPR is not the only relevant instrument. It sits alongside the Law Enforcement Directive (2016/680, dated the same day as the GDPR, governing processing by police and criminal justice authorities), the ePrivacy Directive (2002/58/EC, covering electronic communications and cookies), and, since August 2024, the EU AI Act. The AI Act is expressly without prejudice to the GDPR: AI Act Recital 10 confirms that nothing in the AI Act overrides existing data protection law, and both regimes apply in parallel to AI systems that process personal data.
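Article 3 of the GDPR is unusually broad and is the main reason the regulation matters to companies headquartered far outside Europe. There are three triggers:
1. Establishment (Article 3(1)): processing in the context of the activities of an establishment of a controller or processor in the Union, regardless of whether the processing itself takes place in the Union.
2. Targeting or monitoring (Article 3(2)): processing of personal data of data subjects who are in the Union by a controller or processor not established there, where the processing relates to offering them goods or services (whether or not payment is required) or to monitoring their behaviour as far as that behaviour takes place within the Union.
3. Public international law (Article 3(3)): processing by a controller not established in the Union but in a place where member state law applies by virtue of public international law, such as a diplomatic mission or consular post.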
The second trigger is what catches non-EU AI companies. The Italian, French, Greek, and other supervisory authorities have used Article 3(2) to assert jurisdiction over US-based facial recognition vendor Clearview AI even though it has no EU presence, on the grounds that scraping and analysing images of EU residents constitutes monitoring of behaviour in the Union.
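The Article 3 triggers (3(1), 3(2)(a), 3(2)(b) and 3(3)) can be expressed as a screening checklist. The sketch below is a minimal illustration, assuming each underlying legal question (is there an establishment, is there targeting, is there monitoring) has already been answered as a yes/no fact; `ProcessingContext` and `gdpr_triggers` are hypothetical names, and a real scope analysis turns on facts and case law that no boolean captures.

```python
from dataclasses import dataclass


@dataclass
class ProcessingContext:
    # Hypothetical yes/no findings about an organisation's processing.
    established_in_eu: bool                # Article 3(1): processing in the context of an EU establishment
    offers_goods_or_services_to_eu: bool   # Article 3(2)(a): targeting data subjects in the Union
    monitors_behaviour_in_eu: bool         # Article 3(2)(b): monitoring behaviour taking place in the Union
    member_state_law_by_international_law: bool = False  # Article 3(3)


def gdpr_triggers(ctx: ProcessingContext) -> list[str]:
    """Return the Article 3 triggers that appear to be met (empty list means none)."""
    triggers = []
    if ctx.established_in_eu:
        triggers.append("3(1)")
    if ctx.offers_goods_or_services_to_eu:
        triggers.append("3(2)(a)")
    if ctx.monitors_behaviour_in_eu:
        triggers.append("3(2)(b)")
    if ctx.member_state_law_by_international_law:
        triggers.append("3(3)")
    return triggers


# A non-EU image scraper analysing photos of EU residents, modelled loosely
# on the supervisory authorities' Clearview reasoning.
scraper = ProcessingContext(
    established_in_eu=False,
    offers_goods_or_services_to_eu=False,
    monitors_behaviour_in_eu=True,
)
print(gdpr_triggers(scraper))  # ['3(2)(b)']
```

An empty list suggests that none of the triggers is met on the facts supplied; a non-empty one names the provision that deserves closer legal analysis.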
Article 5 sets out seven principles that every processing operation involving personal data must satisfy. Infringing any of them falls in the higher fine tier (Article 83(5)(a)). These principles are also where most AI-specific compliance friction shows up.
| Principle | Article 5 reference | Plain meaning | Why it bites for AI |
|---|---|---|---|
| Lawfulness, fairness, transparency | 5(1)(a) | Processing must have a lawful basis, must not be deceptive, and must be explained to data subjects. | Hard to satisfy for web-scraped training data where data subjects were never told. |
| Purpose limitation | 5(1)(b) | Data collected for one purpose cannot be repurposed in a way that is incompatible. | Reusing customer support transcripts to train a foundation model is a classic compatibility problem. |
| Data minimisation | 5(1)(c) | Process only what is adequate, relevant, and limited to what is necessary. | Foundation model training tends to vacuum up far more data than any narrow purpose justifies. |
| Accuracy | 5(1)(d) | Inaccurate data must be erased or rectified without delay. | Generative models that hallucinate facts about identifiable people raise direct accuracy issues. |
| Storage limitation | 5(1)(e) | Data must not be kept in identifiable form longer than necessary. | Model weights that memorise training examples can effectively store data indefinitely. |
| Integrity and confidentiality | 5(1)(f) | Appropriate security measures, including against unauthorised access. | Covers prompt-injection leaks, model extraction attacks, and breaches of inference logs. |
| Accountability | 5(2) | The controller must be able to demonstrate compliance with the other six. | Documentation, DPIAs, and audit trails are not optional. |
Accountability is structural. It forces the controller not just to comply but to be able to prove compliance through records, impact assessments, and governance.
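No processing of personal data is lawful under the GDPR unless it relies on at least one of the six lawful bases in Article 6(1):
- (a) Consent: the data subject has given consent for one or more specific purposes.
- (b) Contract: processing is necessary to perform a contract with the data subject, or to take steps at their request before entering into one.
- (c) Legal obligation: processing is necessary to comply with a legal obligation to which the controller is subject.
- (d) Vital interests: processing is necessary to protect the vital interests of the data subject or of another natural person.
- (e) Public task: processing is necessary for a task carried out in the public interest or in the exercise of official authority vested in the controller.
- (f) Legitimate interests: processing is necessary for the legitimate interests of the controller or a third party, except where those interests are overridden by the interests or fundamental rights and freedoms of the data subject, in particular where the data subject is a child.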
For AI training, the live debate is between consent and legitimate interests. Consent is hard at scale: it must be freely given, specific, informed, and unambiguous, and the data subject must be able to withdraw it at any time. For models trained on billions of web pages, individual consent is essentially impossible to collect. That pushes companies toward legitimate interests, which requires a three-step balancing test (legitimate purpose, necessity, balancing against the data subject's rights and reasonable expectations). The CNIL's 2024 and 2025 recommendations and the EDPB's Opinion 28/2024 lay out detailed expectations for how this balancing should be documented for AI development.
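One way to keep the three-step legitimate-interests test auditable is to record each step as structured data. The sketch below assumes a hypothetical `LegitimateInterestAssessment` record of our own design, not a format prescribed by the CNIL or the EDPB; it defaults to treating the data subject's interests as overriding until the balancing has been positively documented, so an incomplete assessment never passes.

```python
from dataclasses import dataclass, field


@dataclass
class LegitimateInterestAssessment:
    """Hypothetical record of the three-step test under Article 6(1)(f)."""
    purpose: str                          # step 1: the interest pursued, stated specifically
    purpose_is_lawful_and_specific: bool
    necessity_rationale: str              # step 2: why the processing is needed for that purpose
    less_intrusive_means_available: bool
    reasonable_expectations: str          # step 3: what the people concerned could expect
    mitigations: list[str] = field(default_factory=list)
    interests_overridden: bool = True     # conservative default until the balancing is documented

    def passes(self) -> bool:
        """True only if every step is written down and resolved in the controller's favour."""
        documented = all(
            text.strip()
            for text in (self.purpose, self.necessity_rationale, self.reasonable_expectations)
        )
        return (
            documented
            and self.purpose_is_lawful_and_specific
            and not self.less_intrusive_means_available
            and not self.interests_overridden
        )


lia = LegitimateInterestAssessment(
    purpose="Example: train a spam filter on inbound support tickets",
    purpose_is_lawful_and_specific=True,
    necessity_rationale="Example: documented comparison with non-ML alternatives",
    less_intrusive_means_available=False,
    reasonable_expectations="Example: customers are told tickets are screened for abuse",
    mitigations=["remove names and email addresses before training", "short retention period"],
    interests_overridden=False,
)
print(lia.passes())  # True
```

The point of the structure is the audit trail rather than the boolean: each field corresponds to a step a supervisory authority can ask the controller to justify.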
Article 9 separately prohibits processing of special categories of data unless one of ten narrow exceptions applies. Special categories include data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data processed for the purpose of uniquely identifying a person, health data, and data concerning sex life or sexual orientation. For AI, the biometric and health categories are the sharpest constraint: facial recognition, voice biometrics, and clinical AI all need an Article 9 condition (typically explicit consent or a substantial public interest basis) on top of an Article 6 lawful basis.
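The two-layer structure (an Article 6 basis always, plus an Article 9(2) condition whenever special-category data is involved) reduces to a simple conjunction. The category labels and the `lawful_to_process` function below are hypothetical; deciding whether data actually falls into a special category, or whether a condition genuinely applies, is the hard part and is not modelled.

```python
from enum import Enum
from typing import Optional


class Art6Basis(Enum):
    CONSENT = "6(1)(a)"
    CONTRACT = "6(1)(b)"
    LEGAL_OBLIGATION = "6(1)(c)"
    VITAL_INTERESTS = "6(1)(d)"
    PUBLIC_TASK = "6(1)(e)"
    LEGITIMATE_INTERESTS = "6(1)(f)"


# Informal labels for the Article 9(1) special categories.
SPECIAL_CATEGORIES = {
    "racial_or_ethnic_origin", "political_opinions", "religious_or_philosophical_beliefs",
    "trade_union_membership", "genetic", "biometric_for_identification", "health",
    "sex_life_or_sexual_orientation",
}

# The ten Article 9(2) conditions, points (a) to (j).
ART9_CONDITIONS = {f"9(2)({letter})" for letter in "abcdefghij"}


def lawful_to_process(
    categories: set[str],
    art6: Optional[Art6Basis],
    art9_condition: Optional[str] = None,
) -> bool:
    """Both layers must hold: an Article 6 basis, plus an Article 9 condition for special categories."""
    if art6 is None:
        return False  # without an Article 6 basis, processing is never lawful
    if categories & SPECIAL_CATEGORIES:
        return art9_condition in ART9_CONDITIONS
    return True


# Facial recognition: a legitimate-interests claim alone does not clear Article 9.
print(lawful_to_process({"biometric_for_identification"}, Art6Basis.LEGITIMATE_INTERESTS))  # False
print(lawful_to_process({"biometric_for_identification"}, Art6Basis.CONSENT, "9(2)(a)"))  # True
```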
GDPR distinguishes between four key roles, each with different obligations.
| Role | Definition | Typical AI example |
|---|---|---|
| Data controller | The natural or legal person that determines the purposes and means of processing. | A bank deciding to use an AI credit-scoring model. |
| Data processor | A party that processes personal data on behalf of the controller. | A cloud vendor running the bank's model. |
| Data subject | An identified or identifiable natural person. | The applicant whose data is scored. |
| Data Protection Officer (DPO) | A designated expert overseeing compliance, required for some controllers and processors. | An in-house officer or external service contractor. |
Articles 37, 38, and 39 govern the DPO. Designation is mandatory for public authorities, for controllers whose core activities require regular and systematic monitoring of data subjects on a large scale, and for those whose core activities consist of large-scale processing of special categories or criminal convictions data. Many providers of large AI systems trip at least one of those triggers. The DPO must be selected on professional qualities and expert knowledge of data protection, must be involved in all data protection issues in a timely manner, must report to the highest level of management, and must not receive instructions on how to perform the tasks. Failure to appoint a required DPO is a lower-tier infringement (up to 10 million euros or 2 percent of global turnover).
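The Article 37(1) triggers amount to a disjunction: any one of them is enough. The sketch below is illustrative only; the parameter names are hypothetical, and it does not cover national laws that require a DPO in additional cases (Article 37(4)) or voluntary appointments.

```python
def dpo_required(
    public_authority: bool,
    core_large_scale_systematic_monitoring: bool,
    core_large_scale_special_or_criminal_data: bool,
    court_acting_judicially: bool = False,
) -> bool:
    """Return True if any Article 37(1) trigger appears to be met."""
    # Article 37(1)(a): public authorities or bodies, except courts acting in their judicial capacity.
    if public_authority and not court_acting_judicially:
        return True
    # Article 37(1)(b) and (c): core activities involving large-scale monitoring or sensitive data.
    return core_large_scale_systematic_monitoring or core_large_scale_special_or_criminal_data


# A hypothetical provider of a behavioural-advertising model that tracks users across sites.
print(dpo_required(False, True, False))  # True
```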
Chapter III of the GDPR grants individuals a series of rights they can exercise against any controller. The headline rights for AI are listed below.
| Right | Article | What it gives the data subject |
|---|---|---|
| Right of access | 15 | Confirmation of whether their data is being processed and a copy of that data, plus information about purposes, recipients, retention, and the existence of automated decision-making. |
| Right to rectification | 16 | Correction of inaccurate personal data without undue delay. |
| Right to erasure (right to be forgotten) | 17 | Deletion of personal data in defined circumstances. |
| Right to restriction of processing | 18 | A pause on processing while accuracy or lawfulness is contested. |
| Right to data portability | 20 | Receipt of their data in a structured, commonly used, machine-readable format and the right to transmit it to another controller. |
| Right to object | 21 | Objection on grounds related to the data subject's particular situation, especially against direct marketing or processing based on legitimate interests. |
| Rights related to automated individual decision-making | 22 | Protection against being subject to a decision based solely on automated processing that produces legal or similarly significant effects. |
Articles 15, 17, 20, and 22 are the source of most AI-specific compliance challenges. A right of access request to a generative model provider can include not just stored chat logs but also any training material that identifies the requester. A right to erasure request raises the unresolved technical question of whether deleting an example from a training set requires retraining the resulting model. The right to portability assumes data is structured; it sits awkwardly with vector embeddings and inferred profiles.
Article 22 is the single most discussed GDPR provision in the AI literature. Article 22(1) gives data subjects the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning them or similarly significantly affects them. Article 22(2) lists three exceptions: when the decision is necessary for entering into or performing a contract between the data subject and the controller, when it is authorised by EU or member state law that also lays down suitable safeguards, or when it is based on the data subject's explicit consent. Where the contract or explicit-consent exception applies, Article 22(3) requires the controller to implement "suitable measures to safeguard the data subject's rights and freedoms and legitimate interests", including at least the right to obtain human intervention on the part of the controller, to express a point of view, and to contest the decision. Article 22(4) bars such decisions from being based on special-category data unless explicit consent or a substantial public interest basis under Article 9(2)(a) or (g) applies and suitable safeguards are in place.
Articles 13(2)(f), 14(2)(g), and 15(1)(h) require controllers carrying out automated decision-making within the meaning of Article 22 to provide "meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject." This phrase has been the subject of an extended academic debate.
In 2017, Sandra Wachter, Brent Mittelstadt, and Luciano Floridi argued in International Data Privacy Law ("Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation") that the GDPR creates only a limited right to be informed about the system's general logic, not a substantive right to a case-specific explanation of why a particular decision was reached. Andrew Selbst and Julia Powles responded later that year in the same journal ("Meaningful Information and the Right to Explanation"), arguing that the right to explanation does exist as a matter of purposive interpretation: "meaningful information about the logic involved" must be read functionally to enable data subjects to actually exercise their other rights, and constraining it to ex ante system descriptions defeats the purpose. The debate is still unresolved at the level of doctrine, although national supervisory authorities and courts have generally pushed in the Selbst and Powles direction in practice.
For AI builders, the practical effect is that any high-impact, fully automated AI decision (loan approval, hiring screen, fraud denial, benefits eligibility) needs a documented human review pathway, advance disclosure of how the system works in a way the data subject can understand, and a way for the data subject to contest a specific outcome.
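The Article 22 structure can be sketched as a decision function. This is a simplified model under several assumptions: it treats Article 22(1) as a prohibition in principle (as the Article 29 Working Party's guidelines on automated decision-making do), it collapses "meaningful human review" into a single boolean, and its field names and return strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

EXCEPTIONS = {"contract", "law", "explicit_consent"}  # Article 22(2)(a), (b), (c)


@dataclass
class AutomatedDecision:
    legal_or_similarly_significant_effect: bool
    meaningful_human_review: bool       # reviewer has the authority and means to decide differently
    exception: Optional[str] = None     # one of EXCEPTIONS, or None
    special_category_data: bool = False
    art9_2_a_or_g: bool = False         # explicit consent or substantial public interest


def article22_status(d: AutomatedDecision) -> str:
    # Meaningful human involvement means the decision is not "based solely" on automation;
    # the rest of the GDPR still applies either way.
    if d.meaningful_human_review or not d.legal_or_similarly_significant_effect:
        return "outside Article 22(1)"
    if d.exception not in EXCEPTIONS:
        return "prohibited"
    # Article 22(4): special-category data needs Article 9(2)(a) or (g) on top of an exception.
    if d.special_category_data and not d.art9_2_a_or_g:
        return "prohibited"
    if d.exception == "law":
        # Article 22(2)(b): the authorising law itself must lay down suitable safeguards.
        return "permitted: safeguards set by the authorising law"
    # Article 22(3): human intervention, right to express a view, right to contest.
    return "permitted: Article 22(3) safeguards required"


loan = AutomatedDecision(
    legal_or_similarly_significant_effect=True,
    meaningful_human_review=False,
    exception="contract",
)
print(article22_status(loan))  # permitted: Article 22(3) safeguards required
```

In this model, adding a nominal reviewer who cannot change the outcome leaves `meaningful_human_review` false, so the decision stays inside Article 22.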
Article 35 requires the controller to carry out a Data Protection Impact Assessment (DPIA) before processing where a type of processing, in particular using new technologies, is likely to result in a high risk to the rights and freedoms of natural persons. Article 35(3) lists three cases where a DPIA is always required: systematic and extensive evaluation of personal aspects based on automated processing (including profiling) on which decisions producing legal or similarly significant effects are based, large-scale processing of special categories or criminal data, and systematic monitoring of publicly accessible areas on a large scale.
The European Data Protection Board and national supervisory authorities have published lists of operations that always need a DPIA. Both the ICO (UK) and CNIL (France) flag innovative use of new technologies (with AI named explicitly) as a high-risk indicator. The CNIL's published criteria treat the creation of a training dataset for an AI system as itself capable of triggering high risk; combined with profiling or automated decisions, a DPIA is essentially mandatory for any meaningful AI deployment that touches personal data.
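A first-pass DPIA screen can encode the three Article 35(3) cases directly. The sketch below adds one hypothetical heuristic of its own, treating AI use or training-dataset creation combined with profiling as a strong indicator, in the spirit of the ICO and CNIL lists; it is not the text of either list, and the field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessingProfile:
    # Article 35(3) cases
    automated_evaluation_with_significant_decisions: bool  # (a)
    large_scale_special_or_criminal_data: bool              # (b)
    large_scale_public_area_monitoring: bool                # (c)
    # High-risk indicators highlighted by supervisory authorities (illustrative)
    innovative_technology: bool = False        # e.g. AI, flagged by the ICO and CNIL
    creates_ai_training_dataset: bool = False  # a CNIL criterion
    profiling: bool = False


def dpia_screening(p: ProcessingProfile) -> str:
    if (p.automated_evaluation_with_significant_decisions
            or p.large_scale_special_or_criminal_data
            or p.large_scale_public_area_monitoring):
        return "DPIA required: Article 35(3)"
    ai_indicator = p.innovative_technology or p.creates_ai_training_dataset
    # Hypothetical heuristic: an AI indicator combined with profiling is treated as high risk.
    if ai_indicator and p.profiling:
        return "DPIA expected: combined high-risk indicators"
    if ai_indicator:
        return "assess further: document why a DPIA is or is not needed"
    return "no mandatory trigger identified: record the screening"


recommender = ProcessingProfile(False, False, False, innovative_technology=True, profiling=True)
print(dpia_screening(recommender))  # DPIA expected: combined high-risk indicators
```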
The DPIA must contain a description of the processing and its purposes, an assessment of necessity and proportionality, an assessment of risks to data subjects, and the measures envisaged to address those risks. Where the residual risk remains high after mitigation, the controller must consult the supervisory authority before processing (Article 36).
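The required contents and the Article 36 consultation trigger can be tracked in a simple record. The sketch below is an assumption-laden illustration: the `DPIA` and `Risk` classes and the three-level residual-risk scale are hypothetical, since the GDPR prescribes the elements of an assessment but not a scoring scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    description: str
    residual_level: str  # "low", "medium" or "high" after mitigation (hypothetical scale)


@dataclass
class DPIA:
    processing_description: str
    purposes: str
    necessity_and_proportionality: str
    risks: list[Risk] = field(default_factory=list)
    measures: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Required elements that are still empty."""
        text_sections = {
            "description of processing": self.processing_description,
            "purposes": self.purposes,
            "necessity and proportionality": self.necessity_and_proportionality,
        }
        missing = [name for name, text in text_sections.items() if not text.strip()]
        if not self.risks:
            missing.append("risk assessment")
        if not self.measures:
            missing.append("measures to address risks")
        return missing

    def needs_prior_consultation(self) -> bool:
        """Article 36: consult the supervisory authority if any residual risk stays high."""
        return any(risk.residual_level == "high" for risk in self.risks)


dpia = DPIA(
    processing_description="Fine-tune a chat model on support transcripts",
    purposes="Improve answer quality",
    necessity_and_proportionality="",
    risks=[Risk("Model memorises customer names", residual_level="high")],
)
print(dpia.missing_sections())  # ['necessity and proportionality', 'measures to address risks']
print(dpia.needs_prior_consultation())  # True
```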
Article 25(1) requires controllers, both at the time of determining the means of processing and at the time of the processing itself, to implement appropriate technical and organisational measures designed to implement data protection principles in an effective manner. Article 25(2) requires that, by default, only the personal data necessary for each specific purpose be processed.
For AI systems, this article is where data minimisation, pseudonymisation, federated learning, differential privacy, and on-device inference become legally relevant rather than just desirable. The EDPB's Guidelines 4/2019 on Article 25 set out detailed expectations: the controller must consider the state of the art, the cost of implementation, the nature, scope, context and purposes of processing, and the risks of varying likelihood and severity. Picking a more privacy-preserving training architecture when one is technically feasible and not economically prohibitive is not optional in the EDPB's view.
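Differential privacy is one of the techniques the text names. The sketch below shows its simplest building block, the Laplace mechanism applied to a counting query, using only Python's standard library. It is illustrative rather than a training-time method (private model training usually relies on more involved techniques such as DP-SGD), the choice of `epsilon` is a policy decision the GDPR does not specify, and Python's `random` module is used for readability where production DP libraries take additional care with randomness and floating-point behaviour.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(records: list, predicate, epsilon: float, rng: random.Random) -> float:
    """epsilon-differentially private count using the Laplace mechanism.

    A counting query has L1 sensitivity 1: adding or removing one person's
    record changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)  # fixed seed only so the example is reproducible
patients = [{"age": a} for a in (34, 51, 67, 72, 45)]
noisy = dp_count(patients, lambda r: r["age"] > 60, epsilon=1.0, rng=rng)
print(round(noisy, 2))  # the true count is 2; the printed value adds Laplace noise
```

Smaller values of `epsilon` add more noise and give stronger protection; because each released answer consumes privacy budget, repeated queries on the same data weaken the guarantee.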
Chapter V of the GDPR restricts transfers of personal data outside the EEA. Permitted mechanisms include adequacy decisions (the European Commission deciding that a third country provides essentially equivalent protection), Standard Contractual Clauses (SCCs), Binding Corporate Rules, certifications, codes of conduct, and a narrow set of derogations.
On 16 July 2020 the Court of Justice of the European Union handed down its judgment in Case C-311/18, Data Protection Commissioner v Facebook Ireland and Maximillian Schrems ("Schrems II"). The Court invalidated the EU-US Privacy Shield adequacy decision because US surveillance laws (notably FISA Section 702 and Executive Order 12333) did not provide essentially equivalent protection or effective judicial redress for EU data subjects. SCCs survived but with conditions: the data exporter must perform a transfer impact assessment, must consider the law and practice of the destination country, and must implement supplementary measures (technical, contractual, or organisational) where the destination's law fails to provide an essentially equivalent level of protection.
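The post-Schrems II sequence for a single transfer can be sketched as a decision function. This is a simplified model: `Transfer` and `transfer_status` are hypothetical, Article 49 derogations are left out, and whether supplementary measures are actually effective is a legal and technical judgment the code can only flag, not decide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transfer:
    destination: str
    adequacy_decision: bool                 # a Commission adequacy decision covers the recipient
    mechanism: Optional[str]                # e.g. "SCCs" or "BCRs"; None if no Article 46 safeguard
    destination_law_essentially_equivalent: Optional[bool] = None  # transfer impact assessment outcome
    supplementary_measures: tuple[str, ...] = ()


def transfer_status(t: Transfer) -> str:
    if t.adequacy_decision:
        return "permitted: adequacy decision (Article 45)"
    if t.mechanism is None:
        return "blocked: no Article 46 safeguard identified"
    if t.destination_law_essentially_equivalent is None:
        return "blocked: transfer impact assessment missing"
    if t.destination_law_essentially_equivalent:
        return f"permitted: {t.mechanism}"
    if t.supplementary_measures:
        # Schrems II: measures must actually close the gap, not just exist on paper.
        return f"conditional: {t.mechanism} plus effective supplementary measures"
    return "blocked: suspend the transfer"


logs_to_vendor = Transfer("US", adequacy_decision=False, mechanism="SCCs",
                          destination_law_essentially_equivalent=False)
print(transfer_status(logs_to_vendor))  # blocked: suspend the transfer
```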
The Irish Data Protection Commission applied that reasoning to Meta in its decision of 12 May 2023, finding that Meta's continued bulk transfers of Facebook user data to the US on the basis of the 2021 SCCs (with supplementary measures) did not address the surveillance risks identified in Schrems II. The DPC, following an EDPB binding decision under Article 65, imposed a 1.2 billion euro administrative fine, the largest GDPR fine to date, and ordered Meta to suspend future transfers. The EU-US Data Privacy Framework adopted by the Commission on 10 July 2023 has since restored a transfer route for participating US companies, but it remains under legal challenge.
The practical consequence for AI is that any cross-border training pipeline, any third-country cloud inference, and any vendor relationship that ships logs to the United States needs a transfer mechanism, a transfer impact assessment, and contractually backed supplementary measures.
GDPR is enforced by national supervisory authorities (one or more per member state), coordinated through the European Data Protection Board (EDPB). Cross-border cases follow a one-stop-shop mechanism led by the lead supervisory authority (typically the authority of the controller's main establishment) with consultation of concerned authorities. Where lead and concerned authorities disagree, Article 65 allows the EDPB to issue a binding decision.
Fines under Article 83 sit in two tiers. Lower tier infringements (such as failures around DPO designation, records of processing, or breach notification) cap at 10 million euros or 2 percent of total worldwide annual turnover, whichever is higher. Higher tier infringements (basic principles of Article 5, lawful bases of Articles 6 and 9, data subject rights, transfer rules) cap at 20 million euros or 4 percent of total worldwide annual turnover, whichever is higher. The Court of Justice of the European Union has confirmed that "undertaking" for fine calculation purposes follows the EU competition law concept, meaning the turnover of the entire corporate group can be used.
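The "whichever is higher" rule is simple arithmetic. The sketch below computes only the statutory maximum; the actual amount of any fine is set case by case under the Article 83(2) factors, and the turnover figure is assumed to be the undertaking's total worldwide annual turnover for the preceding financial year, expressed in whole euros.

```python
def fine_cap_eur(worldwide_annual_turnover_eur: int, higher_tier: bool) -> float:
    """Statutory maximum under Article 83(4) (lower tier) or Article 83(5) (higher tier)."""
    fixed_cap, percent = (20_000_000, 4) if higher_tier else (10_000_000, 2)
    # Multiplying before dividing keeps the percentage exact for whole-euro turnovers.
    return float(max(fixed_cap, worldwide_annual_turnover_eur * percent / 100))


print(fine_cap_eur(100_000_000_000, higher_tier=True))  # 4000000000.0
print(fine_cap_eur(100_000_000, higher_tier=True))      # 20000000.0
```

For a group with 100 billion euros of turnover the percentage cap dominates; for smaller undertakings the fixed amount is the ceiling.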
| Authority and date | Target | Headline finding | Sanction |
|---|---|---|---|
| CNIL, 21 January 2019 | Google LLC | Lack of transparency, inadequate information, and lack of valid consent for ads personalisation on Android. | 50 million euros, upheld by the Conseil d'État on 19 June 2020. |
| Italian Garante, 9 March 2022 | Clearview AI | Unlawful processing of biometric data of Italian residents through facial-image scraping; no Article 6 or Article 9 basis. | 20 million euros and order to delete Italian residents' data. |
| Hellenic DPA, 13 July 2022 | Clearview AI | Unlawful biometric processing of Greek residents. | 20 million euros. |
| French CNIL, 17 October 2022 | Clearview AI | Same pattern: facial-image scraping without legal basis; refusal to comply. | 20 million euros, with a further 5.2 million euro penalty in May 2023 for non-compliance. |
| Italian Garante, 30 March 2023 | OpenAI (ChatGPT) | Provisional emergency order to stop processing personal data of Italian users pending investigation, citing absent legal basis for training, transparency failures, no age verification, and an undeclared March 2023 data breach. | Service paused in Italy, restored in late April 2023 after remediation. |
| Irish DPC, 12 May 2023 | Meta Platforms Ireland | Continued transfers of Facebook user data to the US on SCCs without adequate supplementary measures, in breach of Article 46(1). | 1.2 billion euros and suspension order. |
| Italian Garante, 20 December 2024 | OpenAI (ChatGPT) | Conclusion of the 2023 investigation: no legal basis for training-data processing, transparency violations, breach notification failures, no proper age verification. | 15 million euros plus a six-month public information campaign across Italian media. |
Additional Clearview decisions extended the same pattern: the UK ICO issued a fine in May 2022 (originally 7.5 million pounds, later overturned by the First-tier Tribunal in October 2023 on jurisdictional grounds) and the Dutch Autoriteit Persoonsgegevens issued a 30.5 million euro fine in September 2024. Clearview has so far refused to pay most of these penalties, which has highlighted the practical limits of GDPR enforcement against non-EU companies with no European assets.
The EDPB created a ChatGPT Taskforce in April 2023 and published its preliminary report on 23 May 2024. The report sets out the Board's working position on training-data lawfulness, web scraping, accuracy, fairness, transparency, and data subject rights for generative AI. It treats accuracy under Article 5(1)(d) as a hard constraint that controllers cannot pass off to end users with disclaimers, and it requires that responsibility for compliance not be contractually shifted to data subjects.
The EDPB followed up with Opinion 28/2024, adopted on 17 December 2024 in response to a referral from the Irish DPC under Article 64(2). The opinion addressed four questions: when an AI model can be considered anonymous, how legitimate interests should be assessed for AI training, what data subjects can reasonably expect, and what happens when a model is trained on unlawfully processed data. The headline conclusions: an AI model trained on personal data is not automatically anonymous (case-by-case assessment is required, taking memorisation risk and extraction risk into account); legitimate interests can be a viable basis for training, but relying on it requires the full three-step test and rigorous documentation; and unlawful upstream processing can taint downstream deployment unless the model has been demonstrably anonymised first.
The French CNIL has been the most active national authority on AI-specific GDPR guidance. Its 2024 and 2025 publications include recommendations on the development phase, on legitimate interest as a basis for training, on annotation, on informing data subjects, and on facilitating individual rights. The CNIL has also published a self-assessment guide for AI systems and announced further sectoral guidance through 2025.
The UK Information Commissioner's Office (operating under the UK GDPR after Brexit) maintains an extended guidance suite on "Explaining decisions made with AI" (jointly with the Alan Turing Institute) and on AI and data protection more broadly. Although the UK regime is no longer formally the EU GDPR, the substantive obligations remain closely aligned, and the ICO's analyses are routinely cited across Europe.
GDPR compliance for an end-to-end machine learning pipeline runs into a handful of recurring problems that are still not fully solved.
Training data lawfulness is the first. Web scraping at the scale required for foundation models almost always sweeps up identifiable personal data. Obtaining consent from each data subject is impossible. Legitimate interests can work, but the three-step test demands a documented purpose, a necessity argument, and a balancing exercise that takes account of the reasonable expectations of people whose data was published online for unrelated reasons. The EDPB's December 2024 opinion is sceptical of blanket legitimate interest claims and expects detailed mitigations.
Anonymisation versus pseudonymisation is the next pain point. Recital 26 puts truly anonymous data outside the GDPR's scope. Pseudonymous data (where a key still allows reidentification) remains personal data. Most "anonymisation" performed by ML practitioners is in legal terms only pseudonymisation, especially where the dataset is large, sparse, or combinable with other datasets. The k-anonymity, l-diversity, and t-closeness literature gives rough technical anchors but no legal safe harbour.
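k-anonymity is the easiest of those anchors to compute: group records by their quasi-identifiers and take the size of the smallest group. The sketch below uses hypothetical column names and, consistent with the text, measures only one narrow re-identification risk; a high k is not a legal finding of anonymity.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def k_anonymity(rows: Iterable[Mapping[str, object]], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values.

    A dataset is k-anonymous if every such combination is shared by at least
    k records. Returns 0 for an empty dataset.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


# Hypothetical generalised records: postcode truncated, age banded.
rows = [
    {"postcode": "750**", "age_band": "30-39", "diagnosis": "flu"},
    {"postcode": "750**", "age_band": "30-39", "diagnosis": "asthma"},
    {"postcode": "751**", "age_band": "40-49", "diagnosis": "flu"},
]
print(k_anonymity(rows, ["postcode", "age_band"]))  # 1: the third record is unique
```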
Model memorisation and regurgitation is now a documented risk. Empirical studies have shown that large language models can reproduce verbatim training examples, including names, addresses, code with embedded secrets, and personally identifying snippets. If a trained model can output personal data on demand, the model itself may be storing personal data within the meaning of Article 4(1), with consequences for storage limitation, data subject access, and erasure rights. The EDPB Opinion 28/2024 explicitly puts this question on the table.
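A crude regurgitation probe checks sampled model outputs for known personal-data strings from the training set. The sketch below is a minimal example under stated assumptions: `verbatim_leaks` is a hypothetical helper, it catches only exact (case-insensitive) reproduction, and the published extraction studies use considerably stronger attacks, so a clean result is weak evidence at best.

```python
def verbatim_leaks(output: str, sensitive_strings: list[str], min_len: int = 8) -> list[str]:
    """Return known sensitive strings that appear verbatim in a model output.

    Matching is case-insensitive; strings shorter than `min_len` are skipped
    to limit false positives on common short tokens.
    """
    haystack = output.casefold()
    return [s for s in sensitive_strings if len(s) >= min_len and s.casefold() in haystack]


# Hypothetical strings known to occur in the training data.
known_pii = ["jane.doe@example.org", "Jane Doe", "Bob"]
sample = "Sure! You can reach Jane Doe at jane.doe@example.org."
print(verbatim_leaks(sample, known_pii))  # ['jane.doe@example.org', 'Jane Doe']
```

In practice such a check would run over many sampled outputs from many prompts; any hit is direct evidence that the model can reproduce personal data.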
The right to be forgotten applied to trained models is the hardest open problem. Article 17 gives data subjects a right to erasure in defined circumstances. Removing a row from a training database is straightforward; removing the influence of that row from a model that has already been trained is not. Full retraining is expensive and slow. Approximate machine unlearning (gradient-based methods, influence functions, and various certified unlearning algorithms) is an active research area, but as of 2025 there is no standard technical method that satisfies the legal requirement to erase. Some controllers have responded with output-side filtering, refusal lists, and post-training fine-tuning; supervisory authorities have not yet endorsed these as sufficient on their own.
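One family of approaches makes exact erasure cheaper by design: split the training data into shards, train a separate constituent model per shard, and aggregate their outputs, so that deleting a record means retraining only the shard that contained it. This is the idea behind SISA training from the machine unlearning literature, swapped in here as an illustration. The toy sketch below uses a per-shard mean as a stand-in for a real model; the class and method names are hypothetical, and the sketch says nothing about whether any particular unlearning method satisfies Article 17.

```python
from statistics import mean
from typing import Optional


class ShardedMeanModel:
    """Toy sharded ensemble showing exact unlearning by retraining one shard.

    Each shard's constituent "model" is just the mean of its values (a stand-in
    for a real learner); the ensemble prediction is the mean of the shard models.
    """

    def __init__(self, records: dict[str, float], num_shards: int):
        self.num_shards = num_shards
        self.shards: list[dict[str, float]] = [{} for _ in range(num_shards)]
        for record_id, value in records.items():
            self.shards[self._shard_of(record_id)][record_id] = value
        self.models = [self._train(shard) for shard in self.shards]
        self.retrain_count = 0

    def _shard_of(self, record_id: str) -> int:
        # Deterministic assignment (unlike Python's salted hash()) so the shard
        # holding a record can be located again when an erasure request arrives.
        return sum(record_id.encode()) % self.num_shards

    @staticmethod
    def _train(shard: dict[str, float]) -> Optional[float]:
        return mean(shard.values()) if shard else None

    def predict(self) -> float:
        # Aggregate only the shards that still hold data.
        return mean(m for m in self.models if m is not None)

    def forget(self, record_id: str) -> None:
        idx = self._shard_of(record_id)
        del self.shards[idx][record_id]                   # delete the stored row
        self.models[idx] = self._train(self.shards[idx])  # retrain only that shard
        self.retrain_count += 1


data = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0, "e": 4.0}
model = ShardedMeanModel(data, num_shards=2)
model.forget("d")
retrained = ShardedMeanModel({k: v for k, v in data.items() if k != "d"}, num_shards=2)
print(model.predict() == retrained.predict())  # True
```

The property being demonstrated is exactness: after `forget`, the ensemble is identical to one trained from scratch on the remaining records with the same sharding. The cost is that each constituent model sees less data.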
Accuracy and hallucinations sit alongside erasure. Article 5(1)(d) requires personal data to be accurate, and Article 16 grants a right to rectification. A generative model that confidently states false information about an identifiable person sits awkwardly with both. The Italian Garante's 2024 OpenAI decision and the EDPB ChatGPT Taskforce report both flagged accuracy as a live risk, and the noyb-led complaint against OpenAI in April 2024 specifically targeted hallucinated personal data that the company could not correct.
The automated decision-making boundary requires careful drafting. Article 22 only triggers for decisions "based solely" on automated processing with legal or similarly significant effects. Practitioners often try to thread the needle by inserting nominal human review. Supervisory authorities and national courts have been clear that token rubber-stamping does not count: the human reviewer must have the authority and the means to actually reach a different conclusion.
Documentation and accountability close the loop. The accountability principle in Article 5(2) puts the burden of proof on the controller. For AI, that means data flow diagrams, training data provenance, DPIAs, transfer impact assessments, model cards, retention schedules, security tests, and audit trails. Where any of these are missing, the supervisory authority's working assumption is non-compliance.
The EU AI Act entered into force on 1 August 2024 and applies in stages: prohibitions on certain practices from 2 February 2025, governance and general-purpose AI obligations from 2 August 2025, the bulk of high-risk obligations from 2 August 2026, and the high-risk-as-product-component obligations from 2 August 2027. The AI Act is a product safety regulation that classifies AI systems by risk and imposes corresponding obligations. It does not displace GDPR; both apply in parallel to AI systems that process personal data.
Key overlaps:
National data protection authorities will continue to be the primary AI enforcers in many cases, especially for deployment decisions involving identifiable individuals. The AI Act creates new authorities (national notifying authorities, market surveillance authorities, the European AI Office) that will need to coordinate with the existing data protection regulators.
A number of GDPR-AI questions are still unresolved at the level of doctrine or practice:
These are not academic questions. They will be answered through enforcement decisions, EDPB opinions, and CJEU rulings over the rest of the decade, and the answers will shape how AI is built and deployed for markets that include the EU.