On Wednesday, the European Data Protection Board (EDPB) released an opinion that examines how developers of artificial intelligence (AI), including large language models (LLMs), can lawfully utilize personal data in compliance with the European Union’s privacy regulations. The EDPB plays a pivotal role in guiding the application of these laws and provides essential insights for regulatory enforcement.
The EDPB’s opinion addresses several key questions: whether AI models can be classified as anonymous (which would exempt them from privacy regulations); whether a “legitimate interests” legal basis can lawfully be used for processing personal data during the development and deployment of AI models (which would remove the need for individual consent); and whether AI models developed with unlawfully processed data can be deployed legally.
The question of the appropriate legal basis for ensuring that AI models comply with the General Data Protection Regulation (GDPR) remains contentious. OpenAI’s ChatGPT has previously faced scrutiny in this regard, and non-compliance with privacy regulations can result in penalties of up to 4% of global annual turnover, or in orders to change how AI tools operate.
Approximately a year ago, Italy’s data protection authority noted preliminary findings indicating that OpenAI’s chatbot violated the GDPR. Following that, additional complaints have been filed against the technology in countries like Poland and Austria, focusing on issues related to its lawful data processing operations, its propensity to generate false data, and its failure to correct inaccuracies regarding individuals.
The GDPR outlines specific criteria regarding lawful personal data processing and grants individuals various data access rights, such as requesting copies of their data, having their data deleted, and amending incorrect information. However, for AI chatbots that generate fabricated data or “hallucinate,” these requests can present substantial challenges.
While generative AI tools have encountered multiple GDPR-related complaints, enforcement actions have been notably limited thus far. EU data protection authorities are grappling with the challenges of applying longstanding data protection laws to a rapidly evolving technology that relies heavily on data for training purposes. The EDPB’s opinion aims to provide guidance to oversight bodies in their decision-making processes.
In a statement, Ireland’s Data Protection Commission (DPC)—which initiated the request for the EDPB’s insights on these issues and will oversee GDPR compliance for OpenAI following a recent legal transition—indicated that the EDPB’s opinion will promote “proactive, effective, and consistent regulation” of AI models across the EU.
Commissioner Dale Sunderland emphasized that the opinion would also assist the DPC in engaging with companies developing new AI frameworks before they launch in the EU market, as well as addressing the growing number of AI-related complaints that have been filed with the DPC.
The opinion provides guidance not only for regulators on approaching generative AI but also for developers concerning how privacy regulators might engage with critical issues like legality. However, the overarching message is that there will be no universal solution for the legal ambiguities they encounter.
Model Anonymity
Regarding model anonymity—defined by the Board as an AI model that is “very unlikely” to “directly or indirectly identify individuals” whose data contributed to its creation and that does not allow users to extract such data through queries—the opinion emphasizes the necessity of evaluating this “on a case-by-case basis.”
The document also compiles what the Board calls “a non-prescriptive and non-exhaustive list” of approaches through which model developers may demonstrate anonymity. These include: careful selection of training data sources to avoid or limit the collection of personal data (potentially excluding “inappropriate” sources); data minimization during the preparation phase prior to training; robust “methodological choices” that significantly reduce or eliminate re-identification risks, such as “regularization methods” that improve model generalization and reduce overfitting; and privacy-preserving techniques like differential privacy. Developers may also take further measures to reduce the likelihood of users extracting personal data via prompts.
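Differential privacy, one of the techniques the opinion names, can be illustrated with a minimal sketch. The example below is not drawn from the opinion itself; it shows the classic Laplace mechanism applied to a simple counting query, using only Python’s standard library. The function names and parameters are my own.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing one person's
    record changes the answer by at most 1, so the noise scale is 1/epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
print(dp_count(1000, epsilon=0.5, rng=rng))  # roughly 1000, give or take a few units of noise
```

The design trade-off is direct: a smaller epsilon means more noise and stronger privacy guarantees, at the cost of accuracy. Training-time variants such as DP-SGD apply the same idea to gradient updates rather than query answers.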
These suggestions indicate that many design and development decisions made by AI developers could affect regulators’ assessments of whether the GDPR applies to a specific model. Only truly anonymous data, where there is no risk of re-identification, falls outside the regulation’s scope; for AI models, however, the Board has set the bar at identification of individuals, or extraction of their data, being “very unlikely.”
Prior to the EDPB’s opinion, there was ongoing debate within data protection authorities regarding AI model anonymity, including viewpoints suggesting that models cannot qualify as personal data. However, the Board clarified that AI model anonymity is not an automatic assumption, and case-by-case evaluations are required.
Legitimate Interest
The opinion also examines whether a legitimate interest legal basis can be employed for the development and deployment of AI. This is crucial as there are only a few legal foundations available under the GDPR, and many are unsuitable for AI, as OpenAI has already experienced with the Italian DPA’s enforcement.
The legitimate interest basis is likely to be favored by AI developers constructing models because it does not necessitate consent from every individual whose data is processed to create the technology. Given the vast amounts of data involved in training LLMs, it is evident that a consent-based legal basis would not be economically viable or scalable.
Again, the Board indicates that Data Protection Authorities (DPAs) will need to assess on a case-by-case basis whether legitimate interest applies as a legal basis for processing personal data to build and deploy AI models. This involves the standard three-step test: identifying a lawful, specific purpose for the processing; assessing whether the processing is necessary to achieve it, including whether less intrusive alternatives exist; and conducting a balancing test to weigh the impact on individuals’ rights.
The EDPB’s opinion suggests that it may be feasible for AI models to satisfy all criteria for relying on a legitimate interest legal basis, pointing out that developing an AI model for a conversational agent service designed to aid users or deploying enhanced threat detection in an information system could fulfill the initial test (lawful purpose).
When assessing the second test (necessity), regulators must evaluate whether the processing actually serves the lawful purpose and if there are no less intrusive alternatives—carefully balancing the volume of personal data processed against the goal, considering the GDPR’s data minimization principle.
The third test (balancing individual rights) must “take into account the specific circumstances of each case,” according to the opinion. Extra consideration should be devoted to any potential risks to individuals’ fundamental rights that may arise during the development and implementation processes.
Part of the balancing assessment also obliges regulators to weigh the “reasonable expectations” of data subjects—meaning, whether individuals whose data is processed for AI could expect their information to be utilized in that manner. Key factors to consider include whether the data was publicly available, the source of the data, the context of its collection, any existing relationship between the individual and the processor, and prospective uses of the model.
In instances where the balancing test fails due to individuals’ interests outweighing those of processors, the Board suggests implementing mitigation measures to minimize the impact of the processing on individuals, which should be tailored to the “circumstances of the case” and the “characteristics of the AI model,” including its intended purpose.
Examples of mitigation strategies referenced in the opinion include technical measures (similar to those noted in the model anonymity section); pseudonymization methods (ensuring personal data cannot be re-identified); measures to conceal personal data or replace it with synthetic data in the training set; methods aiming to facilitate individuals’ ability to exercise their rights (such as opting out); and transparency practices.
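One way to sketch the pseudonymization measure mentioned above is keyed hashing: each direct identifier is replaced with a stable token that cannot be reversed or re-created without a secret key. This is a minimal illustration of the general technique, not something the opinion prescribes; the function name and key handling here are hypothetical.

```python
import hashlib
import hmac

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a stable pseudonym using HMAC-SHA256.

    The same identifier always maps to the same pseudonym, so records can
    still be linked across a dataset, but without the secret key the mapping
    cannot be reversed. Note: under the GDPR, pseudonymized data generally
    remains personal data as long as the key exists.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key for illustration; in practice it would live in a key store.
key = b"store-me-in-a-key-management-system"
token = pseudonymize("jane.doe@example.com", key)
```

Because the hash is keyed, destroying the key later severs the link between tokens and individuals, which is one reason regulators treat pseudonymization as a risk-reduction measure rather than anonymization.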
Additionally, the opinion addresses measures for mitigating risks associated with web scraping, which the Board states presents “specific risks.”
Unlawfully Trained Models
The opinion also tackles the thorny question of how regulators should handle AI models that were trained on data processed in breach of the GDPR.
Once again, the Board suggests that regulators evaluate “the circumstances of each individual case”—indicating that responses from EU privacy watchdogs regarding AI developers who infringe this legal requirement will vary based on the specific situation.
However, the opinion seems to provide a potential escape route for AI models that may have been constructed on questionable legal grounds—such as those that scraped data indiscriminately—if developers take measures to ensure personal data is anonymized prior to deployment.
In such circumstances—provided that developers can prove that the ongoing operation of the model does not involve processing personal data—the Board asserts that the GDPR would not apply, stating: “Thus, the unlawfulness of the initial processing should not impact the subsequent operation of the model.”
Lukasz Olejnik, an independent consultant and affiliate of the KCL Institute for Artificial Intelligence, whose GDPR complaint against ChatGPT has been pending with Poland’s DPA for more than a year, pointed out the implications of this aspect of the opinion. He cautioned that “great care must be exercised to prevent systematic misuse schemes.”
“This presents an intriguing possible shift in how data protection laws have been interpreted until now,” he shared with TechCrunch. “By concentrating solely on the end result (anonymization), the EDPB might inadvertently validate the indiscriminate scraping of web data without appropriate legal foundations, potentially undermining the GDPR’s fundamental principle that personal data must be processed lawfully from the point of collection to disposal.”
When asked about the overall impact he anticipates from the EDPB opinion on his complaint against ChatGPT, Olejnik noted: “The opinion does not constrain national DPAs. However, I am confident that PUODO [Poland’s DPA] will take it into account in their judgment,” while emphasizing that his case against OpenAI’s chatbot “extends beyond training and encompasses accountability and Privacy by Design.”