AI model training requires enormous amounts of data, which often include copyright-protected materials. The EU has introduced an exception to the reproduction right that permits the reproduction of works for the purposes of text and data mining (TDM).[1] At first it was not obvious that the TDM exception applies to AI training, because the Digital Single Market (DSM) Directive does not mention AI. However, its applicability no longer seems to be in dispute.

To balance the interests of users and rightsholders, an opt-out mechanism applies to TDM conducted for purposes other than scientific research.[2] Rightsholders can prohibit the use of their works for TDM by expressly reserving their rights in an appropriate manner, such as by machine-readable means in the case of content made publicly available online.

Member States have implemented the opt-out mechanism in different ways. Some, such as Poland, expressly require that rights in works available online be reserved by machine-readable means, while others, such as Italy, do not.

A recently published study commissioned by the EUIPO, The Development of Generative Artificial Intelligence from a Copyright Perspective,[3] confirms that the implementation and effectiveness of opt-out mechanisms pose challenges.

Open questions about opt-outs for works available online include:

  1. Does a reservation of rights made in natural language (e.g. in the terms and conditions of a website) always meet the machine-readable criterion?
  2. Can collective management organisations reserve rights in a general manner for their entire repertoire?
  3. Can a licensee validly reserve rights?

Furthermore, there is no single, uniformly adopted technical measure for reserving rights. The most popular are the Robots Exclusion Protocol (REP, robots.txt) and the TDM Reservation Protocol (TDMRep). Because these measures have serious limitations, new ones are being developed.
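To illustrate what a machine-readable reservation looks like in practice, the sketch below checks a web page for a TDMRep signal. It assumes the header and meta-tag names described in the W3C Community Group report on TDMRep (`tdm-reservation`, where "1" means rights are reserved); these should be verified against the current version of that report. The function name `tdm_reserved` and the rule that the HTTP header takes precedence over the HTML meta tag are illustrative assumptions, not part of any official tooling. Note that a result of `None` only means that no TDMRep signal was found: it does not establish that rights were not reserved by other means, such as a natural-language statement in the website's terms.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional


class _TdmMetaParser(HTMLParser):
    """Collects the content of <meta name="tdm-reservation"> and <meta name="tdm-policy"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values may be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        name = attributes.get("name", "").lower()
        if name in ("tdm-reservation", "tdm-policy"):
            self.values[name] = attributes.get("content", "").strip()


def tdm_reserved(headers: Mapping[str, str], html: str) -> Optional[bool]:
    """Hypothetical helper: True if a TDMRep reservation is found, False if the
    signal explicitly says "not reserved", None if no TDMRep signal is present."""
    # Assumption for this sketch: the HTTP header, if present, overrides the HTML meta tag.
    lowered = {key.lower(): value.strip() for key, value in headers.items()}
    value = lowered.get("tdm-reservation")
    if value is None:
        parser = _TdmMetaParser()
        parser.feed(html)
        value = parser.values.get("tdm-reservation")
    if value is None:
        return None  # no machine-readable TDMRep signal found
    return value == "1"
```

For example, `tdm_reserved({}, '<meta name="tdm-reservation" content="1">')` returns `True`, while a page that says only "All rights reserved" in its text returns `None`, which is exactly the gap behind the first question listed above.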

There is no official repository of the opt-outs exercised by rightsholders. The European Commission has commissioned a study to assess the feasibility of a central registry of TDM opt-outs expressed by rightsholders.[4]

Under Article 53(1)(c) of the AI Act,[5] providers of general-purpose AI models (GPAIMs) must put in place a policy to comply with Union copyright law, in particular to identify and comply with reservations of rights (TDM opt-outs) expressed under Article 4(3) of the DSM Directive. The recently published third draft of the General-Purpose AI Code of Practice[6] requires providers of GPAIMs to employ web crawlers that read and follow instructions issued in accordance with the REP (robots.txt), as specified in Internet Engineering Task Force (IETF) Request for Comments (RFC) 9309.[7] Providers must also make best efforts to comply with other appropriate machine-readable protocols for reserving rights, for example asset-based or location-based metadata, that either result from a cross-industry standard-setting process or are state-of-the-art and widely adopted by rightsholders.
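The crawler obligation described above can be sketched with Python's standard `urllib.robotparser` module, which checks a URL against robots.txt rules before fetching it. The crawler name `ExampleAIBot`, the example.com URLs and the robots.txt content are hypothetical. This is a simplified illustration, not a complete RFC 9309 implementation: for instance, `urllib.robotparser` applies the first matching rule in file order, whereas RFC 9309 gives precedence to the most specific (longest) match. The example also shows a limitation of the REP mentioned earlier: rules are expressed per crawler and per URL path, not per work or per purpose such as TDM.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (hypothetical) AI crawler from /articles/
# while allowing every other crawler everywhere.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /articles/

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("ExampleAIBot", "https://example.com/about"),
        ("SomeSearchBot", "https://example.com/articles/1"),
    ]:
        print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

Here `ExampleAIBot` is refused `/articles/1` but may fetch `/about`, and any other crawler may fetch both. A rightsholder relying on robots.txt therefore has to name each AI crawler it wants to exclude and cannot, within the protocol itself, distinguish TDM from other uses such as search indexing.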

Practice is still developing. Rightsholders who decide to opt out should carefully consider whether the measures they choose are effective. Respecting TDM opt-outs is also a challenge for AI model developers.


[1] Articles 3 and 4 of Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market (DSM Directive) (OJ L 130, 17.5.2019, p. 92–125).

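[2] Article 3 of the DSM Directive covers TDM for the purposes of scientific research carried out by research organisations and cultural heritage institutions; it is not subject to an opt-out.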

[3] Available at https://www.euipo.europa.eu/pl/publications/genai-from-a-copyright-perspective-2025

[4] Available at https://op.europa.eu/en/web/public-procurement/procurement-details/-/procurement/8726813a-bd9b-4f58-8679-01c80f7a1abf

[5] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (AI Act) (OJ L 2024/1689, 12.7.2024).

[6] Available at https://digital-strategy.ec.europa.eu/en/library/third-draft-general-purpose-ai-code-practice-published-written-independent-experts

[7] Available at https://datatracker.ietf.org/doc/rfc9309/