EU AI Act: The New Content Training Summary Template for GPAI Providers

By Christie Rae • 19 September 2025

What is the Content Training Summary Template?

The European Commission recently released an explanatory notice and template to help providers of general-purpose AI (GPAI) models summarise the content used to train their models. The template supports GPAI providers in meeting their obligation under Article 53 of the EU AI Act to draw up and make publicly available a summary of the content used to train each of their GPAI models.

Crucially, it also represents another step towards building trust in AI by increasing transparency, in line with the objectives of the regulation.

Although the summary completed using the template is made publicly available, the Commission has accounted for the need to protect trade secrets and confidential business information. The explanatory notice therefore clarifies that the summary should be ‘generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law.’

Section One: General Information

The first section of the template covers general information about the GPAI provider and model, including the provider's contact details, the versioned model name, model dependencies, and the date on which the model was placed on the Union market. Providers must also indicate the modalities present in the training data, as far as they are identifiable:

Text

Image

Audio

Video

Other

For each modality, providers must indicate the size of the training data by selecting the range within which its estimated total size falls. They also need to describe the types of content within each selected modality, for example:

Fiction text

Non-fiction text

Scientific text

Photography

Visual artworks

Infographics

Social media images

Musical compositions

Audiobooks

Private audio communication

Music videos

Films

TV programmes

Video games

Social media videos

Finally, providers must share the latest date of data acquisition or collection for model training and any additional information about the collection of training data.

Section Two: Data Sources

The second, and largest, section of the template requires providers to detail the specific sources of data used to train the GPAI model. For each subsection, organisations should specify the modality or modalities of the content covered by the relevant datasets, then answer the questions specific to that type of data source.

This section defines a “dataset” as a single, pre-packaged collection of data; filtered or pre-processed versions of the same pre-packaged collection should not be treated as new datasets to be disclosed separately. If a dataset falls into more than one category, providers should select the most relevant one.

GPAI providers must provide details about each of the following categories of training data:

Publicly available datasets

Datasets compiled by a third party that are made publicly available for free and are readily downloadable as a whole or in predefined chunks.

Private non-publicly available datasets obtained from third parties

Datasets commercially licensed by rightsholders or their representatives.

Private datasets obtained from other third parties.

Data crawled and scraped from online sources

Data crawled, scraped or otherwise compiled from online sources, excluding publicly available datasets already covered above.

User data

Data collected from users across all of the provider's services and products. This excludes data licensed by users under commercial transactional agreements, as well as customer data used to fine-tune models for specific purposes.

Synthetic AI-generated data

Outputs of another AI model created for use in training the model, such as AI feedback used in reinforcement learning. This excludes the use of AI models to clean or enrich data.

Other sources of data

Data that does not fall under any of the previous categories, such as data collected from offline sources, self-digitised media, and datasets labelled by humans commissioned by the provider.

Section Three: Data Processing Aspects

The third section of the template focuses on the measures the provider has implemented to identify and comply with any reservations of rights under the text and data mining (TDM) exception or limitation set out in Article 4 of the Directive on Copyright in the Digital Single Market. These measures should also align with the provider’s copyright policy, as required by Article 53 of the EU AI Act.

This includes describing the measures the provider has implemented before model training to respect rightsholders' reservations of rights (opt-outs) from the TDM exception or limitation, covering:

Measures implemented before and during data collection

Opt-out protocols and solutions honoured by the provider

Opt-out protocols and solutions honoured by third parties from which datasets have been obtained

GPAI providers must also give a general description of the measures they have taken to avoid or remove content that is illegal under Union law from the training data. However, they aren’t required to disclose specific details about their internal business practices or trade secrets.

Finally, the template provides an optional section where providers can share any other relevant information about data processing measures taken before or after the training of the model.

Next Steps

For GPAI providers, it’s vital to review existing GPAI model documentation and processes. To prepare for completing the template, organisations should ensure clear internal visibility of dataset sources, modalities, sizes and content types, as well as existing data processing measures.

Implementing best practices, such as those set out in ISO/IEC 42001, the international standard for building an AI management system (AIMS), can also help to increase transparency, reduce AI risk, ensure clear documentation and build trust in an organisation and its AI models.

Christie Rae

Christie is a content marketing specialist at ISMS.online. With over seven years' experience, she aims to write informative, engaging content and has worked across a range of industries including cybersecurity, software as a service, tech and pharmaceuticals.
