May 9, 2026 · EU AI Act · 7 min read

EU AI Act Article 10: Data Governance Requirements Explained

Article 10 mandates training, validation, and testing data governance for high-risk AI. Learn what documentation you need and how to prove compliance before August 2026.

If your AI system is classified as high-risk under the EU AI Act, Article 10 is non-negotiable. It mandates specific data governance practices for training, validation, and testing datasets, and enforcement begins August 2, 2026. Fines for breaching these obligations can reach €15 million or 3% of global annual turnover.

Most teams assume "we have data lineage" equals compliance. It doesn't. Article 10 requires documented design choices, bias mitigation steps, and statistical properties of every dataset used to train or validate a high-risk system.

This guide walks through what Article 10 actually requires, which systems it applies to, and how to document compliance before the deadline.

What Article 10 Requires

Article 10 applies to high-risk AI systems listed in Annex III (e.g., HR screening tools, credit scoring, biometric identification, critical infrastructure management). It mandates that training, validation, and testing data meet specific quality criteria:

  • Relevant, representative, free of errors: data must reflect the real-world use case without systematic gaps. Documentation needed: a dataset composition report showing demographic and geographic coverage.
  • Appropriate statistical properties: data must have sufficient volume, variance, and balance for the task. Documentation needed: a statistical summary covering sample size, class distribution, and variance metrics.
  • Examination for biases: you must actively search for and document biases that could lead to discriminatory outcomes. Documentation needed: a bias audit report with mitigation steps (e.g., resampling, fairness constraints).
  • Data governance and management: formal processes for data collection, labeling, storage, and versioning. Documentation needed: a data governance policy document plus an audit trail of dataset versions.

Article 10 does not prescribe specific statistical tests or bias metrics. That's intentional — the regulation is technology-neutral. But it does require you to document your choices and explain why they're appropriate for your system's risk profile.

Who Article 10 Applies To

Article 10 obligations fall on providers of high-risk AI systems — the entity that develops the system, or has it developed, and places it on the EU market under its own name or trademark.

If you're a deployer (an organization using a high-risk system developed by someone else), Article 10 compliance is the provider's responsibility. But you still need to verify that the provider has fulfilled it, especially if you're in a regulated sector (finance, healthcare, public services).

If you're a startup or scale-up building your own AI, you are the provider. Article 10 applies in full.

The Five Data Governance Practices Article 10 Demands

1. Dataset Design Choices Must Be Documented

Why did you choose this dataset? What real-world population or scenario does it represent? What are its known limitations?

Example: If you're building an AI-powered resume screener (Annex III, point 4), your training data must represent the actual applicant population you'll encounter. If your dataset is 80% male CVs from tech roles and you deploy the system to screen healthcare applicants, it fails the representativeness requirement.

What to document:

  • Dataset source and collection methodology
  • Geographic, demographic, and domain coverage
  • Known gaps or underrepresented groups
  • Rationale for dataset selection
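A dataset composition report can start as a short script. The sketch below summarizes coverage shares per attribute; the records and field names ("gender", "region") are illustrative, not prescribed by the Act:

```python
# Sketch: per-attribute coverage shares for a dataset composition
# report. Records and field names ("gender", "region") are illustrative.
from collections import Counter

def composition_report(records, fields):
    """Return the share of each value for every requested field."""
    report = {}
    for field in fields:
        counts = Counter(r.get(field, "missing") for r in records)
        total = sum(counts.values())
        report[field] = {value: count / total for value, count in counts.items()}
    return report

records = [
    {"gender": "f", "region": "EU"},
    {"gender": "m", "region": "EU"},
    {"gender": "m", "region": "US"},
    {"gender": "m", "region": "EU"},
]
print(composition_report(records, ["gender", "region"]))
# {'gender': {'f': 0.25, 'm': 0.75}, 'region': {'EU': 0.75, 'US': 0.25}}
```

Run per dataset version, a summary like this becomes the coverage evidence behind your documented rationale.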

2. Statistical Properties Must Be Appropriate

"Appropriate" means sufficient for the task's risk level and complexity. A high-risk credit scoring model needs more rigorous statistical validation than a low-risk content recommendation engine.

What to document:

  • Sample size and how it was determined
  • Class distribution (e.g., 60% approved loans, 40% rejected)
  • Feature variance and correlation analysis
  • Train/validation/test split ratios and methodology

If your dataset is imbalanced (e.g., 95% negative class), document why that reflects reality and what steps you took to prevent the model from ignoring the minority class (e.g., stratified sampling, class weighting, SMOTE).
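The statistical summary above can be generated with a few lines of standard-library Python; the record fields ("label", "income") are placeholders for your own schema:

```python
# Sketch of a statistical summary: sample size, class balance, and
# per-feature variance, using only the standard library.
from collections import Counter
from statistics import pvariance

def statistical_summary(rows, label_key, feature_keys):
    """Compute the basic statistics Article 10 documentation should record."""
    labels = Counter(r[label_key] for r in rows)
    n = len(rows)
    return {
        "sample_size": n,
        "class_distribution": {c: k / n for c, k in labels.items()},
        "feature_variance": {
            f: pvariance([r[f] for r in rows]) for f in feature_keys
        },
    }

rows = [
    {"label": "approved", "income": 40_000},
    {"label": "approved", "income": 60_000},
    {"label": "rejected", "income": 30_000},
    {"label": "approved", "income": 50_000},
]
summary = statistical_summary(rows, "label", ["income"])
print(summary["class_distribution"])  # {'approved': 0.75, 'rejected': 0.25}
```

Persisting this output per dataset version gives you the class-distribution and variance evidence in an auditable form.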

3. Bias Examination Is Mandatory

Article 10(2)(f) explicitly requires examining datasets for "possible biases" that could lead to discrimination based on protected characteristics (race, gender, age, disability, etc.), and Article 10(2)(g) requires appropriate measures to detect, prevent, and mitigate them.

This is not optional. You must actively search for bias, document what you found, and explain your mitigation strategy.

Practical steps:

  • Slice your dataset by protected attributes (if available) and measure performance disparities
  • Use fairness metrics (e.g., demographic parity, equalized odds, calibration) appropriate to your use case
  • Document any disparities found and the remediation steps taken (e.g., rebalancing, fairness constraints, post-processing)
  • If protected attributes are not in your dataset, document proxy analysis (e.g., ZIP code as a proxy for race in US credit data)

Example: A hiring AI trained on historical data may learn that "gaps in employment" correlate with rejection — but if women are more likely to have employment gaps due to parental leave, the model encodes gender bias. Article 10 requires you to detect and mitigate this.
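As a concrete sketch of the first two steps, here is a minimal demographic parity check. The group labels and predictions are illustrative; a real audit would compute several metrics and document its thresholds:

```python
# Minimal demographic parity check: the gap in positive-prediction
# rates across groups. Labels and predictions here are illustrative.
def demographic_parity(predictions, groups):
    """Return per-group positive rates and the max-min gap between them."""
    rates = {}
    for g in sorted(set(groups)):
        preds = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    gap = max(rates.values()) - min(rates.values())
    return rates, gap

preds  = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates, gap = demographic_parity(preds, groups)
print(rates, gap)  # {'a': 0.75, 'b': 0.25} 0.5
```

A gap of 0.5 like the one above would be a clear finding to record in the bias audit report, together with the remediation chosen.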

4. Data Governance Processes Must Be Formalized

Article 10(2) requires "data governance and management practices" appropriate for the system's intended purpose: not just good intentions, but documented processes.

Minimum documentation:

  • Data collection policy (who can add data, under what conditions)
  • Labeling guidelines and quality control (inter-annotator agreement scores, label audits)
  • Data versioning and lineage (which model version was trained on which dataset version)
  • Access controls and audit logs (who accessed training data, when, and why)

If you retrain your model on new data, you must repeat the Article 10 analysis for the updated dataset. One-time compliance is not sufficient.
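A lightweight way to make lineage verifiable is to fingerprint each dataset with a content hash and record it next to the model version. This is one possible approach, not one prescribed by the Act, and all names and versions here are illustrative:

```python
# Sketch of dataset lineage: a deterministic content hash ties each
# model version to the exact dataset it was trained on.
import datetime
import hashlib
import json

def dataset_fingerprint(records):
    """SHA-256 over a canonical JSON serialization of the records."""
    canonical = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def lineage_entry(model_version, records):
    """One audit-trail row linking a model release to its training data."""
    return {
        "model_version": model_version,
        "dataset_sha256": dataset_fingerprint(records),
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

data = [{"id": 1, "label": "approved"}, {"id": 2, "label": "rejected"}]
entry = lineage_entry("v1.3.0", data)
print(entry["model_version"], entry["dataset_sha256"][:12])
```

Because the hash changes whenever any record changes, a retrained model automatically gets a new lineage entry, which supports the repeat-analysis obligation above.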

5. Testing Data Must Be Separate and Representative

Article 10(3) requires that training, validation, and testing datasets be "relevant, sufficiently representative, and to the best extent possible, free of errors and complete". Testing data must also be distinct from training data: the Act defines testing data as the data used for an independent evaluation of the system.

This is basic ML hygiene, but the EU AI Act makes it a legal requirement. If you evaluate your model on the same data you trained it on, you cannot demonstrate Article 10 compliance.

What to document:

  • How you ensured test data independence (e.g., temporal split, stratified holdout)
  • Why your test set represents real-world deployment conditions
  • Test set performance broken down by subgroups (to detect disparate impact)
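A reproducible stratified holdout, one of the split strategies mentioned above, can be sketched as follows; the fixed seed makes the split itself auditable:

```python
# Sketch of a reproducible stratified holdout: class balance is
# preserved in the test set, and no row appears in both splits.
import random
from collections import defaultdict

def stratified_split(rows, label_key, test_fraction=0.2, seed=42):
    """Split rows into (train, test), stratified by label_key."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed -> documented, repeatable split
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = max(1, round(len(members) * test_fraction))
        test.extend(members[:cut])
        train.extend(members[cut:])
    return train, test

rows = [{"id": i, "label": "pos" if i % 5 == 0 else "neg"} for i in range(100)]
train, test = stratified_split(rows, "label")
assert not {r["id"] for r in train} & {r["id"] for r in test}  # independence
print(len(train), len(test))  # 80 20
```

The disjointness assertion is exactly the independence property the documentation needs to attest to; logging the seed and split ratio alongside it covers the methodology question.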

Common Article 10 Compliance Gaps

Most teams building high-risk AI have some data governance practices. But few have the documentation Article 10 demands. Here are the most common gaps:

  • No bias examination documentation — teams run fairness metrics but don't document findings or mitigation steps
  • No dataset design rationale — teams use "whatever data we had" without documenting why it's appropriate
  • No versioning or lineage — teams retrain models but can't trace which dataset version produced which model version
  • No statistical justification — teams don't document why their sample size, class balance, or feature set is sufficient for the risk level
  • No formal governance policy — data practices exist informally but aren't written down or auditable

How to Document Article 10 Compliance

Article 10 compliance is proven through technical documentation (required under Article 11). At minimum, you need:

  1. Dataset Specification Document — for each dataset (training, validation, test):

    • Source, collection date, and methodology
    • Size, structure, and statistical properties
    • Known limitations and gaps
    • Bias examination results and mitigation steps
  2. Data Governance Policy — organization-wide:

    • Data collection and labeling standards
    • Versioning and lineage tracking
    • Access controls and audit procedures
    • Retraining and re-evaluation triggers
  3. Model Card or Technical Documentation — per model:

    • Which datasets were used (with version hashes)
    • Why those datasets are appropriate for the use case
    • Test set performance overall and by subgroup
    • Residual risks and monitoring plan

These documents must be maintained and updated throughout the system's lifecycle. If you retrain, you update the documentation. If you discover a new bias, you document it and your response.
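One practical option (not mandated by the Act) is to keep the dataset specification machine-readable so it can be versioned alongside the data itself. Every field below mirrors an item from the list above, and every value is a placeholder:

```python
# A machine-readable dataset specification skeleton. All values are
# placeholders, not recommendations.
import json

dataset_spec = {
    "name": "applicant-screening-train",
    "version": "2026-01-15",
    "source": "internal ATS export, PII removed",
    "size": 48_210,
    "known_limitations": ["EU applicants only", "no pre-2020 records"],
    "bias_examination": {
        "metric": "demographic parity",
        "observed_gap": 0.03,
        "mitigation": "reweighting by gender",
    },
}

# Serializing proves the spec is storable and diffable in version control.
print(json.dumps(dataset_spec, sort_keys=True)[:60])
```

A spec in this form can be hashed and diffed like code, which makes "update the documentation when you retrain" a version-control habit rather than a manual chore.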

Article 10 and the August 2026 Deadline

Article 10 obligations become enforceable on August 2, 2026 for high-risk AI systems. If your system is already in production, you have until that date to bring your data governance into compliance.

If you're launching a new high-risk system after August 2, 2026, Article 10 compliance is required before you place it on the market.

The enforcement timeline is fixed: August 2, 2026 doesn't move. Fines for data governance violations can reach €15 million or 3% of global annual turnover, while the Act's top tier of €35 million or 7% is reserved for the most serious violations, such as prohibited AI practices under Article 5.

How Vigilia Helps with Article 10 Compliance

Vigilia's EU AI Act audit includes an Article 10 gap analysis as part of the high-risk system assessment. The report identifies:

  • Whether your system is high-risk (and therefore subject to Article 10)
  • Which data governance documentation is missing
  • Specific remediation steps to close Article 10 gaps
  • Estimated compliance effort and timeline

The audit takes 20 minutes and costs €499 — versus €5,000–€40,000 and 1–3 months for a traditional compliance audit.

Generate your Article 10 compliance report now: https://www.aivigilia.com

If you're not ready to purchase, try the free EU AI Act checker to see if your system is classified as high-risk: https://www.aivigilia.com


This article is for informational purposes only and does not constitute legal advice. Consult a qualified EU AI Act attorney for compliance guidance specific to your system.


Ready to check your own AI system against the EU AI Act?

Get your compliance report in 20 minutes, not 3 months.

Start free audit →
Tags: EU AI Act · Article 10 · Data Governance · High-Risk AI · Compliance