Use Case: EU AI Act Compliance

The Macro Scenario Layer for AI Act Compliance

The EU AI Act (Regulation (EU) 2024/1689) requires providers of high-risk AI systems to demonstrate robustness across deployment environments and coverage of reasonably foreseeable risks, and it requires providers of general-purpose AI (GPAI) models with systemic risk to assess that risk. For AI operating in economic contexts, this means testing under diverse macro conditions.

WorldSim provides the structured, reproducible macro scenario environments needed to test against Articles 9, 10, 15, and 55. It is not the full compliance stack; it is the essential macro testing layer that other compliance tools do not provide.

High-risk AI compliance deadline: 2 August 2026

The Compliance Gap

What the AI Act Requires vs What Tools Exist

Current AI compliance tooling covers bias testing, documentation, and model cards. The macro-scenario robustness layer is missing.

What the AI Act Requires

Art. 9: "Reasonably foreseeable risks" must be identified and tested, including scenarios the system hasn't encountered in production
Art. 10: Training and test data must reflect "the specific geographical, contextual, behavioural or functional setting" where the system will be deployed
Art. 15: Systems must be "as resilient as possible regarding errors, faults or inconsistencies that may occur within the system or the environment in which the system operates"
Art. 55: Providers of GPAI models with systemic risk must assess impact on "financial or economic stability" with adversarial testing and scenario evaluation

The Missing Layer

Covered by existing tools: Individual bias testing, model cards, data documentation, adversarial ML attacks
Not covered: Structured macro-scenario environments for testing AI under recession, inflation, demographic shift, energy crisis, or fiscal stress
Not covered: Cross-country macro diversity for 27 EU member states with different economic conditions
Not covered: GPAI systemic economic risk assessment with structured, reproducible scenarios
WorldSim fills this gap.

The Framework

5 Ways WorldSim Supports AI Act Compliance

Each capability maps directly to a specific AI Act obligation. Together they form the macro scenario compliance layer.

1

Environmental Robustness Testing

Article 15

What the Act requires: AI systems must be resilient to errors, faults, or inconsistencies in the operating environment.

What WorldSim provides: Structured macro environments representing recession, stagflation, energy crisis, demographic shift, and other conditions your AI will encounter. Run your model against each environment and document whether performance stays within tolerance. 5,000+ simulated paths per scenario give you a distribution of outcomes to test against, not just a single stress point.
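
As a minimal sketch of what "document whether performance stays within tolerance" could look like across many simulated paths: the function name, the synthetic accuracy values, and the 0.75 floor are all illustrative assumptions, not WorldSim output or a WorldSim API.

```python
import random
import statistics


def summarise_paths(metric_per_path, tolerance_floor):
    """Summarise one scenario: P10/P50/P90 of a metric plus the share of
    paths where the metric stays at or above the tolerance floor."""
    deciles = statistics.quantiles(metric_per_path, n=10)  # 9 cut points: P10..P90
    within = sum(1 for m in metric_per_path if m >= tolerance_floor)
    return {
        "p10": deciles[0],
        "p50": deciles[4],
        "p90": deciles[8],
        "share_within_tolerance": within / len(metric_per_path),
    }


# Placeholder data (assumption): in practice each value would be your model's
# measured accuracy on a test set conditioned on one simulated macro path.
rng = random.Random(42)
recession_accuracy = [min(1.0, max(0.0, rng.gauss(0.81, 0.03))) for _ in range(5000)]

summary = summarise_paths(recession_accuracy, tolerance_floor=0.75)
print(summary)
```

Reporting the share of paths within tolerance alongside the percentiles makes the result a statement about the whole distribution rather than about one hand-picked stress case.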

2

GPAI Systemic Risk Assessment

Article 55

What the Act requires: Providers of GPAI with systemic risk must assess impact on "financial or economic stability" and perform adversarial testing.

What WorldSim provides: Structured scenarios showing how AI-driven decisions cascade through economic systems. Model what happens when AI displacement hits 35% with low R&D investment. Simulate the structural consequences of automated financial decision-making at scale. The coupling rules show exactly how macro variables interact, providing the causal framework that systemic risk assessment demands.

3

Scenario Coverage for Risk Management

Article 9

What the Act requires: Identify and test against "reasonably foreseeable risks," with testing against "prior defined metrics and probabilistic thresholds."

What WorldSim provides: A structured library of macro scenarios that defines "reasonably foreseeable" for economic contexts. Recession, energy shock, inflation spike, demographic pressure, fiscal crisis: each is a named, reproducible scenario configuration. For conformity assessment, you can document exactly which scenarios were tested, with what parameters, producing what distributional outcomes. The audit trail is built in.
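
One way to picture such an audit record is sketched below. The field names, the `ScenarioTestRecord` class, and the fingerprinting scheme are hypothetical and chosen for illustration; they are not WorldSim's actual export format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ScenarioTestRecord:
    """Hypothetical audit entry: which scenario was tested, how, and with what result."""
    scenario_id: str
    country: str
    scenario_name: str
    seed: int
    parameters: dict
    outcomes: dict

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so the same record always hashes identically.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = ScenarioTestRecord(
    scenario_id="example-001",  # illustrative identifier
    country="DE",
    scenario_name="recession",
    seed=42,
    parameters={"unemployment_pct": 8.0, "gdp_growth_pct": -3.0, "inflation_pct": 6.0},
    outcomes={"accuracy_p50": 0.81},  # illustrative result
)

# Any change to the inputs, here the seed, yields a different fingerprint.
rerun_with_other_seed = replace(record, seed=43)
print(record.fingerprint())
print(rerun_with_other_seed.fingerprint())
```

Storing the fingerprint next to the test report lets a reviewer confirm that a rerun used exactly the documented scenario, parameters, and seed.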

4

Representativeness Across Deployments

Article 10

What the Act requires: Test data must reflect "the specific geographical, contextual, behavioural or functional setting" where the system will be used.

What WorldSim provides: An AI deployed across 27 EU member states faces fundamentally different economic conditions in each. Germany's electricity costs are 2.5x Poland's. Greece's debt is 3x Ireland's. Romania's demographics are opposite to Sweden's. WorldSim provides macro environments for all 195 countries with the same 26 KPIs, enabling systematic testing across the full diversity of deployment contexts.

5

Continuous Monitoring Benchmarks

Article 9 (ongoing)

What the Act requires: Risk management must be a "continuous iterative process" that runs throughout the AI system lifecycle, not just pre-deployment.

What WorldSim provides: A macro benchmarking layer for ongoing monitoring. Your model was validated under specific macro conditions (e.g. TI 0.52, inflation 2.3%, unemployment 3.4%). When real-world conditions shift significantly (TI drops to 0.38, inflation hits 6%), WorldSim flags that the deployment environment has moved outside the validated envelope. This triggers revalidation before the model degrades in production.
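
The envelope check described above can be sketched as a simple comparison of current conditions against the validated ones. The validated and shifted TI and inflation values come from the example in the text; the tolerance bands, the current unemployment figure, and the function name are assumptions made for illustration.

```python
def outside_envelope(validated, current, tolerances):
    """Return the indicators whose current value has moved further from the
    validated value than the allowed tolerance."""
    return {
        name: current[name]
        for name in validated
        if abs(current[name] - validated[name]) > tolerances[name]
    }


# Conditions under which the model was validated (from the example in the text).
validated = {"ti": 0.52, "inflation_pct": 2.3, "unemployment_pct": 3.4}

# Hypothetical tolerance bands; a real deployment would set these per model.
tolerances = {"ti": 0.10, "inflation_pct": 2.0, "unemployment_pct": 2.0}

# Shifted conditions: TI and inflation from the text, unemployment assumed.
current = {"ti": 0.38, "inflation_pct": 6.0, "unemployment_pct": 3.9}

breaches = outside_envelope(validated, current, tolerances)
if breaches:
    print("Revalidation needed, indicators outside envelope:", sorted(breaches))
```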

Practical Applications

How It Works: Step-by-Step for Each AI Type

WorldSim doesn't feed directly into your model. It defines the macro environment that determines the statistical properties of your model's input population. Here's exactly how the connection works for each AI type.

Annex III, Category 5

Credit Scoring Model

The model: Takes (salary, savings, age, employment_status, loan_amount, credit_history) as inputs. Outputs P(default). Trained on 2020-2024 applicant data.
The problem: The model was trained during a benign period (unemployment 3.5%, inflation 2%, stable housing). Article 15 requires it to be robust under different economic conditions. But the bank can't wait for a recession to test it.

How WorldSim connects:

1 WorldSim generates macro scenarios: Recession (unemployment 8%, GDP -3%, inflation 6%), housing crash (price-to-income -20%), stagflation (inflation 8%, rates 5%). Each scenario has a Trajectory Index and full distributional output.
2 The bank maps macro to micro: Under "recession" conditions, the bank adjusts its test population: higher job loss rates in the applicant pool, lower savings, more variable income, higher default rates. This is exactly what EBA stress tests already require banks to do (project PD and LGD under macro scenarios).
3 Test the credit model against each environment: Run the credit scoring model on the recession-conditioned test set. Measure accuracy, false positive rate, and fairness metrics. Does the model still perform within tolerance?
4 Document for conformity assessment: "Credit model tested under WorldSim scenario #2047 (Germany, recession, TI 0.35). Accuracy degraded from 92% to 81%, which remains above the minimum tolerance threshold of 75%. Run group ID: abc-123, seed: 42. Reproducible."

WorldSim's role: defines the structured macro environment. The bank's existing stress testing infrastructure handles the macro-to-micro translation. WorldSim adds value by providing structurally coherent scenarios (not just "unemployment +5pp" in isolation, but the full coupled cascade), cross-country coverage (27 EU markets), and distributional output (P10/P50/P90, not just one stress point).
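
The macro-to-micro step (step 2) and the test step (step 3) could be sketched roughly as follows. Everything here is a toy assumption: the `Applicant` fields, the job-loss and savings adjustments, and the rule-based stand-in for the credit model. A bank would substitute its own stress-testing translation and its real model.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Applicant:
    employed: bool
    savings: float
    loan_amount: float


def condition_population(applicants, job_loss_rate, savings_factor, rng):
    """Shift a baseline test population towards a stressed macro scenario:
    some employed applicants lose their job and all savings shrink."""
    stressed = []
    for a in applicants:
        still_employed = a.employed and rng.random() >= job_loss_rate
        stressed.append(replace(a, employed=still_employed,
                                savings=a.savings * savings_factor))
    return stressed


def toy_credit_model(a):
    """Stand-in for the real model: approve if employed with savings >= 10% of the loan."""
    return a.employed and a.savings >= 0.1 * a.loan_amount


def approval_rate(applicants):
    return sum(1 for a in applicants if toy_credit_model(a)) / len(applicants)


rng = random.Random(42)
baseline = [
    Applicant(employed=rng.random() < 0.965,
              savings=rng.uniform(0, 20_000),
              loan_amount=rng.uniform(5_000, 50_000))
    for _ in range(2000)
]
# Hypothetical recession adjustments: 5% extra job loss, savings down 20%.
recession = condition_population(baseline, job_loss_rate=0.05, savings_factor=0.8, rng=rng)

baseline_rate = approval_rate(baseline)
recession_rate = approval_rate(recession)
print(f"approval rate: baseline {baseline_rate:.3f}, recession {recession_rate:.3f}")
```

In a real assessment the comparison would cover accuracy, false positive rate, and fairness metrics against observed or projected outcomes, not just the approval rate shown here.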

Figure: WorldSim scenario configuration, showing inflation, electricity, and interest rate tilts for stress testing.

The bank configures macro tilts (inflation +2.1σ, electricity +1.3σ, rates -1.4σ) to define each stress scenario. Each tilt translates to real-world values shown in the sidebar.
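
To illustrate how a tilt expressed in standard deviations might translate into a real-world value, the sketch below assumes a plain linear mapping (baseline mean plus tilt times standard deviation). That mapping and every baseline figure are assumptions for illustration only; WorldSim's actual translation may differ. Only the three tilt values come from the text.

```python
def tilt_to_value(baseline_mean, baseline_std, tilt_sigma):
    """Linear z-score style mapping (an assumption): mean + tilt * standard deviation."""
    return baseline_mean + tilt_sigma * baseline_std


# (mean, standard deviation) per variable: hypothetical numbers for illustration.
BASELINES = {
    "inflation_pct": (2.0, 1.5),
    "electricity_eur_per_mwh": (100.0, 30.0),
    "policy_rate_pct": (3.0, 1.0),
}

# Tilts from the example configuration in the text.
TILTS = {
    "inflation_pct": 2.1,
    "electricity_eur_per_mwh": 1.3,
    "policy_rate_pct": -1.4,
}

scenario = {name: tilt_to_value(*BASELINES[name], TILTS[name]) for name in TILTS}
for name, value in scenario.items():
    print(f"{name}: {value:.2f}")
```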

Annex III, Category 4

AI Hiring / Recruitment Tool

The model: Takes (CV text, qualifications, experience_years, skills, location) as inputs. Outputs a candidate ranking score. Trained on successful hires from 2019-2024.
The problem: The model learned what a "good hire" looks like during a specific economic period. Article 15 requires robustness across different labour market conditions. When unemployment doubles, the applicant pool changes fundamentally: more experienced candidates apply for junior roles, career-changers increase, and salary expectations shift.

How WorldSim connects (for the AI vendor or data science team building the tool):

1 WorldSim defines labour market environments: Tight market (unemployment 3%, net migration +10/1000), recession (unemployment 10%, AI displacement 35%), demographic shift (65+ share 30%, fertility 1.1). Each is a named, reproducible scenario.
2 The vendor's data science team segments test data by macro conditions: Partition existing hiring data by the economic conditions that prevailed when each hire was made. Applicants hired during 2009-2010 (recession) have different characteristics than those from 2022-2024 (tight market): more career-changers, longer gaps, different salary expectations.
3 Test the hiring model under each environment: Does the ranking algorithm produce different outcomes when the applicant pool shifts? Does it unfairly penalise candidates with employment gaps (common during recessions)? Does it work equally well when deployed in Poland (unemployment 2.8%) vs Spain (unemployment 11%)?
4 Document which conditions were tested: "Hiring model validated under 4 macro environments (WorldSim scenarios: tight, moderate, recession, structural shift). Performance within tolerance for all except recession, where employment-gap bias was identified and mitigated."
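
Steps 2 and 3 above can be sketched as a per-regime comparison of shortlisting rates for candidates with and without employment gaps. The record layout, the regime labels, and the tiny hand-made dataset are hypothetical; a real test would use the vendor's own historical data and ranking outputs.

```python
from collections import defaultdict


def selection_rates(records):
    """Shortlisting rate per (macro regime, has_gap) group."""
    counts = defaultdict(lambda: [0, 0])  # key -> [shortlisted, total]
    for r in records:
        key = (r["regime"], r["has_gap"])
        counts[key][0] += r["shortlisted"]
        counts[key][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


def gap_penalty(rates, regime):
    """How much lower the shortlisting rate is for candidates with an employment gap."""
    return rates[(regime, False)] - rates[(regime, True)]


def make(regime, has_gap, outcomes):
    return [{"regime": regime, "has_gap": has_gap, "shortlisted": o} for o in outcomes]


# Toy data (assumption): 1 = shortlisted by the model, 0 = rejected.
records = (
    make("tight", False, [1, 1, 1, 0])
    + make("tight", True, [1, 0])
    + make("recession", False, [1, 1, 0, 0])
    + make("recession", True, [0, 0, 0, 0])
)

rates = selection_rates(records)
for regime in ("tight", "recession"):
    print(regime, "gap penalty:", gap_penalty(rates, regime))
```

A penalty that widens under recession conditions, when gaps are more common, is the kind of employment-gap bias the documentation step would need to record and mitigate.
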

Article 55

GPAI / Foundation Model (Systemic Risk)

The model: A large language model that generates financial analysis, investment advice, or economic commentary. Used by millions of users. Its cumulative training compute exceeds 10^25 FLOPs, the threshold above which the Act presumes systemic risk.
The problem: Article 55 requires assessment of risks to "financial or economic stability." If millions of users receive the same AI-generated investment advice simultaneously, it could amplify market movements. The provider must assess this systemic risk.

How WorldSim connects:

1 WorldSim models the macro consequences: "What happens structurally if AI displacement reaches 35% with low R&D investment?" The coupling rules cascade this through unemployment (+1.2σ), fiscal pressure, crime, migration, and GDP. This is the systemic impact assessment Article 55 demands.
2 Test the LLM's economic outputs: Feed WorldSim's scenario descriptions to the GPAI. Does it give appropriate advice under recession conditions? Does it correctly flag risks? Or does it amplify panic or complacency?
3 Evaluate correlated behaviour risk: If the LLM recommends "sell Italian bonds" to 10 million users simultaneously under a sovereign stress scenario (WorldSim: Italy debt 167%, P90 at 194%), does this amplify the crisis? WorldSim provides the structured scenario framework for this assessment.

Figure: WorldSim GDP per Capita drilldown, a P10/P50/P90 fan chart from the Monte Carlo distribution, showing distributional output for systemic risk assessment.

WorldSim produces full distributional outputs (P10/P50/P90) for every KPI. For GPAI systemic risk assessment, these distributions quantify the range of economic outcomes that AI-driven decisions could influence or amplify.
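
For readers unfamiliar with fan charts, the sketch below shows how P10/P50/P90 bands can be derived from Monte Carlo paths. The random-walk growth model, its parameters, and the function names are assumptions for illustration and say nothing about how WorldSim itself simulates GDP per capita.

```python
import random
import statistics


def simulate_paths(start, mean_growth, volatility, years, n_paths, seed):
    """Toy Monte Carlo: each year's level grows by a normally distributed rate."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    paths = []
    for _ in range(n_paths):
        level, path = start, []
        for _ in range(years):
            level *= 1 + rng.gauss(mean_growth, volatility)
            path.append(level)
        paths.append(path)
    return paths


def fan_chart(paths):
    """Per-year (P10, P50, P90) bands across all paths."""
    bands = []
    for year_values in zip(*paths):
        deciles = statistics.quantiles(year_values, n=10)
        bands.append((deciles[0], deciles[4], deciles[8]))
    return bands


# Hypothetical inputs: starting level, growth, and volatility are illustrative.
paths = simulate_paths(start=30_000, mean_growth=0.015, volatility=0.02,
                       years=10, n_paths=2000, seed=42)
bands = fan_chart(paths)
for year, (p10, p50, p90) in enumerate(bands, start=1):
    print(f"year {year}: P10={p10:,.0f} P50={p50:,.0f} P90={p90:,.0f}")
```

The widening gap between P10 and P90 over time is what gives a fan chart its shape, and it is that range, rather than a single forecast, that a systemic risk assessment would examine.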

Annex III, Category 5

Public Benefits Eligibility AI

The model: Takes (household_income, employment_status, dependents, housing_cost, disability_status) as inputs. Outputs eligibility decision for social benefits. Deployed across multiple EU member states.
The problem: Article 10 requires the model's data to reflect "the specific geographical, contextual, behavioural or functional setting" of deployment. An eligibility threshold calibrated for Germany (median income €45k, unemployment 3.4%) may produce systematic errors in Greece (median income €18k, unemployment 10%). The input distributions are fundamentally different across countries.

How WorldSim connects:

1 WorldSim profiles each deployment country: Germany (TI 0.52, unemployment 3.4%, inflation 2.5%), Greece (TI 0.43, unemployment 10%, inflation 2.9%), Poland (TI 0.50, unemployment 2.8%, inflation 4.3%). Each has fundamentally different macro conditions that shape the applicant population.
2 The agency tests with country-specific data: Pull test data from each country and verify the model's eligibility decisions are appropriate for local conditions. A threshold that works in Germany may incorrectly deny benefits to a majority of Greek applicants.
3 Stress-test with future projections: WorldSim shows Greece's unemployment trajectory to 2050 under different paths. If unemployment improves (P10) or worsens (P90), does the model still produce fair outcomes? Document the range of conditions tested.
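
A rough sketch of the cross-country check in step 2: apply one fixed income threshold to synthetic applicant populations for two countries and compare it with a country-relative threshold. The median incomes come from the text; the log-normal income model, its spread, and the 60%-of-median rule are assumptions for illustration only.

```python
import math
import random


def simulate_incomes(median_income, sigma, n, rng):
    """Synthetic incomes from a log-normal whose median equals median_income."""
    mu = math.log(median_income)  # the median of a log-normal is exp(mu)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def eligible_share(incomes, threshold):
    """Share of applicants whose income falls below the eligibility threshold."""
    return sum(1 for x in incomes if x < threshold) / len(incomes)


rng = random.Random(42)
MEDIANS = {"Germany": 45_000, "Greece": 18_000}  # median incomes from the text
SIGMA = 0.5                                      # hypothetical income spread
FIXED_THRESHOLD = 0.6 * MEDIANS["Germany"]       # rule calibrated on Germany only

fixed_share, relative_share = {}, {}
for country, median in MEDIANS.items():
    incomes = simulate_incomes(median, SIGMA, 5000, rng)
    fixed_share[country] = eligible_share(incomes, FIXED_THRESHOLD)
    relative_share[country] = eligible_share(incomes, 0.6 * median)
    print(f"{country}: fixed {fixed_share[country]:.2f}, "
          f"relative {relative_share[country]:.2f}")
```

Under these assumptions the German-calibrated threshold classifies most of the Greek population as eligible, while a country-relative rule gives similar shares in both, which is the kind of divergence country-specific testing is meant to surface.
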

Honest Scope

What WorldSim Does Not Cover

Full AI Act compliance requires multiple tools. WorldSim is one essential layer, not the entire stack. Here's what you'll need from other providers.

Individual Bias Testing

Testing for fairness across protected characteristics (age, gender, ethnicity) requires personal-level demographic data and micro-level analysis. WorldSim operates at the macro level and does not assess individual-level bias.

Adversarial ML Robustness

Testing against adversarial attacks, data poisoning, and model manipulation requires specialised adversarial ML tooling. WorldSim tests environmental robustness (changing macro conditions), not adversarial inputs.

Data Provenance Documentation

Article 10's data governance requirements include documenting training data sources, collection processes, and annotation methodology. This is an administrative and process challenge that requires data management tooling.

Copyright Compliance

GPAI providers must comply with Union copyright law and publish training data summaries. This is a legal and data management obligation outside WorldSim's scope.

Model Cards & Documentation

Technical documentation, instructions for use, and conformity declarations require documentation tooling and processes. WorldSim provides scenario test results that feed into this documentation, but doesn't generate the documentation itself.

Personal-Level ML Training

Training a credit scoring or hiring model requires personal-level features (income, payment history, qualifications). WorldSim generates macro environments for testing, not personal-level training data.

Build Your Macro Scenario Compliance Layer

The high-risk AI deadline is 2 August 2026. Start testing your AI systems against structured macro scenarios with full audit trail and reproducibility.