New LLM Pipeline Analyzes 4,322 EU Regulatory Submissions with Full Traceability
AI extracts 15K+ stakeholder topics from Digital Fairness Act feedback with verbatim evidence.
Public consultations generate massive datasets of stakeholder submissions that are practically impossible to analyze manually. Researchers Thales Bertaglia, Haoyang Gui, Catalina Goanta, and Gerasimos Spanakis from Maastricht University address this with an LLM pipeline and interactive dashboard for structured topic extraction. The system processes raw PDF attachments and web-form responses, extracts topic annotations, and grounds every extraction in a verbatim quote from the source text. They demonstrated it on the European Commission's Digital Fairness Act (DFA) public call for evidence, handling 4,322 submissions to produce 15,368 topic annotations supported by 20,951 evidence quotes. The design rests on three principles: verbatim grounding, full traceability, and transparency by design.
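The article does not say how the pipeline enforces verbatim grounding, so the following is only a minimal sketch of one plausible post-extraction check: keep an annotation only if its quote appears word for word in the submission it claims to come from. The `Annotation` class, its field names, and the whitespace normalization are illustrative assumptions, not the authors' implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record for one extracted topic (field names are assumptions)."""
    submission_id: str
    topic: str
    quote: str


def normalize(text: str) -> str:
    # Collapse whitespace runs so line breaks from PDF extraction
    # do not cause false mismatches.
    return re.sub(r"\s+", " ", text).strip()


def is_grounded(annotation: Annotation, source_text: str) -> bool:
    # An annotation counts as grounded only if its non-empty quote
    # occurs verbatim (after whitespace normalization) in the source.
    quote = normalize(annotation.quote)
    return bool(quote) and quote in normalize(source_text)


def filter_grounded(annotations, sources):
    # `sources` maps submission_id -> full submission text.
    # Annotations pointing at unknown submissions are dropped.
    return [
        ann for ann in annotations
        if is_grounded(ann, sources.get(ann.submission_id, ""))
    ]
```

A check of this kind rejects paraphrased or hallucinated quotes outright, which is the property that makes each retained annotation traceable to its source.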
The dashboard exposes the full extraction dataset across five analytical views, from dataset-level topic overviews to individual paragraph drill-downs, with every result traceable to its source. Beyond the predefined DFA topic categories, the pipeline identified emergent stakeholder concerns such as Age Verification, Payment Processor Censorship, and Digital Ownership, issues that a fixed-taxonomy approach would have missed. The pipeline is domain-generic: adapting it to a new consultation requires only a prompt update and a new dataset. A live demo is available, and the code and processed data are publicly released on GitHub. Together, these features let policymakers and researchers survey stakeholder positions across a large consultation while checking each extracted topic against the quoted source text.
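To illustrate the difference between predefined and emergent topics, here is a small sketch of how extracted topic labels could be tallied and split after extraction. It assumes each annotation carries a plain-text topic label; the function name and the contents of `PREDEFINED_TOPICS` are placeholders, not the actual DFA category list or the authors' method.

```python
from collections import Counter

# Placeholder taxonomy for illustration only; not the real DFA categories.
PREDEFINED_TOPICS = {"Dark Patterns", "Addictive Design"}


def split_topics(topic_labels, predefined):
    # Count how often each label occurs, then separate labels that belong
    # to the predefined taxonomy from those that emerged during extraction.
    counts = Counter(topic_labels)
    known = {t: n for t, n in counts.items() if t in predefined}
    emergent = {t: n for t, n in counts.items() if t not in predefined}
    return known, emergent
```

With a fixed taxonomy, labels in the `emergent` group would have had nowhere to go; allowing open-ended labels and sorting them afterwards is one way to keep them visible.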
- Processed 4,322 DFA submissions into 15,368 topic annotations backed by 20,951 verbatim evidence quotes, so each annotation can be traced to its source text.
- Discovered emergent stakeholder concerns (Age Verification, Payment Processor Censorship, Digital Ownership) beyond a fixed taxonomy.
- Domain-generic design: adapting to a new consultation requires only a prompt update and a new dataset. The code and processed data are publicly released.
Why It Matters
The pipeline enables fast, transparent, and scalable analysis of the public feedback that shapes EU digital regulation.