What if Pinocchio Were a Reinforcement Learning Agent: A Normative End-to-End Pipeline
PhD thesis introduces hybrid model where reinforcement learning agents get ethical advice from argumentation-based 'Jiminy Cricket' advisors.
In his PhD thesis, researcher Benoît Alcaraz proposes Pino, an end-to-end pipeline designed to create reinforcement learning (RL) agents that understand and comply with human social norms. Inspired by the story of Pinocchio, the system acts as a digital 'Jiminy Cricket,' supervising AI agents with argumentation-based normative advisors. This hybrid model builds upon existing architectures like AJAR and NGRL to ensure agents operate safely within societal rules, addressing a critical gap as AI integrates into daily life.
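To make the supervisory pattern concrete, here is a minimal sketch of how a normative advisor could filter an RL agent's candidate actions before execution. All names (NormativeAdvisor, SupervisedAgent, permitted) and the blacklist logic are illustrative assumptions, not the thesis's actual API.

```python
# Minimal sketch of advisor-in-the-loop action selection. All names
# (NormativeAdvisor, SupervisedAgent, permitted) are illustrative
# assumptions, not the thesis's actual API.
import random

class NormativeAdvisor:
    """Toy stand-in for an argumentation-based normative advisor."""
    def __init__(self, forbidden):
        self.forbidden = set(forbidden)

    def permitted(self, state, action):
        # A real advisor would evaluate an argumentation framework over
        # the state; here we only check a static blacklist.
        return action not in self.forbidden

class SupervisedAgent:
    """RL agent whose greedy policy is filtered by the advisor's veto."""
    def __init__(self, q_values, advisor):
        self.q = q_values        # maps (state, action) -> estimated return
        self.advisor = advisor

    def choose_action(self, state, actions):
        # Rank actions by learned value, but only among those the advisor
        # permits -- the 'Jiminy Cricket' veto.
        allowed = [a for a in actions if self.advisor.permitted(state, a)]
        if not allowed:          # fall back if every action is vetoed
            return random.choice(actions)
        return max(allowed, key=lambda a: self.q.get((state, a), 0.0))

advisor = NormativeAdvisor(forbidden={"push"})
agent = SupervisedAgent({("s0", "push"): 1.0, ("s0", "wait"): 0.2}, advisor)
print(agent.choose_action("s0", ["push", "wait"]))  # 'wait': 'push' is vetoed
```

Note that 'push' has the higher learned value but is excluded by the advisor, so the agent settles for the compliant action.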
The thesis makes two key technical contributions. The first is a new algorithm for automatically extracting the arguments and relationships behind an advisor's decisions, making the oversight process transparent and operational. The second formally investigates 'norm avoidance', the phenomenon where clever RL agents find loopholes to technically follow rules while violating their intent, and provides a strategy to mitigate it. Each component of the Pino pipeline has been empirically evaluated, moving the field beyond theoretical discussion toward practical, norm-aware AI systems.
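For readers unfamiliar with argumentation-based reasoning, the sketch below shows the kind of structure such an extraction algorithm would surface: a Dung-style abstract argumentation framework, with acceptability computed via the grounded extension. The arguments, attacks, and function names are invented for illustration and do not reproduce the thesis's algorithm.

```python
# Dung-style abstract argumentation framework with a grounded-extension
# computation. Arguments and attacks are invented for illustration; this
# is not the thesis's extraction algorithm.

def grounded_extension(arguments, attacks):
    """Least fixed point of F(S) = {a : every attacker of a is itself
    attacked by some member of S}, starting from the empty set."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}

    def defended(a, s):
        return all(any((d, b) in attacks for d in s) for b in attackers[a])

    s = set()
    while True:
        nxt = {a for a in arguments if defended(a, s)}
        if nxt == s:
            return s
        s = nxt

# 'keep_promise' attacks 'break_promise', but is itself attacked by
# 'emergency_excuse'; the grounded extension accepts the excuse and, with
# 'keep_promise' defeated, reinstates 'break_promise'.
args = {"keep_promise", "break_promise", "emergency_excuse"}
atts = {("keep_promise", "break_promise"),
        ("emergency_excuse", "keep_promise")}
print(grounded_extension(args, atts))  # {'emergency_excuse', 'break_promise'}
```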
- Introduces Pino, a hybrid pipeline where RL agents are supervised by argumentation-based 'normative advisors' for ethical compliance.
- Presents a novel algorithm for automatically extracting the reasoning arguments behind an advisor's decisions, increasing transparency.
- Formally defines and provides a mitigation strategy for 'norm avoidance,' where agents exploit loopholes in rule-based systems, as illustrated in the sketch below.
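As a toy illustration of norm avoidance (not drawn from the thesis), consider a speed-limit norm whose literal encoding leaves a loophole that a check on the norm's intent would catch. All thresholds, checks, and trajectory values below are invented.

```python
# Toy illustration of norm avoidance: the literal encoding of a norm
# leaves a loophole that a check on the norm's intent catches. The norm,
# thresholds, and trajectory are all invented.

SPEED_LIMIT = 50  # norm: "do not exceed 50 km/h inside the school zone"

def violates_letter(speed, in_school_zone):
    return in_school_zone and speed > SPEED_LIMIT

def violates_intent(speed, distance_to_zone_m):
    # Intent: don't endanger children near the school. Speeding just
    # outside the zone boundary respects the letter but not the intent.
    return distance_to_zone_m < 20 and speed > SPEED_LIMIT

trajectory = [
    {"speed": 45, "in_zone": True,  "dist": 0},   # genuinely compliant
    {"speed": 90, "in_zone": False, "dist": 5},   # loophole exploited
]
for step in trajectory:
    print(step,
          "letter:", violates_letter(step["speed"], step["in_zone"]),
          "intent:", violates_intent(step["speed"], step["dist"]))
```

One mitigation direction, in the spirit of the thesis's second contribution, is to penalize intent-level violations during training rather than only letter-level ones.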
Why It Matters
Provides a concrete framework for building AI that understands not just rules but also the intent behind them, a capability crucial for safe real-world deployment.