PAVE: New AI architecture lets agents break rules when it's legitimate
LLM-powered agents can now decide when to violate rules in emergencies like fire evacuations.
Researchers from the University of Texas at Austin and collaborators have introduced PAVE (Perception, Assessment, Verdict, Emulation), a cognitive architecture that lets LLM-based generative agents reason about when breaking a rule is legitimate. Current generative agents excel at cooperative behavior but fail when rule-breaking is necessary, as in a fire evacuation where ignoring a stop sign might save lives. PAVE addresses this with a four-module pipeline: Perception extracts structured context (authority distance, peer behavior, severity tags); Assessment scores legitimacy along five scalars, including necessity, proportionality, and the absence of alternatives; Verdict decides whether to comply or violate using a hard gate tuned to each agent's persona; and Emulation enacts the decision, scoped only to the relevant rule.
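The four-module pipeline can be sketched as a chain of plain functions. This is a minimal sketch under stated assumptions, not the authors' implementation: the field names, the `perceive`/`assess`/`verdict`/`emulate`/`pave_step` functions, and every numeric score are hypothetical. The real Assessment module queries an LLM, and the article names only three of the five legitimacy scalars, so only those three appear. The hard gate is assumed to require every scalar to clear a persona-specific threshold.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# All names, fields, and scores below are hypothetical illustrations.
# PAVE's Assessment uses an LLM; a hand-written heuristic stands in for it here.


@dataclass
class Context:
    """Perception output: structured context for one rule decision."""
    target_rule: str                   # e.g. "stop_sign"
    authority_distance: float          # carried along; unused by the toy scorer
    peer_violation_rate: float         # fraction of nearby peers breaking the rule
    severity_tags: set[str] = field(default_factory=set)
    police_says_comply: bool = False   # explicit instruction from police


@dataclass
class Persona:
    name: str
    gate_threshold: float  # hard gate; a lower value means a bolder persona


def perceive(obs: dict) -> Context:
    """Perception: map a raw observation dict to structured context."""
    return Context(
        target_rule=obs["rule"],
        authority_distance=obs.get("authority_distance", float("inf")),
        peer_violation_rate=obs.get("peer_violation_rate", 0.0),
        severity_tags=set(obs.get("tags", [])),
        police_says_comply=obs.get("police_says_comply", False),
    )


def assess(ctx: Context) -> dict[str, float]:
    """Assessment: score three of the legitimacy scalars in [0, 1]."""
    emergency = "fire" in ctx.severity_tags
    base = 0.7 if emergency else 0.1
    return {
        "necessity": 0.9 if emergency else 0.1,
        # Peer behavior nudges proportionality upward (illustrative only).
        "proportionality": min(1.0, base + 0.1 * ctx.peer_violation_rate),
        "no_alternatives": 0.7 if emergency else 0.2,
    }


def verdict(ctx: Context, scores: dict[str, float], persona: Persona) -> str:
    """Verdict: authority deference first, then the persona's hard gate."""
    if ctx.police_says_comply:
        return "comply"  # a police instruction overrides high legitimacy
    # Assumed gate form: every scalar must clear the persona threshold.
    if min(scores.values()) >= persona.gate_threshold:
        return "violate"
    return "comply"


def emulate(decision: str, ctx: Context, rules: set[str]) -> set[str]:
    """Emulation: return the rules the agent still obeys this step.
    Bounded scope: only the target rule can ever be suspended."""
    if decision == "violate":
        return rules - {ctx.target_rule}
    return set(rules)


def pave_step(obs: dict, persona: Persona, rules: set[str]) -> tuple[str, set[str]]:
    """Run Perception -> Assessment -> Verdict -> Emulation for one observation."""
    ctx = perceive(obs)
    decision = verdict(ctx, assess(ctx), persona)
    return decision, emulate(decision, ctx, rules)


RULES = {"stop_sign", "speed_limit", "stay_on_road"}
fire = {"rule": "stop_sign", "tags": ["fire"], "peer_violation_rate": 0.8}
bold, cautious = Persona("bold", 0.6), Persona("cautious", 0.75)

print(pave_step(fire, bold, RULES))      # violates the stop sign, keeps other rules
print(pave_step(fire, cautious, RULES))  # lowest score (0.7) misses the 0.75 gate
```

With these toy numbers the same fire observation yields a violation for the bolder persona and compliance for the cautious one, which illustrates (but does not reproduce) a per-persona gate; in both cases the speed limit and the other rules stay in force.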
PAVE was instantiated in Voville, a fork of the Smallville generative-agent sandbox, and evaluated across three scenarios using four LLM backbones (including GPT-4 and Llama 3). The architecture achieved four key properties simultaneously: legitimate violation (rules are broken only when justified), authority deference (police instructions override high legitimacy), bounded scope (only the target rule is violated), and recovery (baseline behavior is restored after the trigger ends). Human evaluators rated PAVE agents as more plausible than vanilla agents. Ablation studies showed that removing the legitimacy gate caused the system to regress to vanilla-like failures. The paper, code, and evaluation pipeline are set for release upon publication.
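The four properties can be phrased as checks over a recorded episode. The sketch below is an assumption about how such checks might look, not the paper's evaluation pipeline: the `Step` trace format, the `check_properties` function, and the 0.6 gate value are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical trace format; not the paper's evaluation pipeline.


@dataclass(frozen=True)
class Step:
    trigger_active: bool                    # emergency (e.g. fire) ongoing
    police_says_comply: bool                # explicit police instruction this step
    legitimacy: float                       # score the agent's Assessment produced
    violated: frozenset[str] = frozenset()  # rules broken at this step


def check_properties(trace: list[Step], target: str, gate: float) -> dict[str, bool]:
    # Legitimate violation: every violation is backed by a score at or above the gate.
    legit = all(s.legitimacy >= gate for s in trace if s.violated)
    # Authority deference: no violation while police instruct compliance.
    deference = all(not s.violated for s in trace if s.police_says_comply)
    # Bounded scope: no rule other than the target is ever broken.
    bounded = all(s.violated <= {target} for s in trace)
    # Recovery: after the last triggered step, the agent breaks no rules.
    triggered = [i for i, s in enumerate(trace) if s.trigger_active]
    after = trace[triggered[-1] + 1:] if triggered else trace
    recovery = all(not s.violated for s in after)
    return {
        "legitimate_violation": legit,
        "authority_deference": deference,
        "bounded_scope": bounded,
        "recovery": recovery,
    }


good = [
    Step(False, False, 0.1),
    Step(True, False, 0.85, frozenset({"stop_sign"})),  # runs the stop sign during the fire
    Step(True, True, 0.85),                              # police say stop: complies
    Step(False, False, 0.1),                             # fire over: back to baseline
]
# Same episode, but the agent keeps running the stop sign after the fire ends.
stale = good[:3] + [Step(False, False, 0.85, frozenset({"stop_sign"}))]

print(check_properties(good, "stop_sign", gate=0.6))   # every property holds
print(check_properties(stale, "stop_sign", gate=0.6))  # only "recovery" is False
```

The `stale` trace shows why recovery is worth checking separately: the lingering violation still carries a high legitimacy score, so the gate check passes, yet the agent has not returned to baseline behavior after the trigger ended.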
- PAVE uses four modules (Perception, Assessment, Verdict, Emulation) to decide when rule-breaking is legitimate in emergencies.
- Tested in the Voville environment (a fork of Smallville) across three scenarios and four LLM backbones; human evaluators rated PAVE agents as more plausible than vanilla agents.
- Four properties achieved simultaneously: legitimate violation, authority deference, bounded scope, and recovery after the trigger ends.
Why It Matters
Could enable safer AI agents in real-world settings, such as autonomous cars or emergency-response bots, that know when breaking a rule is justified.