Research & Papers

Reinforcing privacy reasoning in LLMs via normative simulacra from fiction

Cornell researchers use Jane Austen to train AI on contextual privacy.

Deep Dive

A new paper from Cornell University presents a method for aligning LLM agents with contextual privacy expectations using "normative simulacra": structured representations of privacy norms and information flows extracted from novels. The researchers, Matt Franchi, Madiha Zahrah Choksi, Harold Triedman, and Helen Nissenbaum, argue that existing approaches to privacy reasoning are either too costly (supervisor-assistant setups double inference cost) or too narrow (fine-tuning on task-specific data). Their solution: supervised fine-tuning followed by Group Relative Policy Optimization (GRPO) reinforcement learning, driven by a composite reward function that scores task clarity, structural completeness, internal consistency, and context identification.
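
To make the training signal concrete, here is a minimal Python sketch of such a composite reward. The individual scorers, the equal weights, and the structured-output field names are illustrative assumptions, not the authors' implementation:

```python
# Sketch of a composite GRPO reward with the four components named above.
# All scorers, weights, and field names are illustrative assumptions.

REQUIRED_FIELDS = ("sender", "recipient", "information_type", "transmission_principle")


def score_task_clarity(completion: str) -> float:
    """Placeholder: reward completions that state the privacy task up front."""
    return 1.0 if completion.lower().startswith("task:") else 0.0


def score_structural_completeness(completion: str) -> float:
    """Placeholder: fraction of required structured-output fields present."""
    text = completion.lower()
    return sum(name in text for name in REQUIRED_FIELDS) / len(REQUIRED_FIELDS)


def score_internal_consistency(completion: str) -> float:
    """Placeholder for a consistency check (e.g. the verdict must not
    contradict the stated norm); always passes in this sketch."""
    return 1.0


def score_context_identification(completion: str, context_label: str) -> float:
    """Placeholder: did the completion name the governing social context?"""
    return 1.0 if context_label.lower() in completion.lower() else 0.0


def composite_reward(completion: str, context_label: str) -> float:
    """Equal-weight sum of the four components; the real weights are unknown."""
    return 0.25 * (
        score_task_clarity(completion)
        + score_structural_completeness(completion)
        + score_internal_consistency(completion)
        + score_context_identification(completion, context_label)
    )
```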

The team evaluated their approach across seven models on five benchmarks aligned with contextual integrity (CI), spanning diverse societal contexts. Supervised fine-tuning alone introduced a conservative prior toward restricting information flow, improving recognition of privacy-relevant situations but not the correctness of privacy judgments. Combining GRPO with normative grounding, however, achieved the highest score on a law-compliance benchmark and the strongest correlation with crowdsourced human privacy expectations. To prevent overfitting, the researchers introduced per-completion contrastive scoring: each completion is evaluated against both the correct normative universe and a randomly selected wrong one, teaching models to condition on context rather than memorize source-specific norms. Together, these results indicate that fiction-derived normative simulacra can teach contextual privacy reasoning that transfers to real-world domains.
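
A minimal sketch of per-completion contrastive scoring under the same assumptions, treating each normative universe as a context label and contrasting fit by simple subtraction (the paper's exact contrast mechanism is an assumption here):

```python
import random
from typing import Callable

# Sketch of per-completion contrastive scoring. Each "universe" is modeled
# as a context label; the difference-of-rewards contrast is an assumption.


def contrastive_score(completion: str, correct_universe: str,
                      universe_pool: list[str],
                      reward_fn: Callable[[str, str], float]) -> float:
    """Score fit to the correct normative universe minus fit to a randomly
    drawn wrong one, so only conditioning on context earns reward."""
    wrong = random.choice([u for u in universe_pool if u != correct_universe])
    return reward_fn(completion, correct_universe) - reward_fn(completion, wrong)


# Usage with the composite_reward stub from the earlier sketch:
# contrastive_score(completion, "courtship", all_contexts, composite_reward)
```

In this sketch, a completion that fits every universe equally well scores zero, so memorized source-specific norms earn nothing.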

Key Points
  • Method extracts privacy norms from novels and uses them to fine-tune LLMs via supervised learning plus GRPO reinforcement learning (one plausible simulacrum schema is sketched after this list)
  • Achieved the highest score on a law-compliance benchmark and the strongest correlation with crowdsourced human privacy expectations across seven models
  • Per-completion contrastive scoring prevents overfitting by evaluating against both correct and wrong normative universes
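
For illustration, here is one plausible schema for a fiction-derived normative simulacrum, built on contextual integrity's five flow parameters (sender, recipient, subject, information type, transmission principle). Both the schema and the example entry are assumptions, not the paper's data format:

```python
from dataclasses import dataclass, field


@dataclass
class InformationFlow:
    """One information flow, in contextual integrity's five parameters."""
    sender: str                  # who transmits the information
    recipient: str               # who receives it
    subject: str                 # whom the information is about
    information_type: str        # what kind of information flows
    transmission_principle: str  # constraint under which it flows
    permitted: bool              # whether the context's norms sanction it


@dataclass
class NormativeSimulacrum:
    """Norms extracted from one novel: a social context plus its flows."""
    source_novel: str
    context: str
    flows: list[InformationFlow] = field(default_factory=list)


# Invented example entry, for illustration only.
example = NormativeSimulacrum(
    source_novel="Pride and Prejudice",
    context="courtship",
    flows=[InformationFlow(
        sender="a family friend",
        recipient="the neighborhood at large",
        subject="a young woman",
        information_type="details of a private engagement",
        transmission_principle="shared without the subject's consent",
        permitted=False,
    )],
)
```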

Why It Matters

Fiction-derived norms give models context-aware privacy reasoning without the doubled inference cost of supervisor-assistant setups, enabling cheaper, more privacy-aware AI agents.