Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework

arXiv cs.CY May 05, 2026

⚡ New system uses expert agents to filter harmful content based on individual sensitivity profiles

Deep Dive

Traditional content moderation relies on centralized rules that fail to account for the subjective nature of harm perception. A new paper from Ewelina Gajewska and colleagues introduces a multi-agent personalized inference framework built on large language models (LLMs). The architecture uses three agent types: domain-specific Expert Agents that analyze content for particular harm categories, a Manager Agent that orchestrates analysis and selects the right experts, and a Ghost Profile Agent that simulates a user's unique perspective based on their sensitivity profile. This allows the system to tailor moderation decisions to each individual, moving beyond one-size-fits-all blocking or flagging.
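To make the division of labor concrete, here is a minimal sketch of how the three agent roles might fit together. It is not the paper's implementation: the keyword-based experts stand in for LLM calls, and every name (`SensitivityProfile`, `GhostProfileAgent`, `ManagerAgent`, the threshold scheme, and selecting experts from the categories listed in a profile) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins for LLM-backed Expert Agents.
# Each returns a harm score in [0, 1] for one harm category.
def hate_speech_expert(text: str) -> float:
    return 0.9 if "slur" in text.lower() else 0.1


def violence_expert(text: str) -> float:
    return 0.8 if "fight" in text.lower() else 0.1


@dataclass
class SensitivityProfile:
    # Assumed representation: per-category tolerance thresholds.
    # A score above the threshold counts as harmful for this user.
    thresholds: Dict[str, float]


class GhostProfileAgent:
    """Simulates one user's perspective from their sensitivity profile."""

    def __init__(self, profile: SensitivityProfile) -> None:
        self.profile = profile

    def judge(self, scores: Dict[str, float]) -> List[str]:
        # Return the categories this user would perceive as harmful.
        return sorted(
            category
            for category, score in scores.items()
            if score > self.profile.thresholds.get(category, 0.5)
        )


class ManagerAgent:
    """Orchestrates analysis: selects experts, then defers to the ghost profile."""

    def __init__(self, experts: Dict[str, Callable[[str], float]]) -> None:
        self.experts = experts

    def select_experts(self, profile: SensitivityProfile) -> List[str]:
        # Assumption: consult only experts for categories the profile lists.
        return [name for name in self.experts if name in profile.thresholds]

    def moderate(self, text: str, profile: SensitivityProfile) -> List[str]:
        selected = self.select_experts(profile)
        scores = {name: self.experts[name](text) for name in selected}
        return GhostProfileAgent(profile).judge(scores)


manager = ManagerAgent({"hate_speech": hate_speech_expert, "violence": violence_expert})
sensitive_user = SensitivityProfile({"hate_speech": 0.5, "violence": 0.5})
tolerant_user = SensitivityProfile({"hate_speech": 0.5, "violence": 0.95})

post = "Two players got into a fight after the match."
print(manager.moderate(post, sensitive_user))
print(manager.moderate(post, tolerant_user))
```

The same post is flagged for one user and passed for the other, which is the behavior personalization is meant to produce. In a real system the expert scores and the profile simulation would come from LLM inference rather than fixed rules.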

Evaluated against non-personalized baselines, the framework achieved up to a 32% improvement in accuracy, meaning its decisions align more closely with what each user actually finds harmful. The platform controls the granularity of personalization, so platform-wide moderation policies can still be enforced. The paper, accepted to the 34th European Conference on Information Systems (ECIS 2026), offers policy-relevant insights for platform governance, showing how LLM agents can reconcile societal norms with individual digital rights. The approach could make online moderation both more effective and more respectful of user autonomy.
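One way to picture platform-controlled granularity is as a dial that blends a user's preference with the platform default, under a hard policy cap. The sketch below is an assumption made for illustration, not the mechanism described in the paper; `effective_threshold` and all of its parameters are hypothetical.

```python
def effective_threshold(
    platform_default: float,
    user_threshold: float,
    granularity: float,
    platform_max: float,
) -> float:
    """Blend a user's tolerance with the platform default (hypothetical scheme).

    granularity = 0.0 ignores the user entirely; 1.0 is fully personalized.
    platform_max caps tolerance so platform policy is always enforced.
    """
    if not 0.0 <= granularity <= 1.0:
        raise ValueError("granularity must be between 0 and 1")
    blended = (1.0 - granularity) * platform_default + granularity * user_threshold
    return min(blended, platform_max)


# A tolerant user (0.9) on a platform with default 0.5 and a cap of 0.8.
for g in (0.0, 0.5, 1.0):
    print(g, effective_threshold(0.5, 0.9, g, 0.8))
```

With the dial at zero every user gets the platform default, and even at full personalization a user's tolerance cannot exceed the platform's cap.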

Key Points

Up to a 32% improvement in accuracy over non-personalized moderation baselines, aligning with individual user sensitivities
Three-agent architecture: Expert Agents (domain-specific), Manager Agent (orchestrator), Ghost Profile Agent (user perspective simulator)
Paper accepted to ECIS 2026, offering policy-relevant insights on balancing platform governance with individual digital rights

Why It Matters

Personalized content moderation could finally balance platform safety with individual user autonomy and sensitivity.
