Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework
New system uses expert agents to filter harmful content based on individual sensitivity profiles
Traditional content moderation relies on centralized rules that fail to account for the subjective nature of harm perception. A new paper from Ewelina Gajewska and colleagues introduces a multi-agent personalized inference framework built on large language models (LLMs). The architecture uses three agent types: domain-specific Expert Agents that analyze content for particular harm categories, a Manager Agent that orchestrates analysis and selects the right experts, and a Ghost Profile Agent that simulates a user's unique perspective based on their sensitivity profile. This allows the system to tailor moderation decisions to each individual, moving beyond one-size-fits-all blocking or flagging.
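The division of labor among the three agent types can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the paper's agents are LLM-based, whereas here each Expert Agent is a keyword-matching stub, and every class name, keyword list, score, and threshold below is a hypothetical stand-in chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SensitivityProfile:
    # Hypothetical representation: per-category tolerance in [0, 1].
    # Content is flagged when its harm score reaches the threshold,
    # so a lower value means a more sensitive user.
    thresholds: dict[str, float]
    default: float = 0.5


class ExpertAgent:
    """Stub for a domain-specific expert (an LLM call in the paper)."""

    def __init__(self, category: str, keywords: set[str]):
        self.category = category
        self.keywords = keywords

    def score(self, text: str) -> float:
        # Toy scoring: count keyword hits, capped at 1.0.
        words = [w.strip(".,!?") for w in text.lower().split()]
        hits = sum(1 for w in words if w in self.keywords)
        return min(1.0, hits / 2)


class ManagerAgent:
    """Runs the experts and keeps only those that found something."""

    def __init__(self, experts: list[ExpertAgent]):
        self.experts = experts

    def analyze(self, text: str) -> dict[str, float]:
        return {e.category: s for e in self.experts if (s := e.score(text)) > 0}


class GhostProfileAgent:
    """Judges the expert scores from one user's point of view."""

    def __init__(self, profile: SensitivityProfile):
        self.profile = profile

    def flagged(self, scores: dict[str, float]) -> list[str]:
        t = self.profile.thresholds
        return sorted(c for c, s in scores.items()
                      if s >= t.get(c, self.profile.default))


def moderate(text: str, manager: ManagerAgent, profile: SensitivityProfile) -> str:
    scores = manager.analyze(text)
    return "hide" if GhostProfileAgent(profile).flagged(scores) else "show"


manager = ManagerAgent([
    ExpertAgent("insults", {"idiot", "stupid"}),
    ExpertAgent("violence", {"violence", "kill"}),
])
sensitive = SensitivityProfile({"insults": 0.3, "violence": 0.9})
tolerant = SensitivityProfile({"insults": 0.9, "violence": 0.9})

post = "That idiot deserves violence."
print(moderate(post, manager, sensitive))  # hide
print(moderate(post, manager, tolerant))   # show
```

The same post yields different outcomes for the two profiles, which is the behavior the personalized framework is meant to provide; a non-personalized baseline would apply one threshold to everyone.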
Evaluated against non-personalized baselines, the framework achieved up to a 32% improvement in accuracy, meaning its decisions align more closely with what each user actually finds harmful. The platform controls the granularity of personalization, so platform-wide moderation policies can still be enforced. The paper, accepted to the 34th European Conference on Information Systems (ECIS 2026), provides policy-relevant insights for platform governance, showing how LLM agents can reconcile societal norms with individual digital rights. The approach could make online moderation both more effective and more respectful of user autonomy.
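One way to picture platform-controlled personalization is as a bounded range within which user preferences may move. The sketch below is an assumption made for illustration, not a mechanism described in the article: `PlatformBounds`, `effective_threshold`, and the numeric values are all hypothetical. A threshold here is the harm score at which content is hidden for a user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformBounds:
    # Hypothetical policy: users may not set a threshold below `floor`
    # (over-filtering) or above `cap` (seeing content the platform bans).
    floor: float
    cap: float


def effective_threshold(user_threshold: float, bounds: PlatformBounds) -> float:
    """Clamp the user's preferred threshold into the platform's range."""
    return max(bounds.floor, min(user_threshold, bounds.cap))


bounds = PlatformBounds(floor=0.2, cap=0.7)
print(effective_threshold(0.95, bounds))  # 0.7
print(effective_threshold(0.5, bounds))   # 0.5
```

Under this reading, widening the range gives users more say, and narrowing it to a single value recovers one-size-fits-all moderation.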
- Up to a 32% improvement in accuracy over non-personalized moderation baselines, reflecting closer alignment with individual user sensitivities
- Three-agent architecture: Expert Agents (domain-specific), Manager Agent (orchestrator), Ghost Profile Agent (user perspective simulator)
- Paper accepted to ECIS 2026, offering policy-relevant insights on balancing platform governance with digital rights
Why It Matters
Personalized content moderation could finally balance platform safety with individual user autonomy and sensitivity.