AI Safety

An Evaluation of Chat Safety Moderations in Roblox

Researchers used LLMs to uncover rampant abuse escaping Roblox's automated filters.

Deep Dive

A new academic paper from researchers Priya Kaushik, Sonja Brown, Rakibul Hasan, and Sazzadur Rahaman provides the first independent evaluation of Roblox's chat safety moderation. The team collected approximately 2 million chat messages from four public game servers spanning multiple age groups, adhering to ethical guidelines and Roblox's terms of service. Because the full corpus was too large to label by hand, they manually labeled 99.8K messages as safe or unsafe to create a ground-truth dataset, then evaluated four locally hosted large language models (LLMs) against it to identify the best performer at flagging policy-violating content. That model was applied to the entire corpus, and the flagged messages were categorized through iterative open and axial coding until thematic saturation was reached.
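The model-selection step described above can be sketched roughly as follows: score each candidate classifier against the hand-labeled ground truth and keep the best performer before running it over the full corpus. This is an illustrative sketch only; the lambda "models" are stand-ins, since the article does not name the locally hosted LLMs or the exact metric the researchers used (F1 is assumed here).

```python
# Hypothetical sketch of the selection step: evaluate each candidate
# classifier on the labeled ground-truth subset, keep the best by F1.
# The toy "models" below are stand-ins, not the study's actual LLMs.

def f1_score(labels, preds):
    """F1 over the 'unsafe' (True) class."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_best_model(models, messages, labels):
    """Return (name, score) of the model with the highest F1 on ground truth."""
    scored = {name: f1_score(labels, [model(msg) for msg in messages])
              for name, model in models.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Toy stand-ins: each "model" maps a message to True (unsafe) / False (safe).
models = {
    "model_a": lambda msg: "unsafe" in msg,
    "model_b": lambda msg: True,  # flags everything: full recall, poor precision
}
messages = ["hello", "unsafe thing", "nice game", "unsafe stuff"]
labels = [False, True, False, True]

best, score = select_best_model(models, messages, labels)
# The winning model would then be applied to the entire 2M-message corpus.
```

The key design point the paper's pipeline reflects: expensive human labeling is spent on a small ground-truth slice, and the automated classifier it validates is what scales to the remaining millions of messages.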

The findings reveal a disturbing reality: many unsafe messages—including grooming attempts, sexualization of minors, bullying, harassment, violent threats, self-harm references, and sharing of sensitive personal information—routinely bypass Roblox's automated moderation. Worse, users whose messages had previously been flagged continued to send harmful content by employing a range of evasion techniques, such as using code words, replacing letters with symbols, and splitting messages into multiple shorter texts. The study highlights critical gaps in a platform with tens of millions of daily users, a substantial portion of whom are underage. The authors call for more robust, transparent moderation systems and independent audits to better protect vulnerable users.
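Two of the evasion tactics the study describes can be shown concretely: symbol substitution and message fragmentation both defeat exact keyword matching, while even simple normalization and window-joining recover the hidden term. This is an illustrative sketch, not Roblox's actual filter; the blocklist term and substitution map are assumptions for demonstration.

```python
# Illustrative sketch (not Roblox's real filter): why a naive keyword
# blocklist misses the evasion tactics described in the study.

BLOCKLIST = {"address"}  # hypothetical banned term

# Common symbol-for-letter substitutions used to dodge filters.
LEET_MAP = str.maketrans({"@": "a", "4": "a", "3": "e",
                          "1": "i", "0": "o", "$": "s", "5": "s"})

def naive_filter(message: str) -> bool:
    """Flag only on an exact keyword match."""
    return any(word in message.lower() for word in BLOCKLIST)

def normalized_filter(message: str) -> bool:
    """Flag after undoing simple symbol substitutions."""
    text = message.lower().translate(LEET_MAP)
    return any(word in text for word in BLOCKLIST)

# Symbol substitution: "@ddr3$$" normalizes back to "address".
evasive = "what is your @ddr3$$"
assert not naive_filter(evasive)      # slips past exact matching
assert normalized_filter(evasive)     # caught after normalization

# Fragmentation: each piece looks innocuous on its own...
fragments = ["what is ", "your add", "ress"]
assert not any(naive_filter(f) for f in fragments)
# ...but joining a short conversation window reveals the term.
assert normalized_filter("".join(fragments))
```

The broader point the study makes is that per-message, surface-level matching is structurally blind to adversaries who spread intent across messages or encodings, which is why the authors argue for more robust moderation and independent audits.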

Key Points
  • Study collected 2M chat messages from four public Roblox game servers across multiple age groups to test moderation effectiveness.
  • Manually labeled 99.8K messages as ground truth, then used best-performing LLM to flag unsafe content across entire dataset.
  • Found groomers, bullies, and abusers evade filters with code words, symbol substitution, and fragmented messaging.

Why It Matters

Roblox's current chat filters fail to block serious abuse aimed at its millions of underage users, underscoring the urgent need for stronger moderation and independent audits.