Research & Papers

Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling

New system combines ML-assisted sampling with LLM labeling to track policy violations across billions of daily impressions.

Deep Dive

A team of researchers has published a paper outlining a novel, scalable system for measuring the prevalence of policy-violating content on large platforms. The core challenge is that harmful content is often rare, which makes traditional human labeling slow and expensive for platform-wide studies.

Their solution is a three-part, design-based system. First, it draws daily probability samples from the impression stream (what users actually see), using machine learning to weight the sample toward high-exposure and high-risk content while preserving statistical unbiasedness. Second, it labels the sampled items with a multimodal LLM governed by policy-specific prompts and validated against a human-labeled 'gold set.' Third, it produces statistically rigorous prevalence estimates with confidence intervals and supports dashboard drilldowns.

A key innovation is 'one global sample with many pivots': a single daily sample can be re-weighted through post-stratification to estimate prevalence by geography, content age, or user surface. The paper details the statistical estimators, variance calculations, and an engineering workflow that makes the system configurable across different content policies. This moves beyond simple detection to continuous, representative measurement of user experience at scale.
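The sampling-and-estimation loop can be sketched in a few lines. This is a minimal illustration, not the paper's design: the function names, the risk-score callback, and the sampling budget are hypothetical, and a simple Poisson (independent Bernoulli) design stands in for whatever sampling scheme the authors actually use. The point it demonstrates is the core statistical idea: if each labeled item is weighted by the inverse of its inclusion probability, the Horvitz-Thompson prevalence estimate stays unbiased even though high-risk content is heavily oversampled.

```python
import math
import random

def poisson_sample(impressions, risk_score, budget, floor=1e-4):
    """Poisson probability sample that oversamples high-risk impressions.

    Inclusion probabilities are proportional to a model risk score but
    floored, so every impression keeps a nonzero chance of selection --
    that floor is what preserves unbiasedness of the estimator below.
    """
    scores = [max(risk_score(x), floor) for x in impressions]
    scale = budget / sum(scores)  # aims for ~`budget` expected selections
    sample = []
    for x, s in zip(impressions, scores):
        pi = min(1.0, s * scale)  # inclusion probability, capped at 1
        if random.random() < pi:
            sample.append((x, pi))
    return sample

def ht_prevalence(labeled, n_population, z=1.96):
    """Horvitz-Thompson prevalence estimate with a normal-approximation CI.

    labeled: list of (y, pi), where y is the LLM's 0/1 violation label
    and pi is the item's inclusion probability from the sampling step.
    """
    est = sum(y / pi for y, pi in labeled) / n_population
    # Poisson-sampling variance of the HT total, scaled to a proportion.
    var = sum((1 - pi) * (y / pi) ** 2 for y, pi in labeled) / n_population ** 2
    half = z * math.sqrt(var)
    return est, (max(0.0, est - half), min(1.0, est + half))
```

In a real pipeline the sampled items would be sent to the LLM labeler; here the (label, probability) pairs are fed straight into `ht_prevalence` to produce the daily point estimate and confidence interval.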

Key Points
  • System uses ML-assisted sampling to concentrate labeling budget on high-risk content, making rare-violation studies feasible.
  • Employs multimodal LLMs for automated labeling, governed by policy prompts and validated with human gold sets.
  • Produces daily, design-consistent prevalence estimates with confidence intervals, enabling drilldowns by geography, content age, and platform surface.

Why It Matters

Enables platforms to proactively measure safety at scale, moving from reactive content removal to data-driven policy and risk assessment.