Latent Space Probing for Adult Content Detection in Video Generative Models
Lightweight classifiers on CogVideoX latent space achieve 97.29% F1 in just 4–6ms
As AI-generated video becomes ubiquitous, moderating adult content in real time is a growing challenge. Current detection methods analyze either text prompts or final pixel outputs, missing the rich internal representations formed during generation. A team led by Alizishaan Khatri proposes a latent space probing framework that intercepts the denoised latent representations of the CogVideoX video diffusion model during inference. Lightweight classifier heads attached to these internal states enable detection without waiting for full video generation. To support the work, the team constructed a large-scale binary dataset of 11,039 ten-second video clips: 5,086 violating clips from adult websites and 5,953 non-violating clips from YouTube.
The results are striking: the probing classifiers achieve a 97.29% F1 score on a held-out test set with an inference overhead of only 4–6 milliseconds, making them both faster and more accurate than existing pixel-space or prompt-based filters. With only a few thousand parameters, the classifier heads are light enough for real-time deployment on consumer GPUs. The paper will appear at the 2026 IEEE DSN Workshops. The work demonstrates that latent-space signals encode strong discriminative features for harmful content, opening a path toward efficient, proactive moderation in generative video systems.
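To make the idea concrete, here is a minimal sketch of what a probe of this kind can look like: a tiny classifier head that pools a denoised latent tensor and maps it to a violation probability. The latent shape, class name, and pooling choice below are illustrative assumptions, not details from the paper, and the weights are random rather than trained.

```python
import numpy as np

class LatentProbe:
    """Hypothetical lightweight probe head over diffusion latents.

    Global-average-pools the latent over time and space, then applies a
    single linear layer plus sigmoid -- a few thousand parameters at most,
    in the spirit of the probes described above.
    """

    def __init__(self, channels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Untrained, randomly initialized weights for illustration only.
        self.w = rng.normal(0.0, 0.01, size=channels)
        self.b = 0.0

    def features(self, latent: np.ndarray) -> np.ndarray:
        # latent: (channels, frames, height, width) -> (channels,)
        return latent.mean(axis=(1, 2, 3))

    def predict_proba(self, latent: np.ndarray) -> float:
        # Linear score followed by a sigmoid squashing to [0, 1].
        z = self.features(latent) @ self.w + self.b
        return float(1.0 / (1.0 + np.exp(-z)))

# Assumed latent shape; real CogVideoX latents differ in size.
probe = LatentProbe(channels=16)
latent = np.random.default_rng(1).normal(size=(16, 13, 60, 90))
p = probe.predict_proba(latent)  # probability the clip is violating
```

In deployment, a head like this would be called on the intermediate latent at each (or a chosen) denoising step, so a violating generation can be halted before pixels are ever decoded.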
Key Findings
- Novel latent space probing instead of post-hoc pixel or prompt analysis
- Dataset of 11,039 clips (5,086 violating from adult sites, 5,953 non-violating from YouTube)
- 97.29% F1 on held-out test set with only 4–6ms inference overhead using lightweight classifiers
Why It Matters
Enables real-time, cost-effective content moderation inside generative video models without slowing generation.