Dynamic Semantic Steering for Guided Concept Erasure in Diffusion Models
New training-free framework erases sensitive concepts from Stable Diffusion while preserving image quality.
Researcher Qinghui Gong has introduced a novel framework called Dynamic Semantic Steering (DSS), designed to solve a critical problem in text-to-image (T2I) diffusion models: the precise removal of unsafe or unwanted concepts. Current inference-time methods, which intervene during image generation, often fail in one of two ways: they over-correct, erasing benign content along with the target, or they suffer from 'semantic drift,' where the image collapses into nonsense. DSS tackles this with a two-part, training-free approach: Sensitive Semantic Boundary Modeling (SSBM), which automatically finds safe semantic anchors, and Sensitive Semantic Guidance (SSG), which uses the model's own cross-attention features to detect target concepts and suppress them via a mathematically derived, closed-form solution.
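The detect-then-suppress idea can be pictured schematically. The sketch below is a loose illustration only, not the paper's SSG formulation: it assumes a toy detector (mean cross-attention mass on the sensitive concept's token) and a simple negative-guidance-style update on the noise prediction, whereas the actual method uses a closed-form solution over cross-attention features that the article does not spell out. All function names and the threshold-free weighting are illustrative assumptions.

```python
import numpy as np

def concept_score(attn_map: np.ndarray, concept_token_idx: int) -> float:
    """Rough presence score for the sensitive concept: the average
    cross-attention mass that spatial positions place on its token.
    (Illustrative stand-in for SSG's attention-based detection.)"""
    return float(attn_map[:, concept_token_idx].mean())

def steer_noise(eps_prompt: np.ndarray,
                eps_concept: np.ndarray,
                score: float,
                scale: float = 1.0) -> np.ndarray:
    """Steer the noise prediction away from the sensitive concept,
    weighted by how strongly the concept was detected. A score of 0
    leaves the original prediction untouched."""
    direction = eps_concept - eps_prompt
    return eps_prompt - scale * score * direction

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake cross-attention map: 64 spatial positions x 8 prompt tokens,
    # each row normalized like a softmax output.
    attn = rng.random((64, 8))
    attn /= attn.sum(axis=1, keepdims=True)
    score = concept_score(attn, concept_token_idx=3)

    eps_prompt = rng.standard_normal((4, 4))   # prompt-conditioned prediction
    eps_concept = rng.standard_normal((4, 4))  # concept-conditioned prediction
    steered = steer_noise(eps_prompt, eps_concept, score)
    print(steered.shape)
```

Because the detection score gates the suppression strength, benign prompts (score near zero) pass through almost unchanged, which is the intuition behind DSS preserving image quality for non-target concepts.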
This design translates into a substantial performance gain. DSS achieves an average erasure rate of 91.0%, well above the 18.6% to 85.9% range reported for current state-of-the-art methods. Crucially, it does this while preserving the fidelity and quality of the generated image for all other, benign concepts. For developers and platforms using models like Stable Diffusion, this means robust safety filters can be deployed without retraining massive models from scratch. Because the framework relies on the model's internal attention maps, it is also more interpretable, providing clearer insight into *how* a concept is being removed and moving safety from a black-box process toward a more controllable one.
- Achieves 91.0% average erasure rate, outperforming previous SOTA methods (18.6%-85.9%).
- Uses a novel two-part framework: Sensitive Semantic Boundary Modeling (SSBM) and Sensitive Semantic Guidance (SSG).
- Training-free and lightweight, allowing safe deployment without costly model retraining.
Why It Matters
Enables platforms to deploy safer AI image generators by removing harmful content without breaking the model or requiring full retraining.