Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
New framework proves safety guidance must be applied early in the denoising process for effective, high-quality generation.
A team of researchers has introduced Safety-Guided Flow (SGF), a unified framework that bridges two previously distinct approaches to making generative AI models safer. The work, accepted as an Oral presentation at ICLR 2026, builds a probabilistic model around a Maximum Mean Discrepancy (MMD) potential and mathematically shows that existing methods such as Shielded Diffusion and Safe Denoiser are specific instances of a broader energy-based guidance principle that steers models away from unsafe data samples.
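The general idea of an MMD-style potential for negative guidance can be illustrated with a minimal sketch. This is not the paper's implementation: the kernel choice (RBF), the bandwidth `sigma`, the step size, and the toy "unsafe" cluster are all illustrative assumptions. The potential measures average kernel similarity between a sample and a set of unsafe examples, and stepping against its gradient pushes the sample away from that set.

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    # Gaussian (RBF) kernel similarity between points x and y.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def mmd_potential(x, unsafe, sigma):
    # Mean kernel similarity between x and the unsafe set:
    # a high value means x lies close to unsafe data.
    return np.mean([rbf_kernel(x, y, sigma) for y in unsafe])

def potential_gradient(x, unsafe, sigma):
    # Analytic gradient of the potential w.r.t. x; descending it
    # moves x away from the unsafe samples (negative guidance).
    grads = [rbf_kernel(x, y, sigma) * (y - x) / sigma ** 2 for y in unsafe]
    return np.mean(grads, axis=0)

rng = np.random.default_rng(0)
unsafe = rng.normal(loc=2.0, scale=0.3, size=(16, 2))  # toy "unsafe" cluster
x0 = np.array([1.5, 1.5])  # sample starting near the unsafe region

x = x0.copy()
for _ in range(50):
    # Gradient descent on the potential = repulsion from unsafe data.
    x = x - 0.5 * potential_gradient(x, unsafe, sigma=1.0)
```

In a diffusion or flow model, this repulsive gradient would be added to the model's denoising update at each step rather than applied on its own, but the energy-based mechanism is the same.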
The core technical breakthrough of SGF is its identification of a critical timing window for applying safety interventions. By leveraging analysis from control barrier functions used in robotics, the researchers proved that negative guidance need be strong only during the early, high-noise stages of the denoising process in models like Stable Diffusion. Outside this window, the guidance should decay to zero so that it does not degrade the final quality and diversity of the generated images or plans.
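A timing-aware schedule of this kind can be sketched as a weight on the guidance term that is near full strength early in denoising and decays toward zero later. The sigmoid form, the switch point `t_switch`, and the `sharpness` constant below are illustrative assumptions, not the paper's exact rule.

```python
import math

def guidance_weight(t, T, t_switch=0.6, sharpness=12.0):
    """Illustrative timing schedule: near-full-strength negative
    guidance during the early, high-noise steps (t/T above t_switch),
    decaying smoothly toward zero in the late steps that set fine
    detail, so safety guidance does not degrade output quality."""
    s = t / T  # fraction of denoising remaining (1 = pure noise, 0 = clean)
    return 1.0 / (1.0 + math.exp(-sharpness * (s - t_switch)))

# Early step (t near T): weight close to 1; late step: close to 0.
early = guidance_weight(t=900, T=1000)
late = guidance_weight(t=100, T=1000)
```

At sampling time, this weight would multiply the repulsive guidance gradient before it is added to the model's update, leaving the final low-noise steps essentially unguided.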
Evaluated on multiple realistic safe generation scenarios, the SGF framework provides a clear, justified rule for when safety measures are necessary, moving beyond the heuristics of past methods. This gives AI engineers a principled blueprint for building safeguards into generative models, ensuring they avoid harmful content or physical obstacles (in the case of robot planning) while maintaining the fidelity of their outputs. The code for the framework is publicly available, facilitating further research and application.
- Unifies robot safety (control barrier functions) and content safety (negative guidance) under one MMD-based framework.
- Identifies a critical early-time window where safety guidance is essential, preventing quality loss later in generation.
- Validated on real tasks and accepted for an Oral presentation at the top-tier ICLR 2026 conference.
Why It Matters
Provides a foundational, timing-aware method for building inherently safer generative AI and robotics systems without compromising output quality.