MidSteer: Optimal Affine Framework for Steering Generative Models
New affine framework enables steering diffusion models and LLMs with minimal output disruption.
Steering intermediate representations has become a powerful method for controlling generative models, especially for post-deployment alignment and safety, but it has lacked a comprehensive theoretical foundation. Now, a team of researchers from Queen Mary University of London, Huawei, and other institutions has bridged this gap with MidSteer (Minimal Disturbance concept Steering). The paper first establishes a rigorous link between concept steering and affine concept erasure, proving that standard removal methods are special cases of the LEACE algorithm. Building on this, the authors introduce LEACE-Switch, a principled framework for optimally switching between concepts in latent space.
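To make the erasure side of this link concrete, here is a minimal sketch of affine concept erasure by projecting out a single concept direction. This is an illustration only, not the paper's method: full LEACE uses a whitened, covariance-aware projection, while this simplified mean-difference variant just shows the affine form x → Ax + b that the analysis covers. All variable names are assumptions for the toy example.

```python
import numpy as np

def erase_concept(X, direction):
    """Project out a single concept direction from each row of X.

    Simplified stand-in for affine erasure: LEACE proper uses a
    whitened, covariance-aware projection, but the map here is still
    affine (x -> A x with A = I - d d^T), which is the key structure.
    """
    d = direction / np.linalg.norm(direction)
    return X - np.outer(X @ d, d)

# Toy data: a binary "concept" encoded in the gap between two group means.
rng = np.random.default_rng(0)
pos = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(100, 2))
neg = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(100, 2))
direction = pos.mean(axis=0) - neg.mean(axis=0)

X_erased = erase_concept(np.vstack([pos, neg]), direction)
# Every row of X_erased now has zero component along the concept direction,
# so no linear probe can recover the group label from that direction alone.
```

After this projection, the two groups are indistinguishable along the erased direction, which is the sense in which removal methods of this kind "erase" a concept.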
MidSteer goes further by relaxing the assumptions of LEACE-Switch, allowing for directed, minimal-disturbance transformations across modalities. The framework works with both vision diffusion models (e.g., Stable Diffusion) and large language models (e.g., GPT-style architectures), enabling fine-grained control over attributes like style, safety, or bias without retraining or significant output quality loss. The paper provides both theoretical optimality proofs and empirical results showing favorable performance on steering tasks, making it a practical tool for AI safety and customization.
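As a rough sketch of the steering (rather than erasing) side, the snippet below shifts hidden activations toward a target concept with a difference-of-means translation. This is a common baseline for activation steering and only an illustration of the minimal-disturbance intuition: a pure translation is the smallest per-sample change that achieves a given mean shift, whereas MidSteer derives a more general optimal affine map. All names and the toy data are assumptions, not the paper's API.

```python
import numpy as np

def steering_vector(target_acts, source_acts):
    """Difference-of-means direction from a source concept to a target one."""
    return target_acts.mean(axis=0) - source_acts.mean(axis=0)

def steer(h, v, alpha=1.0):
    """Translate hidden state(s) h toward the target concept.

    A translation is the minimum-norm per-sample edit that achieves a
    given mean shift -- the 'minimal disturbance' intuition in its
    simplest form. alpha scales the strength of the intervention.
    """
    return h + alpha * v

# Toy activations: two concept classes in a 4-d hidden space.
rng = np.random.default_rng(1)
target_acts = rng.normal(loc=1.0, scale=0.1, size=(50, 4))
source_acts = rng.normal(loc=-1.0, scale=0.1, size=(50, 4))

v = steering_vector(target_acts, source_acts)
steered = steer(source_acts, v)  # mean of steered now matches the target mean
```

In practice the two activation sets would be collected from an intermediate layer of a diffusion model or LLM on prompts with and without the attribute of interest, and the steering applied at inference time without any retraining.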
- Formalizes concept steering theory, linking it to affine erasure (LEACE) and proving optimality conditions.
- Introduces LEACE-Switch for optimal concept switching and MidSteer for minimal-disturbance transformations.
- Validated on vision diffusion models and large language models, enabling post-deployment control without retraining.
Why It Matters
Enables safer, more precise control of AI outputs post-deployment without costly retraining or output quality loss.