Teaching AI vision models to refuse more precisely, and on command
Researchers give AI vision models a more controllable way to refuse harmful or unwanted requests.
A new method called CR-VLM gives developers fine-grained control over when AI vision models refuse to answer. It uses 'activation steering', which adjusts the model's internal activations, to keep the model from being either overly cautious or overly permissive. The system includes a gating mechanism that preserves normal behavior on requests that should not be refused, and a module that aligns the model's visual understanding with its refusal rules. According to the researchers' tests, the method makes refusals more effective, more efficient, and easier to adapt to different user needs and safety requirements.
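To make the idea of gated activation steering concrete, here is a minimal sketch in Python. It is not the CR-VLM implementation: the function name `steer`, the parameters `alpha` and `gate_threshold`, and the simple projection-based gate are all assumptions chosen for illustration. The sketch shifts one hidden-state vector along a "refusal direction", but only when the gate judges the input relevant.

```python
import math

def steer(hidden, direction, alpha, gate_threshold=0.0):
    """Shift a hidden-state vector along a steering direction, subject to a gate.

    Hypothetical illustration only; names and gate rule are assumptions,
    not the method described in the article.
    """
    # Normalize the steering direction to unit length.
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]

    # Project the hidden state onto the direction.
    proj = sum(h * u for h, u in zip(hidden, unit))

    # Gate closed: leave the hidden state unchanged so normal behavior is preserved.
    if proj <= gate_threshold:
        return list(hidden)

    # Gate open: positive alpha pushes toward refusal, negative alpha away from it.
    return [h + alpha * u for h, u in zip(hidden, unit)]

steered = steer([1.0, 0.0], [2.0, 0.0], alpha=0.5)     # gate open, vector is shifted
untouched = steer([-1.0, 0.0], [2.0, 0.0], alpha=0.5)  # gate closed, vector is returned as-is
```

In this toy version, the sign and size of `alpha` play the role of the adjustable control: a developer could raise it to make refusals more likely or lower it to reduce over-caution, while the gate leaves unrelated inputs alone.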
Why It Matters
Finer control over refusals could enable more nuanced, user-customizable safety settings for the next generation of multimodal AI assistants.