New method trains AI to make decisions using the right evidence
Researchers teach AI to avoid shortcuts and base its decisions on the correct information.
A new training method requires AI models to base their decisions on specific, human-approved evidence rather than on statistical shortcuts. It uses "attribution constraints" that penalize the model whenever it relies on the wrong parts of its input, such as irrelevant regions of an image. Tested on image classification and on AI agents, the approach improves both accuracy and the soundness of the model's reasoning, making AI more reliable and transparent.
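The sketch below illustrates how an attribution constraint can be folded into a training loss. It is not the researchers' implementation, which the article does not describe. Instead it uses a swapped-in "right for the right reasons"-style input-gradient penalty (Ross et al., 2017) on a toy two-feature logistic regression. The helper name `attribution_constrained_loss`, the mask convention (1 marks a feature a human has flagged as irrelevant), and the weight `lam` are hypothetical choices made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def attribution_constrained_loss(w, b, xs, ys, masks, lam):
    """Mean binary cross-entropy plus an attribution penalty (hypothetical helper).

    masks[i][j] == 1 marks feature j of example i as evidence a human flagged
    as irrelevant. The penalty grows when the model's input gradients put
    weight on those features.
    """
    eps = 1e-12
    ce_total = 0.0
    penalty_total = 0.0
    for x, y, m in zip(xs, ys, masks):
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        p = sigmoid(z)
        ce_total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        # Attribution: gradient of log p(y=1|x) + log p(y=0|x) with respect to x.
        # For a logistic model this works out to (1 - 2p) * w.
        grad = [(1.0 - 2.0 * p) * wj for wj in w]
        # Penalize only the attribution that lands on masked (irrelevant) features.
        penalty_total += sum((mj * gj) ** 2 for mj, gj in zip(m, grad))
    n = len(xs)
    return ce_total / n + lam * penalty_total / n


# Toy data: feature 0 is the real signal; feature 1 is a spurious shortcut
# that happens to agree with it. The mask flags feature 1 as irrelevant.
xs = [[1.0, 1.0], [-1.0, -1.0]]
ys = [1, 0]
masks = [[0, 1], [0, 1]]

honest = [2.0, 0.0]    # relies on feature 0
shortcut = [0.0, 2.0]  # relies on the shortcut, feature 1

for name, w in [("honest", honest), ("shortcut", shortcut)]:
    plain = attribution_constrained_loss(w, 0.0, xs, ys, masks, lam=0.0)
    constrained = attribution_constrained_loss(w, 0.0, xs, ys, masks, lam=1.0)
    print(name, plain, constrained)
```

Both toy models classify the data equally well, so with `lam=0` their losses are identical. With `lam > 0` only the shortcut model pays a penalty, which is the kind of pressure an attribution constraint applies during training: among equally accurate solutions, it favors the one that uses the approved evidence.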
Why It Matters
This makes AI systems more trustworthy by pushing their reasoning to align with human judgment about which evidence should matter.