Directional Confusions Reveal Divergent Inductive Biases Through Rate-Distortion Geometry in Human and Machine Vision
AI and humans make different directional mistakes, revealing hidden biases in vision models
In a paper posted to arXiv (2604.21909), researchers Leyla Roksan Caglar, Pedro A.M. Mediano, and Baihan Lin use directional confusions to show how humans and deep vision models diverge in their inductive biases. A directional confusion is an asymmetric error: category A is mistaken for category B more often than B is mistaken for A. The authors compared matched human and model responses on natural-image categorization under 12 perturbation types and quantified the asymmetry of the resulting confusion matrices. Using a Rate-Distortion (RD) framework, they derived three geometric signatures to characterize generalization geometry: slope (beta), curvature (kappa), and efficiency (AUC). Results show that humans have broad but weak asymmetries, while deep models exhibit sparse, strong directional collapses. Robustness training reduces global asymmetry but does not produce human-like breadth-strength profiles. Mechanistic simulations confirm that different asymmetry organizations shift the RD frontier in opposite directions, even when performance is matched. The authors therefore position directional confusions and RD geometry as compact, interpretable signatures of inductive bias under distribution shift, offering new ways to evaluate and improve AI vision systems.
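To make the "broad but weak" versus "sparse but strong" distinction concrete, here is a minimal Python sketch of one way to summarize asymmetry in a confusion matrix. It is an illustration, not the paper's metric: the function name `directional_asymmetry`, the `threshold` parameter, and the definitions of breadth (share of category pairs with a noticeable imbalance) and strength (mean imbalance among those pairs) are all assumptions made for this example.

```python
import numpy as np

def directional_asymmetry(confusion, threshold=0.05):
    """Return (breadth, strength) of directional confusions.

    Hypothetical definitions chosen for illustration only:
      breadth  = fraction of category pairs whose confusion gap exceeds threshold
      strength = mean confusion gap among those pairs
    """
    c = np.asarray(confusion, dtype=float)
    # Row-normalize so p[i, j] estimates P(response j | true class i).
    row_sums = c.sum(axis=1, keepdims=True)
    p = np.divide(c, row_sums, out=np.zeros_like(c), where=row_sums > 0)
    # Antisymmetric part: nonzero when i->j and j->i errors differ.
    gap = np.abs(p - p.T)
    # Each unordered pair appears once in the upper triangle.
    pair_gaps = gap[np.triu_indices_from(gap, k=1)]
    asymmetric = pair_gaps > threshold
    breadth = float(asymmetric.mean())
    strength = float(pair_gaps[asymmetric].mean()) if asymmetric.any() else 0.0
    return breadth, strength

# A toy "collapse": class 1 is often called class 0, but never the reverse.
collapse = [[10, 0, 0],
            [5, 5, 0],
            [0, 0, 10]]
breadth, strength = directional_asymmetry(collapse)
```

Under these assumed definitions, a matrix with many small imbalances would score high on breadth and low on strength, while a matrix like `collapse` scores low on breadth and high on strength.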
The study's implications extend to AI safety and interpretability, as it provides a systematic method to uncover hidden biases in vision models that accuracy alone misses. By revealing how models 'confuse' categories differently from humans, researchers can design more robust and human-aligned systems. The RD framework offers a principled way to measure these biases, potentially improving model training and evaluation. This work highlights the importance of going beyond top-1 accuracy to understand model behavior, especially in critical applications like autonomous driving or medical imaging where misclassification patterns matter.
- Humans show broad but weak directional confusions; deep vision models have sparse, strong directional collapses
- Rate-Distortion framework yields three geometric signatures: slope, curvature, and efficiency
- Robustness training reduces global asymmetry but fails to replicate human-like graded similarity profiles
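For readers who want a feel for the three RD signatures, the sketch below computes a slope, a curvature, and an area under the curve from sampled (rate, distortion) points. The summary does not give the paper's exact estimators, so the choices here (mean first derivative for slope, mean second derivative for curvature, trapezoidal area for efficiency) and the function name `rd_signatures` are assumptions for illustration.

```python
import numpy as np

def rd_signatures(rate, distortion):
    """Return (slope, curvature, auc) for a sampled rate-distortion curve.

    Assumed estimators, not taken from the paper:
      slope     = mean of dD/dR (finite differences)
      curvature = mean of d2D/dR2 (finite differences)
      auc       = trapezoidal area under D(R)
    """
    r = np.asarray(rate, dtype=float)
    d = np.asarray(distortion, dtype=float)
    # Sort by rate so finite differences run left to right.
    order = np.argsort(r)
    r, d = r[order], d[order]
    first = np.gradient(d, r)       # dD/dR at each sample
    second = np.gradient(first, r)  # d2D/dR2 at each sample
    slope = float(first.mean())
    curvature = float(second.mean())
    # Trapezoid rule written out explicitly.
    auc = float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(r)))
    return slope, curvature, auc

# Toy curve: distortion falls linearly as rate increases.
slope, curvature, auc = rd_signatures([0.0, 0.5, 1.0], [1.0, 0.5, 0.0])
```

Two systems with the same accuracy can still trace different curves, which is why summaries of the curve's shape can separate them where a single accuracy number cannot.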
Why It Matters
Reveals hidden biases in AI vision systems that accuracy alone misses, improving model evaluation and alignment with human perception.