Learning to Trust: How Humans Mentally Recalibrate AI Confidence Signals
New research reveals that people adapt to an AI's overconfidence or underconfidence, but struggle with 'reverse confidence' scenarios where high confidence signals likely error.
A new study from researchers ZhaoBin Li and Mark Steyvers, published on arXiv, investigates a critical problem in human-AI collaboration: miscalibrated confidence. AI systems often display systematic overconfidence or underconfidence, which can lead to user mistrust or over-reliance. The researchers conducted a behavioral experiment with 200 participants to test whether humans can learn, through repeated experience, to mentally recalibrate their interpretation of an AI's confidence signals.
Participants predicted an AI's correctness over 50 trials across four calibration conditions: a standard (well-calibrated) AI, an overconfident AI, an underconfident AI, and a challenging 'reverse confidence' scenario in which high AI confidence actually signaled a higher chance of being wrong. Results showed robust learning: participants significantly improved their accuracy, discrimination, and calibration alignment in the first three conditions.
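To make the four conditions concrete, here is a minimal Python sketch of how a miscalibrated AI's displayed confidence might relate to its true chance of being correct. The distortion functions, the log-odds shift of 1.5, and all names are illustrative assumptions, not the paper's exact experimental design.

```python
import numpy as np

rng = np.random.default_rng(0)

def logit(p):
    """Log-odds transform."""
    return np.log(p / (1 - p))

def sigmoid(x):
    """Inverse of the log-odds transform."""
    return 1 / (1 + np.exp(-x))

def displayed_confidence(p_correct, condition):
    """Map the AI's true probability of being correct to the confidence
    it displays. Each distortion is a hypothetical stand-in for the
    paper's conditions, not its exact design."""
    if condition == "calibrated":
        return p_correct
    if condition == "overconfident":
        return sigmoid(logit(p_correct) + 1.5)   # inflated in log-odds space
    if condition == "underconfident":
        return sigmoid(logit(p_correct) - 1.5)   # deflated in log-odds space
    if condition == "reverse":
        return 1 - p_correct                     # high confidence => likely wrong
    raise ValueError(f"unknown condition: {condition}")

# One simulated trial per condition: draw a true accuracy, show the
# (possibly distorted) confidence, then sample whether the AI was correct.
p_true = rng.uniform(0.2, 0.95)
for cond in ("calibrated", "overconfident", "underconfident", "reverse"):
    shown = displayed_confidence(p_true, cond)
    correct = rng.random() < p_true
    print(f"{cond:>14}: shown confidence = {shown:.2f}, AI correct = {correct}")
```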
The team developed a computational model, combining a linear-in-log-odds transformation with a Rescorla-Wagner learning rule, to explain the cognitive dynamics. The model revealed that humans adapt by updating two mental parameters: their baseline trust in the AI and their sensitivity to the confidence signal itself. Interestingly, participants appear to use asymmetric learning rates, prioritizing the most informative prediction errors to speed up adaptation.
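As a rough illustration of this model class, the sketch below combines a linear-in-log-odds mapping from displayed confidence to a subjective probability of correctness with a Rescorla-Wagner (delta-rule) update of the two parameters. The specific update equations, parameter names, starting values, and the positive/negative split of the learning rates are assumptions for illustration; the paper's exact formulation and fitted values may differ.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class RecalibrationModel:
    """Observer who judges P(AI correct) from displayed confidence via a
    linear-in-log-odds mapping and adapts with a Rescorla-Wagner-style
    delta rule. All parameter choices here are illustrative guesses."""

    def __init__(self, beta0=0.0, beta1=1.0, alpha_pos=0.10, alpha_neg=0.05):
        self.beta0 = beta0          # baseline trust (intercept in log-odds)
        self.beta1 = beta1          # sensitivity to the confidence signal (slope)
        self.alpha_pos = alpha_pos  # learning rate when the AI beats expectations
        self.alpha_neg = alpha_neg  # learning rate when it falls short

    def predict(self, confidence):
        """Subjective P(AI correct | displayed confidence)."""
        return sigmoid(self.beta0 + self.beta1 * logit(confidence))

    def update(self, confidence, correct):
        """One trial of error-driven learning on both parameters."""
        p_hat = self.predict(confidence)
        delta = float(correct) - p_hat                            # prediction error
        alpha = self.alpha_pos if delta > 0 else self.alpha_neg   # asymmetric rates
        self.beta0 += alpha * delta                               # shift baseline trust
        self.beta1 += alpha * delta * logit(confidence)           # reweight the signal
        return p_hat

# Usage: 50 trials watching an overconfident AI whose displayed confidence
# is inflated by +1.5 in log-odds; baseline trust should drift downward.
rng = np.random.default_rng(1)
model = RecalibrationModel()
for _ in range(50):
    p_true = rng.uniform(0.2, 0.95)
    shown = sigmoid(logit(p_true) + 1.5)
    model.update(shown, correct=rng.random() < p_true)
print(f"baseline trust = {model.beta0:.2f}, sensitivity = {model.beta1:.2f}")
```

Under this kind of update, a reverse-confidence AI would require the sensitivity parameter to cross zero and turn negative, which gives one intuition for why that condition proved so hard for many participants.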
However, the study identified a significant boundary to this adaptability. In the 'reverse confidence' condition, a substantial proportion of participants failed to override their initial inductive bias that high confidence equals high correctness. This finding highlights a limit to human flexibility when AI behavior violates deep-seated expectations, suggesting that simply exposing users to a flawed system is insufficient if its flaws are counterintuitive.
- 200 participants learned to adjust trust in AI over 50 trials, improving accuracy even with over/underconfident systems.
- A computational model shows humans update baseline trust and confidence sensitivity using asymmetric learning rates.
- A key failure mode emerged: many users could not adapt to a 'reverse confidence' AI where high confidence signaled likely error.
Why It Matters
This research informs how to design AI interfaces and user training that foster appropriately calibrated trust, which is crucial for effective collaboration in high-stakes fields like medicine and finance.