
Second-order Theory of Mind lets AI detect and correct human biases

An AI that models your mistaken beliefs about it can improve interactions significantly

Deep Dive

A team of researchers led by Patrick Callaghan (apparently at Carnegie Mellon University) has published a paper introducing an AI agent equipped with second-order Theory of Mind (ToM-2). Standard AI systems model the world or, at most, a user's direct beliefs. This agent also models what a person *thinks the agent knows*, a recursive reasoning layer that accounts for human misunderstandings about the AI's own knowledge. Using the Interactive Partially Observable Markov Decision Process (I-POMDP) framework, the agent continuously updates its model of the user's beliefs, including the cognitive biases and heuristics (CBH) that cause those beliefs to diverge from reality. When it detects a discrepancy between what the user thinks it knows and what it actually knows, it generates targeted feedback to realign the user's mental model.
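To make the nested-belief idea concrete, here is a minimal toy sketch. It is not the paper's I-POMDP implementation: a full I-POMDP reasons over interactive states that include models of other agents, whereas this sketch collapses the second-order layer into a single distribution, namely the agent's estimate of what the human thinks the agent believes. The task hypotheses A/B/C, the "conservatism" bias model, the total-variation discrepancy measure, the threshold, and all function names are hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Tuple

# A belief is a probability distribution over a small set of task hypotheses.
Belief = Dict[str, float]


def normalize(b: Belief) -> Belief:
    total = sum(b.values())
    return {h: p / total for h, p in b.items()}


def bayes_update(prior: Belief, likelihood: Belief) -> Belief:
    """Exact Bayesian update: posterior(h) is proportional to prior(h) * P(obs | h)."""
    return normalize({h: prior[h] * likelihood[h] for h in prior})


def conservative_update(prior: Belief, likelihood: Belief, weight: float) -> Belief:
    """Hypothetical bias model (conservatism): the update is assumed to move only
    `weight` of the way from the prior toward the exact Bayesian posterior."""
    exact = bayes_update(prior, likelihood)
    return {h: (1 - weight) * prior[h] + weight * exact[h] for h in prior}


def total_variation(p: Belief, q: Belief) -> float:
    """Distance between two beliefs over the same hypotheses (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(p[h] - q[h]) for h in p)


def tom2_step(
    agent_belief: Belief,
    modeled_human_view: Belief,
    likelihood: Belief,
    bias_weight: float = 0.3,
    threshold: float = 0.2,
) -> Tuple[Belief, Belief, Optional[str]]:
    """One step of simplified (hypothetical) second-order belief tracking.

    agent_belief:       what the agent actually believes about the task.
    modeled_human_view: the agent's model of what the human thinks the agent believes.
    """
    # First-order: the agent updates its own belief exactly.
    agent_belief = bayes_update(agent_belief, likelihood)
    # Second-order: the agent predicts how the human, under the assumed bias,
    # updates their picture of the agent's belief after the same observation.
    modeled_human_view = conservative_update(modeled_human_view, likelihood, bias_weight)

    gap = total_variation(agent_belief, modeled_human_view)
    if gap <= threshold:
        return agent_belief, modeled_human_view, None

    # Discrepancy detected: report the agent's actual best guess and assume
    # the feedback realigns the human's view with the agent's real belief.
    best = max(agent_belief, key=agent_belief.get)
    message = f"I now think the goal is most likely {best} (p={agent_belief[best]:.2f})."
    return agent_belief, dict(agent_belief), message


uniform = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
evidence = {"A": 0.8, "B": 0.1, "C": 0.1}  # a demonstration that strongly favors goal A
belief, view, msg = tom2_step(uniform, uniform, evidence)
print(msg)
```

In this run the agent learns a lot from one clear demonstration, but its model of the human predicts the human still thinks the agent is unsure, so the gap exceeds the threshold and the agent speaks up. Trying a different bias model only requires replacing `conservative_update`.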

The researchers ran an in-person user study in which human participants acted as teachers and the agent as a learner, comparing a ToM-2 learner against a baseline learner without second-order reasoning. The ToM-2 learner significantly improved the informativeness of teacher actions, meaning users provided better, more relevant feedback, and subjective surveys indicated that users found the agent's own feedback more useful and intuitive.

This work addresses a critical gap in human-AI interaction: mismatched expectations about what an AI knows can lead to frustration and inefficiency. By explicitly modeling and correcting these misconceptions, the ToM-2 agent points toward more collaborative and trustworthy AI systems, especially in tutoring, assistance, and teaming scenarios.
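The summary does not say how informativeness was measured. One common way to quantify how informative a teacher action is, used here purely as a hypothetical proxy, is information gain: how many bits of the learner's uncertainty over task hypotheses a single action removes. The hypotheses, likelihood values, and function names below are invented for illustration.

```python
import math
from typing import Dict

Belief = Dict[str, float]


def entropy_bits(b: Belief) -> float:
    """Shannon entropy in bits; zero-probability hypotheses contribute nothing."""
    return -sum(p * math.log2(p) for p in b.values() if p > 0)


def posterior(prior: Belief, likelihood: Belief) -> Belief:
    """Bayesian update of the learner's belief after observing a teacher action."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}


def informativeness(prior: Belief, likelihood: Belief) -> float:
    """Hypothetical proxy: bits of learner uncertainty removed by one teacher action."""
    return entropy_bits(prior) - entropy_bits(posterior(prior, likelihood))


uniform = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
# P(action | goal) for three hypothetical teacher actions (illustrative numbers).
focused_demo = {"A": 0.8, "B": 0.1, "C": 0.1}
ambiguous_demo = {"A": 0.5, "B": 0.5, "C": 0.1}
uninformative_demo = {"A": 0.5, "B": 0.5, "C": 0.5}

for name, lik in [
    ("focused", focused_demo),
    ("ambiguous", ambiguous_demo),
    ("uninformative", uninformative_demo),
]:
    print(f"{name}: {informativeness(uniform, lik):.3f} bits")
```

Under this proxy, a demonstration that singles out one goal removes far more uncertainty than one consistent with several goals, and an action equally likely under every goal removes none.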

Key Points
  • The ToM-2 agent models a user's erroneous beliefs about the AI itself using recursive reasoning (second-order Theory of Mind).
  • Built on the I-POMDP framework, it detects cognitive biases and heuristics (CBH) that lead to user misconceptions.
  • A user study found that the ToM-2 learner significantly improved the informativeness of teacher actions, and participants rated its feedback as more useful.

Why It Matters

This enables AI to proactively correct people's misunderstandings about what it knows, making collaboration smoother and reducing friction in tutoring and teaming.