OSCToM: New AI method boosts Theory of Mind reasoning 76% vs near 0%

arXiv cs.AI May 22, 2026

⚡ LLM social reasoning just got a 6x more efficient data synthesis upgrade.

Deep Dive

A new arXiv paper introduces OSCToM (Observer-Self Conflict Theory of Mind), a framework that combines reinforcement learning, a domain-specific language, and compositional surrogate models to generate adversarial examples for high-order Theory of Mind tasks. These tasks involve nested beliefs and information asymmetries, where the belief an observer attributes to another agent conflicts with the observer's own belief state. Existing benchmarks like ExploreToM fail to adequately test such recursive reasoning, and OSCToM directly targets these gaps.
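To make "observer-self conflict" concrete, here is a minimal Python sketch of a classic false-belief scenario. It is an illustration only, not the paper's domain-specific language or generation pipeline; the `Agent` class and the `move_object` and `has_conflict` helpers are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Agent:
    name: str
    present: bool = True                  # is the agent in the room?
    belief: Optional[str] = None          # first-order: where I think the object is
    # second-order: where I think each other agent believes the object is
    beliefs_about: Dict[str, Optional[str]] = field(default_factory=dict)


def move_object(agents: List[Agent], location: str) -> None:
    """Move the object; only agents currently in the room observe the move."""
    witnesses = [a for a in agents if a.present]
    for observer in witnesses:
        observer.belief = location
        for other in witnesses:
            if other is not observer:
                observer.beliefs_about[other.name] = location


def has_conflict(observer: Agent, other_name: str) -> bool:
    """True when the observer's own belief differs from the one it attributes to the other agent."""
    return observer.belief != observer.beliefs_about.get(other_name)


sally, anne = Agent("Sally"), Agent("Anne")
move_object([sally, anne], "basket")  # both watch the object go into the basket
sally.present = False                 # Sally leaves the room
move_object([sally, anne], "box")     # only Anne sees the move to the box

print(has_conflict(anne, "Sally"))    # True: Anne knows "box" but thinks Sally believes "basket"
print(has_conflict(sally, "Anne"))    # False: Sally has no reason to think their beliefs differ
```

A question such as "Where does Anne think Sally will look?" can only be answered correctly by keeping Anne's own belief separate from the belief she attributes to Sally. Higher-order tasks nest this bookkeeping further.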

The results are striking: OSCToM-8B achieves 76% accuracy on the FANToM benchmark, a massive leap from ExploreToM's reported 0.2%. It also remains competitive on Hi-ToM and BigToM. The data synthesis process is 6x more efficient than prior methods, meaning smaller models can now handle advanced cognitive reasoning without requiring massive datasets. Code is available on GitHub.

Key Points

OSCToM-8B achieves 76% accuracy on FANToM vs ExploreToM's 0.2%
Data synthesis procedure is 6x more efficient than prior methods
Uses RL and compositional surrogate models to generate belief conflicts

Why It Matters

Enables smaller LLMs to master complex social reasoning, unlocking more human-like AI interactions.
