Math Takes Two: A test for emergent mathematical reasoning in communication
Can two AI agents invent numbers without being taught?
A new paper titled "Math Takes Two: A test for emergent mathematical reasoning in communication" by Michael Cooper and Samuel Cooper, accepted at the HCAIR workshop at ICLR 2026, proposes a radical shift in how we evaluate AI's mathematical abilities. Instead of testing models on standard arithmetic or algebra problems, the benchmark requires two agents to develop a shared symbolic protocol from scratch to solve a visually grounded task. The task is designed so that using a numerical system—like counting or grouping—makes the solution easier, but the agents are given no prior math knowledge or predefined language. This forces them to discover and agree on latent numerical representations purely through interaction.
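The paper's setup, as described, resembles a Lewis-style signaling game: one agent observes a grounded quantity, sends a message built from symbols that carry no predefined meaning, and a second agent must act correctly on that message, with both rewarded only on success. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' benchmark: the vocabulary, count range, and simple tabular learners are all assumptions made for the example.

```python
import random
from collections import defaultdict

VOCAB = ["a", "b", "c", "d", "e"]   # arbitrary symbols with no built-in meaning
COUNTS = list(range(1, 6))          # object counts the speaker may observe
EPISODES = 20_000
EPS, LR = 0.1, 0.1                  # exploration rate and learning rate

# Q-tables: the speaker learns count -> symbol, the listener learns symbol -> count.
speaker_q = defaultdict(lambda: defaultdict(float))
listener_q = defaultdict(lambda: defaultdict(float))

def choose(q_row, options):
    """Epsilon-greedy choice over one row of a Q-table."""
    if random.random() < EPS:
        return random.choice(options)
    return max(options, key=lambda o: q_row[o])

for _ in range(EPISODES):
    count = random.choice(COUNTS)             # the "scene" the speaker sees
    msg = choose(speaker_q[count], VOCAB)     # speaker emits a symbol
    guess = choose(listener_q[msg], COUNTS)   # listener decodes it
    reward = 1.0 if guess == count else 0.0   # shared reward only on success
    speaker_q[count][msg] += LR * (reward - speaker_q[count][msg])
    listener_q[msg][guess] += LR * (reward - listener_q[msg][guess])

# Inspect the protocol the pair converged on.
for count in COUNTS:
    msg = max(VOCAB, key=lambda m: speaker_q[count][m])
    decoded = max(COUNTS, key=lambda c: listener_q[msg][c])
    print(f"count {count} -> symbol '{msg}' -> decoded {decoded}")
```

In this toy version the emergent "numeral system" is just a one-to-one lookup between symbols and counts. A setup closer to what the paper describes would presumably use more counts than symbols and multi-symbol messages, so that grouping or positional structure, rather than memorization, becomes the efficient strategy.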
The benchmark directly challenges the assumption that current LLMs' high scores on math benchmarks reflect genuine reasoning. The authors argue that much of this performance could be statistical pattern matching over learned formal syntax. By requiring agents to invent both their communication protocol and abstract concepts from first principles, Math Takes Two offers a cleaner test of emergent mathematical cognition. The work draws inspiration from how human mathematical thinking co-evolved with the need for precise communication. This approach could help researchers build models that don't just mimic math but genuinely understand and create numerical systems.
- Tests whether two AI agents can develop a numerical system from scratch through communication alone
- Accepted at the HCAIR workshop at ICLR 2026
- Aims to distinguish true reasoning from statistical pattern matching in LLMs
Why It Matters
This benchmark could reveal whether AI can reason abstractly, rather than merely pattern-match on math problems.