Research & Papers

Decision Aggregation under Quantal Response

A new paper finds that aggregating the outputs of moderately random LLMs boosts accuracy on complex reasoning tasks.

Deep Dive

A new research paper from Zhihuan Huang, Yichong Xia, and Yuqing Kong, titled 'Decision Aggregation under Quantal Response,' challenges the assumption that perfect rationality is always best for collective decision-making. The study analyzes how best to aggregate decisions from multiple experts, each holding private information. Departing from the fully rational model, it adopts a 'quantal response' framework to capture the bounded rationality and inherent randomness of real agents, whether human or AI. Within a minimax regret framework, the researchers prove a surprising result: when individual rationality falls below a certain threshold, simple majority voting is the optimal and most robust way to combine decisions.
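
Quantal response here is the standard logit choice rule: an agent picks each action with probability proportional to the exponential of its scaled utility, where a rationality parameter interpolates between uniform randomness and deterministic best response. A minimal sketch (function name and interface are illustrative, not the paper's notation):

```python
import math
import random

def quantal_response(utilities, lam):
    """Logit (quantal) choice: sample action a with probability
    proportional to exp(lam * utilities[a]).

    lam is the rationality parameter: lam -> infinity recovers the
    deterministic best response, lam = 0 gives a uniform random choice.
    Returns (sampled action index, full probability vector).
    """
    # Subtract the max utility before exponentiating, for numerical stability.
    m = max(utilities)
    weights = [math.exp(lam * (u - m)) for u in utilities]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample one action index according to the logit probabilities.
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i, probs
    return len(probs) - 1, probs
```

With lam = 0 the choice probabilities are uniform; as lam grows, probability mass concentrates on the highest-utility action.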

The paper's most counterintuitive finding is that groups of these imperfect, somewhat random agents can actually outperform groups of perfectly rational, deterministic agents. The reason is that stochastic behavior can encode weak but informative signals that purely deterministic reasoning would discard. The authors validated the theory using large language models (LLMs), whose 'temperature' parameter controls output randomness and makes them natural quantal responders. Their experiments showed that aggregating the outputs of multiple LLM instances at moderate temperature settings significantly improved accuracy on complex reasoning tasks compared with a single, deterministic model.
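
The LLM experiment follows the same pattern as self-consistency decoding: draw several moderate-temperature samples and take a plurality vote over the final answers. A minimal sketch, where `generate` is a hypothetical stand-in for whatever sampling call your model exposes (the paper's exact protocol may differ):

```python
from collections import Counter

def aggregate_answers(generate, prompt, n_samples=20, temperature=0.7):
    """Sample several completions at a moderate temperature and return
    the plurality answer together with its vote share.

    `generate` is a hypothetical interface mapping (prompt, temperature)
    to a final answer string; it is not a specific LLM API.
    """
    votes = Counter(generate(prompt, temperature) for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / n_samples
```

The vote share serves as a rough confidence estimate: answers that win only a narrow plurality are the ones most likely to change under resampling.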

Key Points
  • Proves majority voting is the optimal aggregator for groups of 'boundedly rational' agents with private signals.
  • Shows groups of imperfect, stochastic agents can outperform groups of perfectly rational, deterministic ones.
  • Validated using LLMs: aggregating outputs from models with moderate temperature boosts complex reasoning accuracy.
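
The claim that aggregation helps rests on a Condorcet-style effect: if each stochastic agent is even slightly more likely to be right than wrong, majority accuracy climbs with group size. A quick exact check of that intuition (the classical jury-theorem calculation, not the paper's minimax regret analysis):

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a simple majority of n independent voters is
    correct, when each voter is independently correct with probability p
    (n odd)."""
    # Sum the binomial tail: at least (n // 2) + 1 correct votes.
    return sum(comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
               for k in range(n // 2 + 1, n + 1))
```

With p = 0.6, a single agent scores 0.6, a group of 3 already exceeds 0.64, and a group of 101 lands well above 0.95.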

Why It Matters

This provides a formal framework for improving AI system performance through strategic aggregation of multiple, slightly varied model outputs.