Research & Papers

Coarse Q-learning: Indifference vs. Indeterminacy vs. Instability

New theory shows AI agents can get stuck in loops when they group similar options together...

Deep Dive

In a new paper on arXiv, economists Philippe Jehiel and Aviman Satpathy introduce Coarse Q-learning (CQL), a reinforcement-learning model for bandit problems in which the set of available alternatives changes stochastically over time. The key innovation is that alternatives are exogenously partitioned into similarity classes, and feedback from sampled alternatives is pooled within those classes to form class-level valuations. Choices then follow a multinomial logit over these class valuations, and valuations update toward realized payoffs, as in standard Q-learning.
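To make the learning rule concrete, here is a minimal Python sketch of one plausible reading of the loop described above. The environment, the partition into three classes, the uniform within-class sampling, and the parameter names (alpha, beta) are all illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: 6 alternatives exogenously partitioned into 3 similarity classes.
classes = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
true_payoff = {0: 1.0, 1: 0.2, 2: 0.6, 3: 0.6, 4: 0.1, 5: 0.9}

Q = np.zeros(len(classes))   # one pooled valuation per class, not per alternative
alpha = 0.05                 # learning rate (step size)
beta = 5.0                   # payoff sensitivity of the logit choice rule

for t in range(10_000):
    # Availability is stochastic: each alternative is available w.p. 0.7 this round.
    available = [a for a in range(6) if rng.random() < 0.7]
    if not available:
        continue
    avail_classes = sorted({c for c, alts in classes.items()
                            for a in alts if a in available})
    # Multinomial logit over the class-level valuations of the available classes.
    logits = beta * Q[avail_classes]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    c = rng.choice(avail_classes, p=probs)
    # Sample an available alternative from the chosen class (uniformly, for illustration).
    a = rng.choice([x for x in classes[c] if x in available])
    r = true_payoff[a] + rng.normal(0.0, 0.1)  # noisy realized payoff
    # Q-learning-style update: the whole class's valuation moves toward the payoff.
    Q[c] += alpha * (r - Q[c])

print(Q)
```

The structural point is that Q has one entry per class, so a payoff realized from any sampled alternative moves the valuation of every alternative in its class.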

Using stochastic approximation, the authors derive the mean-field dynamics and characterize steady states as smooth analogues of Valuation Equilibria. In the high payoff-sensitivity limit, CQL exhibits novel long-run phenomena: it can produce multiple stable strict equilibria, a unique globally stable mixed equilibrium with indifference across classes, or, most strikingly, no stable equilibrium at all, with valuations and choice probabilities converging to a stable limit cycle. These outcomes are driven entirely by coarse aggregation and do not arise in standard benchmarks that learn a separate valuation for each alternative. The paper runs 45 main pages plus 26 appendix pages and is cross-listed under Theoretical Economics (econ.TH) and Computer Science and Game Theory (cs.GT).
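Under standard stochastic-approximation arguments, class-level dynamics of this kind typically take a form like the following, where Q_c is the pooled valuation of class c, sigma_c(Q) is the logit probability of choosing class c at sensitivity beta, and r̄_c(Q) is the expected payoff conditional on sampling from class c. The notation here is ours, a sketch of the generic shape rather than the paper's exact equations.

```latex
\begin{aligned}
\sigma_c(Q) &= \frac{e^{\beta Q_c}}{\sum_{c'} e^{\beta Q_{c'}}}, \\
\dot{Q}_c &= \sigma_c(Q)\,\bigl(\bar{r}_c(Q) - Q_c\bigr).
\end{aligned}
```

At a rest point, any class chosen with positive probability must satisfy Q_c = r̄_c(Q), the valuation-consistency condition behind the smooth Valuation Equilibria mentioned above. As beta grows, choice becomes nearly deterministic, and the feedback loop between which classes get sampled and which payoffs get pooled into them is what can fail to settle, yielding the limit cycles.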

Key Points
  • CQL groups alternatives into exogenous similarity classes and pools feedback into class-level valuations.
  • High payoff-sensitivity can lead to multiple stable equilibria, a unique mixed equilibrium, or stable limit cycles.
  • These phenomena are absent in standard Q-learning models that treat each alternative independently.

Why It Matters

Many learning systems generalize across similar options to cope with large or changing choice sets. This theory shows that such coarseness alone can destabilize learned behavior, producing persistent cycles rather than convergence, an instability that alternative-level analyses would miss.