Research & Papers

Online Scalarization in Vector-Valued Games

New algorithm lets AI players dynamically adjust preferences mid-game for better outcomes.

Deep Dive

Asadollahi, Hawkins, and Hale propose an online scalarization framework for vector-valued games in which players adapt their payoff weightings in real time. Using a bi-level bandit learning approach, the method achieves sublinear regret and, in experiments, raises the rate of convergence to preferred equilibria from ~50% to ~80%, letting players dynamically reshape their objectives over the course of repeated interactions.
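Scalarization here means collapsing a vector-valued payoff into a single number via a weight vector that the outer learner can adjust between rounds. A minimal sketch of that idea (the function name, objective labels, and numbers are illustrative, not taken from the paper):

```python
import numpy as np

def scalarize(payoff_vec, weights):
    """Collapse a vector-valued payoff into a scalar via a convex combination."""
    weights = np.asarray(weights, dtype=float)
    # Weights live on the probability simplex: nonnegative, summing to 1.
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return float(np.dot(weights, payoff_vec))

# A player receiving a hypothetical (cooperation, individual-gain) payoff
# can shift its effective objective just by reweighting:
payoff = np.array([0.9, 0.2])
print(scalarize(payoff, [0.5, 0.5]))  # balanced preference -> 0.55
print(scalarize(payoff, [0.8, 0.2]))  # cooperation-leaning preference -> 0.76
```

Changing the weights changes which equilibrium the scalar game rewards, which is what the outer learner exploits when steering play toward a preferred equilibrium.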

Key Points
  • Bi-level framework: outer learner chooses scalarization, inner learner selects actions via bandit no-regret learning.
  • Sublinear regret guarantees proven using bandit online mirror descent with stabilized importance weighting.
  • Convergence to preferred equilibrium improved from ~50% to ~80% in vector-valued game experiments.
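The inner learner's no-regret step can be sketched as entropy-regularized online mirror descent, which reduces to exponential weights, fed an importance-weighted loss estimate. The `gamma` term in the denominator is an illustrative stand-in for the paper's stabilized importance weighting (it caps the estimator's variance); all names, step sizes, and the toy loss vector are assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def omd_bandit_step(weights, true_losses, eta=0.1, gamma=0.05):
    """One round of exponential-weights OMD under bandit feedback."""
    probs = weights / weights.sum()
    arm = rng.choice(len(probs), p=probs)
    loss = true_losses[arm]  # bandit feedback: only the played arm's loss is seen
    # Importance-weighted loss estimate, stabilized by gamma in the denominator:
    est = np.zeros_like(probs)
    est[arm] = loss / (probs[arm] + gamma)
    # Mirror-descent update with negative-entropy regularizer = exponential weights:
    return weights * np.exp(-eta * est), arm

# Toy run: arm 1 has the lowest loss, so play should concentrate on it.
losses = np.array([0.9, 0.1, 0.6])
w = np.ones(3)
for _ in range(500):
    w, _ = omd_bandit_step(w, losses)
probs = w / w.sum()
print(probs.argmax())  # expected to concentrate on arm 1
```

In the bi-level setup, `true_losses` would itself come from scalarizing the vector payoffs with the outer learner's current weights, so both levels run bandit updates simultaneously.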

Why It Matters

Enables multi-agent AI systems to dynamically rebalance conflicting objectives, improving cooperation and negotiation outcomes.