Online Scalarization in Vector-Valued Games
New algorithm lets AI players dynamically adjust preferences mid-game for better outcomes.
Deep Dive
Researchers Asadollahi, Hawkins, and Hale propose an online scalarization framework for vector-valued games in which players adapt their payoff weightings in real time. Using a bi-level bandit learning approach, the method achieves sublinear regret and boosts convergence to preferred equilibria from ~50% to ~80% in experiments. This lets players dynamically reshape their objectives over the course of repeated interactions.
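To make "scalarization" concrete, here is a minimal sketch (not from the paper) of linear scalarization: a vector-valued payoff, say one component per objective, is collapsed to a single scalar by a weight vector on the simplex. The `scalarize` function and the example payoff values are illustrative assumptions.

```python
import numpy as np

def scalarize(payoff_vec, weights):
    """Linear scalarization: normalize weights onto the simplex, then take a weighted sum."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, payoff_vec))

# Hypothetical two-objective payoff, e.g. [efficiency, fairness].
payoff = [3.0, 1.0]
print(scalarize(payoff, [0.5, 0.5]))  # equal weighting of the two objectives
print(scalarize(payoff, [0.9, 0.1]))  # weighting that favors the first objective
```

Adapting the weight vector online is exactly what the outer learner in the proposed framework does.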
Key Points
- Bi-level framework: outer learner chooses scalarization, inner learner selects actions via bandit no-regret learning.
- Sublinear regret guarantees proven using bandit online mirror descent with stabilized importance weighting.
- Convergence to preferred equilibrium improved from ~50% to ~80% in vector-valued game experiments.
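The bi-level structure above can be sketched with a generic EXP3-style bandit in both loops: an outer learner picks a scalarization from a menu of candidate weight vectors, an inner learner picks an action, and both update from the scalarized payoff via importance-weighted exponential updates. This is a simplified stand-in for the paper's bandit online mirror descent with stabilized importance weighting; the payoff matrix, scalarization menu, and step-size choices are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3_probs(weights, gamma):
    """Mix the exponential-weights distribution with uniform exploration."""
    p = weights / weights.sum()
    return (1 - gamma) * p + gamma / len(weights)

def exp3_update(weights, probs, arm, reward, gamma):
    """Importance-weighted exponential update (reward assumed in [0, 1])."""
    est = reward / probs[arm]  # unbiased estimate of the played arm's payoff
    weights[arm] *= np.exp(gamma * est / len(weights))

# Toy vector-valued payoffs: rows = actions, columns = objectives.
PAYOFFS = np.array([[0.9, 0.1],
                    [0.2, 0.8],
                    [0.5, 0.5]])

# Outer learner's menu of candidate scalarizations (weights on the simplex).
SCALARIZATIONS = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])

gamma = 0.1
outer_w = np.ones(len(SCALARIZATIONS))
inner_w = np.ones(len(PAYOFFS))

for t in range(2000):
    outer_p = exp3_probs(outer_w, gamma)
    k = rng.choice(len(SCALARIZATIONS), p=outer_p)  # outer: pick a scalarization
    inner_p = exp3_probs(inner_w, gamma)
    a = rng.choice(len(PAYOFFS), p=inner_p)         # inner: pick an action
    reward = SCALARIZATIONS[k] @ PAYOFFS[a]         # scalarized payoff
    exp3_update(outer_w, outer_p, k, reward, gamma)
    exp3_update(inner_w, inner_p, a, reward, gamma)
    outer_w /= outer_w.max()  # rescale to avoid overflow; ratios are preserved
    inner_w /= inner_w.max()

print("outer distribution:", np.round(exp3_probs(outer_w, gamma), 2))
print("inner distribution:", np.round(exp3_probs(inner_w, gamma), 2))
```

Over repeated rounds, the two learners co-adapt: the outer distribution concentrates on scalarizations whose best responses yield high scalarized payoff, which is the mechanism the sublinear-regret analysis formalizes.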
Why It Matters
Enables multi-agent AI systems to dynamically rebalance conflicting objectives, improving cooperation and negotiation outcomes.