Learning in Proportional Allocation Auctions Games
New game theory paper shows how AI agents learn to bid optimally in repeated resource allocation auctions.
A team of researchers including Younes Ben Mazziane and Cleque-Marlain Mboulou Moutoubi has published a game theory paper titled "Learning in Proportional Allocation Auctions Games" on arXiv. The work analyzes the repeated Kelly auction game, a proportional allocation mechanism in which a divisible resource (such as bandwidth or compute) is split among agents in proportion to their bids. The researchers first derived the game's structure from real-world problems such as fairness-throughput trade-offs in wireless network slicing, proving that the stage game has a unique Nash Equilibrium (NE).
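To make the mechanism concrete, here is a minimal sketch of proportional (Kelly) allocation with a quasi-linear payoff. The function names and the linear-utility form are illustrative assumptions, not the paper's exact model:

```python
# Illustrative sketch of the Kelly (proportional allocation) mechanism:
# each agent's share of the divisible resource is its bid divided by the
# sum of all bids, and its payoff is the value of that share minus its bid.

def allocate(bids):
    """Share of the resource each agent receives, proportional to its bid."""
    total = sum(bids)
    if total == 0:
        return [0.0] * len(bids)
    return [b / total for b in bids]

def utility(i, values, bids):
    """Quasi-linear payoff for agent i: value of its share minus its bid."""
    return values[i] * allocate(bids)[i] - bids[i]

# Three agents bidding for a unit resource (e.g. a bandwidth slice).
bids = [2.0, 1.0, 1.0]
values = [8.0, 4.0, 4.0]
print(allocate(bids))            # shares: [0.5, 0.25, 0.25]
print(utility(0, values, bids))  # 8.0 * 0.5 - 2.0 = 2.0
```

Note the tension the paper studies: bidding higher wins a larger share but costs more, which is what gives the stage game its unique equilibrium.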
They then tackled the core question of learning dynamics: what happens when AI agents repeatedly play this auction while learning and optimizing their bids? The paper provides strong theoretical guarantees, proving convergence to the NE under three behavioral models: all agents using Online Gradient Descent (OGD); all using Dual Averaging with a quadratic regularizer (DAQ), a variant of Follow-the-Regularized-Leader; or all playing myopic Best Responses (BR). Crucially, convergence holds even when agents use personalized, non-identical learning rates, which is common in practice.
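The OGD model can be sketched as follows: each agent ascends the gradient of its own payoff with its own step size, holding the other bids fixed each round. The utilities, step sizes, bid floor, and starting bids below are illustrative assumptions; the paper's exact setup may differ.

```python
# Sketch of all agents running Online Gradient Descent (OGD) on the Kelly
# game with personalized (non-identical) learning rates.

def ogd_dynamics(values, etas, rounds=5000, b_min=1e-3):
    n = len(values)
    bids = [0.2, 2.0] + [1.0] * (n - 2)  # asymmetric start, away from the NE
    for _ in range(rounds):
        total = sum(bids)
        # Gradient of u_i = v_i * b_i / B - b_i with respect to b_i,
        # where B is the total bid: v_i * B_{-i} / B^2 - 1.
        grads = [values[i] * (total - bids[i]) / total**2 - 1 for i in range(n)]
        # Each agent ascends its own gradient with its own step size,
        # projected onto a small positive bid floor.
        bids = [max(b_min, bids[i] + etas[i] * grads[i]) for i in range(n)]
    return bids

# Two symmetric agents with v = (4, 4): the first-order condition
# v * B_{-i} / B^2 = 1 with b_1 = b_2 gives the unique NE b* = 1 each.
bids = ogd_dynamics([4.0, 4.0], etas=[0.05, 0.1])
print([round(b, 3) for b in bids])  # both bids approach the NE value 1.0
```

Even with different step sizes (0.05 vs 0.1), both agents converge to the same equilibrium bid, illustrating the paper's point that personalized learning rates do not break convergence within a single algorithm.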
Extensive simulations complemented the theory, comparing the models on convergence speed and time-average utility. The results showed that myopic Best Response (BR) achieved both the fastest convergence and the highest utility for agents. However, a critical caveat emerged: convergence can fail when agents in the same system mix different update rules, so stability is not guaranteed under heterogeneous learning algorithms, a key consideration for system design.
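The BR model admits a simple sketch: each agent repeatedly plays the exact payoff-maximizing bid against the others' current bids. The closed-form best response below follows from the first-order condition of the illustrative payoff u_i = v_i * b_i / B - b_i used above; the valuations and round count are assumptions, not the paper's experiments.

```python
import math

# Sketch of myopic Best Response (BR) dynamics in the Kelly game.

def best_response(v_i, others_total):
    # Maximizer of v_i * b / (b + B_{-i}) - b over b >= 0:
    # b* = sqrt(v_i * B_{-i}) - B_{-i}, clipped at zero.
    return max(0.0, math.sqrt(v_i * others_total) - others_total)

def br_dynamics(values, rounds=50):
    bids = [0.5] * len(values)
    for _ in range(rounds):
        for i, v in enumerate(values):
            others = sum(bids) - bids[i]
            bids[i] = best_response(v, others)  # exact myopic best reply
    return bids

# Two symmetric agents with v = (4, 4): the unique NE is b* = 1 each.
print([round(b, 3) for b in br_dynamics([4.0, 4.0])])
```

In this toy run BR reaches the equilibrium within a handful of rounds, far fewer than gradient-based updates typically need, which is consistent with the article's report that BR converged fastest in the paper's simulations.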
This research provides a formal foundation for understanding and designing multi-agent systems where AI entities must repeatedly compete for shared resources through bidding. The proven convergence results offer engineers and economists a toolkit of stable learning algorithms (OGD, DAQ, BR) for applications ranging from cloud computing and 5G network slicing to decentralized finance (DeFi) mechanisms.
- Proves convergence to Nash Equilibrium for agents using OGD, DAQ, or Best Response algorithms, even with personalized learning rates.
- Simulations show myopic Best Response (BR) achieves fastest convergence and highest time-average utility for agents.
- Identifies a key failure mode: convergence breaks under heterogeneous update rules where agents use different algorithms.
Why It Matters
Provides a stable algorithmic foundation for designing AI systems that compete for resources in auctions, critical for networking and cloud infrastructure.