Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values
Using game theory, researchers ensure fairness without knowing individual arm contributions.
A team of researchers has introduced a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits (BCMAB) with full-bandit feedback. Unlike semi-bandit feedback where individual arm contributions are observable, full-bandit feedback only reveals the total reward of the chosen combination. This makes it extremely challenging to determine which arms deserve credit and thus how to allocate resources fairly. To address this, the authors—Shradha Sharma, Swapnil Dhamal, and Shweta Jain—extend the classic Shapley value from cooperative game theory to a novel K-Shapley value. This captures the marginal contribution of an arm restricted to sets of size at most K, ensuring properties like symmetry, linearity, and efficiency.
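The idea of a size-restricted Shapley value can be sketched with a Monte Carlo estimator: average arm i's marginal contribution over random coalitions of the other arms of size at most K − 1. Note this is an illustrative sketch only; the exact coalition weighting in the authors' K-Shapley definition may differ, and `value_fn`, `k_shapley`, and the toy additive value function below are assumptions, not the paper's code.

```python
import random

def k_shapley(value_fn, arms, i, K, n_samples=2000, seed=0):
    """Monte Carlo sketch of a K-restricted Shapley value for arm i.

    Averages the marginal contribution value_fn(S | {i}) - value_fn(S)
    over random coalitions S of the other arms with |S| <= K - 1,
    so that every evaluated set has size at most K.
    """
    rng = random.Random(seed)
    others = [a for a in arms if a != i]
    total = 0.0
    for _ in range(n_samples):
        size = rng.randint(0, min(K - 1, len(others)))
        S = set(rng.sample(others, size))
        total += value_fn(S | {i}) - value_fn(S)
    return total / n_samples

# Toy additive value function: each arm contributes a fixed weight,
# so arm "b"'s marginal contribution is its weight in every coalition.
weights = {"a": 1.0, "b": 2.0, "c": 0.5}
value = lambda S: sum(weights[a] for a in S)
est = k_shapley(value, list(weights), "b", K=2)  # → 2.0 for this additive toy
```

For an additive value function the estimator recovers the arm's weight exactly; the interesting (and noisy) cases are non-additive value functions, which is where the paper's variance-mitigation matters.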
Based on this K-Shapley value, they propose the K-SVFair-FBF algorithm, which adaptively estimates the value function under full feedback while mitigating noise from Monte Carlo approximations. Theoretically, K-SVFair-FBF achieves an O(T^{3/4}) fairness regret bound. Experiments on federated learning and social influence maximization datasets demonstrate that the algorithm not only ensures fairer resource distribution but also outperforms existing baselines in overall performance. This work opens up practical applications where budget constraints and opaque feedback make fair allocation difficult, such as in advertising campaigns, network resource management, and collaborative AI training.
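The credit-assignment difficulty under full-bandit feedback can be seen in a toy simulation: the learner observes only the noisy total reward of each pulled subset. Below, per-arm means are recovered by least squares over random subsets. This is a naive baseline assuming additive rewards, shown only to illustrate the feedback model; it is not the K-SVFair-FBF algorithm, and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8, 0.5, 0.1])  # hidden per-arm contributions
n_arms, T = 4, 400

X = np.zeros((T, n_arms))  # indicator of which arms were pulled at round t
y = np.zeros(T)            # observed total reward (the only feedback)
for t in range(T):
    S = rng.choice(n_arms, size=2, replace=False)
    X[t, S] = 1.0
    y[t] = true_means[S].sum() + rng.normal(0, 0.05)

# Least-squares recovery works only because this toy reward is additive;
# combinatorial value functions are what make the real problem hard.
est, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With non-additive value functions, no linear fit applies, which is why the paper turns to Shapley-style attribution instead.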
- Extends classical Shapley value to K-Shapley value, enabling fair credit assignment in full-bandit feedback settings where individual arm contributions are hidden.
- Achieves a theoretical fairness regret bound of O(T^{3/4}), balancing exploration and exploitation under budget constraints.
- Outperforms existing baselines on real-world datasets from federated learning and social influence maximization, proving practical effectiveness.
Why It Matters
The framework enables fair resource allocation in budgeted, opaque environments, such as ad bidding or distributed learning, where only aggregate feedback is available.