New temporal fairness metrics reveal Q-learning agents up to 73% worse than random

arXiv cs.MA May 15, 2026

⚡ RP metrics run 12-25x faster than ALT and uncover coordination failures invisible to traditional measures

Deep Dive

A new paper by Nikolaos Al. Papadopoulos tackles the challenge of temporal fairness in multi-agent systems where agents repeatedly compete for limited resources. The work introduces Rotational Periodicity (RP), a family of lightweight metrics, alongside the ALT sliding-window measures, within a unified framework. The Multi-Agent Battle of the Exes (MBoE) benchmark formalizes repeated fair division, with Perfect Alternation (PA) as the ideal solution. RP decomposes fairness into Rotational Score and Waiting Periods Evaluation, achieving O(nu+n) time complexity versus ALT's O(nu*n).
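To make the Perfect Alternation ideal concrete, the sketch below checks in a single pass whether a sequence of per-round winners is a strict rotation. It assumes one winner per round and reads PA as "the first n rounds contain every agent once, and each later round repeats the winner from n rounds earlier". The function name `is_perfect_alternation` and this reading are illustrative assumptions, not the paper's formal definition of PA or of RP.

```python
from typing import Sequence


def is_perfect_alternation(winners: Sequence[int], n_agents: int) -> bool:
    """Return True if winners repeat a fixed rotation over all agents.

    Hypothetical helper: assumes exactly one winner per round and agent
    ids in range(n_agents). Runs in O(len(winners) + n_agents) time.
    """
    if len(winners) < n_agents:
        return False  # too short to observe one full rotation
    # The first n rounds must contain every agent exactly once.
    if set(winners[:n_agents]) != set(range(n_agents)):
        return False
    # Every later round must repeat the winner from n rounds earlier.
    return all(
        winners[t] == winners[t - n_agents]
        for t in range(n_agents, len(winners))
    )
```

A strict rotation such as `[0, 1, 2, 0, 1, 2]` passes this check, while a sequence with the same win totals but clustered turns, such as `[0, 0, 1, 1, 2, 2]`, does not.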

Empirical evaluation across agent counts 2-10 yields three key findings. First, Q-learning agents perform 10-73% worse on RP and 7-35% worse on CALT than random policies, yet traditional Reward Fairness remains deceptively high (above 0.92 for n>=3). Second, RP delivers a 12-25x computational speedup over ALT, scaling reliably where ALT becomes intractable. Third, the two metric families are complementary: ALT offers richer discrimination for small populations, while RP scales to larger ones. Together they form a diagnostic toolkit for temporal fair division, with implications for multi-agent coordination in AI systems.
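A small example may help show why a reward-based fairness score can stay high while turn-taking is poor. The sketch below uses Jain's fairness index over total wins as a stand-in for a reward fairness measure, and the longest wait between wins as a stand-in for a temporal measure. Both choices, and the names `jain_index`, `max_wait`, and `win_totals`, are assumptions made for illustration; they are not the paper's Reward Fairness, RP, or ALT definitions.

```python
from typing import List, Sequence


def win_totals(winners: Sequence[int], n_agents: int) -> List[int]:
    """Count how many rounds each agent won."""
    totals = [0] * n_agents
    for w in winners:
        totals[w] += 1
    return totals


def jain_index(totals: Sequence[float]) -> float:
    """Jain's index: (sum x)^2 / (n * sum x^2); 1.0 means equal totals."""
    s = sum(totals)
    sq = sum(x * x for x in totals)
    return (s * s) / (len(totals) * sq) if sq > 0 else 0.0


def max_wait(winners: Sequence[int], n_agents: int) -> int:
    """Longest run of rounds any agent spends without winning (one pass)."""
    last_win = [-1] * n_agents
    longest = 0
    for t, w in enumerate(winners):
        longest = max(longest, t - last_win[w] - 1)
        last_win[w] = t
    # Also count the wait from each agent's last win to the final round.
    for a in range(n_agents):
        longest = max(longest, len(winners) - last_win[a] - 1)
    return longest


rotating = [0, 1, 2] * 4             # strict turn-taking
blocky = [0] * 4 + [1] * 4 + [2] * 4  # same totals, clustered turns

for name, seq in (("rotating", rotating), ("blocky", blocky)):
    print(name, jain_index(win_totals(seq, 3)), max_wait(seq, 3))
```

Both sequences give every agent four wins, so the total-based index cannot tell them apart, yet the longest wait differs (2 rounds versus 8). That gap is the kind of coordination failure a purely reward-based measure would miss.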

Key Points

RP metrics decompose temporal fairness into Rotational Score and Waiting Periods Evaluation with O(nu+n) complexity
Q-learning agents underperform random policies by 10-73% on RP and 7-35% on CALT, while Reward Fairness stays above 0.92
RP achieves 12-25x speedup over ALT, enabling scalable fairness evaluation for larger agent populations

Why It Matters

The metrics offer practical fairness diagnostics for multi-agent AI systems, enabling scalable detection of coordination failures that current metrics miss.
