Ledford and Regli's TAC-MAB cuts communication 23x in threshold learning
Decentralized agents learn coalition thresholds with 23x less chatter
A new paper from Ledford and Regli tackles a critical coordination problem: what happens when multi-agent tasks give zero feedback unless the team hits an unknown size threshold? This 'censored feedback' creates an identifiability issue: agents cannot tell whether a failure came from bad luck or from a coalition that was too small. The authors formalize this as the Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and model the structural learning cost under both centralized and decentralized setups.
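To make the identifiability issue concrete, here is a minimal sketch of censored feedback. It is not the paper's model: the Bernoulli reward, the function name `pull`, and the parameters `threshold` and `success_prob` are all assumptions chosen for illustration.

```python
import random


def pull(coalition_size, threshold, success_prob, rng):
    """Hypothetical censored-feedback task (illustrative, not from the paper).

    Below the threshold the task cannot succeed, so the reward is always 0.
    At or above it, the reward is 1 with probability success_prob, else 0.
    """
    if coalition_size < threshold:
        return 0  # censored: structurally infeasible, no signal at all
    return 1 if rng.random() < success_prob else 0


rng = random.Random(0)
too_small = [pull(2, threshold=4, success_prob=0.5, rng=rng) for _ in range(20)]
big_enough = [pull(5, threshold=4, success_prob=0.5, rng=rng) for _ in range(20)]

# Every entry of too_small is 0. Any single 0 in big_enough looks identical
# to those zeros; only a 1 proves that the coalition size was feasible.
```

Under these assumptions a success is the only informative event about the threshold, which is why a zero reward alone cannot settle whether the coalition was too small.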
For centralized coordination, they propose C-TAC, which achieves cumulative regret O(log T) by separating the cost into structural search (finding the threshold) and statistical monitoring (estimating reward values). For the more practical decentralized setting, they introduce D-TAC, an event-triggered protocol in which agents communicate only when their structural beliefs change. Empirically, D-TAC delivers a 23x reduction in communication while maintaining near-centralized alignment on feasibility. These results characterize the fundamental cost of learning under censorship and show that near-optimal efficiency is achievable without constant sync.
- C-TAC achieves cumulative regret O(log T) with terms for structural search and statistical monitoring.
- D-TAC reduces communication overhead by 23x through event-triggered synchronization only when structural beliefs change.
- Addresses the identifiability problem in multi-agent systems with fully censored feedback.
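The event-triggered idea in the points above can be sketched as follows. This is not D-TAC itself: the `Agent` class, the choice of "smallest coalition size seen to succeed" as the structural belief, and every parameter value are assumptions made for illustration, and the sketch makes no attempt to reproduce the reported 23x figure.

```python
import random


class Agent:
    """Hypothetical agent that broadcasts only when its structural belief changes."""

    def __init__(self, max_size):
        # Smallest coalition size seen to succeed; max_size + 1 means "none yet".
        self.upper_bound = max_size + 1
        self.messages_sent = 0

    def observe(self, coalition_size, reward):
        """Update the belief; return True if it changed (which triggers a broadcast)."""
        if reward > 0 and coalition_size < self.upper_bound:
            self.upper_bound = coalition_size
            self.messages_sent += 1  # event-triggered message
            return True
        return False  # belief unchanged: stay silent


def simulate(rounds=1000, max_size=10, threshold=4, success_prob=0.5, seed=0):
    """Run one agent against a censored task with randomly sized coalitions."""
    rng = random.Random(seed)
    agent = Agent(max_size)
    for _ in range(rounds):
        size = rng.randint(1, max_size)
        feasible = size >= threshold
        reward = 1 if feasible and rng.random() < success_prob else 0
        agent.observe(size, reward)
    return agent.upper_bound, agent.messages_sent


upper_bound, messages = simulate()
```

In this toy setup the belief can only tighten, so the number of messages is bounded by the number of feasible coalition sizes, whereas a protocol that synchronizes every round would send one message per round.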
Why It Matters
Enables efficient multi-agent coordination (swarms, sensor nets) under unknown coalition thresholds with minimal communication.