New method optimizes LLM multi-agent prompts with temporal and structural credit assignment
Researchers slash query costs while boosting reasoning accuracy across multi-agent LLM teams.
Multi-agent systems (MAS) amplify LLM reasoning but suffer from an optimization bottleneck: the discrete, non-differentiable computation graph and sparse global feedback make it hard to pinpoint why a team failed. A new paper from Li et al. (arXiv, May 2026) tackles this head-on by introducing a structured credit assignment framework. The authors decompose the objective along two axes: temporal credit, which uses state-space bottlenecks to flag critical decision rounds, and structural credit, which leverages stationary role policies to isolate each agent's contribution.
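To make the idea of temporal credit concrete, here is a minimal sketch that flags one critical round in a multi-agent dialogue. It does not reproduce the paper's state-space bottleneck criterion, which this summary does not specify. As a stand-in, it assumes a per-round estimate of success probability is available and flags the round with the largest drop. The function name `critical_round` and the sample values are hypothetical.

```python
from typing import List


def critical_round(values: List[float]) -> int:
    """Flag the round whose execution caused the largest drop in estimated success.

    ASSUMPTION: values[t] is an estimate of the team's success probability
    after round t. This largest-drop heuristic is an illustrative stand-in,
    not the bottleneck criterion from the paper.
    """
    if len(values) < 2:
        raise ValueError("need at least two value estimates")
    # Drop incurred while moving from round t to round t + 1.
    drops = [values[t] - values[t + 1] for t in range(len(values) - 1)]
    worst = max(range(len(drops)), key=drops.__getitem__)
    return worst + 1  # index of the round that produced the drop


# Hypothetical estimates for a four-round interaction (rounds 0..3).
round_values = [0.8, 0.75, 0.3, 0.25]
flagged = critical_round(round_values)
```

In this toy trace the estimate collapses between rounds 1 and 2, so round 2 is the one a targeted update would inspect first.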
This decomposition feeds into a discrete, verbalized block coordinate descent algorithm. Instead of updating every prompt at once, the method alternates between two blocks, the role prompts and the aggregation protocol, and uses LLM-generated 'proxy gradients' to revise only the weak links that credit assignment has identified. The authors report substantial reductions in query complexity alongside performance gains across diverse reasoning benchmarks, offering a principled path toward self-improving multi-agent systems.
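The alternating scheme can be illustrated with a small, self-contained sketch of discrete block coordinate descent over two blocks: role prompts and an aggregation protocol. This is not the authors' implementation. The LLM calls are replaced by toy stand-ins so the code runs offline, and every name here (`block_coordinate_descent`, `toy_score`, `toy_role_candidates`, `toy_protocol_candidates`, `toy_weakest_role`) is hypothetical. In a real system the candidate generators would be LLM calls producing verbalized proxy gradients, and the score would come from benchmark evaluation.

```python
from typing import Callable, Dict, List, Tuple

Roles = Dict[str, str]  # role name -> role prompt


def block_coordinate_descent(
    roles: Roles,
    protocol: str,
    score: Callable[[Roles, str], float],
    role_candidates: Callable[[str, str], List[str]],
    protocol_candidates: Callable[[str], List[str]],
    weakest_role: Callable[[Roles, str], str],
    sweeps: int = 5,
) -> Tuple[Roles, str, float]:
    """Alternate between the role-prompt block and the protocol block."""
    roles = dict(roles)
    best = score(roles, protocol)
    for _ in range(sweeps):
        improved = False
        # Block 1: hold the protocol fixed and edit only the flagged role.
        target = weakest_role(roles, protocol)
        for cand in role_candidates(target, roles[target]):
            trial = {**roles, target: cand}
            s = score(trial, protocol)
            if s > best:  # accept only strict improvements
                roles, best, improved = trial, s, True
        # Block 2: hold the role prompts fixed and edit the protocol.
        for cand in protocol_candidates(protocol):
            s = score(roles, cand)
            if s > best:
                protocol, best, improved = cand, s, True
        if not improved:
            break  # no block could be improved in a full sweep
    return roles, protocol, best


# --- Toy stand-ins (ASSUMPTIONS: these replace LLM calls and benchmarks) ---
def toy_score(roles: Roles, protocol: str) -> float:
    s = sum(1.0 for p in roles.values() if "step by step" in p)
    return s + (1.0 if protocol == "majority vote" else 0.0)


def toy_role_candidates(name: str, prompt: str) -> List[str]:
    return [prompt + " Think step by step."]


def toy_protocol_candidates(protocol: str) -> List[str]:
    return ["majority vote", "first answer"]


def toy_weakest_role(roles: Roles, protocol: str) -> str:
    for name, prompt in roles.items():
        if "step by step" not in prompt:
            return name
    return next(iter(roles))


final_roles, final_protocol, final_score = block_coordinate_descent(
    {"solver": "Solve the problem.", "critic": "Check the solution."},
    "first answer",
    toy_score,
    toy_role_candidates,
    toy_protocol_candidates,
    toy_weakest_role,
)
```

Each sweep evaluates only the candidates for one flagged role plus the protocol candidates, rather than every combination of prompts. That restriction is the intuition behind the query savings the article describes.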
- Temporal credit assignment uses state-space bottlenecks to identify critical rounds in multi-agent interaction.
- Structural credit isolates agent contributions via stationary role policies, enabling targeted prompt updates.
- Verbalized block coordinate descent with LLM proxy gradients reduces query complexity while improving performance on reasoning benchmarks.
Why It Matters
The method lets LLM teams optimize their own prompts with fewer queries, improving multi-agent collaboration without brute-force compute.