CANTANTE optimizes multi-agent LLM prompts via credit attribution, beating baselines by 18.9 points
New method addresses the credit assignment problem for agentic AI systems.
Multi-agent LLM systems excel at complex tasks like software engineering and retrieval-augmented generation, but configuring them remains a manual, brittle process. The core bottleneck is credit assignment: every agent's local behavior contributes to a single global score, so researchers cannot easily trace which agent helped and which one hurt. CANTANTE tackles this by treating agent prompts as learnable parameters optimized from task rewards. The algorithm has three steps: local optimizers propose prompt variants, the system runs contrastive rollouts on the same queries, and an attributer decomposes the global reward into per-agent credit signals. These signals are fed back to the local optimizer (here CAPO, from prior AutoML work), enabling data-driven prompt updates.
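The loop described above can be illustrated with a toy sketch. It is not CANTANTE's implementation: the reward function, the difference-based attributer, and the greedy update rule are all assumptions chosen to keep the example self-contained, and the names `rollout`, `attribute_credit`, and `update_prompts` are hypothetical. A real system would call LLMs in the rollout and use an optimizer such as CAPO in place of the greedy rule.

```python
from typing import Callable, Dict, List

Prompts = Dict[str, str]  # agent name -> prompt text


def rollout(prompts: Prompts, query: str) -> float:
    """Toy stand-in for running the multi-agent system and scoring the result.

    Hypothetical reward: every agent whose prompt contains "step by step"
    adds 1.0 to the global score. The query is ignored in this toy version.
    """
    return float(sum("step by step" in p for p in prompts.values()))


def attribute_credit(
    base: Prompts,
    variants: Prompts,
    queries: List[str],
    reward_fn: Callable[[Prompts, str], float],
) -> Dict[str, float]:
    """Assumed attributer: swap one agent's prompt at a time and compare
    global rewards against the baseline on the same queries."""
    credit: Dict[str, float] = {}
    for agent, new_prompt in variants.items():
        contrast = {**base, agent: new_prompt}  # change only this agent
        deltas = [reward_fn(contrast, q) - reward_fn(base, q) for q in queries]
        credit[agent] = sum(deltas) / len(deltas)
    return credit


def update_prompts(base: Prompts, variants: Prompts, credit: Dict[str, float]) -> Prompts:
    """Greedy placeholder for a local optimizer: keep a variant only if
    its estimated credit is positive."""
    return {
        agent: variants[agent] if credit.get(agent, 0.0) > 0 else prompt
        for agent, prompt in base.items()
    }


base = {"planner": "Plan the task.", "coder": "Write code."}
variants = {"planner": "Plan the task step by step.", "coder": "Write code quickly."}
queries = ["q1", "q2"]

credit = attribute_credit(base, variants, queries, rollout)
updated = update_prompts(base, variants, credit)
print(credit)
print(updated)
```

In this toy run only the planner's variant raises the global score, so only the planner's prompt is replaced. The point of the contrast is that both configurations see the same queries, so a reward difference can be assigned to the single agent whose prompt changed.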
CANTANTE was evaluated against the DSPy optimizers GEPA and MIPROv2 on three benchmarks: MBPP (programming), GSM8K (mathematical reasoning), and HotpotQA (retrieval). It achieved the best average rank, outperforming the strongest baseline by +18.9 points on MBPP and +12.5 points on GSM8K. Crucially, inference time remained unchanged versus unoptimized prompts. The approach moves multi-agent systems away from hand-crafted demos and toward automatically tuned, more dependable configurations. The paper and open-source code are available on arXiv and GitHub.
- Tackles credit assignment in multi-agent LLMs by decomposing global rewards into per-agent update signals using contrastive rollouts.
- Outperforms the strongest DSPy baseline (among GEPA and MIPROv2) by 18.9 points on MBPP and 12.5 points on GSM8K.
- Keeps inference-time cost unchanged relative to unoptimized prompts, making it practical for production use.
Why It Matters
Automates prompt engineering for reliable multi-agent AI, reducing manual trial-and-error and unlocking autonomous agent systems.