MMoA framework uses LSTM recurrence to improve runtime efficiency by up to 4.6%
New recurrent MoA architecture dynamically activates fewer agents with only a small drop in win rate.
The Mixture-of-Agents (MoA) framework aggregates outputs from multiple LLM agents to boost performance, but existing implementations rely on static routers that ignore temporal and contextual dependencies across aggregation layers. This leads to inefficient agent activation and suboptimal resource use. To address this, researcher Rui Chu introduces MMoA, a recurrent MoA architecture that integrates LSTM-based gating into the agent selection process. The recurrent router adaptively modulates agent contributions based on both current inputs and historical routing decisions, enabling more context-aware aggregation.
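To make the idea of LSTM-based gating concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation. The class name `RecurrentRouter`, the dimensions, the random weights, and the 0.5 activation threshold are all assumptions chosen for illustration. It shows one LSTM step per aggregation layer, so each layer's agent gates depend on the current input features and on the state carried over from earlier routing decisions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RecurrentRouter:
    """Hypothetical LSTM-gated router: one LSTM step per aggregation layer."""

    def __init__(self, input_dim, hidden_dim, num_agents, seed=0):
        rng = np.random.default_rng(seed)
        # Stacked weights for the input, forget, candidate, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Projection from the hidden state to one score per agent.
        self.W_out = rng.normal(0.0, 0.1, (num_agents, hidden_dim))
        self.hidden_dim = hidden_dim
        self.reset()

    def reset(self):
        # Clear the routing history before handling a new query.
        self.h = np.zeros(self.hidden_dim)
        self.c = np.zeros(self.hidden_dim)

    def step(self, x, threshold=0.5):
        # Standard LSTM cell update on the current features and previous state.
        z = self.W @ np.concatenate([x, self.h]) + self.b
        i, f, g, o = np.split(z, 4)
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        # One gate value in (0, 1) per agent; agents below the threshold are skipped.
        gates = sigmoid(self.W_out @ self.h)
        active = gates >= threshold
        if not active.any():
            # Assumed fallback: always keep the highest-scoring agent.
            active[np.argmax(gates)] = True
        return gates, active


router = RecurrentRouter(input_dim=8, hidden_dim=16, num_agents=4)
rng = np.random.default_rng(1)
for layer in range(3):
    features = rng.normal(size=8)  # stand-in for a summary of the layer's inputs
    gates, active = router.step(features)
    print(layer, np.round(gates, 3), active)
```

In a trained system the weights would be learned and the gate values could also weight each agent's contribution during aggregation. Here they are random, so the printed activations only illustrate the data flow.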
MMoA was evaluated on standard instruction-following benchmarks including AlpacaEval 2.0, MT-Bench, and Arena-Hard. Results show comparable accuracy to traditional MoA while reducing computational overhead by dynamically activating fewer agents. On AlpacaEval 2.0, MMoA achieves a win rate of 58.0%, compared with 59.8% for MoA, while improving runtime efficiency by up to 4.6%. These findings suggest MMoA provides a scalable and efficient approach for adaptive multi-agent LLM systems, particularly valuable for production environments where cost and latency matter.
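The gap between the two win rates can be read in absolute or relative terms. The short calculation below uses only the two AlpacaEval 2.0 figures reported above; the variable names are illustrative.

```python
moa_win_rate = 59.8   # reported MoA win rate on AlpacaEval 2.0, in percent
mmoa_win_rate = 58.0  # reported MMoA win rate on AlpacaEval 2.0, in percent

# Absolute gap, in percentage points.
abs_drop = moa_win_rate - mmoa_win_rate
# Relative gap, as a share of the MoA baseline.
rel_drop = abs_drop / moa_win_rate * 100

print(f"{abs_drop:.1f} points, {rel_drop:.1f}% relative")
# prints "1.8 points, 3.0% relative"
```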
- MMoA replaces static routers with LSTM-based gating to capture temporal dependencies across aggregation layers.
- On AlpacaEval 2.0, MMoA achieves a 58.0% win rate vs 59.8% for traditional MoA, a drop of 1.8 percentage points.
- Runtime efficiency improves up to 4.6% by dynamically activating fewer agents per query.
Why It Matters
MMoA offers a practical path to cheaper, faster multi-agent LLM inference without sacrificing much accuracy.