Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models
New method treats prompts as 'agents' that hunt for optimal configurations using a bio-inspired leader-follower algorithm.
A research team led by Xudong Wang has published a paper titled 'Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models,' accepted to ACL 2026. The work addresses a critical bottleneck in AI: the heavy reliance on manually crafted, static prompts for complex reasoning tasks. Current methods are sensitive to decoding settings and task variations, causing performance to fluctuate. Existing automatic prompt optimizers also typically take a single-agent approach, which cannot jointly tune both the prompt text and the model's decoding hyperparameters (such as temperature) within a single system.
To solve this, the team developed Agent-GWO, a novel framework that treats each combination of a prompt template and its associated decoding settings as an 'agent.' It then applies a bio-inspired Grey Wolf Optimizer (GWO) algorithm, which mimics the social hierarchy and hunting behavior of a wolf pack. In this simulation, the three best-performing agents are designated as leader wolves (alpha, beta, delta). These leaders guide the updates and movements of the remaining 'follower' agents through a collaborative search process, iteratively converging toward a robust, optimal configuration for a given task.
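The leader-follower update described above can be sketched with the classical Grey Wolf Optimizer. The sketch below is a minimal illustration only: the position of each 'agent' here encodes just two decoding hyperparameters (temperature and top_p), and the fitness function is a hypothetical stand-in for benchmark accuracy, since the paper's actual encoding of prompt templates and its evaluation pipeline are not public.

```python
import random

# Minimal Grey Wolf Optimizer sketch. Each 'agent' position encodes
# decoding hyperparameters [temperature, top_p]. Agent-GWO also searches
# over prompt templates; that part is omitted here as an assumption-free
# illustration of the alpha/beta/delta leader-follower update only.

DIM = 2  # [temperature, top_p]
BOUNDS = [(0.0, 2.0), (0.1, 1.0)]

def fitness(pos):
    # Hypothetical stand-in for benchmark accuracy: peaks at
    # temperature=0.7, top_p=0.9. A real system would run the LLM
    # on a validation set and score its answers.
    t, p = pos
    return -((t - 0.7) ** 2 + (p - 0.9) ** 2)

def clip(pos):
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(pos, BOUNDS)]

def gwo(n_agents=12, n_iters=50, seed=0):
    rng = random.Random(seed)
    pack = [[rng.uniform(lo, hi) for lo, hi in BOUNDS]
            for _ in range(n_agents)]
    for it in range(n_iters):
        # Rank the pack; the three best become alpha, beta, delta.
        pack.sort(key=fitness, reverse=True)
        alpha, beta, delta = pack[0], pack[1], pack[2]
        a = 2.0 * (1 - it / n_iters)  # exploration factor decays 2 -> 0
        new_pack = []
        for wolf in pack:
            pos = []
            for d in range(DIM):
                # Each leader proposes a move toward itself;
                # the follower averages the three proposals.
                proposals = []
                for leader in (alpha, beta, delta):
                    r1, r2 = rng.random(), rng.random()
                    A = 2 * a * r1 - a
                    C = 2 * r2
                    dist = abs(C * leader[d] - wolf[d])
                    proposals.append(leader[d] - A * dist)
                pos.append(sum(proposals) / 3)
            new_pack.append(clip(pos))
        pack = new_pack
    return max(pack, key=fitness)

best = gwo()
```

As the exploration factor `a` decays, the random coefficient `A` shrinks toward zero, so the pack shifts from wide exploration to tight convergence around the leaders' consensus.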
The reported results are strong. Extensive testing on multiple mathematical and hybrid reasoning benchmarks across various LLM backbones showed that Agent-GWO consistently delivers more accurate and stable performance than prior prompt optimization techniques. By automating the search for the best prompt-and-settings combination, it reduces manual tuning and makes LLM reasoning more reliable and transferable. The team has committed to releasing the code publicly, which could allow developers and researchers to integrate this optimization directly into their AI workflows.
- Unifies prompt text and decoding hyperparameters (like temperature) into a single 'agent' configuration for simultaneous optimization.
- Uses a Grey Wolf Optimizer (GWO) algorithm, where three leader agents guide a collaborative search, mimicking a pack hunting for the best solution.
- Demonstrated consistent accuracy and stability improvements on complex reasoning benchmarks across multiple LLMs, outperforming existing single-agent methods.
Why It Matters
Automates the hunt for optimal LLM prompts, making complex AI reasoning more reliable, efficient, and less dependent on expert manual tuning.