ARES-LSHADE: LLM-enhanced optimizer wins 510 of 744 GNBG runs
LLM-designed mutation operator plus L-BFGS-B polish reaches machine precision on 18 of 24 functions.
ARES-LSHADE is an optimizer produced through LLM-driven algorithm design. The team ran an autonomous research loop of ~30 experiments in which a large language model iteratively proposed modifications to the mutation operator of the LSHADE algorithm (the 2025 GECCO winner). The resulting scout-augmented mutation operator integrates adaptive CMA-ES to improve exploration, while a multi-start L-BFGS-B phase polishes candidate solutions. On the Generalized Numerical Benchmark Generator (GNBG), ARES-LSHADE won 510 of 744 runs (24 functions, 31 runs each, where a win means a gap below 1e-8) and reached machine precision on 18 of 24 functions, a strong result for a blackbox optimizer.
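The polishing stage described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes SciPy's `minimize` with `method="L-BFGS-B"`, and the toy objective, the `multistart_polish` helper, and the `top_k` parameter are all hypothetical names chosen for the example. The evolutionary search is replaced by a random population to keep the sketch short.

```python
import numpy as np
from scipy.optimize import minimize


def sphere_shifted(x):
    """Toy objective standing in for a GNBG function (assumption, not GNBG)."""
    return float(np.sum((np.asarray(x) - 1.5) ** 2))


def multistart_polish(f, candidates, lower, upper, top_k=3):
    """Run bounded L-BFGS-B from the top_k best candidates; keep the best result."""
    candidates = np.asarray(candidates, dtype=float)
    fitness = np.array([f(x) for x in candidates])
    order = np.argsort(fitness)[:top_k]           # indices of the best starts
    bounds = [(lower, upper)] * candidates.shape[1]

    # Start from the best unpolished candidate so polishing can never make it worse.
    best_x = candidates[order[0]].copy()
    best_f = float(fitness[order[0]])
    for idx in order:
        res = minimize(f, candidates[idx], method="L-BFGS-B", bounds=bounds)
        if res.fun < best_f:
            best_x, best_f = res.x, float(res.fun)
    return best_x, best_f


# Stand-in for the population an evolutionary search would hand over.
rng = np.random.default_rng(0)
population = rng.uniform(-5.0, 5.0, size=(20, 4))
unpolished_best = min(sphere_shifted(x) for x in population)

x_best, f_best = multistart_polish(sphere_shifted, population, -5.0, 5.0)
print(f"before polish: {unpolished_best:.3e}, after polish: {f_best:.3e}")
```

Starting from several candidates rather than only the single best one is what makes the phase "multi-start": on multimodal functions, different starts can fall into different basins, and only the best polished result is kept.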
The paper also documents a critical methodological observation. When the authors allowed the LLM access to the benchmark's compositional metadata (violating the competition's blackbox rule), the resulting algorithm trivially solved all 24 functions. The team caught this before submission and reverted to a strict blackbox observation space. This highlights a tension: LLMs can exploit hidden structure in benchmarks if given too much information, raising questions about how to design fair competitions for LLM-generated algorithms. The code and reproducibility artifacts are publicly available on GitHub.
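One way to picture a strict blackbox observation space is a wrapper that gives the optimizer nothing except function values. The sketch below is an assumption-laden illustration, not the competition's or the authors' tooling: the `BlackBoxObjective` class, its `budget` parameter, and the toy function are hypothetical. Python cannot truly hide attributes, so the underscore prefix only signals intent.

```python
class BlackBoxObjective:
    """Expose only f(x) and an evaluation counter to the optimizer."""

    def __init__(self, func, dimension, budget):
        self._func = func            # private by convention: no metadata leaks out
        self.dimension = dimension
        self.budget = budget
        self.evaluations = 0

    def __call__(self, x):
        if len(x) != self.dimension:
            raise ValueError("input has the wrong dimension")
        if self.evaluations >= self.budget:
            raise RuntimeError("evaluation budget exhausted")
        self.evaluations += 1
        return float(self._func(x))


def toy_function(x):
    """Stand-in objective; its structure is what the optimizer must not see."""
    return sum((xi - 0.5) ** 2 for xi in x)


objective = BlackBoxObjective(toy_function, dimension=3, budget=2)
first = objective([0.5, 0.5, 0.5])
second = objective([1.5, 0.5, 0.5])
print(first, second, objective.evaluations)
```

An optimizer written against this interface can only query points and read back values, which is the constraint the authors reverted to after the metadata-aware variant solved every function.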
- ARES-LSHADE is built on LSHADE (GECCO 2025 winner) with a scout-augmented mutation operator and adaptive CMA-ES, refined through ~30 LLM-driven experiments.
- Achieved 510 of 744 wins (gap < 1e-8) and machine precision on 18 of 24 GNBG functions in the official 31-run-per-function evaluation.
- Using benchmark metadata allowed the LLM to solve all 24 functions but violated the blackbox rule, revealing integrity concerns for future LLM-designed algorithm competitions.
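The win count in the bullets above follows from simple arithmetic: 24 functions times 31 runs gives 744 runs, and a run counts as a win when its gap falls below 1e-8. The sketch below shows that counting rule on made-up gap values; the `count_wins` helper and the toy data are hypothetical and are not results from the paper.

```python
WIN_THRESHOLD = 1e-8  # gap threshold quoted in the summary


def count_wins(gaps_by_function, threshold=WIN_THRESHOLD):
    """Count runs whose gap to the known optimum is below the threshold."""
    return sum(
        1
        for gaps in gaps_by_function.values()
        for gap in gaps
        if gap < threshold
    )


# Made-up gaps for two functions with three runs each (illustration only).
toy_gaps = {
    "f1": [0.0, 3e-12, 5e-9],
    "f2": [2e-3, 1e-10, 4e-1],
}
toy_wins = count_wins(toy_gaps)

# Figures from the summary: 24 functions, 31 runs each, 510 wins.
total_runs = 24 * 31
win_rate = 510 / total_runs

print(toy_wins, total_runs, round(win_rate, 3))
```

With the reported numbers, 510 wins out of 744 runs corresponds to a win rate of roughly 68.5%.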
Why It Matters
LLM-driven algorithm design can produce a competitive blackbox optimizer, and the metadata incident shows why competitions need robust benchmark integrity rules.