Relation Reasoning with LLMs in Expensive Optimization
No more retraining: RL-tuned Qwen2.5 solves EOPs 2x faster on edge devices
A new paper from Ye Lu, Bingdong Li, Aimin Zhou, and Hao Hao introduces R2SAEA, a reinforcement-trained relation-based large language model (LLM) surrogate designed for expensive optimization problems (EOPs). These black-box tasks have costly objective evaluations and no gradient access, making evaluation budgets critical. Traditional surrogate-assisted evolutionary algorithms (SAEAs) reduce evaluations but require frequent retraining, adding overhead. R2SAEA bypasses this by treating surrogate modeling as an in-context pairwise reasoning task, using an anchor-based iterative context construction strategy that slashes prompt complexity from quadratic O(n²) to linear O(n) in population size. A voting-based aggregation scheme converts predicted relations into scores for offspring selection, enabling efficient inference within evolutionary loops.
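To make the complexity reduction concrete, here is a minimal sketch of the anchor-comparison idea. The paper's actual method queries the fine-tuned LLM for each relation; here a toy numeric proxy (`predict_relation`, comparing variable sums) stands in for that call, and the function names, the anchor count, and the sampling scheme are illustrative assumptions, not the authors' implementation.

```python
import random

random.seed(0)  # reproducible toy example

# Hypothetical stand-in for the LLM relation predictor: returns +1 if
# solution a is predicted better than b, else -1. In R2SAEA this would
# be one in-context query to the fine-tuned Qwen2.5 model.
def predict_relation(a, b):
    return 1 if sum(a) < sum(b) else -1

def anchor_based_scores(population, num_anchors=3):
    """Compare each candidate against a few anchors (O(n) queries per
    anchor) instead of all O(n^2) pairs, then aggregate by voting."""
    anchors = random.sample(population, min(num_anchors, len(population)))
    scores = [0] * len(population)
    for anchor in anchors:
        for i, cand in enumerate(population):
            # One relation query per (candidate, anchor) pair.
            if predict_relation(cand, anchor) > 0:
                scores[i] += 1  # vote: candidate beats this anchor
    return scores  # higher vote count -> preferred offspring

pop = [[random.uniform(-5, 5) for _ in range(4)] for _ in range(8)]
scores = anchor_based_scores(pop)
best = pop[max(range(len(pop)), key=lambda i: scores[i])]
```

With a population of size n and k anchors, this issues k·n relation queries rather than n(n−1)/2, which is what lets the surrogate run inside every generation of the evolutionary loop.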
The team fine-tuned Qwen2.5 using Group Relative Policy Optimization (GRPO) on evolutionary trajectory data, producing a model that outperforms strong SAEA baselines and general-purpose LLMs on single- and multi-objective benchmarks. Notably, quantization allows deployment on edge devices with minimal performance loss, supporting a zero-shot surrogate paradigm that eliminates per-generation retraining. This breakthrough makes LLM-powered optimization practical for real-world engineering, finance, and design tasks where evaluation budgets are tight and computational resources limited.
- Anchor-based context construction reduces prompt complexity from O(n²) to O(n) in population size, enabling efficient inference inside the evolutionary loop
- Qwen2.5 fine-tuned with GRPO outperforms strong SAEA baselines and general-purpose LLMs on single- and multi-objective benchmarks
- Quantization enables efficient edge deployment, supporting a zero-shot surrogate paradigm without per-generation retraining
Why It Matters
LLMs can now optimize expensive real-world problems without retraining, cutting costs and enabling deployment on edge devices.