QD-LLM evolves prompt embeddings, boosting LLM output diversity by 46%
Tiny 32K-parameter embeddings steer 70B+ LLMs to explore new solutions without retraining
Large Language Models often suffer from mode collapse, generating repetitive outputs that fail to explore the space of valid solutions. To address this, researchers present QD-LLM, a parameter-efficient neuroevolution framework that evolves prompt embeddings (compact neural interfaces of only ~32K parameters) to steer frozen LLMs (70B+ parameters) within a Quality-Diversity (QD) optimization framework. This gradient-free approach enables behavioral steering without model fine-tuning. It characterizes behavior using both semantic and explicit features, backed by formal coverage bounds (Theorem 1, NMI = 0.08 ± 0.02). Co-evolutionary variation operators, including a targeted behavioral mutation based on finite-difference gradient estimation, further enhance diversity.
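To make the idea concrete, here is a minimal sketch of a MAP-Elites-style quality-diversity loop over a small embedding vector, with one random mutation and one targeted mutation driven by a finite-difference estimate. It is an illustration under stated assumptions, not the paper's implementation: `evaluate` is a hypothetical toy stand-in for "condition the frozen LLM on the embedding and score its output", and the dimension, bin count, step sizes, and all function names are invented for this example.

```python
import random

DIM = 8    # toy size; the embeddings described above have ~32K parameters
BINS = 10  # number of cells along a single 1-D behavior axis (assumption)


def evaluate(embedding):
    """Hypothetical stand-in for running the frozen LLM and scoring its output.

    Returns (quality, behavior), with quality in (0, 1] and behavior in [0, 1].
    """
    behavior = min(1.0, max(0.0, 0.5 + 0.25 * sum(embedding[:2])))
    quality = 1.0 / (1.0 + sum(x * x for x in embedding[2:]))
    return quality, behavior


def cell_of(behavior):
    """Map a behavior value in [0, 1] to an archive cell index."""
    return min(BINS - 1, int(behavior * BINS))


def gaussian_mutation(embedding, rng, sigma=0.1):
    """Undirected variation: add Gaussian noise to every coordinate."""
    return [x + rng.gauss(0.0, sigma) for x in embedding]


def targeted_mutation(embedding, target_behavior, eps=1e-2, step=4.0):
    """Directed variation: estimate d(behavior)/d(embedding) with forward
    finite differences, then step so the behavior moves toward the target."""
    _, base = evaluate(embedding)
    grad = []
    for i in range(len(embedding)):
        probe = list(embedding)
        probe[i] += eps
        _, shifted = evaluate(probe)
        grad.append((shifted - base) / eps)
    gap = target_behavior - base
    return [x + step * gap * g for x, g in zip(embedding, grad)]


def run(iterations=2000, seed=0):
    """Fill an archive that keeps the best-quality embedding per behavior cell."""
    rng = random.Random(seed)
    archive = {}  # cell index -> (quality, embedding)

    def try_insert(embedding):
        quality, behavior = evaluate(embedding)
        cell = cell_of(behavior)
        if cell not in archive or quality > archive[cell][0]:
            archive[cell] = (quality, embedding)

    try_insert([0.0] * DIM)
    for _ in range(iterations):
        _, parent = archive[rng.choice(sorted(archive))]
        empty = [c for c in range(BINS) if c not in archive]
        if empty and rng.random() < 0.5:
            # Aim at the centre of a randomly chosen unfilled cell.
            target = (rng.choice(empty) + 0.5) / BINS
            child = targeted_mutation(parent, target)
        else:
            child = gaussian_mutation(parent, rng)
        try_insert(child)
    return archive


if __name__ == "__main__":
    result = run()
    print(f"filled {len(result)} of {BINS} cells")
```

Only the small embedding vector changes in this loop; the evaluator is treated as a black box, which mirrors the frozen-model setup. The finite-difference probe costs one extra evaluation per coordinate, so at a real ~32K-parameter scale some cheaper estimate would presumably be needed; how the paper handles that is not stated here.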
On HumanEval (164 problems), MBPP, and creative writing benchmarks, QD-LLM achieves 46.4% higher coverage and 41.4% higher QD-Score than QDAIF (p < 0.001, 30 runs, Vargha-Delaney A = 0.94). The diverse archives produced by QD-LLM also show clear downstream utility: they improve test generation by 34% (uncovering more edge cases) and, when used as fine-tuning data, yield an 8.3% accuracy gain. The method is validated across open-source LLMs including Llama-3-70B and Mistral-Large, establishing prompt embedding evolution as an effective paradigm that bridges neuroevolution and modern LLMs.
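For readers unfamiliar with the metrics quoted above, the sketch below shows how they are commonly computed in the quality-diversity literature: coverage as the fraction of archive cells filled, QD-Score as the sum of the best quality in each filled cell, and the Vargha-Delaney A statistic as the probability that a run of one method beats a run of the other. These are the standard definitions and are assumed, not confirmed, to match the paper's; the function names and the numbers in the usage lines are made up for illustration.

```python
def coverage(archive, total_cells):
    """Fraction of behavior cells that hold at least one solution."""
    return len(archive) / total_cells


def qd_score(archive):
    """Sum of the best quality stored in each filled cell."""
    return sum(archive.values())


def relative_gain(new, baseline):
    """Relative improvement, e.g. 0.464 would read as '46.4% higher'."""
    return (new - baseline) / baseline


def vargha_delaney_a(xs, ys):
    """Probability that a value from xs exceeds one from ys, ties counted half.

    0.5 means no difference; values near 1.0 mean xs almost always wins.
    """
    wins = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )
    return wins / (len(xs) * len(ys))


if __name__ == "__main__":
    # Made-up toy archive: cell index -> best quality found in that cell.
    toy_archive = {0: 0.5, 3: 1.0}
    print(coverage(toy_archive, total_cells=10))
    print(qd_score(toy_archive))
    # Made-up per-run scores for two methods.
    print(vargha_delaney_a([3, 4, 5], [1, 2, 3]))
```

Read this way, an A value of 0.94 says that a randomly chosen QD-LLM run outperforms a randomly chosen QDAIF run about 94% of the time, which is a large effect by the usual thresholds for this statistic.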
- Evolves only ~32K parameters per LLM to steer generation, leaving the 70B+ model frozen and untouched.
- Achieves 46.4% higher coverage and 41.4% higher QD-Score over QDAIF across code and creative writing benchmarks.
- Downstream gains include a 34% improvement in automated test generation (uncovering more edge cases) and an 8.3% accuracy gain when the archives are used as fine-tuning data.
Why It Matters
A practical, parameter-efficient fix for LLM mode collapse, boosting diversity without expensive fine-tuning.