AP-BMM: Approximating Capability-Efficiency Pareto Sets of LLMs via Asynchronous Prior-guided Bayesian Model Merging
Asynchronous Bayesian model merging cuts optimization time in half while preserving accuracy.
A team of researchers from the University of Science and Technology of China and Anhui University has introduced AP-BMM (Asynchronous Prior-guided Bayesian Model Merging), a novel algorithm for approximating capability-efficiency Pareto sets in Large Language Models (LLMs). The method addresses two key limitations of existing model merging techniques: reliance on coarse model-level operators, and synchronous optimization that stalls when evaluation latency is uneven. AP-BMM uses a discrepancy-derived importance prior to initialize the surrogate geometry, reducing the need for black-box exploration, and runs an event-driven optimization loop based on pending-aware hypervolume improvement. This asynchronous approach lets the system navigate the high-dimensional layer-wise fusion space more efficiently, achieving 40% higher hypervolume and broader coverage of the trade-off frontier than both synchronous layer-wise baselines and representative model-level merging methods. Under a common evaluation budget, AP-BMM also cuts wall-clock time by approximately 2x, making it a practical solution for real-world deployment.
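The pending-aware acquisition is what makes the event-driven loop work: when a worker frees up, the optimizer scores new candidates against both finished and still-running evaluations, so two workers never chase the same region of the frontier. The paper's exact acquisition is not reproduced in this summary; the sketch below is one plausible instantiation for two objectives (both maximized), with pending evaluations imputed at their surrogate predicted means ("believer"-style imputation). All function and variable names here are illustrative assumptions, not the authors' API.

```python
def hypervolume_2d(points, ref):
    """Exact dominated hypervolume for 2-D maximization w.r.t. a
    reference point `ref` (a point worse than every objective vector)."""
    # Discard points that do not strictly dominate the reference.
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 reverse=True)  # sweep from largest first objective down
    hv, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:  # non-dominated point: adds a new slab to the staircase
            hv += (x - ref[0]) * (y - best_y)
            best_y = y
    return hv

def pending_aware_hvi(candidate, completed, pending_means, ref):
    """Hypervolume improvement of `candidate`, discounting regions that
    in-flight (pending) evaluations are already expected to cover.

    completed     -- observed objective vectors (capability, efficiency)
    pending_means -- surrogate posterior means for still-running evaluations
    """
    believed = completed + pending_means  # treat predictions as if observed
    base = hypervolume_2d(believed, ref)
    return max(0.0, hypervolume_2d(believed + [candidate], ref) - base)
```

With completed points (3, 1) and (1, 3) and reference (0, 0), a candidate at (2, 2) scores an improvement of 1.0; add a pending evaluation predicted at (2.5, 1.5) and the same candidate's score drops to 0.5, steering the next dispatch toward less-covered regions.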
The paper, currently available on arXiv, provides code for reproducibility and demonstrates the method's effectiveness across multiple LLM benchmarks. By focusing on layer-wise merging with asynchronous optimization, AP-BMM offers a more expressive and efficient way to navigate the trade-off between model capability (e.g., reasoning, accuracy) and efficiency (e.g., inference speed, memory usage). This is particularly relevant as organizations seek to deploy LLMs in resource-constrained environments without sacrificing performance. The approach could accelerate the development of customized models for specific tasks, enabling faster iteration and better resource allocation in AI research and production.
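Layer-wise merging, the search space AP-BMM operates over, assigns each layer its own mixing coefficients rather than a single scalar per source model, which is what makes the space both more expressive and higher-dimensional. A minimal sketch of the idea (weights flattened to lists; the dict layout and names are illustrative, not the paper's implementation):

```python
def merge_layerwise(models, alphas):
    """Merge source models with per-layer coefficients.

    models -- list of dicts mapping layer name -> flat list of weights
    alphas -- dict mapping layer name -> one coefficient per source model
              (this per-layer grid is the space the optimizer searches)
    """
    merged = {}
    for layer in models[0]:
        coeffs = alphas[layer]
        merged[layer] = [
            sum(c * m[layer][i] for c, m in zip(coeffs, models))
            for i in range(len(models[0][layer]))
        ]
    return merged
```

Model-level merging is the special case where every layer shares the same coefficient vector; freeing each layer to differ is what the surrogate and prior must then tame.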
- AP-BMM achieves 40% higher hypervolume in Pareto-set approximations compared to synchronous Bayesian baselines.
- The method reduces wall-clock time by approximately 2x through asynchronous, event-driven optimization.
- Uses a discrepancy-derived importance prior to initialize surrogate geometry, reducing reliance on black-box exploration.
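This summary does not spell out the prior's exact form. One plausible instantiation, sketched below under stated assumptions, scores each layer by how much its weights moved during fine-tuning (relative L2 discrepancy from the base model) and normalizes the scores into a distribution that could seed initial merge coefficients or warp the surrogate's per-dimension length-scales. The function name, softmax normalization, and temperature are all hypothetical choices, not the paper's.

```python
import math

def layerwise_importance_prior(base, finetuned, temperature=1.0):
    """Illustrative discrepancy-derived prior (not the paper's exact form).

    base / finetuned -- dicts mapping layer name -> flat list of weights
    Returns softmax-normalized importance scores: layers whose weights
    changed most during fine-tuning receive more prior mass.
    """
    disc = {}
    for name in base:
        num = math.sqrt(sum((f - b) ** 2
                            for b, f in zip(base[name], finetuned[name])))
        den = math.sqrt(sum(b * b for b in base[name])) + 1e-12
        disc[name] = num / den  # relative L2 change of this layer
    # Max-shifted softmax over layers for numerical stability.
    z = max(disc.values())
    exps = {n: math.exp((d - z) / temperature) for n, d in disc.items()}
    s = sum(exps.values())
    return {n: e / s for n, e in exps.items()}
```

The intuition is the same either way: layers the fine-tune barely touched need little search budget, so the optimizer can spend its evaluations where the models actually disagree.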
Why It Matters
Enables faster, more efficient LLM merging for better capability-efficiency trade-offs in production.