Compass: Optimizing Compound AI Workflows for Dynamic Adaptation
New system dynamically switches AI configurations to handle variable loads, cutting latency while maintaining accuracy.
A team of researchers has introduced Compass, a novel framework designed to solve a critical bottleneck in deploying compound AI systems. These systems, which orchestrate multiple specialized AI models and software components into cohesive workflows, often run on fixed infrastructure where simply adding more servers (horizontal scaling) isn't an option. Traditionally, such deployments use a single, static configuration optimized for either maximum accuracy or minimum latency, forcing a trade-off that fails under variable user loads. Compass tackles this by enabling dynamic adaptation, allowing a system to intelligently switch between pre-optimized configurations at runtime to balance accuracy, latency, and cost as demand fluctuates.
The framework operates in two phases. First, the COMPASS-V algorithm performs an offline search to discover multiple "Pareto-optimal" configurations—different ways to run the workflow that offer the best possible trade-offs between accuracy and speed. This search uses a finite-difference guided method that reduces the number of configurations needing evaluation by 57.5% on average compared to an exhaustive search, with efficiency gains reaching 95.3% for strict accuracy requirements. Second, at runtime, the Elastico controller monitors incoming request queues. Using policies derived from queuing theory, it automatically switches the live system to a faster, slightly less accurate configuration when traffic builds up to prevent slowdowns, and back to a high-accuracy mode when load subsides.
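To make the offline phase concrete, here is a minimal sketch of what selecting a Pareto-optimal set of configurations means. This is an illustrative dominance filter, not the COMPASS-V algorithm itself; the configuration names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    """A candidate workflow configuration (illustrative fields)."""
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better

def pareto_front(configs):
    """Keep configurations not dominated by any other: a config is
    dominated if some other config is at least as accurate AND at
    least as fast, and strictly better on one of the two axes."""
    front = []
    for c in configs:
        dominated = any(
            o.accuracy >= c.accuracy and o.latency_ms <= c.latency_ms
            and (o.accuracy > c.accuracy or o.latency_ms < c.latency_ms)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.latency_ms)

# Hypothetical candidates: "bad" is dominated by "mixed" and drops out.
candidates = [
    Config("large-models", 0.94, 820.0),
    Config("mixed",        0.91, 430.0),
    Config("small-models", 0.87, 190.0),
    Config("bad",          0.85, 500.0),
]
print([c.name for c in pareto_front(candidates)])
```

COMPASS-V's contribution is avoiding the exhaustive evaluation this naive filter requires: its guided search prunes most candidates before they are ever profiled.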
In testing across two compound AI workflows, this dynamic approach proved highly effective. It achieved between 90% and 98% compliance with Service Level Objectives (SLOs) for latency under changing load patterns. This represents a 71.6% improvement over using a static, high-accuracy configuration. Notably, it also managed to improve accuracy by 3-5% over static configurations optimized purely for speed, demonstrating that smart adaptation can enhance both performance and reliability. The work, accepted at the IEEE CCGrid 2026 conference, provides a crucial tool for making complex, multi-model AI applications more robust and cost-effective in production.
Key Findings
- COMPASS-V algorithm reduces configuration search evaluations by 57.5% on average using finite-difference guided search.
- Runtime Elastico controller improves SLO compliance by 71.6% over static baselines by dynamically switching configurations based on queue depth.
- System enables a 3-5% accuracy improvement over static fast configurations while maintaining high performance under variable load.
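The runtime idea behind this queue-based switching can be sketched as a simple hysteresis controller. The thresholds, configuration names, and trace below are hypothetical; Compass derives its actual policy from queuing theory rather than fixed cutoffs.

```python
# Hypothetical hysteresis controller: move to a faster configuration
# when the request queue grows, and restore accuracy when it drains.
FAST, ACCURATE = "fast-config", "accurate-config"
SWITCH_DOWN_DEPTH = 50  # queue depth that triggers the fast config
SWITCH_UP_DEPTH = 10    # queue depth at which accuracy is restored

def next_config(current: str, queue_depth: int) -> str:
    if current == ACCURATE and queue_depth > SWITCH_DOWN_DEPTH:
        return FAST      # trade a little accuracy for latency headroom
    if current == FAST and queue_depth < SWITCH_UP_DEPTH:
        return ACCURATE  # load subsided: restore high accuracy
    return current       # inside the hysteresis band: hold steady

# Sampled queue depths over time (illustrative).
trace = [5, 30, 80, 60, 8, 4]
config = ACCURATE
for depth in trace:
    config = next_config(config, depth)
    print(depth, config)
```

The gap between the two thresholds prevents rapid flip-flopping when the queue hovers near a single cutoff, which is one reason controllers of this kind use hysteresis rather than a lone threshold.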
Why It Matters
Enables reliable, cost-effective deployment of complex AI applications on fixed infrastructure, crucial for enterprise adoption.