Hestia: Hyperthread-Level Scheduling for Cloud Microservices with Interference-Aware Attention
A new scheduling framework, built on an analysis of 32,408 microservice instances, uses self-attention to predict CPU contention at the hyperthread level.
A research team from Shanghai Jiao Tong University has introduced Hestia, a novel AI-powered scheduling framework that tackles the critical challenge of performance interference in cloud microservices. Accepted for publication at DAC 2026, Hestia operates at the hyperthread level, addressing the inefficiencies of existing schedulers that rely on coarse core-level profiling. The system was developed after analyzing massive production traces encompassing 32,408 microservice instances across 3,132 servers, identifying two dominant contention patterns: sharing-core (SC) and sharing-socket (SS). This granular analysis revealed a strong asymmetry in how these patterns affect performance, a nuance previous methods missed.
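The article does not spell out how SC and SS are defined, but the naming suggests sharing-core means two instances pinned to sibling hyperthreads of one physical core, while sharing-socket means instances on different cores of the same socket. A minimal sketch under that assumed reading (the topology fields and function name are illustrative, not taken from the paper):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    """Where a microservice instance's hyperthread is pinned (illustrative fields)."""
    socket: int   # physical CPU socket
    core: int     # physical core within the socket
    thread: int   # hyperthread (SMT sibling) within the core

def contention_pattern(a: Placement, b: Placement) -> str:
    """Classify how two co-located instances can interfere.

    Assumed reading of the paper's terms:
      SC (sharing-core):   sibling hyperthreads of the same physical core.
      SS (sharing-socket): different cores on the same socket (shared LLC,
                           memory bandwidth), typically a milder effect than SC.
    """
    if a.socket == b.socket and a.core == b.core:
        return "SC"
    if a.socket == b.socket:
        return "SS"
    return "none"

# Example: two instances on sibling hyperthreads of core 3, socket 0 -> "SC"
print(contention_pattern(Placement(0, 3, 0), Placement(0, 3, 1)))
```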
Hestia's core innovation is a self-attention-based CPU usage predictor that models both SC/SS contention and underlying hardware heterogeneity. This allows it to estimate pairwise contention risks between microservices and make intelligent placement decisions. The framework was evaluated through large-scale simulation and a real production deployment, where it demonstrated remarkable results: reducing the 95th-percentile service latency by up to 80% and lowering overall CPU consumption by 2.3% under identical workloads. It surpassed five state-of-the-art schedulers by up to 30.65% across diverse contention scenarios. This represents a significant leap in cloud resource efficiency, enabling providers to pack more services onto servers without sacrificing the latency guarantees critical for user-facing applications.
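The article does not detail Hestia's model architecture, so the following is only a rough illustration of how a self-attention CPU usage predictor over co-located instances might be structured. It is a minimal PyTorch sketch: the feature layout, dimensions, and the per-instance usage head are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class ContentionAwareUsagePredictor(nn.Module):
    """Toy self-attention model: each token is one co-located instance.

    Assumed per-instance features: solo CPU usage statistics, indicators for
    its SC/SS relation to the other instances, and a hardware-platform
    encoding to capture heterogeneity.
    """
    def __init__(self, feat_dim: int = 16, model_dim: int = 64, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, model_dim)
        self.attn = nn.MultiheadAttention(model_dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(model_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_instances, feat_dim) -- one row per instance sharing a core/socket
        h = self.embed(x)
        h, _ = self.attn(h, h, h)          # each instance attends to its co-located neighbors
        return self.head(h).squeeze(-1)    # predicted CPU usage per instance under contention

# A scheduler could score a candidate placement by the predicted usage inflation
# relative to running alone, then pick the hyperthread with the lowest risk.
model = ContentionAwareUsagePredictor()
features = torch.randn(1, 5, 16)           # 5 co-located instances, random stand-in features
predicted_usage = model(features)
print(predicted_usage.shape)                # torch.Size([1, 5])
```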
- Reduces 95th-percentile service latency by up to 80% through hyperthread-level placement
- Lowers overall CPU consumption by 2.3% by modeling sharing-core and sharing-socket contention
- Outperforms five state-of-the-art schedulers by up to 30.65% in diverse scenarios
Why It Matters
Enables cloud providers to dramatically improve server utilization and performance for latency-sensitive applications like real-time APIs.