New EMA system cuts ML adaptation costs by up to 42% for dynamic systems
Reduces GPU training time by 14.9–42.4% while boosting network throughput by up to 31.3%
Machine learning is increasingly used to optimize system performance in tasks like resource management and network simulation. These systems, however, often run in heterogeneous, long-running, and dynamic environments where input conditions and objectives shift over time. Traditional approaches respond with costly retraining and extensive data collection, so models degrade and adapt slowly when conditions change.
A new paper by Daiyang Yu, Xinyu Chen, Yihan Zhang, Yan Liang, Yaqi Qiao, and Fan Lai introduces EMA (Efficient Model Adaptation), a system-driven, data-centric framework aimed at these problems. EMA uses state transformers to map a new environment's input state to similar states seen in previous environments, so a model can warm-start adaptation instead of retraining in full. It also reduces the cost of data labeling by prioritizing high-utility data, balancing the tradeoff between training cost and labeling cost.
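The warm-start idea can be illustrated with a minimal sketch. This is not EMA's state transformer, whose design is not described here; it swaps in a plain nearest-neighbor lookup as a stand-in. The environment names, state vectors, checkpoint weights, and function names are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_known_state(new_state, known_states):
    """Return the name of the stored environment whose state is closest.

    Stand-in for a state transformer: a nearest-neighbor match,
    NOT the mapping used in the paper.
    """
    return min(known_states, key=lambda name: euclidean(new_state, known_states[name]))

def warm_start_weights(new_state, known_states, checkpoints):
    """Pick initial weights from the most similar previously seen environment."""
    match = nearest_known_state(new_state, known_states)
    # Copy so that fine-tuning does not overwrite the stored checkpoint.
    return match, list(checkpoints[match])

# Hypothetical state summaries: [round-trip time in ms, packet loss rate].
known_states = {
    "low_latency_link": [10.0, 0.01],
    "congested_link": [120.0, 0.08],
}
# Hypothetical model weights saved after training in each environment.
checkpoints = {
    "low_latency_link": [0.2, -0.1],
    "congested_link": [0.9, 0.4],
}

match, weights = warm_start_weights([110.0, 0.07], known_states, checkpoints)
```

Here the new environment resembles the congested link, so adaptation would begin from that checkpoint rather than from random initialization. In practice the features would need to be normalized first, since the raw distance above is dominated by the larger-scale feature.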
Evaluated on eight representative learning-based systems, EMA reduces adaptation costs (e.g., GPU training time) by 14.9–42.4% while improving system performance (e.g., network throughput) by 6.9–31.3%. The paper is set to appear at SIGCOMM 2026 and makes the case that ML systems can adapt in practice without massive overhead. For professionals deploying ML in networked or dynamic environments, EMA offers a concrete method to keep models effective as conditions evolve, potentially reducing operational costs and improving service quality.
- EMA is the first model adaptation system designed for learning-based systems in dynamic, long-running environments.
- Introduces state transformers to align input states across environments, enabling warm-start adaptation instead of full retraining.
- Prioritizes high-utility data for labeling to balance training and labeling costs.
- Across eight representative learning-based systems, reduces adaptation costs (e.g., GPU training time) by 14.9–42.4% and improves system performance (e.g., network throughput) by 6.9–31.3%.
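As a rough illustration of prioritizing high-utility data under a labeling budget, the sketch below greedily picks samples by utility per unit of labeling cost. The utility scores, costs, budget, and function name are hypothetical, and the greedy rule is an assumption made for illustration; how EMA actually scores and selects data is not specified here.

```python
def select_for_labeling(samples, budget):
    """Greedily choose samples to label within a cost budget.

    samples: list of (sample_id, utility, label_cost) tuples, label_cost > 0.
    Returns the chosen ids and the total cost spent.
    """
    # Rank by utility gained per unit of labeling cost, best first.
    ranked = sorted(samples, key=lambda s: s[1] / s[2], reverse=True)
    chosen, spent = [], 0.0
    for sample_id, utility, cost in ranked:
        if spent + cost <= budget:
            chosen.append(sample_id)
            spent += cost
    return chosen, spent

# Hypothetical candidates: (id, estimated utility, labeling cost).
samples = [("a", 0.9, 3.0), ("b", 0.5, 1.0), ("c", 0.2, 2.0), ("d", 0.8, 2.0)]
chosen, spent = select_for_labeling(samples, budget=4.0)
```

With these numbers, sample "a" has the highest raw utility but is skipped because cheaper samples deliver more utility per unit of cost within the budget.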
Why It Matters
Makes ML in networked systems practical by slashing adaptation costs and boosting performance in shifting environments.