Research & Papers

FedEMA-Distill: Exponential Moving Average Guided Knowledge Distillation for Robust Federated Learning

New method improves AI training on private data while cutting communication costs by 10x and resisting attacks.

Deep Dive

A research team has introduced FedEMA-Distill, a server-side approach that improves federated learning (FL) performance while addressing two of its most persistent challenges: non-IID data distributions and adversarial clients. The method combines an exponential moving average (EMA) of the global model with ensemble knowledge distillation: instead of full model weights, clients upload only compressed prediction logits computed on a small public proxy dataset. Because aggregation happens in logit space, the design stays compatible with existing client-side implementations and supports heterogeneous model architectures across devices, a crucial feature for real-world deployments where different devices may run different models.
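
To make the pipeline concrete, here is a minimal PyTorch sketch of one server round. The article only states that the server distills aggregated client logits from a public proxy set into an EMA-smoothed global model; everything else here is an assumption for illustration, including the temperature-scaled KL distillation loss, the EMA decay value, and all names (server_round, proxy_loader, T, ema_decay), which are not specifics from the paper.

```python
# Minimal sketch of one FedEMA-Distill server round (assumed details:
# KL-based distillation with temperature T, EMA decay ema_decay, and
# illustrative function/parameter names, none of which are from the paper).
import torch
import torch.nn.functional as F

def server_round(global_model, ema_model, client_logits, proxy_loader,
                 optimizer, T=2.0, ema_decay=0.99, device="cpu"):
    """client_logits: tensor of shape (num_clients, num_proxy_samples, num_classes),
    each client's predictions on the shared public proxy set."""
    # 1) Aggregate client logits coordinate-wise (median here; the robust
    #    aggregation options are expanded in the sketch further below).
    teacher_logits = client_logits.median(dim=0).values

    # 2) Distill the aggregated "ensemble teacher" into the global model
    #    using only the public proxy data (labels are not needed).
    #    Assumes proxy_loader iterates in a fixed order matching the logits.
    global_model.train()
    offset = 0
    for x, _ in proxy_loader:
        x = x.to(device)
        t = teacher_logits[offset:offset + x.size(0)].to(device)
        offset += x.size(0)
        student = global_model(x)
        loss = F.kl_div(F.log_softmax(student / T, dim=1),
                        F.softmax(t / T, dim=1),
                        reduction="batchmean") * T * T
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # 3) Temporal smoothing: update the EMA copy of the global model.
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), global_model.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)
    return ema_model
```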

The reported results show substantial improvements across multiple benchmarks. On CIFAR-10 and CIFAR-100 under a challenging Dirichlet(0.1) label skew, FedEMA-Distill achieves accuracy gains of up to 5% and 6%, respectively, over baseline methods, while reaching target accuracy in 30-35% fewer communication rounds. It also reduces per-round client uplink payloads to just 0.09-0.46 MB, roughly an order of magnitude less than transmitting full model weights. At the server, uploaded logits are combined with coordinate-wise median or trimmed-mean aggregation, which defends against up to 10-20% Byzantine (malicious) clients while keeping predictions well calibrated under attack. Together, the EMA's temporal smoothing and the logits-only aggregation yield a deployment-friendly pipeline that is compatible with secure aggregation and differential privacy frameworks, positioning FedEMA-Distill as a practical option for privacy-preserving AI training at scale.
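
The robust aggregation step can be pictured as follows. This is a sketch under assumptions: the trimming fraction beta and the function names are hypothetical; the article only says the server applies a coordinate-wise median or trimmed mean to the uploaded logits.

```python
# Illustrative coordinate-wise robust aggregation of client logits
# (the trim fraction beta and function names are assumed, not from the paper).
import torch

def trimmed_mean_logits(client_logits: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """client_logits: (num_clients, num_proxy_samples, num_classes).
    Drops the beta fraction of highest and lowest values at each coordinate,
    then averages the rest, so roughly a beta fraction of Byzantine clients
    cannot pull any single logit arbitrarily far."""
    n = client_logits.size(0)
    k = int(beta * n)                         # clients trimmed at each end
    sorted_vals, _ = client_logits.sort(dim=0)
    kept = sorted_vals[k:n - k] if n - 2 * k > 0 else sorted_vals
    return kept.mean(dim=0)

def median_logits(client_logits: torch.Tensor) -> torch.Tensor:
    """Coordinate-wise median: each (sample, class) entry is the median
    across clients, which tolerates a large fraction of outliers."""
    return client_logits.median(dim=0).values
```

As a usage note, with 100 participating clients and beta = 0.1, the ten largest and ten smallest values at each (sample, class) coordinate are discarded before averaging, which lines up with the 10-20% Byzantine tolerance described above.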

Key Points
  • Improves accuracy by up to 6% on CIFAR-100 under non-IID data conditions
  • Reduces client upload payloads by 10x to just 0.09-0.46 MB per communication round
  • Resists attacks from up to 20% malicious clients while supporting heterogeneous model architectures

Why It Matters

Enables more efficient, secure, and accurate AI training across devices without compromising user privacy or requiring software updates.