Sample Selection Using Multi-Task Autoencoders in Federated Learning with Non-IID Data
Multi-task autoencoders filter noisy data in federated learning, improving accuracy by up to 7.02%.
Federated learning enables collaborative model training across devices while preserving data privacy, but its performance suffers from noisy, redundant, or malicious samples. In a new paper published in Engineering Science and Technology, an International Journal, researchers Emre Ardıç and Yakup Genç introduce sample selection methods that use multi-task autoencoders to address this issue. Their approach estimates each sample's contribution through loss and feature analysis, then filters noisy samples on client devices using unsupervised outlier detection techniques, namely one-class support vector machine (OCSVM), isolation forest (IF), and an adaptive loss threshold (AT), all managed by a central server. They also propose a multi-class deep support vector data description (SVDD) loss to enhance feature-based selection.
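The core idea can be sketched in a toy example: each client scores its local samples by autoencoder reconstruction loss, and an unsupervised outlier detector flags high-loss samples to exclude from local training. The sketch below uses isolation forest, one of the detectors named in the paper; the loss values, sample counts, and contamination setting are synthetic illustrations, not the authors' actual setup.

```python
# Toy sketch of loss-based sample selection on one client.
# Assumption: clean samples cluster at low reconstruction loss,
# while noisy/corrupted samples show inflated loss.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
clean = rng.normal(0.10, 0.02, size=80)   # typical reconstruction losses
noisy = rng.normal(0.60, 0.10, size=20)   # corrupted or mislabeled samples
losses = np.concatenate([clean, noisy]).reshape(-1, 1)

# Flag roughly 20% of samples as outliers; keep the rest for local training.
detector = IsolationForest(contamination=0.2, random_state=0).fit(losses)
keep = detector.predict(losses) == 1      # +1 = inlier, -1 = outlier

print(f"kept {keep.sum()}/{len(losses)} samples; "
      f"clean kept: {keep[:80].sum()}/80, noisy kept: {keep[80:].sum()}/20")
```

In the paper's full setting the detectors are managed server-side and also operate on autoencoder features, not just losses; this fragment only illustrates why an unsupervised outlier detector can separate noisy from clean samples when their loss distributions differ.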
Validated on CIFAR10 and MNIST datasets across varying client counts, non-IID distributions, and noise levels up to 40%, the methods showed significant accuracy improvements. Loss-based selection with OCSVM achieved gains of up to 7.02% on CIFAR10, while adaptive threshold methods yielded gains of up to 1.83% on MNIST. The federated SVDD loss further improved feature-based selection by up to 0.99% on CIFAR10 with OCSVM. This work offers a practical framework for improving model robustness in decentralized environments, with code available on GitHub.
- Multi-task autoencoders filter noisy samples via loss and feature analysis
- Accuracy gains up to 7.02% on CIFAR10 using loss-based selection with OCSVM
- Validated across varying client counts and noise levels up to 40%
Why It Matters
This method improves federated learning robustness, enabling more reliable AI training on decentralized, noisy data.