FedSQ: Optimized Weight Averaging via Fixed Gating
New technique freezes structural knowledge to reduce training rounds while preserving accuracy in non-i.i.d. data scenarios.
A research team from Spanish universities has introduced FedSQ (Federated Structural-Quantitative learning), a novel approach to federated learning (FL) that addresses the persistent challenge of statistical heterogeneity. The method is designed for cross-silo deployments, where FL is typically warm-started from strong pretrained backbones such as ImageNet-1K models. FedSQ's innovation lies in its DualCopy architecture, which separates structural knowledge (ReLU-like gating regimes) from quantitative knowledge (parameter values), based on evidence that structural knowledge stabilizes earlier in training.
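The article describes the mechanism only at a high level, but the gating idea can be sketched in a few lines of PyTorch. In the toy `DualCopyMLP` below, a frozen structural copy of the pretrained layers decides each layer's binary ReLU mask, and the trainable quantitative copy reuses those masks in place of its own ReLU; the class name, the layer-list interface, and the ungated classifier head are illustrative assumptions, not the authors' code.

```python
import copy

import torch
import torch.nn as nn


class DualCopyMLP(nn.Module):
    """Toy DualCopy module: a frozen structural copy fixes the ReLU
    gating regimes; only the quantitative copy is ever trained."""

    def __init__(self, pretrained_layers):
        super().__init__()
        # Structural copy: frozen snapshot of the pretrained layers.
        self.structural = nn.ModuleList(copy.deepcopy(l) for l in pretrained_layers)
        for p in self.structural.parameters():
            p.requires_grad_(False)
        # Quantitative copy: the only parameters optimized and averaged.
        self.quantitative = nn.ModuleList(copy.deepcopy(l) for l in pretrained_layers)

    def forward(self, x):
        s, q = x, x
        for s_layer, q_layer in zip(self.structural[:-1], self.quantitative[:-1]):
            with torch.no_grad():
                s = s_layer(s)
                gate = (s > 0).to(x.dtype)  # fixed binary gating mask
                s = gate * s                # frozen path applies its own ReLU
            # The trainable path reuses the frozen gate in place of its own
            # ReLU, so learning becomes a within-regime affine refinement.
            q = gate * q_layer(q)
        return self.quantitative[-1](q)     # classifier head left ungated (assumption)
```

For example, `DualCopyMLP([nn.Linear(784, 256), nn.Linear(256, 10)])` would wrap a pretrained two-layer head; after construction, only the `quantitative` parameters carry gradients.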
FedSQ works by freezing a structural copy of the pretrained model, which supplies fixed binary gating masks throughout federated fine-tuning, while only a quantitative copy undergoes local optimization and aggregation across rounds. This reduces learning to within-regime affine refinements, which significantly stabilizes aggregation under non-i.i.d. data distributions. In experiments on two convolutional neural network backbones with both i.i.d. and Dirichlet splits, FedSQ improves robustness and can reduce the number of training rounds needed to reach best validation performance relative to standard baselines, while preserving accuracy in transfer learning scenarios.
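Under those assumptions, one communication round might look like the minimal sketch below: plain FedAvg with equally weighted clients, where only the quantitative tensors of the hypothetical `DualCopyMLP` above leave each client, and the shared frozen masks keep every update inside the same gating regimes. `local_update` and `fedsq_round` are illustrative names, not the paper's API.

```python
import copy

import torch


def local_update(model, loader, epochs=1, lr=1e-2):
    """One client's local fine-tuning: only the quantitative parameters
    require gradients, so the gating structure never moves locally."""
    opt = torch.optim.SGD(
        [p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    # Only the quantitative tensors are communicated back to the server.
    return {k: v.detach().clone()
            for k, v in model.state_dict().items()
            if k.startswith("quantitative")}


def fedsq_round(global_model, client_loaders):
    """One federated round: FedAvg over the quantitative copies only
    (equal client weights assumed here for brevity)."""
    updates = []
    for loader in client_loaders:
        local_model = copy.deepcopy(global_model)
        updates.append(local_update(local_model, loader))
    averaged = {k: torch.stack([u[k] for u in updates]).mean(dim=0)
                for k in updates[0]}
    # strict=False: the frozen structural tensors are left untouched.
    global_model.load_state_dict(averaged, strict=False)
    return global_model
```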
The technique addresses a critical pain point in federated learning: client drift caused by statistical heterogeneity across different organizations' data. By fixing the gating structure early, FedSQ prevents the instability that typically arises from naive weight averaging when clients have significantly different data distributions. This makes the method particularly valuable for real-world applications where data privacy concerns prevent centralized training but where organizations still need to collaboratively improve models on diverse, non-uniform datasets.
- Uses DualCopy architecture separating structural and quantitative knowledge, freezing structural copy to create fixed gating masks
- Reduces the number of rounds needed to reach best validation performance while maintaining accuracy in transfer learning with pretrained backbones
- Improves stability under heterogeneous data partitions (non-i.i.d.) common in cross-silo federated learning deployments (see the Dirichlet-split sketch below)
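The article does not give the exact partition settings, but the Dirichlet label-skew splits it refers to are a standard way to simulate this heterogeneity. Below is a minimal NumPy sketch, where `alpha` is the usual concentration knob (smaller values produce more skewed clients); the function name and default values are assumptions for illustration.

```python
import numpy as np


def dirichlet_partition(labels, n_clients, alpha=0.3, seed=0):
    """Label-skew split: each class's samples are divided among clients
    with proportions drawn from Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Per-class client proportions; smaller alpha -> more skew.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, shard in zip(client_indices, np.split(idx, cuts)):
            client.extend(shard.tolist())
    return client_indices
```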
Why It Matters
Enables more stable and efficient collaborative AI training across organizations without sharing sensitive data, crucial for healthcare and finance.