Research & Papers

Data-Free Contribution Estimation in Federated Learning using Gradient von Neumann Entropy

New technique uses gradient entropy to fairly reward clients without accessing their data

Deep Dive

Federated learning trains models across decentralized clients without centralizing raw data, but fairly estimating each client's contribution remains a challenge. Traditional methods rely on server-side validation data or self-reported metrics, both of which risk privacy leakage or manipulation. A new paper from MBZUAI researchers introduces a data-free alternative: measuring the matrix von Neumann (spectral) entropy of the final-layer gradient updates. This entropy signal captures the diversity of information contributed by each client, serving as an unbiased proxy for model improvement without requiring access to client data or metadata.
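As a concrete illustration, the matrix von Neumann entropy of a gradient update can be computed from its singular values: the squared singular values, normalized to sum to one, are the eigenvalues of the trace-normalized density matrix G Gᵀ / tr(G Gᵀ), and the entropy is the Shannon entropy of that spectrum. The sketch below is a minimal numpy version under that standard definition; the paper's exact normalization and layer choice may differ.

```python
import numpy as np

def von_neumann_entropy(grad: np.ndarray, eps: float = 1e-12) -> float:
    """Spectral (von Neumann) entropy of a gradient matrix.

    Illustrative sketch: treats G G^T / tr(G G^T) as a density matrix
    and returns the Shannon entropy of its eigenvalue spectrum.
    """
    # Singular values of the (e.g. final-layer) gradient update
    s = np.linalg.svd(grad, compute_uv=False)
    # Squared singular values normalized into a probability distribution
    p = s**2 / (np.sum(s**2) + eps)
    # Drop numerically zero eigenvalues before taking the log
    p = p[p > eps]
    return float(-np.sum(p * np.log(p)))
```

Intuitively, a near rank-one update (all clients pushing in one direction) has entropy near zero, while an update spread evenly across many directions approaches log of the rank, so higher entropy signals more diverse information in the client's update.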

The authors instantiate two practical schemes: SpectralFed uses normalized entropy as aggregation weights, while SpectralFuse fuses entropy with class-specific alignment via a rank-adaptive Kalman filter for per-round stability. Across CIFAR-10/100, FEMNIST, and FedISIC benchmarks under diverse non-IID data distributions, entropy-derived scores correlate strongly with standalone client accuracy. Compared to existing data-free baselines, spectral entropy proves to be a robust indicator of contribution, enabling fairer federated aggregation and reward distribution without sacrificing privacy. The paper will appear at the CVPR 2026 FedVision Workshop.
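A SpectralFed-style aggregation step can be sketched as follows: each client's per-round entropy score is normalized across clients and used as its weight in the server-side average. The function name and simple sum-normalization here are illustrative assumptions, not the paper's exact scheme (which, for SpectralFuse, additionally fuses class alignment through a Kalman filter).

```python
import numpy as np

def entropy_weighted_average(updates, entropies):
    """Aggregate client updates with entropy-proportional weights.

    Sketch of entropy-as-weight aggregation; normalization is assumed,
    not taken from the paper.
    """
    w = np.asarray(entropies, dtype=float)
    w = w / w.sum()  # normalize entropy scores into aggregation weights
    # Weighted sum of the clients' parameter updates
    return sum(wi * ui for wi, ui in zip(w, updates))
```

For example, with two clients whose entropy scores are 1.0 and 3.0, the second client's update receives three times the weight of the first in the aggregated model.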

Key Points
  • SpectralFed uses normalized von Neumann entropy of gradient updates as aggregation weights for fairer federated learning
  • SpectralFuse combines entropy with class-specific alignment via a rank-adaptive Kalman filter for stable per-round contribution estimation
  • On CIFAR-10/100, FEMNIST, and FedISIC benchmarks, entropy scores correlate highly with standalone client accuracy under non-IID data, with no validation data required

Why It Matters

Enables fair, privacy-preserving client contribution estimation in federated learning without server-side data or self-reports