Research & Papers

Secure and Privacy-Preserving Vertical Federated Learning

New framework splits aggregator role across servers, drastically cutting MPC computation and communication overhead.

Deep Dive

A team of researchers led by Shan Jin has published a novel framework titled 'Secure and Privacy-Preserving Vertical Federated Learning' on arXiv. The work addresses a critical challenge in vertical federated learning (VFL), where different parties hold different features of the same data samples, and labels are not universally shared. The core innovation is an end-to-end privacy-preserving system that instantiates three efficient protocols for various deployment scenarios, covering both input and output privacy. This is achieved by fundamentally redesigning the FL architecture: instead of a single aggregator, the role is distributed among multiple servers. These servers then run secure multiparty computation (MPC) protocols to perform model and feature aggregation, with differential privacy (DP) applied to the final released model to provide a robust privacy guarantee.
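The multi-server aggregation idea can be illustrated with additive secret sharing, a standard MPC building block. The sketch below is not the paper's actual protocol, just a minimal example of the principle: each client splits its value into two random shares, one per non-colluding server, so that each server sums only shares and the combined server totals reveal nothing beyond the aggregate.

```python
import random

PRIME = 2**61 - 1  # field modulus for additive secret sharing

def share(value: int) -> tuple[int, int]:
    """Split a non-negative integer into two additive shares mod PRIME."""
    r = random.randrange(PRIME)
    return r, (value - r) % PRIME

def aggregate(client_values: list[int]) -> int:
    """Each of two servers sums the shares it receives; combining the
    two server-side sums reveals only the total, not any single input."""
    sums = [0, 0]
    for v in client_values:
        s0, s1 = share(v)
        sums[0] = (sums[0] + s0) % PRIME
        sums[1] = (sums[1] + s1) % PRIME
    return (sums[0] + sums[1]) % PRIME

# The reconstructed total equals the plain sum, yet neither server
# alone learns anything about an individual client's contribution.
print(aggregate([3, 5, 7]))
```

Real deployments share fixed-point encodings of model updates rather than small integers, but the privacy argument is the same: each server's view is uniformly random on its own.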

Crucially, the researchers move beyond the naive solution of having clients delegate all training to run entirely within MPC between servers, an approach that is notoriously heavy in computation and communication. Their optimized framework supports both purely global and global-local model updates in a privacy-preserving manner, and is designed to drastically reduce the MPC overhead. The experimental results presented in the paper demonstrate the protocols' effectiveness, showing a practical step forward for secure, multi-party AI training on vertically partitioned data without exposing sensitive raw information. This work sits at the intersection of cryptography (cs.CR), artificial intelligence (cs.AI), and distributed computing (cs.DC), pushing the boundary of what's possible in collaborative yet confidential machine learning.
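To see what "vertically partitioned" means in practice, consider a hypothetical linear model whose score decomposes across feature owners. The parties, feature splits, and weights below are invented for illustration: each party computes a partial score over its own columns of the same samples, and only those partial scores need to be aggregated.

```python
# Two parties hold disjoint feature columns of the SAME two samples.
def partial_score(features: list[list[float]],
                  weights: list[float]) -> list[float]:
    """Per-sample dot product over one party's feature columns."""
    return [sum(f * w for f, w in zip(row, weights)) for row in features]

a_feats = [[1.0, 2.0], [0.5, 1.5]]   # party A: columns 0-1
b_feats = [[3.0, 4.0], [2.5, 3.5]]   # party B: columns 2-3
a_w, b_w = [0.1, 0.2], [0.3, 0.4]    # each party's slice of the weights

# The aggregator only ever sees per-party partial scores, never raw
# features; summing them recovers the full linear model's output.
scores = [pa + pb for pa, pb in zip(partial_score(a_feats, a_w),
                                    partial_score(b_feats, b_w))]
print(scores)
```

In the paper's setting this aggregation step is itself performed inside MPC across the servers, so even the partial scores stay hidden.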

Key Points
  • Distributes the FL aggregator role across multiple servers running MPC protocols for secure aggregation.
  • Applies differential privacy to the final model, ensuring output privacy alongside input protection.
  • Optimized design drastically reduces MPC computation/communication vs. naive approaches, supporting both global and global-local updates.

Why It Matters

Enables businesses like banks and retailers to collaboratively train powerful AI models on combined customer data without sharing raw, sensitive information.