SplitFT: An Adaptive Federated Split Learning System for LLM Fine-Tuning
A new system adaptively assigns each client its own cut layer, cutting communication overhead by up to 60%.
SplitFT, developed by Yimeng Shan, Zhaorui Zhang, Sheng Di, and colleagues, tackles two core problems in federated fine-tuning of large language models: device heterogeneity and communication cost. Unlike prior systems that force a single fixed cut layer on all clients, SplitFT selects a cut layer per client based on that client's GPU memory, network bandwidth, and data characteristics: a client with a weaker GPU offloads more layers to the server, while a powerful client keeps more layers local. The system also reduces the LoRA rank at the cut layer, shrinking the data exchanged between client and server by up to 60% in the paper's experiments.
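The per-client cut-layer idea can be illustrated with a greedy heuristic: walk down the model and stop when the client's memory budget or a per-step compute deadline would be exceeded. This is our own sketch, not the paper's algorithm; the function name, cost inputs, and stopping rule are all assumptions.

```python
# Hypothetical sketch of per-client cut-layer selection (not SplitFT's
# actual algorithm): keep adding layers to the client side until its GPU
# memory budget or a per-step compute deadline would be exceeded.
def select_cut_layer(layer_mem_mb, gpu_mem_mb,
                     layer_flops, client_flops_per_s, deadline_s):
    """Return the cut index: layers [0, cut) run on the client,
    the rest are offloaded to the server."""
    cut = 0
    mem_used = 0.0
    step_time = 0.0
    for mem, flops in zip(layer_mem_mb, layer_flops):
        mem_used += mem
        step_time += flops / client_flops_per_s
        if mem_used > gpu_mem_mb or step_time > deadline_s:
            break  # this layer no longer fits on the client
        cut += 1
    return cut

# A weak client (450 MB free) keeps only 4 of 10 layers local;
# a strong one (2 GB free) keeps all 10.
weak = select_cut_layer([100] * 10, 450, [1e9] * 10, 1e11, 1.0)
strong = select_cut_layer([100] * 10, 2000, [1e9] * 10, 1e11, 1.0)
```

A real system would fold network bandwidth into the stopping rule as well, since the activations at the cut layer must cross the client-server link every step.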
The team also introduces a length-based Dirichlet distribution to simulate realistic non-IID data splits across clients, addressing a gap in prior benchmarks. Evaluated on models such as LLaMA-2-7B and Mistral-7B, SplitFT fine-tunes up to 1.5x faster than the strongest baseline (federated split learning with fixed cut layers) while matching or improving accuracy on GLUE, SQuAD, and other benchmarks. The paper is available on arXiv (2604.26388) and represents a practical step toward making LLM fine-tuning feasible for resource-constrained organizations, such as hospitals or banks that cannot share raw data.
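One plausible reading of a length-based Dirichlet split is: bucket examples by sequence length, then draw each client's share of every bucket from a Dirichlet distribution, so clients end up with skewed length profiles. The sketch below is our interpretation under that assumption; the function name, bucketing scheme, and parameters are not from the paper.

```python
import numpy as np

# Hedged sketch of a length-based Dirichlet split (our interpretation,
# not the paper's exact procedure): quantile-bucket examples by sequence
# length, then allocate each bucket across clients with Dirichlet(alpha)
# proportions, yielding non-IID length distributions per client.
def length_dirichlet_split(lengths, num_clients, alpha=0.5,
                           num_buckets=4, seed=0):
    rng = np.random.default_rng(seed)
    lengths = np.asarray(lengths)
    # Quantile edges so each bucket holds a similar number of examples.
    edges = np.quantile(lengths, np.linspace(0.0, 1.0, num_buckets + 1))
    bucket_ids = np.clip(np.searchsorted(edges, lengths, side="right") - 1,
                         0, num_buckets - 1)
    client_indices = [[] for _ in range(num_clients)]
    for b in range(num_buckets):
        idx = np.flatnonzero(bucket_ids == b)
        rng.shuffle(idx)
        # Dirichlet proportions decide how much of this bucket each client gets.
        props = rng.dirichlet([alpha] * num_clients)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for c, part in enumerate(np.split(idx, cuts)):
            client_indices[c].extend(part.tolist())
    return client_indices

parts = length_dirichlet_split(list(range(100)), num_clients=4, seed=1)
```

Smaller `alpha` values make the per-client length skew more extreme, which is the usual knob for controlling non-IID severity in federated benchmarks.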
- SplitFT adaptively selects cut layers per client based on compute resources and model performance, addressing device heterogeneity.
- Reducing LoRA rank at the cut layer cuts communication overhead by up to 60% without sacrificing accuracy.
- New length-based Dirichlet data split better simulates real-world non-IID distributions, improving benchmark realism.
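To see why lowering the LoRA rank at the cut layer saves bandwidth, suppose (as an illustrative assumption, not the paper's stated protocol) that the client ships the rank-r down-projection of the cut-layer activation rather than the full hidden state. The per-step payload then scales with r instead of the hidden dimension:

```python
import numpy as np

# Illustrative payload comparison (our assumption about the protocol):
# sending the rank-r LoRA down-projection of the cut-layer activation
# instead of the full hidden state shrinks each token's payload from
# `hidden` floats to `r` floats.
hidden, r = 4096, 8          # LLaMA-2-7B hidden size, a small LoRA rank
batch, seq = 4, 512
x = np.random.randn(batch, seq, hidden).astype(np.float32)
A = np.random.randn(hidden, r).astype(np.float32)  # LoRA down-projection

full_mib = x.nbytes / 2**20          # full hidden states
low_rank_mib = (x @ A).nbytes / 2**20  # rank-r projection
print(f"full: {full_mib:.1f} MiB, rank-{r}: {low_rank_mib:.4f} MiB")
```

At these shapes the full activation is 32 MiB per step versus a fraction of a MiB for the projection, which is why rank is such an effective communication knob in split learning.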
Why It Matters
Enables resource-constrained organizations to fine-tune LLMs collaboratively without sharing raw data, unlocking private AI for healthcare and finance.