Surviving the Edge: Federated Learning under Networking and Resource Constraints
New chaos engineering study reveals TCP handshake timeouts cause catastrophic FL failure at the edge.
Researchers have systematically mapped federated learning (FL) breaking points under extreme network constraints using the Flower framework and chaos engineering. They found FL fails catastrophically at 5-second one-way latency (TCP handshake timeouts), above 50% packet loss (buffer exhaustion), and 90% client dropout rates. Adjusting just three TCP parameters significantly reduced training time under extreme latency, providing concrete thresholds for edge deployments.
- FL fails at 5-second one-way latency due to TCP handshake timeouts – a fundamental mismatch with burst-idle traffic patterns.
- Packet loss above 50% triggers buffer exhaustion, causing catastrophic training failure.
- Adjusting just three TCP management parameters (e.g., handshake retries, timeout intervals) significantly reduces training time under extreme latency.
Why It Matters
As FL moves to edge devices in emerging markets, these thresholds guide when to add reliability techniques or tune network stacks.