Research & Papers

Floe: Federated Specialization for Real-Time LLM-SLM Inference

This new framework could make real-time AI on your phone faster and more private than ever.

Deep Dive

Researchers propose Floe, a hybrid federated learning framework that pairs a cloud-based LLM with lightweight Small Language Models (SLMs) running on edge devices. Personal data and fine-tuning stay on-device for privacy, while the cloud model supplies general knowledge. The key innovation is a logit-level fusion mechanism that coordinates the two models' next-token predictions in real time. In the authors' experiments, Floe improves task performance and reduces inference latency under real-time constraints compared to baseline approaches.
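The paper's exact fusion rule isn't reproduced here, but a minimal sketch of what logit-level fusion typically looks like, assuming a shared vocabulary between the two models and a hypothetical mixing weight `alpha` (both are illustrative, not Floe's actual parameters):

```python
import math

def fuse_logits(llm_logits, slm_logits, alpha=0.5):
    # Convex combination of the two models' next-token logits;
    # alpha weights the cloud LLM, (1 - alpha) the on-device SLM.
    return [alpha * a + (1 - alpha) * b for a, b in zip(llm_logits, slm_logits)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 5-token vocabulary: the cloud LLM favors token 0 (general knowledge),
# while the personalized on-device SLM favors token 1 (user preference).
llm = [2.0, 0.5, 0.1, -1.0, 0.3]
slm = [0.2, 1.8, 0.0, -0.5, 0.1]

probs = softmax(fuse_logits(llm, slm, alpha=0.5))
next_token = probs.index(max(probs))  # token 1: the SLM's preference wins here
```

Fusing at the logit level, rather than exchanging text or gradients, is what makes per-token coordination between the two models feasible within real-time latency budgets.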

Why It Matters

It enables faster, more private, and more personalized AI assistants on phones and IoT devices without giving up the capability of large cloud models.