FedGUI: Benchmarking Federated GUI Agents across Heterogeneous Platforms, Devices, and Operating Systems
First comprehensive benchmark tackles real-world heterogeneity across mobile, web, and desktop platforms.
A research team led by Wenhao Wang has introduced FedGUI, the first comprehensive benchmark for training and evaluating federated GUI agents across heterogeneous real-world environments. The work addresses a gap on two fronts. Centralized training of agents that operate graphical user interfaces (GUIs) is costly and hard to scale. Federated learning (FL), which trains a shared model across many clients without pooling their data, has lacked benchmarks that capture the diversity of real platforms. FedGUI provides six curated datasets for studying four dimensions of heterogeneity that agents encounter in practice: cross-platform (mobile vs. web vs. desktop), cross-device, cross-operating system, and cross-data source.
Extensive experiments with FedGUI yielded two main findings. First, collaboration across platforms, such as having agents learn from both mobile and desktop interfaces, improves overall performance and extends the benefits of federated learning beyond mobile-only applications. Second, the benchmark quantifies the separate impact of each heterogeneity dimension and identifies the platform (e.g., Android app vs. website) and the operating system as the two factors with the greatest influence on an agent's learning and performance. With its code and data publicly released, FedGUI gives the research community a standardized foundation for building more robust, scalable, and privacy-preserving AI agents that can operate in a fragmented digital ecosystem.
- Introduces the first benchmark (FedGUI) for federated GUI agents across mobile, web, and desktop platforms.
- Provides six datasets to study four key heterogeneity types: platform, device, OS, and data source.
- Finds that cross-platform collaboration boosts performance and identifies platform and OS as the most influential factors.
Why It Matters
Supports the development of scalable, privacy-preserving AI assistants that can operate reliably across diverse apps and websites.