Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey
A new 45-page academic survey details how to run massive AI models like LLMs on phones, in cars, and on factory floors.
A major academic survey, published in the prestigious IEEE Communications Surveys & Tutorials journal, provides a crucial roadmap for the next generation of AI infrastructure. Authored by a team of 11 researchers led by Jing Liu, the 45-page paper, 'Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey,' systematically examines how to efficiently run powerful AI models—including massive large language models (LLMs)—across a hybrid network of centralized cloud servers and decentralized edge devices like smartphones, autonomous vehicles, and industrial sensors. It tackles the core challenge of balancing immense computational demands with the need for low-latency, real-time processing and energy efficiency in applications that cannot rely solely on the cloud.
The survey offers a structured tutorial on fundamental system architectures and then dives deep into critical technical approaches. It analyzes model optimization techniques such as compression (making models smaller), adaptation (tailoring models to specific devices), and neural architecture search (automated model design). Furthermore, it explores AI-driven strategies for managing computational resources dynamically and addresses the paramount concerns of privacy and security in distributed systems. The paper evaluates practical deployments across sectors like autonomous driving, healthcare, and smart factories, and establishes performance benchmarking standards. Crucially, it identifies key future research directions, including deploying LLMs at the edge, integration with 6G networks, and the potential of neuromorphic and quantum computing, providing a clear agenda for overcoming persistent hurdles in system heterogeneity and scalability.
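To make the compression idea concrete, here is a minimal sketch of post-training 8-bit quantization, one of the model-compression families the survey covers. The weight matrix and the single symmetric scale used here are illustrative assumptions, not the method of any specific framework discussed in the paper.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 with one symmetric scale (illustrative)."""
    scale = np.abs(weights).max() / 127.0   # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one layer of a deployed model.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

q, s = quantize_int8(w)
w_hat = dequantize(q, s)

print("bytes before:", w.nbytes)   # float32 storage
print("bytes after:", q.nbytes)    # int8 storage, 4x smaller
print("max abs error:", float(np.abs(w - w_hat).max()))
```

The 4x size reduction is what lets a model fit in the memory and bandwidth budget of an edge device; the price is a bounded rounding error (at most half the quantization scale per weight).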
- The 45-page IEEE survey provides a complete tutorial on architectures and techniques for running AI models across cloud and edge devices, supported by more than 10 tables and 13 figures of comparative data.
- It details specific model optimization methods such as compression and adaptation for deploying LLMs and deep learning models in latency-sensitive applications like autonomous driving.
- The paper identifies critical future research vectors including LLM deployment at the edge, 6G network integration, and the role of emerging neuromorphic and quantum computing.
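The dynamic resource-management strategies mentioned above ultimately reduce to a tradeoff: is it faster to run a model locally on the edge device, or to ship the input to the cloud and wait for the network round trip? The sketch below is a hypothetical, back-of-the-envelope version of that offloading decision; all function names, parameters, and numbers are illustrative assumptions, not drawn from the survey.

```python
def should_offload(flops: float,
                   edge_flops_per_s: float,
                   cloud_flops_per_s: float,
                   payload_bits: float,
                   uplink_bps: float,
                   rtt_s: float) -> bool:
    """Offload to the cloud only if cloud compute plus network transfer
    beats local edge compute (a deliberately simplified latency model)."""
    edge_latency = flops / edge_flops_per_s
    cloud_latency = (flops / cloud_flops_per_s      # faster remote compute
                     + payload_bits / uplink_bps    # upload time
                     + rtt_s)                       # network round trip
    return cloud_latency < edge_latency

# Example: a 2 GFLOP inference on a phone-class chip vs. a datacenter GPU
# reached over a 50 Mbps uplink with 30 ms round-trip time.
print(should_offload(flops=2e9, edge_flops_per_s=1e10,
                     cloud_flops_per_s=1e13, payload_bits=1.2e6,
                     uplink_bps=5e7, rtt_s=0.03))
```

Real schedulers fold in energy budgets, queueing, and privacy constraints on top of latency, but this single inequality is the core comparison behind edge-cloud task placement.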
Why It Matters
This survey is the essential technical blueprint for engineers building the low-latency, efficient AI systems that will power everything from real-time translation on devices to autonomous industrial robots.