Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference
Cloud-based DNN inference can match or even surpass on-device performance for safety-critical control loops.
Pragya Sharma, Hang Qiu, and Mani Srivastava revisited the assumption that cloud inference is too slow for real-time control. They developed an analytical model of distributed inference latency that factors in sensing frequency, platform throughput, network delay, and safety constraints. In simulations of an emergency-braking scenario for autonomous driving, cloud inference met safety margins more reliably than on-device inference under certain conditions, challenging the prevailing edge-first design strategy.
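In rough terms (a sketch of the decomposition, not the paper's exact formulation), the latency of one sense-transmit-infer cycle and its safety constraint can be written as

$$ L_{\text{e2e}} = T_{\text{sense}} + L_{\text{net}} + L_{\text{queue}} + \tfrac{1}{\mu} \;\le\; D_{\text{safe}} $$

where $T_{\text{sense}}$ is the sensing period, $L_{\text{net}}$ the round-trip network delay (zero for on-device inference), $\mu$ the platform throughput, and $D_{\text{safe}}$ the deadline implied by the safety requirement, e.g., the latest moment braking can begin. Cloud inference is viable whenever its throughput gain (smaller $1/\mu$ and $L_{\text{queue}}$) outweighs the added $L_{\text{net}}$.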
- The authors developed a formal analytical model of distributed inference latency that accounts for sensing frequency, platform throughput, network delay, and safety constraints.
- Simulations of emergency braking for autonomous driving showed that cloud inference can meet safety margins more reliably than on-device inference under specific conditions.
- The paper challenges the traditional preference for on-device inference by showing that high-throughput cloud resources can amortize network and queueing delays, as illustrated in the sketch after this list.
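The following minimal Python sketch shows this style of latency accounting. All parameter values (the 30 Hz sensing rate, per-frame compute times, network round trip, and the 100 ms deadline) are illustrative assumptions, not figures from the paper, and the helper `e2e_latency_s` is a hypothetical simplification rather than the authors' model.

```python
def e2e_latency_s(sense_period_s, compute_s, net_rtt_s, queue_s=0.0):
    """End-to-end latency of one sense-transmit-infer cycle, in seconds.

    sense_period_s: worst-case wait for the next sensor frame (1 / sensing rate)
    compute_s:      per-frame inference time (1 / platform throughput)
    net_rtt_s:      round-trip network delay (0 for on-device inference)
    queue_s:        time a frame waits behind earlier frames
    """
    return sense_period_s + net_rtt_s + queue_s + compute_s


# Emergency-braking style deadline: the perception result must arrive
# before the time budget for initiating the brake is spent.
SAFETY_DEADLINE_S = 0.100  # assumed 100 ms budget, illustrative only

sense_period = 1 / 30  # assumed 30 Hz camera

# On-device: no network delay, but a slower accelerator. If inference takes
# longer than the sensing period, frames back up; we use a crude one-frame
# backlog approximation for the queueing term.
device_compute = 0.080  # assumed 80 ms/frame on an embedded accelerator
device_queue = max(0.0, device_compute - sense_period)
on_device = e2e_latency_s(sense_period, device_compute, 0.0, device_queue)

# Cloud: pays a network round trip, but a high-throughput GPU keeps compute
# (and hence queueing) small, amortizing the network cost.
cloud_compute = 0.010  # assumed 10 ms/frame on a datacenter GPU
cloud_rtt = 0.030      # assumed 30 ms round trip
cloud = e2e_latency_s(sense_period, cloud_compute, cloud_rtt)

for name, lat in [("on-device", on_device), ("cloud", cloud)]:
    verdict = "meets" if lat <= SAFETY_DEADLINE_S else "misses"
    print(f"{name}: {lat * 1000:.1f} ms -> {verdict} the "
          f"{SAFETY_DEADLINE_S * 1000:.0f} ms deadline")
```

With these assumed numbers, the slower on-device accelerator accumulates queueing delay and misses the deadline, while the cloud path meets it despite the network round trip, matching the paper's qualitative argument.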
Why It Matters
This could reshape how autonomous vehicles and other cyber-physical systems (CPS) balance compute between edge and cloud, potentially lowering on-board hardware costs.