Robotics

Kairos: New system cuts Physical AI latency by up to 66.5%

Digital serving fails robots. Kairos rethinks inference for physical AI fleets.

Deep Dive

Physical AI is booming, but its inference pattern differs fundamentally from that of digital AI: a single task involves multiple rounds of inference interleaved with action execution, and each round generates a chunk of actions. Existing digital AI serving systems are not designed for this pattern, which creates a critical bottleneck for robot fleets. Enter Kairos, designed by a team of researchers from academia and industry (including Yinwei Dai, Ganesh Ananthanarayanan, Landon Cox, Xenofon Foukas, Bozidar Radunovic, and Ravi Netravali). Kairos is presented as the first serving system that makes the generate-execute loop a first-class citizen, actively managing the execution phase rather than treating it as an afterthought.
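To make the generate-execute loop concrete, here is a minimal Python sketch of a single robot task. It is not Kairos code: the names `TaskTrace` and `run_task`, the fixed per-round inference latency, and the fixed per-action execution time are all assumptions chosen for illustration. It only shows how one task breaks into several inference rounds, each followed by the execution of one action chunk.

```python
from dataclasses import dataclass


@dataclass
class TaskTrace:
    """Accumulated timing for one task (hypothetical structure)."""
    rounds: int = 0
    inference_s: float = 0.0
    execution_s: float = 0.0

    @property
    def end_to_end_s(self) -> float:
        # With no overlap, end-to-end latency is the sum of both phases.
        return self.inference_s + self.execution_s


def run_task(total_actions: int, chunk_size: int,
             infer_latency_s: float, action_latency_s: float) -> TaskTrace:
    """Simulate a task as alternating inference and execution phases.

    Assumption: every inference round costs `infer_latency_s` and
    produces up to `chunk_size` actions; every action takes
    `action_latency_s` to execute on the robot.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    trace = TaskTrace()
    remaining = total_actions
    while remaining > 0:
        chunk = min(chunk_size, remaining)
        trace.rounds += 1
        trace.inference_s += infer_latency_s          # generate a chunk
        trace.execution_s += chunk * action_latency_s  # execute the chunk
        remaining -= chunk
    return trace


if __name__ == "__main__":
    trace = run_task(total_actions=20, chunk_size=8,
                     infer_latency_s=0.5, action_latency_s=0.1)
    print(trace.rounds, trace.inference_s, trace.execution_s)
```

In this toy model the server sits idle for that robot during every execution phase, and the robot sits idle during every inference round. A serving system that is aware of the execution phase has room to schedule around those gaps, which is the opportunity the article attributes to Kairos.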

Across a wide range of physical AI models and robots, Kairos reduces average end-to-end task latency by 31.8% to 66.5% compared with the best digital AI serving practices. Notably, the latency reduction grows as the robot fleet scales, which makes the system well suited to large deployments. By addressing the demands specific to physical AI (asynchronous inference, action batching, and tight coordination), Kairos enables faster, more scalable robot operations. For professionals deploying robotics at scale, this could mean quicker cycle times and more efficient use of compute resources.
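For a sense of scale, the reported percentages can be converted into remaining latency and an equivalent speedup factor. The sketch below is plain arithmetic; the 100-second baseline is a hypothetical figure chosen for illustration, not a measurement from the Kairos evaluation, and the function names are our own.

```python
def latency_after_reduction(baseline_s: float, reduction_pct: float) -> float:
    """Latency that remains after cutting `reduction_pct` percent."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return baseline_s * (1.0 - reduction_pct / 100.0)


def speedup(reduction_pct: float) -> float:
    """Equivalent speedup factor: baseline latency / reduced latency."""
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    baseline_s = 100.0  # hypothetical baseline task latency, in seconds
    for pct in (31.8, 66.5):  # the range reported in the article
        remaining = latency_after_reduction(baseline_s, pct)
        print(f"{pct}% reduction: {remaining:.1f} s left, "
              f"{speedup(pct):.2f}x speedup")
```

Under that assumed baseline, a 31.8% reduction leaves about 68.2 seconds (roughly 1.47x faster) and a 66.5% reduction leaves about 33.5 seconds (roughly 2.99x faster).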

Key Points
  • First multi-robot serving system designed specifically for physical AI's multi-round inference pattern.
  • Reduces average end-to-end task latency by 31.8% to 66.5% compared with state-of-the-art digital AI serving systems.
  • Performance gains increase with robot fleet size, enabling scalable deployment across general environments.

Why It Matters

Kairos removes a key bottleneck for physical AI, enabling faster and more scalable robot fleets in real-world automation.