Robotics

LiteVLA-Edge: Quantized On-Device Multimodal Control for Embedded Robotics

New system achieves roughly 150 ms end-to-end latency for vision-language-action models on embedded NVIDIA Jetson hardware.

Deep Dive

A research team including Justin Williams and Kishor Datta Gupta has published a paper introducing LiteVLA-Edge, a practical systems framework for deploying compact Vision-Language-Action (VLA) models directly on embedded robotic hardware. The core announcement is a fully on-device inference pipeline that achieves reactive control speeds, with a mean end-to-end latency of 150.5 milliseconds, translating to an operational frequency of approximately 6.6 Hz. This work addresses a critical bottleneck in robotics, where many powerful multimodal AI models remain tethered to cloud servers due to their computational demands, creating issues with latency, reliability, and privacy for real-world applications.
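The headline frequency follows directly from the reported latency; a quick back-of-the-envelope check in Python (the 150.5 ms figure is the paper's reported mean, and one action per inference cycle is assumed):

    # Convert the reported mean end-to-end latency into a control frequency,
    # assuming the robot issues one action per inference cycle.
    mean_latency_ms = 150.5
    control_hz = 1000.0 / mean_latency_ms
    print(f"{control_hz:.2f} Hz")  # -> 6.64 Hz, matching the ~6.6 Hz claim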

The technical contribution is not a new AI model architecture, but rather an optimized deployment path. The system takes a standard VLA model, fine-tunes it in FP32 precision, and then applies aggressive 4-bit GGUF quantization—a technique that drastically reduces model size and memory footprint. It leverages the efficient llama.cpp runtime for GPU-accelerated inference on cost-effective, power-constrained hardware like the NVIDIA Jetson Orin. Crucially, LiteVLA-Edge maintains modular interfaces between perception, reasoning, and action modules, allowing it to integrate seamlessly into existing Robot Operating System 2 (ROS 2) workflows. This provides a reproducible baseline demonstrating that language-guided robotic tasks like "pick up the blue block" can be processed locally within reactive timing budgets, paving the way for more autonomous and responsive machines in factories, warehouses, and homes.
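To make the integration concrete, here is a minimal sketch of what a ROS 2 node wrapping a 4-bit GGUF model through llama.cpp could look like. It assumes the llama-cpp-python bindings; the topic names, prompt format, and action parsing are illustrative placeholders, not the paper's actual interfaces:

    # Minimal sketch: a ROS 2 node that runs a quantized GGUF model via
    # llama.cpp and publishes the model's output as action tokens.
    # Topic names, prompt format, and parsing are illustrative assumptions.
    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import String
    from llama_cpp import Llama  # llama-cpp-python bindings over llama.cpp

    class VLAControlNode(Node):
        def __init__(self):
            super().__init__("vla_control")
            # n_gpu_layers=-1 offloads all layers to the Jetson's GPU.
            self.llm = Llama(model_path="vla-q4_k_m.gguf", n_gpu_layers=-1)
            self.sub = self.create_subscription(
                String, "instruction", self.on_instruction, 10)
            self.pub = self.create_publisher(String, "action_tokens", 10)

        def on_instruction(self, msg: String) -> None:
            # In the full pipeline, the perception module would fold visual
            # features into the prompt; this sketch passes text only.
            out = self.llm(f"Instruction: {msg.data}\nAction:", max_tokens=32)
            action = String()
            action.data = out["choices"][0]["text"].strip()
            self.pub.publish(action)

    def main():
        rclpy.init()
        node = VLAControlNode()
        rclpy.spin(node)
        rclpy.shutdown()

    if __name__ == "__main__":
        main()

In practice, the quantized weights would be produced offline with llama.cpp's llama-quantize tool (for example, to the Q4_K_M format) before being deployed to the robot.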

Key Points
  • Achieves 150.5 ms mean latency (6.6 Hz) for full VLA inference on embedded NVIDIA Jetson Orin hardware.
  • Uses 4-bit GGUF quantization and the llama.cpp runtime to enable efficient, offline on-device execution.
  • Provides a modular, ROS 2-integrated pipeline, establishing a practical baseline for reactive language-conditioned robot control.

Why It Matters

Enables robots to understand and act on natural language commands in real time without an internet connection, which is critical for reliable automation.