Robotics

Robot arm learns active perception via behavior cloning from low-res camera

Low-cost robot arm uses a wrist-mounted camera to find and grasp a partially visible plant from low-resolution images alone.

Deep Dive

A new study by Anthony Bilic, Chen Chen, and Ladislau Bölöni (arXiv:2605.14106) shows that behavior cloning can produce active perception in robotic manipulation using only low-resolution egocentric vision. The researchers mounted an inexpensive RGB camera on the wrist of a low-cost robot arm and tasked it with finding and grasping a partially visible plant. The arm had to reposition itself to center the plant in view before triggering a grasp. That is the defining feature of active perception: the robot must take actions whose purpose is to improve its own future observations. The model was trained to output joint commands directly from the low-resolution images and ran in a closed loop, with each new camera frame producing the next command. Crucially, the researchers found that predicting relative joint deltas (incremental changes from the current joint configuration) substantially outperformed predicting absolute joint positions, allowing the robot to succeed reliably even with blurry, low-resolution visual input.
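To make the difference between the two action parameterizations concrete, here is a minimal sketch of how training labels could be derived from a recorded demonstration, and how a predicted delta could be applied at run time. It is an illustration under stated assumptions, not the authors' code: the function names, the two-joint demonstration, and the joint-limit clamp are all hypothetical, and the summary does not describe the paper's network architecture, image size, or number of joints.

```python
from typing import List, Sequence

Joints = List[float]


def absolute_targets(trajectory: Sequence[Joints]) -> List[Joints]:
    """Absolute parameterization: the label at step t is the joint
    configuration recorded at step t + 1."""
    return [list(trajectory[t + 1]) for t in range(len(trajectory) - 1)]


def delta_targets(trajectory: Sequence[Joints]) -> List[Joints]:
    """Relative parameterization: the label at step t is the per-joint
    change between step t and step t + 1."""
    return [
        [nxt - cur for cur, nxt in zip(trajectory[t], trajectory[t + 1])]
        for t in range(len(trajectory) - 1)
    ]


def apply_delta(current: Joints, delta: Joints,
                lower: Joints, upper: Joints) -> Joints:
    """Add a predicted delta to the current joints, then clamp to joint
    limits. The clamp is an assumed safety step, not taken from the paper."""
    return [
        min(max(q + d, lo), hi)
        for q, d, lo, hi in zip(current, delta, lower, upper)
    ]


# Hypothetical two-joint demonstration, in radians.
demo = [[0.0, 1.0], [0.5, 1.0], [1.0, 0.5]]

abs_labels = absolute_targets(demo)    # [[0.5, 1.0], [1.0, 0.5]]
delta_labels = delta_targets(demo)     # [[0.5, 0.0], [0.5, -0.5]]

# One closed-loop step: the policy's predicted delta is added to the
# arm's current joint reading, and the result is sent as the next command.
next_cmd = apply_delta([1.0, 0.5], [0.5, -0.5],
                       lower=[-1.0, -1.0], upper=[1.25, 1.0])
# next_cmd == [1.25, 0.0]  (first joint clamped at its assumed upper limit)
```

With delta labels, the same target ("move this joint a little further") applies wherever the arm currently is, whereas an absolute label ties each image to one specific pose. That is one plausible reading of why the relative formulation worked better here; the summary itself reports only the outcome.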

This work challenges the assumption that high-resolution sensors or complex reinforcement learning are necessary for active perception. By showing that imitation learning alone can produce visually guided movement and grasping in a structured task, the researchers make the case for cheaper robotic systems built on minimal hardware. The approach is fully reproducible, using off-the-shelf components and publicly available code. For professionals building real-world automation, this suggests that some manipulation skills usually associated with richer sensing could be achieved with simpler sensors and training pipelines, reducing both cost and computational requirements for robots in logistics, agriculture, or home assistance.

Key Points
  • Robot arm uses only low-resolution RGB images from a wrist-mounted camera to actively reposition and grasp a partially visible plant.
  • Behavior cloning that predicts relative joint deltas substantially outperformed behavior cloning that predicts absolute joint positions.
  • This work shows active perception can emerge from imitation learning without complex reward engineering or high-resolution sensors.

Why It Matters

Suggests that low-cost robots can perform visually guided manipulation with minimal sensor data, reducing hardware requirements.