Robotics

Researchers' New Framework Lets Robots Ask for Help to Fix Reward Misalignment

Robots detect underspecified features and request targeted demonstrations in natural language.

Deep Dive

Learning reward functions from human demonstrations typically assumes the demonstrations cover all task-relevant features. In practice, humans often under-emphasize certain features because of cognitive load or physical difficulty, and training data often lacks coverage of edge cases. The result is underspecified features and misaligned robot behavior at deployment. The researchers' key insight is that well-specified features show low variation across demonstrations, while underspecified features vary widely. Using this statistical signal, the robot identifies which features are ambiguous, explains its uncertainty in natural language, and actively solicits corrective demonstrations targeted at those specific gaps.
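The variance signal described above can be illustrated with a minimal sketch. It is not the researchers' implementation. The feature names, the demonstration values, the threshold, and the templated query are all hypothetical, and the sketch assumes each demonstration has already been summarized as one value per feature on a comparable scale.

```python
from statistics import pvariance


def flag_underspecified(demos, feature_names, threshold):
    """Return (name, variance) pairs for features whose variance across
    demonstrations exceeds the threshold, highest variance first.

    Assumption: each demo is a list with one summary value per feature,
    and all features are scaled to a comparable range.
    """
    flagged = []
    for j, name in enumerate(feature_names):
        values = [demo[j] for demo in demos]
        var = pvariance(values)  # population variance across demonstrations
        if var > threshold:
            flagged.append((name, var))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


def make_query(name):
    """Hypothetical templated request; a stand-in for a natural-language explanation."""
    return (
        f"I am unsure how much '{name}' matters for this task. "
        "Could you show me a demonstration that focuses on it?"
    )


# Hypothetical data: three demonstrations, two features.
feature_names = ["distance_to_goal", "cup_tilt"]
demos = [
    [0.90, 0.10],
    [0.92, 0.80],
    [0.88, 0.45],
]

# The threshold is an arbitrary illustrative choice.
flagged = flag_underspecified(demos, feature_names, threshold=0.01)
for name, var in flagged:
    print(make_query(name))
```

In this toy data the demonstrators agree closely on distance_to_goal but disagree on cup_tilt, so only cup_tilt is flagged and turned into a request. A fixed threshold is the simplest possible choice; how the ambiguity cutoff is set in the actual framework is not stated here.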

In evaluations on a simulated tabletop manipulation domain and a user study with a real Franka robot, the targeted, explanation-guided queries significantly outperformed both random querying and passive data collection. The approach reduced ambiguity that would otherwise persist when learning from imperfect demonstrations, leading to more accurate reward recovery and better-aligned behavior. The framework moves beyond passive imitation learning toward an interactive, collaborative process in which robots proactively clarify human intent, making them safer and more reliable in real-world settings where demonstrations are rarely perfect.

Key Points
  • Detects underspecified features by analyzing variance across demonstrations
  • Explains uncertainty in natural language and requests specific corrective demos
  • Significantly improves reward recovery over random queries in simulation and with a real Franka robot

Why It Matters

Enables robots to learn reliably from imperfect human demos, reducing costly misalignment at deployment.