Robotics

RFM-HRI: A Multimodal Dataset of Medical Robot Failure, User Reaction and Recovery Preferences for Item Retrieval Tasks

A new multimodal dataset reveals how 41 users react when medical robots fail during critical item retrieval tasks.

Deep Dive

A research team led by Yashika Batra has published the Robot Failures in Medical HRI (RFM-HRI) Dataset, a first-of-its-kind public resource for studying how humans react when medical assistance robots fail. The dataset captures 214 multimodal interaction samples from 41 participants across lab and hospital settings, in which a robot embodied in a crash cart systematically exhibited one of four failure types during item retrieval tasks. These failure types (speech, timing, comprehension, and search) were derived from three years of real-world crash-cart interaction data. The dataset pairs rich behavioral signals, including facial action units, head pose, and speech transcripts, with post-interaction self-reports, creating a comprehensive picture of user state.
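
To make that structure concrete, here is a minimal sketch of what one interaction sample might look like in code. The field names and types are illustrative assumptions, not the dataset's actual schema; consult the released dataset for the real layout.

```python
from dataclasses import dataclass
from enum import Enum

class FailureType(Enum):
    """The four injected failure types, plus a marker for success."""
    SPEECH = "speech"
    TIMING = "timing"
    COMPREHENSION = "comprehension"
    SEARCH = "search"
    NONE = "none"  # successful interaction, no failure injected

@dataclass
class InteractionSample:
    """One of the 214 multimodal samples (hypothetical schema)."""
    participant_id: int                    # one of 41 participants
    setting: str                           # "lab" or "hospital"
    failure_type: FailureType
    facial_action_units: list[dict]        # per-frame AU intensities
    head_pose: list[tuple[float, float, float]]  # (yaw, pitch, roll)
    transcript: str                        # user speech during the task
    self_report: dict                      # post-interaction questionnaire
```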

Analysis of the RFM-HRI data reveals that robot failures have a significant negative impact on users. Compared to successful interactions, failures degraded affective valence and reduced users' perceived control. Failures were strongly associated with emotions like confusion, annoyance, and frustration, while successful interactions elicited surprise, relief, and confidence. Crucially, the study found that emotional responses evolve with repeated failures: confusion decreased while frustration increased over time. The work also documents user preferences for how robots should recover from these errors, providing concrete guidance for system designers.
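
As a rough illustration of how that temporal shift could be surfaced, the sketch below reuses the InteractionSample sketch above and assumes two additional hypothetical fields: a per-participant failure_index counting how many failures the user has already seen, and an emotions list of self-reported labels.

```python
from collections import Counter, defaultdict

def emotion_trend(samples):
    """Tally self-reported emotion labels by how many failures the
    participant has already encountered. `failure_index` and `emotions`
    are assumed fields for illustration; the released dataset may
    encode this information differently."""
    by_repetition = defaultdict(Counter)
    for s in samples:
        if s.failure_type is not FailureType.NONE:  # failures only
            by_repetition[s.failure_index].update(s.emotions)
    return by_repetition
```

Comparing, say, by_repetition[1] against by_repetition[3] would then show whether "confusion" counts fall while "frustration" counts rise across repeated failures.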

The RFM-HRI dataset and its accompanying analysis offer a critical foundation for the next generation of robust human-robot interaction (HRI) systems, particularly in high-stakes environments like healthcare. By understanding the specific behavioral signatures of user frustration and preferred recovery paths, engineers can develop robots capable of real-time failure detection and context-aware recovery strategies. This moves the field beyond simple task completion metrics toward building robotic assistants that maintain user trust and cooperation even when things go wrong, which is inevitable in complex real-world deployments.
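
In practice, the recovery side of that pipeline can start as something as simple as a lookup from detected failure type to a recovery behavior. The sketch below is a deliberately naive dispatch table; the strategy strings are placeholders, not the recovery preferences documented in the study.

```python
# Placeholder recovery behaviors keyed by the four failure types.
RECOVERY_STRATEGIES = {
    "speech": "repeat the utterance slowly and confirm it was heard",
    "timing": "acknowledge the delay and give an updated time estimate",
    "comprehension": "ask a clarifying question about the requested item",
    "search": "report the failed search and offer an alternative item",
}

def recover(failure_type: str) -> str:
    """Pick a recovery behavior for a detected failure, falling back to
    a generic apology plus human handoff for unclassified failures."""
    return RECOVERY_STRATEGIES.get(
        failure_type, "apologize and hand off to a human operator"
    )

print(recover("timing"))  # -> acknowledge the delay and give an updated time estimate
```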

Key Points
  • Contains 214 multimodal interaction samples capturing facial cues, speech, and self-reports from 41 users during robot failures.
  • Identifies four key failure types (speech, timing, comprehension, search) that increase user frustration and reduce perceived control.
  • Shows emotional responses shift over time, with confusion decreasing and frustration increasing after repeated failures.

Why It Matters

Provides the data needed to build medical robots that can detect their own failures and recover gracefully, preserving user trust in critical scenarios.