Robotics

Stress tests expose failure modes in SPARK humanoid safety filters under simulated real-world conditions

Researchers show that adversarial stress testing exposes weaknesses in humanoid robot safety filters that nominal benchmarks miss

Deep Dive

A team of researchers led by Saurav Ghosh replicated the SPARK benchmark case G1SportMode_D1_WG_SO_v1 in the MuJoCo physics simulator to evaluate the robustness of six safety filters designed for humanoid robots: RSSA, RSSS, SSA, CBF, PFM, and SMA. Each filter adjusts the controller's commanded action when that action would violate a collision-avoidance constraint, but scores from the benchmark's nominal setting can hide weaknesses that only surface in harder environments. The team built a post-processing pipeline that converts raw SPARK logs into goal-tracking, minimum-distance, and collision-step metrics, giving a more detailed view of each filter's performance across runs with controlled random seeds.
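The metric pipeline described above can be sketched as a reduction over per-step records. This is a minimal sketch under stated assumptions: the `StepLog` fields, the `summarize` helper, and the rule that a step counts as a collision when the minimum obstacle distance falls to or below `collision_threshold` are all hypothetical, not SPARK's actual log schema or the team's code.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class StepLog:
    # Hypothetical per-step record; real SPARK log fields may differ.
    robot_pos: Sequence[float]   # tracked robot point in the world frame
    goal_pos: Sequence[float]    # goal position at this step
    min_obstacle_dist: float     # smallest robot-to-obstacle distance at this step (m)


def summarize(steps: List[StepLog], collision_threshold: float = 0.0) -> Dict[str, float]:
    """Reduce one run to goal-tracking, minimum-distance, and collision-step metrics."""
    if not steps:
        raise ValueError("log contains no steps")
    goal_errors = [math.dist(s.robot_pos, s.goal_pos) for s in steps]
    return {
        "mean_goal_error": sum(goal_errors) / len(goal_errors),
        "final_goal_error": goal_errors[-1],
        "min_distance": min(s.min_obstacle_dist for s in steps),
        # Assumed collision rule: distance at or below the threshold counts as a collision step.
        "collision_steps": sum(1 for s in steps if s.min_obstacle_dist <= collision_threshold),
    }


if __name__ == "__main__":
    run = [
        StepLog((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.50),
        StepLog((0.5, 0.0, 0.0), (1.0, 0.0, 0.0), 0.05),
        StepLog((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.00),
    ]
    print(summarize(run))
```

Running the same reduction over every filter and seed turns each run into one row of comparable numbers, which is what lets a goal-tracking versus collision-avoidance trade-off show up instead of being folded into a single benchmark score.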

Stress tests introduced obstacle crowding, noisy distance estimates, and delayed obstacle information, conditions a humanoid is likely to face in real-world deployment. Results showed that no single filter excelled everywhere: some tracked the goal more closely, while others reduced collision steps more effectively. Critically, safety behavior changed under stress, revealing failure modes that nominal benchmarks miss. The findings underscore that humanoid autonomy must be evaluated beyond standard benchmarks, using metrics that expose weaknesses before deployment. The work also provides a replicable methodology for stress-testing safety filters in high-dimensional, collision-prone environments.
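The three stressors can be modeled as perturbations applied to obstacle information before a safety filter sees it. The sketch below is an assumption about how such perturbations might be implemented, not the study's code: `ObstacleStressor` and `add_crowding` are hypothetical names, noise is assumed to be additive Gaussian on each distance, and delay is assumed to be a fixed number of simulation steps.

```python
import math
import random
from collections import deque


class ObstacleStressor:
    """Hypothetical sketch: degrade obstacle information before a safety filter uses it."""

    def __init__(self, noise_std=0.0, delay_steps=0, seed=0):
        self.rng = random.Random(seed)                 # fixed seed keeps stressed runs repeatable
        self.noise_std = noise_std                     # std. dev. of additive distance noise (m)
        self.frames = deque(maxlen=delay_steps + 1)    # rolling buffer of past obstacle frames

    def perceived_distances(self, robot_pos, obstacles):
        """Return noisy distances to obstacle positions that are up to delay_steps old."""
        self.frames.append([tuple(p) for p in obstacles])
        stale = self.frames[0]                         # oldest buffered frame = delayed observation
        return [
            max(0.0, math.dist(robot_pos, p) + self.rng.gauss(0.0, self.noise_std))
            for p in stale
        ]


def add_crowding(obstacles, n_extra, center, spread, seed=0):
    """Append n_extra obstacles scattered uniformly within +/- spread of center on each axis."""
    rng = random.Random(seed)
    extra = [tuple(c + rng.uniform(-spread, spread) for c in center) for _ in range(n_extra)]
    return list(obstacles) + extra
```

Seeding every random source is what makes a stressed run repeatable, so differences between filters can be attributed to the filters rather than to the luck of the draw.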

Key Points
  • Replicated SPARK benchmark case G1SportMode_D1_WG_SO_v1 in MuJoCo for controlled adversarial stress testing
  • Tested six safety filters (RSSA, RSSS, SSA, CBF, PFM, SMA) under obstacle crowding, noisy distance estimates, and delayed obstacle information
  • Safety behavior changed under stress, and no single filter led on every metric: some tracked the goal more closely, while others had fewer collision steps
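All six filters do the same job: take the action a nominal controller proposes and adjust it when it would break a collision-avoidance constraint. As one concrete instance, the sketch below shows a textbook control barrier function (CBF) filter for a point robot with assumed single-integrator dynamics (x_dot = u) and one spherical obstacle. It is a simplified illustration under those assumptions, not SPARK's CBF implementation, which targets a full humanoid rather than a point; `cbf_filter`, `d_min`, and `alpha` are hypothetical names.

```python
from typing import List, Sequence


def cbf_filter(u_nom: Sequence[float], x: Sequence[float], obstacle: Sequence[float],
               d_min: float, alpha: float = 1.0) -> List[float]:
    """Return the action closest to u_nom that satisfies dh/dt >= -alpha * h,
    with h(x) = ||x - obstacle||^2 - d_min^2 and assumed dynamics x_dot = u."""
    a = [2.0 * (xi - oi) for xi, oi in zip(x, obstacle)]          # gradient of h w.r.t. x
    h = sum((xi - oi) ** 2 for xi, oi in zip(x, obstacle)) - d_min ** 2
    # Constraint: a . u + alpha * h >= 0. A negative residual means u_nom is unsafe.
    residual = sum(ai * ui for ai, ui in zip(a, u_nom)) + alpha * h
    norm_sq = sum(ai * ai for ai in a)
    if residual >= 0.0 or norm_sq == 0.0:
        # Already safe, or robot exactly at the obstacle center (degenerate, left to the caller).
        return list(u_nom)
    # Closed-form solution of min ||u - u_nom||^2 subject to a single linear half-space constraint.
    step = -residual / norm_sq
    return [ui + step * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # Heading straight at an obstacle 1 m away with a 0.5 m safety radius: the push is scaled back.
    print(cbf_filter([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], d_min=0.5))   # -> [0.375, 0.0]
    # Moving away is already safe, so the action passes through unchanged.
    print(cbf_filter([-1.0, 0.0], [0.0, 0.0], [1.0, 0.0], d_min=0.5))  # -> [-1.0, 0.0]
```

Because this filter trusts `x` and `obstacle` exactly, feeding it noisy or delayed obstacle positions shifts the constraint it enforces, which is one plausible way stress conditions like those in the study can change a filter's safety behavior.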

Why It Matters

Humanoid robots need stress testing beyond nominal benchmarks before real-world deployment, so that safety filter failure modes are found in testing rather than in the field.