Robotics

EpisodeVault: Open-source tool to debug LeRobot model regressions

Identify with one command exactly which episodes caused your model to regress.

Deep Dive

Rohan-Prabhakar released EpisodeVault, an open-source Python library designed to pinpoint why a LeRobot model's performance degraded after retraining. The tool addresses a common pain point: developers often know a model got worse but lack visibility into which specific dataset changes caused the regression. EpisodeVault provides four commands: `track` to initialize a dataset, `commit` to version it, `diff` to compare two versions with detailed metrics, and `blame` to trace a model back to its training dataset version.

In a real-world test on two LeRobot datasets, the `diff` command showed that version 2.0 removed 7 episodes, including a 75% drop in kitchen_grasp episodes (4 to 1), and that the success rate fell from 0.88 to 0.38. The `blame` command integrates with training scripts via `import episodevault as ev; ev.log_training_run(...)` and later outputs the dataset diff automatically. EpisodeVault is compatible with any local LeRobot v3 dataset and has been tested on four Hub datasets (e.g., aloha, so100), with parse times as low as 0.35s for 25 episodes. Install with `pip install episodevault` (Python 3.10+).
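To make the diff numbers concrete, here is a minimal sketch of the arithmetic behind a per-task distribution shift. It is not EpisodeVault's implementation or API: the function name `task_shift` and the input layout (a mapping from task name to episode count) are assumptions made for illustration. Only the kitchen_grasp counts (4 and 1) come from the test described above.

```python
def task_shift(old_counts, new_counts):
    """Return the percent change in episode count per task between two versions.

    Hypothetical helper for illustration; not part of EpisodeVault.
    Tasks with no episodes in the old version map to None, since there is
    no baseline to compute a percentage against.
    """
    shifts = {}
    for task in sorted(set(old_counts) | set(new_counts)):
        old = old_counts.get(task, 0)
        new = new_counts.get(task, 0)
        shifts[task] = None if old == 0 else (new - old) / old * 100.0
    return shifts


# kitchen_grasp counts as reported in the test: 4 episodes, then 1.
v1 = {"kitchen_grasp": 4}
v2 = {"kitchen_grasp": 1}
print(task_shift(v1, v2))  # {'kitchen_grasp': -75.0}
```

A change from 1 to 3 episodes would come out as 200.0 under the same formula, which is the kind of increase the tool is described as flagging.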

Key Points
  • Detects distribution shifts (e.g., a 200% increase in factory_pick) and reports quality metrics (average episode length, success rate, camera sync score).
  • The `blame` command traces a model version back to its exact training dataset version and shows the diff automatically.
  • Tested on four real LeRobot datasets (including aloha and so100) with fast parsing (0.35s for 25 episodes) and zero dependency on external tools.
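The idea behind a blame-style lookup can be sketched in a few lines: record which dataset version a model was trained on, then read that record back later. This is a conceptual sketch only, not how EpisodeVault stores its data. The JSON log format, the function names `log_run` and `blame`, and the model identifier used in the usage note are all hypothetical.

```python
import json
from pathlib import Path


def log_run(log_path, model_id, dataset_version):
    """Record the dataset version a model was trained on (hypothetical JSON log)."""
    path = Path(log_path)
    runs = json.loads(path.read_text()) if path.exists() else {}
    runs[model_id] = dataset_version
    path.write_text(json.dumps(runs, indent=2))


def blame(log_path, model_id):
    """Return the dataset version recorded for a model, or None if it was never logged."""
    path = Path(log_path)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(model_id)
```

Calling `log_run("runs.json", "policy_a", "2.0")` at training time would let `blame("runs.json", "policy_a")` return `"2.0"` afterwards; a real tool would then diff that version against its predecessor.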

Why It Matters

EpisodeVault could save robotics teams hours of debugging by pointing directly to the dataset changes behind a model regression.
