Hephaes: Open-Source Converter from ROS1/2 Logs to Parquet/TFRecord
New Python package transforms robotics bag/mcap files into analysis-ready formats with manifest tracking.
Developers yood2 and a collaborator have released Hephaes, an open-source Python package aimed at a specific pain point in robotics data processing. The tool converts ROS1 and ROS2 log files in the bag and MCAP formats into more accessible data structures: Parquet for analytical workflows and TFRecord for use in TensorFlow machine learning pipelines. Each conversion run automatically generates a manifest.json file that serves as a basic indexing layer, tracking the output files and their associated metadata to keep the dataset organized.
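To make the manifest idea concrete, here is a minimal sketch of what indexing a folder of converted outputs could look like. It is not Hephaes code and does not use the Hephaes API: the function names `build_manifest` and `write_manifest`, the field names (`source_log`, `outputs`, `sha256`, and so on), and the overall schema are assumptions chosen for illustration, using only the Python standard library.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical: the set of output formats a converter might emit.
OUTPUT_SUFFIXES = {".parquet", ".tfrecord"}


def build_manifest(source_log: str, output_dir: Path) -> dict:
    """Index converted files in output_dir (illustrative schema, not Hephaes's)."""
    entries = []
    for path in sorted(output_dir.iterdir()):
        if path.suffix not in OUTPUT_SUFFIXES:
            continue  # skip anything that is not a converted output
        data = path.read_bytes()
        entries.append({
            "file": path.name,
            "format": path.suffix.lstrip("."),
            "size_bytes": len(data),
            # A checksum lets downstream tools detect changed or corrupt files.
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return {
        "source_log": source_log,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "outputs": entries,
    }


def write_manifest(manifest: dict, output_dir: Path) -> Path:
    """Write the manifest next to the outputs and return its path."""
    manifest_path = output_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```

Calling `build_manifest("run_01.mcap", Path("out"))` and then `write_manifest(...)` would leave a manifest.json beside the converted files. Which fields a real manifest should carry (topics, message counts, time ranges) is exactly the open question the developers are asking about.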
The project is in its early stages and was built primarily to streamline the developers' own robotics data workflows. The roadmap includes two larger features: tagging based on vision-language models (VLMs) for automated, richer dataset annotation, and conversion of live data streams directly from ROS rather than only recorded logs. The developers are asking the robotics community for blunt feedback on four questions: whether the tool solves real workflow pain points, whether Parquet and TFRecord are the right target formats, what metadata the manifest should include, and how useful live stream conversion would be in practice. The package is available on GitHub and PyPI for testing and contribution.
- Converts ROS1/2 bag/mcap logs to Parquet for analysis & TFRecord for TensorFlow ML
- Automatically generates a manifest.json file that indexes outputs and tracks metadata
- Roadmap includes VLM-based tagging for annotation and live ROS stream conversion support
Why It Matters
Hephaes converts messy robotics log data into standard structured formats, which could shorten the path from recorded logs to data analysis and ML model training.