Human Pose Estimation in Trampoline Gymnastics: Improving Performance Using a New Synthetic Dataset
A new synthetic dataset of trampoline flips improves 3D pose estimation accuracy by nearly 20%.
A research team led by Léa Drolet-Roy from Université de Montréal has published a novel method to significantly improve AI's ability to track human poses in extreme athletic scenarios. Their paper, "Human Pose Estimation in Trampoline Gymnastics: Improving Performance Using a New Synthetic Dataset," addresses a core weakness in state-of-the-art computer vision models like ViTPose, which struggle with the unconventional viewpoints and dynamic flips inherent to trampoline routines. The team's solution was to generate a specialized Synthetic Trampoline Pose (STP) dataset.
They developed a pipeline that starts with motion capture recordings of actual trampoline routines. This noisy mocap data is then fitted to a parametric human body model (SMPL) and used to render multi-view, photorealistic synthetic images. By fine-tuning the pre-trained ViTPose model exclusively on this synthetic data, they achieved state-of-the-art 2D results on real trampoline footage. Most importantly, the gain in 2D accuracy carried through to 3D after triangulation, reducing the Mean Per Joint Position Error (MPJPE) by 12.5 millimeters. This represents a 19.6% reduction in error relative to the base model, effectively closing the performance gap for extreme poses.
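To make the evaluation concrete, here is a minimal sketch of the two quantities the 3D results hinge on: lifting per-view 2D keypoints to 3D via standard direct linear transform (DLT) triangulation, and scoring the result with MPJPE. This is an illustrative reconstruction under assumed conventions (calibrated pinhole cameras with known 3x4 projection matrices), not the paper's actual code; the function names are hypothetical.

```python
import numpy as np

def triangulate_joint(proj_mats, points_2d):
    """DLT triangulation of one joint from multiple camera views.

    proj_mats: list of 3x4 camera projection matrices.
    points_2d: list of (x, y) pixel observations, one per view.
    Each view contributes two linear constraints on the homogeneous
    3D point; the SVD gives the least-squares solution.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: mean Euclidean distance between
    predicted and ground-truth joints (same units as the inputs)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))
```

Because MPJPE is just a mean distance in world units, a 12.5 mm drop is directly interpretable: on average, every estimated joint lands 12.5 mm closer to its ground-truth position.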
The work demonstrates a powerful and efficient framework for domain adaptation in computer vision. Instead of the costly and difficult process of collecting and manually annotating thousands of real-world trampoline images, the team generated limitless, perfectly labeled synthetic data. This approach not only solves a specific problem in sports analytics but also provides a blueprint for improving AI perception in any niche domain with scarce or challenging real-world data, from industrial safety to wildlife monitoring.
Key Takeaways
- Fine-tuning ViTPose on synthetic data reduced 3D pose error by 12.5 mm (a 19.6% improvement).
- The Synthetic Trampoline Pose (STP) dataset was generated from mocap data fitted to a parametric human model.
- The method bridges the performance gap for extreme poses, achieving SOTA results on challenging real-world trampoline footage.
Why It Matters
This shows that synthetic data can cheaply solve niche AI vision problems, with applications from sports tech to robotics and biomechanics.