Research & Papers

ModTrack: Sensor-Agnostic Multi-View Tracking via Identity-Informed PHD Filtering with Covariance Propagation

A new modular tracking system achieves 95.5 IDF1 on WildTrack, matching end-to-end models while being sensor-agnostic.

Deep Dive

A research team has introduced ModTrack, a novel modular framework for Multi-View Multi-Object Tracking (MV-MOT) that challenges the dominance of complex, end-to-end neural models. The system, developed by Aditya Iyer, Jack Roberts, and Nora Ayanian, confines machine learning solely to the initial detection and feature extraction stage. For the critical tasks of sensor fusion, data association, and temporal tracking, ModTrack employs principled, closed-form analytical methods. This design reduces each sensor's output to calibrated position-covariance pairs, which are then fused using precision-weighted techniques to create unified, uncertainty-aware estimates for each object.
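To make the fusion step concrete, here is a minimal sketch of precision-weighted (inverse-covariance) fusion of per-sensor position-covariance pairs. The function name, the 2-D ground-plane state, and the NumPy-based implementation are illustrative assumptions, not details taken from the paper:

    import numpy as np

    def fuse_precision_weighted(estimates):
        """Fuse per-sensor (position, covariance) pairs for one object.

        estimates: list of (x_i, P_i), where x_i is a 2-D ground-plane
        position and P_i its 2x2 covariance. Uses the standard
        inverse-covariance weighting:
            P = (sum_i P_i^-1)^-1,   x = P @ sum_i (P_i^-1 @ x_i)
        """
        info_matrix = np.zeros((2, 2))   # accumulated precision
        info_vector = np.zeros(2)        # accumulated precision-weighted position
        for x_i, P_i in estimates:
            P_inv = np.linalg.inv(P_i)
            info_matrix += P_inv
            info_vector += P_inv @ np.asarray(x_i)
        P_fused = np.linalg.inv(info_matrix)
        x_fused = P_fused @ info_vector
        return x_fused, P_fused

    # Example: two cameras see the same person; the second is less certain,
    # so the fused estimate leans toward the first.
    cam1 = (np.array([3.1, 7.4]), np.diag([0.04, 0.04]))
    cam2 = (np.array([3.3, 7.2]), np.diag([0.25, 0.25]))
    x, P = fuse_precision_weighted([cam1, cam2])

The fused estimate is pulled toward whichever sensor reports the smaller covariance, which is what makes the downstream estimates uncertainty-aware.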

This modular approach yields remarkable performance and flexibility. On the challenging WildTrack dataset, ModTrack scored 95.5 IDF1 (a metric for identity consistency) and 91.4 MOTA (Multi-Object Tracking Accuracy), surpassing all prior modular methods by over 21 points and rivaling the best end-to-end models. Crucially, the same analytical tracking core can be transferred without modification to entirely new datasets like MultiviewX and RadarScenes. To adapt to new sensor modalities—such as switching from cameras to radar—only the front-end perception module needs replacement, not the entire system. The tracker uses an identity-informed Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter to maintain object identities through heavy occlusion and missed detections, providing traceable uncertainty estimates that black-box neural models lack.
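The paper's identity-informed GM-PHD variant is not spelled out here, but the generic GM-PHD predict-update cycle it builds on looks roughly like the following sketch. The state layout, parameter values, and the simple identity tag carried on each component are assumptions for illustration; birth components, pruning, and merging are omitted:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Component:
        """One Gaussian component of the PHD intensity, tagged with an identity."""
        w: float          # weight: expected number of objects it explains
        m: np.ndarray     # state mean, e.g. [x, y, vx, vy]
        P: np.ndarray     # state covariance
        identity: int     # label carried through prediction and update

    def gaussian_likelihood(z, mean, S):
        d = z - mean
        return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(np.linalg.det(2 * np.pi * S))

    def gmphd_step(components, measurements, F, Q, H, R,
                   p_survive=0.99, p_detect=0.95, clutter=1e-4):
        """One predict + update cycle of a GM-PHD filter.

        measurements: fused position vectors (e.g. the output of the
        precision-weighted fusion sketched earlier).
        """
        # Prediction: survival-weighted Kalman prediction of every component.
        predicted = [Component(p_survive * c.w, F @ c.m, F @ c.P @ F.T + Q, c.identity)
                     for c in components]

        # Update: keep missed-detection hypotheses at reduced weight ...
        updated = [Component((1 - p_detect) * c.w, c.m, c.P, c.identity)
                   for c in predicted]
        # ... and add one Kalman-updated hypothesis per (measurement, component) pair.
        for z in measurements:
            per_meas = []
            for c in predicted:
                z_pred = H @ c.m
                S = H @ c.P @ H.T + R                  # innovation covariance
                K = c.P @ H.T @ np.linalg.inv(S)       # Kalman gain
                w = p_detect * c.w * gaussian_likelihood(z, z_pred, S)
                m = c.m + K @ (z - z_pred)
                P = (np.eye(len(c.m)) - K @ H) @ c.P
                per_meas.append(Component(w, m, P, c.identity))
            norm = clutter + sum(c.w for c in per_meas)
            for c in per_meas:
                c.w /= norm                            # normalize against clutter
            updated.extend(per_meas)
        return updated

A full filter would also prune negligible-weight components, merge nearby ones, and inject birth components each step; the identity tag on each component is what lets extracted tracks keep consistent labels through occlusions and missed detections.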

Key Points
  • Achieves 95.5 IDF1 & 91.4 MOTA on WildTrack, rivaling end-to-end SOTA models while being fully modular.
  • Employs a closed-form, identity-informed GM-PHD filter for tracking, providing principled uncertainty estimates missing in neural approaches.
  • The same tracker core transfers unchanged across datasets (MultiviewX, RadarScenes); only the perception module needs swapping for new sensors (see the interface sketch after this list).
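The modularity in the last key point can be illustrated with a small interface sketch: the perception front-end is the only sensor-specific piece, while the analytical core consumes a sensor-agnostic stream of position-covariance pairs. Class and function names here are hypothetical, not taken from the paper:

    from typing import List, Protocol, Tuple
    import numpy as np

    # A detection is a calibrated (position, covariance) pair on the ground plane.
    Detection = Tuple[np.ndarray, np.ndarray]

    class PerceptionModule(Protocol):
        """The only sensor-specific piece; everything downstream is analytical."""
        def detect(self, raw_frame) -> List[Detection]: ...

    class CameraPerception:
        def detect(self, raw_frame) -> List[Detection]:
            # run an image detector + ground-plane projection (omitted)
            return []

    class RadarPerception:
        def detect(self, raw_frame) -> List[Detection]:
            # cluster radar returns into object hypotheses (omitted)
            return []

    def track_frame(perception: PerceptionModule, frames, components):
        """Analytical core: unchanged whether `perception` is camera- or radar-based."""
        detections = [d for frame in frames for d in perception.detect(frame)]
        # fuse detections per object and run the GM-PHD step (see earlier sketches)
        return components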

Why It Matters

Enables robust, real-world tracking systems that can generalize across sensor types and layouts without costly retraining, crucial for autonomous vehicles and security.