Open-source pipeline uses ViT for fine-grained vehicle classification at 94% accuracy
New two-stage pipeline classifies 6 vehicle types with 94% accuracy and abstains when uncertain.
A new open-source computer vision pipeline addresses a gap in road safety research: automated classification of vehicles into body types that correlate with cyclist injury severity in overtaking crashes. Current object detection systems distinguish only coarse categories (car, truck, bus), while fine-grained models often fail in real-world deployment because of domain shift. The proposed pipeline works in two stages. A pre-trained RT-DETR detector first localizes vehicles, then a fine-tuned Vision Transformer (ViT-Base/16) classifies each one into one of six injury-risk-relevant categories: passenger car, SUV, pickup truck, minivan, large van, and commercial truck.
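The detect-then-classify structure described above can be sketched roughly as follows. This is a minimal illustration, not the released code: `run_pipeline`, `crop`, and `VehicleResult` are hypothetical names, the frame is modeled as a plain list of pixel rows, and the `detect` and `classify` callables are stand-ins for the RT-DETR detector and the fine-tuned ViT.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: boxes are (x1, y1, x2, y2) pixel coordinates, end-exclusive.
Box = Tuple[int, int, int, int]
Frame = Sequence[Sequence[int]]  # rows of pixel values (toy representation)

# The six body-type categories named in the article.
CLASSES = [
    "passenger car", "SUV", "pickup truck",
    "minivan", "large van", "commercial truck",
]


@dataclass
class VehicleResult:
    box: Box
    label: str


def crop(frame: Frame, box: Box) -> List[List[int]]:
    """Cut the detected region out of the frame."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in frame[y1:y2]]


def run_pipeline(
    frame: Frame,
    detect: Callable[[Frame], List[Box]],           # stage 1 stand-in (detector)
    classify: Callable[[List[List[int]]], str],     # stage 2 stand-in (classifier)
) -> List[VehicleResult]:
    """Stage 1 localizes vehicles; stage 2 labels each cropped vehicle."""
    results = []
    for box in detect(frame):
        label = classify(crop(frame, box))
        results.append(VehicleResult(box=box, label=label))
    return results
```

Keeping the two stages behind separate callables means either model could be swapped or retrained independently, which is one plausible reason to prefer this layout over a single end-to-end detector with fine-grained classes.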
To improve robustness, the pipeline includes a confidence-based abstention mechanism: when the ViT's top softmax confidence falls below 0.60, the system outputs 'unknown' instead of forcing a potentially wrong classification. Evaluated on 3,805 annotated overtaking events from a bicycle lane corridor in Ann Arbor, Michigan, the pipeline achieved 0.94 accuracy, with per-class F1 scores ranging from 0.91 (minivan) to 0.97 (SUV). On an independent out-of-distribution dataset (311 events from an open cycling dataset, evaluated without retraining), accuracy remained strong at 0.89, and three of the four well-represented categories maintained F1 ≥ 0.90. The largest degradation was for minivan, whose F1 dropped to 0.72. That drop was driven by increased abstention (from 2.4% to 25.0%) rather than active misclassification, which suggests the mechanism behaves as intended under domain shift. All code, model weights, and evaluation utilities are released as open-source software to support reproducibility and adoption in roadside video archives and cyclist safety research.
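The abstention rule can be illustrated with a short sketch. It assumes the rule compares the largest softmax probability against the 0.60 threshold, which is the usual reading of a confidence threshold but is not confirmed in detail here. The function names and the example logits are hypothetical, and the class order is chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Class order is an assumption made for this example.
CLASSES = [
    "passenger car", "SUV", "pickup truck",
    "minivan", "large van", "commercial truck",
]
THRESHOLD = 0.60  # abstention threshold reported in the article


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw classifier scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_abstain(
    logits: Sequence[float], threshold: float = THRESHOLD
) -> Tuple[str, float]:
    """Return (label, confidence); label is 'unknown' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence < threshold:
        return "unknown", confidence
    return CLASSES[best], confidence


# One clearly dominant score: the prediction is kept.
print(classify_or_abstain([0.0, 4.0, 0.0, 0.0, 0.0, 0.0]))
# Two competing scores (minivan vs. large van): the system abstains.
print(classify_or_abstain([0.0, 0.0, 0.0, 2.0, 1.5, 0.0]))
```

In the second call the top probability is well under 0.60 because two classes compete, so the result is 'unknown' rather than a forced minivan or large van label. This is the kind of case that raises the abstention rate without adding misclassifications.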
- Combines RT-DETR for detection and ViT-Base/16 for classification into 6 vehicle body types
- Achieves 0.94 accuracy in-distribution (F1: 0.91-0.97) and 0.89 out-of-distribution
- Confidence-based abstention (threshold 0.60) guards against silent misclassifications; the out-of-distribution drop in minivan F1 stems from increased abstention rather than wrong labels
Why It Matters
Open-source, robust classification of vehicle types enables automated cyclist safety analysis across diverse recording sites.