ViTs for Action Classification in Videos: An Approach to Risky Tackle Detection in American Football Practice Videos

arXiv cs.CV April 03, 2026

A Vision Transformer trained on 733 annotated practice clips flags risky tackles so coaches can correct technique before injuries happen.

Deep Dive

A team of researchers has published a method for automatically detecting dangerous tackles in American football practice footage, with the aim of improving player safety. The work, by Syed Ahsan Masud Zaidi, William Hsu, and Scott Dietrich, uses a Vision Transformer (ViT), a type of neural network well suited to image and video analysis, to classify actions in practice video. A key contribution is a substantially larger annotated dataset of 733 clips, each showing a single athlete tackling a dummy and each labeled with components from the Standardized Assessment for Tackling Technique (SATT-3). At 733 clips, the dataset is more than four times the size of the 178-clip set used in prior work (733 / 178 ≈ 4.1), providing a more robust foundation for training.

The model focuses on the "risky" tackle class, a rare but safety-critical event. To handle the resulting class imbalance, the researchers used imbalance-aware training techniques. Under cross-validation, the system achieves a risky recall of 0.67 and a risky F1 score of 0.59. That is an improvement of more than 8 percentage points in risky recall over previous baselines, which were evaluated on the smaller dataset. The paper, accepted to the 28th International Conference on Pattern Recognition (ICPR 2026), indicates that transformer-based video analysis can identify safety-critical patterns, although a recall of 0.67 means roughly one in three risky tackles is still missed.
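To make the imbalance and metric terms above concrete, here is a minimal sketch in plain Python. It is not the authors' code: the summary only says "imbalance-aware training techniques", so inverse-frequency class weighting is an assumed example of one common option. The function names, the "safe" label, and the toy labels are all hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * n_c), so rarer classes count more.

    Assumption: this is one common imbalance-aware scheme, not necessarily
    the one used in the paper.
    """
    counts = Counter(labels)
    total, num_classes = len(labels), len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


def risky_metrics(y_true, y_pred, positive="risky"):
    """Return (precision, recall, F1) for the positive class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels invented for illustration: 3 risky clips out of 12.
# "safe" is a hypothetical name for the non-risky class.
y_true = ["risky"] * 3 + ["safe"] * 9
y_pred = ["risky", "risky", "safe"] + ["risky", "risky"] + ["safe"] * 7

weights = inverse_frequency_weights(y_true)
precision, recall, f1 = risky_metrics(y_true, y_pred)
print(weights)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the rare class gets a weight of 2.0 against about 0.67 for the common class, which is the kind of factor a weighted loss would apply per clip. The same F1 identity can be rearranged to precision = F1 × recall / (2 × recall − F1). If the reported 0.67 recall and 0.59 F1 came from a single confusion matrix, that would imply a risky precision of about 0.53. Metrics averaged across cross-validation folds need not satisfy the identity exactly.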

This research provides a concrete pathway toward deploying AI as a coach-centered injury prevention tool. By processing practice video in near real-time, the system could flag potentially hazardous techniques as they occur, allowing for immediate corrective feedback. This moves player safety from retrospective review to proactive, data-driven intervention, potentially reducing the incidence of concussions and other contact injuries in sports.

Key Points

The model is built on a Vision Transformer (ViT) architecture and trained on a new dataset of 733 annotated tackle clips, roughly four times the 178 clips used in prior work.
It achieves a risky recall of 0.67 and a risky F1 score of 0.59, improving risky recall by more than 8 percentage points over previous baselines.
The system is designed for practical deployment as a coach's tool, enabling timely intervention to prevent injuries during practice sessions.

Why It Matters

It points toward shifting player safety from reactive to proactive, using AI to give coaches near real-time feedback that could help prevent serious injuries in contact sports.
