Subjective Portrait Region Cropping in Landscape Videos with Temporal Annotation Smoothing
Researchers built the largest subjective video cropping dataset to preserve meaning in mobile-friendly formats.
Researchers from the University of Texas at Austin and Google have released the LIVE-YT VC database, the largest publicly available subjective video portrait-region cropping dataset, containing 1,800 videos annotated by 90 human subjects. Sourced from the YouTube-UGC and LSVQ databases, the resource addresses a critical gap: enabling AI to intelligently crop landscape videos to portrait or other aspect ratios without distorting content or losing meaning. The team also introduced LIVE-YT VC++, a post-processed version that applies a novel temporal filter to smooth the subjective annotations across frames, reducing noise in the human labels.
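The paper's exact filter design is not specified here; as a minimal sketch, assuming each subject's per-frame crop windows are stored as (x, y, w, h) rows, a simple moving-average temporal filter could smooth the annotations like this (the window length and function name are illustrative, not the authors' method):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def smooth_crop_annotations(boxes: np.ndarray, window: int = 9) -> np.ndarray:
    """Temporally smooth per-frame crop-window annotations.

    boxes: array of shape (num_frames, 4) holding (x, y, w, h) per frame,
           e.g. one subject's portrait-region annotation over a clip.
    window: moving-average length in frames (hypothetical default).
    Returns an array of the same shape with each coordinate smoothed
    along the time axis, reducing frame-to-frame labeling jitter.
    """
    # Filter each box coordinate independently along the frame (time) axis;
    # "nearest" padding avoids shrinking the boxes at clip boundaries.
    return uniform_filter1d(boxes.astype(float), size=window, axis=0, mode="nearest")
```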
To validate the dataset, the researchers deployed the SmartVidCrop algorithm and fine-tuned state-of-the-art video grounding models on the aspect-ratio change task, demonstrating significant improvements in preserving visual quality and semantic intent. The labels bear similarities to video saliency annotations, prompting an additional analysis of their overlap. The project is under review at IEEE Transactions on Image Processing, and the code, models, and dataset will be open-sourced, providing a benchmark for future research on mobile-friendly video transformation.
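The overlap analysis itself is not detailed above; one plausible per-frame measure, assuming a rectangular crop window and a non-negative saliency map, is the fraction of saliency mass the crop retains. The function below is a hypothetical illustration, not the paper's metric:

```python
import numpy as np

def saliency_coverage(saliency: np.ndarray, box: tuple[int, int, int, int]) -> float:
    """Fraction of total saliency mass captured by a crop window.

    saliency: (H, W) non-negative saliency map for one frame.
    box: crop window as (x, y, w, h) in pixel coordinates.
    Returns a value in [0, 1]; higher means the crop retains more
    of the frame's predicted-salient content.
    """
    x, y, w, h = box
    total = saliency.sum()
    if total == 0:
        return 0.0  # no saliency mass anywhere; define coverage as zero
    inside = saliency[y:y + h, x:x + w].sum()
    return float(inside / total)
```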
- LIVE-YT VC database: 1,800 videos annotated by 90 human subjects for portrait region cropping
- LIVE-YT VC++ applies a temporal filter to smooth annotations across video frames
- Fine-tuned video grounding models and the SmartVidCrop algorithm showed improved aspect-ratio transformation (see the sketch after this list)
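For concreteness, the aspect-ratio change task amounts to selecting a portrait window per frame. A minimal sketch, assuming a landscape input and a precomputed horizontal crop center `cx` (e.g., from the smoothed annotations above; both the function and its defaults are assumptions):

```python
import numpy as np

def portrait_crop(frame: np.ndarray, cx: int) -> np.ndarray:
    """Crop a landscape frame to a full-height 9:16 portrait window.

    frame: (H, W, 3) landscape frame, e.g. 1080x1920.
    cx: desired horizontal center of the crop, clamped so the
        window stays entirely inside the frame.
    """
    h, w = frame.shape[:2]
    crop_w = int(h * 9 / 16)                       # 9:16 width at full height
    left = min(max(cx - crop_w // 2, 0), w - crop_w)
    return frame[:, left:left + crop_w]
```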
Why It Matters
Enables AI to crop landscape videos to portrait mode without distorting content, improving mobile viewing experiences.