Image & Video

Goal-Oriented Framework for Optical Flow-based Multi-User Multi-Task Video Transmission

A new AI framework transmits only essential motion data, cutting bandwidth needs and boosting video quality by 13.5%.

Deep Dive

A team of researchers from institutions including the University of New South Wales and Friedrich-Alexander-Universität Erlangen-Nürnberg has introduced a novel framework called OF-GSC (Optical Flow-based Goal-oriented Semantic Communication). This system fundamentally rethinks how video is transmitted for AI tasks by focusing on sending only the semantically important information—specifically, motion data extracted via optical flow—rather than raw pixel data. At its core, a semantic encoder identifies and selects patch-level motion representations, while a transformer-based decoder reconstructs high-quality video or performs classification. This approach drastically reduces the data burden on wireless networks.
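As a rough illustration of the patch-level selection idea (not the paper's actual implementation), the encoder's "keep only the motion that matters" step can be sketched in a few lines of NumPy. The optical-flow field is assumed to be given here (in the real system it would come from a learned flow estimator), and the patch size and keep ratio are invented for the example:

```python
import numpy as np

def select_motion_patches(flow, patch=16, keep_ratio=0.25):
    """Keep only the patches with the strongest motion.

    flow: (H, W, 2) optical-flow field (dx, dy per pixel), assumed
    already estimated. Returns the indices of the kept patches and
    the per-patch motion scores.
    """
    H, W, _ = flow.shape
    mag = np.linalg.norm(flow, axis=-1)  # per-pixel motion magnitude
    # Average magnitude over non-overlapping patch-size blocks.
    ph, pw = H // patch, W // patch
    patch_mag = mag[:ph * patch, :pw * patch] \
        .reshape(ph, patch, pw, patch).mean(axis=(1, 3))
    scores = patch_mag.ravel()
    k = max(1, int(keep_ratio * scores.size))
    keep = np.argsort(scores)[-k:]  # top-k most dynamic patches
    return keep, scores

# Toy frame: motion concentrated in one 16x16 region.
flow = np.zeros((64, 64, 2))
flow[:16, :16, 0] = 1.0  # a single moving patch
keep, scores = select_motion_patches(flow, patch=16, keep_ratio=0.25)
print(keep)  # the moving patch (index 0) is among those transmitted
```

Only the selected patches' motion representations would then be encoded and sent, which is where the bandwidth savings come from.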

In practice, OF-GSC delivers substantial gains. In video reconstruction tasks, it improves the Structural Similarity Index Measure (SSIM) by 13.47% over the DeepJSCC baseline, meaning received videos are of significantly higher quality. For AI-driven video classification, it achieves a Top-1 accuracy slightly surpassing the powerful VideoMAE model while requiring only 25% of the data under the same conditions. Furthermore, the team developed a Deep Deterministic Policy Gradient (DDPG) algorithm to dynamically allocate bandwidth among multiple users, reducing the maximum transmission time by 25.97% compared to a simple equal-bandwidth scheme. This makes the system efficient for real-world, multi-user scenarios like surveillance or teleoperation.
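To see why learned bandwidth allocation beats an equal split, consider the objective the DDPG agent optimizes: minimizing the slowest user's transmission time. The sketch below is a simplified stand-in, not the paper's channel model; the Shannon-rate link, payload sizes, and SNR are all invented, and the trained agent is approximated by a size-proportional allocation:

```python
import numpy as np

def transmission_times(data_bits, bw_fractions, total_bw_hz, snr_db):
    """Per-user transmission time under a simple Shannon-rate link model.

    Each user gets a fraction of the total bandwidth and sends its
    semantic payload at the resulting capacity.
    """
    snr = 10 ** (snr_db / 10)
    rates = bw_fractions * total_bw_hz * np.log2(1 + snr)  # bits/s per user
    return data_bits / rates

# Three users with unequal payloads (bits) and two allocation policies.
data = np.array([8e6, 2e6, 4e6])
equal = np.full(3, 1 / 3)            # equal-bandwidth baseline
proportional = data / data.sum()     # size-proportional split, roughly what
                                     # a max-time-minimizing agent converges to
t_eq = transmission_times(data, equal, total_bw_hz=20e6, snr_db=10)
t_pr = transmission_times(data, proportional, total_bw_hz=20e6, snr_db=10)
print(t_eq.max(), t_pr.max())  # the proportional split lowers the bottleneck time
```

The DDPG agent's reward would be the negative of that maximum time, so it learns to shift bandwidth toward users with heavier payloads or worse channels.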

Key Points
  • Cuts required data for video classification by 75% while slightly surpassing VideoMAE's Top-1 accuracy.
  • Improves reconstructed video quality (SSIM score) by 13.47% over the DeepJSCC method.
  • Uses a DDPG AI agent to allocate bandwidth, reducing max transmission time by 25.97%.

Why It Matters

Enables high-quality, low-latency video for drones, AR/VR, and remote operations over congested wireless networks.