Research & Papers

CoStream: Codec-Guided Resource-Efficient System for Video Streaming Analytics

A new research system cuts GPU compute by up to 87% for video analytics by repurposing video compression metadata.

Deep Dive

A research team including Yulin Zou, Francisco Romero, and Dmitrii Ustiugov has introduced CoStream, a system designed to reduce the computational cost of running vision-language models on live video streams. The core idea is to reuse metadata that standard video codecs (such as H.264/AVC or H.265/HEVC) already generate during compression. This metadata, which describes temporal and spatial redundancy in the stream, serves as an essentially free signal to guide the AI inference pipeline. The approach avoids the cost of methods that rely on offline profiling, model training, or complex online computation just to identify which parts of a video stream are redundant.
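To make the idea of codec metadata as a redundancy signal concrete, here is a minimal Python sketch. It is not CoStream's implementation: this summary does not say which metadata fields the system reads, so the `BlockMeta` fields (a skip flag, a motion vector, and a residual size), the `max_residual_bits` threshold, and the function names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockMeta:
    """Hypothetical per-block metadata a decoder might expose (assumed fields)."""
    skipped: bool             # encoder signalled "copy this block from the reference frame"
    motion: Tuple[int, int]   # motion vector (dx, dy) in pixels
    residual_bits: int        # bits spent coding the prediction residual


def is_redundant(block: BlockMeta, max_residual_bits: int = 16) -> bool:
    """Assumed rule: a block is unchanged if it was skipped, or if it has
    zero motion and almost no residual."""
    if block.skipped:
        return True
    return block.motion == (0, 0) and block.residual_bits <= max_residual_bits


def changed_mask(frame_meta: List[List[BlockMeta]]) -> List[List[bool]]:
    """Return a grid of booleans: True where the block needs fresh inference."""
    return [[not is_redundant(b) for b in row] for row in frame_meta]


# A toy 2x2 frame: one skipped block, one moving block,
# one nearly static block, and one static block with a large residual.
frame = [
    [BlockMeta(True, (0, 0), 0), BlockMeta(False, (3, -1), 240)],
    [BlockMeta(False, (0, 0), 8), BlockMeta(False, (0, 0), 96)],
]
mask = changed_mask(frame)
```

In this toy frame only the right-hand column is flagged as changed. The mask costs a few comparisons per block because the encoder has already done the motion search.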

CoStream uses this codec guidance in two ways. First, it performs 'patch pruning' before feeding data into a Vision Transformer (ViT), skipping visual patches that have not changed. Second, it enables 'selective key-value cache refresh' during the LLM prefilling stage, avoiding redundant processing of unchanged visual context. Operating directly on compressed bitstreams also reduces data transmission needs. In experiments, the system improved throughput by up to 3x and reduced GPU compute by up to 87% compared to state-of-the-art baselines, with an F1 score drop of 0-8%.
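The two mechanisms can be illustrated with a toy Python sketch. It is a simplification under stated assumptions, not the system's actual code: patches are plain strings, the "cache" is a dictionary keyed by patch position rather than per-layer key and value tensors, and `toy_encode` is a hypothetical stand-in for the ViT and prefill work. Real cache entries also depend on earlier context, which this sketch ignores.

```python
from typing import Callable, Dict, List


def prune_patches(patches: List[str], changed: List[bool]) -> Dict[int, str]:
    """Keep only patches flagged as changed, remembering their original positions."""
    return {i: p for i, (p, c) in enumerate(zip(patches, changed)) if c}


def refresh_cache(
    cache: Dict[int, str],
    kept: Dict[int, str],
    encode: Callable[[str], str],
) -> int:
    """Recompute cache entries only at the kept positions.

    Entries for unchanged positions are reused as-is.
    Returns the number of entries that were recomputed.
    """
    for pos, patch in kept.items():
        cache[pos] = encode(patch)
    return len(kept)


def toy_encode(patch: str) -> str:
    """Hypothetical stand-in for the expensive model computation."""
    return patch.upper()


# Previous frame: every patch is encoded once and cached.
prev_patches = ["sky", "road", "car", "tree"]
cache = {i: toy_encode(p) for i, p in enumerate(prev_patches)}

# Current frame: the codec-derived mask (assumed given) says only patch 2 changed.
curr_patches = ["sky", "road", "bus", "tree"]
changed = [False, False, True, False]

kept = prune_patches(curr_patches, changed)
recomputed = refresh_cache(cache, kept, toy_encode)
```

Here one of four patches is re-encoded and the other three cache entries are reused, which is the kind of saving the reported compute reduction depends on when most of a scene is static.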

Key Points
  • Uses existing video codec metadata as a free signal to identify temporal/spatial redundancy, eliminating costly profiling overhead.
  • Achieves up to 3x throughput improvement and up to 87% GPU compute reduction for end-to-end video analytics pipelines.
  • Maintains competitive accuracy with only a 0-8% F1 score drop, enabling efficient real-time multimodal AI on video streams.

Why It Matters

Lowers the cost and infrastructure barrier for real-time AI video analysis in areas such as security, retail, and autonomous systems.