Image & Video

Generalizable 3D Gaussian Splatting enabled Semantic Coding for Real-Time Immersive Video Communications

Real-time 3D telepresence gets a unified codec that slashes redundancy and boosts compression.

Deep Dive

A new framework called GS-SCNet, proposed by researchers Dingxi Yang, Wenqi Guo, Yue Liu, Jungong Han, and Zhijin Qin, tackles two linked problems in 3D telepresence: real-time dynamic scene reconstruction and efficient data transmission. Conventional approaches decouple multi-view video coding from 3D reconstruction, which leads to suboptimal compression and high computational overhead. GS-SCNet instead integrates both into a single end-to-end pipeline. The system is built on two components: a Disparity-Guided Parallel Semantic Codec that uses epipolar geometry for cross-view interaction and real-time stereo processing, and a Lightweight Gaussian Parameter Predictor that maps semantic latents directly to 3D Gaussian attributes, bypassing intermediate pixel-domain reconstruction.
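To make the geometry concrete, here is a minimal Python sketch of the standard rectified-stereo relations that a disparity-guided design can build on. It is not GS-SCNet's code. The function names, camera parameters, and sample values are hypothetical, and it assumes a rectified pinhole stereo pair, in which the epipolar line of a pixel is simply its own image row.

```python
def matching_column(x_left, disparity_px):
    """Column of the matching pixel in the right view.

    Assumption: rectified pair, so the match for left pixel (x, y)
    lies on the same row at (x - d, y).
    """
    return x_left - disparity_px


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth along the optical axis for a rectified pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def unproject(x, y, depth, focal_px, cx, cy):
    """Back-project a pixel to a 3D point in the camera frame (pinhole model)."""
    return ((x - cx) * depth / focal_px, (y - cy) * depth / focal_px, depth)


# Hypothetical camera: 1000 px focal length, 0.5 m baseline, principal point (640, 360).
x_right = matching_column(x_left=740.0, disparity_px=250.0)
depth = disparity_to_depth(disparity_px=250.0, focal_px=1000.0, baseline_m=0.5)
center = unproject(x=740.0, y=360.0, depth=depth, focal_px=1000.0, cx=640.0, cy=360.0)
```

The same disparity value serves two purposes in this sketch: it tells a codec where to look in the other view, and it fixes the depth at which a 3D primitive could be placed.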

Because the two components are coupled, GS-SCNet extracts geometric correlations only once, which eliminates redundant computation. Reported evaluations on synthetic and real-world human datasets show a favorable trade-off among compression efficiency, rendering quality, and real-time performance. The framework also generalizes across domains and remains robust to compression artifacts on out-of-domain data, where it outperforms conventional decoupled transmission pipelines. The work is currently under review and is positioned as a step toward practical, high-fidelity immersive video communications.
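What "3D Gaussian attributes" means in practice can be sketched as follows. A predictor head typically emits unconstrained numbers, which are then squashed into valid ranges. The layout below (3 scale values, 4 quaternion values, 1 opacity, 3 color channels) and the activations (exponential, normalization, sigmoid) follow common 3D Gaussian Splatting conventions and are assumptions here, not details confirmed for GS-SCNet. The function name `to_gaussian_attributes` is hypothetical.

```python
import math


def sigmoid(v):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def to_gaussian_attributes(raw):
    """Turn 11 unconstrained floats into valid per-Gaussian attributes.

    Assumed layout: raw[0:3] scale, raw[3:7] rotation quaternion,
    raw[7] opacity, raw[8:11] RGB color.
    """
    if len(raw) != 11:
        raise ValueError("expected 11 raw values")
    # Scales must be positive, so exponentiate.
    scale = [math.exp(v) for v in raw[0:3]]
    # Rotations must be unit quaternions, so normalize.
    quat = raw[3:7]
    norm = math.sqrt(sum(q * q for q in quat))
    rotation = [1.0, 0.0, 0.0, 0.0] if norm == 0.0 else [q / norm for q in quat]
    # Opacity and color live in (0, 1).
    opacity = sigmoid(raw[7])
    color = [sigmoid(v) for v in raw[8:11]]
    return {"scale": scale, "rotation": rotation, "opacity": opacity, "color": color}


# Hypothetical raw output for a single Gaussian.
attrs = to_gaussian_attributes([0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, -1.0, 0.0, 1.0])
```

The Gaussian's center is not part of this vector. In a disparity-based design it can come from back-projected depth, which is one way a pipeline could avoid reconstructing pixels before building the 3D representation.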

Key Points
  • GS-SCNet is presented as the first end-to-end framework unifying generalizable 3D Gaussian Splatting with deep semantic coding for real-time immersive video.
  • It introduces a Disparity-Guided Parallel Semantic Codec for real-time stereo processing and a Lightweight Gaussian Parameter Predictor that skips pixel-domain reconstruction.
  • Tests on synthetic and real-world human datasets show better compression efficiency, rendering quality, and cross-domain robustness than decoupled systems.

Why It Matters

This framework could make high-fidelity 3D telepresence practical by lowering bandwidth requirements while keeping real-time performance.