Two-View Accumulation as the Primary Training Lever for Hybrid-Capture Gaussian Splatting: A Variance-Decomposition View of When Gradient Surgery Helps
Rendering two views per step outperforms GradNorm, confidence gating, and active pairing. Here's why.
Sungjun Cho's latest paper tackles a persistent problem in hybrid-capture novel view synthesis: combining images from vastly different camera distances (e.g., aerial drone and ground-level views) causes standard 3D Gaussian Splatting (3DGS) to under-fit the minority regime by 1–3 dB on all five benchmarks tested. The author systematically compares compute-matched alternatives: vanilla training extended to 60K iterations, GradNorm, direction-aware near/far gradient surgery, projective preconditioning, confidence-gated sample-level surgery, and a random two-view-per-step control. The winner is the simplest: rendering two views per optimizer step and accumulating their gradients.
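The structural change can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the per-view quadratic loss, the `views` list, and the helper names are stand-ins for rendering a view and backpropagating its photometric loss.

```python
import random

def render_loss_grad(theta, view_target):
    # Stand-in for "render this view and backprop": gradient of the
    # per-view quadratic loss 0.5 * (theta - view_target)**2.
    return theta - view_target

def train(views, steps, views_per_step, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        # The one structural change: sample views_per_step views,
        # accumulate their gradients, and average before the update.
        batch = [rng.choice(views) for _ in range(views_per_step)]
        grad = sum(render_loss_grad(theta, v) for v in batch) / len(batch)
        theta -= lr * grad
    return theta

# Hybrid capture stand-in: "near" targets near 1.0, "far" targets near 3.0.
views = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
theta_two = train(views, steps=500, views_per_step=2)
```

With `views_per_step=2`, each update averages two per-view gradients, which halves the per-step gradient variance relative to `views_per_step=1` at the same learning rate; everything else about the optimizer is unchanged.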
The key insight? The pairing rule (geometry-defined near/far, random, or active loss-disparity) does not change PSNR beyond seed variance on any scene; what matters is the structural change of having two views per step. Cho proposes a variance-decomposition framework that explains why: under bimodal camera regimes, between-regime gradient variance is small relative to within-regime variance in 3DGS. Structured and random pairings are therefore variance-equivalent, and the variance halving from two-view accumulation itself becomes the dominant effect. The finding transfers cleanly to Scaffold-GS and Pixel-GS backbones, and the paper candidly reports that direction-aware projection, magnitude correction, confidence gating, and active loss-disparity pairing all fall within seed variance of random two-view pairing. The result is a clear, minimal training-side lever for improving hybrid-capture 3DGS.
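The variance argument follows the law of total variance: total gradient variance splits into within-regime variance plus between-regime variance, and when the latter term is small, a structured near/far pair and a random pair have nearly the same variance, while averaging any two samples halves it. A numerical check of that claim, with illustrative numbers chosen so that within-regime spread dominates (these are not the paper's measurements):

```python
import random

def sample_grad(rng, regime_means, within_std):
    # Pick a regime (near or far) and add within-regime noise.
    mu = rng.choice(regime_means)
    return mu + rng.gauss(0.0, within_std)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
means = [-0.1, 0.1]   # small between-regime gap (illustrative)
within_std = 2.0      # large within-regime spread (illustrative)
n = 200_000

# Single-view gradient, random two-view pair, and a structured pair
# that always takes one sample from each regime.
single = [sample_grad(rng, means, within_std) for _ in range(n)]
random_pair = [(sample_grad(rng, means, within_std)
                + sample_grad(rng, means, within_std)) / 2 for _ in range(n)]
structured = [(means[0] + rng.gauss(0.0, within_std)
               + means[1] + rng.gauss(0.0, within_std)) / 2 for _ in range(n)]

v1 = variance(single)
vr = variance(random_pair)
vs = variance(structured)
# When within-regime variance dominates: vr ≈ vs ≈ v1 / 2.
```

Structured pairing eliminates only the between-regime term, which is negligible here, so its advantage over random pairing vanishes; the factor-of-two reduction from averaging is the effect that survives.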
- Two-view accumulation improves PSNR by 1–3 dB over standard 30K-iteration training on hybrid-capture benchmarks.
- The pairing rule (near/far, random, or active) has no effect beyond seed variance on any scene tested.
- A variance-decomposition framework shows within-regime variance dominates, making two-view halving the primary lever.
Why It Matters
Simplifies 3DGS training for hybrid-capture scenes, enabling high-quality novel view synthesis with just a two-view-per-step change.