Towards Controllable Video Synthesis of Routine and Rare OR Events
Researchers' diffusion model creates synthetic OR footage to train safety AI, achieving 70% recall for detecting near-misses.
A multi-institutional research team has developed a novel AI framework for generating realistic, controllable videos of surgical procedures, specifically targeting the critical challenge of data scarcity for rare and safety-critical operating room events. The work, titled 'Towards Controllable Video Synthesis of Routine and Rare OR Events,' presents a specialized video diffusion model that can synthesize both standard workflows and counterfactual 'near-miss' scenarios, such as violations of the sterile field. This capability addresses a major bottleneck in developing ambient intelligence for surgical safety, as curating real-world datasets of rare, dangerous events is both operationally difficult and ethically fraught. The work was accepted for presentation at IPCAI 2026.
The technical approach integrates three core components: a geometric abstraction module that converts OR scenes into simplified representations, a conditioning module that guides the synthesis, and a fine-tuned diffusion model that generates the final video. The system outperformed off-the-shelf video diffusion baselines on standard video quality metrics, including Fréchet Video Distance (FVD) and the Structural Similarity Index (SSIM). Crucially, the researchers used the framework to create a synthetic dataset, which was then used to train an AI model for detecting safety violations. This model achieved a recall of 70.13%, demonstrating the synthetic data's utility for developing real-world safety tools. The work paves the way for generating vast, controlled datasets to train robust AI assistants for the operating room without compromising patient privacy or safety.
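The three-stage flow described above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: all function names, signatures, and data structures here are hypothetical stand-ins for the geometric abstraction, conditioning, and diffusion components.

```python
# Hypothetical sketch of the three-stage synthesis pipeline.
# None of these names come from the paper; they only illustrate the data flow:
# raw OR scene -> geometric abstraction -> conditioning signal -> synthesized video.

def geometric_abstraction(scene):
    """Reduce an OR scene to a simplified geometric representation
    (here: one labeled placeholder box per detected entity)."""
    return [{"label": entity, "box": (0, 0, 1, 1)} for entity in scene["entities"]]

def condition(abstraction, event="routine"):
    """Attach an event label (e.g. a counterfactual 'near-miss' tag)
    to guide the synthesis toward routine or rare scenarios."""
    return {"scene": abstraction, "event": event}

def diffusion_synthesize(conditioning, num_frames=4):
    """Stand-in for the fine-tuned video diffusion model; returns a
    placeholder 'video' as a list of frame tags."""
    return [conditioning["event"]] * num_frames

scene = {"entities": ["surgeon", "scrub_nurse", "instrument_table"]}
video = diffusion_synthesize(
    condition(geometric_abstraction(scene), event="sterile_field_violation")
)
print(len(video), video[0])  # → 4 sterile_field_violation
```

The key design point the sketch captures is controllability: the event label is injected at the conditioning stage, so the same abstracted scene can yield either a routine workflow or a rare safety-critical variant.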
- Framework uses a geometric abstraction and conditioning module to guide a fine-tuned video diffusion model for controllable synthesis.
- Outperforms standard video diffusion models, achieving lower FVD/LPIPS and higher SSIM/PSNR scores for in- and out-of-domain data.
- AI model trained on the synthetic data achieved 70.13% recall for detecting near-miss safety-critical events like sterile-field violations.
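To make the headline metric concrete: recall is the fraction of actual safety events the detector catches, recall = TP / (TP + FN). The counts below are hypothetical, chosen only to show how a recall of roughly 70.13% could arise; the paper's actual event counts are not given in this summary.

```python
# Recall = true positives / (true positives + false negatives).
# A recall of ~70.13% means roughly 7 in 10 real safety events are flagged.
# The counts 108 and 46 are illustrative, not from the paper.

def recall(true_positives, false_negatives):
    return true_positives / (true_positives + false_negatives)

r = recall(true_positives=108, false_negatives=46)
print(f"{r:.2%}")  # → 70.13%
```

For a safety application, recall is the natural metric to emphasize: a missed sterile-field violation (false negative) is far costlier than a spurious alert.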
Why It Matters
Enables creation of ethical, scalable training data for surgical AI, accelerating development of safety systems that can prevent rare but critical errors.