InsEdit: Towards Instruction-based Visual Editing via Data-Efficient Video Diffusion Models Adaptation
New model edits videos starting mid-clip using only ~100K training samples and outperforms other open-source methods on editing benchmarks.
A research team from Tencent and the Hong Kong University of Science and Technology has introduced InsEdit, a new AI model that transforms video generation backbones into powerful video editors using minimal training data. Built on Tencent's HunyuanVideo-1.5 foundation, InsEdit addresses the critical data scarcity problem in video editing by requiring only approximately 100,000 video editing samples, orders of magnitude fewer than typical approaches. The breakthrough comes from its Mutual Context Attention (MCA) architecture, which creates precisely aligned video pairs in which edits can begin at any frame rather than only the first, enabling more natural and flexible editing workflows.
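The article does not reproduce the paper's exact attention formulation, but a minimal sketch can make the idea concrete. Assuming MCA amounts to joint self-attention over the concatenated token sequences of the source clip and the clip being edited, so that context flows both ways between the aligned pair, a sketch might look like this (the class name, dimensions, and residual wiring are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class MutualContextAttention(nn.Module):
    """Hypothetical sketch of Mutual Context Attention (MCA).

    Assumption: MCA lets edited-clip tokens attend over the
    concatenation of edited-clip and source-clip tokens (and vice
    versa), so context flows mutually between the aligned video pair.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, edit_tokens: torch.Tensor, src_tokens: torch.Tensor):
        # Joint sequence: each stream can read context from the other.
        joint = torch.cat([edit_tokens, src_tokens], dim=1)
        q = self.norm(joint)
        out, _ = self.attn(q, q, q)  # self-attention over the joint sequence
        joint = joint + out          # residual connection
        n = edit_tokens.shape[1]
        return joint[:, :n], joint[:, n:]  # split back into the two streams

# Toy usage: batch of 2 clips, 16 tokens each, 64-dim features.
edit = torch.randn(2, 16, 64)
src = torch.randn(2, 16, 64)
mca = MutualContextAttention(dim=64)
edited_out, source_out = mca(edit, src)
print(edited_out.shape, source_out.shape)  # torch.Size([2, 16, 64]) each
```

Because every edited-clip token sees every source-clip token in the joint attention, conditioning is not tied to the first frame, which is consistent with the mid-clip editing capability described above.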
This data-efficient adaptation lets InsEdit achieve state-of-the-art performance among open-source models on instruction-based video editing benchmarks. The training recipe strategically mixes image editing data with the video samples, giving the final model the added ability to perform high-quality image editing with no architectural changes or additional training. This dual functionality makes InsEdit a versatile tool for content creators who need consistent editing across both mediums, and it could lower the barrier to professional-grade video manipulation through simple text instructions.
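As a rough illustration of how such a mixed recipe might be wired, here is a hypothetical sampler that interleaves image-editing pairs with video-editing pairs. The mixing probability and the treatment of images as single-frame clips are assumptions for illustration, not details from the paper:

```python
import random

def sample_training_pair(video_pairs, image_pairs, image_prob=0.3):
    """Draw one editing example from a mixed image/video pool.

    Hypothetical sketch: the article says image editing data is mixed
    into training but not how. `image_prob` and the single-frame-clip
    treatment of images are illustrative assumptions.
    """
    if image_pairs and random.random() < image_prob:
        src_img, tgt_img, instruction = random.choice(image_pairs)
        # Treat the image pair as a one-frame video pair so both data
        # types flow through the same video-editing training path.
        return [src_img], [tgt_img], instruction
    return random.choice(video_pairs)

# Toy usage with placeholder data.
videos = [(["f0", "f1", "f2"], ["g0", "g1", "g2"], "make the sky stormy")]
images = [("cat.png", "cat_hat.png", "add a hat to the cat")]
src, tgt, prompt = sample_training_pair(videos, images)
print(len(src), prompt)
```

Routing images through the same pipeline as one-frame clips would explain why the final model handles image editing without any architectural changes.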
- Built on Tencent's HunyuanVideo-1.5 and requires only ~100K video samples for training, easing the data-scarcity problem
- Uses Mutual Context Attention (MCA) to enable edits starting mid-clip, not just from the first frame
- Achieves SOTA among open-source methods on video editing benchmarks and supports image editing without modification
Why It Matters
Dramatically reduces the data and cost barrier for creating professional AI video editors, enabling more accessible content creation tools.