HECTOR: Hybrid Editable Compositional Object References for Video Generation
Researchers' new pipeline lets users control individual objects' trajectories in generated videos with hybrid image/video references.
A research team from Johns Hopkins University and collaborating institutions has introduced HECTOR (Hybrid Editable Compositional Object References for Video Generation), an AI pipeline that addresses a fundamental limitation in current video generation. While most existing models, such as Runway's Gen-2 or Pika Labs' Pika, synthesize scenes holistically, HECTOR enables explicit compositional manipulation at the object level. Users can treat videos as dynamic compositions of distinct physical objects rather than as monolithic scenes.
HECTOR's key innovation is its support for hybrid reference conditioning, allowing generation to be guided simultaneously by both static images and dynamic videos. Users can explicitly specify the trajectory of each referenced element, controlling its precise location, scale, and speed throughout the generated video. This design enables the model to synthesize coherent videos that satisfy complex spatiotemporal constraints while maintaining high-fidelity adherence to reference materials.
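The paper summarized here does not expose a public API, but the conditioning interface described above can be sketched in plain Python: each referenced object carries a hybrid reference (a static image or a dynamic video) plus a trajectory given as keyframes over normalized time, from which location and scale at any frame can be interpolated. All class and field names below are hypothetical illustrations, not HECTOR's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class Keyframe:
    t: float      # normalized time in [0, 1]
    x: float      # object center, normalized horizontal coordinate
    y: float      # object center, normalized vertical coordinate
    scale: float  # object size relative to the frame

@dataclass
class ObjectReference:
    """Hypothetical per-object conditioning spec (not HECTOR's real API)."""
    name: str
    source: str                      # path to the reference asset
    kind: Literal["image", "video"]  # hybrid conditioning: static or dynamic
    keyframes: List[Keyframe]        # trajectory controlling location/scale/speed

    def at(self, t: float) -> Keyframe:
        """Linearly interpolate the trajectory at time t."""
        kfs = sorted(self.keyframes, key=lambda k: k.t)
        if t <= kfs[0].t:
            return kfs[0]
        for a, b in zip(kfs, kfs[1:]):
            if t <= b.t:
                w = (t - a.t) / (b.t - a.t)
                return Keyframe(t,
                                a.x + w * (b.x - a.x),
                                a.y + w * (b.y - a.y),
                                a.scale + w * (b.scale - a.scale))
        return kfs[-1]

# Example: an image-referenced car moving left to right while growing.
# Keyframe spacing controls speed; closer keyframes mean faster motion.
car = ObjectReference(
    name="car", source="car.png", kind="image",
    keyframes=[Keyframe(0.0, 0.1, 0.5, 0.2),
               Keyframe(1.0, 0.9, 0.5, 0.4)])
mid = car.at(0.5)
print(round(mid.x, 2), round(mid.scale, 2))  # → 0.5 0.3
```

In this sketch, speed falls out of keyframe timing rather than being a separate parameter: moving the same distance over a shorter time interval yields faster on-screen motion.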
The researchers demonstrated through extensive experiments that HECTOR achieves superior visual quality, stronger reference preservation, and improved motion controllability compared to existing approaches. The model represents a significant step toward more controllable and predictable video generation, moving beyond the black-box nature of current systems where users have limited influence over individual elements' behavior within generated scenes.
- Enables object-level control with explicit trajectory specification for each element
- Supports hybrid conditioning using both static images and dynamic videos as references
- Outperforms existing models in visual quality, reference preservation, and motion controllability
Why It Matters
Enables precise, predictable video generation for creative professionals, replacing guesswork with object-level control.