HECTOR: Hybrid Editable Compositional Object References for Video Generation
Researchers' new pipeline lets users control individual objects' trajectories in generated videos with hybrid image/video references.
A research team from Johns Hopkins University and collaborating institutions has introduced HECTOR (Hybrid Editable Compositional Object References for Video Generation), an AI pipeline that addresses a fundamental limitation in current video generation. While most existing models, such as Runway's Gen-2 or Pika Labs' Pika, synthesize scenes holistically, HECTOR enables explicit compositional manipulation at the object level. Users can treat videos as dynamic compositions of distinct physical objects rather than as monolithic scenes.
HECTOR's key innovation is its support for hybrid reference conditioning, allowing generation to be guided simultaneously by both static images and dynamic videos. Users can explicitly specify the trajectory of each referenced element, controlling its precise location, scale, and speed throughout the generated video. This design enables the model to synthesize coherent videos that satisfy complex spatiotemporal constraints while maintaining high-fidelity adherence to reference materials.
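The paper summarized here does not expose a public API, but the conditioning interface described above can be sketched in plain Python: each referenced object carries a hybrid reference (a static image or a dynamic video) plus a trajectory given as keyframes over normalized time, from which location and scale at any frame can be interpolated. All class and field names below are hypothetical illustrations, not HECTOR's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class Keyframe:
    t: float      # normalized time in [0, 1]
    x: float      # object center, normalized horizontal coordinate
    y: float      # object center, normalized vertical coordinate
    scale: float  # object size relative to the frame

@dataclass
class ObjectReference:
    """Hypothetical per-object conditioning spec (not HECTOR's real API)."""
    name: str
    source: str                      # path to the reference asset
    kind: Literal["image", "video"]  # hybrid conditioning: static or dynamic
    keyframes: List[Keyframe]        # trajectory controlling location/scale/speed

    def at(self, t: float) -> Keyframe:
        """Linearly interpolate the trajectory at time t."""
        kfs = sorted(self.keyframes, key=lambda k: k.t)
        if t <= kfs[0].t:
            return kfs[0]
        for a, b in zip(kfs, kfs[1:]):
            if t <= b.t:
                w = (t - a.t) / (b.t - a.t)
                return Keyframe(t,
                                a.x + w * (b.x - a.x),
                                a.y + w * (b.y - a.y),
                                a.scale + w * (b.scale - a.scale))
        return kfs[-1]

# Example: an image-referenced car moving left to right while growing.
# Keyframe spacing controls speed; closer keyframes mean faster motion.
car = ObjectReference(
    name="car", source="car.png", kind="image",
    keyframes=[Keyframe(0.0, 0.1, 0.5, 0.2),
               Keyframe(1.0, 0.9, 0.5, 0.4)])
mid = car.at(0.5)
print(round(mid.x, 2), round(mid.scale, 2))  # → 0.5 0.3
```

In this sketch, speed falls out of keyframe timing rather than being a separate parameter: moving the same distance over a shorter time interval yields faster on-screen motion.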
The researchers demonstrated through extensive experiments that HECTOR achieves superior visual quality, stronger reference preservation, and improved motion controllability compared to existing approaches. The model represents a significant step toward more controllable and predictable video generation, moving beyond the black-box nature of current systems where users have limited influence over individual elements' behavior within generated scenes.
- Enables object-level control with explicit trajectory specification for each element
- Supports hybrid conditioning using both static images and dynamic videos as references
- Outperforms existing models in visual quality, reference preservation, and motion controllability
Why It Matters
Enables precise, predictable video generation for creative professionals, replacing guesswork with object-level control.