OmniWeaving for ComfyUI
An unofficial port brings Tencent's powerful HunyuanVideo 1.5 model to the popular ComfyUI interface, with image and video reference capabilities.
A developer known as ifilipis has successfully ported Tencent's HY-OmniWeaving model to ComfyUI through an unofficial GitHub pull request (#13289). This integration brings Tencent's HunyuanVideo 1.5 capabilities to the popular node-based interface, introducing two specialized nodes: 'HunyuanVideo 15 Omni Conditioning' and 'Text Encode HunyuanVideo 15 Omni.' These nodes enable users to link both images and videos as references for generation tasks, supporting the same multi-modal workflows demonstrated in Tencent's original research.
The port requires users to clone a specific branch and download the model from Hugging Face repositories, and it maintains compatibility with existing HunyuanVideo 1.5 workflows. According to the developer's testing, the model demands significant computational resources, requiring 30-50 generation steps and careful CFG (Classifier-Free Guidance) tuning to produce quality results, even on high-end hardware like the RTX 6000. The developer recommends pairing outputs with LTX upscaling for higher resolution and notes particularly strong performance in first-frame/last-frame (FFLF) generation and video-to-video editing tasks.
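The setup described above would look roughly like the following. This is a sketch, not instructions from the developer: the local branch name, Hugging Face repo id, and target directory are placeholders; only the PR number #13289 comes from the article, and `pull/13289/head` is GitHub's standard read-only ref for fetching a pull request.

```shell
# Fetch the unofficial PR branch into a local ComfyUI checkout.
# "omni-weaving-pr" is an arbitrary local branch name.
cd ComfyUI
git fetch origin pull/13289/head:omni-weaving-pr
git checkout omni-weaving-pr

# Download the model weights from Hugging Face.
# <repo-id> and the destination directory are hypothetical placeholders.
huggingface-cli download <repo-id> \
  --local-dir models/diffusion_models/hunyuanvideo-1.5-omni
```

After restarting ComfyUI, the two new nodes ('HunyuanVideo 15 Omni Conditioning' and 'Text Encode HunyuanVideo 15 Omni') would be wired into an existing HunyuanVideo 1.5 workflow in place of the standard conditioning and text-encode nodes.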
While the unofficial port shows some limitations in reproducing complex scenes like those from Seedance 2.0, it represents a significant advance for the open-source AI video community. The integration enables previously unavailable capabilities, including multi-image references, combined image+video references (tiv2v), and sophisticated camera motion guidance. As the first open implementation of such a comprehensive video generation toolset, the port gives researchers and creators accessible means of experimentation that were previously limited to proprietary systems.
- Unofficial port adds Tencent's OmniWeaving model to ComfyUI via GitHub PR #13289 with two new specialized nodes
- Supports multi-modal tasks including text2vid, img2vid, video editing, and combined image+video references (tiv2v)
- Requires heavy computation (30-50 steps on RTX 6000) but enables previously unavailable open-source video generation capabilities
Why It Matters
Brings advanced multi-modal video generation to open-source workflows, enabling new creative possibilities previously locked in proprietary systems.