Image & Video

Hi all, I built a video/image caption node for ComfyUI that handles everything for LTX-Video captioning, image captioning, and audio transcription.

This one-click node ends the 'node spaghetti' nightmare for AI video training.

Deep Dive

A developer has released 'OmniTag', a ComfyUI node that automates the entire dataset preparation pipeline for LTX-Video and image training. It handles video extraction, conversion to 24 FPS, and captioning with the uncensored Qwen2.5-VL model, which describes any scene without safety filters. It also transcribes audio with Whisper and appends the transcribed dialogue to the output files. Thanks to 4-bit quantization, the node needs only about 7 GB of VRAM.
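To make the preparation steps above more concrete, here is a minimal Python sketch of two of them: resampling a clip to 24 FPS and appending transcribed dialogue to a caption file. This is not the node's actual code. The function names, the 768-pixel target width, the "Dialogue:" label, and the one-.txt-file-per-clip layout are all assumptions made for illustration. The only external tool referenced is ffmpeg, with its standard fps and scale filters, and the sketch only builds the command rather than running it.

```python
from __future__ import annotations

from pathlib import Path


def build_ffmpeg_cmd(src: str, dst: str, fps: int = 24, width: int = 768) -> list[str]:
    """Return an ffmpeg command that resamples to `fps` and rescales to `width`.

    The 768-pixel default width is an assumption, not a value from the node.
    """
    # fps= resamples the frame rate; scale=W:-2 keeps the aspect ratio and
    # picks a height divisible by 2, which many encoders require.
    video_filter = f"fps={fps},scale={width}:-2"
    return ["ffmpeg", "-y", "-i", src, "-vf", video_filter, dst]


def compose_caption(visual_caption: str, transcript: str = "") -> str:
    """Join the visual caption with transcribed dialogue, if there is any."""
    caption = visual_caption.strip()
    dialogue = transcript.strip()
    if dialogue:
        # The "Dialogue:" label is a hypothetical format chosen for this sketch.
        caption += f' Dialogue: "{dialogue}"'
    return caption


def write_caption(clip_path: str, visual_caption: str, transcript: str = "") -> Path:
    """Write the caption next to the clip as <clip name>.txt and return its path."""
    out = Path(clip_path).with_suffix(".txt")
    out.write_text(compose_caption(visual_caption, transcript) + "\n", encoding="utf-8")
    return out
```

In a real pipeline, the visual caption would come from the vision-language model and the transcript from Whisper. Both are passed in here as plain strings so the sketch stays independent of either library.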

Why It Matters

Preparing high-quality, uncensored training data is a major bottleneck for creators of AI video models. This node makes that step much simpler and faster.