Image & Video

Built over a weekend to fix the most annoying part of training diffusion models.

Deep Dive

Training diffusion models for animations often involves tedious dataset preparation—trimming clips, resizing, ensuring frame counts follow model-specific rules (e.g., 8n+1 for LTX), and adding captions. To solve this, Oqura-ai built Diff-Forge over a weekend and open-sourced it. The tool runs locally with a simple UI and FastAPI backend, using FFmpeg under the hood. Users simply drop raw videos; Diff-Forge automatically checks for issues, fixes them (allowing manual tweaks if needed), and outputs a clean dataset ready for training. It also supports bulk captioning across the entire dataset.
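The post doesn't show Diff-Forge's internals, but the frame-count rule is easy to illustrate. A minimal sketch of the 8n+1 check (function name and trimming-down strategy are illustrative assumptions, not the tool's actual code):

```python
def nearest_valid_frame_count(frames: int, multiple: int = 8) -> int:
    """Round a clip's frame count DOWN to the nearest valid 8n+1 value.

    Models like LTX require frame counts of the form 8n+1 (9, 17, 25, ...).
    Trimming down to the nearest valid count avoids padding the clip
    with duplicated frames.
    """
    if frames < multiple + 1:
        raise ValueError(f"clip too short: need at least {multiple + 1} frames")
    n = (frames - 1) // multiple  # largest n with 8n+1 <= frames
    return multiple * n + 1
```

For example, a 100-frame clip would be trimmed to 97 frames (8×12+1), after which a standard FFmpeg pass (e.g., `-frames:v 97` plus a `scale` filter) can enforce the count and resolution.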

Currently, Diff-Forge supports LTX and WAN models, with plans to add more. The developer notes it has already made their own workflow much smoother. Beyond the tool, Oqura-ai is building a small community Discord for sharing ideas, discussing features, and collaborating on similar open-source projects. The GitHub repo (github.com/Oqura-ai/diff-forge) and Discord invite are available in the post.

Key Points
  • Diff-Forge automates video trimming, resizing, frame count compliance (8n+1), and bulk captioning for diffusion model training.
  • Open-source, local-first tool built with FastAPI + FFmpeg; supports LTX and WAN models with more coming.
  • The developer runs a community Discord for collaboration on similar open-source AI tools.

Why It Matters

Saves AI developers hours of manual data prep, making custom animation model training accessible.