Built over a weekend to fix the most annoying part of training diffusion models.
Training diffusion models for animations often involves tedious dataset preparation—trimming clips, resizing, ensuring frame counts follow model-specific rules (e.g., 8n+1 for LTX), and adding captions. To solve this, Oqura-ai built Diff-Forge over a weekend and open-sourced it. The tool runs locally with a simple UI and FastAPI backend, using FFmpeg under the hood. Users simply drop raw videos; Diff-Forge automatically checks for issues, fixes them (allowing manual tweaks if needed), and outputs a clean dataset ready for training. It also supports bulk captioning across the entire dataset.
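The frame-count rule mentioned above is easy to sketch. As a minimal illustration of the 8n+1 constraint LTX expects (the function name is hypothetical, not Diff-Forge's actual API):

```python
def snap_to_ltx_frames(frames: int) -> int:
    """Round a clip's frame count DOWN to the nearest value of the
    form 8n + 1, the rule LTX expects (1, 9, 17, 25, ...)."""
    if frames < 1:
        raise ValueError("clip must contain at least one frame")
    n = (frames - 1) // 8          # largest n with 8n + 1 <= frames
    return 8 * n + 1

# A 100-frame clip would be trimmed to 97 frames (8 * 12 + 1).
```

Rounding down rather than up means the fix only ever trims frames, never pads or duplicates them.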
Currently, Diff-Forge supports LTX and WAN models, with plans to add more. The developer notes it has already made their own workflow much smoother. Beyond the tool, Oqura-ai is building a small community Discord for sharing ideas, discussing features, and collaborating on similar open-source projects. The GitHub repo (github.com/Oqura-ai/diff-forge) and Discord invite are available in the post.
- Diff-Forge automates video trimming, resizing, frame-count compliance (e.g., 8n+1 for LTX), and bulk captioning for diffusion-model training.
- Open-source, local-first tool built with FastAPI + FFmpeg; supports LTX and WAN models with more coming.
- Developer offers community Discord for collaboration on similar open-source AI tools.
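To make the FFmpeg-backed pipeline concrete, a preprocessing backend might assemble an invocation like the one below. This is a hedged sketch, not Diff-Forge's actual code: the helper name and the default resolution are assumptions.

```python
def build_prep_command(src: str, dst: str, frames: int,
                       width: int = 768, height: int = 512) -> list[str]:
    """Assemble an FFmpeg command that resizes a clip and
    truncates it to a fixed, model-compliant frame count."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale={width}:{height}",  # resize to the target resolution
        "-frames:v", str(frames),          # keep only the first N video frames
        "-y", dst,                         # overwrite the output if it exists
    ]

# e.g., trim a raw clip to 97 frames (a valid 8n+1 count for LTX)
cmd = build_prep_command("raw_clip.mp4", "prepped_clip.mp4", frames=97)
```

Building the command as an argument list (rather than a shell string) keeps it safe to pass to `subprocess.run` without quoting issues.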
Why It Matters
Saves AI developers hours of manual data prep, making custom animation-model training more accessible.