Image & Video

Built over a weekend to fix the most annoying part of training diffusion models.

Deep Dive

Training diffusion models for animations often involves tedious dataset preparation—trimming clips, resizing, ensuring frame counts follow model-specific rules (e.g., 8n+1 for LTX), and adding captions. To solve this, Oqura-ai built Diff-Forge over a weekend and open-sourced it. The tool runs locally with a simple UI and FastAPI backend, using FFmpeg under the hood. Users simply drop raw videos; Diff-Forge automatically checks for issues, fixes them (allowing manual tweaks if needed), and outputs a clean dataset ready for training. It also supports bulk captioning across the entire dataset.
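The post doesn't show Diff-Forge's internals, but the frame-count rule is easy to illustrate. A minimal sketch of the 8n+1 check (function name and trimming-down strategy are illustrative assumptions, not the tool's actual code):

```python
def nearest_valid_frame_count(frames: int, multiple: int = 8) -> int:
    """Round a clip's frame count DOWN to the nearest valid 8n+1 value.

    Models like LTX require frame counts of the form 8n+1 (9, 17, 25, ...).
    Trimming down to the nearest valid count avoids padding the clip
    with duplicated frames.
    """
    if frames < multiple + 1:
        raise ValueError(f"clip too short: need at least {multiple + 1} frames")
    n = (frames - 1) // multiple  # largest n with 8n+1 <= frames
    return multiple * n + 1
```

For example, a 100-frame clip would be trimmed to 97 frames (8×12+1), after which a standard FFmpeg pass (e.g., `-frames:v 97` plus a `scale` filter) can enforce the count and resolution.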

Currently, Diff-Forge supports LTX and WAN models, with plans to add more. The developer notes it has already made their own workflow much smoother. Beyond the tool, Oqura-ai is building a small community Discord for sharing ideas, discussing features, and collaborating on similar open-source projects. The GitHub repo (github.com/Oqura-ai/diff-forge) and Discord invite are available in the post.

Key Points
  • Diff-Forge automates video trimming, resizing, frame count compliance (8n+1), and bulk captioning for diffusion model training.
  • Open-source, local-first tool built with FastAPI + FFmpeg; supports LTX and WAN models with more coming.
  • The developer runs a community Discord for collaboration on similar open-source AI tools.

Why It Matters

Saves AI developers hours of manual data prep, making custom animation model training accessible.