IMG Dataset Refiner v4.0 Pro - The Ultimate Dataset Engineering Suite for LoRAs (Flux, SDXL, etc.)
Auto-caption, clean duplicates, and balance subsets with local AI — all free.
Reddit user nicolas1801 has released v4.0 Pro of their dataset manager, a major update that turns it into a desktop-like data engineering suite for preparing training data for AI models. It is fully open-source, free, and runs locally.
The headline addition is local VLM/LLM integration through Ollama or LM Studio. This enables auto-captioning, hunting for hallucinated tags, a Concept Isolator that describes the background while ignoring the subject (aimed at character LoRAs), and Booru-to-natural-language translation for Flux.
The release also adds an interactive Word Library for mass batch editing, a Live Translation Assistant, and a set of preprocessing tools: a visual duplicate scanner based on perceptual hashing, Smart Face Crop using OpenCV, conversion of transparent PNGs to white backgrounds, and one-click mass resizing and renaming. Advanced analytics include co-occurrence heatmaps and resolution bucketing, and the Recipe Book feature uses a greedy algorithm for balanced subset selection.
The interface is built with Gradio plus custom JS/CSS for a native desktop feel and fast keyboard navigation. The creator also included their system prompt file, so the tool can be updated or forked with the help of Claude, Gemini, or ChatGPT.
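To make the duplicate scanner more concrete, here is a minimal sketch of one simple perceptual hashing variant, the average hash. The post does not say which hash the tool uses, so this is an illustration only. The names `average_hash`, `hamming`, and `find_duplicates`, the 8x8 hash size, and the distance threshold of 5 are all assumptions, and images are represented as plain 2D lists of grayscale values to keep the example dependency-free.

```python
from itertools import combinations


def average_hash(pixels, hash_size=8):
    """Return a hash_size*hash_size-bit integer hash of a grayscale image.

    `pixels` is a 2D list of grayscale values whose height and width are
    assumed to be multiples of `hash_size` (a simplification for this sketch).
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size

    # Downscale to hash_size x hash_size by averaging each block of pixels.
    cells = []
    for by in range(hash_size):
        for bx in range(hash_size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))

    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        bits = (bits << 1) | int(cell > mean)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def find_duplicates(images, max_distance=5):
    """Return name pairs whose hashes differ by at most `max_distance` bits.

    `images` maps a name to its 2D grayscale pixel list.
    """
    hashes = {name: average_hash(px) for name, px in images.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(hashes), 2)
        if hamming(hashes[a], hashes[b]) <= max_distance
    ]
```

Because the hash only records whether each region is brighter than the image's mean, a slightly brightened or recompressed copy tends to land within a few bits of the original, while a differently composed image lands far away. A real scanner would load and resize files with an imaging library first; that step is omitted here.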
- Auto-caption images from scratch using local vision models via Ollama/LM Studio, with concept isolation for character LoRAs.
- Co-occurrence heatmaps and logical contradiction detection help catch concept bleeding in training datasets before it reaches the model.
- Recipe Book with greedy algorithm balances subset percentages (e.g., 50% solo, 50% multiple) for optimal LoRA exports.
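The balancing idea behind the Recipe Book can be sketched as a greedy loop that, at each step, picks from whichever category is furthest below its target share. The post does not describe the tool's actual algorithm, so this is one plausible reading rather than its implementation. The function name `greedy_balanced_subset`, the `(name, category)` item format, and the `targets` mapping are hypothetical.

```python
from collections import defaultdict


def greedy_balanced_subset(items, targets, size):
    """Pick up to `size` items so category shares track `targets`.

    items:   list of (name, category) pairs
    targets: dict mapping category -> desired fraction (e.g. 0.5)
    size:    number of items to select
    """
    # Group the available items by category, preserving input order.
    pools = defaultdict(list)
    for name, category in items:
        pools[category].append(name)

    picked = []
    counts = {category: 0 for category in targets}

    while len(picked) < size:
        # Only categories that still have unused items can be chosen.
        available = [c for c in targets if pools[c]]
        if not available:
            break

        # Deficit = how many items the category should have after this
        # pick, minus how many it already has. Take the largest deficit.
        next_total = len(picked) + 1
        best = max(available, key=lambda c: targets[c] * next_total - counts[c])

        picked.append((pools[best].pop(0), best))
        counts[best] += 1

    return picked
```

With a 50% solo / 50% multiple target, this alternates between the two categories while both have images left. If one category runs out, the loop keeps filling from the others, so the final mix can drift from the target when the source dataset is too unbalanced to satisfy it.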
Why It Matters
A free, open-source, local-only tool puts professional-grade dataset preparation within reach of anyone training AI models.