LoRA Dataset Architect automates image prep with local AI for faster model training

r/StableDiffusion March 02, 2026

⚡ New local app handles cropping, captioning, and color grading for LoRA datasets using your GPU.

Deep Dive

A developer has released LoRA Dataset Architect, a local application that automates the most labor-intensive part of creating custom AI image models: dataset preparation. It replaces manual cropping, caption writing, and lighting standardization with a single pipeline that runs offline on the user's own machine. The browser-based UI is served by a local Python server, so no images or data are sent to the cloud. The tool targets a common bottleneck for hobbyists and professionals training LoRAs (Low-Rank Adaptations) for models like Stable Diffusion, where dataset quality directly affects the final model's output.

The workflow covers every step from raw images to a trainable dataset. The app first uses Google's MediaPipe to detect faces and crop images of varying sizes into uniform squares at resolutions such as 512, 1024, or 1280 pixels. A quality filter then scores each crop so users can quickly discard subpar ones. For consistency, one-click color grading applies a preset such as 'Realistic' or 'Cinematic' across the entire set. The core feature is local AI captioning with the Qwen-VL model (available in 7B or 2B parameter versions), which generates detailed captions as either booru-style tags or natural-language sentences. Users set a trigger word, review and edit captions in a grid, and export a ZIP file of cropped images with matching .txt files, ready for trainers like Kohya_ss. The tool requires Python, Node.js, Git, and an NVIDIA GPU with at least 8GB of VRAM, positioning it as a private alternative to cloud-based preprocessing services.
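The face-centered square crop described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the app's actual code: it assumes a face detector (MediaPipe in the app's case) has already returned a pixel bounding box, and the names `Box`, `square_crop_box`, and the `padding` parameter are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Pixel rectangle; hypothetical helper type for this sketch."""
    left: int
    top: int
    right: int
    bottom: int


def square_crop_box(face: Box, image_w: int, image_h: int, padding: float = 0.5) -> Box:
    """Return a square region centered on the face, kept inside the image.

    `padding` is an assumed knob: extra margin as a fraction of the face size.
    """
    face_w = face.right - face.left
    face_h = face.bottom - face.top

    # Side length: the larger face dimension plus margin, capped by the image size.
    side = int(max(face_w, face_h) * (1 + padding))
    side = min(side, image_w, image_h)

    # Center the square on the middle of the face.
    center_x = (face.left + face.right) // 2
    center_y = (face.top + face.bottom) // 2
    left = center_x - side // 2
    top = center_y - side // 2

    # Shift the square back inside the image if it overflows an edge.
    left = max(0, min(left, image_w - side))
    top = max(0, min(top, image_h - side))
    return Box(left, top, left + side, top + side)
```

The resulting square would then be resized to the chosen training resolution (512, 1024, or 1280) with an image library; that step is omitted here.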

Key Points

Runs entirely offline using a local Python server and browser UI, ensuring user privacy and data security.
Automates cropping with MediaPipe face detection and captioning with local Qwen-VL AI models (7B or 2B).
Exports a ready-to-train ZIP file with cropped images and matching caption files for Kohya_ss or other trainers.
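To make the export format concrete, here is a minimal sketch of packaging images with same-named .txt caption files into a ZIP, using only the Python standard library. It is an illustration under assumptions, not the app's implementation: the function name `build_dataset_zip` is hypothetical, and prepending the trigger word to each caption with a comma is an assumed convention.

```python
import io
import zipfile
from pathlib import PurePosixPath


def build_dataset_zip(items: list[tuple[str, bytes, str]], trigger_word: str) -> bytes:
    """Pack (image filename, image bytes, caption) triples into an in-memory ZIP.

    Each image gets a caption file with the same base name and a .txt extension.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename, image_bytes, caption in items:
            stem = PurePosixPath(filename).stem
            archive.writestr(filename, image_bytes)
            # Assumed convention: trigger word first, then the generated caption.
            archive.writestr(f"{stem}.txt", f"{trigger_word}, {caption}")
    return buffer.getvalue()
```

Calling `build_dataset_zip([("001.png", png_bytes, "portrait, smiling")], "mychar")` would yield an archive containing `001.png` and `001.txt`, which is the paired layout the article describes for trainers like Kohya_ss.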

Why It Matters

Cuts dataset preparation from hours of manual work to minutes, making custom AI model training more accessible and efficient.
