Batch caption your entire image dataset locally (no cloud API, no cost)
Open-source Python tool uses LM Studio to caption datasets for LoRA training without API costs.
Developer vizsumit has released an open-source Python utility called Image Captioner, designed to solve a common bottleneck in AI image model training: preparing large, accurately captioned datasets. The tool automates the tedious process of generating descriptive text for thousands of images, a critical step for training specialized models like LoRAs (Low-Rank Adaptations). It runs entirely offline against a locally installed instance of LM Studio in server mode, which lets it use free, open-weight vision-language models such as Google's Gemma 3 or Alibaba's Qwen2.5-VL. This architecture bypasses paid cloud API services entirely, making the workflow both faster for bulk processing and free of recurring costs.
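In practice, LM Studio's server mode exposes an OpenAI-compatible endpoint on localhost (port 1234 by default), so captioning a single image reduces to one chat-completion call with the image attached as a base64 data URI. The sketch below illustrates that general pattern; it is not the Image Captioner tool's own code, and the prompt, model placeholder, and file path are illustrative assumptions.

```python
# Minimal sketch: caption one image via LM Studio's local,
# OpenAI-compatible server (default http://localhost:1234/v1).
# Not the Image Captioner tool's actual code; the model name is a
# placeholder for whichever vision model is loaded in LM Studio.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio server mode
    api_key="lm-studio",                  # any non-empty string works locally
)

def caption_image(path: str) -> str:
    # Encode the image as base64 so it can ride inside the JSON payload.
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio serves the loaded model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image as a training caption."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(caption_image("dataset/img_0001.jpg"))  # hypothetical path
```

Because the endpoint speaks the standard OpenAI chat API, the same call works unchanged with any vision model LM Studio can load.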
The tool addresses specific pain points in the machine learning pipeline, where existing solutions were often too slow at generating captions or too cumbersome for batch editing. By taking a script-based, local approach, it gives researchers and hobbyists full control over their data and choice of model. This is particularly impactful for the growing community of Stable Diffusion and custom model trainers who need high-quality, tailored datasets. The project is hosted on GitHub, offering a practical, cost-effective solution that democratizes efficient dataset preparation and could accelerate experimentation in personalized AI image generation.
- Runs entirely locally using LM Studio's API mode with free vision LLMs like Gemma 3 or Qwen2.5-VL.
- Eliminates cloud API costs and slow manual editing when bulk-captioning training datasets (see the batch sketch after this list).
- Specifically designed to accelerate dataset preparation for LoRA and other AI image model training.
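Most LoRA training pipelines expect a sidecar .txt caption file sharing each image's filename, so a batch pass is essentially a directory walk around a single-image call. Below is a minimal sketch of that loop, reusing the hypothetical caption_image() helper from the earlier snippet; the directory layout and extension list are assumptions, not the tool's documented behavior.

```python
# Sketch of a batch pass over a dataset directory, writing one .txt
# caption file per image (the sidecar convention most LoRA trainers
# expect). Assumes the caption_image() helper sketched above.
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def caption_dataset(root: str) -> None:
    for img in sorted(Path(root).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        txt = img.with_suffix(".txt")
        if txt.exists():  # skip already-captioned images on re-runs
            continue
        txt.write_text(caption_image(str(img)), encoding="utf-8")
        print(f"captioned {img.name}")

caption_dataset("dataset/")  # hypothetical dataset folder
```

Skipping images that already have a caption file makes the pass resumable, which matters when a local model is churning through thousands of images.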
Why It Matters
It democratizes efficient AI dataset prep, saving time and money for researchers and hobbyists training custom image models.