Image & Video

Qwen3.5-4B-Base-ZitGen-V1

A 4B-parameter model fine-tuned to generate Stable Diffusion prompts from images using a novel LLM-based dataset.

Deep Dive

Developer lolzinventor has released Qwen3.5-4B-Base-ZitGen-V1, a fine-tuned version of Alibaba's Qwen 3.5 4B model, specifically optimized for generating Stable Diffusion prompts from images. The model's key innovation lies in its training dataset, which was not manually curated but instead generated by a novel, iterative LLM-driven process. This process involved using other LLMs to analyze a target image and a previously generated image, produce a detailed comparison, and then craft a new prompt to minimize visual differences via the ComfyUI API and Z-Image Turbo. This cycle was repeated 4-6 times per image, theoretically creating prompts perfectly adapted to the specific Stable Diffusion model's behavior.

The resulting prompt-image pairs were then filtered to remove errors like watermarks or residual artifacts and formatted into the ShareGPT dataset format for training. The goal is to create a compact, 4B-parameter model that can act as an intelligent captioner within workflows like ComfyUI, converting any input image into a high-quality, actionable prompt for image regeneration or variation. The developer is currently seeking community input on integrating this LLM-based captioning node into ComfyUI workflows, highlighting its potential as a tool for automated prompt engineering and image analysis.

Key Points
  • Model is a fine-tuned version of Alibaba's Qwen 3.5 4B, optimized specifically for Stable Diffusion prompt generation from images.
  • Training dataset was generated via a novel iterative process where LLMs used the ComfyUI API to refine prompts over 4-6 rounds per image.
  • Aims to provide an automated 'image-to-prompt' node for ComfyUI workflows, tailoring prompts to the specific SD model in use.

Why It Matters

Automates and refines prompt engineering for AI image generation, potentially improving workflow efficiency and output quality for creators.