
A Primer on the Most Important Concepts to Train a LoRA - Part 2: Captioning

Captioning is what makes or breaks your LoRA quality; here's how to do it right.

Deep Dive

This guide is Part 2 of a series on training a LoRA (Low-Rank Adaptation), a technique for fine-tuning AI image models, and it argues that captioning is the most critical step in the process. During training, noise is added to each image and the model learns to remove it, using the caption as its conditioning signal. Captions therefore tell the model what to associate with the LoRA trigger word, what to treat as variable (like lighting or pose), and what to ignore. Poor captioning leads to poor LoRA quality, while good captioning ensures the model learns the right concepts without overtraining.

The guide advises using natural language for most models (except SD1.5 and SDXL, which prefer tags). Captions should be short, factual, and include: a unique trigger word, expression, camera angle, lighting, pose, background, outfit, accessories, hairstyle, and action. A recommended template is: '<camera shot type> of <trigger> seen from <camera angle> at <elevation> with <hair color and style> wearing <outfit and accessories>. She is <position or action> and is expressing <emotion>. <Light description>, <short background description>.' This structured approach helps the model generalize well and produce consistent, high-quality outputs.

Key Points
  • Captioning provides context for the LoRA, defining what to learn and what to keep variable.
  • Use natural language captions (except for SD1.5/SDXL) that are short, factual, and include trigger word, angle, lighting, and expression.
  • A template like '<shot type> of <trigger> seen from <angle> at <elevation> with <hair> wearing <outfit>...' ensures consistency.

Why It Matters

Proper captioning is the difference between a LoRA that works flawlessly and one that fails, which makes it a critical skill for AI artists.