Image & Video

Fizgig 1.2.4 speeds up LoRA training on 16GB GPUs

r/StableDiffusion June 02, 2026

⚡Train a full 9B LoRA on a 16GB card with 1.5× faster steps

Deep Dive

Fizgig is a free, open-source tool for Klein 9b training, LoRA surgery, and LoRA exploration, optimized for 16GB GPUs through FP8 support. It runs the frozen-base matmuls in FP8 on RTX 40/50-series tensor cores, which gives 1.5× faster steps, and the FP8 model occupies ~9.6GB, leaving enough headroom for full 9B LoRA training on a 16GB card. Other features include Context LoRA training, bilingual captions, distilled 4-step previews, a self-tuning adaptive learning rate, pause/resume that frees the GPU mid-run, and a Repair Studio for fixing LoRAs block by block without retraining. The project is available on GitHub.
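The memory claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and is not taken from Fizgig: the function names, the nominal 9e9 parameter count, and the example layer shape and rank are all assumptions. It estimates weight storage alone, so the gap between the 9.0GB it computes for FP8 and the ~9.6GB quoted above is not explained here; the source does not break that figure down.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a rank-`rank` LoRA adds to one d_out x d_in layer.

    LoRA learns two small matrices, A (rank x d_in) and B (d_out x rank),
    while the original weight matrix stays frozen.
    """
    return rank * (d_in + d_out)


# Assumption: a nominal 9 billion parameters; the real count may differ.
PARAMS_9B = 9e9

fp8_gb = weight_memory_gb(PARAMS_9B, 1)   # FP8 stores 1 byte per parameter
bf16_gb = weight_memory_gb(PARAMS_9B, 2)  # BF16/FP16 store 2 bytes per parameter

print(f"FP8 base weights:  {fp8_gb:.1f} GB")
print(f"BF16 base weights: {bf16_gb:.1f} GB")

# Hypothetical layer shape and rank, chosen only to show the scale difference.
adapter = lora_params(d_in=4096, d_out=4096, rank=16)
print(f"LoRA params for one 4096x4096 layer at rank 16: {adapter:,}")
```

Under these assumptions, 16-bit base weights alone would exceed a 16GB card, while FP8 weights leave several gigabytes for activations, the LoRA adapters, and optimizer state. The adapter is small because it trains only the two low-rank factors of each adapted layer.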

Key Points

FP8 optimizations allow 1.5× faster training steps and fit a 9B LoRA model in ~9.6GB, enabling 16GB card training
Context LoRA training lets multiple LoRAs coexist (e.g., face on style) without retraining
Includes a repair studio, an exploration tool, profiling, and rank extraction, all interoperable in one open-source app

Why It Matters

Fizgig makes high-quality LoRA training accessible to budget GPU owners, opening creative AI to more users.
