Training LoRA on 5060 Ti 16GB .. is this the best speed or is there any way to speed up iteration time?
This hobbyist's setup shows how far consumer GPUs have come for local AI training.
Deep Dive
A user trained a Stable Diffusion XL LoRA in about an hour on a 16GB RTX 5060 Ti, peaking at roughly 13GB of VRAM. Running a batch size of 4 for 2,000 steps, they got a solid result but asked the community whether per-step iteration time can be pushed down to 2-3 seconds. The post shares their full kohya_ss configuration, sparking a technical debate on optimizing local fine-tuning for consumer hardware.
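As a back-of-envelope check on the reported numbers (the "about 1 hour" wall-clock figure is taken at face value here), a quick sketch of the implied per-step time and what the 2-3 s/step range would mean for total run length:

```python
# Rough throughput math for the run described above (assumed ~1 hour wall clock).
steps = 2000
batch_size = 4
wall_clock_s = 60 * 60  # "about 1 hour"

sec_per_step = wall_clock_s / steps         # 3600 / 2000 = 1.8 s/step
images_per_sec = batch_size / sec_per_step  # ~2.2 images/s

print(f"{sec_per_step:.2f} s/step, {images_per_sec:.2f} images/s")

# What a given per-step time would mean for a 2000-step run:
for target in (2.0, 3.0):
    print(f"{target:.0f} s/step over {steps} steps -> {steps * target / 60:.0f} min")
```

Numbers like these are why commenters usually probe where the time actually goes (data loading, gradient checkpointing, precision settings) before tuning anything.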
Why It Matters
It shows that high-quality model personalization is becoming accessible on mid-range consumer GPUs, pushing the community to map out the practical limits of that hardware.