Flux 2 Klein 9B is now up to 2× faster with multiple reference images
New KV-caching technique and NVIDIA-built FP8 quantization deliver major performance gains for AI image generation.
Black Forest Labs has launched a major performance upgrade for its open-source image generation model, Flux 2 Klein 9B. The headline feature is a new KV-caching technique that dramatically accelerates workflows involving multiple reference images. KV (key-value) caching lets the model skip redundant computation on shared visual elements across reference images. The more reference images a user provides, the greater the speedup, with inference times improving by more than 2× on multi-reference editing tasks. This makes iterative design and style-consistent image generation significantly more efficient.
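The idea behind this kind of caching can be illustrated with a minimal, hypothetical sketch: the expensive key/value projection for each reference image is computed once and reused on every subsequent edit instead of being recomputed. All names here (`ReferenceKVCache`, `toy_project`) are illustrative, not Black Forest Labs' actual implementation.

```python
# Hypothetical sketch of KV-caching for reference images: each image's
# key/value projection is computed once and reused across editing passes.
class ReferenceKVCache:
    def __init__(self, project_kv):
        self._project_kv = project_kv   # expensive K/V projection (stand-in)
        self._cache = {}                # image_id -> (keys, values)
        self.projections_run = 0        # instrumentation for this sketch

    def get(self, image_id, tokens):
        # Only project tokens for images we have not seen before.
        if image_id not in self._cache:
            self.projections_run += 1
            self._cache[image_id] = self._project_kv(tokens)
        return self._cache[image_id]

# Toy "projection": pretend keys and values are scaled copies of the tokens.
def toy_project(tokens):
    return ([t * 2 for t in tokens], [t * 3 for t in tokens])

cache = ReferenceKVCache(toy_project)
for _ in range(5):                      # five editing passes over one reference
    k, v = cache.get("ref_img_0", [1, 2, 3])
assert cache.projections_run == 1       # projected once, reused four times
```

Because the cached work grows with each additional reference image, the savings scale the same way, which matches the article's claim that more references mean a larger speedup.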
Alongside the architectural optimization, Black Forest Labs is releasing FP8 quantized weights for the model, built in collaboration with NVIDIA. FP8 (8-bit floating point) quantization reduces the model's memory footprint and computational requirements without a substantial loss in output quality, bringing the speedups to more accessible hardware. This combination of smarter caching and efficient quantization is a dual approach to performance: one algorithmic, one hardware-focused. For creators and developers, it means quicker iteration cycles and the ability to handle more complex, multi-image prompts in practical applications.
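To make the FP8 trade-off concrete, here is a minimal sketch of per-tensor quantization to the E4M3 format (4 exponent bits, 3 mantissa bits, maximum normal value 448), the FP8 variant commonly used for weights. This is a simplified stand-in, not NVIDIA's or Black Forest Labs' actual quantization code, and it handles normal values only:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 value (normals only)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)       # clamp to the E4M3 maximum
    e = math.floor(math.log2(mag))
    e = max(min(e, 8), -6)         # exponent range for normals (bias 7)
    step = 2.0 ** (e - 3)          # 3 mantissa bits -> 8 steps per octave
    return sign * round(mag / step) * step

def quantize_tensor(ws):
    """Per-tensor scaling: map the largest weight onto the E4M3 max."""
    scale = max(abs(w) for w in ws) / 448.0
    return [quantize_e4m3(w / scale) for w in ws], scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Each stored value is 8 bits instead of 16 or 32, halving (or quartering)
# the weight memory footprint at the cost of ~6% worst-case rounding error.
quantize_e4m3(3.3)   # 3.25 (nearest value with a 3-bit mantissa)
```

The 3-bit mantissa caps relative rounding error at roughly 6%, which is why FP8 weights shrink the model substantially while leaving output quality largely intact.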
- KV-caching skips redundant computation on reference images, enabling up to 2× faster inference for multi-image edits.
- FP8 quantized weights, built with NVIDIA, reduce the model's memory footprint and improve hardware efficiency.
- Performance gains scale with the number of reference images used, benefiting style-consistent generation workflows.
Why It Matters
Faster, more efficient open-source models lower the barrier for creative AI applications, enabling quicker prototyping and more complex image generation tasks.