Image & Video

FLUX.2 Klein LoRA turns image editing into CV tasks – with mixed results

Researcher trains FLUX.2 Klein LoRAs to extract depth, normals, pose, and segmentation maps

Deep Dive

Researcher nomadoor has released FLUX.2 Klein 9B Schematic LoRA, a set of six LoRA adapters trained on the FLUX.2 Klein 9B text-to-image model. The adapters tackle classic computer vision tasks (depth estimation, surface normal mapping, body pose, full binary segmentation, and amodal segmentation) by framing them as image-to-image editing problems. The approach mirrors Google's recent Vision Banana work, but runs locally on an open-source model. The LoRAs are available on Hugging Face, along with the training dataset and a blog post detailing the methodology.

The results are a mixed bag: depth and normal outputs are decent, but pose predictions break down on fine details, and segmentation remains the most unstable task. Amodal segmentation, which estimates the shape of occluded object parts, did show signs of working, confirming the viability of the concept. However, the author admits that, because of budget and time constraints, quality is not yet good enough for practical use. The broader insight is that text-to-image models can be surprisingly effective for non-standard tasks once the definition of 'image editing' is widened to include them, a direction worth exploring further.
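
Because each LoRA returns an ordinary image rather than numeric predictions, a downstream pipeline still has to decode pixels into task values. The following is a minimal sketch in Python of what that decoding could look like. It assumes the common convention that a normal map stores each vector component as an 8-bit channel mapped from [-1, 1] to [0, 255], and that a binary mask is a grayscale image split at a mid-gray threshold. These conventions and the helper names `decode_normal` and `binarize_mask` are illustrative assumptions, not details taken from the author's release.

```python
import math


def decode_normal(rgb):
    """Map one 8-bit RGB pixel to a unit surface normal.

    Assumed convention: each channel encodes a component in [-1, 1]
    as an integer in [0, 255].
    """
    x, y, z = (2.0 * c / 255.0 - 1.0 for c in rgb)
    length = math.sqrt(x * x + y * y + z * z)
    if length == 0.0:
        # Only reachable for non-integer input; fall back to a
        # normal pointing straight at the camera.
        return (0.0, 0.0, 1.0)
    # Generated pixels are rarely exactly unit length, so renormalize.
    return (x / length, y / length, z / length)


def binarize_mask(gray_rows, threshold=128):
    """Turn a grayscale image (list of rows of 0-255 values) into a 0/1 mask."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray_rows]


if __name__ == "__main__":
    # A bluish pixel, typical of a surface facing the camera.
    print(decode_normal((128, 128, 255)))
    # A tiny 2x2 "generated" mask with soft edges.
    print(binarize_mask([[0, 200], [128, 127]]))
```

Renormalizing and thresholding matter here because a generative model produces approximate colors, so its raw pixels will not satisfy the constraints a dedicated CV model enforces by construction.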

Key Points
  • Six LoRAs trained on FLUX.2 Klein 9B cover depth, normals, body pose, full binary segmentation, and amodal segmentation
  • Depth and normal estimation worked relatively well, but pose and segmentation were unstable and not production-ready
  • Amodal segmentation, which infers hidden object parts, showed promising behavior, confirming the concept's feasibility

Why It Matters

Repurposing image generation models for CV tasks could offer a lightweight alternative to specialized models, with potential for rapid prototyping once output quality improves.