Revisiting WAN 2.2 for real-person realism: consented LoRA, retuned settings
Refined captions and retuned inference settings deliver markedly better character consistency on WAN 2.2
Reddit user lerqvid shared an updated iteration of their WAN 2.2 identity LoRA, achieving markedly better realism through retraining and retuned inference settings. The original was trained on 40 consented real-person images with paired captions, but that round suffered from over-complicated, environment-heavy captions and a low step count. The newer pass tightened captions to focus on the character rather than the scene, adjusted CFG/conditioning behavior, and ran WAN 2.2's HighNoise + LowNoise model pair in a custom Docker setup with ComfyUI on a RunPod A100 40GB. The result: significantly improved character retention and more believable outputs.
Beyond this single LoRA, lerqvid envisions a modular system where identity, pose/scene, and surface details (skin, tattoos) are controlled by separate stacked LoRAs. This would allow a consistent character identity while independently varying poses, scenes, and fine details. The post invites the community to share experiences with LoRA stacking on models like Klein or Z-Image, particularly for maintaining identity stability while adding accessories or fine realism layers. For AI-generated content pipelines, this would make reusable character assets far more practical.
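The arithmetic behind stacking is simple: each LoRA contributes an independent low-rank delta B·A, scaled by its own strength, and stacked adapters sum their deltas onto the shared base weight. A minimal numpy sketch (shapes, adapter names, and strengths here are illustrative assumptions, not lerqvid's actual configuration):

```python
import numpy as np

def apply_stacked_loras(W, loras, scales):
    """Merge several LoRA adapters into one base weight matrix.

    W      : (d_out, d_in) base weight
    loras  : list of (B, A) pairs; B is (d_out, r), A is (r, d_in)
    scales : per-adapter strength, analogous to per-LoRA weight in ComfyUI
    """
    W_merged = W.copy()
    for (B, A), s in zip(loras, scales):
        W_merged += s * (B @ A)  # each adapter adds its own low-rank delta
    return W_merged

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2
W = rng.normal(size=(d_out, d_in))

# Three hypothetical adapters: identity, pose/scene, skin detail.
adapters = [(rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
            for _ in range(3)]
W_new = apply_stacked_loras(W, adapters, scales=[1.0, 0.7, 0.5])
```

Because the deltas are additive, each adapter's strength can be dialed independently, which is what makes holding identity fixed while swapping pose or surface-detail layers plausible; in practice the adapters still interact, since all deltas land on the same weights.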
- Trained a WAN 2.2 LoRA on 40 consented real-person photos with refined captions for better identity retention
- Used an A100 40GB GPU, ComfyUI workflow, and custom Docker setup with optimized CFG/conditioning settings
- Proposes modular LoRA stacking: separate layers for identity, pose/scene, and fine details for greater control
Why It Matters
Modular LoRA stacking could enable reusable, consistent AI characters for films, games, and virtual production.