Image & Video

TIL you can chain (combine) multiple Z-image controlnets

A new workflow shows how to connect multiple QwenImageDiffsynthControlnet nodes in series for finer control over image generation.

Deep Dive

A widely shared tutorial highlights an underused capability of Alibaba's Z-Image model: chaining multiple ControlNets. Z-Image workflows typically apply a single model patch for control, but the model output of one QwenImageDiffsynthControlnet node can be connected directly to the model input of another. This chaining technique, familiar to SDXL users but newly discovered for Z-Image, gives more granular control than blending two preprocessed images into one, because each control image keeps its own strength setting. It lets artists preserve chosen elements of a reference, such as a specific composition or color palette, while staying free to change others.
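The chaining described above can be sketched as a fragment of a ComfyUI API-style workflow, where each link is a `[node_id, output_index]` pair. This is a minimal sketch, not the tutorial's actual workflow: the input names (`model`, `model_patch`, `vae`, `image`, `strength`), the upstream node ids, and the strength values are assumptions made for illustration and should be checked against your own ComfyUI install.

```python
# Assumption: node and input names mirror what the article describes;
# verify them against your ComfyUI version before relying on them.
CONTROL_NODE = "QwenImageDiffsynthControlnet"


def chain_controlnets(base_model, controls, first_id=100):
    """Build control nodes in series.

    Each node takes the previous node's model output as its model input,
    so the patches stack instead of replacing one another.
    Returns (nodes, link_to_final_model).
    """
    nodes = {}
    model_link = base_model
    for offset, ctrl in enumerate(controls):
        node_id = str(first_id + offset)
        nodes[node_id] = {
            "class_type": CONTROL_NODE,
            "inputs": {
                "model": model_link,  # output of the previous link in the chain
                "model_patch": ctrl["model_patch"],
                "vae": ctrl["vae"],
                "image": ctrl["image"],
                "strength": ctrl["strength"],
            },
        }
        model_link = [node_id, 0]  # the next node consumes this node's output
    return nodes, model_link


# Hypothetical upstream node ids: "1" loads the model, "2" the VAE,
# "3" the model patch, "10" and "11" produce a depth map and a canny map.
controls = [
    {"model_patch": ["3", 0], "vae": ["2", 0], "image": ["10", 0], "strength": 0.3},
    {"model_patch": ["3", 0], "vae": ["2", 0], "image": ["11", 0], "strength": 0.8},
]
nodes, final_model = chain_controlnets(["1", 0], controls)
```

The sampler would then take `final_model` as its model input. Because each entry in `controls` carries its own image link, the two control images can come from different reference pictures.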

The guide gives concrete examples of how different ControlNet combinations affect the final output. Using a Depth ControlNet alone at high strength can enforce a zoomed-out composition, but it may produce less detailed images and overly rigid poses. Combining a low-strength Depth ControlNet with a Canny ControlNet can enforce a full-body pose while adding richer background detail, though it may override specific prompt elements such as a 'wooden screen.' The tutorial stresses that there is no one-size-fits-all setting: strength values and combinations have to be tuned for each reference image and creative goal.
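Since the right strengths depend on the reference image, one practical approach is to sweep a small grid of Depth and Canny strengths and compare the results side by side. The sketch below only enumerates the combinations; the helper name and the candidate values are hypothetical, and a strength of 0.0 is used here to mean "leave that ControlNet out of the chain."

```python
from itertools import product


def strength_grid(depth_values, canny_values):
    """List every Depth/Canny strength pair worth rendering.

    The pair where both strengths are 0.0 is skipped, since it would
    apply no control at all.
    """
    return [
        {"depth": d, "canny": c}
        for d, c in product(depth_values, canny_values)
        if d > 0 or c > 0
    ]


# Hypothetical candidate values; adjust them per reference image.
grid = strength_grid([0.0, 0.3, 0.9], [0.0, 0.6])
for combo in grid:
    print(f"depth={combo['depth']:.1f}  canny={combo['canny']:.1f}")
```

Each entry can then be rendered with the same seed and prompt so that differences in the output come from the control strengths alone.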

Key Points
  • Z-Image supports chaining multiple QwenImageDiffsynthControlnet nodes by feeding one node's model output into the next node's model input, a method not previously documented for Z-Image.
  • Combinations like Depth + Canny allow enforcement of composition (e.g., zoomed-out shots) while adding detailed backgrounds.
  • The technique enables using separate reference images for different attributes, like one for room depth and another for character pose.

Why It Matters

This gives digital artists and designers much finer-grained control over AI-generated imagery, improving consistency and creative flexibility.