[Release] ComfyUI DiffAid Patches — inference-time adaptive interaction denoising for rectified text-to-image generation
New nodes adaptively modulate text guidance per token and timestep, improving prompt adherence and image quality.
A new, unofficial implementation of a promising AI image-generation technique has arrived on the popular ComfyUI platform. The 'ComfyUI DiffAid Patches' are a reverse-engineered approximation of the 'Diff-Aid' method described in a 2026 arXiv paper. The core innovation moves beyond the standard single 'guidance scale' knob: instead of one global setting, it adaptively modulates how strongly the text prompt conditions the denoising process, varying the influence per individual word (token), per model block, and per denoising timestep. This targeted approach aims to improve how closely a model follows complex prompts and to enhance overall output quality.
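The per-token/block/timestep idea can be sketched in a few lines. This is a toy illustration, not the actual Diff-Aid algorithm or ComfyUI code: the schedule constants, the `guidance_weight` function, and the emphasised token indices are all made up for demonstration, and real classifier-free guidance operates on full latent tensors rather than the per-token floats used here.

```python
def guidance_weight(token_idx, block_idx, timestep, base_scale=3.5):
    """Hypothetical schedule: one multiplier per (token, block, timestep).

    All constants here are invented for illustration; the real method derives
    its modulation from the model and prompt, not from fixed factors."""
    time_factor = 1.0 + 0.5 * timestep           # timestep in [0, 1], 1 = start of denoising
    block_factor = 1.0 + 0.05 * block_idx        # deeper blocks get a mild boost
    token_factor = 1.2 if token_idx in {2, 5} else 1.0  # e.g. emphasised content words
    return base_scale * time_factor * block_factor * token_factor

def adaptive_cfg(uncond, cond, block_idx, timestep):
    """Per-token guidance in place of a single scalar CFG scale.

    uncond/cond: per-token model outputs (plain floats for simplicity)."""
    return [
        u + guidance_weight(i, block_idx, timestep) * (c - u)
        for i, (u, c) in enumerate(zip(uncond, cond))
    ]
```

The contrast with vanilla classifier-free guidance is the loop: standard CFG would apply one `base_scale` to every term, while here each token's conditioning residual gets its own weight.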
The release includes two nodes covering the major model architectures. The 'Flux.2 Diff-Aid Sparse Patch' targets FLUX-family models (such as FLUX.2 klein) and implements the sparse enhancement strategy validated in the original paper, where modifying just a few key blocks yields most of the benefit. A separate 'SDXL Diff-Aid Cross-Attention Patch' adapts the same principle to SDXL's different U-Net architecture by hooking into its cross-attention path. In the developer's initial tests, which used a prompt to change a subject's clothing while preserving pose, placing the patch node before the sampler produced noticeable improvements in prompt adherence, color, and lighting.
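The sparse-patching pattern itself is simple to picture. The toy sketch below mimics the shape of a ComfyUI model patch (clone the model, wrap only selected blocks) without using ComfyUI's real `ModelPatcher` API; the block names, the `SPARSE_BLOCKS` set, and the fixed `scale` are hypothetical stand-ins.

```python
# Hypothetical choice of blocks; a real node would expose this as a setting.
SPARSE_BLOCKS = {"double_block_7", "double_block_12", "single_block_20"}

def make_patched_forward(forward_fn, scale=1.15):
    """Wrap a block's forward pass to rescale its text-conditioning input."""
    def patched(hidden, text_cond):
        # Boost the text contribution only inside this wrapped block.
        return forward_fn(hidden, text_cond * scale)
    return patched

def apply_sparse_patch(blocks):
    """blocks: dict of name -> forward function. Returns a patched copy."""
    patched = dict(blocks)  # leave the caller's model untouched, like model.clone()
    for name in blocks:
        if name in SPARSE_BLOCKS:
            patched[name] = make_patched_forward(blocks[name])
    return patched
```

The key property, matching the paper's sparse strategy as described above, is that only the handful of names in `SPARSE_BLOCKS` get the hook; every other block runs exactly as before.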
- Implements adaptive, multi-dimensional conditioning (per token/block/timestep) instead of a single global guidance scale, based on the Diff-Aid paper.
- Offers two nodes: a sparse patch for FLUX MMDiT models and a cross-attention patch adapted for SDXL U-Nets.
- The developer's early tests indicate perceptible gains in image quality, color/lighting, and prompt adherence for complex edits such as clothing swaps.
Why It Matters
Provides a practical, inference-time tool that can noticeably improve prompt following and output quality for leading open-source image models like FLUX and SDXL, with no retraining required.