Research & Papers

LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows

New system breaks monolithic diffusion workflows into independently managed model nodes, tolerating traffic bursts up to 8x higher than existing serving systems.

Deep Dive

A research team led by Lingyun Yang and Suyi Li has introduced LegoDiffusion, a novel system for serving complex text-to-image diffusion workflows. The work, published on arXiv, addresses a critical bottleneck: current serving platforms treat multi-model workflows (e.g., a pipeline combining a base model such as Stable Diffusion, a safety checker, and an upscaler) as single, opaque units. This monolithic approach forces all components to be provisioned and scaled together; it obscures internal data flow, prevents model sharing between different workflows, and leads to inefficient, coarse-grained resource management.

LegoDiffusion's core innovation is decomposing these workflows into loosely coupled, independently managed model-execution nodes, akin to a microservices architecture for AI inference. Managing individual models explicitly unlocks cluster-scale optimizations that monolithic serving cannot express: the system can autoscale each model based on its actual load, share a single model instance (such as a commonly used upscaler) across multiple concurrent workflows, and apply adaptive model-parallelism strategies tailored to each component. The results are substantial: LegoDiffusion sustains up to 3 times higher request rates and tolerates burst traffic spikes up to 8 times higher than existing monolithic serving systems, dramatically improving throughput and cost-efficiency for AI image generation services.
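To make the decomposition concrete, here is a minimal sketch of the idea as described above: workflows become explicit pipelines over a shared registry of model nodes, so a node referenced by several workflows (like the upscaler) can be backed by one shared replica pool. All class and variable names (`ModelNode`, `Workflow`, `registry`, etc.) are invented for illustration and are not the paper's actual interfaces.

```python
from dataclasses import dataclass

# Illustrative sketch only -- names are hypothetical, not LegoDiffusion's API.

@dataclass
class ModelNode:
    """An independently managed model-execution node."""
    name: str
    replicas: int = 1  # scaled on its own, per observed load

@dataclass
class Workflow:
    """A workflow is an explicit, ordered pipeline over shared nodes."""
    name: str
    stages: list  # references into a cluster-wide node registry

# One cluster-wide registry of model nodes.
registry = {
    "sd_base": ModelNode("sd_base"),
    "safety_checker": ModelNode("safety_checker"),
    "upscaler": ModelNode("upscaler"),
}

# Two concurrent workflows; both reference the same upscaler node,
# so a single pool of upscaler replicas serves them jointly.
txt2img_hd = Workflow("txt2img_hd", ["sd_base", "safety_checker", "upscaler"])
img_enhance = Workflow("img_enhance", ["upscaler"])

# Count references per node -- shared nodes show up more than once.
refs = {}
for wf in (txt2img_hd, img_enhance):
    for stage in wf.stages:
        refs[stage] = refs.get(stage, 0) + 1

print(refs)  # {'sd_base': 1, 'safety_checker': 1, 'upscaler': 2}
```

A monolithic system would instead deploy two separate pipelines, each bundling its own private upscaler copy, which is exactly the duplication the decomposition avoids.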

Key Points
  • Decomposes monolithic AI image workflows into independent micro-services for each model (e.g., base model, safety filter).
  • Enables cluster optimizations like per-model scaling and cross-workflow model sharing, sustaining 3x higher request rates.
  • Demonstrates superior burst tolerance, handling traffic spikes up to 8x higher than current serving systems.
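The per-model scaling in the points above can be sketched as follows. This toy calculation uses made-up throughput numbers (not figures from the paper) to show why scaling each model from its own load beats replicating the whole pipeline at the bottleneck's rate.

```python
import math

# Hypothetical per-replica capacities and observed loads, in requests/sec.
# These numbers are illustrative placeholders, not measurements from the paper.
capacity = {"sd_base": 2.0, "safety_checker": 50.0, "upscaler": 10.0}
load     = {"sd_base": 9.0, "safety_checker": 9.0,  "upscaler": 3.0}

def desired_replicas(load_rps: float, capacity_rps: float) -> int:
    """Smallest replica count that covers the observed load."""
    return max(1, math.ceil(load_rps / capacity_rps))

# Per-model autoscaling: only the actual bottleneck (the base model) scales out.
plan = {m: desired_replicas(load[m], capacity[m]) for m in capacity}
print(plan)  # {'sd_base': 5, 'safety_checker': 1, 'upscaler': 1}

# A monolithic deployment must replicate the *entire* pipeline at the rate
# of its slowest stage, so every model gets the bottleneck's replica count.
monolithic = max(plan.values())
print(monolithic)  # 5 copies of every stage, vs. 5 + 1 + 1 replicas above
```

Under these assumed numbers, the monolithic approach provisions 15 model replicas where per-model scaling needs 7, which is the coarse-grained inefficiency the article describes.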

Why It Matters

Enables more scalable, efficient, and cost-effective deployment of commercial AI image generation services at high volume.