GenEvolve: Self-evolving AI agents beat SOTA in image generation via tool orchestration

arXiv cs.CV May 22, 2026

⚡ AI that learns from its own mistakes to generate better images without human retraining.

Deep Dive

GenEvolve approaches open-ended image generation as more than a single prompt-to-image call. It treats each generation as a tool-orchestrated trajectory in which the agent gathers evidence, selects references, invokes skills, and composes a prompt-reference program. The key innovation is to compare multiple trajectories for the same request and abstract the differences between the best and worst of them into structured visual experience. This experience is fed only to a privileged teacher branch, which then supervises the student model with dense token-level feedback, a technique called Visual Experience Distillation. The result is an agent that keeps self-evolving without manual retraining, improving its search, knowledge activation, reference selection, and prompt construction over time.
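To make the idea in the paragraph above concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the Trajectory class, the scalar score, the set-difference "experience" summary, and the KL-based loss are all assumptions chosen for illustration. It shows only the general shape of picking the best and worst trajectories for one request and computing a dense per-token distillation signal between a teacher and a student.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Trajectory:
    # Hypothetical record of one generation attempt for a request.
    steps: List[str]  # tool calls and prompt pieces, as plain labels
    score: float      # assumed scalar quality score of the final image


def best_worst(trajectories: List[Trajectory]) -> Tuple[Trajectory, Trajectory]:
    """Return the highest- and lowest-scoring trajectories."""
    best = max(trajectories, key=lambda t: t.score)
    worst = min(trajectories, key=lambda t: t.score)
    return best, worst


def summarize_experience(best: Trajectory, worst: Trajectory) -> Dict[str, List[str]]:
    """Toy 'experience': steps unique to the best or to the worst trajectory."""
    return {
        "keep": [s for s in best.steps if s not in worst.steps],
        "avoid": [s for s in worst.steps if s not in best.steps],
    }


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_level_kl(teacher_logits: List[List[float]],
                   student_logits: List[List[float]]) -> float:
    """Mean per-token KL(teacher || student) across a sequence.

    Each inner list holds the vocabulary logits for one token position.
    KL divergence is an assumed stand-in for the dense token-level signal.
    """
    per_token = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)
        q = softmax(s_row)
        per_token.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0))
    return sum(per_token) / len(per_token)
```

In this sketch the teacher would be the branch that sees the "keep" and "avoid" summary, and the student would be trained to lower the per-token divergence without ever seeing that summary itself. How GenEvolve actually structures the experience and the loss is described in the original paper.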

The framework is backed by two new datasets: GenEvolve-Data for training and GenEvolve-Bench for evaluation. Experiments show substantial gains over strong baselines, achieving state-of-the-art performance among current image-generation frameworks. By enabling AI to learn from its own generation attempts and tool use, GenEvolve moves closer to generalist image agents that adapt to increasingly diverse and demanding user requests, without requiring human-labeled data or static fine-tuning.

Key Points

GenEvolve treats image generation as a tool-orchestrated trajectory, combining evidence gathering, reference selection, and prompt construction.
It uses Visual Experience Distillation to provide dense token-level supervision from best-worst trajectory comparisons, enabling self-evolution.
Achieves state-of-the-art performance on public benchmarks and the new GenEvolve-Bench, outpacing existing agentic generation methods.

Why It Matters

GenEvolve paves the way for AI image generators that improve autonomously, reducing the need for human retraining.
