Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generation and Debugging
New AI system 'sees' and interacts with GUI apps like a human to debug code, raising success rates.
A research team has introduced VF-Coder, a novel multi-agent system that uses visual feedback to generate and debug code for graphical user interfaces (GUIs). The system directly addresses a major limitation of current Large Language Model (LLM)-based agents, which rely solely on text-based outputs and struggle with the visual, event-driven nature of GUI applications. To enable rigorous testing, the team also created InteractGUI Bench, a comprehensive benchmark comprising 984 real-world desktop GUI tasks designed to evaluate both interaction logic and visual structure.
VF-Coder works by perceiving the visual output of a running program and simulating user interactions to exercise GUI event logic, much like a human tester. This allows it to identify issues that are invisible to text-only models, such as misaligned buttons, incorrect colors, or broken event handlers. In evaluations on the new benchmark, VF-Coder significantly boosted the performance of the Gemini-3-Flash model, raising its task success rate from 21.68% to 28.29% and improving its visual score from 0.4284 to 0.5584, a 30% relative gain. This demonstrates the critical role of visual perception in automating complex software engineering tasks that involve graphical interfaces.
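The perceive-diff-repair loop described above can be sketched in miniature. This is an illustrative toy, not VF-Coder's actual implementation: all names (`render`, `visual_diff`, `debug_loop`, `EXPECTED`) are hypothetical, a dictionary of widget positions stands in for a real screenshot, and a rule-based fixer stands in for the LLM repair agent.

```python
# Toy visual-feedback debug loop (all names hypothetical; a real system
# would screenshot the running app and drive it via input automation).

EXPECTED = {"submit_button": {"x": 120, "y": 300, "color": "#0055ff"}}

def render(app_state):
    """Stand-in for capturing the running app's visual output."""
    return app_state["widgets"]

def visual_diff(observed, expected, tol=2):
    """Flag misaligned or miscolored widgets -- defects a text-only
    agent reading source code would never observe at runtime."""
    issues = []
    for name, spec in expected.items():
        widget = observed.get(name)
        if widget is None:
            issues.append(f"{name}: missing")
            continue
        if abs(widget["x"] - spec["x"]) > tol or abs(widget["y"] - spec["y"]) > tol:
            issues.append(f"{name}: misaligned")
        if widget["color"].lower() != spec["color"].lower():
            issues.append(f"{name}: wrong color")
    return issues

def debug_loop(app_state, repair, max_rounds=3):
    """Perceive -> diff -> repair, until the GUI matches the spec."""
    for _ in range(max_rounds):
        issues = visual_diff(render(app_state), EXPECTED)
        if not issues:
            return True
        repair(app_state, issues)  # an LLM repair agent in the real system
    return False

# Buggy app: the button is 40 px off and the wrong color.
app = {"widgets": {"submit_button": {"x": 160, "y": 300, "color": "#ff0000"}}}

def naive_repair(state, issues):
    # Toy repair policy: snap each flagged widget back to the spec.
    for issue in issues:
        name = issue.split(":")[0]
        state["widgets"][name].update(EXPECTED[name])

print(debug_loop(app, naive_repair))  # True once the visual diff is clean
```

The key point the sketch captures is the feedback signal: the repair step is driven by what the rendered GUI actually looks like, not by the source text alone.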
- Introduces VF-Coder, a vision-feedback multi-agent system that 'sees' and interacts with GUI apps to debug code.
- Presents InteractGUI Bench, a new benchmark with 984 real-world desktop GUI tasks for fine-grained evaluation.
- Boosts Gemini-3-Flash's GUI task success rate from 21.68% to 28.29% and visual score by 30% through visual feedback.
Why It Matters
This bridges a critical gap in AI-assisted development, enabling automated testing and debugging of visual, interactive software that powers most modern applications.