Agent Frameworks

StitchCUDA: An Automated Multi-Agents End-to-End GPU Programing Framework with Rubric-based Agentic Reinforcement Learning

New framework reports a 100% success rate on end-to-end GPU programming tasks from the KernelBench benchmark.

Deep Dive

Researchers from the University of Connecticut and the University of Minnesota have introduced StitchCUDA, a multi-agent framework that automates end-to-end GPU programming. It targets a bottleneck in machine learning deployment: LLMs show promise for CUDA kernel generation, but existing methods optimize one kernel at a time and fail to synthesize complete programs. StitchCUDA splits the work across three specialized agents: a Planner that designs the overall system, a Coder that implements it, and a Verifier that checks correctness and profiles performance with NVIDIA Nsight Systems (nsys) and Nsight Compute (ncu). Together they aim to bridge the gap between isolated kernel optimization and practical deployment.
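The division of labor described above can be pictured as a simple control loop. This is a minimal sketch under stated assumptions, not StitchCUDA's implementation: the `stitch_loop` function, the `VerifierReport` fields, and the agent callables are hypothetical. In a real system the Planner and Coder would be LLM calls, and the Verifier would compile the program, compare its outputs with the reference, and run nsys/ncu to produce feedback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VerifierReport:
    """Hypothetical Verifier output for one candidate program."""
    correct: bool    # outputs match the PyTorch reference within tolerance
    speedup: float   # reference_time / candidate_time
    feedback: str    # e.g. a summary of nsys/ncu profiler findings


@dataclass
class Attempt:
    code: str
    report: VerifierReport


def stitch_loop(
    task: str,
    planner: Callable[[str], List[str]],
    coder: Callable[[str, str], str],
    verifier: Callable[[str], VerifierReport],
    max_rounds: int = 3,
) -> Optional[Attempt]:
    """Plan once, then alternate Coder and Verifier, keeping the best correct attempt."""
    plan = planner(task)  # Planner: break the program into kernel-level steps
    best: Optional[Attempt] = None
    feedback = ""
    for _ in range(max_rounds):
        # Coder sees the plan plus the Verifier's latest feedback (if any).
        prompt = "\n".join(plan)
        if feedback:
            prompt += "\nFeedback: " + feedback
        code = coder(task, prompt)
        report = verifier(code)  # Verifier: check correctness, then profile
        if report.correct and (best is None or report.speedup > best.report.speedup):
            best = Attempt(code, report)
        feedback = report.feedback  # drives the next optimization round
    return best
```

Keeping the best correct attempt, rather than the latest one, means an optimization round that breaks correctness cannot overwrite a working program.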

The core contribution is a rubric-based agentic reinforcement learning approach that trains the Coder on two atomic skills: generating code from a task description and optimizing code in response to feedback. The rubric-based rewards are meant to prevent common failure modes such as reward hacking, where the model simply copies the reference PyTorch code instead of writing its own kernels, while pushing the Coder toward advanced CUDA techniques such as custom kernel fusion and cuBLAS epilogue programming. On the KernelBench benchmark, StitchCUDA achieved a 100% success rate on end-to-end tasks and delivered 1.72x and 2.73x greater speedups than multi-agent and RL-trained baselines, respectively. The results suggest that fully automated GPU code generation could substantially accelerate ML research and deployment pipelines.
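To make the reward-hacking concern concrete, here is a toy rubric-style reward. It is a sketch under assumptions: how the authors define and score their rubric is not described in this summary, and the weights, the 4x saturation point, and the regex for detecting PyTorch fallbacks (`FALLBACK_PATTERN`) are all illustrative choices. The idea it shows is gating. Code that falls back to high-level PyTorch ops earns nothing, and speed only counts once the output is correct.

```python
import math
import re
from dataclasses import dataclass

# Illustrative pattern for high-level PyTorch calls the Coder is supposed to
# replace with its own CUDA code. A regex is a crude heuristic, not a guarantee.
FALLBACK_PATTERN = re.compile(
    r"\btorch\.(?:matmul|mm|bmm|conv2d|nn\.functional)\b|\bF\.\w+\("
)


@dataclass
class EvalResult:
    compiles: bool
    correct: bool
    speedup: float  # reference_time / candidate_time


def rubric_reward(code: str, result: EvalResult) -> float:
    """Toy rubric-style reward in [0, 1]; all weights are assumptions."""
    # Gate: falling back to PyTorch ops earns nothing, however fast it runs.
    if FALLBACK_PATTERN.search(code):
        return 0.0
    score = 0.0
    if result.compiles:
        score += 0.2  # rubric item 1: the program builds
        if result.correct:
            score += 0.4  # rubric item 2: outputs match the reference
            if result.speedup > 1.0:
                # Rubric item 3: speed, log-scaled and saturating at 4x.
                score += 0.4 * min(math.log2(result.speedup) / 2.0, 1.0)
    return score
```

Awarding speed credit only after correctness is what keeps a fast but wrong (or copied) program from scoring well; a production reward would need a far more robust fallback check than a regex.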

Key Points
  • Achieves a 100% success rate on KernelBench end-to-end GPU programming tasks
  • Outperforms RL baselines by 2.73x and multi-agent baselines by 1.72x in execution speed
  • Coordinates three specialized agents (Planner, Coder, Verifier) and trains the Coder with rubric-based reinforcement learning to prevent reward hacking
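The headline numbers combine two kinds of metric: a success rate (the fraction of tasks that yield a correct program) and a speedup (reference runtime divided by generated-program runtime). The sketch below shows one common way to compute and compare them. Aggregating speedups with a geometric mean over correct tasks only is an assumption, not necessarily how the 1.72x and 2.73x figures were derived, and the helper names are hypothetical.

```python
import math
from typing import List, Optional

# Each entry is one benchmark task: the measured speedup
# (reference_time / candidate_time) if the program was correct, else None.
TaskResults = List[Optional[float]]


def success_rate(results: TaskResults) -> float:
    """Fraction of tasks that produced a correct program."""
    return sum(r is not None for r in results) / len(results)


def geomean_speedup(results: TaskResults) -> float:
    """Geometric mean speedup over correct tasks only (an assumption)."""
    ok = [r for r in results if r is not None]
    if not ok:
        return 0.0
    return math.exp(sum(math.log(r) for r in ok) / len(ok))


def relative_speedup(system: TaskResults, baseline: TaskResults) -> float:
    """How many times faster `system` is than `baseline` on aggregate.

    The baseline must have at least one correct task.
    """
    return geomean_speedup(system) / geomean_speedup(baseline)
```

A geometric mean keeps one very large speedup on a single task from dominating the aggregate the way an arithmetic mean would.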

Why It Matters

Extends automated GPU programming from single kernels to complete programs, which could shorten ML deployment cycles and make advanced CUDA optimization accessible to more developers.