KernelFoundry: Hardware-aware evolutionary GPU kernel optimization
Researchers' new framework uses evolutionary search and meta-prompting to automatically generate faster GPU code.
A research team from TU Munich and MIT has developed KernelFoundry, an evolutionary framework that tackles the complex challenge of GPU kernel optimization with AI. Unlike traditional LLM-based approaches that rely on simple prompting and feedback loops, KernelFoundry employs three mechanisms: a MAP-Elites quality-diversity search that explores diverse optimization strategies, meta-prompt evolution that co-evolves prompts alongside kernels to discover task-specific optimizations, and template-based parameter tuning that adapts kernels to specific hardware and inputs. The system generates both SYCL (as a cross-platform programming model) and CUDA kernels for comparison.
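To make the MAP-Elites mechanism concrete, here is a minimal sketch of quality-diversity search over kernel variants. The behavioral dimensions (`shared_mem`, a bucketed `unroll` factor), the mutation operator, and the fitness function are illustrative assumptions, not KernelFoundry's actual design; a real system would compile each variant and benchmark it on hardware.

```python
import random

def descriptor(kernel):
    # Hypothetical behavioral dimensions: shared-memory usage and a
    # bucketed loop-unroll factor. KernelFoundry's real dimensions differ.
    return (kernel["shared_mem"], min(kernel["unroll"] // 2, 3))

def mutate(kernel):
    # Randomly perturb one parameter of the kernel variant.
    k = dict(kernel)
    if random.random() < 0.5:
        k["shared_mem"] = not k["shared_mem"]
    else:
        k["unroll"] = max(1, k["unroll"] + random.choice([-1, 1]))
    return k

def fitness(kernel):
    # Stand-in cost model; a real system would measure speedup on a GPU.
    return (2.0 if kernel["shared_mem"] else 1.0) * (1 + 0.1 * kernel["unroll"])

def map_elites(iterations=1000, seed=0):
    random.seed(seed)
    archive = {}  # behavioral cell -> best (fitness, kernel) found in that cell
    elite = {"shared_mem": False, "unroll": 1}
    archive[descriptor(elite)] = (fitness(elite), elite)
    for _ in range(iterations):
        parent = random.choice(list(archive.values()))[1]
        child = mutate(parent)
        cell = descriptor(child)
        f = fitness(child)
        # Keep the child only if it is the best seen in its behavioral cell.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, child)
    return archive
```

The key property is that the archive retains the best kernel per behavioral cell rather than a single global winner, which is what keeps diverse optimization strategies alive during the search.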
In evaluations across KernelBench, robust-kbench, and custom tasks, KernelFoundry consistently outperformed baseline methods, achieving an average 2.3x speedup on KernelBench for SYCL kernels. The framework is implemented as a distributed system with remote access to diverse hardware, enabling rapid benchmarking and practical application beyond research. This marks a significant advance over current AI coding assistants: rather than stopping at simple code generation, the system performs hardware-aware performance optimization, reasoning about parallel computing architectures and exploring the complex design space of GPU programming.
Key Points
- Uses evolutionary MAP-Elites search with kernel-specific behavioral dimensions to maintain diverse optimization strategies
- Achieved average 2.3x speedup on KernelBench benchmarks compared to baseline methods
- Implemented as a distributed framework with remote hardware access, enabling practical real-world deployment
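The template-based parameter tuning mentioned above can be sketched as a small search over launch parameters. The parameter names (`tile`, `unroll`) and the toy cost model below are assumptions for illustration; in the actual framework, candidate configurations would be compiled into SYCL or CUDA kernels and timed on the target device.

```python
import itertools

def run_kernel(n, tile, unroll):
    # Toy cost model standing in for a real benchmark run: padding waste
    # when n is not a multiple of the tile, plus a per-tile overhead term.
    waste = (-n) % tile
    return (n + waste) / (tile * unroll) + 0.01 * tile

def tune(n, tiles=(8, 16, 32, 64), unrolls=(1, 2, 4)):
    # Exhaustively evaluate the template's parameter grid and keep the
    # configuration with the lowest modeled cost for this input size.
    best = min(itertools.product(tiles, unrolls),
               key=lambda params: run_kernel(n, *params))
    return {"tile": best[0], "unroll": best[1]}
```

Because tuning is driven by measurements for a concrete input size and device, the same kernel template can specialize differently across the diverse hardware the distributed framework exposes.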
Why It Matters
Enables AI to automatically generate highly optimized GPU code, potentially accelerating scientific computing and AI model inference.