KernelFoundry: Hardware-aware evolutionary GPU kernel optimization
Researchers' new framework uses evolutionary search and meta-prompting to automatically generate faster GPU code.
A research team from TU Munich and MIT has developed KernelFoundry, an evolutionary framework that tackles the complex challenge of GPU kernel optimization with AI. Unlike traditional LLM-based approaches that rely on simple prompting and feedback loops, KernelFoundry employs three mechanisms: a MAP-Elites quality-diversity search that explores diverse optimization strategies, meta-prompt evolution that co-evolves prompts alongside kernels to discover task-specific optimizations, and template-based parameter tuning that adapts kernels to specific hardware and inputs. The system generates both SYCL (as a cross-platform programming model) and CUDA kernels for comparison.
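To make the MAP-Elites mechanism concrete, here is a minimal sketch of quality-diversity search over kernel variants. The behavioral dimensions (`shared_mem`, a bucketed `unroll` factor), the mutation operator, and the fitness function are illustrative assumptions, not KernelFoundry's actual design; a real system would compile each variant and benchmark it on hardware.

```python
import random

def descriptor(kernel):
    # Hypothetical behavioral dimensions: shared-memory usage and a
    # bucketed loop-unroll factor. KernelFoundry's real dimensions differ.
    return (kernel["shared_mem"], min(kernel["unroll"] // 2, 3))

def mutate(kernel):
    # Randomly perturb one parameter of the kernel variant.
    k = dict(kernel)
    if random.random() < 0.5:
        k["shared_mem"] = not k["shared_mem"]
    else:
        k["unroll"] = max(1, k["unroll"] + random.choice([-1, 1]))
    return k

def fitness(kernel):
    # Stand-in cost model; a real system would measure speedup on a GPU.
    return (2.0 if kernel["shared_mem"] else 1.0) * (1 + 0.1 * kernel["unroll"])

def map_elites(iterations=1000, seed=0):
    random.seed(seed)
    archive = {}  # behavioral cell -> best (fitness, kernel) found in that cell
    elite = {"shared_mem": False, "unroll": 1}
    archive[descriptor(elite)] = (fitness(elite), elite)
    for _ in range(iterations):
        parent = random.choice(list(archive.values()))[1]
        child = mutate(parent)
        cell = descriptor(child)
        f = fitness(child)
        # Keep the child only if it is the best seen in its behavioral cell.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, child)
    return archive
```

The key property is that the archive retains the best kernel per behavioral cell rather than a single global winner, which is what keeps diverse optimization strategies alive during the search.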
In evaluations across KernelBench, robust-kbench, and custom tasks, KernelFoundry consistently outperformed baseline methods, achieving an average 2.3x speedup on KernelBench for SYCL kernels. The framework is implemented as a distributed system with remote access to diverse hardware, enabling rapid benchmarking and practical application beyond research. This marks a significant advance over current AI coding assistants: rather than stopping at simple code generation, the system performs hardware-aware performance optimization, reasoning about parallel computing architectures and exploring the complex design space of GPU programming.
Key Points
- Uses evolutionary MAP-Elites search with kernel-specific behavioral dimensions to maintain diverse optimization strategies
- Achieved average 2.3x speedup on KernelBench benchmarks compared to baseline methods
- Implemented as a distributed framework with remote hardware access, enabling practical real-world deployment
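The template-based parameter tuning mentioned above can be sketched as a small search over launch parameters. The parameter names (`tile`, `unroll`) and the toy cost model below are assumptions for illustration; in the actual framework, candidate configurations would be compiled into SYCL or CUDA kernels and timed on the target device.

```python
import itertools

def run_kernel(n, tile, unroll):
    # Toy cost model standing in for a real benchmark run: padding waste
    # when n is not a multiple of the tile, plus a per-tile overhead term.
    waste = (-n) % tile
    return (n + waste) / (tile * unroll) + 0.01 * tile

def tune(n, tiles=(8, 16, 32, 64), unrolls=(1, 2, 4)):
    # Exhaustively evaluate the template's parameter grid and keep the
    # configuration with the lowest modeled cost for this input size.
    best = min(itertools.product(tiles, unrolls),
               key=lambda params: run_kernel(n, *params))
    return {"tile": best[0], "unroll": best[1]}
```

Because tuning is driven by measurements for a concrete input size and device, the same kernel template can specialize differently across the diverse hardware the distributed framework exposes.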
Why It Matters
Enables AI to automatically generate highly optimized GPU code, potentially accelerating scientific computing and AI model inference.