Google's AI-PROPELLER boosts warehouse-scale code performance by up to 1.6%
First-ever fine-grained interprocedural code layout for massive industrial applications yields real gains.
Traditional post-link optimizers like Propeller and BOLT rely on intraprocedural layout techniques, which reorder code within each function. Laying out code across function boundaries is combinatorially far harder, so these tools leave performance on the table. AI-PROPELLER, detailed in a new arXiv paper by researchers from Google and UC Riverside, tackles the problem with an agentic workflow called Magellan. Magellan evolves Propeller's layout heuristic into a fine-grained interprocedural optimizer and tunes its hyperparameters through iterative mutation and selection. Critically, the system does not rely on approximate static cost models alone: it runs multiple layout variants on actual hardware and reads real performance counters, which gives the evolutionary loop a measured reward signal instead of an estimated one.
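The mutation-and-selection loop described above can be sketched in a few lines. This is only an illustrative outline, not the paper's implementation: the parameter names (`merge_threshold`, `max_chain_bytes`) are hypothetical, and `synthetic_cost` is a made-up stand-in for the step where a real system would build a layout variant, run it on hardware, and read performance counters.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def mutate(params: Params, rng: random.Random, scale: float = 0.2) -> Params:
    """Copy params and multiplicatively perturb one randomly chosen entry."""
    child = dict(params)
    key = rng.choice(sorted(child))
    child[key] *= 1.0 + rng.uniform(-scale, scale)
    return child


def evolve(seed_params: Params,
           measure: Callable[[Params], float],
           generations: int = 30,
           children: int = 8,
           seed: int = 0) -> Tuple[Params, float]:
    """Elitist search: keep the best child only if it beats the current parent.

    `measure` returns a cost where lower is better (for example, cycles).
    """
    rng = random.Random(seed)
    best, best_cost = dict(seed_params), measure(seed_params)
    for _ in range(generations):
        candidates = [mutate(best, rng) for _ in range(children)]
        scored = [(measure(c), c) for c in candidates]
        cost, cand = min(scored, key=lambda pair: pair[0])
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


def synthetic_cost(params: Params) -> float:
    """Hypothetical stand-in for a hardware measurement (NOT real data)."""
    return ((params["merge_threshold"] - 0.3) ** 2
            + (params["max_chain_bytes"] / 4096.0 - 1.0) ** 2)


if __name__ == "__main__":
    start = {"merge_threshold": 0.8, "max_chain_bytes": 1024.0}
    tuned, cost = evolve(start, synthetic_cost)
    print("start cost:", synthetic_cost(start))
    print("tuned cost:", cost, "with", tuned)
```

Because the loop only ever accepts a child that measures strictly better than its parent, the reported cost can never get worse than the starting point. With real hardware measurements, noise would likely call for repeated runs per candidate before accepting a change.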
Evaluated on several benchmarks, including large warehouse-scale applications, AI-PROPELLER delivered performance improvements of 0.23% to 1.6% on top of binaries already optimized with state-of-the-art feedback-directed optimization (FDO) and post-link optimization (PLO). The percentages may seem modest, but these binaries are already heavily tuned and run at massive scale, so even a fraction of a percent translates into substantial savings. The authors describe this as the first successful deployment of fine-grained interprocedural code layout in an industrial setting, showing that the global potential of cross-function code reorganization can be unlocked through AI-guided evolution and hardware-in-the-loop tuning.
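To see why sub-2% gains matter at scale, here is a back-of-the-envelope calculation. The fleet size of 100,000 servers is a purely hypothetical figure chosen for illustration, and the sketch assumes the reported improvement is a throughput gain at fixed hardware, which the source does not spell out.

```python
def servers_freed(fleet_size: int, speedup_pct: float) -> float:
    """Capacity freed when every server does (1 + speedup) times the work.

    The same load then needs fleet_size / (1 + speedup) servers.
    """
    speedup = speedup_pct / 100.0
    return fleet_size * (1.0 - 1.0 / (1.0 + speedup))


if __name__ == "__main__":
    HYPOTHETICAL_FLEET = 100_000  # illustrative only, not from the paper
    for pct in (0.23, 1.6):
        freed = servers_freed(HYPOTHETICAL_FLEET, pct)
        print(f"{pct}% speedup frees about {freed:.0f} servers' worth of capacity")
```

Under these assumptions, the low and high ends of the reported range correspond to roughly 229 and 1,575 servers' worth of capacity, respectively.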
- First successful application of fine-grained interprocedural code layout at warehouse scale, previously considered intractable.
- Uses Magellan, an agentic workflow that evolves Propeller's layout heuristic and validates candidate layouts on real hardware using measured performance counters.
- Achieves 0.23% to 1.6% additional performance improvement over state-of-the-art FDO and PLO on large industrial binaries.
Why It Matters
Raises the performance ceiling for hyperscale binary optimization, which can directly reduce compute costs and latency in data centers.