Researchers unveil optimize_anything: a universal LLM API for text optimization

arXiv cs.NE May 20, 2026

⚡ This single LLM system nearly triples Gemini Flash's ARC-AGI accuracy and cuts cloud costs by 40%.

Deep Dive

A team of 15 researchers (including Dan Klein, Ion Stoica, Joseph Gonzalez, and Matei Zaharia) has open-sourced optimize_anything, a universal API that frames optimization as improving a text artifact evaluated by a scoring function. Rather than building a separate tool for each problem, a single LLM-based search system tackles six domains, including agent architectures, cloud scheduling, CUDA kernels, and circle packing. Key results include nearly tripling Gemini Flash's ARC-AGI accuracy from 32.5% to 89.5%, discovering scheduling algorithms that cut cloud costs by 40%, and generating CUDA kernels of which 87% match or beat PyTorch's performance. The system also outperforms AlphaEvolve's reported circle packing solution for n=26.
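To make the "text artifact plus scoring function" framing concrete, here is a minimal sketch of such a search loop. It is not the optimize_anything API: the names `optimize_text`, `evaluate`, and `propose` are hypothetical, the scoring function is a toy character-match count, and a random single-character mutation stands in for the LLM that would propose revisions in the real system.

```python
import random
from typing import Callable, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
TARGET = "cut costs"  # hypothetical optimum, used only by the toy scorer


def evaluate(artifact: str) -> float:
    """Toy scoring function: number of characters that match TARGET."""
    return float(sum(a == t for a, t in zip(artifact, TARGET)))


def propose(artifact: str, rng: random.Random) -> str:
    """Stand-in for an LLM proposer: mutate one random character."""
    pos = rng.randrange(len(artifact))
    return artifact[:pos] + rng.choice(ALPHABET) + artifact[pos + 1:]


def optimize_text(
    seed: str,
    evaluate: Callable[[str], float],
    propose: Callable[[str, random.Random], str],
    budget: int = 500,
    rng_seed: int = 0,
) -> Tuple[str, float]:
    """Greedy search: keep a candidate only if it scores strictly higher."""
    rng = random.Random(rng_seed)
    best, best_score = seed, evaluate(seed)
    for _ in range(budget):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    best, score = optimize_text("a" * len(TARGET), evaluate, propose)
    print(repr(best), score)
```

The point of the sketch is the interface: the loop only sees text and a score, so swapping in a different artifact (a scheduling policy, a kernel, an agent definition) means swapping the scoring function, not the search code.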

The researchers found that providing actionable side information (e.g., error messages or partial solutions) leads to faster convergence and higher final scores than score-only feedback. Multi-task search, in which related problems are optimized simultaneously, also consistently outperforms independent optimization given the same per-problem budget, and the benefit grows as more related tasks are added.

The authors present the work as the first demonstration that text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying tasks that previously required domain-specific algorithms. optimize_anything is released as part of the GEPA project with support for multiple backends, so researchers and practitioners can adapt it to their own optimization problems.
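The difference between score-only feedback and actionable side information can be illustrated with a small toy, again under stated assumptions: this is not the paper's method or the GEPA API. The evaluator here returns, alongside the score, the list of positions that are wrong, and a hypothetical random proposer (standing in for an LLM) either uses that list or ignores it. All names are illustrative.

```python
import random
from typing import List, NamedTuple, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
TARGET = "pack circles"  # hypothetical optimum for the toy scorer


class Evaluation(NamedTuple):
    score: float
    wrong_positions: List[int]  # side information: where the artifact is wrong


def evaluate(candidate: str) -> Evaluation:
    """Return the score plus the positions that do not match TARGET."""
    wrong = [i for i, (c, t) in enumerate(zip(candidate, TARGET)) if c != t]
    return Evaluation(score=float(len(TARGET) - len(wrong)), wrong_positions=wrong)


def propose(candidate: str, feedback: Evaluation,
            rng: random.Random, use_side_info: bool) -> str:
    """Mutate one character; with side info, only touch a known-wrong position."""
    if use_side_info and feedback.wrong_positions:
        pos = rng.choice(feedback.wrong_positions)
    else:
        pos = rng.randrange(len(candidate))
    return candidate[:pos] + rng.choice(ALPHABET) + candidate[pos + 1:]


def search(seed: str, budget: int, use_side_info: bool,
           rng_seed: int = 0) -> Tuple[str, float]:
    """Greedy search that accepts only strictly better candidates."""
    rng = random.Random(rng_seed)
    best, best_eval = seed, evaluate(seed)
    for _ in range(budget):
        cand = propose(best, best_eval, rng, use_side_info)
        cand_eval = evaluate(cand)
        if cand_eval.score > best_eval.score:
            best, best_eval = cand, cand_eval
    return best, best_eval.score


if __name__ == "__main__":
    seed = "a" * len(TARGET)
    for flag in (False, True):
        text, score = search(seed, budget=200, use_side_info=flag)
        print(f"use_side_info={flag}: {text!r} score={score}")
```

In this toy, the guided proposer never spends a step on a position that is already correct, which is one simple mechanism by which richer feedback can speed up convergence. It does not reproduce the paper's experiments.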

Key Points

Nearly tripled Gemini Flash's ARC-AGI accuracy from 32.5% to 89.5% using LLM-based agent architecture search.
Discovered cloud scheduling algorithms that reduce costs by 40%.
Generated CUDA kernels where 87% match or beat PyTorch's performance.

Why It Matters

A single LLM-based optimizer could replace bespoke algorithms for scheduling, code generation, and more.
