
ACE Framework Lets LLMs Self-Improve by Generating Adversarial Tests

arXiv cs.SE May 19, 2026

⚡ No ground-truth code needed: ACE lifts LLM pass@1 by 3-7 percentage points using adversarial tests.

Deep Dive

Current LLM code generation relies heavily on large-scale annotated solutions and verification-based supervision, which limits scalability and sustained self-improvement. Existing solver-verifier frameworks use program execution as automatic supervision, but their effectiveness plateaus as solvers improve: verifier-generated tests increasingly confirm that programs are semantically correct instead of exposing their failure modes.

ACE replaces the verifier with an adversary. A single LLM alternates between two roles. As the solver, it generates candidate programs. As the adversary, it produces unit test inputs optimized to make those programs fail at execution time through runtime errors, exceptions, or non-termination. Supervision is derived purely from execution outcomes: programs that withstand the adversarial tests are selected for supervised fine-tuning, while the adversarial tests are optimized with Kahneman-Tversky Optimization (KTO) using preferences derived from execution results. Because no ground-truth code or external reward model is required, the entire loop is self-supervised.
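The execution-derived labeling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: candidate programs are treated as standalone Python scripts that read from stdin, any non-zero exit code counts as a runtime error or exception, a timeout stands in for non-termination, and the rule that a test is "desirable" if it breaks at least one candidate is a hypothetical simplification of ACE's execution-derived preferences. The names `run_program` and `label_round` are illustrative, and a real system would run untrusted code in a sandbox.

```python
import subprocess
import sys
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    ERROR = "error"      # non-zero exit: uncaught exception or runtime error
    TIMEOUT = "timeout"  # stand-in for non-termination


def run_program(source: str, test_input: str, timeout_s: float = 2.0) -> Outcome:
    """Run candidate `source` as a Python script, feeding `test_input` on stdin."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=test_input,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return Outcome.TIMEOUT
    return Outcome.OK if proc.returncode == 0 else Outcome.ERROR


def label_round(programs: list[str], tests: list[str], timeout_s: float = 2.0):
    """Hypothetical labeling step for one solver-adversary round."""
    # outcomes[i][j] is the result of program i on adversarial input j.
    outcomes = [[run_program(p, t, timeout_s) for t in tests] for p in programs]

    # Solver side: keep programs that survive every adversarial input (SFT data).
    robust = [p for p, row in zip(programs, outcomes)
              if all(o is Outcome.OK for o in row)]

    # Adversary side: a test is labeled desirable if it broke at least one program.
    test_labels = [
        (t, any(outcomes[i][j] is not Outcome.OK for i in range(len(programs))))
        for j, t in enumerate(tests)
    ]
    return robust, test_labels
```

For example, given `n = int(input()); print(10 // n)` and a variant that guards against `n == 0`, the input `0` crashes the first program with a `ZeroDivisionError`, so only the guarded program is kept for fine-tuning and `0` is labeled as a successful adversarial test.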

This self-evolving loop lets the model keep improving without human annotation. On CodeContests, MBPP, and LiveCodeBench, ACE consistently outperforms strong solver-verifier baselines, with absolute pass@1 gains of 3-7 percentage points, and its improvements are even larger on out-of-distribution benchmarks, while inference efficiency remains competitive. The adversarial approach shifts the emphasis from passively confirming that tests pass to actively discovering failures, which could enable more robust and autonomous code generation. For practitioners, ACE is a step toward self-improving coding agents that learn from their own mistakes without expensive human feedback or curated datasets.
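Since the headline results are pass@1 gains, it helps to recall how pass@k is commonly computed. The sketch below uses the unbiased estimator from the Codex paper (Chen et al., 2021), a widespread convention; the article does not say which sampling setup ACE's evaluation uses. The scores in the example are hypothetical and only illustrate the difference between absolute gains (percentage points) and relative gains.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k = 1 the estimator reduces to the fraction of correct samples, c / n.
print(pass_at_k(10, 3, 1))  # approximately 0.3

# Hypothetical benchmark scores: moving from 30% to 35% pass@1 is
# a 5-point absolute gain but roughly a 16.7% relative gain.
baseline, improved = 0.30, 0.35
absolute_gain_points = (improved - baseline) * 100
relative_gain_pct = (improved - baseline) / baseline * 100
print(round(absolute_gain_points, 1), round(relative_gain_pct, 1))  # 5.0 16.7
```

A benchmark-level pass@1 is then the mean of the per-problem estimates.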

Key Points

ACE uses a solver-adversary architecture where the LLM both writes code and generates adversarial unit tests to actively induce failures.
Training requires no ground-truth code or external reward models; supervision is derived purely from execution outcomes.
Achieves absolute pass@1 gains of 3-7 percentage points on CodeContests, MBPP, and LiveCodeBench, with larger improvements on out-of-distribution tasks.
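Execution-only supervision pairs naturally with KTO, the method ACE uses to optimize its adversarial tests, because KTO learns from a single desirable/undesirable label per sample rather than from paired preferences. The scalar sketch below follows the per-example loss described in the KTO paper (Ethayarajh et al., 2024); it is not ACE's training code. The helper name `kto_loss`, the default `beta` and lambda values, and treating the reference point `z0` as a precomputed constant are assumptions. In ACE the labeled samples would presumably be the adversary's generated tests, with the label coming from execution.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kto_loss(policy_logp: float, ref_logp: float, desirable: bool,
             z0: float, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Per-example KTO loss in scalar form (hypothetical helper).

    policy_logp / ref_logp: log-probability of the sample under the policy
    being trained and under a frozen reference model.
    z0: reference point (an estimate of KL(policy || reference)), held constant.
    """
    r = policy_logp - ref_logp  # implied reward: log-ratio to the reference
    if desirable:
        value = lambda_d * sigmoid(beta * (r - z0))
        return lambda_d - value
    value = lambda_u * sigmoid(beta * (z0 - r))
    return lambda_u - value


# A desirable sample that the policy favors more than the reference gets a lower loss.
print(kto_loss(-5.0, -7.0, True, z0=0.0) < kto_loss(-7.0, -7.0, True, z0=0.0))  # True
```

Raising the log-ratio lowers the loss for desirable samples and raises it for undesirable ones, which pushes the adversary toward tests that execution marked as successful.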

Why It Matters

Self-evolving LLMs that learn from their own mistakes could reduce reliance on expensive human-annotated coding data.
