ConcoLixir: LLM discovery oracles boost Python test coverage by up to 17 points
An LLM discovery oracle boosts concolic testing coverage by up to 17 percentage points for just $1.63 in API charges.
Concolic testing combines concrete execution with symbolic constraint solving, but in Python it often stalls when library calls downgrade symbolic variables to concrete values, or when operations such as regex matching, checksums, and parsers resist solver analysis. The paper introduces ConcoLixir, a reactive LLM extension for Python concolic execution that acts as a discovery oracle rather than a solver replacement. ConcoLixir uses the LLM to generate initial seeds, propose concrete inputs after solver failures, and target uncovered code when coverage plateaus. Each candidate is then executed concolically, and the observed coverage and path constraints guide subsequent exploration.
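The three oracle triggers described above (initial seeding, solver failure, coverage plateau) can be sketched as a small exploration loop. This is a minimal illustration under stated assumptions, not ConcoLixir's actual implementation: `explore`, `toy_target`, `stub_solver`, and `stub_oracle` are hypothetical names, the LLM is replaced by a canned function, and the "concolic run" only reports covered branch labels rather than real path constraints.

```python
import re
from typing import Callable, List, Optional, Set

# Hypothetical signatures: (reason, covered) -> candidate inputs,
# and (last_input, covered) -> next input or None on solver failure.
Oracle = Callable[[str, Set[str]], List[str]]
Solver = Callable[[str, Set[str]], Optional[str]]


def toy_target(s: str) -> Set[str]:
    """Toy program under test; returns the branch labels it covered."""
    covered = {"entry"}
    if re.fullmatch(r"[A-Z]{3}-\d{4}", s):          # regex guard
        covered.add("format_ok")
        if sum(int(c) for c in s[4:]) % 10 == 0:    # checksum-style guard
            covered.add("checksum_ok")
    return covered


def stub_solver(last_input: str, covered: Set[str]) -> Optional[str]:
    """Stand-in for a solver that gives up on regex/checksum constraints."""
    return None


def stub_oracle(reason: str, covered: Set[str]) -> List[str]:
    """Stand-in for the LLM: canned answers instead of a real model call."""
    if reason == "seed":
        return ["hello"]
    if reason == "solver_failure" and "format_ok" not in covered:
        return ["ABC-0001"]
    if "checksum_ok" not in covered:
        return ["ABC-1234"]
    return []


def explore(run: Callable[[str], Set[str]], solve: Solver, oracle: Oracle,
            budget: int = 10, plateau: int = 2) -> Set[str]:
    """Run candidates until the queue is empty or the budget is spent."""
    covered: Set[str] = set()
    queue = list(oracle("seed", covered))            # trigger 1: initial seeds
    stale = 0                                        # runs without new coverage
    while queue and budget > 0:
        candidate = queue.pop(0)
        budget -= 1
        new = run(candidate) - covered
        covered |= new
        stale = 0 if new else stale + 1
        nxt = solve(candidate, covered)
        if nxt is None:                              # trigger 2: solver failure
            queue.extend(oracle("solver_failure", covered))
        else:
            queue.append(nxt)
        if stale >= plateau:                         # trigger 3: coverage plateau
            queue.extend(oracle("plateau", covered))
            stale = 0
    return covered


if __name__ == "__main__":
    print(sorted(explore(toy_target, stub_solver, stub_oracle)))
```

With the stub oracle, the loop reaches all three branch labels of the toy target; with an oracle that supplies only the seed `"hello"`, it never gets past `entry`. The `budget` parameter reflects the bounded LLM usage the article emphasizes: the oracle is consulted only at the three trigger points, and every suggestion is still validated by executing it.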
Across synthetic, real-world, and library benchmark targets, ConcoLixir raised mean line coverage by 8.6, 15.1, and 17.0 percentage points, respectively, over the baseline concolic tester without an LLM oracle. The largest gains occurred near semantic and library boundaries. Crucially, the full evaluation cost only $1.63 in API charges, demonstrating that bounded LLM usage can complement symbolic reasoning without replacing it. The approach offers a practical, low-cost way to enhance automated software testing for Python projects.
- ConcoLixir uses an LLM as a discovery oracle to generate seeds, propose inputs after solver failures, and target uncovered code.
- Mean line coverage improved by 8.6, 15.1, and 17.0 percentage points on synthetic, real-world, and library targets, respectively.
- The full evaluation cost only $1.63 in API charges, showing that the enhancement is practical and low-cost.
Why It Matters
LLMs can cheaply enhance automated testing by overcoming symbolic solver blind spots, especially near complex library boundaries.