ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis
91.3% accuracy on hard benchmarks with zero LLM calls at test time.
A new paper from Carnegie Mellon University introduces ReaComp, a framework that transforms LLM reasoning traces into standalone symbolic solvers for program synthesis. Instead of relying on expensive LLM calls for each new problem instance, ReaComp uses coding agents to compile a small set of reasoning demonstrations into reusable programs over constrained domain-specific languages (DSLs). These compiled solvers require zero LLM inference at test time, making them both fast and cost-effective.
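The core idea, a solver that is an ordinary program over a constrained DSL rather than an LLM call, can be sketched in a few lines. The toy DSL and the enumeration strategy below are illustrative assumptions for a simple string-transformation programming-by-example setting, not the paper's actual DSL or compilation procedure:

```python
from itertools import product

# Hypothetical toy DSL of string primitives (not the paper's actual DSL).
DSL = {
    "upper":   lambda s: s.upper(),
    "lower":   lambda s: s.lower(),
    "reverse": lambda s: s[::-1],
    "strip":   lambda s: s.strip(),
}

def synthesize(examples, max_depth=3):
    """Enumerate compositions of DSL primitives up to max_depth and return
    the first program consistent with every input-output example."""
    for depth in range(1, max_depth + 1):
        for ops in product(DSL, repeat=depth):
            def run(s, ops=ops):
                for op in ops:
                    s = DSL[op](s)
                return s
            if all(run(inp) == out for inp, out in examples):
                return ops  # the "compiled" program: a fixed op sequence
    return None

# Once synthesized, the program runs on new inputs with zero LLM calls.
prog = synthesize([("  Hello ", "OLLEH"), ("  World ", "DLROW")])
```

In ReaComp's setting, the search logic itself is what the coding agent writes by generalizing from reasoning traces; here a brute-force enumerator stands in for that compiled solver to show the test-time economics: applying `prog` to a fresh input is pure symbolic execution.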
The results are striking: symbolic solver ensembles built with ReaComp achieve 91.3% accuracy on PBEBench-Lite and 84.7% on PBEBench-Hard, outperforming even LLMs with test-time scaling by 16.3 percentage points on the harder benchmark. When used to complement LLM search, ReaComp improves PBEBench-Hard accuracy from 68.4% to 85.8% while reducing token usage by 78%. On the hard tier of SLR-Bench, a neuro-symbolic hybrid raises accuracy from 34.4% to 58.0%. Remarkably, most solvers transfer zero-shot to a real historical linguistics task—predicting sound changes in natural language data—reaching 80.1% accuracy under ensembling.
- Symbolic solver ensembles reach 91.3% on PBEBench-Lite and 84.7% on PBEBench-Hard, beating LLMs with test-time scaling by 16.3 points at zero inference cost.
- Combining ReaComp with LLM search cuts token usage by 78% while boosting PBEBench-Hard accuracy from 68.4% to 85.8%.
- Solvers transfer zero-shot to historical linguistics, predicting sound changes with 80.1% accuracy and recovering plausible linguistic rules.
Why It Matters
ReaComp makes LLM-powered program synthesis faster and cheaper by turning reasoning traces into reusable solvers that need zero calls at test time.