Procedural Refinement by LLM-driven Algorithmic Debugging for ARC-AGI-2
A new neuro-symbolic technique combines LLMs with formal debugging to solve complex AI reasoning tasks.
A team of researchers has introduced Abduction-Based Procedural Refinement (ABPR), a novel neuro-symbolic framework that significantly improves how large language models (LLMs) debug and repair code. The method addresses a key weakness: LLMs often rely on plausible-sounding reasoning to propose fixes, which fails on complex tasks. ABPR instead couples an LLM with a meta-interpreter grounded in Ehud Shapiro's Algorithmic Program Debugging (APD) theory. The system materializes a program's execution into compact, declarative, tree-structured traces, creating a formal foundation for stepwise procedural refinement.
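The core APD idea behind such trace-based refinement can be sketched briefly. In classical algorithmic debugging, the execution of a program is recorded as a tree of calls, and an oracle (in ABPR's setting, the role the LLM plays) is asked whether each call produced its intended result; a node whose result is wrong but whose children are all correct localizes the buggy procedure. The sketch below is illustrative only, written in Python rather than Prolog for brevity: the `TraceNode` structure, the `sum` example, and the dictionary-based oracle are invented for this example and are not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TraceNode:
    """One node of a declarative execution trace: a goal, its result, sub-calls."""
    goal: str
    result: object
    children: list = field(default_factory=list)

def locate_bug(node, oracle):
    """Shapiro-style top-down bug localization.

    If a node's result is wrong but all of its children behaved correctly,
    the procedure invoked at that node is the buggy one."""
    if oracle(node.goal, node.result):
        return None  # this subtree behaves as intended
    for child in node.children:
        culprit = locate_bug(child, oracle)
        if culprit is not None:
            return culprit
    return node  # wrong result, correct children => the bug is here

# Hypothetical buggy run of sum over [1, 2]: the top-level call returns 4
# instead of 3, so the fault lies in its own combining step, not deeper down.
trace = TraceNode("sum([1,2])", 4, [
    TraceNode("sum([2])", 2, [TraceNode("sum([])", 0)]),
])

# The oracle stands in for the LLM's judgment of the intended behavior.
intended = {"sum([1,2])": 3, "sum([2])": 2, "sum([])": 0}
oracle = lambda goal, result: intended[goal] == result

buggy = locate_bug(trace, oracle)
print(buggy.goal)  # -> sum([1,2])
```

Because the verdict at each node depends only on that node's declarative input/output pair, the search is auditable: every repair decision can be traced back to a specific oracle answer about a specific call.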
The researchers evaluated ABPR on the demanding ARC-AGI-2 benchmark, which tests strong abstraction and algorithmic reasoning. They chose Prolog as the target language because its declarative semantics are well suited to APD. Remarkably, paired with Google's Gemini-3-Flash model, ABPR achieved a Pass@2 score of 56.67%. This result is notable because contemporary LLMs typically perform poorly when generating Prolog, suggesting that the method's structured reasoning can compensate for model limitations.
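For readers unfamiliar with the metric: under the Pass@2 convention commonly used for ARC-style benchmarks, a system may submit up to two candidate answers per task, and the task counts as solved if either is correct. The snippet below is a minimal sketch of that scoring rule; the function name and the toy results are illustrative, not the paper's data.

```python
def pass_at_2(attempts_per_task):
    """Fraction of tasks solved by either of the first two attempts.

    attempts_per_task: one list of booleans per task, one entry per
    attempt, True if that candidate produced the correct output."""
    solved = sum(any(attempts[:2]) for attempts in attempts_per_task)
    return solved / len(attempts_per_task)

# Toy illustration: 2 of 3 tasks solved within two attempts.
results = [[True, False], [False, True], [False, False]]
print(f"{pass_at_2(results):.2%}")  # -> 66.67%
```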
This work represents a meaningful shift towards integrating classical formal methods with modern neural models. The resulting system is more auditable and reliable than conversation-based code repair, as the debugging process becomes an explicit, traceable procedure. It demonstrates that coupling LLMs with symbolic reasoning engines can unlock new capabilities in complex program synthesis, moving beyond the black-box nature of pure LLM approaches.
Key Takeaways
- ABPR combines an LLM (Gemini-3-Flash) with a formal algorithmic debugger based on Ehud Shapiro's APD theory.
- The system achieved a 56.67% Pass@2 score on the challenging ARC-AGI-2 benchmark using Prolog.
- It creates auditable, tree-structured execution traces for stepwise repair, moving beyond LLM "plausible reasoning."
Why It Matters
This hybrid approach makes AI code generation more reliable and auditable for complex, real-world software engineering tasks.