Decaf boosts neural decompilation from 26% to 83.9% with compiler feedback
Compiler checks guide AI to fix hallucinations, and the decompilation success rate more than triples.
Decompiling compiled binaries back into readable source code is notoriously difficult because compilers strip away high-level syntax, identifiers, and types. Large language models can generate plausible code, but they often hallucinate invalid programming constructs, producing output that looks correct yet fails to compile or does not behave like the original. The Decaf system, created by Alexander Shypula, Osbert Bastani, and Edward Schwartz, tackles this by feeding compiler feedback directly into the generation process. Instead of retraining on more data, Decaf runs a search loop: it generates candidate decompilations, compiles them, checks them for correctness, and then refines them based on the failures. This lets the model improve iteratively without any additional labeled examples.
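The generate, compile, check, and refine loop described above can be sketched as follows. This is a toy under stated assumptions, not Decaf's implementation: Python's built-in `compile()` stands in for the C compiler, "correctness" is approximated by hypothetical input/output test pairs, and `toy_model` is a scripted stand-in for the neural decompiler. All names (`check_candidate`, `decompile_with_feedback`, `toy_model`) are illustrative.

```python
from typing import Callable, Optional

# Hypothetical interface for a neural decompiler: given the target (e.g. assembly)
# and feedback from earlier failures, return a list of candidate source strings.
CandidateGenerator = Callable[[str, list[str]], list[str]]


def check_candidate(source: str, func_name: str,
                    tests: list[tuple[tuple, object]]) -> Optional[str]:
    """Return None if the candidate passes, otherwise a feedback message."""
    # Step 1: "compile" the candidate; Python's compile() stands in for a C compiler.
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as err:
        return f"compile error: {err.msg} (line {err.lineno})"
    # Step 2: load the candidate and compare its behaviour on reference I/O pairs.
    # (A real system would sandbox untrusted code instead of calling exec directly.)
    namespace: dict = {}
    try:
        exec(code, namespace)
        func = namespace[func_name]
        for args, expected in tests:
            actual = func(*args)
            if actual != expected:
                return f"wrong result for {args}: got {actual!r}, expected {expected!r}"
    except Exception as err:  # missing function, crash at runtime, etc.
        return f"runtime error: {type(err).__name__}: {err}"
    return None


def decompile_with_feedback(target: str, generate: CandidateGenerator,
                            func_name: str, tests: list[tuple[tuple, object]],
                            max_rounds: int = 3) -> Optional[str]:
    """Search loop: generate candidates, check them, feed failures back."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        for candidate in generate(target, feedback):
            error = check_candidate(candidate, func_name, tests)
            if error is None:
                return candidate  # first candidate that compiles and behaves correctly
            feedback.append(error)  # the next round can condition on this message
    return None  # search budget exhausted


def toy_model(target: str, feedback: list[str]) -> list[str]:
    # Scripted "model" for illustration only; a real system would prompt an LLM
    # with the target and the accumulated feedback.
    if not feedback:
        return ["def add(a, b) return a + b",         # hallucinated syntax
                "def add(a, b):\n    return a - b"]   # compiles, wrong semantics
    return ["def add(a, b):\n    return a + b"]


tests = [((2, 3), 5), ((-1, 1), 0)]
result = decompile_with_feedback("<asm for add>", toy_model, "add", tests)
print(result)  # prints the corrected `add` definition found in round two
```

The design point is that failures are not discarded: compiler errors and behavioural mismatches accumulate in `feedback`, which is what allows later rounds to repair earlier hallucinations without any retraining.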
On the challenging Real -O2 split of the ExeBench benchmark, Decaf raised the decompilation rate from a baseline of 26.0% to 83.9% while maintaining high similarity to the original source. The method is especially effective for weaker neural models, which suggests it could make high-quality decompilation broadly accessible without requiring massive compute. The authors release their code and models as open source, and the work points toward AI-assisted reverse engineering that is reliable enough for real-world security analysis and legacy code recovery.
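The headline figures can be checked directly from the two reported rates:

```latex
\frac{83.9\%}{26.0\%} \approx 3.23, \qquad 83.9\% - 26.0\% = 57.9 \text{ percentage points}
```

That is a relative gain of roughly 3.2x, consistent with the "triples" framing in the subtitle.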
- Decaf uses automatic compiler feedback and search to fix hallucinations in neural decompilation, avoiding the need for more training data.
- Achieves 83.9% decompilation success on the ExeBench Real -O2 split, up from 26.0%, without sacrificing source similarity.
- Method works well even on weaker models, making high-quality decompilation more accessible for reverse engineering tasks.
Why It Matters
Compiler-guided search makes AI decompilation practical for real security analysis and legacy code recovery.