SCDBench: New Benchmark Shows LLMs Fail 93% at Smart Contract Decompilation

arXiv cs.SE May 29, 2026

The best frontier model tested, Claude Opus 4.7, perfectly decompiles only 42 of 600 contracts

Deep Dive

SCDBench is a new benchmark designed to rigorously evaluate LLM-based smart contract decompilers. The dataset includes 600 real-world Solidity contracts with paired bytecode inputs, ground-truth source code, and replayable semantic checkpoints. It assesses decompiler outputs across four cumulative stages: format completeness, compilability, Application Binary Interface (ABI) recovery, and semantic consistency via differential replay. The benchmark addresses the growing problem of LLMs generating plausible-looking but semantically incorrect Solidity code that compiles and appears valid.
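The cumulative staging described above can be sketched in a few lines. This is only an illustration under stated assumptions, not SCDBench's actual harness: the function names (`stages_passed`, `replay_matches`, `is_perfect`), the stage labels, and the idea of modelling a contract as a plain callable are all hypothetical stand-ins chosen for the example.

```python
from typing import Callable, Dict, Iterable, List

# Stage names follow the article; the list order encodes the cumulative gating.
STAGES = ["format", "compile", "abi", "semantic"]

# Hypothetical check signature: takes decompiler output, returns pass/fail.
Check = Callable[[str], bool]


def stages_passed(candidate: str, checks: Dict[str, Check]) -> List[str]:
    """Return the stages a candidate clears, stopping at the first failure."""
    passed: List[str] = []
    for stage in STAGES:
        if not checks[stage](candidate):
            break  # later stages are not credited once one stage fails
        passed.append(stage)
    return passed


def replay_matches(
    reference: Callable[[str], str],
    candidate: Callable[[str], str],
    checkpoints: Iterable[str],
) -> bool:
    """Differential replay: both versions must agree on every checkpoint input."""
    return all(reference(cp) == candidate(cp) for cp in checkpoints)


def is_perfect(candidate: str, checks: Dict[str, Check]) -> bool:
    """A 'perfect' decompilation clears all four stages."""
    return stages_passed(candidate, checks) == STAGES
```

In a real setting the `semantic` check would wrap something like `replay_matches`, executing the original bytecode and the recompiled output on the recorded checkpoints. The point of the gating is that code which compiles and looks valid still earns no semantic credit unless its behaviour matches on replay.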

Testing the frontier models Claude Opus 4.7, GPT-5.3-Codex, and GLM-5 (with and without extended reasoning) reveals that semantic consistency is far from solved. The best-performing model, Claude Opus 4.7, perfectly decompiled only 42 of 600 contracts (7%), leaving 93% with at least one failure. However, adding same-model compilation repair substantially improved performance at low additional cost. These results highlight a critical gap in current LLM capabilities for blockchain security applications, where reliable decompilation is essential for auditing and transparency.
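The article does not say how the compilation repair step is implemented, so the following is a rough sketch of one plausible shape for it: feed the compiler error back to the same model and retry a bounded number of times. The names `repair_until_compiles`, `compile_fn`, `repair_fn`, and the `max_rounds` default are assumptions made for illustration. The last lines reproduce the headline arithmetic.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (not from the benchmark):
#   compile_fn(source) -> None on success, or an error message string
#   repair_fn(source, error) -> revised source from the same model
CompileFn = Callable[[str], Optional[str]]
RepairFn = Callable[[str, str], str]


def repair_until_compiles(
    source: str,
    compile_fn: CompileFn,
    repair_fn: RepairFn,
    max_rounds: int = 2,  # assumed budget; the source gives no figure
) -> Tuple[Optional[str], int]:
    """Return (compiling source or None, number of repair rounds used)."""
    rounds_used = 0
    while True:
        error = compile_fn(source)
        if error is None:
            return source, rounds_used
        if rounds_used == max_rounds:
            return None, rounds_used  # budget exhausted, still not compiling
        source = repair_fn(source, error)
        rounds_used += 1


# Headline figures from the article: 42 perfect decompilations out of 600.
perfect_rate = round(42 / 600 * 100)  # 7 (percent)
failure_rate = 100 - perfect_rate     # 93 (percent)
```

Bounding the number of rounds keeps the extra cost predictable, which fits the article's claim that repair is cheap. Note that a loop like this only targets compilability: a repaired contract would still have to pass ABI recovery and differential replay to count as perfect.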

Key Points

SCDBench contains 600 real-world Solidity contracts with bytecode, source code, and replayable semantic checkpoints for reproducible evaluation.
Evaluation covers four stages: format completeness, compilability, ABI recovery, and semantic consistency via differential replay.
Best frontier model (Claude Opus 4.7) perfectly decompiles only 42/600 contracts; same-model compilation repair substantially boosts performance at low cost.

Why It Matters

Reliable smart contract decompilation is critical for blockchain security; current LLMs fall far short.
