Developer Tools

Researchers' SACS dataset brings 30k+ semi-automatically labeled samples to AI code smell detection

⚡ New semi-automatic method creates a high-quality dataset for training AI to spot problematic code patterns.

Deep Dive

Researchers Hanyu Zhang and Tomoji Kishi built SACS, a new open-source dataset for training AI models to detect 'code smells': patterns that signal design or maintainability problems, such as Long Method and Large Class. Built with a semi-automatic generation approach, it contains over 10,000 labeled samples for each of three common smells. The result is a large-scale, high-quality benchmark for improving automated code refactoring and software maintenance tools.
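To make "Long Method" and "Large Class" concrete, here is a minimal sketch of a naive metric-threshold detector, a common baseline for code smell detection. It is not the SACS authors' semi-automatic approach, which this article does not describe. The thresholds and the names `detect_smells` and `SmellLabel` are hypothetical, chosen only for illustration, and the sketch analyzes Python source via the standard `ast` module.

```python
import ast
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; they are NOT taken from SACS.
LONG_METHOD_LINES = 30
LARGE_CLASS_METHODS = 20


@dataclass
class SmellLabel:
    name: str    # name of the flagged function or class
    smell: str   # "Long Method" or "Large Class"
    metric: int  # measured value that exceeded the threshold


def detect_smells(source: str) -> list[SmellLabel]:
    """Flag naive Long Method / Large Class candidates in Python source."""
    tree = ast.parse(source)
    labels: list[SmellLabel] = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is set on parsed nodes from Python 3.8 onward.
            length = node.end_lineno - node.lineno + 1
            if length > LONG_METHOD_LINES:
                labels.append(SmellLabel(node.name, "Long Method", length))
        elif isinstance(node, ast.ClassDef):
            # Count only methods defined directly in the class body.
            methods = [n for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            if len(methods) > LARGE_CLASS_METHODS:
                labels.append(SmellLabel(node.name, "Large Class", len(methods)))
    return labels


if __name__ == "__main__":
    # A 41-line function and a class with 25 short methods.
    long_fn = "def f():\n" + "".join(f"    x{i} = {i}\n" for i in range(40))
    big_cls = "class C:\n" + "".join(
        f"    def m{i}(self):\n        pass\n" for i in range(25))
    for label in detect_smells(long_fn + big_cls):
        print(label)
```

Fixed thresholds like these are easy to apply but produce noisy labels, since a long method is not always a problem. That noise is the kind of labeling gap a curated dataset aims to close.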

Why It Matters

Enables better AI coding assistants by providing reliable training data for automatically spotting bad code patterns so they can be refactored.
