Developer Tools

SACS: A Code Smell Dataset using Semi-automatic Generation Approach

New semi-automatic method creates a high-quality dataset for training AI to spot code smells.

Deep Dive

Researchers Hanyu Zhang and Tomoji Kishi built SACS, a new open-source dataset for training AI models to detect 'code smells': design problems such as Long Method and Large Class that make code harder to maintain, even when the code runs correctly. Built with a semi-automatic generation approach, the dataset contains over 10,000 labeled samples for each of three common smells. It provides a large-scale, high-quality benchmark for improving automated code refactoring and software maintenance tools.
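To make the Long Method smell concrete, here is a minimal sketch of the kind of size-based heuristic often used to flag it. It is an illustration only and is not the SACS generation approach: the function name `long_methods` and the 30-line threshold are assumptions chosen for the example, and it analyzes Python source simply because that keeps the example self-contained.

```python
import ast
from typing import List

# Hypothetical threshold for illustration; not a value taken from SACS.
MAX_LINES = 30


def long_methods(source: str, max_lines: int = MAX_LINES) -> List[str]:
    """Return the names of functions whose line span exceeds max_lines."""
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes in Python 3.8+.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                flagged.append(node.name)
    return flagged
```

A fixed threshold like this is crude, which is part of why labeled datasets matter: a trained model can learn from many examples instead of relying on a single hand-picked cutoff.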

Why It Matters

Reliable training data could help AI coding assistants spot code smells and suggest refactorings automatically.