SACS: A Code Smell Dataset using Semi-automatic Generation Approach
New semi-automatic method creates a high-quality dataset for training AI to spot code smells, the patterns that signal poorly structured code.
Deep Dive
Researchers Hanyu Zhang and Tomoji Kishi built SACS, a new open-source dataset for training AI models to detect 'code smells': structural patterns, such as Long Method and Large Class, that signal maintainability problems rather than outright bugs. Built with a semi-automatic generation approach, the dataset contains over 10,000 labeled samples for each of three common smells. It provides a large-scale, high-quality benchmark for improving automated code refactoring and software maintenance tools.
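To make the idea of a smell such as Long Method concrete, here is a minimal sketch of a rule-based check in Python. It is an illustration only, not the SACS generation approach: the function name `find_long_functions` and the 30-line threshold are assumptions chosen for the example and do not come from the dataset or its authors.

```python
import ast

# Hypothetical threshold for illustration; not taken from SACS.
MAX_FUNCTION_LINES = 30


def find_long_functions(source, max_lines=MAX_FUNCTION_LINES):
    """Return (name, line_count) for every function longer than max_lines."""
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes in Python 3.8+.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                flagged.append((node.name, length))
    return flagged


if __name__ == "__main__":
    # Build a sample module with one short and one long function.
    body = "".join(f"    x{i} = {i}\n" for i in range(40))
    sample = "def short():\n    return 1\n\n"
    sample += "def long_one():\n" + body + "    return x0\n"
    print(find_long_functions(sample))
```

A fixed line-count rule like this one is crude. Labeled datasets matter because a learned detector can weigh more context than a single threshold can.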
Why It Matters
Reliable training data helps AI coding assistants spot and fix bad code patterns automatically.