PITMuS generates fresh bug datasets by reconstructing source-level mutants from PIT
New tool converts bytecode-level mutations into structured source-code bug pairs for LLM training
LLM-based software engineering relies on paired buggy/fixed code artifacts, but existing benchmarks like Defects4J are static and increasingly vulnerable to contamination as models train on public corpora. PITMuS, developed by Tasfia Tasnim and Soneya Binta Hossain, addresses this by automatically generating fresh, cutoff-aware datasets from any Java system. It builds on PIT, a state-of-the-practice mutation testing tool that injects mutants at the bytecode level, and reconstructs each mutant as a source-code edit using debug information from the compiled class files. The result is a set of structured records containing the buggy and fixed code, the method under test, documentation, and metadata for downstream training and evaluation.
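The reconstruction idea described above can be sketched roughly as follows. This is a minimal illustration, not PITMuS's actual implementation: it reads a hand-written sample in the shape of PIT's XML report, takes the reported line number, and applies one naive textual edit for a single mutation description. The class name `PitMutantSketch`, the helpers `sampleReport`, `sampleSource`, and `buggyVersion`, and the "replace the first `+` on the line" rule are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PitMutantSketch {

    /** Subset of the fields PIT writes for each mutant in its XML report. */
    static final class Mutant {
        final String sourceFile;
        final String mutatedMethod;
        final int lineNumber;       // 1-based source line as reported by PIT
        final String description;   // human-readable description of the mutation

        Mutant(String sourceFile, String mutatedMethod, int lineNumber, String description) {
            this.sourceFile = sourceFile;
            this.mutatedMethod = mutatedMethod;
            this.lineNumber = lineNumber;
            this.description = description;
        }
    }

    /** Returns the trimmed text of the first child element with the given tag, or "". */
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /** Parses every <mutation> element of a PIT-style XML report. */
    static List<Mutant> parseReport(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("mutation");
            List<Mutant> mutants = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                mutants.add(new Mutant(
                        childText(e, "sourceFile"),
                        childText(e, "mutatedMethod"),
                        Integer.parseInt(childText(e, "lineNumber")),
                        childText(e, "description")));
            }
            return mutants;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse PIT report", ex);
        }
    }

    /**
     * Hypothetical, deliberately naive reconstruction: handles one mutation
     * description by swapping the first '+' on the reported line for '-'.
     * Unsupported mutations leave the source unchanged.
     */
    static List<String> buggyVersion(List<String> fixedLines, Mutant m) {
        List<String> buggy = new ArrayList<>(fixedLines);
        int idx = m.lineNumber - 1;
        if (idx < 0 || idx >= buggy.size()) {
            return buggy;
        }
        String line = buggy.get(idx);
        if ("Replaced integer addition with subtraction".equals(m.description)) {
            int plus = line.indexOf('+');
            if (plus >= 0) {
                buggy.set(idx, line.substring(0, plus) + "-" + line.substring(plus + 1));
            }
        }
        return buggy;
    }

    /** Hand-written sample in the shape of a PIT XML report (not real tool output). */
    static String sampleReport() {
        return "<mutations><mutation detected='true' status='KILLED'>"
                + "<sourceFile>Calc.java</sourceFile>"
                + "<mutatedClass>Calc</mutatedClass>"
                + "<mutatedMethod>add</mutatedMethod>"
                + "<lineNumber>3</lineNumber>"
                + "<mutator>org.pitest.mutationtest.engine.gregor.mutators.MathMutator</mutator>"
                + "<description>Replaced integer addition with subtraction</description>"
                + "</mutation></mutations>";
    }

    /** Sample "fixed" source that the report above refers to. */
    static List<String> sampleSource() {
        return Arrays.asList(
                "class Calc {",
                "    int add(int a, int b) {",
                "        return a + b;",
                "    }",
                "}");
    }

    public static void main(String[] args) {
        Mutant m = parseReport(sampleReport()).get(0);
        List<String> fixed = sampleSource();
        List<String> buggy = buggyVersion(fixed, m);
        System.out.println("fixed: " + fixed.get(m.lineNumber - 1).trim());
        System.out.println("buggy: " + buggy.get(m.lineNumber - 1).trim());
    }
}
```

A real reconstruction has to cope with lines that contain several candidate operators and with mutators that have no one-to-one textual counterpart, which is presumably where the additional bytecode debug information comes in.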
The tool was evaluated on eight open-source Java systems and is designed to work on any Java project where PIT can be integrated. Because the datasets are cutoff-aware, they help reduce data leakage and contamination in model training and evaluation. PITMuS outputs structured, JSON-like records suitable for training bug localization, repair, and test generation models. The approach offers a practical, automated alternative to static benchmarks, allowing fresh training data to be generated continuously as software evolves. The paper is available on arXiv under the Software Engineering (cs.SE) subject.
- Combines PIT mutation testing XML metadata with bytecode debug info to reconstruct source-level edits
- Automatically generates structured datasets with paired buggy/fixed code, context, and metadata
- Evaluated on eight open-source Java systems; applicable to any Java project with PIT integration
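To make the "structured record" idea in the list above concrete, here is a minimal sketch of what one buggy/fixed pair might look like when serialized. The field names (`buggy_code`, `fixed_code`, `method_under_test`, `documentation`, `mutator`) and the class `BugRecordSketch` are hypothetical; the article does not specify PITMuS's actual schema, only the kinds of information each record carries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BugRecordSketch {

    /** Escapes characters that may not appear raw inside a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Serializes a flat map of string fields as a single JSON object. */
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}").toString();
    }

    /** Builds one illustrative record; all field names are assumptions. */
    static Map<String, String> sampleRecord() {
        Map<String, String> r = new LinkedHashMap<>();  // keeps insertion order
        r.put("buggy_code", "return a - b;");
        r.put("fixed_code", "return a + b;");
        r.put("method_under_test", "Calc.add(int, int)");
        r.put("documentation", "Returns the sum of a and b.");
        r.put("mutator", "MathMutator");
        return r;
    }

    public static void main(String[] args) {
        System.out.println(toJson(sampleRecord()));
    }
}
```

Keeping each record flat and self-describing means the same file can feed bug localization, repair, or test generation tasks by selecting different fields as input and target.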
Why It Matters
PITMuS enables continuous generation of fresh, cutoff-aware bug datasets, helping to combat benchmark contamination in LLM-based software engineering.