Reproduction Test Generation for Java SWE Issues
250 Java bugs get automated reproduction tests, closing the Python gap...
The paper introduces TDD-Bench-Java, the first benchmark for repository-level reproduction test generation in Java, with 250 instances drawn from popular open-source repositories. Its solution, e-Otter++ for Java, adapts a state-of-the-art Python reproduction test generator to produce execution-based tests that fail on the buggy code and pass once the fix is applied. The paper reports results on TDD-Bench-Java and on a contamination-free proprietary dataset, with the aim of improving bug diagnosis and fix validation in Java software development.
- TDD-Bench-Java is the first reproduction test generation benchmark for Java, with 250 instances from popular open-source repos
- e-Otter++ for Java adapts a Python SOTA generator to produce execution-based tests for bug verification
- Evaluation covers both TDD-Bench-Java and a contamination-free proprietary dataset from industry
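To make the "bug present before the fix, absent after" criterion concrete, here is a minimal sketch of what a reproduction test checks. It is a toy illustration, not output from e-Otter++ and not an instance from TDD-Bench-Java: the class `ReproductionCheck`, the methods `parseBuggy` and `parseFixed`, and the whitespace bug itself are all hypothetical.

```java
import java.util.function.Function;

// Toy illustration of the fail-before / pass-after criterion.
// All names here are hypothetical; none come from the paper.
class ReproductionCheck {

    // Hypothetical pre-fix code: rejects input with surrounding whitespace,
    // because Integer.parseInt does not trim its argument.
    static int parseBuggy(String s) {
        return Integer.parseInt(s);
    }

    // Hypothetical post-fix code: trims the input before parsing.
    static int parseFixed(String s) {
        return Integer.parseInt(s.trim());
    }

    // The reproduction test: returns true if the behavior described in the
    // issue is correct (the test "passes"), false if the bug shows up.
    static boolean reproductionTest(Function<String, Integer> parse) {
        try {
            return parse.apply(" 42 ") == 42;
        } catch (NumberFormatException e) {
            return false; // the bug reproduces
        }
    }

    // A test counts as a valid reproduction only if it fails on the
    // buggy code and passes on the fixed code.
    static boolean isFailToPass(boolean passesBefore, boolean passesAfter) {
        return !passesBefore && passesAfter;
    }

    public static void main(String[] args) {
        boolean before = reproductionTest(ReproductionCheck::parseBuggy);
        boolean after = reproductionTest(ReproductionCheck::parseFixed);
        System.out.println("fail-to-pass: " + isFailToPass(before, after));
    }
}
```

A test that passes on both versions, or fails on both, says nothing about the reported bug, which is why the check requires both outcomes rather than only a post-fix pass.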
Why It Matters
Automating reproduction test creation for Java, a mainstay of enterprise software, could speed up bug diagnosis and fix validation.