Assessing Reproducibility in Evolutionary Computation: A Case Study using Human- and LLM-based Assessment
A new study reveals a major reproducibility problem in evolutionary computation, a key subfield of AI research.
Deep Dive
Researchers analyzed ten years of papers from a major evolutionary computation conference to see whether their results could be reproduced. On average, papers scored only 62% for providing the details needed to rerun their experiments, and just over a third shared supporting materials. The team also built an LLM-based tool, RECAP, to automate the assessment; its judgments matched human reviewers' 67% of the time. The findings point to a significant reporting gap, but they also suggest that automated checks are a viable way to address it, as sketched below.
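The summary doesn't describe RECAP's internals, but the general shape of such a tool is easy to picture. Here is a minimal Python sketch, under stated assumptions: the checklist items, the ask_llm() placeholder, and the helper names are all hypothetical and not taken from the study. An LLM answers yes/no for each reporting criterion, a paper's score is the fraction of criteria met (analogous to the 62% average), and per-item agreement with human labels is computed the way a 67% match rate would be.

```python
# Hypothetical sketch of LLM-based reproducibility assessment.
# Checklist items and ask_llm() are illustrative assumptions,
# not RECAP's actual criteria or implementation.

CHECKLIST = [
    "Does the paper state the exact algorithm parameters used?",
    "Does the paper report the number of independent runs?",
    "Does the paper link to source code or other supporting materials?",
]

def ask_llm(question: str, paper_text: str) -> bool:
    """Placeholder for a real LLM call that answers yes/no about
    whether the paper satisfies one checklist item."""
    raise NotImplementedError("wire up an LLM client here")

def score_paper(paper_text: str) -> float:
    """Fraction of checklist items the paper satisfies (0.0 to 1.0),
    analogous to the 62% average reporting score in the study."""
    answers = [ask_llm(q, paper_text) for q in CHECKLIST]
    return sum(answers) / len(answers)

def percent_agreement(llm_labels: list[bool], human_labels: list[bool]) -> float:
    """Per-item agreement rate between LLM and human assessments,
    the kind of metric behind the reported 67% match rate."""
    assert len(llm_labels) == len(human_labels)
    matches = sum(a == b for a, b in zip(llm_labels, human_labels))
    return matches / len(llm_labels)
```

The design choice worth noting is that each checklist item is a separate yes/no query rather than one holistic score, which keeps the LLM's judgments auditable item by item and makes the agreement comparison with human reviewers straightforward.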
Why It Matters
Without reproducible research, scientific progress in AI slows, and public trust in findings erodes.