Research & Papers

Assessing Reproducibility in Evolutionary Computation: A Case Study using Human- and LLM-based Assessment

A new study reveals a major reproducibility problem in evolutionary computation, a key field of AI research.

Deep Dive

Researchers analyzed ten years of papers from a major AI conference to assess whether their results could be reproduced. On average, papers scored only 62% for reporting the details needed to replicate their experiments, and just over a third shared supporting materials. The team also built an AI tool, RECAP, to automate this assessment; it agreed with human reviewers 67% of the time. The findings expose a significant reporting gap but suggest that automated reproducibility checks are a viable solution.

Why It Matters

Without reproducible research, scientific progress in AI slows and public trust in its findings erodes.