Struggling to reproduce paper results before improving them — stuck below reported accuracy [R]
The paper reports ~77% accuracy, but after careful tuning the student reaches only ~73%.
A PhD student in AI/computer vision describes hitting a common but frustrating wall: before improving on a published paper's accuracy, they need to reproduce its baseline. The paper claims ~77% accuracy, but after repeated runs and careful tuning of implementation details, preprocessing, hyperparameters, and even random seeds, they consistently reach only ~73%. They have double-checked every documented step and reached out to the paper's author for missing details, but received no response. The student feels unable to justify any 'improvement' when their baseline already falls below the reported number.
This reproducibility gap is a well-known issue in AI research, especially in computer vision, where small differences in evaluation protocol, data splits, or even library versions can shift results by several percentage points. Without access to the original code or a detailed response from the authors, students and researchers face an uphill battle to validate prior work. The post resonates with many who have experienced similar frustrations; common advice includes trying alternative codebases, relaxing precision targets, or treating the reported baseline as a rough guide rather than an exact target. The core lesson: always build a reproducibility margin into project planning.
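Seed and library-version differences are usually the cheapest factors to rule out first. The sketch below is a minimal example, assuming a PyTorch-based pipeline (the post does not name a framework), of pinning the random seeds and logging the exact environment before each run so diverging runs can be compared against a recorded configuration.

```python
# Minimal reproducibility sketch, assuming a PyTorch-based pipeline
# (the original post does not specify a framework; adapt as needed).
import os
import platform
import random
import sys

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin every RNG we control so repeated runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ["PYTHONHASHSEED"] = str(seed)


def log_environment() -> dict:
    """Record the library versions that most often explain small accuracy shifts."""
    env = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "numpy": np.__version__,
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version() if torch.cuda.is_available() else None,
    }
    for key, value in env.items():
        print(f"{key}: {value}")
    return env


if __name__ == "__main__":
    seed_everything(42)
    log_environment()
    # ... build dataloaders, model, and training loop here ...
```

Logging this alongside each run does not close a 4-point gap by itself, but it lets you rule out environment drift before questioning the evaluation protocol or data splits.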
- Paper reports ~77% accuracy; student gets ~73% after multiple runs and tuning
- Student checked implementation details, preprocessing, hyperparameters, and random seeds
- Author unresponsive to requests for missing details, leaving the student stuck
Why It Matters
Highlights reproducibility challenges in AI research that can waste months of effort.