[D] Papers with no code
Viral post exposes empty GitHub repos and unreproducible SOTA claims in top-tier AI research.
A viral discussion in Reddit's Machine Learning community is exposing a critical flaw in modern AI research: the widespread acceptance of papers at major conferences without accompanying code. The post, titled 'Papers with no code,' argues that this practice makes it impossible to verify groundbreaking claims, especially for expensive models claiming SOTA (state-of-the-art) performance. The author points to three major risks: entirely fabricated results, accidental training on test data, and hidden evaluation errors.
The criticism centers on a specific example: a paper presenting a method to generate protein MSAs (Multiple Sequence Alignments) using RAG (Retrieval-Augmented Generation), claimed to run orders of magnitude faster than traditional alignment software, a breakthrough that would be transformative for bioML. The paper's OpenReview page links to a GitHub repository, but the repo is completely empty, and the authors have been unresponsive to requests for the code or a release timeline.
This incident is framed not as an isolated case but as a symptom of a systemic problem. As training costs soar into the millions of dollars, the barrier to independent verification grows, and the community is left to trust claims without the fundamental scientific check of reproducibility. The post has sparked widespread agreement, with many researchers sharing similar frustrations about a 'publish-or-perish' culture that prioritizes novel claims over robust, open science. The implication is a potential erosion of trust in published benchmarks and a slowdown in genuine progress, as researchers cannot build upon or validate supposed advancements.
- Major AI conferences are accepting papers with SOTA claims but no released code, blocking reproducibility.
- A cited protein MSA generation paper promises revolutionary RAG-based speed but links to an empty GitHub repo.
- Without code, the community cannot check for fabricated results, test data contamination, or methodological errors in expensive-to-train models (one such check is sketched below).
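
To make the reproducibility point concrete, here is a minimal sketch of one check that a released codebase enables: flagging exact train/test overlap, a common form of test data contamination. This is an illustration only, not code from the paper in question; the function names and placeholder sequences are hypothetical.

```python
# Minimal sketch of an exact-match train/test contamination check.
# Hypothetical illustration; not taken from the paper under discussion.
import hashlib


def fingerprint(example: str) -> str:
    """Hash a whitespace- and case-normalized example for exact matching."""
    normalized = " ".join(example.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def contamination_rate(train_examples, test_examples) -> float:
    """Fraction of test examples that appear verbatim in the training set."""
    train_hashes = {fingerprint(x) for x in train_examples}
    overlap = sum(1 for x in test_examples if fingerprint(x) in train_hashes)
    return overlap / max(len(test_examples), 1)


if __name__ == "__main__":
    train = ["MKTAYIAKQR", "MSLEQKKGAD"]  # placeholder sequences
    test = ["MKTAYIAKQR", "MNIFEMLRID"]   # one verbatim duplicate
    print(f"Exact-match contamination: {contamination_rate(train, test):.0%}")
```

Real contamination audits also hunt for near-duplicates and leaked evaluation labels, but even this exact-match baseline cannot be run when neither the code nor the data splits are released.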
Why It Matters
This undermines scientific progress, wastes research resources, and erodes trust in published AI breakthroughs that others cannot verify or build upon.