NeurIPS urged to mandate reproducibility for frontier AI safety claims
Frontier AI safety claims are often not reproducible, and the 2025 Foundation Model Transparency Index puts the sector-average transparency score at 40 out of 100.
A new position paper on arXiv by Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka, and Ivan Flechais argues that NeurIPS should require reproducibility standards for any paper making frontier AI safety claims, meaning assertions that a highly capable general-purpose model is below a danger threshold, adequately mitigated, or safe for release. The authors highlight an 'evidential inversion': the most consequential claims in AI safety are often the least reproducible, because the artefacts needed to evaluate them are routinely withheld. They cite two sources in support. The 2026 International AI Safety Report notes that pre-deployment testing has become harder and that models now distinguish test from deployment contexts. The 2025 Foundation Model Transparency Index reports a sector-average transparency score of just 40 out of 100, with no major developer adequately disclosing train-test overlap.
The paper proposes a three-tier disclosure framework: public, controlled (via a federated colloquium of qualified secure-review hosts), and claim-restricted (for claims whose artefacts cannot be reviewed even confidentially). This is paired with a mandatory claim inventory, scope statements, and a phased implementation path with graduated sanctions. The authors argue that treating non-reproducibility as a matter of transparency preference, rather than as an evaluation-methodology failure, undermines the scientific legitimacy of the entire field. They emphasize that the standard applied to the most consequential claims should be at least as high as that applied to the least consequential ones.
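To make the three tiers and the claim inventory concrete, here is a minimal Python sketch. It is illustrative only and not taken from the paper: the names `Tier`, `SafetyClaim`, `assign_tier`, and `build_inventory`, the two boolean fields, and the rule mapping a claim to a tier are all assumptions based on the summary above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """The three disclosure tiers named in the summary."""
    PUBLIC = "public"
    CONTROLLED = "controlled"              # reviewed by a secure-review host
    CLAIM_RESTRICTED = "claim-restricted"  # not reviewable even confidentially


@dataclass(frozen=True)
class SafetyClaim:
    """One entry in a hypothetical claim inventory."""
    text: str                                   # the safety claim as stated
    scope: str                                  # scope statement for the claim
    artefacts_public: bool                      # assumed field
    artefacts_reviewable_confidentially: bool   # assumed field


def assign_tier(claim: SafetyClaim) -> Tier:
    """Assumed rule: pick the most open tier the artefacts allow."""
    if claim.artefacts_public:
        return Tier.PUBLIC
    if claim.artefacts_reviewable_confidentially:
        return Tier.CONTROLLED
    return Tier.CLAIM_RESTRICTED


def build_inventory(claims: list[SafetyClaim]) -> list[tuple[str, str, str]]:
    """Return (claim text, scope, tier name) for every claim in a paper."""
    return [(c.text, c.scope, assign_tier(c).value) for c in claims]
```

Under these assumptions, a paper would list each safety claim once, and a claim whose artefacts cannot be shared with any reviewer would be labeled `claim-restricted` rather than left unmarked.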
- Proposes three-tier disclosure: public, controlled (secure-review colloquium), and claim-restricted.
- Cites 2025 Foundation Model Transparency Index: average score 40/100 across major AI developers.
- Argues non-reproducibility should be treated as an evaluation-methodology failure, not a transparency preference.
Why It Matters
Frontier AI safety claims that cannot be independently reproduced cannot be verified, which erodes public trust and raises the risk that unsafe models are deployed.