AI Safety

Only 1 in 37 Open-Weight AI Models Passes This Critical Safety Test — What That Means for the Future of AI

Only 1 of 37 model families passed all four risk checks.

Deep Dive

A new arXiv paper from Paskov et al. argues that open-weight AI models (OWMs) require evaluation approaches distinct from those designed for closed-weight models (CWMs). The authors identify four risk-specific evaluations: PE1 (evaluating without system-level safeguards), PE2 (assessing robustness to modifications that undo model-level safeguards), PE3 (testing selective capability amplification), and PE4 (proxying worst-case misuse). These address risks unique to OWMs, such as unrestricted fine-tuning, weight redistribution, and removal of safety filters.

Reviewing 37 OWM families released from 2025 through April 2026, the researchers found that only one family (not named in the abstract) completed all four evaluations. Most families did not complete any. The paper is aimed at policymakers, funders, and researchers. It emphasizes that, as OWMs rapidly approach the performance of leading CWMs, proportional evaluation becomes critical to preventing catastrophic misuse. The authors call on developers and governance bodies to pay closer attention to this evaluation gap and to close it.

Key Points
  • Only 1 of 37 open-weight model families (released 2025–April 2026) met all four proposed proportional evaluation criteria.
  • The four evaluations are: PE1 (no system-level safeguards), PE2 (robustness to model-level safeguard removal), PE3 (selective capability amplification), and PE4 (proxying worst-case misuse).
  • Researchers argue existing CWM-focused evaluation practices fail to account for OWM-specific risks like unrestricted fine-tuning and weight redistribution.

Why It Matters

As open-weight models near closed-weight performance, inadequate evaluation could leave serious misuse risks unidentified and unmitigated.
