AI Safety

AI safety bug bounty programs: too narrow, too stingy, says critique

OpenAI's bug bounty averages $250 with only 6 rewards since July 2025...

Deep Dive

A LessWrong critique argues that the AI safety bug bounty programs run by OpenAI, Anthropic, and Google are too narrow in scope and pay too little to motivate researchers. OpenAI requires a vulnerability to be reproducible at least 50% of the time, reports an average payout of $250 over the last three months (the same as its minimum payout), and has rewarded only 6 vulnerabilities since July 2025. The critique calls for stronger incentives so that safety flaws in deployed systems are caught before they cause harm.

Key Points
  • OpenAI's bug bounty requires vulnerabilities to be reproducible at least 50% of the time, which the critique says is too high for rare but high-impact exploits.
  • OpenAI has rewarded only 6 vulnerabilities since July 2025, and its reported average payout of $250 over the last three months equals its minimum payout; the critique describes the Anthropic and Google programs as similarly narrow.
  • The critique calls for a broader scope that covers disallowed content generation and probabilistic exploits, plus higher rewards to adequately incentivize safety research.
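
A short arithmetic sketch may help show why a 50% reproducibility bar can exclude exploits that still matter at scale. The 50% threshold comes from the program rules described above. Everything else is a hypothetical assumption chosen for illustration: the 1% per-attempt success rate, the 1,000 attempts, the function names, and the premise that attempts are independent.

```python
# Illustrative only: the success rate and attempt count are made-up numbers,
# and attempts are assumed to be independent of one another.

def meets_threshold(per_attempt_rate: float, threshold: float = 0.5) -> bool:
    """True if an exploit reproduces often enough to clear the threshold."""
    return per_attempt_rate >= threshold


def p_at_least_one_success(per_attempt_rate: float, attempts: int) -> float:
    """Probability that the exploit works at least once across all attempts."""
    return 1.0 - (1.0 - per_attempt_rate) ** attempts


if __name__ == "__main__":
    rate = 0.01       # hypothetical exploit that works on 1% of attempts
    attempts = 1000   # hypothetical number of tries by a persistent attacker

    print("Clears the 50% bar:", meets_threshold(rate))
    print("Chance of at least one success:",
          p_at_least_one_success(rate, attempts))
```

Under these assumptions the exploit falls far short of the reproducibility bar, yet a persistent attacker would be almost certain to trigger it at least once. That gap is the one the critique points to when it asks for probabilistic exploits to be in scope.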

Why It Matters

Inadequate bug bounty programs could leave critical safety vulnerabilities undiscovered, risking catastrophic AI misuse or loss of control.