Risk Reporting for Developers' Internal AI Model Use
Anthropic's Mythos Preview was tested internally for at least six weeks before public release...
A new arXiv paper from researchers including Oscar Delaney and Sambhav Maheshwari proposes a standardized framework for risk reporting when frontier AI companies deploy their most advanced models internally before public release. The authors note that companies such as Anthropic have tested models like Mythos Preview internally for at least six weeks before announcement, creating risks that external-deployment frameworks do not address. The framework responds to emerging legal requirements in California's Transparency in Frontier Artificial Intelligence Act (SB 53), New York's RAISE Act, and the EU's General-Purpose AI Code of Practice, all of which mandate plans for managing internal-use risks.
The reporting framework is structured around two primary threat vectors: autonomous AI misbehavior (where models act without human intent) and insider threats (where employees misuse access). For each vector, companies must assess three risk factors: means (the model's capabilities), motive (incentives for misuse), and opportunity (access controls and safeguards). The paper argues that, given the pace of AI R&D automation and the limited external visibility into internal deployments, regular detailed risk reports may be one of the few mechanisms for identifying risks before they materialize. The guide is addressed primarily to evaluation and safety teams at frontier AI developers, and secondarily to regulators and auditors.
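The paper does not prescribe a machine-readable schema, but its two-vector, three-factor structure maps naturally onto a simple data model. Below is a minimal sketch in Python; the class names, fields, and example values are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, field
from enum import Enum


class ThreatVector(Enum):
    # The framework's two primary threat vectors
    AUTONOMOUS_MISBEHAVIOR = "autonomous AI misbehavior"  # model acts without human intent
    INSIDER_THREAT = "insider threat"                     # employee misuses access


@dataclass
class RiskAssessment:
    """One entry: a threat vector scored on the paper's three risk factors."""
    vector: ThreatVector
    means: str        # the model's relevant capabilities
    motive: str       # incentives for misuse or misaligned behavior
    opportunity: str  # access controls and safeguards, and how they might fail


@dataclass
class InternalRiskReport:
    """A periodic risk report for one internally deployed model (hypothetical schema)."""
    model_name: str
    reporting_period: str
    assessments: list[RiskAssessment] = field(default_factory=list)


# Hypothetical example entry for a model under internal test
report = InternalRiskReport(
    model_name="Mythos Preview",
    reporting_period="2025-Q1",
    assessments=[
        RiskAssessment(
            vector=ThreatVector.AUTONOMOUS_MISBEHAVIOR,
            means="autonomous code execution in internal R&D pipelines",
            motive="reward hacking during automated experiments",
            opportunity="broad sandbox permissions on shared clusters",
        ),
    ],
)
```

Structuring reports this way would also let regulators or auditors diff successive reporting periods for the same model, which fits the paper's emphasis on regular, repeated reporting.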
- Frontier AI companies like Anthropic test advanced models internally for weeks (e.g., Mythos Preview for 6+ weeks) before public release, creating risks that existing external-deployment rules do not cover
- The framework responds to internal-use risk requirements in three major regulations: California's SB 53, New York's RAISE Act, and the EU's General-Purpose AI Code of Practice
- Reporting structure covers two threat vectors (autonomous misbehavior and insider threats) with three risk factors each: means, motive, and opportunity
Why It Matters
Standardized internal risk reporting could prevent AI incidents before public deployment, closing a critical regulatory blind spot.