Anthropic's Mythos model evaluation sparks U.S. agency clash over AI control
METR evaluation shows Mythos near saturation, while Commerce and Intelligence battle for oversight.
The article from 'Cyber Lack of Security and AI Governance' highlights the convergence of two major AI stories: the evaluation of Anthropic's latest frontier model, code-named Mythos (likely Claude Mythos Preview), and the escalating bureaucratic battle over AI regulation in the U.S. government. METR’s evaluation of an early Mythos Preview placed its 50%-success-rate time horizon at over 16 hours, pushing the limits of METR’s current task suite. At an 80% success rate, Mythos remained under 4 hours, and even at 99% success rate, under 5 minutes on simpler tasks. This suggests the model is modestly above trend, but evaluators caution that results are affected by task selection and reliability thresholds. Notably, the UK AISI found a significant gap between the early version it originally reviewed and the final Mythos release, implying ongoing invisible improvements behind the scenes.
The political dimension centers on the Trump administration’s reluctant move toward acknowledging catastrophic AI risks and the need for regulatory oversight. Two competing factions have emerged: the Department of Commerce, which controls export licenses and commercial access, and the Intelligence community, which prioritizes national security. Both are fighting for authority to supervise frontier model releases and decide who gets access to the most powerful models. The article also references a 'Prior Restraint Era' and a 'Quest for Sane Regulations,' suggesting that early steps—like requiring government approval for certain deployments—are being debated. With Mythos’s capabilities threatening to outpace current evaluation methods, the governance question becomes urgent: how to balance innovation, security, and oversight when the models themselves are evolving faster than the regulatory system can adapt.
- METR evaluated an early Claude Mythos Preview with a 50% time horizon of at least 16 hours, near the maximum measurable range with current tasks.
- UK AISI identified a substantial capability gap between the early preview and final version of Mythos, indicating rapid continuous improvement.
- The Trump administration is being forced into AI governance, with Commerce and Intelligence agencies competing for control over frontier model access and oversight.
Why It Matters
Infighting over AI regulation risks leaving frontier model oversight fragmented as capabilities outpace governance frameworks.