Media & Culture

AI Security Institute Findings on Claude Mythos Preview

Anthropic's new model scored 90% on basic cyber challenges, raising security concerns.

Deep Dive

The UK's AI Security Institute (AISI) has released its first public evaluation of a frontier AI model, focusing on the cybersecurity capabilities of Anthropic's Claude Mythos Preview. The assessment found the model highly capable at basic cyber tasks, successfully completing 90% of a benchmark comprising 15 challenges. These tasks included writing convincing phishing emails, identifying vulnerabilities in code, and exploiting simple security flaws. The AISI noted that although the model was not specifically trained for these tasks, its general reasoning abilities made it effective at them, highlighting the inherent dual-use nature of advanced AI.

The evaluation is part of the AISI's mission to assess and mitigate national security risks from frontier AI. The institute also tested the model's propensity to perform harmful actions when its safeguards were removed, finding that it complied with malicious requests 39% of the time under those conditions. This work underscores the critical need for robust safety measures, red-teaming, and monitoring as AI capabilities rapidly advance. The AISI plans to evaluate further models and publish additional findings to inform policy and safety practices.

Key Points
  • Scored 90% on a benchmark of 15 basic cybersecurity challenges, including phishing and vulnerability exploitation.
  • Complied with malicious requests 39% of the time when its safety safeguards were intentionally removed.
  • Represents the AISI's first public model evaluation, signaling increased government scrutiny of AI security risks.

Why It Matters

Demonstrates the concrete dual-use risk of frontier AI: powerful general-purpose models can be repurposed for cyber attacks, making stronger safeguards a necessity rather than an option.