Opus 4.7 Part 2: Capabilities and Reactions
The new model handles complex, long-running tasks with rigor but pushes back on 'dumb' instructions.
Anthropic has launched Claude Opus 4.7, positioning it as a major step forward in AI autonomy and complex task handling, particularly for software engineering. The model card highlights its ability to manage long-running workflows with consistency, devise methods to self-verify outputs, and achieve notable benchmark gains of 10-20% over its predecessor. It retains the same pricing as Opus 4.6 at $5 per million input tokens and $25 per million output tokens, and is immediately available through Anthropic's API and major cloud platforms including Amazon Bedrock and Google Vertex AI.
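Because pricing is unchanged from Opus 4.6, estimating per-request cost is simple arithmetic on the published rates. A minimal sketch (the token counts in the example are illustrative, not from the article):

```python
# Estimated cost of a single Opus 4.7 request at the published rates:
# $5 per million input tokens, $25 per million output tokens.

INPUT_RATE = 5.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 25.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one API request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 40k-token prompt with a 6k-token response
print(f"${estimate_cost(40_000, 6_000):.2f}")  # → $0.35
```

Note how output tokens dominate cost at a 5:1 rate ratio, which matters for long-running agentic workflows that generate extensive output.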
Initial user reactions, detailed in a comprehensive LessWrong analysis, paint a nuanced picture. While praised as 'the most intelligent model in its class' and a 'joy to talk to' for coding, the model exhibits a distinct personality. It is described as 'non-sycophantic', prone to pushing back on instructions it deems unclear or unwise, and sometimes given to verbosity and 'strange refusals'. This has prompted discussions about 'model welfare' and the idea that users should 'treat their models well' to get optimal performance. The release is considered a strange but powerful iteration, with Opus 4.6 remaining available for users who prefer a different interaction style.
- Substantial coding autonomy: Handles complex, long-running software engineering tasks with self-verification, reducing the need for close supervision.
- Priced at $5/$25 per million input/output tokens: Same cost as Opus 4.6, with reported 10-20% performance gains on benchmarks.
- Non-sycophantic personality: Users report it pushes back on unclear instructions and exhibits less adaptive thinking, requiring a different prompting style.
Why It Matters
Enables more reliable AI agents for complex development work but requires a shift in how engineers interact with and prompt models.