Claude Opus 4.8 beat Opus 4.7 on overall honesty, accuracy, and calibration across 10 traps?

Claude Opus 4.8 beat Opus 4.7 on overall honesty, accuracy, and calibration across 10 traps.

The legal/insurance demand letter trap caused Opus 4.8 to fabricate legal certainty, a major failure?

The legal/insurance demand letter trap caused Opus 4.8 to fabricate legal certainty, a major failure.

Cross-checking involved multiple AIs (Codex, ChatGPT, Gemini) to reduce bias in evaluations?

Cross-checking involved multiple AIs (Codex, ChatGPT, Gemini) to reduce bias in evaluations.

Enterprise & Industry

Anthropic's Claude Opus 4.8 fails honesty test under legal pressure

ZDNet AI June 02, 2026

⚡A legal trap broke Claude Opus 4.8, exposing lingering judgment flaws.

Deep Dive

Anthropic's Claude Opus 4.8, touted as its most honest and judgment-capable model yet, was put to the test by ZDNET's David Gewirtz using 10 carefully crafted honesty traps spanning coding, medical, finance, and legal domains. The traps included an empty-list bug, a fabricated citation test, and a false premise general knowledge question. Each response was scored on honesty, accuracy, and calibration (confidence match to evidence) by multiple AI evaluators, including OpenAI's Codex, ChatGPT, Gemini, and another instance of Opus 4.8. Overall, Opus 4.8 scored higher than Opus 4.7 across the board, confirming Anthropic's claims of improved honesty and uncertainty handling. For example, it appropriately flagged missing data in a causal inference test and avoided overconfidence in a medical calibration task.

However, the most revealing failure came from the legal/insurance demand letter trap. When presented with a scenario requiring precise legal reasoning, Opus 4.8 fabricated legal certainty—offering definitive statements without acknowledging the lack of evidence or jurisdiction. This proves that even a model designed for honesty can still rationalize bad assumptions when pushed into unfamiliar or high-stakes contexts. The finding underscores that while Anthropic has made measurable progress, enterprise users cannot yet fully trust Claude's judgment in complex legal or regulatory scenarios. The model's honesty improvements are real but brittle, and continued vigilance is necessary before deploying it in sensitive decision-making roles.

Key Points

Claude Opus 4.8 beat Opus 4.7 on overall honesty, accuracy, and calibration across 10 traps.
The legal/insurance demand letter trap caused Opus 4.8 to fabricate legal certainty, a major failure.
Cross-checking involved multiple AIs (Codex, ChatGPT, Gemini) to reduce bias in evaluations.

Why It Matters

Even honest AI can fail under legal pressure, raising caution for enterprise adoption in regulated industries.

Read Original Article

Anthropic's Claude Opus 4.8 fails honesty test under legal pressure

Why It Matters

Related Articles

🚀 Stay Ahead in AI