Buck's axes of variation in third-party AI risk assessment explained

LessWrong AI June 01, 2026

⚡Developers, stakeholders, and third parties play distinct roles in AI risk evaluation.

Deep Dive

Buck's post on LessWrong dissects the landscape of third-party AI risk assessment by introducing two primary axes: fact-generation and evidence-analysis. Fact-generation assessments produce new evidence through adversarial or semi-adversarial methods, such as capability evaluations (e.g., by METR or UK AISI) or red-teaming (e.g., testing classifier robustness). The core rationale for using third parties here is structural: evidence that a model is not dangerous is only credible if an actor without a conflict of interest tried hard to demonstrate danger and failed. Other reasons include specialized expertise (e.g., centralized red-teaming knowledge) or handling sensitive information (e.g., CBRN data) that developers shouldn't have access to.

Evidence-analysis assessments, by contrast, evaluate existing evidence and arguments to reach conclusions about risk. Here the work is interpretation and synthesis: the third party acts as an auditor, reviewing the developer's internal findings and reporting conclusions to stakeholders without necessarily revealing proprietary details. According to the post, which kind of assessment fits a situation depends on the developer's need to protect confidential information and the stakeholder's need for trustworthy, adversarial validation. Buck notes that these axes are not exhaustive, but offers them as a framework for designing effective oversight regimes, which matters more as AI systems become more capable and the risks grow. The post credits conversations with Ajeya Cotra and Paul Christiano.

Key Points

Fact-generation includes adversarial tests like capability evals (METR, UK AISI) or red-teaming, requiring third parties to ensure honest effort.
Evidence-analysis involves reviewing existing evidence without producing new facts; a third-party reviewer can examine proprietary information and report conclusions to stakeholders without disclosing it.
Axes help stakeholders (governments, boards) decide when third-party independence is essential vs. when in-house work suffices.

Why It Matters

Frames how governments and labs can structure independent oversight to prevent AI risks as models become more capable.
