AI Safety

Hardware restrictions may guarantee unethical AGI, new analysis finds

Hardware restrictions, the very tool designed to prevent catastrophic AI outcomes, could lock in unethical AGI under one plausible assumption: that aligning a system takes more compute than building a dangerous one.

Deep Dive

A new analysis from an independent research group has formalized a disturbing possibility for AI safety: if building an aligned, ethical AGI requires more compute than building a minimally capable but misaligned AGI, then hardware restrictions aimed at preventing dangerous AI could guarantee exactly the outcome regulators seek to avoid. The argument rests on two thresholds: L_AGI, the compute needed to achieve foundational general intelligence, and L_EAGI, the compute needed to achieve that same intelligence in a reliably ethical manner. If L_EAGI > L_AGI, a compute cap set anywhere in the gap (at or above L_AGI but below L_EAGI) leaves misaligned AGI within reach while putting ethical AGI out of reach, so any AGI built under the cap is unethical. That is a guaranteed failure mode for naive compute governance.
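The threshold argument can be sketched in a few lines of Python. This is an illustrative sketch, not code from the analysis: the function name `classify_cap`, the `Regime` enum, and the FLOP figures are all hypothetical, and it assumes a system is feasible exactly when its compute requirement is at or below a perfectly enforced cap.

```python
from enum import Enum


class Regime(Enum):
    """Which kinds of AGI a given compute cap leaves feasible (hypothetical labels)."""
    NEITHER = "no AGI is feasible"
    ONLY_MISALIGNED = "only misaligned AGI is feasible"
    ONLY_ALIGNED = "only aligned AGI is feasible"
    BOTH = "both aligned and misaligned AGI are feasible"


def classify_cap(cap: float, l_agi: float, l_eagi: float) -> Regime:
    """Classify a compute cap against the two thresholds from the article.

    Assumption: a system can be built if and only if the compute it needs
    is <= cap, and the cap is perfectly enforced.
    """
    misaligned_feasible = cap >= l_agi   # minimal, possibly misaligned AGI
    aligned_feasible = cap >= l_eagi     # reliably ethical AGI
    if misaligned_feasible and aligned_feasible:
        return Regime.BOTH
    if misaligned_feasible:
        return Regime.ONLY_MISALIGNED
    if aligned_feasible:
        return Regime.ONLY_ALIGNED
    return Regime.NEITHER


# Made-up threshold values in FLOP, chosen only so that L_EAGI > L_AGI.
L_AGI = 1e26
L_EAGI = 1e27

if __name__ == "__main__":
    for cap in (5e25, 5e26, 5e27):
        print(f"cap={cap:.0e}: {classify_cap(cap, L_AGI, L_EAGI).value}")
```

Only the middle cap, which falls inside the gap, lands in the failure regime the analysis describes. A cap below L_AGI blocks both kinds of system, and a cap at or above L_EAGI permits both.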

This theoretical challenge cuts against the grain of current policy proposals, which often treat compute thresholds as a straightforward safety lever. Organizations like the Alignment Research Center advocate for empirical evaluations of dangerous capabilities rather than rigid compute limits, while the Future of Humanity Institute has explored compute governance as one tool among many, without asserting inevitable failure. The Center for AI Safety has cautiously supported compute monitoring but has not endorsed the strong claim that caps guarantee unethical outcomes. The new analysis sharpens a long-simmering concern from alignment researchers: the reliability of any hardware-based approach depends on unproven assumptions about the relationship between compute and alignment difficulty.

The implications are stark for policymakers and funders alike. Open Philanthropy has committed over $100 million to AI safety research, much of it aimed at technical alignment, yet this analysis suggests that even perfect technical alignment work may be irrelevant if regulations inadvertently force lower-compute development pathways. The model has weak points of its own, however. Its assumption that alignment difficulty scales monotonically with compute may be false: alignment could conceivably become easier at very high compute due to phase transitions. It also assumes perfect monitoring of compute, which is practically impossible given distributed training and cryptographic methods. Moreover, the definition of 'ethical AGI' is inherently value-laden and unmeasurable, making the L_EAGI threshold a moving target. The guarantee, stated as a 100% probability of unethical AGI, is therefore a formal modeling result rather than an empirical finding, but it still serves as a crucial warning against overconfidence in hardware restrictions.
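One way to read the 100% figure is as a conditional statement. The formula below is a sketch in notation that is not taken from the analysis itself: C stands for the compute cap, and the conditions listed are the assumptions described above.

```latex
\[
\Pr\bigl(\text{AGI is unethical} \mid \text{AGI is built},\;
  L_{\mathrm{AGI}} \le C < L_{\mathrm{EAGI}},\;
  \text{cap } C \text{ perfectly enforced}\bigr) = 1
\]
```

If any condition is dropped, for example if the cap cannot be perfectly enforced or if L_EAGI turns out not to exceed L_AGI, the probability is no longer pinned at 1.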

Policymakers and researchers must recognize that hardware restrictions are not a silver bullet. The analysis forces a key question: should we focus on capability tests, licensing regimes, or international treaties that avoid the gap problem? The bottom line is that any governance strategy relying solely on compute caps risks creating the very outcome it seeks to prevent, unless we have strong evidence that L_EAGI ≤ L_AGI. Until that evidence exists, a diversified portfolio of safety measures, including robust alignment research, capability evaluation, and adaptive regulation, remains essential.

Key Points
  • The L_AGI vs L_EAGI framework reveals a logical flaw in naive compute caps: if ethical AGI requires more compute than minimal AGI, any hardware cap set between the two thresholds guarantees that only unethical AGI can be built.
  • The model's conclusion depends on two unproven assumptions, monotonic scaling of alignment difficulty and perfect compute monitoring, and neither may hold in practice.
  • Policymakers should avoid relying solely on compute caps and should diversify governance strategies with capability testing, licensing, and international agreements to avoid the gap problem.

Why It Matters

Hardware restrictions could inadvertently guarantee the creation of unethical AGI if alignment requires more compute than minimal capability.