Research & Papers

New study shows AI can be crippled by secretly removing legal actions

Attack deletes decision options before the agent acts, causing irreversible damage across algorithms.

Deep Dive

A new paper by researcher Arahan Kujur, submitted to arXiv, reveals a novel vulnerability in self-play reinforcement learning: adversarial action masking. In this attack, an adversary selectively removes legal actions from a victim agent's action set before the agent can act, effectively deleting decision options. Unlike traditional observation or action perturbations that modify existing inputs, this removal technique eliminates choices entirely, making it harder for the victim to adapt. The attack was tested across multiple RL algorithms, including Q-learning, PPO, NFSP, neural NFSP, and DQN, in environments ranging from poker games with 6 to 5,531 information states to two non-poker domains.

The results are striking: learned masking consistently caused significantly more damage than random masking or learned perturbation baselines. The attack also transferred across different agent types and was amplified by self-play, meaning agents trained against themselves became more vulnerable. Crucially, even when victims were given extended training under masked conditions, they showed no recovery—the damage was permanent. The researcher identified that the adversary targets high-value decision points, measured by two new metrics: reach-weighted contingent action capacity (CAC_w) and a value-weighted refinement (CAC_v). These findings highlight that action availability is a separate, poorly understood robustness surface in multi-agent RL, with serious implications for AI safety.

Key Points
  • Attack removes legal actions from victim's set before acting, causing more damage than perturbations or random masking.
  • Tested across five RL algorithms (Q-learning, PPO, NFSP, etc.) and games with up to 5,531 information states.
  • No recovery under extended masked training; attack transfers between agents and amplifies in self-play.

Why It Matters

Critical for AI safety in multi-agent systems—autonomous driving, trading bots—where removing actions could cause catastrophic failures.