Mistletoe attack stealthily collapses LLM speculative decoding speed

Researchers describe a vulnerability that erases the speedup from speculative decoding without degrading output quality.

Deep Dive

Researchers at multiple institutions propose Mistletoe, a stealthy attack on speculative decoding, a key technique for speeding up LLM inference. The attack exploits the imperfect alignment between the draft model and the target model: it uses null-space projection to shrink the average number of draft tokens the target accepts, which collapses the speedup while preserving output quality and perplexity. The result exposes a new mechanism-level vulnerability in LLM inference pipelines.
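
To see why a shorter accepted length collapses the speedup, the sketch below uses a closed form commonly used in analyses of speculative sampling. It is an illustration under stated assumptions, not the paper's evaluation: each drafted token is assumed to be accepted independently with probability `alpha`, the drafter proposes `gamma` tokens per step, and `cost_ratio` is the cost of one draft forward pass relative to one target forward pass. All three names and the example values are hypothetical.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass.

    Assumes each of the `gamma` drafted tokens is accepted independently
    with probability `alpha` (a simplifying assumption), and that the
    target always contributes one token of its own.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft accepted, plus the target's token
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Rough wall-clock speedup over plain autoregressive decoding.

    Each step costs `gamma` draft passes plus one target pass, measured
    in units of one target pass.
    """
    step_cost = gamma * cost_ratio + 1.0
    return expected_tokens_per_step(alpha, gamma) / step_cost


if __name__ == "__main__":
    # Hypothetical settings: a well-aligned drafter vs. one under attack.
    for label, alpha in (("aligned", 0.7), ("attacked", 0.05)):
        tokens = expected_tokens_per_step(alpha, gamma=4)
        speedup = estimated_speedup(alpha, gamma=4, cost_ratio=0.05)
        print(f"{label}: tokens/step={tokens:.2f}, speedup={speedup:.2f}x")
```

Under these assumptions, once the acceptance rate approaches zero the target emits roughly one token per step, and the draft passes become pure overhead, so the estimated speedup can even fall below 1x.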

Key Points
  • Exploits draft-target mismatch in speculative decoding to collapse speedups from 2-3x down to near parity with standard decoding.
  • Jointly optimizes a degradation objective and a semantic-preservation objective via null-space projection.
  • Reduces average accepted length (τ) from ~3 to ~1, slashing throughput while maintaining output perplexity and quality.
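
The joint optimization in the key points can be pictured with a minimal null-space projection sketch. The summary does not give Mistletoe's exact formulation, so this is an assumed, simplified version: a single "preservation" gradient defines the direction that would change output semantics, and the "degradation" gradient is projected onto the subspace orthogonal to it before being applied. The function names and toy vectors are hypothetical.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_to_null_space(g_attack: List[float],
                          g_preserve: List[float],
                          eps: float = 1e-12) -> List[float]:
    """Remove from g_attack its component along g_preserve.

    The result is orthogonal to g_preserve, so to first order a step in
    this direction leaves the preservation objective unchanged.
    """
    denom = dot(g_preserve, g_preserve)
    if denom < eps:
        # No preservation constraint to respect; keep the attack gradient.
        return list(g_attack)
    scale = dot(g_attack, g_preserve) / denom
    return [a - scale * p for a, p in zip(g_attack, g_preserve)]


if __name__ == "__main__":
    # Toy gradients (hypothetical values, three parameters).
    g_attack = [1.0, 2.0, 0.5]      # direction that lowers acceptance
    g_preserve = [0.0, 1.0, 0.0]    # direction that would alter output quality
    update = project_to_null_space(g_attack, g_preserve)
    print(update)                    # component along g_preserve is removed
    print(dot(update, g_preserve))   # orthogonality check
```

The design intuition is that the attack keeps only the part of its update that the preservation objective cannot "see", which is one way an attack could cut accepted length while perplexity stays flat.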

Why It Matters

Reveals a new attack surface in LLM acceleration that existing defenses cannot detect, threatening production systems that rely on speculative decoding.