Mistletoe attack stealthily collapses LLM speculative decoding speed
Researchers discover a vulnerability that kills AI acceleration without hurting output quality.
Deep Dive
Researchers at multiple institutions propose Mistletoe, a stealthy attack on speculative decoding, a key technique for speeding up LLM inference. The attack exploits imperfect alignment between the draft model and the target model: using null-space projection, it reduces the average accepted length (the number of tokens accepted per verification step), which collapses the speedup while preserving output quality and perplexity. The result exposes a new mechanism-level vulnerability in LLM inference pipelines.
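To see why a shorter accepted length erases the speedup, a simplified cost model helps. The sketch below is illustrative only and is not taken from the Mistletoe work: the function name `speculative_speedup` and the parameters `draft_len` and `draft_cost_ratio` are hypothetical, and the model assumes each cycle costs one target-model pass plus `draft_len` cheaper drafter passes.

```python
def speculative_speedup(tau: float, draft_len: int, draft_cost_ratio: float) -> float:
    """Estimate speedup over plain autoregressive decoding (simplified model).

    tau              -- average tokens produced per draft-and-verify cycle
    draft_len        -- tokens the drafter proposes per cycle (assumed fixed)
    draft_cost_ratio -- cost of one drafter pass relative to one target pass

    Assumption: plain decoding costs 1.0 per token, and one cycle costs
    draft_len drafter passes plus a single target verification pass.
    """
    cycle_cost = draft_len * draft_cost_ratio + 1.0
    return tau / cycle_cost


if __name__ == "__main__":
    # Healthy drafter: about three tokens accepted per cycle.
    print(speculative_speedup(tau=3.0, draft_len=4, draft_cost_ratio=0.05))
    # Under attack: about one token per cycle, so drafting is pure overhead.
    print(speculative_speedup(tau=1.0, draft_len=4, draft_cost_ratio=0.05))
```

Under these assumed numbers, an accepted length of 3 gives roughly a 2.5x speedup, while an accepted length of 1 gives about 0.83x, slightly slower than decoding without a drafter at all.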
Key Points
- Exploits drafter-target mismatch in speculative decoding to collapse speedups from 2-3x down to near parity.
- Jointly optimizes a degradation objective and a semantic-preservation objective via null-space projection.
- Reduces average accepted length (τ) from ~3 to ~1, slashing throughput while maintaining output perplexity and quality.
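The joint optimization in the second point can be pictured with a generic null-space projection step. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes each objective is represented by a single gradient vector, and the names `g_attack`, `g_preserve`, and `project_onto_null_space` are hypothetical. The attack gradient is stripped of its component along the preservation gradient, so that, to first order, a step in the resulting direction leaves the preservation objective unchanged.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto_null_space(g_attack: Vector, g_preserve: Vector,
                            eps: float = 1e-12) -> Vector:
    """Remove from g_attack its component along g_preserve.

    The result lies in the null space of the 1 x n matrix whose only row is
    g_preserve, i.e. it is orthogonal to the preservation gradient.
    """
    denom = dot(g_preserve, g_preserve)
    if denom < eps:
        # No preservation constraint to respect; keep the attack direction.
        return list(g_attack)
    scale = dot(g_attack, g_preserve) / denom
    return [a - scale * p for a, p in zip(g_attack, g_preserve)]


if __name__ == "__main__":
    g_attack = [1.0, 1.0]    # hypothetical gradient of the degradation objective
    g_preserve = [0.0, 1.0]  # hypothetical gradient of the preservation objective
    update = project_onto_null_space(g_attack, g_preserve)
    print(update, dot(update, g_preserve))
```

With several preservation constraints, the single vector would be replaced by a matrix of gradients and the projection by its orthogonal-complement projector; the one-vector case is shown here only to keep the example short.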
Why It Matters
Reveals a new attack surface in LLM acceleration that existing defenses cannot detect, threatening production systems that rely on speculative decoding.