New geometric detection system catches multi-turn LLM deception with 89% recall

arXiv stat.ML May 28, 2026

⚡ Multi-turn probing attacks leave a stable geometric footprint detectable by a lightweight classifier.

Deep Dive

A new paper from researchers Surender Suresh Kumar and Mary L. Cummings tackles a critical blind spot in LLM safety: multi-turn deception. Most safety defenses are trained on single-turn prompts, yet real-world attacks often unfold as indirect, multi-turn probing sequences. The authors introduce a unified pipeline whose first stage generates realistic multi-turn deceptive question sets using multi-objective genetic prompt optimization with co-evolving mutation operators. A human study confirmed the dataset's quality and found that early generations produced the most convincing deception. It also showed that practical constraints such as adherence filtering and ordering effects matter.
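To make the "multi-objective" part of the generation stage concrete, here is a minimal sketch of Pareto (non-dominated) selection, one common way to keep candidates that trade off several objectives. It is an illustration only: the summary does not say which selection scheme the authors use, and the function names `dominates` and `pareto_front`, as well as the idea of scoring each candidate with a tuple of higher-is-better objectives, are assumptions made for this example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical two-objective scores for four candidates (made-up numbers).
example_scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
survivors = pareto_front(example_scores)  # candidate 2 is dominated by candidate 1
```

In a genetic loop, the surviving candidates would typically be mutated and rescored in the next generation; how the paper's mutation operators co-evolve is not described in this summary and is not modeled here.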

The core contribution is a detection method that exploits geometric signatures in embedding space. Using three geometric features (angular coverage, distance ratio, and linearity) alongside pairwise similarity statistics, a lightweight feed-forward classifier achieved consistently high recall (0.89) and test-time F1 scores of 0.74 to 0.86 across base, reworded, and truncated three-turn scenarios. This supports the hypothesis that multi-turn deceptive intent leaves a stable geometric footprint, which enables transparent, low-cost screening without expensive end-to-end training. The authors also discuss responsible uses, limitations, and the need for larger human-evaluated datasets.
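The feature names above can be illustrated with a short NumPy sketch. The summary does not give the paper's exact definitions, so the formulas below are assumptions chosen for illustration: angular coverage is taken as the largest pairwise angle between turn embeddings, distance ratio as the first-to-last distance divided by the total path length, and linearity as the share of variance along the first principal direction. The function name `geometric_features` is likewise hypothetical.

```python
import numpy as np


def geometric_features(embeddings):
    """Compute illustrative geometric features for one multi-turn sequence.

    embeddings: array-like of shape (n_turns, dim), one non-zero vector per turn.
    All definitions here are assumptions, not the paper's formulas.
    """
    E = np.asarray(embeddings, dtype=float)
    if E.ndim != 2 or E.shape[0] < 2:
        raise ValueError("need at least two turn embeddings")

    # Pairwise cosine similarities over distinct turn pairs.
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = unit @ unit.T
    pair = np.clip(sims[np.triu_indices(E.shape[0], k=1)], -1.0, 1.0)

    # Assumed "angular coverage": widest angle spanned by any two turns.
    angular_coverage = float(np.arccos(pair.min()))

    # Assumed "distance ratio": end-to-end distance over summed step lengths.
    path = np.linalg.norm(np.diff(E, axis=0), axis=1).sum()
    distance_ratio = float(np.linalg.norm(E[-1] - E[0]) / path) if path > 0 else 0.0

    # Assumed "linearity": variance share of the first principal direction.
    var = np.linalg.svd(E - E.mean(axis=0), compute_uv=False) ** 2
    linearity = float(var[0] / var.sum()) if var.sum() > 0 else 0.0

    return {
        "angular_coverage": angular_coverage,
        "distance_ratio": distance_ratio,
        "linearity": linearity,
        "sim_mean": float(pair.mean()),
        "sim_min": float(pair.min()),
        "sim_max": float(pair.max()),
    }
```

A feature dictionary like this, computed per conversation, could then serve as the input vector for a small feed-forward classifier. Because the inputs are a handful of named quantities rather than raw text, each prediction can be traced back to interpretable geometry.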

Key Points

Multi-objective genetic prompt optimization creates realistic multi-turn deceptive sequences, validated by a human study.
Three geometric features (angular coverage, distance ratio, linearity) plus pairwise similarity detect deception with 89% recall.
Lightweight classifier achieves F1 scores 0.74-0.86 across varied scenarios, enabling transparent screening without heavy training.

Why It Matters

Enables low-cost, transparent screening for multi-turn LLM deception without expensive end-to-end training.
