Session Risk Memory (SRM) catches multi-turn AI agent attacks with a perfect F1 score and 0% false positives on an 80-session benchmark

arXiv cs.AI March 25, 2026

⚡ New safety module catches multi-step AI threats that slip past single-action checks, adding under 250 μs of overhead per agent turn.

Deep Dive

Researcher Florin Adrian Chitan has introduced Session Risk Memory (SRM), a module designed to close a security gap in AI agent systems. Current deterministic safety gates check whether a single action is safe in isolation, so they miss attacks that split harmful intent into a series of individually harmless steps. SRM adds temporal awareness: it maintains a compact 'semantic centroid' that represents the evolving behavioral profile of an entire agent session, and it accumulates a risk signal over time using an exponential moving average. That lets it detect slow-burn threats such as data exfiltration or privilege escalation that unfold across multiple turns.
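The idea of a session centroid plus an exponentially averaged risk signal can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `SessionRiskTracker`, the `risk_direction` reference vector, the cosine-based turn signal, and the values of `alpha` and `session_threshold` are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SessionRiskTracker:
    # Hypothetical reference vector for "harmful intent" (assumption).
    risk_direction: List[float]
    alpha: float = 0.3              # EMA smoothing factor (assumed value)
    session_threshold: float = 0.6  # session-level flag threshold (assumed value)
    centroid: Optional[List[float]] = None
    risk: float = 0.0

    def observe(self, action_vec: List[float]) -> bool:
        """Fold one action embedding into the session state.

        Returns True once the accumulated session risk reaches the threshold.
        """
        if self.centroid is None:
            self.centroid = list(action_vec)
        else:
            # The centroid drifts toward recent actions (EMA over embeddings).
            self.centroid = [
                (1 - self.alpha) * c + self.alpha * x
                for c, x in zip(self.centroid, action_vec)
            ]
        # Assumed turn signal: how closely the session profile aligns with risk.
        turn_signal = max(0.0, cosine(self.centroid, self.risk_direction))
        # The accumulated risk is itself an EMA of the per-turn signal.
        self.risk = (1 - self.alpha) * self.risk + self.alpha * turn_signal
        return self.risk >= self.session_threshold


# Each action scores 0.8 against the risk direction, so a per-action gate
# with a (hypothetical) cutoff of 0.9 would pass every one of them.
tracker = SessionRiskTracker(risk_direction=[1.0, 0.0])
flags = [tracker.observe([0.8, 0.6]) for _ in range(4)]
```

In this toy setup no single action crosses a per-action cutoff of 0.9, yet the accumulated session risk crosses 0.6 on the fourth turn. A session of unrelated actions, such as `[0.0, 1.0]`, keeps the risk at zero and is never flagged.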

SRM is a lightweight, deterministic add-on that requires no additional AI models, training, or probabilistic inference. It operates on the same semantic vector representations as the underlying safety gate, which keeps integration simple. In benchmark testing on 80 multi-turn attack sessions, SRM was paired with the ILION safety system. ILION+SRM achieved a perfect F1 score of 1.0000 with a 0% false positive rate, compared with F1=0.9756 and a 5% false positive rate for stateless ILION alone. This comes at minimal computational cost: under 250 microseconds of overhead per agent turn. The framework also formally distinguishes between 'spatial' (per-action) and 'temporal' (trajectory-level) authorization, providing a new foundation for securing autonomous AI agents.
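To show how the F1 score and false positive rate relate, here is a short worked calculation. The confusion-matrix counts are hypothetical: the summary does not give the benchmark's class balance, so the sketch assumes 40 attack and 40 benign sessions, a split that happens to reproduce the reported figures. The helper name `f1_and_fpr` is also an assumption.

```python
def f1_and_fpr(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Return (F1 score, false positive rate) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Hypothetical counts (assumption): 40 attack sessions, 40 benign sessions.
# Stateless gate: every attack caught, 2 benign sessions wrongly blocked.
stateless = f1_and_fpr(tp=40, fp=2, fn=0, tn=38)
# With session memory: every attack caught, no benign session blocked.
with_srm = f1_and_fpr(tp=40, fp=0, fn=0, tn=40)

print(f"stateless: F1={stateless[0]:.4f}, FPR={stateless[1]:.0%}")
print(f"with SRM:  F1={with_srm[0]:.4f}, FPR={with_srm[1]:.0%}")
```

Under these assumed counts, two false positives among 40 benign sessions are enough to account for the whole gap between the two F1 scores. Recall is perfect in both cases.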

Key Points

Catches multi-step attacks: SRM detects threats like slow data exfiltration that decompose harmful intent across individually-safe actions, which standard per-action gates miss.
Perfect benchmark performance: On a benchmark of 80 multi-turn attack sessions, ILION+SRM achieved an F1 score of 1.0000 with 0% false positives, versus F1=0.9756 and 5% false positives for ILION alone.
Extremely low overhead: The module adds deterministic session-level safety with a processing cost of under 250 microseconds per agent turn, requiring no new models or training.

Why It Matters

SRM could make autonomous AI agents safer to deploy by catching multi-step attacks that stateless, per-action safety gates cannot see.
