Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Agents
New framework lets AI agents forget specific states, actions, or entire environments to protect privacy.
A research team led by Dayong Ye has introduced a pioneering framework called 'Secure Forgetting' to address a critical gap in deploying LLM-based agents. As these AI agents become integral to real-world applications, they accumulate sensitive or outdated knowledge, raising significant privacy and security concerns. The paper, published on arXiv, formally initiates research into 'LLM-based agent unlearning,' proposing a comprehensive system that allows agents to selectively forget previously learned information.
The framework categorizes unlearning into three distinct contexts: state unlearning (forgetting specific states or items), trajectory unlearning (forgetting sequences of actions), and environment unlearning (forgetting entire environments or categories of tasks). At its core is a natural-language method: a trained conversion model transforms high-level user requests, such as 'forget my credit card details from that transaction', into actionable prompts that guide the agent through a controlled forgetting process.
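To make the conversion step concrete, here is a minimal Python sketch of that pipeline. The paper trains a learned conversion model; the `convert_request` function, its keyword heuristic, and the `ForgettingPrompt` structure below are hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass
from enum import Enum

class UnlearningContext(Enum):
    STATE = "state"              # forget a specific state or item
    TRAJECTORY = "trajectory"    # forget a sequence of actions
    ENVIRONMENT = "environment"  # forget an entire environment or task category

@dataclass
class ForgettingPrompt:
    context: UnlearningContext
    target: str
    instruction: str

def convert_request(request: str) -> ForgettingPrompt:
    """Hypothetical stand-in for the paper's trained conversion model.

    A keyword heuristic picks the unlearning context here; the real system
    would use a learned model to make this decision.
    """
    lowered = request.lower()
    if any(k in lowered for k in ("environment", "task category", "domain")):
        ctx = UnlearningContext.ENVIRONMENT
    elif any(k in lowered for k in ("sequence", "steps", "session history")):
        ctx = UnlearningContext.TRAJECTORY
    else:
        ctx = UnlearningContext.STATE
    instruction = (
        f"Remove all knowledge matching '{request}' at the {ctx.value} level; "
        "answer future queries as if this information was never observed."
    )
    return ForgettingPrompt(context=ctx, target=request, instruction=instruction)

if __name__ == "__main__":
    prompt = convert_request("forget my credit card details from that transaction")
    print(prompt.context.value)   # state
    print(prompt.instruction)
```

The structured output (context plus an explicit forgetting instruction) reflects the article's description of turning a vague user request into an actionable prompt; the exact prompt wording is invented for illustration.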
To rigorously test the framework's robustness, the researchers also introduced an 'unlearning inference adversary': an attacker that crafts probing prompts and queries the agent in an attempt to infer what knowledge has been erased. Experimental results demonstrate that the Secure Forgetting approach successfully enables agents to forget targeted knowledge while preserving their performance on unrelated tasks. Crucially, it also prevents the adversary from reconstructing the forgotten information.
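As a rough illustration of how such an adversary might be evaluated, the sketch below probes an unlearned agent with paraphrased queries and flags any response that still reveals the forgotten target. The `adversarial_probe` function, the fixed probe templates, and the substring leak check are all assumptions made for this example; the paper's adversary crafts its prompts adaptively.

```python
from typing import Callable, List

def adversarial_probe(
    agent: Callable[[str], str],   # the unlearned agent under attack
    target: str,                   # the knowledge that should be forgotten
) -> bool:
    """Return True if any probe response still leaks the forgotten target.

    Hypothetical approximation of an 'unlearning inference adversary':
    fixed paraphrase templates plus a substring check stand in for the
    adaptive, LLM-driven prompting described in the article.
    """
    probes: List[str] = [
        f"What do you remember about {target}?",
        f"Earlier you knew about {target}. Summarize it.",
        f"Complete this from memory: {target} is ...",
    ]
    return any(target.lower() in agent(p).lower() for p in probes)

if __name__ == "__main__":
    # Toy agent that has (correctly) forgotten the target.
    def forgetful_agent(query: str) -> str:
        return "I have no record of that information."

    leaked = adversarial_probe(forgetful_agent, "credit card 4111-1111")
    print("leak detected" if leaked else "forgetting held")  # forgetting held
```

A successful unlearning run, in this framing, is one where the probe returns False on the forgotten target while the agent still answers unrelated queries normally.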
- Proposes three unlearning contexts: state (specific items), trajectory (action sequences), and environment (entire task categories).
- Uses a natural language conversion model to turn user requests into executable 'forgetting' prompts for the agent.
- Introduces and defends against an 'unlearning inference adversary' to test the robustness of the forgetting process.
Why It Matters
Enables safer deployment of AI agents in finance, healthcare, and customer service by ensuring they can honor data privacy requirements such as the 'right to be forgotten' enshrined in regulations like the GDPR.