From History to State: Constant-Context Skill Learning for LLM Agents
AI agents learn reusable skills without carrying full interaction histories, improving both privacy and efficiency.
LLM agents that operate browsers, file systems, and tools face a fundamental tension: cloud models execute complex multi-step workflows but expose sensitive intermediate context to external APIs, while local models preserve privacy but underperform. Both approaches also waste tokens on long skill prompts and growing histories. A new paper from researchers led by Haoyang Xie introduces constant-context skill learning, a context-to-weights framework that distills recurring workflows into lightweight task-family modules. Instead of conditioning inference on the full interaction history, the agent processes only the current observation and a compact state block maintained by a deterministic tracker. The tracker maps task progress to a fixed-size state and supplies aligned subgoal rewards, enabling step-level supervised fine-tuning followed by online reinforcement learning.
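To make the mechanism concrete, here is a minimal sketch of such a deterministic tracker in Python. The subgoal names, string-match checks, state-block format, and reward values are illustrative assumptions, not the paper's actual implementation; the point is that the state block stays the same size however long the episode runs, and that subgoal completion yields an aligned reward signal for RL.

```python
from dataclasses import dataclass, field

# Hypothetical tracker for one task family (e.g., "clean an object and put
# it somewhere" in ALFWorld). All names and rules below are assumptions for
# illustration, not the paper's code.

SUBGOALS = ["find_object", "clean_object", "place_object"]  # fixed per family

@dataclass
class Tracker:
    done: list = field(default_factory=lambda: [False] * len(SUBGOALS))

    def update(self, observation: str) -> float:
        """Advance deterministically from the raw observation.

        Returns an aligned subgoal reward: 1.0 the first time the current
        subgoal is satisfied, else 0.0 (the shaping signal for online RL).
        """
        idx = self.done.index(False) if False in self.done else None
        if idx is not None and self._satisfied(SUBGOALS[idx], observation):
            self.done[idx] = True
            return 1.0
        return 0.0

    def state_block(self) -> str:
        """Render the fixed-size state block that replaces the full history."""
        lines = [f"{g}: {'done' if d else 'pending'}"
                 for g, d in zip(SUBGOALS, self.done)]
        return "STATE\n" + "\n".join(lines)

    def _satisfied(self, subgoal: str, observation: str) -> bool:
        # Toy string-match rules standing in for the environment-specific
        # checks a real tracker would implement.
        keywords = {"find_object": "you see",
                    "clean_object": "is clean",
                    "place_object": "you put"}
        return keywords[subgoal] in observation.lower()
```

Because the block is fixed-size, the per-turn prompt cost stays constant rather than growing linearly with trajectory length.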
The method was validated on three benchmarks—ALFWorld, WebShop, and SciWorld—using Qwen3-4B, Qwen3-8B, and Llama-3.1-8B. With Qwen3-8B and SFT+RL, success rates hit 89.6% on unseen ALFWorld tasks, 76.8% on WebShop, and 66.4% on SciWorld, matching or exceeding prior agent-training results while reducing prompt tokens per turn by 2–7× relative to controlled ReAct baselines. By shifting procedural context from prompts into weights, the framework enables private, cost-effective personal assistants that retain cloud-level reliability without leaking history.
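The token savings follow directly from that design. The snippet below contrasts a ReAct-style prompt, which accumulates the full history, with a constant-context prompt built from the fixed state block; the observation text and the rough 4-characters-per-token estimate are stand-ins, so exact ratios will differ from the paper's measured 2–7×.

```python
# Hypothetical comparison of per-turn prompt sizes; values are illustrative.

def react_prompt(history: list[str], observation: str) -> str:
    # ReAct-style prompt: full interleaved history plus the new observation.
    return "\n".join(history) + "\n" + observation

def constant_context_prompt(state_block: str, observation: str) -> str:
    # Constant-context prompt: only the fixed-size state block and the
    # current observation; skill knowledge lives in the fine-tuned weights.
    return state_block + "\n" + observation

history = []
block = "STATE\nfind_object: done\nclean_object: pending\nplace_object: pending"
for step in range(1, 21):
    obs = f"Obs {step}: you see a countertop with a mug on it."
    full = react_prompt(history, obs)
    const = constant_context_prompt(block, obs)
    history.append(obs + " -> Action: examine mug")
    if step % 10 == 0:
        # Rough token estimate (~4 chars/token) just to show the scaling gap.
        print(f"step {step:2d}: ReAct ~{len(full)//4} tokens, "
              f"constant-context ~{len(const)//4} tokens")
```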
- Reduces prompt tokens per turn by 2–7× vs. ReAct baselines, drastically lowering API costs and latency.
- Achieves 89.6% unseen success on ALFWorld and 76.8% on WebShop using Qwen3-8B with SFT+RL.
- Enables local LLM agents to match cloud-level performance without exposing sensitive intermediate context.
Why It Matters
Constant-context learning makes private, cost-efficient personal AI assistants viable without sacrificing capability.