Constant-context skill learning cuts prompt tokens by up to 7x while matching or exceeding prior agent results
AI agents learn reusable skills without carrying their full history, improving both privacy and efficiency.
LLM agents operating browsers, files, and tools face a fundamental tension: cloud models execute complex multi-step workflows but expose sensitive intermediate context to external APIs, while local models preserve privacy but underperform. Both approaches also waste tokens on long skill prompts and growing histories. A new paper from researchers led by Haoyang Xie introduces constant-context skill learning—a context-to-weights framework that distills recurring workflows into lightweight task-family modules. Instead of conditioning inference on a full history, agents process only the current observation and a compact state block maintained by a deterministic tracker. This tracker maps task progress to a fixed-size state and supplies aligned subgoal rewards, enabling step-level supervised fine-tuning followed by online reinforcement learning.
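To make the tracker idea concrete, here is a minimal sketch of what a deterministic tracker could look like. It is not the paper's implementation: the subgoal names, the keyword cues, the reward values, and the `TrackerState`, `update`, and `build_prompt` names are all assumptions made up for illustration. The sketch only shows the general shape, where a fixed-size state is updated by a deterministic rule, a subgoal reward is emitted at each step, and the prompt holds just the state block plus the current observation.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical subgoal sequence for one task family, with a keyword cue
# that marks each subgoal as complete. Both are assumptions for illustration.
SUBGOALS = ("find", "take", "clean", "place")
CUES = {
    "find": "you see a mug",
    "take": "you pick up",
    "clean": "you clean",
    "place": "you put",
}


@dataclass(frozen=True)
class TrackerState:
    """Fixed-size summary of task progress (no raw history is stored)."""
    subgoal_index: int = 0  # number of subgoals completed so far
    steps_taken: int = 0    # number of environment steps observed

    def render(self) -> str:
        """Serialize the state into a short block for the prompt."""
        done = self.subgoal_index
        nxt = SUBGOALS[done] if done < len(SUBGOALS) else "done"
        return f"[state] step={self.steps_taken} done={done}/{len(SUBGOALS)} next={nxt}"


def update(state: TrackerState, observation: str) -> Tuple[TrackerState, float]:
    """Advance the state deterministically and return (new_state, subgoal_reward)."""
    steps = state.steps_taken + 1
    if state.subgoal_index < len(SUBGOALS):
        cue = CUES[SUBGOALS[state.subgoal_index]]
        if cue in observation.lower():
            # The current subgoal was just completed: advance and reward.
            return TrackerState(state.subgoal_index + 1, steps), 1.0
    return TrackerState(state.subgoal_index, steps), 0.0


def build_prompt(state: TrackerState, observation: str) -> str:
    """Build the prompt from the state block and the current observation only."""
    return f"{state.render()}\n[observation] {observation}"
```

Because `update` is a pure function of the previous state and the new observation, the same per-step reward signal could in principle label steps for supervised fine-tuning and score rollouts during reinforcement learning, which is the alignment the paragraph above describes.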
The method was validated on three benchmarks—ALFWorld, WebShop, and SciWorld—using Qwen3-4B, Qwen3-8B, and Llama-3.1-8B. With Qwen3-8B and SFT+RL, success rates hit 89.6% on unseen ALFWorld tasks, 76.8% on WebShop, and 66.4% on SciWorld, matching or exceeding prior agent-training results while reducing prompt tokens per turn by 2–7× relative to controlled ReAct baselines. By shifting procedural context from prompts into weights, the framework enables private, cost-effective personal assistants that retain cloud-level reliability without leaking history.
- Reduces prompt tokens per turn by 2–7× relative to controlled ReAct baselines, which lowers per-turn inference cost.
- Achieves 89.6% success on unseen ALFWorld tasks, 76.8% on WebShop, and 66.4% on SciWorld using Qwen3-8B with SFT+RL.
- Lets local LLM agents match or exceed prior agent-training results without sending sensitive intermediate context to external APIs.
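The token savings follow from simple arithmetic: a history-conditioned prompt grows with every turn, while a constant-context prompt stays the same size. The sketch below illustrates this with made-up token counts. None of the constants come from the paper, and the function names are hypothetical.

```python
# Illustrative token counts only; none of these numbers come from the paper.
SKILL_PROMPT = 300  # hypothetical long skill/instruction prompt
STEP = 50           # hypothetical tokens per action/observation pair
STATE_BLOCK = 50    # hypothetical fixed-size tracker state block


def history_prompt_tokens(turn: int) -> int:
    """Prompt size at a 1-indexed turn when every step so far is re-sent."""
    return SKILL_PROMPT + STEP * turn


def constant_prompt_tokens(turn: int) -> int:
    """Prompt size when only the state block and the current step are sent."""
    return STATE_BLOCK + STEP


def mean_tokens(prompt_fn, turns: int) -> float:
    """Average prompt size per turn over an episode of the given length."""
    return sum(prompt_fn(t) for t in range(1, turns + 1)) / turns
```

With these assumed values, an 8-turn episode averages 525 prompt tokens per turn with full history against 100 with constant context, and the gap widens as episodes get longer.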
Why It Matters
Constant-context learning makes private, cost-efficient personal AI assistants viable without sacrificing capability.