SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support
New system automatically diagnoses and fixes AI agent failures, improving skills by 40% across 3,737 tasks.
A research team including Xingyan Liu, Xiyue Luo, and four others has developed SkillForge, a novel framework designed to solve a critical problem in enterprise AI: creating and maintaining high-quality, domain-specific skills for LLM-powered agents. Unlike generic skill creators that produce poorly aligned outputs, SkillForge's Domain-Contextualized Skill Creator synthesizes initial skills by grounding them directly in company knowledge bases and historical support tickets. This ensures the agent's starting capabilities are closely matched to real-world task requirements, a significant improvement over off-the-shelf solutions.
The framework's central contribution is its self-evolving, closed-loop architecture. After deployment, a three-stage pipeline (Failure Analyzer, Skill Diagnostician, and Skill Optimizer) processes execution failures in batches. It identifies the root cause of each failure, traces it to a specific deficiency in the skill, and rewrites the skill to remove that deficiency. This create-evaluate-refine cycle runs iteratively, so the agent's skills improve with each round of operational feedback instead of staying fixed after a one-time deployment.
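The loop described above can be illustrated with a toy sketch. This is not the authors' implementation: in SkillForge the three stages are LLM-driven, whereas here each stage is a plain function, a "skill" is just a list of step names, and a "failure" is a required step the skill lacks. All names (`Skill`, `Failure`, `run_task`, `evolve`, the step names) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    steps: list[str]
    version: int = 1


@dataclass
class Failure:
    task_id: str
    missing_step: str  # toy stand-in for a diagnosed root cause


def run_task(skill: Skill, task: dict) -> Failure | None:
    """Toy executor: the task fails at the first required step the skill lacks."""
    for step in task["required"]:
        if step not in skill.steps:
            return Failure(task["id"], step)
    return None


def analyze_failures(failures: list[Failure]) -> Counter:
    """Failure Analyzer (assumed behavior): group a batch of failures by root cause."""
    return Counter(f.missing_step for f in failures)


def diagnose(skill: Skill, causes: Counter) -> list[str]:
    """Skill Diagnostician (assumed behavior): map root causes to skill gaps."""
    return [step for step, _ in causes.most_common() if step not in skill.steps]


def optimize(skill: Skill, gaps: list[str]) -> Skill:
    """Skill Optimizer (assumed behavior): return a rewritten skill that closes the gaps."""
    if not gaps:
        return skill
    return Skill(skill.name, skill.steps + gaps, skill.version + 1)


def evolve(skill: Skill, tasks: list[dict], max_rounds: int = 5) -> Skill:
    """Run the evaluate-and-refine loop until no task fails or rounds run out."""
    for _ in range(max_rounds):
        results = (run_task(skill, t) for t in tasks)
        failures = [f for f in results if f is not None]
        if not failures:
            break
        skill = optimize(skill, diagnose(skill, analyze_failures(failures)))
    return skill


if __name__ == "__main__":
    tasks = [{"id": "t1", "required": ["check_quota", "restart_instance"]}]
    print(evolve(Skill("vm_wont_start", ["check_quota"]), tasks))
```

Because the toy executor reports only the first missing step per task, a skill with several gaps needs several rounds to converge, which mirrors the iterative, batch-by-batch character of the loop in the article.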
The team evaluated SkillForge on five real-world cloud support scenarios covering 1,883 tickets and 3,737 tasks. The domain-contextualized creator produced significantly better initial skills than generic creators, as measured by consistency with expert responses. The self-evolution loop also improved skill quality progressively across successive rounds from a range of starting points, including expert-authored skills. The research, accepted at ACM SIGIR 2026, suggests that systematic, automated evolution can eventually surpass manually curated expert knowledge, pointing toward self-optimizing enterprise AI systems.
- SkillForge uses a Domain-Contextualized Skill Creator that grounds AI agent skills in knowledge bases and historical support tickets for better initial alignment.
- Its three-stage self-evolution pipeline (Failure Analyzer, Skill Diagnostician, Skill Optimizer) automatically diagnoses failures and rewrites skills, creating a closed-loop improvement system.
- In an evaluation on 3,737 tasks drawn from 1,883 tickets, skill quality kept improving across successive evolution rounds, even when starting from expert-authored skills.
Why It Matters
If the results hold in production, enterprises could deploy AI support agents whose skills keep improving from their own failures with far less manual upkeep, which could lower operational costs and raise resolution accuracy.