Agent Frameworks

SkillGen synthesizes auditable AI agent skills from trajectories with verified impact

Skills induced contrastively from successful and failed trajectories, then verified to fix failures without breaking successes.

Deep Dive

SkillGen, developed by a team including Yuchen Ma, Yue Huang, and researchers from multiple institutions, tackles a major bottleneck in LLM agents: the manual creation of high-quality skills. Skills are reusable, controllable pieces of behavior that improve agent performance without retraining, but today they are largely hand-crafted. SkillGen automates this by observing a base agent's trajectories, both successful and failed, and using a multi-agent architecture to induce three things from them: patterns that lead to success, common failure modes, and behaviors present in successes but absent in failures. The output is a single human-readable skill artifact that can be inspected before deployment.

The framework's main novelty is modeling skills as interventions. Instead of just summarizing good behavior, SkillGen empirically verifies the net effect of each skill by comparing outcomes on the same instances with and without the skill. This accounts for both repairs (where the skill fixes a failure) and regressions (where it breaks a success). Across diverse agents and datasets, SkillGen outperforms existing skill-generation baselines, improves held-out performance, and produces skills that transfer across different LLM backends, making it a practical step toward self-improving, auditable AI agents.
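A minimal sketch of the paired comparison described above, assuming each instance is run once without the skill and once with it. Defining net effect as repairs minus regressions is an assumption here; the summary does not give SkillGen's exact metric. The names `PairedOutcome` and `net_effect` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedOutcome:
    instance_id: str
    base_success: bool   # outcome of the base agent without the skill
    skill_success: bool  # outcome on the same instance with the skill


def net_effect(outcomes):
    """Summarize a skill as an intervention over paired outcomes.

    Assumption: net effect = repairs - regressions (not confirmed
    as SkillGen's exact definition).
    """
    # Repair: the base agent failed and the skill-equipped agent succeeded.
    repairs = sum(1 for o in outcomes if not o.base_success and o.skill_success)
    # Regression: the base agent succeeded and the skill broke it.
    regressions = sum(1 for o in outcomes if o.base_success and not o.skill_success)
    n = len(outcomes)
    net = repairs - regressions
    return {
        "repairs": repairs,
        "regressions": regressions,
        "net": net,
        "net_rate": net / n if n else 0.0,
    }


outcomes = [
    PairedOutcome("a", base_success=False, skill_success=True),   # repair
    PairedOutcome("b", base_success=True, skill_success=False),   # regression
    PairedOutcome("c", base_success=False, skill_success=True),   # repair
    PairedOutcome("d", base_success=True, skill_success=True),    # unchanged
]
report = net_effect(outcomes)
```

Pairing outcomes per instance matters: comparing aggregate success rates alone would hide a skill that fixes some failures while breaking an equal number of successes.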

Key Points
  • SkillGen uses contrastive induction over successful and failed trajectories to identify reusable patterns and recurring failure modes.
  • It models skills as interventions, measuring net effect by comparing outcomes with and without the skill, accounting for both repairs and regressions.
  • Generated skills are human-readable, auditable, transfer across models, and consistently improve held-out performance over baselines.

Why It Matters

Automated, verifiable skill generation could make LLM agents safer, more efficient, and easier to audit at scale.