
EvoSkill framework automates agent skill discovery with 12% accuracy gains

arXiv cs.MA March 04, 2026

⚡ The self-evolving system analyzes failures to create reusable skills, raising QA accuracy by up to 12.1 percentage points.

Deep Dive

A research team led by Salaheddin Alzubi has published a paper introducing EvoSkill, a novel framework designed to automate the discovery and refinement of reusable skills for multi-agent AI systems. The core innovation addresses a critical bottleneck in deploying coding agents as general problem solvers: while flexible, they often lack the specialized domain expertise needed for complex tasks. Current approaches rely on manually crafted skills or evolutionary methods that optimize low-level, task-specific artifacts like prompts and code. EvoSkill proposes a self-evolving alternative that analyzes execution failures, proposes new skills or edits existing ones, and materializes them into structured, reusable components, all while keeping the underlying AI model frozen.
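The failure-driven loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Skill`, `AgentProgram`, `evolve_once`, `toy_run_agent`, and `toy_propose_skill` are hypothetical, the toy agent and toy proposer stand in for a frozen model and a real failure analyzer, and the single-score acceptance rule is a simplification of the selection step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str          # hypothetical identifier for a skill folder
    instructions: str  # hypothetical contents, e.g. guidance text


@dataclass(frozen=True)
class AgentProgram:
    # Only the skill set changes between iterations; the model itself
    # is assumed frozen and is therefore not represented here.
    skills: tuple = ()


def accuracy(program, tasks, run_agent):
    """Fraction of tasks the agent solves with the given program."""
    if not tasks:
        return 0.0
    return sum(run_agent(program, t) for t in tasks) / len(tasks)


def evolve_once(program, train, val, run_agent, propose_skill):
    """One iteration: collect failures, propose a skill, and keep it
    only if held-out validation accuracy improves."""
    failures = [t for t in train if not run_agent(program, t)]
    if not failures:
        return program
    candidate = AgentProgram(program.skills + (propose_skill(failures),))
    if accuracy(candidate, val, run_agent) > accuracy(program, val, run_agent):
        return candidate
    return program


# Toy stand-ins (assumptions for illustration only): a task is a topic
# string, and the agent "solves" it when a skill with that name exists.
def toy_run_agent(program, task):
    return any(s.name == task for s in program.skills)


def toy_propose_skill(failures):
    topic = failures[0]
    return Skill(name=topic, instructions=f"How to handle {topic} questions")


train_tasks = ["tables", "tables", "dates"]
val_tasks = ["tables", "dates"]

program = AgentProgram()
for _ in range(3):
    program = evolve_once(program, train_tasks, val_tasks,
                          toy_run_agent, toy_propose_skill)

print([s.name for s in program.skills])
```

In this toy run, each iteration adds one skill until no training failures remain, after which the program is returned unchanged.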

The framework maintains a Pareto frontier of agent programs and keeps only the skills that demonstrably improve performance on held-out validation data. In the reported evaluations, EvoSkill raised exact-match accuracy by 7.3 percentage points (from 60.6% to 67.9%) on OfficeQA, a benchmark for grounded reasoning over U.S. Treasury data, and by 12.1 percentage points (from 26.6% to 38.7%) on SealQA, a search-augmented QA benchmark with noisy retrieval. The research also shows that skills evolved for one task can transfer zero-shot to another: skills evolved on SealQA improved accuracy on BrowseComp by 5.3% without any modification. This suggests the framework produces transferable capabilities rather than narrow, task-specific optimizations.
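To make the Pareto-frontier idea concrete, here is a minimal sketch of frontier maintenance. It assumes each agent program is scored on several objectives where higher is better; the article does not say which objectives EvoSkill uses, so the two-element score tuples, the program names, and the helpers `dominates` and `update_frontier` are all hypothetical, and the numbers are illustrative rather than results from the paper.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b on every
    objective and strictly better on at least one (higher is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def update_frontier(frontier, name, scores):
    """Return a new frontier after offering a candidate program.

    frontier is a list of (name, scores) pairs. The candidate is
    rejected if an existing member dominates or equals it; otherwise
    it is added and every member it dominates is dropped.
    """
    if any(dominates(s, scores) or s == scores for _, s in frontier):
        return frontier
    kept = [(n, s) for n, s in frontier if not dominates(scores, s)]
    kept.append((name, scores))
    return kept


# Illustrative scores only (assumed objectives, made-up values).
frontier = []
frontier = update_frontier(frontier, "base", (0.60, 0.30))
frontier = update_frontier(frontier, "a", (0.65, 0.28))  # trade-off: kept alongside base
frontier = update_frontier(frontier, "b", (0.66, 0.31))  # dominates both: replaces them
frontier = update_frontier(frontier, "c", (0.50, 0.20))  # dominated by b: rejected

print(frontier)
```

A frontier of this kind keeps programs that trade one objective off against another, rather than collapsing selection to a single best score.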

Key Points

Automatically discovers agent skills via failure analysis, improving OfficeQA accuracy by 7.3 percentage points and SealQA accuracy by 12.1 percentage points.
Creates reusable, structured skill folders while keeping the underlying AI model frozen, governed by a Pareto frontier for selection.
Enables zero-shot skill transfer, with skills evolved on SealQA boosting performance on BrowseComp by 5.3% without modification.

Why It Matters

Automates the creation of expert agent workflows, moving beyond manual coding to build more capable and transferable AI systems.
