SkillOpt turns markdown skill files into trainable AI parameters

r/LocalLLaMA May 26, 2026

⚡ A new method uses frontier models to iteratively edit skill files, rejecting all but strict improvements.

Deep Dive

A new paper formalizes a method many agent builders have been using ad hoc. A frontier model proposes bounded edits (add, delete, or replace) to a markdown skill file, and each edit is gated against a held-out validation set. Only strict improvements are accepted; ties are rejected, and rejected edits become negative signal for the next round.

The best skills converge after just 1 to 4 accepted edits out of many proposals. An edit budget of 4 to 8 per step works best; removing the cap collapses performance. The median final skill is about 920 tokens. A skill optimized on Codex transferred to Claude Code with no modification and gained +59.7 on SpreadsheetBench, and GPT-4.1 nano with an optimized skill roughly matched frontier models on procedural benchmarks.

The limitation: the validation gate requires an automatic grader with clear correct answers. That works for code and spreadsheets but breaks down for anything open-ended.
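The accept/reject loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the line-indexed `Edit` schema, the `propose` callable (standing in for the frontier model), the `run_agent` callable (standing in for the agent using the skill), the exact-match `score` grader, and the default `budget=6` (picked from the reported 4 to 8 range and read here as the number of proposals evaluated per step) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


# Hypothetical edit schema: one line-level operation on the skill file.
@dataclass(frozen=True)
class Edit:
    op: str         # "add", "delete", or "replace"
    line: int       # target line index in the skill file
    text: str = ""  # new line for "add"/"replace"; ignored for "delete"


def apply_edit(skill: str, edit: Edit) -> Optional[str]:
    """Return the edited skill text, or None if the edit is malformed or out of range."""
    lines = skill.splitlines()
    if edit.op == "add" and 0 <= edit.line <= len(lines):
        lines.insert(edit.line, edit.text)
    elif edit.op == "delete" and 0 <= edit.line < len(lines):
        del lines[edit.line]
    elif edit.op == "replace" and 0 <= edit.line < len(lines):
        lines[edit.line] = edit.text
    else:
        return None
    return "\n".join(lines)


Task = Tuple[str, str]             # (task prompt, expected answer)
Agent = Callable[[str, str], str]  # (skill, prompt) -> agent's answer


def score(skill: str, validation: Sequence[Task], run_agent: Agent) -> float:
    """Exact-match auto-grader: fraction of validation tasks answered correctly."""
    hits = sum(run_agent(skill, prompt) == expected for prompt, expected in validation)
    return hits / len(validation)


def optimize_step(
    skill: str,
    validation: Sequence[Task],
    run_agent: Agent,
    propose: Callable[[str, List[Edit]], Optional[Edit]],
    rejected: List[Edit],
    budget: int = 6,  # hypothetical default within the reported 4-8 range
) -> Tuple[str, float]:
    """Evaluate at most `budget` proposed edits; keep only strict improvements."""
    best = score(skill, validation, run_agent)
    for _ in range(budget):  # the cap on proposals per step
        edit = propose(skill, rejected)  # proposer sees past rejections
        if edit is None:  # proposer has nothing left to suggest
            break
        candidate = apply_edit(skill, edit)
        if candidate is not None:
            new_score = score(candidate, validation, run_agent)
            if new_score > best:  # strict improvement only; ties are rejected
                skill, best = candidate, new_score
                continue
        rejected.append(edit)  # malformed, worse, or tied: negative signal
    return skill, best


# Toy demo: a scripted "proposer" and a keyword-matching "agent" stand in for real models.
def toy_agent(skill: str, prompt: str) -> str:
    if prompt == "total a column":
        return "use SUM" if "SUM" in skill else "guess"
    return "read header" if "header" in skill else "guess"


validation = [("total a column", "use SUM"), ("find the header", "read header")]
script = iter([
    Edit("replace", 1, "Skim the sheet."),               # drops the header rule: worse
    Edit("add", 1, "Totals: use SUM over the column."),  # fixes the SUM task: better
    Edit("delete", 0),                                   # removes the title: tie
    Edit("delete", 99),                                  # out of range: malformed
])
rejected: List[Edit] = []
start = "# Spreadsheet skill\nAlways read the header row."
final_skill, final_score = optimize_step(
    start, validation, toy_agent, lambda s, r: next(script, None), rejected
)
print(final_score, len(rejected))  # 1.0 3
```

In the toy run only the second edit is kept: the first lowers the score, the third ties, and the fourth cannot be applied, so all three land in `rejected`. The exact-match `score` function is also where the stated limitation shows up, since it only works when every validation task has a single checkable answer.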

Key Points

SkillOpt uses a frontier model to propose bounded edits (add/delete/replace) to markdown skill files, with each edit gated against a validation set; only strict improvements are accepted.
The best skills converge after only 1-4 accepted edits out of many proposals; an edit budget of 4-8 per step works best, and removing the cap collapses performance.
A SkillOpt-optimized skill transferred from Codex to Claude Code without modification, gaining +59.7 on SpreadsheetBench; GPT-4.1 nano with an optimized skill roughly matched frontier models on procedural tasks.

Why It Matters

It turns skill optimization into a reproducible, automated process whose results transfer across models and boost performance with only a handful of edits, at least for tasks that can be graded automatically.
