Research & Papers

SkillDroid: Compile Once, Reuse Forever

The new system compiles successful actions into reusable skills, achieving a 91% success rate.

Deep Dive

A team of researchers has introduced SkillDroid, a novel AI agent framework that fundamentally changes how Large Language Models (LLMs) interact with mobile graphical user interfaces (GUIs). Current LLM-based agents treat every task as a new, stateless reasoning problem, requiring a full and expensive LLM inference call for each action step. This makes them slow, unreliable, and incapable of learning from past successes.

SkillDroid solves this with a three-layer architecture that 'compiles' successful task executions. When an LLM successfully completes a sequence of UI actions (like ordering food or setting an alarm), SkillDroid saves that trajectory as a reusable, parameterized skill template. The template includes the sequence of actions, weighted element locators for finding UI components, and typed slots for variable inputs.

On future invocations, a matching cascade uses regex patterns, embedding similarity, and app filtering to route a user's instruction to the correct stored skill. The system then replays the skill without any LLM calls, dramatically increasing speed and reliability. A failure-learning layer monitors execution and triggers a recompilation with the LLM only when a skill's success rate degrades.

In a rigorous 150-round longitudinal evaluation with systematic variations, SkillDroid achieved an 85.3% success rate, a 23-percentage-point improvement over a stateless LLM baseline. Crucially, it used 49% fewer LLM calls. The pure replay mechanism was 2.4 times faster than full LLM execution and achieved a 100% success rate in testing. Most importantly, the system gets better with use: its success rate converged upward from 87% to 91%, while the baseline agent's performance degraded from 80% to 44%.
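The matching cascade could be sketched roughly as below. The staging order, field names, and the 0.6 similarity threshold are assumptions, and a toy bag-of-words cosine stands in for a real embedding model:

```python
import math
import re
from collections import Counter

def _cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine; a real system would use a sentence embedding."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_skill(instruction, current_app, skills, sim_threshold=0.6):
    # Stage 1: app filtering - only consider skills for the target app.
    candidates = [s for s in skills if s["app"] == current_app]
    # Stage 2: regex trigger patterns - cheap, precise matching.
    for s in candidates:
        if any(re.search(p, instruction, re.I) for p in s["patterns"]):
            return s
    # Stage 3: embedding-similarity fallback for paraphrased instructions.
    best = max(candidates, key=lambda s: _cosine(instruction, s["name"]),
               default=None)
    if best and _cosine(instruction, best["name"]) >= sim_threshold:
        return best
    return None  # no stored skill matches: fall back to full LLM execution

skills = [
    {"app": "food", "name": "order food delivery",
     "patterns": [r"order .* (food|pizza)"]},
    {"app": "clock", "name": "set an alarm",
     "patterns": [r"set .* alarm"]},
]
```

Routing to a stored skill this way is what lets the system skip the LLM entirely on repeat tasks, which is where the reported call savings come from.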

Key Points
  • Compiles LLM actions into reusable skills, cutting LLM calls by 49% in testing.
  • Achieved an 85.3% success rate, 23 points above a stateless baseline, and improves to 91% with use.
  • Skill replay is 2.4x faster than full LLM execution and achieved a 100% success rate in replay tests.
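The failure-learning layer described above can be sketched as a rolling success-rate monitor; the window size, threshold, and class name are illustrative assumptions rather than the paper's parameters:

```python
from collections import deque

class FailureMonitor:
    """Tracks per-skill outcomes and flags a skill for LLM recompilation
    once its rolling success rate degrades below a threshold."""

    def __init__(self, window: int = 5, threshold: float = 0.7):
        self.window = window
        self.threshold = threshold
        self.history: dict[str, deque] = {}

    def record(self, skill_name: str, success: bool) -> bool:
        """Record one execution outcome; return True if the skill
        should be recompiled with the LLM."""
        h = self.history.setdefault(skill_name, deque(maxlen=self.window))
        h.append(success)
        # Only judge a skill once a full window of outcomes is available.
        if len(h) < self.window:
            return False
        return sum(h) / len(h) < self.threshold

monitor = FailureMonitor(window=5, threshold=0.7)
# Four successes then two failures: the rolling rate drops to 3/5 = 0.6,
# crossing below 0.7, so the final record triggers recompilation.
outcomes = [True, True, True, True, False, False]
flags = [monitor.record("set_alarm", ok) for ok in outcomes]
```

Gating recompilation on degraded success rates, rather than on every failure, is what keeps LLM usage low while still letting skills adapt when an app's UI changes.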

Why It Matters

This enables efficient, learning-capable automation for repetitive mobile tasks, reducing cost and latency for AI assistants.