New DSAC algorithm cuts Tree-of-Thoughts latency on edge devices by over 80%
Tree-of-Thoughts meets edge computing for faster AI content generation on resource-constrained devices.
A new research paper introduces a diffusion-based soft actor-critic (DSAC) algorithm designed to optimize Tree-of-Thoughts (ToT) prompting for edge-enabled AI-generated content (AIGC) services. ToT extends Chain-of-Thought (CoT) by exploring multiple reasoning paths in parallel. This improves output quality but requires multiple calls to computationally intensive generative AI models, a major challenge for resource-constrained edge devices. The authors, from multiple universities, use creative writing tasks as a case study with the Qwen 2.5-7B-Instruct model. From these experiments they characterize how output token count relates to generation delay and output quality. They then model the ToT reasoning process as a directed acyclic graph (DAG) in which each vertex is a thought and each edge is a transition between thoughts. The resulting DAG-based thought assignment problem minimizes generation delay subject to a user-adjustable quality constraint.
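To make the problem formulation concrete, here is a minimal Python sketch of a DAG-based thought assignment. The article does not give the paper's delay or quality functions, so every modeling choice here is a hypothetical assumption. Each thought picks a token budget and a site ("local" device or "edge" server). Delay is a fixed offload overhead plus a linear per-token cost. Quality follows a saturating curve in token count, and the quality constraint is applied to the mean over thoughts. Total delay is the DAG's critical path, assuming sibling thoughts run in parallel. Exhaustive search stands in for DSAC here and is only feasible for tiny graphs. The names `PARENTS`, `PER_TOKEN`, `OFFLOAD_OVERHEAD`, `TOKEN_BUDGETS`, and `best_assignment` are illustrative, not from the paper.

```python
import itertools
import math

# Hypothetical ToT DAG: thought id -> parent thought ids (root 0, two branches, merge at 3).
PARENTS = {0: [], 1: [0], 2: [0], 3: [1, 2]}
ORDER = [0, 1, 2, 3]  # a topological order of PARENTS

# Hypothetical cost model: seconds per output token and fixed offload overhead per site.
PER_TOKEN = {"local": 0.020, "edge": 0.008}
OFFLOAD_OVERHEAD = {"local": 0.0, "edge": 2.0}
TOKEN_BUDGETS = [128, 256, 512]


def thought_delay(tokens, site):
    """Assumed linear delay model: offload overhead plus tokens times per-token latency."""
    return OFFLOAD_OVERHEAD[site] + tokens * PER_TOKEN[site]


def thought_quality(tokens, scale=200.0):
    """Assumed saturating quality curve in (0, 1): more tokens, diminishing returns."""
    return 1.0 - math.exp(-tokens / scale)


def total_delay(assignment):
    """Critical-path delay: a thought starts only after all its parents finish."""
    finish = {}
    for v in ORDER:
        tokens, site = assignment[v]
        start = max((finish[p] for p in PARENTS[v]), default=0.0)
        finish[v] = start + thought_delay(tokens, site)
    return max(finish.values())


def mean_quality(assignment):
    """Average assumed quality over all thoughts."""
    return sum(thought_quality(t) for t, _ in assignment.values()) / len(assignment)


def best_assignment(q_min):
    """Exhaustive search: minimize critical-path delay s.t. mean quality >= q_min.

    Returns (delay, assignment) or None if no assignment meets the constraint.
    """
    options = list(itertools.product(TOKEN_BUDGETS, PER_TOKEN))  # (budget, site) pairs
    best = None
    for combo in itertools.product(options, repeat=len(ORDER)):
        assignment = dict(zip(ORDER, combo))
        if mean_quality(assignment) < q_min:
            continue
        d = total_delay(assignment)
        if best is None or d < best[0]:
            best = (d, assignment)
    return best


if __name__ == "__main__":
    delay, plan = best_assignment(q_min=0.7)
    print(f"delay={delay:.3f}s plan={plan}")
```

Raising `q_min` forces longer thoughts and a longer critical path, and lowering it does the reverse. This mirrors the user-adjustable quality constraint described above. The search space grows as (budgets × sites) raised to the number of thoughts, which is one motivation for a learned policy at realistic scale.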
The core contribution is the DSAC algorithm, which integrates a diffusion model with soft actor-critic (SAC) reinforcement learning to determine optimal thought assignments. In extensive simulations, DSAC reduces total generation delay by up to 8.32% relative to PPO, 11.57% relative to SAC, and 36.09% relative to DDQN across various settings. Critically, it cuts latency by over 80% compared with fully local generation while still meeting stringent quality requirements. The results suggest that sophisticated multi-path reasoning can be practically deployed on edge devices. This opens the door to real-time AIGC applications such as on-device content creation, intelligent assistants, and creative writing tools that do not rely solely on cloud infrastructure.
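The general mechanism behind a diffusion-based actor in SAC can be sketched in a few lines of NumPy. This is not the authors' implementation. The step count, noise schedule, network shape, and the softmax mapping from the denoised vector to discrete action probabilities are all assumptions. The noise-prediction network `eps_theta` is an untrained random stand-in, so the output shows only the data flow, not a learned policy. The actor runs a reverse DDPM chain from Gaussian noise, conditioned on the state. The critic side computes a discrete-action SAC soft Bellman target from twin Q estimates plus an entropy bonus weighted by temperature `alpha`. In a delay-minimization setting the reward would plausibly be negative generation delay, which is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 6   # hypothetical number of discrete thought-assignment choices
STATE_DIM = 8   # hypothetical state features (e.g. token budgets, queue lengths)
T = 5           # assumed number of denoising steps

# Linear DDPM noise schedule (assumed).
betas = np.linspace(1e-4, 0.1, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Untrained random stand-in for the noise-prediction network eps_theta(x_t, t, s).
W_x = rng.normal(scale=0.1, size=(N_ACTIONS, N_ACTIONS))
W_s = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))
w_t = rng.normal(scale=0.1, size=N_ACTIONS)


def eps_theta(x, t, s):
    """Predict the noise in x at step t, conditioned on state s."""
    return np.tanh(W_x @ x + W_s @ s + w_t * (t / T))


def diffusion_policy(s):
    """Reverse DDPM chain: Gaussian noise -> denoised logits -> action probabilities."""
    x = rng.standard_normal(N_ACTIONS)
    for t in reversed(range(T)):
        eps = eps_theta(x, t, s)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(N_ACTIONS)
    logits = x - x.max()  # shift for numerical stability
    return np.exp(logits) / np.exp(logits).sum()


def soft_q_target(reward, probs_next, q1_next, q2_next, gamma=0.95, alpha=0.05):
    """Discrete-action SAC target: r + gamma * E_a[min(Q1', Q2') - alpha * log pi(a|s')]."""
    q_min = np.minimum(q1_next, q2_next)
    log_p = np.log(probs_next + 1e-12)  # epsilon avoids log(0)
    return reward + gamma * np.sum(probs_next * (q_min - alpha * log_p))


if __name__ == "__main__":
    s = rng.standard_normal(STATE_DIM)
    probs = diffusion_policy(s)
    action = rng.choice(N_ACTIONS, p=probs)  # sample a thought-assignment action
    print("probs:", np.round(probs, 3), "action:", action)
```

A full training loop would regress the twin critics toward `soft_q_target` and update the denoiser to maximize expected Q plus entropy. Both steps are omitted here.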
- Proposes DSAC (diffusion-based soft actor-critic) algorithm for optimal thought assignment in Tree-of-Thoughts prompting on edge devices.
- Uses Qwen 2.5-7B-Instruct experiments to model generation delay vs. token count and quality, then frames ToT as a DAG optimization problem.
- Reduces total generation delay by up to 8.32%, 11.57%, and 36.09% relative to PPO, SAC, and DDQN, respectively, and by over 80% relative to fully local generation.
Why It Matters
Enables complex multi-path AI reasoning on resource-constrained edge devices, bringing real-time AIGC closer to users with less reliance on the cloud.