HCAG: Hierarchical Abstraction and Retrieval-Augmented Generation on Theoretical Repositories with LLMs
New AI framework builds complex game theory systems by linking theory to code with multi-level retrieval.
Researchers Yusen Wu and Xiaotie Deng have introduced HCAG (Hierarchical Code/Architecture-guided Agent Generation), a novel framework that addresses the limitations of current Retrieval-Augmented Generation (RAG) methods for complex, theory-driven code generation. Traditional RAG struggles with high-level architectural patterns and cross-file dependencies in domains like algorithmic game theory (AGT), creating a gap between abstract concepts and executable code. HCAG reformulates repository-level generation as a structured, planning-oriented process over hierarchical knowledge.
The framework operates in two distinct phases. First, an offline hierarchical abstraction phase recursively parses code repositories alongside aligned theoretical texts to build a multi-resolution semantic knowledge base. This base explicitly links theoretical concepts, architectural designs, and implementation details. Second, an online hierarchical retrieval and scaffolded generation phase performs top-down, level-wise retrieval to guide large language models (LLMs) using an "architecture-then-module" generation paradigm. For improved robustness, HCAG integrates a multi-agent discussion mechanism inspired by cooperative game theory.
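The top-down, level-wise retrieval described above can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the three-level node structure, the keyword-overlap scorer, and the example knowledge base are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

# Hypothetical three-level knowledge base mirroring HCAG's hierarchy:
# level 0 = theoretical concept, 1 = architectural design, 2 = implementation detail.
@dataclass
class Node:
    text: str
    level: int
    children: list = field(default_factory=list)

def score(query: str, node: Node) -> float:
    # Toy relevance measure: fraction of query words found in the node text.
    # A real system would use embeddings; this keeps the sketch self-contained.
    q = set(query.lower().split())
    t = set(node.text.lower().split())
    return len(q & t) / max(len(q), 1)

def top_down_retrieve(query: str, roots: list, k: int = 1) -> list:
    """Level-wise retrieval: at each level keep the k best-scoring nodes,
    then descend only into their children, narrowing the search frontier."""
    frontier, path = roots, []
    while frontier:
        frontier = sorted(frontier, key=lambda n: score(query, n), reverse=True)[:k]
        path.extend(frontier)
        frontier = [c for n in frontier for c in n.children]
    return path

# Tiny example base for an auction-generation task (contents invented).
impl = Node("second price payment rule implementation", 2)
arch = Node("auction module with allocation and payment components", 1, [impl])
concept = Node("Vickrey auction truthful mechanism second price", 0, [arch])
other = Node("matching market deferred acceptance", 0)

# Retrieval walks concept -> architecture -> implementation, top down.
result = top_down_retrieve("second price auction payment", [concept, other])
```

The retrieved path (concept node, then its architecture child, then the implementation leaf) is what would scaffold the LLM's "architecture-then-module" generation step: higher levels constrain the plan before lower levels fill in code.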
Extensive experiments on diverse game-theoretic system generation tasks show HCAG substantially outperforms existing repository-level methods in code quality, architectural coherence, and requirement pass rate. A theoretical analysis indicates that its hierarchical abstraction with adaptive node compression achieves cost-optimality compared to flat and iterative RAG baselines. Beyond performance, HCAG produces a large-scale, aligned theory-implementation dataset that effectively enhances domain-specific LLMs through post-training. While demonstrated in AGT, the authors position HCAG as a general blueprint for mining, reusing, and generating complex systems from structured codebases across other technical domains.
- Two-phase design: offline hierarchical abstraction builds a multi-resolution knowledge base linking theory to code, while online retrieval guides LLMs in top-down generation.
- Outperforms existing methods: demonstrates superior code quality, architectural coherence, and requirement pass rates in generating complex game-theoretic systems.
- Creates valuable datasets: produces aligned theory-implementation datasets that can be used to enhance other domain-specific AI models through post-training.
Why It Matters
Enables AI to generate architecturally sound code for complex, theory-driven systems, bridging a critical gap between high-level concepts and implementation.