Research & Papers

[R] Graph-Oriented Generation (GOG): Replacing Vector R.A.G. for Codebases with Deterministic AST Traversal (70% Average Token Reduction)

New framework uses deterministic AST graphs instead of vector search, cutting LLM context by 89%.

Deep Dive

Developer David Chisholm has introduced Graph-Oriented Generation (GOG), a novel framework designed to solve the core failures of vector-based Retrieval-Augmented Generation (RAG) when applied to codebases. Frustrated by RAG's tendency to hallucinate import paths and lose context in deep software architectures, Chisholm's approach treats code not as probabilistic text but as a strict mathematical graph. The system uses a deterministic Symbolic Reasoning Model (SRM) to parse every file in the repository into an Abstract Syntax Tree (AST) and builds a Directed Acyclic Graph (DAG) of all dependencies between them. This allows for zero-shot lexical seeding to find target nodes, followed by a strict shortest-path traversal that isolates the exact execution path for any query and mathematically excludes irrelevant files from the context.
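GOG's implementation is not public, so as a rough illustration, the three steps above (parse files into ASTs, build a dependency graph, then lexically seed and walk the shortest path) can be sketched in a few dozen lines of Python using the stdlib `ast` module. All function names here are hypothetical, and Python imports stand in for the Vue/TypeScript dependencies of the actual benchmark:

```python
import ast
from collections import deque

def build_dependency_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Directed dependency graph: module name -> modules it imports.
    `files` maps module names to source text (a stand-in for a repo)."""
    graph = {name: set() for name in files}
    for name, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                graph[name].update(a.name for a in node.names if a.name in graph)
            elif isinstance(node, ast.ImportFrom) and node.module in graph:
                graph[name].add(node.module)
    return graph

def lexical_seeds(files: dict[str, str], query: str) -> list[str]:
    """Zero-shot lexical seeding: modules whose source mentions the query."""
    return [name for name, src in files.items() if query in src]

def shortest_path(graph: dict[str, set[str]], start: str, goal: str) -> list[str]:
    """BFS shortest path along dependency edges; [] if unreachable."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return []

# Toy repo: "unrelated" never appears on the app -> db path,
# so it is excluded from the assembled context by construction.
files = {
    "app": "import service\n",
    "service": "import db\n",
    "db": "def query(sql): ...\n",
    "unrelated": "def helper(): ...\n",
}
graph = build_dependency_graph(files)
seeds = lexical_seeds(files, "query")          # ["db"]
path = shortest_path(graph, "app", seeds[0])   # ["app", "service", "db"]
```

Only the files on the recovered path would be sent to the model, which is the mechanism behind the token reductions reported below.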

Technical benchmarks against a 100+ file Vue/TypeScript codebase are striking. Where traditional RAG took 1.6 seconds to assemble context, GOG achieved the same in 0.001 seconds—a 99.9% reduction. More importantly, the tokens sent to a constrained local model (Qwen 0.8B) dropped by 89.3%, from 4,230 to just 451. This precision allowed the small model to solve deep architectural problems that caused RAG-backed systems to fail, effectively demoting the LLM from a reasoning engine to a syntax translator. The architecture also features O(1) state evolution for file changes via in-memory tensor surgery, versus RAG's O(N) re-indexing. Chisholm is now seeking collaboration to scale GOG into IDE environments and multi-agent loops.
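The O(1)-versus-O(N) claim is the key architectural difference from RAG: when one file changes, only that file's node needs updating, while a vector index must re-embed and re-index. The details of GOG's in-memory "tensor surgery" are not public; a minimal sketch of the idea using a plain adjacency-set graph (names hypothetical) might look like:

```python
import ast

def extract_deps(source: str, known: set[str]) -> set[str]:
    """Parse one file and collect the modules it imports, restricted to `known`."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(a.name for a in node.names if a.name in known)
        elif isinstance(node, ast.ImportFrom) and node.module in known:
            deps.add(node.module)
    return deps

def update_file(graph: dict[str, set[str]], name: str, new_source: str) -> None:
    """State evolution independent of repository size: re-parse only the
    changed file and splice its outgoing edges in place. A vector-RAG
    index would instead re-embed, costing O(N) in the number of chunks."""
    graph[name] = extract_deps(new_source, set(graph))

# Toy three-module graph; only "service" changes.
graph = {"app": {"service"}, "service": {"db"}, "db": set()}
update_file(graph, "service", "import db\nimport app\n")
# graph["service"] is now {"db", "app"}; no other entry was touched.
```

The update cost here scales with the size of the edited file, not the repository, which is what makes the approach plausible for the IDE integration Chisholm is proposing.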

Key Points
  • Uses deterministic AST/DAG traversal instead of vector search, cutting LLM context tokens by 89.3%
  • Achieves 99.9% faster context assembly (0.001s vs 1.6s) and 78.1% faster total execution
  • Enables small models (Qwen 0.8B) to solve complex code tasks by providing noise-free execution paths

Why It Matters

Could make AI coding assistants vastly more accurate and efficient by treating code structure as math, not text.