AgentNLQ achieves 78.1% accuracy on BIRD benchmark with multi-agent NL2SQL
New self-correcting LLM orchestrator moves NL2SQL accuracy closer to human parity on enterprise databases.
Olena Bogdanov and colleagues have released AgentNLQ, a general-purpose multi-agent framework for converting natural language questions into SQL queries. The system uses a novel orchestrator powered by large language models (LLMs) that plans query generation, orchestrates sub-tasks, reflects on intermediate results, and self-corrects errors before producing the final output. This iterative process helps AgentNLQ reach 78.1% semantic accuracy on the challenging BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) benchmark, significantly outperforming prior end-to-end approaches.
The method introduces two key innovations. First, an advanced schema enrichment technique transforms raw database schemas into context-aware metadata by incorporating user-provided business rules and domain-specific annotations, giving the LLM a richer understanding of column meanings and relationships. Second, the orchestrator’s self-correction loop allows the system to detect and fix common SQL mistakes, such as incorrect joins or missing filters, without human intervention.
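To make the schema enrichment idea concrete, here is a minimal sketch of how raw table definitions might be combined with user-supplied annotations and business rules into prompt-ready text. It is an illustration only, not the authors' implementation: the `Column` and `Table` classes, the `enrich_schema` function, the output layout, and the example `orders` table are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    sql_type: str
    description: str = ""  # hypothetical domain-specific annotation


@dataclass
class Table:
    name: str
    columns: List[Column]
    business_rules: List[str] = field(default_factory=list)  # user-provided


def enrich_schema(tables: List[Table]) -> str:
    """Render tables, column annotations, and rules as text for an LLM prompt."""
    lines = []
    for table in tables:
        lines.append(f"TABLE {table.name}")
        for col in table.columns:
            note = f" -- {col.description}" if col.description else ""
            lines.append(f"  {col.name} {col.sql_type}{note}")
        for rule in table.business_rules:
            lines.append(f"  RULE: {rule}")
    return "\n".join(lines)


# Illustrative table with cryptic column names, the case where
# annotations help most.
orders = Table(
    name="orders",
    columns=[
        Column("id", "INTEGER", "primary key"),
        Column("amt", "REAL", "order total in USD, tax included"),
        Column("st", "TEXT", "status: 'O' open, 'C' closed, 'X' cancelled"),
    ],
    business_rules=["Revenue excludes cancelled orders (st = 'X')."],
)

print(enrich_schema([orders]))
```

The point of the design is that a name such as `st` carries no meaning on its own; attaching the annotation and the revenue rule gives the model the context it would need to add the right filter.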
AgentNLQ was evaluated across multiple domains and datasets from the BIRD-SQL benchmark, demonstrating consistent generalization. The authors note that while current LLMs have made remarkable progress, NL2SQL still lags behind expert human performance. AgentNLQ’s multi-agent design—where separate agents handle schema understanding, query planning, and validation—represents a practical step toward closing that gap. The paper is available on arXiv under ID 2605.19010.
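The generate, validate, and retry pattern behind a self-correcting pipeline can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than AgentNLQ's actual orchestrator: `validate`, `answer`, and `toy_generator` are hypothetical names, the generator is a stub standing in for an LLM call, and validation here only catches errors that SQLite itself reports (such as an unknown column). Semantic mistakes like a missing filter would need additional checks.

```python
import sqlite3
from typing import Callable, Optional

# A generator maps (question, feedback from the last attempt) to a SQL string.
Generator = Callable[[str, Optional[str]], str]


def validate(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can plan the query, else the error message."""
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def answer(conn: sqlite3.Connection, question: str,
           generate: Generator, max_rounds: int = 3) -> str:
    """Ask the generator for SQL, feeding validation errors back until it passes."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        sql = generate(question, feedback)
        feedback = validate(conn, sql)
        if feedback is None:
            return sql
    raise RuntimeError(f"no valid SQL after {max_rounds} rounds: {feedback}")


def toy_generator(question: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM: the first attempt uses a wrong column name,
    # the retry (once feedback is present) uses the correct one.
    if feedback is None:
        return "SELECT SUM(amount) FROM orders WHERE st != 'X'"
    return "SELECT SUM(amt) FROM orders WHERE st != 'X'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amt REAL, st TEXT)")
conn.executemany(
    "INSERT INTO orders (amt, st) VALUES (?, ?)",
    [(10.0, "O"), (5.0, "X"), (2.5, "C")],
)

sql = answer(conn, "What is total revenue?", toy_generator)
total = conn.execute(sql).fetchone()[0]
print(sql, total)
```

In a real system the feedback string would be passed back into the model's prompt, and the round limit keeps a generator that never converges from looping forever.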
- Reaches 78.1% semantic accuracy on the BIRD benchmark, a 5+ point improvement over prior best methods.
- Uses a multi-agent architecture with an LLM orchestrator that plans, reflects, and self-corrects SQL queries.
- Employs schema enrichment with user-defined business rules to improve context-awareness and query precision.
Why It Matters
AgentNLQ brings NL2SQL closer to human-expert parity, enabling non-technical users to query enterprise databases accurately.