R^3-SQL: Ranking Reward and Resampling for Text-to-SQL
New method ranks equivalent SQL queries consistently and recovers correct queries missing from the candidate pool
A team of researchers from Microsoft and Seoul National University has introduced R^3-SQL (Ranking Reward and Resampling for Text-to-SQL), a framework that tackles two persistent problems in Text-to-SQL systems, which translate natural-language questions into SQL. The first is that existing methods often assign inconsistent scores to functionally equivalent SQL queries: queries that produce identical execution results can still be ranked differently. The second, more critical problem is that ranking alone cannot help when the correct SQL query is entirely absent from the candidate pool. R^3-SQL addresses the first problem by grouping candidates by their execution results and ranking the groups rather than individual queries, so that equivalent queries always share a rank. Its ranking reward combines pairwise preferences across groups with a pointwise utility based on each group's rank and size, capturing relative preference, consistency, and candidate quality in a single reward.
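To make the grouping idea concrete, here is a minimal Python sketch. It executes candidates against a toy in-memory SQLite database, treats queries with the same (row-order-insensitive) result as one group, and gives every member of a group the same rank. The ranker (plain group size, i.e. majority vote), the reward formula, and names such as `group_reward` and the weight `alpha` are hypothetical illustrations; the article does not specify R^3-SQL's actual scoring.

```python
import sqlite3
from collections import defaultdict


def execution_key(conn, sql):
    """Run `sql` and return a hashable key for its result, or None on error.

    Rows are sorted, so queries differing only in row order share a key
    (an assumption; ORDER BY-sensitive questions would need the raw order).
    """
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(repr(row) for row in rows))


def group_by_execution(conn, candidates):
    """Map each distinct execution result to the candidate queries producing it."""
    groups = defaultdict(list)
    for sql in candidates:
        key = execution_key(conn, sql)
        if key is not None:  # drop queries that fail to execute
            groups[key].append(sql)
    return dict(groups)


def candidate_ranks(ranked_keys, groups):
    """Every query inherits its group's rank, so equivalent queries tie."""
    return {sql: rank for rank, key in enumerate(ranked_keys) for sql in groups[key]}


def group_reward(ranked_keys, groups, gold_key, alpha=0.5):
    """Hypothetical reward: alpha * pairwise term + (1 - alpha) * pointwise term."""
    if gold_key not in groups:
        return 0.0  # correct SQL absent from the pool: no ranking can fix this
    pos = ranked_keys.index(gold_key)
    n = len(ranked_keys)
    # Pairwise: share of wrong groups that the gold group is ranked above.
    pairwise = (n - 1 - pos) / (n - 1) if n > 1 else 1.0
    # Pointwise: reciprocal rank of the gold group, scaled by its share of candidates.
    total = sum(len(members) for members in groups.values())
    pointwise = (1.0 / (pos + 1)) * len(groups[gold_key]) / total
    return alpha * pairwise + (1 - alpha) * pointwise


# Toy database and candidate pool (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("a", "x", 10), ("b", "x", 20), ("c", "y", 30)])
candidates = [
    "SELECT name FROM emp WHERE salary > 15",
    "SELECT name FROM emp WHERE salary >= 20",  # equivalent on this data
    "SELECT name FROM emp WHERE dept = 'y'",
    "SELECT nam FROM emp",                      # fails: no such column
]
groups = group_by_execution(conn, candidates)
# Stand-in ranker: order groups by size (majority vote), largest first.
ranked = sorted(groups, key=lambda k: len(groups[k]), reverse=True)
ranks = candidate_ranks(ranked, groups)
gold_key = execution_key(conn, "SELECT name FROM emp WHERE salary BETWEEN 20 AND 30")
reward = group_reward(ranked, groups, gold_key)
```

Here the two `salary` queries land in the same group and tie at rank 0, which is the consistency property described above; the failing query is dropped before ranking. A learned ranker would replace the size-based sort, but the tie-by-construction behavior would stay the same.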
To address the recall problem, R^3-SQL introduces agentic resampling: an agent judges the generated candidate pool and selectively triggers resampling when the correct SQL is likely missing. The system achieves 75.03% execution accuracy on the BIRD-dev benchmark, a new state of the art among methods that use models of disclosed size, and its gains hold consistently across five benchmarks, demonstrating that the approach generalizes. The paper, accepted to Findings of ACL 2026, underscores that Text-to-SQL systems for real-world applications need both consistent ranking and high recall.
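The resampling control flow can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the pool holds stand-ins for execution results, `generate` is any candidate generator, and `low_agreement` is a hypothetical heuristic judge (flag the pool when no result reaches a majority) swapped in for R^3-SQL's agentic judge, whose details the article does not give. The function names, batch size `n`, and the `max_rounds` cap are illustrative only.

```python
from collections import Counter
from typing import Callable, List, TypeVar

T = TypeVar("T")


def resample_until_confident(
    generate: Callable[[int], List[T]],
    likely_missing: Callable[[List[T]], bool],
    n: int = 8,
    max_rounds: int = 3,
) -> List[T]:
    """Build a candidate pool, resampling only while the judge flags it.

    `likely_missing(pool)` should return True when the correct SQL is
    probably absent; at most `max_rounds` extra batches are drawn.
    """
    pool = generate(n)
    for _ in range(max_rounds):
        if not likely_missing(pool):
            break  # judge is satisfied: stop spending compute
        pool = pool + generate(n)  # keep old candidates, add a fresh batch
    return pool


def low_agreement(pool, threshold=0.5):
    """Hypothetical stand-in judge: flag the pool when no result wins a majority."""
    if not pool:
        return True
    top_count = Counter(pool).most_common(1)[0][1]
    return top_count / len(pool) < threshold


# Toy generator: each call returns the next pre-made batch of execution results.
batches = iter([["r1", "r2", "r3", "r4"], ["r5", "r5", "r5", "r5"], ["r6"] * 4])
pool = resample_until_confident(lambda n: next(batches), low_agreement, n=4)
```

In this toy run the first batch has no agreement, so one extra batch is drawn; after it, `r5` holds half the pool and the judge stops further resampling, leaving the third batch unused. Appending rather than replacing candidates is a design choice of this sketch, so a later ranking step can still choose among every query generated.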
- R^3-SQL groups candidate SQL queries by execution results to ensure consistent ranking of functionally equivalent queries
- The framework introduces agentic resampling to detect when correct SQL is missing and regenerate candidates
- Achieves 75.03% execution accuracy on BIRD-dev, a new state of the art among methods using models of disclosed size
Why It Matters
This method makes Text-to-SQL more reliable for enterprise use by ranking equivalent queries consistently and by recovering correct queries that are missing from the candidate pool.