FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning

New AI method mimics human reasoning to find precise data across multiple complex tables.

Deep Dive

A research team led by Chaojie Sun has introduced FGTR (Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning), a method that changes how large language models (LLMs) locate and extract data from databases. Previous approaches encode entire tables for similarity matching, which often pulls in irrelevant data and scales poorly. FGTR instead employs a human-like, two-stage reasoning strategy: the LLM first identifies the relevant schema elements (the tables and columns related to a query), then retrieves only the cell contents needed. The result of this hierarchical process is a compact sub-table that directly addresses the user's question, with far less noise.
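The two-stage idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy database, and the question are all hypothetical, and the two keyword-matching "judge" functions are simple stand-ins for the LLM calls FGTR would make. The sketch also does no join reasoning across tables; it only shows the shape of the pipeline, schema selection first and cell selection second.

```python
import re
from typing import Callable, Dict, List, Set

Row = Dict[str, object]
Database = Dict[str, List[Row]]  # table name -> list of rows
ColumnJudge = Callable[[str, str, str], bool]
RowJudge = Callable[[str, str, Row], bool]


def tokens(text: str) -> Set[str]:
    """Return the lowercased word tokens of a string."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def keyword_column_judge(question: str, table: str, column: str) -> bool:
    # Stand-in for an LLM call (assumption): a column counts as relevant
    # if its name appears as a word in the question.
    return column.lower() in tokens(question)


def keyword_row_judge(question: str, table: str, row: Row) -> bool:
    # Stand-in for an LLM call (assumption): a row counts as relevant
    # if any of its cell values appears as a word in the question.
    q = tokens(question)
    return any(str(value).lower() in q for value in row.values())


def select_schema(db: Database, question: str,
                  column_judge: ColumnJudge) -> Dict[str, List[str]]:
    """Stage 1: keep only the tables and columns judged relevant."""
    schema: Dict[str, List[str]] = {}
    for table, rows in db.items():
        columns = list(rows[0]) if rows else []
        kept = [c for c in columns if column_judge(question, table, c)]
        if kept:
            schema[table] = kept
    return schema


def select_cells(db: Database, schema: Dict[str, List[str]], question: str,
                 row_judge: RowJudge) -> Database:
    """Stage 2: within the selected columns, keep cells from relevant rows."""
    sub_tables: Database = {}
    for table, columns in schema.items():
        matched = [r for r in db[table] if row_judge(question, table, r)]
        rows = matched or db[table]  # no value mentioned: keep every row
        sub_tables[table] = [{c: r[c] for c in columns} for r in rows]
    return sub_tables


def retrieve(db: Database, question: str,
             column_judge: ColumnJudge = keyword_column_judge,
             row_judge: RowJudge = keyword_row_judge) -> Database:
    """Run both stages and return the compact sub-tables."""
    schema = select_schema(db, question, column_judge)
    return select_cells(db, schema, question, row_judge)


# Hypothetical toy database and question, for illustration only.
DB: Database = {
    "singer": [
        {"singer_id": 1, "name": "Ana", "country": "France"},
        {"singer_id": 2, "name": "Bo", "country": "Japan"},
    ],
    "concert": [
        {"concert_id": 10, "singer_id": 1, "city": "Paris", "year": 2014},
        {"concert_id": 11, "singer_id": 2, "city": "Tokyo", "year": 2015},
    ],
}
QUESTION = "Show the name and country of singers, and the city of concerts in 2014."

if __name__ == "__main__":
    print(retrieve(DB, QUESTION))
```

With these stand-in judges, the singer table is reduced to its name and country columns, and the concert table to the single city cell from the 2014 row. Replacing the two judges with real LLM calls would leave the surrounding structure unchanged, which is the reason the sketch passes them in as parameters.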

To validate the method, the team built two new benchmark datasets based on the established Spider and BIRD benchmarks. In the reported experiments, FGTR outperforms previous state-of-the-art methods, improving the F2 metric by 18% on Spider and 21% on BIRD. F2 is the F-measure with β = 2, which weights recall more heavily than precision. These gains indicate that the method handles fine-grained, multi-table queries well, a task previously underexplored in retrieval research. The paper, currently under review for SIGIR 2026, also highlights FGTR's potential to improve end-to-end performance on downstream tasks that rely on table data, such as complex question answering and data analysis, by making LLM-based retrieval both more precise and more computationally efficient.
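For readers unfamiliar with the F2 metric, the short sketch below computes the standard F-beta score with beta = 2, which counts recall more heavily than precision. The choice to score sets of (table, row, column) cell identifiers is an assumption made for illustration; the summary does not say at what granularity the paper computes the metric, and the example cells are hypothetical.

```python
from typing import Set, Tuple

Cell = Tuple[str, int, str]  # (table, row id, column); granularity is an assumption


def f_beta(retrieved: Set[Cell], relevant: Set[Cell], beta: float = 2.0) -> float:
    """Standard F-beta score over two sets; beta = 2 favours recall."""
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical gold cells for one question.
relevant = {
    ("singer", 1, "name"),
    ("singer", 1, "country"),
    ("concert", 10, "city"),
    ("concert", 10, "year"),
}

# Retrieves half of the gold cells and nothing else: precision 1.0, recall 0.5.
low_recall = {("singer", 1, "name"), ("singer", 1, "country")}

# Retrieves every gold cell plus one extra: precision 0.8, recall 1.0.
low_precision = relevant | {("concert", 11, "city")}

if __name__ == "__main__":
    print(round(f_beta(low_recall, relevant), 3))     # 0.556
    print(round(f_beta(low_precision, relevant), 3))  # 0.952
```

The two cases show why a recall-weighted metric suits retrieval: missing half of the needed cells is penalized far more than returning one unnecessary cell.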

Key Points
  • Uses a two-stage, hierarchical LLM reasoning process that first identifies relevant schema elements, then retrieves specific cells.
  • Improves the F2 metric over prior state-of-the-art methods by 18% on Spider and 21% on BIRD in the reported experiments.
  • Targets multi-table queries, constructing concise answer sub-tables instead of retrieving irrelevant bulk data.

Why It Matters

More accurate and efficient retrieval from complex databases could directly improve AI-powered business intelligence and research tools.