Research & Papers

Training-free TableGrid Navigation boosts LLM table QA by 3.8 points

A new prompting method helps LLMs find cells and reason step-by-step without fine-tuning.

Deep Dive

Large language models (LLMs) struggle with tabular data because table question answering (TQA) demands precise cell retrieval and multi-step structured reasoning. Existing approaches often rely on fine-tuning on task-specific data and offer little verifiable control over how the model navigates a table. To address this, researchers from IIIT Allahabad and TU Dresden introduce two training-free prompting frameworks: TableGrid Navigation (TGN) and Progressive Inference Prompting (PIP). TGN uses a three-module loop (locate, refine, verify) to iteratively scan rows and columns for evidence. PIP enforces explicit column identification and progressive row-selection constraints, forcing the model to reason step by step.
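To make the locate, refine, verify loop concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `max_rounds` parameter, and the keyword-matching heuristics are all hypothetical stand-ins for what would be LLM prompts in the actual framework.

```python
from typing import Callable, Dict, List, Optional, Tuple

Table = List[Dict[str, str]]   # one dict per row, keyed by column name
Cell = Tuple[int, str]         # (row index, column name)


def locate(table: Table, question: str) -> List[Cell]:
    """Hypothetical 'locate' step: propose cells in columns named in the question."""
    q = question.lower()
    cols = [c for c in table[0] if c.lower() in q] if table else []
    return [(i, c) for i in range(len(table)) for c in cols]


def refine(table: Table, question: str, cells: List[Cell]) -> List[Cell]:
    """Hypothetical 'refine' step: keep rows with a value mentioned in the question."""
    q = question.lower()
    rows = {
        i for i, row in enumerate(table)
        if any(v and str(v).lower() in q for v in row.values())
    }
    return [(i, c) for (i, c) in cells if i in rows]


def verify(cells: List[Cell]) -> bool:
    """Hypothetical 'verify' step: accept only unambiguous evidence (one cell)."""
    return len(cells) == 1


def navigate(
    table: Table,
    question: str,
    max_rounds: int = 3,
    locate_fn: Callable[[Table, str], List[Cell]] = locate,
    refine_fn: Callable[[Table, str, List[Cell]], List[Cell]] = refine,
    verify_fn: Callable[[List[Cell]], bool] = verify,
) -> Optional[str]:
    """Run the loop until the evidence verifies or the round budget runs out.

    The default steps are deterministic, so extra rounds change nothing here;
    with LLM-backed steps each round would be a fresh attempt.
    """
    for _ in range(max_rounds):
        cells = refine_fn(table, question, locate_fn(table, question))
        if verify_fn(cells):
            row, col = cells[0]
            return table[row][col]
    return None  # no verifiable evidence found


capitals = [
    {"country": "France", "capital": "Paris"},
    {"country": "Japan", "capital": "Tokyo"},
]
answer = navigate(capitals, "What is the capital of Japan?")
```

The three steps are passed in as callables so that the heuristic stand-ins can be swapped for model-backed versions without touching the loop itself. Returning `None` when verification never succeeds keeps an unverified guess from being reported as an answer.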

The authors evaluate both frameworks on 17 LLMs (including GPT-4, Llama 3, and Mistral) against 6 baselines (e.g., ReAct, Chain-of-Thought, and fine-tuned TQA models). On the TableBench dataset, TGN achieves a 3.8-point improvement over the strongest baseline. On the FeTaQA dataset, PIP sets a new state of the art, outperforming ReAct and Chain-of-Thought by a significant margin. Beyond inference-time gains, the frameworks can act as supervision templates to fine-tune smaller models, narrowing the performance gap with much larger architectures and offering a cost-efficient option for resource-constrained settings. Accepted at ICDAR 2026, this work provides a versatile, verifiable approach to TQA without the overhead of task-specific training.

Key Points
  • Two training-free prompting frameworks: TGN (TableGrid Navigation) and PIP (Progressive Inference Prompting).
  • TGN beats the strongest baseline by 3.8 points on TableBench; PIP achieves SOTA on FeTaQA over ReAct and Chain-of-Thought.
  • Methods can be used as supervision templates to fine-tune small models, reducing dependence on large LLMs.
  • Evaluated on 17 LLMs across two datasets, demonstrating broad applicability.

Why It Matters

The methods improve table QA without costly fine-tuning, which makes them a good fit for enterprise analytics and resource-constrained AI deployments.