Training-free TableGrid Navigation boosts LLM table QA by 3.8 points
A new prompting method helps LLMs find cells and reason step by step without fine-tuning.
Large language models (LLMs) struggle with tabular data because table question answering (TQA) demands precise cell retrieval and multi-step structured reasoning. Existing approaches often rely on fine-tuning with task-specific data, yet offer little verifiable control over how the model navigates a table. To address this, researchers from IIIT Allahabad and TU Dresden introduce two training-free prompting frameworks: TableGrid Navigation (TGN) and Progressive Inference Prompting (PIP). TGN uses a three-module loop (locate, refine, verify) to iteratively scan rows and columns for evidence. PIP enforces explicit column identification and progressive row-selection constraints, forcing the model to reason step by step.
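To make the locate, refine, verify idea concrete, here is a minimal Python sketch of such a loop. It is an illustration under stated assumptions, not the authors' implementation: the function names (`navigate`, `toy_locate`, `toy_refine`, `toy_verify`), the feedback string, and the `max_rounds` parameter are all hypothetical, and simple keyword matching stands in for the LLM prompts that each module would issue in practice.

```python
from typing import Callable, List, Optional, Tuple

Table = List[List[str]]  # row 0 is the header row


def navigate(
    table: Table,
    question: str,
    locate: Callable[[Table, str, Optional[str]], List[int]],
    refine: Callable[[Table, str, List[int], Optional[str]], List[int]],
    verify: Callable[[str, List[List[str]]], Tuple[bool, Optional[str]]],
    max_rounds: int = 3,  # hypothetical budget, not a value from the paper
) -> Optional[List[List[str]]]:
    """Run a locate -> refine -> verify loop until the evidence is accepted."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        cols = locate(table, question, feedback)        # pick candidate columns
        rows = refine(table, question, cols, feedback)  # narrow to candidate rows
        evidence = [[table[r][c] for c in cols] for r in rows]
        ok, feedback = verify(question, evidence)       # accept, or explain why not
        if ok:
            return evidence
    return None  # no verified evidence within the round budget


# Toy stand-ins for the three modules (assumption: keyword matching, no LLM).
def toy_locate(table: Table, question: str, feedback: Optional[str]) -> List[int]:
    q = question.lower()
    return [i for i, name in enumerate(table[0]) if name.lower() in q]


def toy_refine(table: Table, question: str, cols: List[int],
               feedback: Optional[str]) -> List[int]:
    q = question.lower()
    return [r for r in range(1, len(table))
            if any(cell.lower() in q for cell in table[r])]


def toy_verify(question: str,
               evidence: List[List[str]]) -> Tuple[bool, Optional[str]]:
    # Accept only when the evidence has narrowed to a single cell.
    ok = len(evidence) == 1 and len(evidence[0]) == 1
    return ok, None if ok else "expected exactly one cell"
```

With a two-row table of cities and populations, calling `navigate(table, "What is the population of Berlin?", toy_locate, toy_refine, toy_verify)` selects the Population column and the Berlin row, and the verifier accepts the single remaining cell. Passing the three modules as callables keeps the control flow separate from the prompts, so each step can be inspected independently.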
Evaluated across 17 LLMs (including GPT-4, Llama 3, and Mistral) against 6 baselines (e.g., ReAct, Chain-of-Thought, and fine-tuned TQA models), TGN improves on the strongest baseline by 3.8 points on the TableBench dataset. On the FeTaQA dataset, PIP sets a new state of the art, outperforming ReAct and Chain-of-Thought by a significant margin. Beyond inference-time gains, the frameworks can serve as supervision templates for fine-tuning smaller models, narrowing the performance gap with much larger architectures and offering a cost-efficient option for resource-constrained settings. Accepted at ICDAR 2026, the work provides a versatile, verifiable approach to TQA without the overhead of task-specific training.
- Two training-free prompting frameworks: TGN (TableGrid Navigation) and PIP (Progressive Inference Prompting).
- TGN beats the strongest baseline by 3.8 points on TableBench; PIP achieves SOTA on FeTaQA over ReAct and Chain-of-Thought.
- Both methods can be used as supervision templates to fine-tune small models, reducing dependence on much larger models.
- Evaluated on 17 LLMs across two datasets, demonstrating broad applicability.
Why It Matters
Better table QA without costly fine-tuning—ideal for enterprise analytics and resource-constrained AI deployments.