Research & Papers

18% of Bing queries are geospatial, not the 6% you thought

Researchers found 181,827 geospatial queries in MS MARCO, nearly triple the share originally labelled 'Location'.

Deep Dive

A new study published on arXiv (2605.11336) by Ilya Ilyankou, Stefano Cavazzi, and James Haworth overturns conventional wisdom about geospatial web search. Applying dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries—without pre-filtering for place names—the team identified 181,827 geospatial queries, or 18.0% of all queries. That's nearly three times the 6.17% that were originally annotated as 'Location' in the dataset's labels.
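The headline percentages can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; the corpus size is the rounded "1.01 million", so the results are approximate, and the variable names are ours.

```python
# Figures quoted in the summary above.
TOTAL_QUERIES = 1_010_000        # "1.01 million" is rounded, so results are approximate
GEO_QUERIES = 181_827            # queries the study classified as geospatial
ORIGINAL_LOCATION_SHARE = 0.0617 # share originally annotated as 'Location'

# Share of all queries that are geospatial.
geo_share = GEO_QUERIES / TOTAL_QUERIES

# How many times larger the new share is than the original annotation share.
ratio = geo_share / ORIGINAL_LOCATION_SHARE

print(f"geospatial share: {geo_share:.1%}")
print(f"ratio to original 'Location' share: {ratio:.2f}x")
```

The ratio lands just under 3, which is why "nearly three times" is the accurate description.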

The resulting taxonomy of 88 query categories reveals a striking imbalance: geospatial web search is overwhelmingly transactional and practical. Costs and prices alone account for 15.3% of geospatial queries, almost double the entire physical geography theme. Other top categories include opening hours, contact details, weather, and travel recommendations. Much of this activity falls outside what traditional GIS and knowledge graphs are designed to handle. The categories also differ in the kind of answer they need, ranging from deterministic lookups that a spatial database can serve to evaluative or temporally volatile queries that need generative or real-time systems. The authors discuss implications for hybrid retrieval architectures and benchmarks for geographic reasoning in LLMs, and they openly release the labelled dataset, classifier, and taxonomy.
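The answer-type distinction can be pictured as a small routing table in front of several backends. The sketch below is only an illustration: the category names come from the summary above, but the assignment of each category to an answer type, the `AnswerType` enum, the `route` function, and the generative fallback are all assumptions made for this example, not the paper's taxonomy or method.

```python
from enum import Enum


class AnswerType(Enum):
    """Hypothetical backend families a hybrid retrieval system might use."""
    DETERMINISTIC = "spatial database lookup"
    REAL_TIME = "real-time data source"
    GENERATIVE = "generative system"


# Illustrative mapping only: the paper's own category-to-answer-type
# assignments may differ from these guesses.
CATEGORY_TO_ANSWER_TYPE = {
    "contact details": AnswerType.DETERMINISTIC,
    "opening hours": AnswerType.DETERMINISTIC,
    "weather": AnswerType.REAL_TIME,
    "costs and prices": AnswerType.REAL_TIME,
    "travel recommendations": AnswerType.GENERATIVE,
}


def route(category: str) -> AnswerType:
    """Pick a backend family for a query category.

    Unknown categories fall back to the generative backend,
    which is an assumption of this sketch.
    """
    key = category.strip().lower()
    return CATEGORY_TO_ANSWER_TYPE.get(key, AnswerType.GENERATIVE)


for name in ("Opening hours", "Weather", "Travel recommendations"):
    print(f"{name} -> {route(name).value}")
```

In a real system the category would come from a classifier rather than a hand-typed string, and the routing decision would likely consider more than one signal.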

Key Points
  • 18% of the Bing queries in MS MARCO are geospatial, versus the 6.17% labelled 'Location' in the original annotations.
  • Transactional lookups like costs (15.3% of geospatial queries) dominate, not physical geography.
  • Researchers open-sourced their labelled dataset, SetFit classifier, and 88-category taxonomy.

Why It Matters

The study suggests that search engines and GIS need hybrid systems to handle practical, real-world location queries.