GPSBench: Do Large Language Models Understand GPS Coordinates?
New benchmark finds models fail at basic geometric computations but show surprisingly strong real-world geographic reasoning.
Researchers Thinh Hung Truong, Jey Han Lau, and Jianzhong Qi introduced GPSBench, a dataset of 57,800 samples across 17 tasks evaluating geospatial reasoning in 14 state-of-the-art LLMs. They found that models like GPT-4 and Claude struggle with geometric coordinate operations but perform better on real-world geographic reasoning. The study also shows that geographic knowledge degrades as granularity increases from the country to the city level, and that fine-tuning creates a trade-off: gains on geometric computation come at the cost of degraded world knowledge.
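To make "geometric coordinate operations" concrete: a representative example of this kind of task is computing the great-circle distance between two latitude/longitude points. The sketch below uses the standard haversine formula; it is an illustration of the task category, not code from the GPSBench benchmark itself.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points,
    using the haversine formula and a mean Earth radius of 6371 km."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Paris (48.8566 N, 2.3522 E) to London (51.5074 N, 0.1278 W):
# roughly 340-345 km by great-circle distance.
distance = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
```

A computation like this is trivial for a few lines of code but requires multi-step trigonometric arithmetic, which is exactly where token-by-token LLM reasoning tends to break down.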
Why It Matters
Accurate geospatial understanding is critical for AI navigation, robotics, and mapping applications, where errors in coordinate reasoning translate directly into real-world deployment failures.