MobFusion uses human mobility to boost AI's urban socioeconomic predictions
Foundation models get smarter about cities by tracking how people actually move.
Foundation models such as GPT and CLIP have been applied to urban socioeconomic prediction using static data such as point-of-interest (POI) text and satellite imagery. These inputs, however, miss the dynamic functional connections between places that human mobility reveals. To address this, researchers from multiple institutions propose MobFusion, a mobility-enhanced paradigm for fusing foundation models. MobFusion is instantiated through three complementary designs that use mobility networks as (1) context for zero-shot LLM prompting, (2) graph connectors that fuse geospatial visual embeddings with textual embeddings, and (3) structured tokens for multimodal LLM reasoning.
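To make the first two designs more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes mobility data arrives as (origin, destination, trips) records, and every function name, region label, and description in it is hypothetical. The flow-weighted average in fuse_embeddings is a simple stand-in for a graph connector, chosen only to show how mobility links can decide which regions' embeddings get mixed together.

```python
from collections import defaultdict


def build_mobility_network(flows):
    """Aggregate (origin, destination, trips) records into a weighted directed graph."""
    graph = defaultdict(lambda: defaultdict(float))
    for origin, dest, trips in flows:
        if origin != dest:  # this sketch ignores trips that stay within one region
            graph[origin][dest] += trips
    return {origin: dict(dests) for origin, dests in graph.items()}


def top_connections(graph, region, k=3):
    """Return the k strongest outgoing links as (destination, share of trips) pairs."""
    out = graph.get(region, {})
    total = sum(out.values())
    if total == 0:
        return []
    ranked = sorted(out.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return [(dest, trips / total) for dest, trips in ranked]


def mobility_context_prompt(graph, region, descriptions, target, k=3):
    """Design (1), sketched: serialize mobility links as text context for a zero-shot prompt."""
    lines = [f"Region {region}: {descriptions.get(region, 'no description')}"]
    for dest, share in top_connections(graph, region, k):
        desc = descriptions.get(dest, "no description")
        lines.append(f"- {share:.0%} of outgoing trips go to region {dest}: {desc}")
    lines.append(f"Estimate the {target} of region {region}.")
    return "\n".join(lines)


def fuse_embeddings(graph, embeddings, region, alpha=0.5):
    """Design (2), sketched: blend a region's embedding with the flow-weighted
    mean of its mobility neighbors' embeddings (an assumed, simplified fusion rule)."""
    own = embeddings[region]
    out = {d: w for d, w in graph.get(region, {}).items() if d in embeddings}
    total = sum(out.values())
    if total == 0:
        return list(own)  # no usable neighbors: keep the region's own embedding
    neighbor_mean = [0.0] * len(own)
    for dest, weight in out.items():
        for i, value in enumerate(embeddings[dest]):
            neighbor_mean[i] += (weight / total) * value
    return [alpha * a + (1 - alpha) * b for a, b in zip(own, neighbor_mean)]


if __name__ == "__main__":
    # Toy, made-up flows between three hypothetical regions.
    flows = [("A", "B", 60), ("A", "C", 20), ("B", "A", 5), ("A", "A", 100)]
    graph = build_mobility_network(flows)
    descriptions = {"A": "residential blocks", "B": "office district", "C": "retail strip"}
    print(mobility_context_prompt(graph, "A", descriptions, "median household income"))
    print(fuse_embeddings(graph, {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 0.0]}, "A"))
```

In this sketch the prompt text would be passed to an LLM of your choice, and the fused vectors would feed a downstream regressor. Both of those steps are left out because the summary does not specify them.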
Using anonymized large-scale mobility datasets from three U.S. metropolitan areas, MobFusion consistently improves performance on urban prediction tasks, including median household income, population density, and crime, across all three instantiations. The results demonstrate that incorporating human mobility effectively enhances the socioeconomic understanding of foundation models. By adding a behavioral layer to static geospatial analysis, this work opens new avenues for AI-driven urban planning, resource allocation, and policy-making.
- MobFusion integrates human mobility networks into foundation models via three methods: LLM context, graph connectors, and multimodal tokens.
- Improves predictions of median household income, population density, and crime across three U.S. metro areas.
- Shows that dynamic mobility patterns capture functional connections between places that static POI and satellite data miss.
Why It Matters
Adds dynamic human behavior to AI's urban understanding, enabling smarter city planning and policy decisions.