Research & Papers

Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation

New AI method generates smarter, ranked lists of destinations by fixing 'answer fixation' in models.

Deep Dive

A research team from institutions including UNSW Sydney and TU Delft has introduced Refine-POI, a novel framework designed to significantly improve how large language models (LLMs) recommend the next point of interest (POI), such as a restaurant, park, or store. The work addresses two fundamental flaws in current methods. First, it solves the 'topology-blind' indexing problem, where existing semantic ID systems fail to ensure that numerically similar codes correspond to semantically similar places (e.g., two Italian restaurants might receive very different IDs). Refine-POI instead employs a hierarchical self-organizing map (SOM) quantization strategy to generate semantic IDs, so that proximity in the codebook reliably reflects similarity in the latent feature space.
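To make the topology-preservation idea concrete, here is a minimal, single-level SOM sketch in NumPy. It is not the paper's implementation: the grid size, learning schedule, and toy two-dimensional "POI embeddings" are all illustrative assumptions, and Refine-POI's hierarchical variant would apply this kind of quantization recursively (a coarse SOM, then a finer SOM within each cell) to build multi-part IDs. The point shown is that after training, POIs with similar features land on nearby code indices.

```python
import numpy as np

def train_som(features, n_codes=8, epochs=60, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map: nearby code indices end up with
    similar prototype vectors, so the codebook preserves topology."""
    rng = np.random.default_rng(seed)
    protos = rng.normal(scale=0.1, size=(n_codes, features.shape[1]))
    grid = np.arange(n_codes)
    sigma0 = n_codes / 2.0
    for t in range(epochs):
        frac = 1.0 - t / epochs           # anneal both rate and radius
        lr, sigma = lr0 * frac, sigma0 * frac + 0.1
        for x in rng.permutation(features):
            # Best-matching unit: the prototype closest to this feature.
            bmu = np.argmin(((protos - x) ** 2).sum(axis=1))
            # Pull the winner and its grid neighbors toward the input.
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            protos += lr * h[:, None] * (x - protos)
    return protos

def assign_code(x, protos):
    """Semantic code for one POI = index of its nearest prototype."""
    return int(np.argmin(((protos - x) ** 2).sum(axis=1)))

# Toy POI embeddings: two tight semantic clusters (say, Italian vs. sushi).
rng = np.random.default_rng(1)
italian = rng.normal([1.0, 0.0], 0.05, size=(20, 2))
sushi = rng.normal([0.0, 1.0], 0.05, size=(20, 2))
protos = train_som(np.vstack([italian, sushi]))
it_codes = [assign_code(x, protos) for x in italian]
su_codes = [assign_code(x, protos) for x in sushi]
```

Each cluster occupies a contiguous band of code indices, while the two clusters sit in well-separated regions of the codebook, which is exactly the property a 'topology-blind' indexer fails to provide.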

Second, the framework liberates models from 'answer fixation,' a limitation of standard supervised fine-tuning (SFT) that forces models to output only a single, top-1 prediction. This ignores the practical need for a ranked list of options and stifles reasoning. Instead, Refine-POI uses a policy-gradient reinforcement learning framework to optimize the generation of entire top-k recommendation lists. This approach rewards the model for creating coherent, high-quality sequences of suggestions rather than just matching a single ground-truth label.
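The paper's exact objective is not reproduced here, but the list-level idea can be sketched with a toy REINFORCE setup: a Plackett-Luce policy samples an entire top-k list without replacement, the whole list receives one reward (here, the reciprocal rank of the ground-truth POI, an assumed stand-in for the paper's reward), and the policy gradient updates the model toward lists that rank the truth higher. The candidate count, learning rate, and baseline are illustrative choices.

```python
import numpy as np

def sample_list(logits, k, rng):
    """Sample a ranked top-k list without replacement (Plackett-Luce)
    and return the gradient of its log-probability w.r.t. the logits."""
    work = logits.astype(float).copy()
    grad = np.zeros_like(work)
    ranked = []
    for _ in range(k):
        p = np.exp(work - work[np.isfinite(work)].max())
        p /= p.sum()
        i = int(rng.choice(len(p), p=p))
        ranked.append(i)
        grad -= p
        grad[i] += 1.0        # d log p(i) / d logits = onehot(i) - softmax
        work[i] = -np.inf     # remove the chosen item for the next pick
    return ranked, grad

# Toy setup: 10 candidate POIs, ground-truth next POI is index 3.
rng = np.random.default_rng(0)
n_pois, k, target = 10, 3, 3
logits = np.zeros(n_pois)
baseline = 0.0
for step in range(2000):
    ranked, grad = sample_list(logits, k, rng)
    # One reward for the whole list: reciprocal rank of the true POI.
    r = 1.0 / (ranked.index(target) + 1) if target in ranked else 0.0
    baseline = 0.9 * baseline + 0.1 * r      # variance-reducing baseline
    logits += 0.05 * (r - baseline) * grad   # REINFORCE update
```

Because the reward scores the list rather than a single token, the policy is free to learn an ordering over alternatives instead of collapsing onto one memorized answer, which is the contrast with SFT that 'answer fixation' describes.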

Extensive experiments on three real-world datasets demonstrate that Refine-POI substantially outperforms existing state-of-the-art baselines. The result is a system that better synthesizes the advanced reasoning capabilities of modern LLMs with the precise representational needs of location-based tasks. This leads to recommendations that are not only more accurate but also more explainable, as the model's ranking process aligns more naturally with human-like decision-making for exploration.

Key Points
  • Solves 'topology-blind' indexing with a hierarchical self-organizing map (SOM) for semantic IDs, ensuring code proximity matches semantic similarity.
  • Replaces restrictive supervised fine-tuning with a policy-gradient RL framework to generate optimized top-k ranked lists, overcoming 'answer fixation'.
  • Outperforms state-of-the-art baselines across three real-world datasets, enabling more accurate and explainable next-location recommendations.

Why It Matters

Enables apps like Google Maps or Yelp to provide smarter, ranked lists of destinations with reasoning, moving beyond single-guess predictions.