RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation
Alibaba's new on-device LLM predicts your next search without calling the cloud.
Predicting a user's next search query from recent behavior is critical for e-commerce recommendation, but cloud-based LLMs introduce high latency and serving costs. Alibaba researchers (Bin Zhang et al.) propose RecGPT-Mobile, a framework that runs a lightweight LLM-based intent-understanding agent directly on the mobile device. On-device deployment captures rapidly evolving user interests in real time without round trips to cloud servers, enabling immediate adjustments to feed recommendations. The system is designed for production-scale mobile e-commerce, balancing model size against reasoning capability to fit on-device resource constraints.
Extensive offline analyses and live experiments on Taobao demonstrate that RecGPT-Mobile significantly improves recommendation accuracy over cloud-based alternatives while sharply reducing inference cost and latency. By eliminating round-trip cloud calls, the approach also improves privacy and responsiveness, charting a practical path for deploying LLMs in production recommendation systems on mobile devices and a scalable foundation for real-world next-query prediction, a key challenge for modern e-commerce platforms.
- RecGPT-Mobile runs a lightweight LLM directly on mobile devices to predict next user search queries in real time.
- On-device deployment reduces cloud inference costs and latency while capturing evolving user interests faster.
- Offline and online experiments on Taobao showed significant accuracy improvements in feed recommendations using this framework.
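As a rough illustration of the on-device loop the bullets describe (collect recent behaviors, prompt a small local model, use the predicted query to steer the feed), here is a minimal Python sketch. The `Behavior` schema, the prompt format, and the `toy_generate` stub standing in for a quantized on-device LLM runtime are all hypothetical assumptions for illustration, not details from the paper:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Behavior:
    action: str  # e.g. "click", "search", "add_to_cart" (assumed schema)
    item: str    # item title or query text

def build_intent_prompt(history: List[Behavior], max_items: int = 8) -> str:
    """Format the most recent behaviors into a prompt for the local model."""
    recent = history[-max_items:]
    lines = [f"- {b.action}: {b.item}" for b in recent]
    return ("Recent user actions:\n" + "\n".join(lines) +
            "\nPredict the user's next search query:")

def predict_next_query(history: List[Behavior],
                       generate: Callable[[str], str]) -> str:
    """Run the local generator on the prompt and return a predicted query."""
    return generate(build_intent_prompt(history)).strip()

# Stub standing in for an on-device LLM runtime; a real deployment would
# call a quantized model instead of this heuristic.
def toy_generate(prompt: str) -> str:
    # Echo the most recent behavior line (the line before the instruction).
    last_behavior = prompt.strip().splitlines()[-2]
    return last_behavior.split(": ", 1)[1]

history = [Behavior("click", "running shoes"),
           Behavior("search", "trail running shoes waterproof")]
print(predict_next_query(history, toy_generate))
# Prints: trail running shoes waterproof
```

The point of the sketch is the division of labor: prompt construction and inference both happen on the device, so the predicted query can re-rank the feed immediately, with no network round trip.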
Why It Matters
On-device LLMs enable real-time, privacy-preserving personalization for mobile e-commerce without cloud dependency.