Spectral bandits
New algorithms learn user preferences for thousands of items from only tens of ratings.
In a paper titled "Spectral bandits," published in the Journal of Machine Learning Research (JMLR 2020), researchers Tomáš Kocák, Rémi Munos, Branislav Kveton, Shipra Agrawal, and Michal Valko introduce a new framework for solving online learning problems that involve graphs, such as content-based recommendation. The core idea is to model the expected ratings of items as a smooth function on an undirected graph, where each node represents an item and its expected rating is similar to that of its neighbors. This graph structure captures the inherent relationships between items, allowing the algorithm to generalize from a small number of observed ratings to predict preferences for many unrated items.
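The smoothness assumption can be made concrete with the graph Laplacian: a rating vector is "smooth" when most of its weight lies on Laplacian eigenvectors with small eigenvalues. The following is a minimal sketch of that idea; the random graph, its size, and all variable names are assumptions for illustration, not details from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of items (graph nodes); size is an assumption for this demo

# Random symmetric adjacency matrix as a stand-in item-similarity graph.
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1)
A = A + A.T

# Graph Laplacian L = D - A and its eigendecomposition L = U diag(lam) U^T.
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)  # eigenvalues sorted ascending; lam[0] is 0

# A "smooth" rating vector concentrates its weight on eigenvectors with
# small eigenvalues (low graph frequencies): mu = U @ alpha.
alpha = np.zeros(n)
alpha[:10] = rng.normal(size=10)  # only low-frequency coefficients are nonzero
mu = U @ alpha

# The quadratic form mu^T L mu penalizes disagreement across edges; it is
# small for the smooth vector and much larger for an arbitrary one.
smooth_penalty = float(mu @ L @ mu)
rough = rng.normal(size=n)
rough_penalty = float(rough @ L @ rough)
```

Because neighboring nodes share similar expected ratings under this prior, a handful of observations constrains the whole vector, which is what lets the algorithms generalize from few samples.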
The researchers propose three algorithms whose regret scales linearly or sublinearly with a newly defined "effective dimension," a quantity that is small for typical real-world graphs. This theoretical contribution is backed by experiments on content-recommendation problems, demonstrating that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations. The practical implication is significant: recommendation systems can become far more data-efficient, requiring fewer user interactions to deliver personalized suggestions. The work bridges statistical learning and online decision-making, offering a principled way to exploit graph structure for efficient exploration and exploitation.
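To make the effective dimension tangible, here is a hedged sketch of the kind of quantity involved: with Laplacian eigenvalues lam_1 <= ... <= lam_n, horizon T, and regularizer reg, it is (roughly) the largest d such that (d - 1) * lam_d <= T / log(1 + T / reg). The function name and the demo graph are illustrative assumptions.

```python
import numpy as np

def effective_dimension(eigvals, T, reg):
    """Largest d with (d - 1) * lam_d <= T / log(1 + T / reg),
    where lam_d are Laplacian eigenvalues sorted ascending (1-indexed)."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))
    threshold = T / np.log(1.0 + T / reg)
    d = 1
    for i, lam in enumerate(eigvals, start=1):
        if (i - 1) * lam <= threshold:
            d = i
    return d

# Demo on a path graph with n nodes, whose Laplacian eigenvalues are
# 4 * sin(pi * k / (2 * n)) ** 2 for k = 0, ..., n - 1.
n, T, reg = 500, 100, 0.01
eigvals = 4.0 * np.sin(np.pi * np.arange(n) / (2 * n)) ** 2
d = effective_dimension(eigvals, T, reg)
# d is far smaller than n, which is what makes bounds in d useful.
```

The point of the definition is that many real graphs have rapidly growing Laplacian eigenvalues, so d stays small even when the number of items n is in the thousands.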
- Regret scales linearly or sublinearly with the graph's effective dimension, which is small for real-world networks.
- Experiments show that preferences for thousands of items can be learned from just tens of node evaluations.
- Published in JMLR 2020, bridging graph theory and online learning for recommendation systems.
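The learning loop behind these results can be sketched in a SpectralUCB-style form: estimate the coefficient vector by regularized least squares in the Laplacian eigenbasis, then pick the item with the highest optimistic estimate. This is a simplified illustration under assumed parameters (graph, noise level, confidence scale c), not the paper's exact algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rounds = 100, 50  # items and interactions; sizes are demo assumptions

# Stand-in item-similarity graph and its Laplacian eigenbasis.
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
lam, U = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)

# Hidden smooth preference vector: low-frequency coefficients only.
alpha_true = np.zeros(n)
alpha_true[:5] = rng.normal(size=5)
mu = U @ alpha_true  # true expected rating of each item

# UCB-style loop: arms are rows of U; the regularizer diag(lam + reg)
# encodes the smoothness prior.
reg, noise, c = 0.1, 0.1, 1.0
V = np.diag(lam + reg)  # regularized design matrix
b = np.zeros(n)
for t in range(rounds):
    alpha_hat = np.linalg.solve(V, b)          # regularized least squares
    Vinv = np.linalg.inv(V)
    widths = np.sqrt(np.einsum('ij,jk,ik->i', U, Vinv, U))
    a = int(np.argmax(U @ alpha_hat + c * widths))  # optimistic choice
    r = mu[a] + noise * rng.normal()           # noisy observed rating
    x = U[a]
    V += np.outer(x, x)                        # update statistics
    b += r * x
```

The key design choice mirrored here is that exploration widths shrink fastest along well-sampled low-frequency directions, so only tens of ratings are needed to pin down a smooth preference vector.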
Why It Matters
This dramatically reduces data needed for personalized recommendations, enabling efficient learning with minimal user feedback.