A Topology-Aware Positive Sample Set Construction and Feature Optimization Method in Implicit Collaborative Filtering
New topology-aware method converts false negatives into positives, improving recommendation accuracy by up to 15%.
A research team from Chinese universities has proposed TPSC-FO (Topology-aware Positive Sample Set Construction and Feature Optimization), a method that tackles a fundamental flaw in how recommendation systems are trained on implicit feedback. The core problem is 'false negatives': items a user would actually enjoy but has not yet interacted with. Conventional training treats every such item as a negative example, which distorts the model's picture of user preferences.
The method works in two stages. First, it analyzes the implicit interaction network (for example, user-item clicks) using a differential community detection strategy to identify topological clusters. Items that fall within a user's community but that the user has not interacted with are flagged as potential false negatives, and a personalized noise filter then converts the most reliable of them into positive training samples. Second, a neighborhood-guided feature optimization module refines these new positive samples by blending their features with those of similar items in the embedding space, which reduces the noise introduced by the conversion.
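The two stages can be illustrated with a small Python sketch. It is not the authors' implementation: connected components of the user-item graph stand in for the paper's differential community detection, a mean cosine similarity with a fixed threshold stands in for the personalized noise filter, and a simple average over the k nearest items stands in for the neighborhood-guided feature optimization. All function names, parameters (threshold, k, alpha) and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def communities(interactions):
    """Map each user to the items in its connected component.

    Assumption: connected components are a crude stand-in for the
    community detection step described in the article.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for user, items in interactions.items():
        for item in items:
            parent[find(("u", user))] = find(("i", item))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "i":
            groups[find(node)].add(node[1])
    return {user: groups[find(("u", user))] for user in interactions}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_positives(interactions, emb, threshold=0.8):
    """Stage 1: promote in-community, unseen items that pass a noise filter.

    Assumption: the filter keeps a candidate when its mean cosine
    similarity to the user's own items reaches `threshold`.
    """
    comm = communities(interactions)
    extra = {}
    for user, seen in interactions.items():
        kept = set()
        for item in comm[user] - seen:  # potential false negatives
            score = sum(cosine(emb[item], emb[s]) for s in seen) / len(seen)
            if score >= threshold:
                kept.add(item)
        extra[user] = kept
    return extra


def smooth(item, emb, k=2, alpha=0.5):
    """Stage 2: blend an item's embedding with the mean of its k nearest items."""
    others = sorted(
        (i for i in emb if i != item),
        key=lambda i: cosine(emb[item], emb[i]),
        reverse=True,
    )[:k]
    if not others:
        return list(emb[item])
    dims = range(len(emb[item]))
    mean = [sum(emb[i][d] for i in others) / len(others) for d in dims]
    return [alpha * x + (1 - alpha) * m for x, m in zip(emb[item], mean)]


# Toy data (hypothetical): three users, five items, 2-d embeddings.
interactions = {"u1": {"a", "b"}, "u2": {"b", "c", "e"}, "u3": {"d"}}
emb = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.8, 0.2],
    "d": [0.0, 1.0],
    "e": [0.0, 1.0],
}

new_positives = expand_positives(interactions, emb)
refined = {i: smooth(i, emb) for items in new_positives.values() for i in items}
print(new_positives)
print(refined)
```

In this toy example, user u1 shares a community with items c and e through user u2. Item c resembles what u1 already clicked and is promoted to a positive, while the dissimilar item e is filtered out as likely noise. The promoted item's embedding is then pulled toward its nearest neighbors before it would be used in training.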
The research, validated on five real-world datasets (such as MovieLens and Yelp) and two synthetic datasets, shows TPSC-FO significantly outperforming existing negative sampling techniques. The approach targets two chronic problems of implicit feedback: data sparsity and extreme class imbalance, where positive interactions (clicks, likes) are vastly outnumbered by non-interactions. The practical implication is more accurate and serendipitous recommendations for users on streaming, e-commerce, and social platforms, moving beyond the 'filter bubble' of recommending only obvious items.
- TPSC-FO uses network topology to identify 'false negative' items users would like, converting them to positive training data.
- The method improved recommendation accuracy in tests on 5 real-world datasets, tackling core issues of data sparsity and imbalance.
- It combines community detection with personalized noise filtration and neighborhood feature optimization for more robust user preference modeling.
Why It Matters
By fixing a core training flaw, the method enables more accurate recommendations that also surface less obvious items on streaming, e-commerce, and social media platforms.