PathBoost matches GNNs with interpretable tree-based graph learning
Gradient boosting meets graph theory with automatic feature discovery and no black box.
A team of researchers from the University of Oslo has introduced PathBoost, a gradient tree boosting method for graph-level prediction tasks. Unlike black-box approaches such as graph neural networks (GNNs), PathBoost builds interpretable decision trees on path-based features extracted directly from the input graph structure. The method extends earlier work tailored to chemistry with three key additions: support for binary classification via a logistic loss, handling of multiple node and edge attributes through a prefix-based decomposition, and automatic selection of anchor nodes based on categorical attribute diversity, which removes the need to specify them manually.
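The article does not spell out how PathBoost enumerates its path features, so the following is only a minimal sketch of the general idea under stated assumptions: each node and each edge carries a single categorical label, a feature is the label sequence of a simple path that starts at an anchor node, and its value is how many times that sequence occurs in the graph. The names `path_features`, `anchor_label` and `max_len` are hypothetical, and the anchor label is passed in by hand rather than chosen by the authors' automatic, diversity-based rule.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical graph encoding: adjacency list, one categorical label per node,
# and one categorical label per undirected edge (stored once, either orientation).
Adjacency = Dict[int, List[int]]
NodeLabels = Dict[int, str]
EdgeLabels = Dict[Tuple[int, int], str]


def edge_label(edge_lab: EdgeLabels, u: int, v: int) -> str:
    """Look up an undirected edge label regardless of the stored orientation."""
    return edge_lab[(u, v)] if (u, v) in edge_lab else edge_lab[(v, u)]


def path_features(adj: Adjacency, node_lab: NodeLabels, edge_lab: EdgeLabels,
                  anchor_label: str, max_len: int) -> Counter:
    """Count label sequences of simple paths (up to max_len edges) that start
    at every node whose label equals anchor_label (hypothetical sketch)."""
    counts: Counter = Counter()

    def extend(path: List[int], signature: Tuple[str, ...]) -> None:
        counts[signature] += 1          # record the current path's label sequence
        if len(path) - 1 == max_len:    # path already has max_len edges
            return
        last = path[-1]
        for nxt in adj[last]:
            if nxt in path:             # simple paths only: never revisit a node
                continue
            # Each longer signature is its prefix plus one (edge, node) label pair.
            extend(path + [nxt],
                   signature + (edge_label(edge_lab, last, nxt), node_lab[nxt]))

    for v, lab in node_lab.items():
        if lab == anchor_label:
            extend([v], (lab,))
    return counts


if __name__ == "__main__":
    # Toy heavy-atom graph for ethanol: C-C-O with single bonds.
    adj = {0: [1], 1: [0, 2], 2: [1]}
    nodes = {0: "C", 1: "C", 2: "O"}
    edges = {(0, 1): "single", (1, 2): "single"}
    for sig, n in sorted(path_features(adj, nodes, edges, "O", 2).items()):
        print(sig, n)
```

With the oxygen as anchor and paths of up to two edges, the toy graph yields three features (`O`, `O-single-C`, `O-single-C-single-C`), each counted once. Because every signature is built by extending a shorter one, longer path features can be grown from promising prefixes instead of being enumerated blindly, which is the intuition a prefix-based scheme can exploit; how PathBoost actually does this, including for multiple attributes per node or edge, is not shown here.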
PathBoost was evaluated against several state-of-the-art graph neural networks and graph kernel methods on standard benchmark datasets. It achieved better performance on half of the tested datasets and comparable performance on the rest. Notably, its advantage grows with the average number of nodes per graph, suggesting that the path-based feature space captures structural patterns that GNNs might miss in sparse or high-dimensional settings. The method also yields interpretable feature importance scores, showing researchers which graph substructures drive its predictions.
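To make "gradient boosting with a logistic loss" and "feature importance" concrete, here is a generic sketch, not PathBoost's own tree learner: each round fits a depth-1 regression tree (a stump) to the negative gradient of the logistic loss, `y - sigmoid(f)`, and a feature's importance is the total squared-error reduction of the splits that use it. Columns of `X` stand for counts of path patterns in each graph. The helpers `fit_stump`, `boost` and `predict_proba`, and the parameters `n_rounds` and `lr`, are hypothetical; leaf values use the mean residual rather than a Newton step, which is a deliberate simplification.

```python
import math
from typing import List, Optional, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left_value, right_value)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[List[float]], r: List[float]) -> Optional[Tuple[int, float, float, float, float]]:
    """Least-squares depth-1 regression tree on residuals r.
    Returns (feature, threshold, left_mean, right_mean, sse_reduction)."""
    mean_all = sum(r) / len(r)
    base_sse = sum((ri - mean_all) ** 2 for ri in r)
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            gain = base_sse - sse
            if best is None or gain > best[4]:
                best = (j, t, lv, rv, gain)
    return best


def boost(X: List[List[float]], y: List[int], n_rounds: int = 20, lr: float = 0.5):
    """Gradient boosting under logistic loss; returns (stumps, importance)."""
    f = [0.0] * len(X)                   # raw scores (log-odds), start at 0
    stumps: List[Stump] = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Negative gradient of logistic loss w.r.t. the raw score: y - sigmoid(f).
        r = [yi - sigmoid(fi) for yi, fi in zip(y, f)]
        found = fit_stump(X, r)
        if found is None:                # no feature admits a valid split
            break
        j, t, lv, rv, gain = found
        # Simplification: leaf value = lr * mean residual (no Newton step).
        stumps.append((j, t, lr * lv, lr * rv))
        importance[j] += gain            # importance = total SSE reduction per feature
        f = [fi + (lr * lv if x[j] <= t else lr * rv) for fi, x in zip(f, X)]
    return stumps, importance


def predict_proba(stumps: List[Stump], x: List[float]) -> float:
    """Sum the stump outputs into a log-odds score and map it to a probability."""
    score = sum(lv if x[j] <= t else rv for j, t, lv, rv in stumps)
    return sigmoid(score)
```

For example, if column 0 counts an `O-single-C` path and column 1 is noise, `boost([[0, 1], [0, 0], [1, 1], [2, 0]], [0, 0, 1, 1])` puts all importance on column 0. Because each split names a single path pattern and a count threshold, the resulting model can be read off directly, which is the kind of transparency the article attributes to PathBoost.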
The implications are significant for domains such as drug discovery, materials science, and social network analysis, where interpretability is as critical as accuracy. PathBoost shows that classical machine learning techniques such as gradient boosting can still compete with deep learning when paired with well-designed graph features. The authors release their full code and data to encourage replication and further extensions. The work also opens the door to hybrid approaches that combine the interpretability of boosting with the representational power of graph-based features.
- PathBoost outperforms GNNs on half of the benchmark datasets and matches them on the rest, while remaining fully interpretable.
- Automatically selects anchor nodes based on categorical attribute diversity, eliminating manual specification.
- Handles multiple node and edge attributes via prefix-based path decomposition, scaling to large graphs.
Why It Matters
Interpretable graph analysis without sacrificing accuracy: a transparent alternative to black-box GNNs for high-stakes domains like drug discovery.