Property-Driven Evaluation of GNN Expressiveness at Scale: Datasets, Framework, and Study
A new study uses 336 datasets, each containing at least 10,000 labeled graphs, to expose fundamental trade-offs in Graph Neural Networks.
A team of researchers has published a study introducing a rigorous, property-driven methodology for evaluating the expressiveness of Graph Neural Networks (GNNs). The work, titled "Property-Driven Evaluation of GNN Expressiveness at Scale: Datasets, Framework, and Study," addresses a core challenge in trustworthy AI: understanding which fundamental graph properties GNNs can and cannot learn. The researchers use the formal specification language Alloy to build a configurable graph generator, which produces two large new benchmark families, GraphRandom and GraphPerturb, designed for systematic testing.
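To make the idea of a property-labeled benchmark concrete, here is a minimal Python sketch. It is not the study's generator, which is built on Alloy; it swaps in plain random sampling instead. Connectivity is used purely as an illustrative property (this summary does not list the 16 properties), and the names `random_graph`, `is_connected`, and `make_dataset` are hypothetical.

```python
import random
from collections import deque


def random_graph(n, p, rng):
    """Sample an undirected graph on n nodes; each edge appears with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_connected(adj):
    """Illustrative property check: is every node reachable from the first one?"""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def make_dataset(num_graphs, n, p, seed=0):
    """Build a list of (graph, label) pairs, where label is the property's truth value."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_graphs):
        graph = random_graph(n, p, rng)
        dataset.append((graph, is_connected(graph)))
    return dataset


# Small demonstration; the study's datasets hold at least 10,000 graphs each.
dataset = make_dataset(num_graphs=100, n=8, p=0.3, seed=42)
positives = sum(label for _, label in dataset)
print(f"{positives} of {len(dataset)} sampled graphs satisfy the property")
```

A GNN trained on such pairs is asked to predict the label from structure alone, which is what lets a benchmark of this kind probe whether a given property is learnable.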
The framework generates 336 new datasets, each containing at least 10,000 labeled graphs, covering 16 properties critical to domains such as distributed systems and bioinformatics. It introduces novel metrics to assess GNN expressiveness along three axes: generalizability, sensitivity, and robustness. Applying this framework, the team conducted the first comprehensive study of global pooling methods and found distinct trade-offs: attention-based pooling excels in generalizability and robustness, second-order pooling offers superior sensitivity, and no single method dominates. These findings expose fundamental architectural limitations and point to research directions that include adaptive, property-aware pooling mechanisms and robustness-oriented training. By bringing software engineering rigor to AI evaluation, the work provides a principled foundation for building more reliable and expressive GNNs.
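For readers unfamiliar with global pooling, the sketch below shows simplified, textbook-style versions of three readouts that collapse a matrix of node embeddings into a single graph-level vector. These are generic formulations, not necessarily the exact variants evaluated in the study; the function names and the fixed attention vector `w` (which would be learned in practice) are assumptions made for illustration.

```python
import math


def mean_pool(H):
    """Average the node embeddings column by column."""
    n = len(H)
    return [sum(col) / n for col in zip(*H)]


def attention_pool(H, w):
    """Weighted sum of node embeddings; weights are a softmax over the scores h . w."""
    scores = [sum(h_k * w_k for h_k, w_k in zip(h, w)) for h in H]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [sum(a * h[k] for a, h in zip(alphas, H)) for k in range(len(w))]


def second_order_pool(H):
    """Flattened H^T H: sums of pairwise products of embedding dimensions."""
    d = len(H[0])
    return [sum(h[i] * h[j] for h in H) for i in range(d) for j in range(d)]


# Two toy graphs (rows are node embeddings) whose mean readouts coincide.
H1 = [[1.0, 0.0], [0.0, 1.0]]
H2 = [[0.5, 0.5], [0.5, 0.5]]
w = [0.5, -0.5]  # hypothetical attention parameters

print(mean_pool(H1), mean_pool(H2))                  # identical mean readouts
print(second_order_pool(H1), second_order_pool(H2))  # differing second-order readouts
print(attention_pool(H1, w), attention_pool(H2, w))
```

In this toy case the mean readout cannot tell the two graphs apart while the second-order readout can. That is one intuition for why readouts can differ in how finely they separate inputs; it is offered here as an illustration, not as an explanation drawn from the study. All three readouts are invariant to the order of the nodes, which is the basic requirement for a global pooling function.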
- Created two new benchmark families (GraphRandom and GraphPerturb) comprising 336 datasets of at least 10,000 labeled graphs each, covering 16 fundamental properties.
- Introduced a novel evaluation framework with quantitative metrics to assess GNN expressiveness across generalizability, sensitivity, and robustness.
- Found critical trade-offs in global pooling methods, with no single approach performing well across all properties, highlighting a core architectural challenge.
Why It Matters
The work provides a rigorous, standardized way to benchmark GNNs, exposing limitations that affect real-world applications in drug discovery, network analysis, and recommendation systems.