Research & Papers

A Hypergraph-Based Framework for Exploratory Business Intelligence

ExBI reports speedups of up to 230.53x over MySQL on LDBC datasets, with an average error of 0.27% on COUNT queries.

Deep Dive

A research team including Yunkai Lou, Shunyang Li, and four others has introduced ExBI, a framework that rebuilds exploratory business intelligence (BI) on hypergraphs. Traditional BI systems are a poor fit for the iterative, multi-round exploration typical of modern analytics: they rely on static schemas, incur high computational costs, and depend heavily on expert knowledge. ExBI addresses these limitations with a hypergraph data model and three core operators (Source, Join, and View) that enable dynamic schema evolution and reuse of materialized views. As a result, the system can adapt to evolving analytical questions without manual schema redesign.

The team's key contribution is a set of sampling-based algorithms that come with provable estimation guarantees, reducing computational cost while preserving analytical accuracy. In experiments on the LDBC datasets, ExBI was on average 16.21x faster than the graph database Neo4j (peaking at 146.25x) and 46.67x faster than the relational database MySQL (peaking at 230.53x). These gains came with little loss of accuracy: the average error for COUNT queries was 0.27%. This combination of speed and precision makes ExBI a candidate for data teams that need rapid, large-scale exploratory analysis on complex, interconnected datasets.
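
To illustrate the general idea of answering a COUNT query from a sample with a provable error bound, here is a minimal sketch. It is not ExBI's algorithm, which this summary does not describe; it assumes plain uniform sampling with replacement and uses Hoeffding's inequality for the guarantee. The function name `estimate_count` and its parameters are hypothetical.

```python
import math
import random


def estimate_count(rows, predicate, sample_size, delta=0.05, seed=0):
    """Estimate how many rows satisfy `predicate` from a uniform sample.

    Returns (estimate, margin). With probability at least 1 - delta, the true
    count lies within estimate +/- margin (Hoeffding's inequality).
    Illustrative only: this is generic sampling, not the method from the paper.
    """
    if not rows or sample_size <= 0:
        raise ValueError("rows must be non-empty and sample_size positive")

    rng = random.Random(seed)
    n = len(rows)

    # Draw sample_size rows uniformly at random, with replacement,
    # and count how many of them match the predicate.
    hits = sum(1 for _ in range(sample_size) if predicate(rows[rng.randrange(n)]))
    fraction = hits / sample_size

    # Hoeffding: P(|fraction - true_fraction| >= eps) <= 2 * exp(-2 * m * eps^2).
    # Solving 2 * exp(-2 * m * eps^2) = delta for eps gives the margin below.
    eps = math.sqrt(math.log(2 / delta) / (2 * sample_size))

    # Scale the sampled fraction and its margin back up to the full table.
    return fraction * n, eps * n


if __name__ == "__main__":
    table = list(range(100_000))
    estimate, margin = estimate_count(table, lambda r: r % 4 == 0, sample_size=20_000)
    print(f"estimated COUNT = {estimate:.0f} +/- {margin:.0f}")
```

The margin shrinks with the square root of the sample size, so quadrupling the sample halves the worst-case error. That trade-off is what lets a sampling-based system exchange a small, bounded error for a large reduction in work.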

Key Points
  • ExBI introduces a hypergraph data model with Source, Join, and View operators for dynamic schema evolution and view reuse.
  • Benchmarks on LDBC datasets show a 16.21x average speedup vs. Neo4j (peak 146.25x) and 46.67x vs. MySQL (peak 230.53x).
  • Maintains high accuracy with only 0.27% average error for COUNT queries using sampling algorithms with provable guarantees.

Why It Matters

ExBI could let data teams run rapid, large-scale exploratory analysis on complex datasets without being bottlenecked by computational cost or reliance on expert knowledge.