Research & Papers

a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors

New algorithm tackles memory explosion, enabling graph-based ML on massive datasets where no natural graph exists.

Deep Dive

Researcher Lionel Yelibi has published a new paper introducing the Approximate Triangulated Maximally Filtered Graph (a-TMFG) algorithm, a scalable solution to a major bottleneck in graph-based machine learning. The traditional TMFG method requires calculating and storing a dense correlation matrix over all data points, which becomes computationally prohibitive and memory-intensive for large datasets, limiting its use to small and medium-scale problems. Yelibi's a-TMFG addresses this directly: it uses a k-nearest-neighbors graph (kNNG) for the initial construction and a memory management strategy that searches for and estimates missing correlations only when they are actually needed, sidestepping the combinatorial blow-up in memory and runtime.

The core innovation is the shift from a 'compute-everything-first' approach to an 'estimate-on-demand' paradigm. This allows the algorithm to construct parsimonious, informative graphs from raw data where no inherent network structure is present. The paper demonstrates the method's robustness to parameter choices and noise, and validates it on datasets containing millions of observations. This scalability unlocks TMFG's utility for modern, large-scale ML applications.
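To make the 'estimate-on-demand' idea concrete, here is a minimal, hypothetical sketch (not taken from the paper) of lazy correlation lookup: instead of materializing the dense correlation matrix up front, each pairwise correlation is computed the first time it is requested and cached, so memory grows only with the pairs a TMFG-style construction actually examines.

```python
import numpy as np

def make_lazy_corr(X):
    """Return a function that computes pairwise Pearson correlations
    on demand, caching results instead of storing the dense N x N matrix.
    X: (n_variables, n_observations) data matrix."""
    # Standardize each row once so a correlation is a single dot product.
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    n_obs = X.shape[1]
    cache = {}

    def corr(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:                 # estimate only when needed
            cache[key] = float(Z[i] @ Z[j]) / n_obs
        return cache[key]

    return corr, cache

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))              # 1000 variables, 50 observations
corr, cache = make_lazy_corr(X)
corr(3, 7)
corr(7, 3)                                   # second call hits the cache
# Only requested pairs are stored (1 entry here), versus the
# 1000 * 999 / 2 entries a full correlation matrix would hold.
```

The names and structure here are illustrative; the paper's actual implementation details (indexing, approximate search, eviction policy) are not specified in this summary.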

By providing a scalable way to generate graph inputs from tabular or feature-based data, a-TMFG opens new avenues for graph neural networks (GNNs) and other graph-based learning techniques. It enables researchers and engineers to apply the powerful relational reasoning of graph models to domains like finance, biology, and social network analysis on previously impractical scales, all without needing a pre-defined graph.
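As a hedged illustration of "graph inputs from tabular data," the sketch below builds a kNN edge list from plain feature rows using brute-force distances. This is not the paper's code: a-TMFG relies on approximate nearest-neighbor search at scale, whereas this toy version shows only the shape of the output a graph model would consume.

```python
import numpy as np

def knn_edges(X, k):
    """Build a k-nearest-neighbor edge list from tabular feature data.
    X: (n_samples, n_features); returns an (n_samples * k, 2) array of
    (source, target) pairs, the kind of sparse candidate graph a
    TMFG-style filter or a GNN can take when no natural graph exists."""
    # Pairwise squared Euclidean distances. Fine for small n; at the
    # scales the paper targets, an ANN index would replace this step.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)             # exclude self-neighbors
    nbrs = np.argsort(d2, axis=1)[:, :k]     # k closest nodes per row
    src = np.repeat(np.arange(X.shape[0]), k)
    return np.stack([src, nbrs.ravel()], axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))               # 200 samples, 16 features
edges = knn_edges(X, k=5)                    # shape (1000, 2)
```

In practice the resulting edge list maps directly onto the `edge_index` format common in GNN libraries, which is what makes this pipeline useful for downstream relational models.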

Key Points
  • Replaces the dense N×N correlation matrix with a kNNG and on-the-fly estimation, solving memory/runtime bottlenecks.
  • Validated on datasets with millions of observations, enabling graph-based ML at scale.
  • Creates graph structures for supervised/unsupervised learning in cases where no natural graph exists (e.g., from feature data).

Why It Matters

Enables graph neural networks and relational AI to be applied to massive, non-graph datasets across finance, bioinformatics, and more.