scShapeBench benchmark automates shape discovery in single-cell RNA data
The accompanying method, scReebTower, outperforms PAGA and Mapper at automated geometry detection.
High-dimensional point cloud data from single-cell biology often exhibits distinct geometric shapes (clusters, trajectories, branches, or archetypes), each requiring a specific analysis pipeline. Existing tools build in one assumption: Seurat assumes clusters and Monocle assumes trees, so researchers must inspect the data manually and choose a tool to match. With the rise of agentic AI scientists, automating this shape detection is crucial. To address this, researchers from multiple institutions introduced scShapeBench, a benchmark comprising synthetic datasets generated from ground-truth skeleton graphs with controlled variance and real single-cell datasets that experts annotated with one of four shape categories: clusters, single trajectory, multi-branching, and archetypal.
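To make the idea of synthetic data "generated from ground-truth skeleton graphs with controlled variance" concrete, here is a minimal sketch in Python. It is not the scShapeBench generator: the function name `sample_from_skeleton`, the Y-shaped skeleton, and the isotropic Gaussian noise model are all assumptions chosen for illustration.

```python
import numpy as np

def sample_from_skeleton(nodes, edges, n_points, noise_std, seed=0):
    """Sample noisy points along the edges of a skeleton graph.

    Hypothetical helper, not part of scShapeBench.
    nodes: (k, d) node coordinates; edges: list of (i, j) node index pairs.
    """
    rng = np.random.default_rng(seed)
    nodes = np.asarray(nodes, dtype=float)
    edges = np.asarray(edges, dtype=int)
    starts, ends = nodes[edges[:, 0]], nodes[edges[:, 1]]
    lengths = np.linalg.norm(ends - starts, axis=1)
    # Choose edges in proportion to their length so that point density
    # is uniform along the skeleton.
    edge_ids = rng.choice(len(edges), size=n_points, p=lengths / lengths.sum())
    # Uniform position along each chosen edge.
    t = rng.uniform(0.0, 1.0, size=(n_points, 1))
    clean = starts[edge_ids] + t * (ends[edge_ids] - starts[edge_ids])
    # Assumed noise model: isotropic Gaussian with standard deviation noise_std.
    noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
    return noisy, edge_ids

# A Y-shaped skeleton: one stem that splits into two branches.
skeleton_nodes = [[0, 0], [1, 0], [2, 1], [2, -1]]
skeleton_edges = [(0, 1), (1, 2), (1, 3)]
points, edge_ids = sample_from_skeleton(
    skeleton_nodes, skeleton_edges, n_points=500, noise_std=0.05
)
```

Because the skeleton is known, a detected shape can be compared against it directly, and raising `noise_std` gives a simple way to make the detection problem harder.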
Alongside the benchmark, they propose scReebTower, a baseline method that leverages diffusion geometry to construct Reeb graphs, topological skeletons that capture multi-scale structure. scReebTower connects visualization directly with pipeline selection, and it outperforms the existing baselines PAGA and Mapper on both synthetic and real data under topology-aware evaluation metrics. The work provides a standardized framework for automated shape detection, so downstream pipelines can be matched to data geometry without manual inspection. This is a key step toward fully autonomous single-cell analysis in AI-driven scientific discovery.
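For readers unfamiliar with Reeb graphs, the sketch below shows a generic discretized approximation in Python. It is not scReebTower. As a stand-in for a diffusion-based function it uses shortest-path distance from a root point on a k-nearest-neighbor graph, then cuts that function into non-overlapping level bands and contracts each connected component of a band to one node. The function names, the parameters `k`, `n_levels`, and `root`, and the noiseless Y-shaped test data are all assumptions.

```python
import heapq
import numpy as np

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as {i: {j: distance}}."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    adj = {i: {} for i in range(len(points))}
    for i in range(len(points)):
        for j in np.argsort(dists[i])[1:k + 1]:  # position 0 is the point itself
            adj[i][int(j)] = adj[int(j)][i] = float(dists[i, j])
    return adj

def graph_distance(adj, root):
    """Dijkstra distance from root (assumed stand-in for a diffusion-based function)."""
    dist, heap = {root: 0.0}, [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, np.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def reeb_skeleton(points, k=8, n_levels=6, root=0):
    """Return (nodes, edges); nodes maps node id -> list of member point indices."""
    adj = knn_graph(points, k)
    dist = graph_distance(adj, root)
    f = np.array([dist.get(i, np.inf) for i in range(len(points))])
    finite = np.isfinite(f)
    scale = max(f[finite].max(), 1e-12)
    # Assign each reachable point to one of n_levels bands of the function.
    level = np.full(len(points), -1)
    level[finite] = np.minimum((f[finite] / scale * n_levels).astype(int), n_levels - 1)
    # Contract each connected component within a band to a single node.
    comp = np.full(len(points), -1)
    nodes = {}
    for start in range(len(points)):
        if comp[start] != -1 or level[start] == -1:
            continue
        cid, stack, members = len(nodes), [start], []
        comp[start] = cid
        while stack:
            u = stack.pop()
            members.append(u)
            for v in adj[u]:
                if comp[v] == -1 and level[v] == level[u]:
                    comp[v] = cid
                    stack.append(v)
        nodes[cid] = members
    # Link two nodes whenever a neighbor-graph edge crosses between them.
    edges = set()
    for u in adj:
        for v in adj[u]:
            a, b = int(comp[u]), int(comp[v])
            if a != -1 and b != -1 and a != b:
                edges.add((min(a, b), max(a, b)))
    return nodes, sorted(edges)

# Noiseless Y shape: a stem from (0, 0) to (1, 0), then two diverging branches.
t = np.linspace(0.0, 1.0, 41)[1:, None]
stem = np.column_stack([np.linspace(0.0, 1.0, 40), np.zeros(40)])
branch_up = np.array([1.0, 0.0]) + t * np.array([1.0, 1.0])
branch_down = np.array([1.0, 0.0]) + t * np.array([1.0, -1.0])
points = np.vstack([stem, branch_up, branch_down])
nodes, edges = reeb_skeleton(points, k=4, n_levels=6, root=0)
```

With non-overlapping bands this construction is close in spirit to Mapper, one of the baselines the article says scReebTower outperforms, so it should be read only as an illustration of what a Reeb-style skeleton is. On the Y-shaped input, the branch point appears as a skeleton node with three or more neighbors.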
- scShapeBench includes synthetic datasets generated from ground-truth skeleton graphs and expert-annotated real data across four shape types.
- scReebTower uses diffusion geometry to extract Reeb graphs, outperforming PAGA and Mapper in shape detection.
- The benchmark provides topology-aware evaluation metrics for standardized comparison of shape detection methods.
Why It Matters
Automating geometry detection in single-cell data lets AI agents select the right analysis pipeline without manual inspection.