Research & Papers

scShapeBench benchmark automates shape discovery in single-cell RNA data

The benchmark's companion method, scReebTower, outperforms PAGA and Mapper at automated geometry detection.

Deep Dive

High-dimensional point cloud data from single-cell biology often exhibits distinct geometric shapes—clusters, trajectories, branches, or archetypes—each requiring a specific analysis pipeline. Existing tools like Seurat assume clusters, while Monocle assumes trees, forcing researchers to manually inspect and choose. With the rise of agentic AI scientists, automating this shape detection is crucial. To address this, researchers from multiple institutions introduced scShapeBench, a benchmark comprising synthetic datasets generated from ground-truth skeleton graphs with controlled variance, and real single-cell datasets annotated by experts into four shape categories: clusters, single trajectory, multi-branching, and archetypal.
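To make the synthetic half of the benchmark concrete, here is a minimal sketch of how points might be drawn from a ground-truth skeleton graph with a controlled noise level. It is an illustration under stated assumptions, not the scShapeBench generator: the function name `sample_from_skeleton`, its parameters, the Gaussian noise model, and the Y-shaped example skeleton are all hypothetical.

```python
import math
import random

def sample_from_skeleton(nodes, edges, n_points, noise_std, seed=0):
    """Sample noisy points along the edges of a skeleton graph.

    nodes: list of coordinate tuples (hypothetical layout).
    edges: list of (i, j) index pairs into nodes.
    noise_std: standard deviation of isotropic Gaussian noise (assumed noise model).
    Returns (points, edge_ids); edge_ids[k] is the ground-truth edge of points[k].
    """
    rng = random.Random(seed)
    lengths = [math.dist(nodes[i], nodes[j]) for i, j in edges]
    # Choose edges in proportion to length so density is uniform along the skeleton.
    edge_ids = rng.choices(range(len(edges)), weights=lengths, k=n_points)
    points = []
    for e in edge_ids:
        a, b = nodes[edges[e][0]], nodes[edges[e][1]]
        t = rng.random()  # position along the chosen edge
        points.append(tuple(
            ai + t * (bi - ai) + rng.gauss(0.0, noise_std)
            for ai, bi in zip(a, b)
        ))
    return points, edge_ids

# Illustrative Y-shaped skeleton: one stem splitting into two branches.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
edges = [(0, 1), (1, 2), (1, 3)]
points, edge_ids = sample_from_skeleton(nodes, edges, n_points=500, noise_std=0.05)
```

Raising `noise_std` blurs the branches into one another, which is the kind of knob a benchmark can turn to test how robust a shape detector is. Swapping the skeleton (isolated nodes for clusters, a single path for one trajectory) would give the other shape categories.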

Alongside the benchmark, they propose scReebTower, a baseline method that uses diffusion geometry to construct Reeb graphs, topological skeletons that capture multi-scale structure. scReebTower links visualization directly to pipeline selection and, under topology-aware evaluation metrics, outperforms the existing baselines PAGA and Mapper on both synthetic and real data. The work provides a standardized framework for automated shape detection, so downstream pipelines can be matched to data geometry without manual inspection. This is a key step toward fully autonomous single-cell analysis in AI-driven scientific discovery.
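For readers unfamiliar with Reeb graphs, the sketch below shows the general idea in discretized form: slice a scalar function on the data into bands, treat each connected component within a band as a node, and link components that touch across bands. This is not scReebTower. The scalar function here is a plain breadth-first hop count on a k-nearest-neighbor graph, swapped in for the diffusion-based geometry the summary describes, and all function names (`knn_graph`, `hop_distance`, `reeb_graph`) are hypothetical.

```python
import math
from collections import deque

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor adjacency sets (brute force, small inputs only)."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def hop_distance(adj, root):
    """Breadth-first hop counts from root (a stand-in for a diffusion-based function)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [dist.get(i, math.inf) for i in range(len(adj))]

def reeb_graph(adj, f, n_bands):
    """Discretized Reeb graph of f on a neighbor graph.

    Nodes are connected components of each band of f; links join components
    that share a graph edge. Points with infinite f (unreachable) are skipped.
    Returns (number_of_nodes, sorted list of links).
    """
    finite = [x for x in f if x != math.inf]
    lo, hi = min(finite), max(finite)
    width = (hi - lo) / n_bands or 1.0  # avoid zero width when f is constant
    band = [None if x == math.inf else min(int((x - lo) / width), n_bands - 1)
            for x in f]
    comp = [None] * len(adj)
    n_nodes = 0
    for s in range(len(adj)):
        if comp[s] is not None or band[s] is None:
            continue
        comp[s] = n_nodes
        stack = [s]
        while stack:  # flood fill within the band
            u = stack.pop()
            for v in adj[u]:
                if comp[v] is None and band[v] == band[s]:
                    comp[v] = n_nodes
                    stack.append(v)
        n_nodes += 1
    links = set()
    for u in range(len(adj)):
        for v in adj[u]:
            if comp[u] is not None and comp[v] is not None and comp[u] != comp[v]:
                links.add((min(comp[u], comp[v]), max(comp[u], comp[v])))
    return n_nodes, sorted(links)

# Noise-free Y shape: a stem of 11 points that splits into two 10-point branches.
stem = [(float(x), 0.0) for x in range(11)]
up = [(10.0 + i, float(i)) for i in range(1, 11)]
down = [(10.0 + i, float(-i)) for i in range(1, 11)]
adj = knn_graph(stem + up + down, k=2)
n_nodes, links = reeb_graph(adj, hop_distance(adj, root=0), n_bands=3)
```

In the resulting graph, a node that touches three links marks a branch point, while a simple chain of nodes would suggest a single trajectory. Reading shape off the skeleton in this way is the kind of step that could feed pipeline selection.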

Key Points
  • scShapeBench includes synthetic datasets from ground-truth skeletons and expert-annotated real data across 4 shape types.
  • scReebTower uses diffusion geometry to extract Reeb graphs, outperforming PAGA and Mapper in shape detection.
  • The benchmark provides topology-aware evaluation metrics for standardized comparison of shape detection methods.

Why It Matters

Automates geometry detection in single-cell data, enabling AI agents to select the right analysis pipeline without manual inspection.