Research & Papers

Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

GPU-accelerated performance analysis ingests 100,000 traces in under 10 seconds.

Deep Dive

As exascale systems push concurrency to unprecedented levels, traditional performance analysis tools buckle under the telemetry overhead. Rice University researcher Dragana Grbic presents a heterogeneous framework that accelerates the hpcanalysis infrastructure with a high-performance C++ API and GPU parallelism. The results are striking: on Aurora, the framework ingests 100,000 MPI rank traces in just 9.69 seconds, and its GPU-accelerated layer runs up to 314x faster than CPU-based processing when analyzing 100,000 execution traces.

Beyond raw speed, the framework introduces a topology-aware workflow that maps logical performance outliers to physical Slingshot interconnect coordinates, pinpointing network congestion across 22 distinct racks. The paper also introduces a novel tri-dimensional performance model that 're-materializes' iterative behavior from execution traces. Applied to a GAMESS workload on Frontier, the model identified a 32.28% potential speedup. This work provides a practical path to real-time, at-scale diagnostics for the next generation of supercomputers.

Key Points
  • C++ API ingests 100,000 MPI ranks in 9.69 seconds on Aurora
  • GPU acceleration achieves up to 314x speedup over CPU processing
  • Tri-dimensional model identifies 32.28% potential speedup for GAMESS on Frontier

Why It Matters

Brings real-time performance diagnosis to exascale systems, enabling massive efficiency gains for scientific computing.