Research & Papers

FlashSpread: IO-Aware GPU Simulation of Non-Markovian Epidemic Dynamics via Kernel Fusion

Fused Triton kernel hits 8.09 Giga-NUPS on million-node graphs, slashing simulation time...

Deep Dive

FlashSpread tackles the computational bottleneck of non-Markovian (renewal) epidemic simulation, where age-dependent hazard functions force dense per-step updates that break traditional sparse event-queue CPU methods. The team (Heman Shakeri, Behnaz Moradi-Jamei, Aram Vajdi, and Ehsan Ardjmand) designed a single fused Triton kernel that consolidates CSR traversal, numerically stable erfcx-based hazard evaluation, Bernoulli tau-leaping, state transitions, and infectivity write-back, keeping all intermediates in streaming-multiprocessor registers. Because no intermediate result is written to global memory between stages, fusion removes the memory traffic that separate kernels would incur, and the single launch can be captured as a CUDA Graph for deterministic replay.
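To make the five fused stages concrete, here is a minimal CPU reference sketch of one time step in plain Python. It is not the paper's Triton kernel. The SIR state codes, the Normal(mu, sigma) recovery-time distribution, the constant infectivity of 1.0, and every function and parameter name are assumptions chosen for illustration, since the summary does not say which distributions or state model the authors use. The stdlib-only erfcx is a stand-in for a library routine such as scipy.special.erfcx.

```python
import math
import random

S, I, R = 0, 1, 2  # hypothetical SIR state codes (assumption, not from the paper)


def erfcx(x):
    """Scaled complementary error function exp(x*x) * erfc(x), stdlib only."""
    if x < -26.0:
        return math.inf  # exp(x*x) would overflow; the hazard below becomes 0
    if x < 25.0:
        return math.exp(x * x) * math.erfc(x)
    # Large x: erfc underflows, so use the leading asymptotic terms instead.
    inv = 1.0 / (2.0 * x * x)
    return (1.0 - inv + 3.0 * inv * inv) / (x * math.sqrt(math.pi))


def normal_hazard(age, mu, sigma):
    """Hazard pdf/survival of an assumed Normal(mu, sigma) recovery time.

    Algebraically equal to pdf(z) / (sigma * sf(z)), but written with erfcx
    so the exp(-z*z/2) factors cancel and nothing underflows in the tail.
    """
    z = (age - mu) / sigma
    return math.sqrt(2.0 / math.pi) / (sigma * erfcx(z / math.sqrt(2.0)))


def step(indptr, indices, state, age, infectivity, tau, beta, mu, sigma, rng):
    """One synchronous tau-leap; returns (new_state, new_age, new_infectivity)."""
    new_state, new_age = list(state), list(age)
    for v in range(len(state)):
        if state[v] == S:
            # Stage 1, CSR traversal: sum the infectivity of v's neighbours.
            pressure = sum(infectivity[indices[e]]
                           for e in range(indptr[v], indptr[v + 1]))
            rate = beta * pressure
        elif state[v] == I:
            # Stage 2, age-dependent hazard of leaving the infected state.
            rate = normal_hazard(age[v], mu, sigma)
        else:
            continue  # removed nodes never change
        # Stage 3, Bernoulli tau-leaping: fire with probability 1 - exp(-rate * tau).
        if rng.random() < -math.expm1(-rate * tau):
            new_state[v] = state[v] + 1  # Stage 4, transition S -> I or I -> R
            new_age[v] = 0.0
        else:
            new_age[v] = age[v] + tau
    # Stage 5, infectivity write-back for the next step (constant 1.0 is a simplification).
    new_infectivity = [1.0 if s == I else 0.0 for s in new_state]
    return new_state, new_age, new_infectivity
```

In this unfused form each stage reads and writes whole arrays. The point of fusion is that `pressure`, `rate`, and the firing probability for a node would live only in registers, and only the final state, age, and infectivity would be stored.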

On an NVIDIA A100, FlashSpread achieves 8.09 Giga-NUPS (node updates per second) at N=10^6 on a uniform-degree graph, a 217x strict hardware speedup over optimized CPU tau-leaping. On scale-free Barabási-Albert graphs, a degree-aware dispatch (thread/warp/edge-merge) raises throughput about 4.5x over the default kernel, from 0.45 to 2.0 Giga-NUPS. The framework scales to N=10^8 on a single A100 (40 GB) using mixed-precision storage, which extends the L2-reachable scale by ~3x and yields a 1.15x throughput lift at bandwidth-bound extremes. Validation against exact non-Markovian Gillespie simulation shows a structural bias of ~6% on peak infection and ~7% on final attack rate. The bias is stable across two decades of tolerance and well within typical epidemiological parameter uncertainty.
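Two of these figures are easier to read with a small sketch. The first helper bins nodes by CSR degree into the three dispatch classes named above. The thresholds 32 and 1024 and the function names are hypothetical, since the summary does not give the paper's cut-offs. The second helper converts a Giga-NUPS figure into time per full sweep, assuming each step updates every node exactly once.

```python
def bin_by_degree(indptr, warp_threshold=32, merge_threshold=1024):
    """Split node ids into dispatch buckets by degree (thresholds are hypothetical)."""
    buckets = {"thread": [], "warp": [], "edge_merge": []}
    for v in range(len(indptr) - 1):
        degree = indptr[v + 1] - indptr[v]  # CSR row length = node degree
        if degree < warp_threshold:
            buckets["thread"].append(v)      # low degree: one thread per node
        elif degree < merge_threshold:
            buckets["warp"].append(v)        # medium degree: one warp per node
        else:
            buckets["edge_merge"].append(v)  # hubs: split the edge list across threads
    return buckets


def step_time_ms(n_nodes, giga_nups):
    """Milliseconds per sweep, assuming one update per node per step."""
    return n_nodes / (giga_nups * 1e9) * 1e3
```

Under that assumption, `step_time_ms(10**6, 8.09)` comes to roughly 0.12 ms per step. The scale-free figures are also self-consistent: 2.0 / 0.45 is about 4.4, which matches the quoted 4.5x once the rounding of the endpoints is taken into account.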

Key Points
  • FlashSpread fuses the entire renewal pipeline into a single Triton kernel, achieving 8.09 Giga-NUPS on a uniform-degree graph with N=10^6
  • On an NVIDIA A100, 217x speedup over optimized CPU tau-leaping; on scale-free graphs, degree-aware dispatch delivers about 4.5x the throughput of the default kernel
  • Scales to 100 million nodes on a single A100 (40 GB) using mixed-precision storage, with structural bias of ~6-7% versus exact Gillespie reference

Why It Matters

Enables realistic epidemic forecasting on massive contact networks in minutes instead of days, speeding up public health response modeling.