Open Source

16x Spark Cluster (Build Update)

16 DGX Sparks unified at 200Gbps — a custom setup that rivals H100 clusters for large model inference

Deep Dive

A self-described builder has finished assembling a 16-node DGX Spark cluster, with each node connected by a single QSFP56 cable to an FS N8510 200Gbps QSFP56 fabric switch. Each Spark bonds its two NIC interfaces across that link, measuring 100–111 Gbps per rail and aggregating to the advertised 200 Gbps line rate. Setup involved racking each unit, powering it on, waiting roughly 20 minutes per node for system updates, then scripting passwordless SSH, jumbo frames, and IP configuration. The builder reports the process was smoother than expected, with Nvidia's custom Ubuntu image running out of the box and most prerequisites pre-installed.
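
The post doesn't share the provisioning scripts, so the following is only a minimal sketch of how that per-node step might be automated. The hostnames, the 10.0.0.0/24 addressing, and the bond0 interface name are all illustrative assumptions, not details from the build.

    #!/usr/bin/env python3
    """Hypothetical provisioning sketch for a 16-node DGX Spark cluster.

    Node names, addressing, and interface names are assumptions; the
    builder's actual scripts were not published.
    """
    import subprocess

    NODES = [f"spark{i:02d}" for i in range(1, 17)]  # 16 nodes, hypothetical hostnames
    FABRIC_IF = "bond0"   # assumed name of the bonded 200GbE interface
    MTU = 9000            # jumbo frames for the QSFP56 fabric

    def run(host: str, cmd: str) -> None:
        """Run a command on a node over SSH (keys distributed beforehand)."""
        subprocess.run(["ssh", host, cmd], check=True)

    def main() -> None:
        for idx, host in enumerate(NODES, start=1):
            # 1. Distribute the local public key for passwordless SSH.
            subprocess.run(["ssh-copy-id", host], check=True)
            # 2. Assign a static fabric IP, one address per node.
            run(host, f"sudo ip addr replace 10.0.0.{idx}/24 dev {FABRIC_IF}")
            # 3. Enable jumbo frames on the bonded interface.
            run(host, f"sudo ip link set {FABRIC_IF} mtu {MTU}")
            # 4. Sanity check: confirm the link is up at the expected MTU.
            run(host, f"ip -br link show {FABRIC_IF}")

    if __name__ == "__main__":
        main()

Note that these ip commands would not survive a reboot; on Nvidia's Ubuntu image a persistent setup would declare the bond, MTU, and addresses in netplan instead.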

The motivation for choosing DGX Sparks over H100s or a GB300 is unified memory capacity, with the added appeal of staying entirely within Nvidia's ecosystem. With just 8 nodes, the cluster already serves the 434GB GLM-5.1-NVFP4 model at tensor parallelism 8. Next tests will target DeepSeek and Kimi. The longer-term architecture plans a prefill/decode split: the Spark cluster handles the compute-bound prefill stage with massive parallelism, while 2–4 upcoming M5 Ultra Mac Studios will be added to the rack to handle the memory-bandwidth-bound decode stage. The full rack also includes an OPNsense firewall, Mikrotik switches, a QNAP U.2 NAS with 374TB of storage, dual 4090 workstations, and a SuperMicro 4x H100 NVL station.
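
A quick back-of-the-envelope check shows why 8 nodes suffice: each DGX Spark carries 128GB of coherent unified memory per Nvidia's published spec (the spec figure is the only number below not taken from the post), so a TP=8 shard of the 434GB model is roughly 54GB per node, leaving ample headroom for KV cache and activations.

    # Memory math for TP=8 serving of GLM-5.1-NVFP4 on DGX Sparks.
    MODEL_GB = 434        # model size, per the post
    NODE_MEM_GB = 128     # unified memory per DGX Spark (Nvidia spec)
    TP = 8                # tensor-parallel degree used by the builder

    per_node_weights = MODEL_GB / TP           # ~54.3 GB of weights per node
    headroom = NODE_MEM_GB - per_node_weights  # ~73.8 GB left for KV cache etc.
    cluster_mem = 16 * NODE_MEM_GB             # 2048 GB across the full cluster

    print(f"weights/node: {per_node_weights:.1f} GB, headroom: {headroom:.1f} GB")
    print(f"total unified memory, 16 nodes: {cluster_mem} GB")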

Key Points
  • 16 DGX Sparks each hit the 200Gbps line rate via a dual-rail bond over a single QSFP56 cable to one FS N8510 switch
  • Unified memory enables serving a 434GB GLM-5.1 model on just 8 nodes at TP=8, showcasing Nvidia ecosystem advantage
  • Planned prefill/decode split assigns compute-bound prefill to the Spark cluster and bandwidth-bound decode to future Mac Studios (see the sketch after this list)
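
The post names no serving stack for the disaggregated split, so the sketch below only illustrates its shape: prefill runs on the Sparks and hands a KV-cache reference to a decode worker. The endpoint URLs, the /prefill and /decode routes, and the kv_handle field are all invented for illustration.

    # Hypothetical router for a prefill/decode-disaggregated cluster.
    import asyncio
    import httpx

    PREFILL_POOL = [f"http://spark{i:02d}:8000" for i in range(1, 17)]  # Spark cluster
    DECODE_POOL = [f"http://mac{i}:8000" for i in range(1, 5)]          # future Mac Studios

    async def generate(prompt: str, max_tokens: int = 256) -> str:
        async with httpx.AsyncClient(timeout=None) as client:
            # Prefill on a Spark: compute-bound, parallelizes well across GPUs.
            prefill = await client.post(
                f"{PREFILL_POOL[0]}/prefill", json={"prompt": prompt}
            )
            kv_handle = prefill.json()["kv_handle"]  # reference to the shipped KV cache

            # Decode on a Mac Studio: bandwidth-bound, token-by-token generation.
            decode = await client.post(
                f"{DECODE_POOL[0]}/decode",
                json={"kv_handle": kv_handle, "max_tokens": max_tokens},
            )
            return decode.json()["text"]

    if __name__ == "__main__":
        print(asyncio.run(generate("Explain QSFP56 bonding in one paragraph.")))

The hard part in any real disaggregated setup is moving the KV cache from prefill to decode workers fast enough, which is one reason the 200Gbps fabric matters here.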

Why It Matters

Custom-built Spark clusters with unified memory can challenge traditional H100 deployments for large-scale AI inference at lower cost.