Open Source

Collected the infinity stones

A DIY heterogeneous cluster with 2.3 TB of RAM and Blackwell GPUs seeks Tinygrad driver help.

Deep Dive

A Reddit user known as Street-Buyer-2428 has made waves in the AI hardware community by announcing they've assembled a monster cluster: 2.3 TB of RAM and over 400 virtual CPU cores. The final piece of the build is connecting that memory to NVIDIA's Blackwell GPUs using RDMA (Remote Direct Memory Access) for low-latency data transfer. The setup is what they call a 'heterogeneous cluster': the Blackwells handle the prefill stage (processing the input tokens), while an existing studio mesh (likely a multi-node system) handles the decode stage (generating output tokens one at a time), splitting the inference pipeline across different hardware to maximize efficiency.
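To make the split concrete, here is a minimal, purely illustrative Python sketch of prefill/decode disaggregation. All names and the placeholder logic are hypothetical, not the builder's code and not Tinygrad's API: one function stands in for the compute-heavy prefill pass, another for the step-by-step decode loop, and the KV cache object is what gets handed off between the two halves of the pipeline.

```python
# Hypothetical sketch of prefill/decode disaggregation (not Tinygrad's API).
# Prefill processes the whole prompt at once; decode generates one token at a
# time against the cache that prefill produced.

from dataclasses import dataclass, field

@dataclass
class KVCache:
    # In a real system this holds per-layer key/value tensors; a plain list
    # keeps the control flow visible in this sketch.
    entries: list = field(default_factory=list)

def prefill_on_gpu(prompt_tokens: list[int]) -> KVCache:
    """Process the entire prompt in one pass (the Blackwell side)."""
    cache = KVCache()
    for tok in prompt_tokens:
        cache.entries.append(("kv", tok))  # stand-in for attention state
    return cache

def decode_on_mesh(cache: KVCache, max_new_tokens: int) -> list[int]:
    """Generate tokens one at a time from the handed-off cache
    (the studio-mesh side)."""
    out = []
    for _ in range(max_new_tokens):
        next_tok = len(cache.entries) % 1000    # placeholder for sampling
        cache.entries.append(("kv", next_tok))  # decode also grows the cache
        out.append(next_tok)
    return out

if __name__ == "__main__":
    prompt = [101, 7592, 2088, 102]     # toy token IDs
    kv = prefill_on_gpu(prompt)         # step 1: prefill on the GPUs
    completion = decode_on_mesh(kv, 8)  # step 2: decode on other hardware
    print(completion)
```

The handoff of that KV cache between the two stages is exactly the transfer the builder wants RDMA to accelerate, since copying it through the CPU on every request would erase much of the benefit of splitting the pipeline.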

The builder admits the project is stalled on driver work in Tinygrad (a lightweight, open-source deep learning framework) needed to enable RDMA between host memory and the Blackwell cards. They've put out a call for collaboration from anyone with expertise in Tinygrad, RDMA, or cluster networking. If successful, this could be one of the first community-driven clusters to mix consumer-grade CPU memory with enterprise GPUs for large-model inference. The post underscores the growing trend of DIY AI builders pushing the boundaries of what's possible with off-the-shelf and last-gen hardware.

Key Points
  • 2.3 TB RAM and 400+ vCores in a single system
  • Uses NVIDIA Blackwell GPUs for prefill and a separate mesh for decode
  • Needs community help with Tinygrad driver support for RDMA connectivity

Why It Matters

Demonstrates that cutting-edge AI clusters can be built by enthusiasts, lowering the barrier to large-model inference.