Feedback on my 256GB VRAM local setup and cluster plans. Lawyer keeping it local.
Legal professional constructs private 8-GPU cluster to run massive models on confidential case files.
A legal professional has taken data privacy into their own hands by building a formidable local AI cluster from the ground up. Dubbed 'Node 1,' the system is built on a Gigabyte Threadripper motherboard with 256GB of DDR4 RAM and is powered by eight NVIDIA V100 SXM GPUs, each with 32GB of VRAM. Custom pass-through boards NVLink the GPUs in pairs, creating shared 64GB memory pools, and PCIe switches connect them to the motherboard. The rig currently draws 2800 watts from two standard circuits; future plans include installing a 240-volt outlet and potentially adding more V100s. The immediate goal is to run the largest available open-source reasoning models, like GLM or DeepSeek, entirely offline.
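The power figures above can be sanity-checked with simple arithmetic. The assumptions here (US 120 V / 15 A branch circuits and the NEC 80% continuous-load rule, plus a 30 A rating for the planned 240 V circuit) are not stated in the post; they are illustrative defaults.

```python
# Back-of-the-envelope circuit loading for the ~2800 W draw described above.
# Assumed (not stated in the post): US 120 V / 15 A branch circuits and the
# NEC 80% rule for continuous loads; 240 V / 30 A for the planned outlet.

def circuit_headroom(draw_w, volts, amps, continuous_factor=0.8):
    """Return (capacity_w, usable_w, headroom_w) for one branch circuit."""
    capacity = volts * amps
    usable = capacity * continuous_factor
    return capacity, usable, usable - draw_w

per_circuit_draw = 2800 / 2  # 1400 W on each of the two circuits

cap, usable, headroom = circuit_headroom(per_circuit_draw, 120, 15)
# 120 V x 15 A = 1800 W capacity, 1440 W continuous: only 40 W of headroom
# per circuit, which is why a dedicated 240 V outlet is the next step.

cap240, usable240, headroom240 = circuit_headroom(2800, 240, 30)
# A 240 V / 30 A circuit offers 7200 W capacity (5760 W continuous),
# leaving room for the additional V100s mentioned above.
```

Under these assumed circuit ratings, the two-circuit arrangement is already at its practical limit, which matches the builder's stated upgrade plan.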
The project is driven by a need for absolute confidentiality while automating legal work. The primary application is a Retrieval-Augmented Generation (RAG) system over a decade's worth of case files and documents, with the aim of automating routine legal tasks and providing strong AI assistance for semi-routine work. Once RAG is in place, the plan is QLoRA (Quantized Low-Rank Adaptation) fine-tuning to specialize models for legal reasoning. The builder is also planning a 'Node 2' on an AMD platform with RTX 3090s, and keeps additional inventory, including Tesla P40 and P100 GPUs, for a potential distributed cluster, all to avoid sending sensitive client data to external cloud AI services.
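A minimal sketch of the retrieval half of such a RAG pipeline: in a real offline setup, document chunks would be embedded with a locally hosted embedding model and stored in a vector database; here a stdlib bag-of-words cosine similarity stands in for the embedding step, and the case-file snippets are invented placeholders.

```python
# Sketch of local RAG retrieval: rank stored document chunks against a
# query, then prepend the top matches to the prompt sent to the locally
# hosted model, so no document text ever leaves the machine.
# Bag-of-words cosine here is a stand-in for a real embedding model.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    qv = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, vectorize(c)), reverse=True)
    return ranked[:k]

# Invented placeholder chunks, not real case material.
chunks = [
    "motion to dismiss filed in the contract dispute",
    "deposition transcript regarding the lease agreement",
    "billing records for the 2019 fiscal year",
]
top = retrieve("draft a motion to dismiss this contract claim", chunks, k=1)
```

The retrieved chunks become context for the generation step; the privacy benefit is that both retrieval and generation run on Node 1's own hardware.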
- Built a local 'Node 1' with 8x NVIDIA V100 32GB GPUs (256GB total VRAM), NVLinked in pairs and drawing 2800W across two circuits.
- Goal is to run massive models like GLM or DeepSeek locally for RAG on 10 years of legal files and subsequent QLoRA training.
- Possesses extensive hardware for expansion, including a planned Node 2 with RTX 3090s and inventory of older Tesla cards for a distributed cluster.
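The QLoRA plan can likewise be sanity-checked with rough arithmetic: the base model's weights sit in 4-bit precision while only small low-rank adapter matrices are trained. The layer dimensions and the 70B parameter count below are hypothetical illustrations, not figures from any specific model.

```python
# Why QLoRA is feasible on a 256GB VRAM pool: frozen 4-bit base weights
# plus a tiny set of trainable low-rank adapters.
# All sizes below are hypothetical, not taken from a specific model.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params for one LoRA-adapted linear layer: A (d_in x r) plus B (r x d_out)."""
    return rank * (d_in + d_out)

def base_weight_gb(n_params: float, bits: int = 4) -> float:
    """Memory for base weights quantized to `bits` bits per parameter."""
    return n_params * bits / 8 / 1e9

# One hypothetical 8192x8192 attention projection at rank 16:
full = 8192 * 8192                     # 67,108,864 frozen, quantized weights
adapter = lora_params(8192, 8192, 16)  # 262,144 trainable weights
fraction = adapter / full              # under 0.4% of the layer is trained

# A hypothetical 70B-parameter base model quantized to 4 bits:
weights_gb = base_weight_gb(70e9)  # ~35 GB for weights alone, leaving the
# rest of the 256 GB pool for activations, KV cache, and the adapters'
# optimizer state.
```

The takeaway, under these illustrative numbers, is that quantizing the frozen base model is what brings fine-tuning of very large models within reach of this class of hardware.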
Why It Matters
Shows a real-world, extreme approach to private AI for sensitive professions, moving beyond theory to hardware implementation.