Feedback on my 256GB VRAM local setup and cluster plans. Lawyer keeping it local.
Legal professional constructs private 8-GPU cluster to run massive models on confidential case files.
A legal professional has taken data privacy into their own hands by building a formidable local AI cluster from the ground up. Dubbed 'Node 1,' the system is built on a Gigabyte Threadripper motherboard with 256GB of DDR4 RAM and is powered by eight NVIDIA V100 SXM GPUs, each with 32GB of VRAM. Custom pass-through boards NVLink the GPUs in pairs, creating shared 64GB memory pools, and PCIe switches connect them to the motherboard. The rig currently draws 2800 watts from two standard circuits; future plans include installing a 240-volt outlet and potentially adding more V100s. The immediate goal is to run the largest available open-source reasoning models, like GLM or DeepSeek, entirely offline.
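The power figures above can be sanity-checked with simple arithmetic. The assumptions here (US 120 V / 15 A branch circuits and the NEC 80% continuous-load rule, plus a 30 A rating for the planned 240 V circuit) are not stated in the post; they are illustrative defaults.

```python
# Back-of-the-envelope circuit loading for the ~2800 W draw described above.
# Assumed (not stated in the post): US 120 V / 15 A branch circuits and the
# NEC 80% rule for continuous loads; 240 V / 30 A for the planned outlet.

def circuit_headroom(draw_w, volts, amps, continuous_factor=0.8):
    """Return (capacity_w, usable_w, headroom_w) for one branch circuit."""
    capacity = volts * amps
    usable = capacity * continuous_factor
    return capacity, usable, usable - draw_w

per_circuit_draw = 2800 / 2  # 1400 W on each of the two circuits

cap, usable, headroom = circuit_headroom(per_circuit_draw, 120, 15)
# 120 V x 15 A = 1800 W capacity, 1440 W continuous: only 40 W of headroom
# per circuit, which is why a dedicated 240 V outlet is the next step.

cap240, usable240, headroom240 = circuit_headroom(2800, 240, 30)
# A 240 V / 30 A circuit offers 7200 W capacity (5760 W continuous),
# leaving room for the additional V100s mentioned above.
```

Under these assumed circuit ratings, the two-circuit arrangement is already at its practical limit, which matches the builder's stated upgrade plan.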
The project is driven by a need for absolute confidentiality while automating legal work. The primary application is a Retrieval-Augmented Generation (RAG) system over a decade's worth of case files and documents, with the aim of automating routine legal tasks and providing strong AI assistance for semi-routine work. Once RAG is in place, the plan is QLoRA (Quantized Low-Rank Adaptation) fine-tuning to specialize models for legal reasoning. The builder is also planning a 'Node 2' on an AMD platform with RTX 3090s, and keeps additional inventory, including Tesla P40 and P100 GPUs, for a potential distributed cluster, all to avoid sending sensitive client data to external cloud AI services.
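A minimal sketch of the retrieval half of such a RAG pipeline: in a real offline setup, document chunks would be embedded with a locally hosted embedding model and stored in a vector database; here a stdlib bag-of-words cosine similarity stands in for the embedding step, and the case-file snippets are invented placeholders.

```python
# Sketch of local RAG retrieval: rank stored document chunks against a
# query, then prepend the top matches to the prompt sent to the locally
# hosted model, so no document text ever leaves the machine.
# Bag-of-words cosine here is a stand-in for a real embedding model.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    qv = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, vectorize(c)), reverse=True)
    return ranked[:k]

# Invented placeholder chunks, not real case material.
chunks = [
    "motion to dismiss filed in the contract dispute",
    "deposition transcript regarding the lease agreement",
    "billing records for the 2019 fiscal year",
]
top = retrieve("draft a motion to dismiss this contract claim", chunks, k=1)
```

The retrieved chunks become context for the generation step; the privacy benefit is that both retrieval and generation run on Node 1's own hardware.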
- Built a local 'Node 1' with 8x NVIDIA V100 32GB GPUs (256GB total VRAM), NVLinked in pairs and drawing 2800W across two circuits.
- Goal is to run massive models like GLM or DeepSeek locally for RAG on 10 years of legal files and subsequent QLoRA training.
- Possesses extensive hardware for expansion, including a planned Node 2 with RTX 3090s and inventory of older Tesla cards for a distributed cluster.
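The QLoRA plan can likewise be sanity-checked with rough arithmetic: the base model's weights sit in 4-bit precision while only small low-rank adapter matrices are trained. The layer dimensions and the 70B parameter count below are hypothetical illustrations, not figures from any specific model.

```python
# Why QLoRA is feasible on a 256GB VRAM pool: frozen 4-bit base weights
# plus a tiny set of trainable low-rank adapters.
# All sizes below are hypothetical, not taken from a specific model.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params for one LoRA-adapted linear layer: A (d_in x r) plus B (r x d_out)."""
    return rank * (d_in + d_out)

def base_weight_gb(n_params: float, bits: int = 4) -> float:
    """Memory for base weights quantized to `bits` bits per parameter."""
    return n_params * bits / 8 / 1e9

# One hypothetical 8192x8192 attention projection at rank 16:
full = 8192 * 8192                     # 67,108,864 frozen, quantized weights
adapter = lora_params(8192, 8192, 16)  # 262,144 trainable weights
fraction = adapter / full              # under 0.4% of the layer is trained

# A hypothetical 70B-parameter base model quantized to 4 bits:
weights_gb = base_weight_gb(70e9)  # ~35 GB for weights alone, leaving the
# rest of the 256 GB pool for activations, KV cache, and the adapters'
# optimizer state.
```

The takeaway, under these illustrative numbers, is that quantizing the frozen base model is what brings fine-tuning of very large models within reach of this class of hardware.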
Why It Matters
Shows a real-world, extreme approach to private AI for sensitive professions, moving beyond theory to hardware implementation.