Open-Source "GreenBoost" Driver Aims To Augment NVIDIA GPUs' vRAM With System RAM & NVMe To Handle Larger LLMs
Open-source tool lets consumer GPUs run massive AI models by pooling memory from RAM and SSDs.
A new open-source software driver called 'GreenBoost' is making waves by addressing one of the biggest bottlenecks in local AI development: GPU memory constraints. Developed by independent researchers, the driver creates a virtual memory pool that combines a GPU's native vRAM with available system RAM and even fast NVMe SSD storage. This approach allows consumer graphics cards, such as NVIDIA's GeForce RTX 4090 with 24GB of vRAM, to effectively work with AI models that demand 40GB, 60GB, or more of memory by swapping data between these tiers.
The technology operates by treating the slower system RAM and NVMe storage as overflow areas for the GPU's high-speed vRAM. While accessing data from RAM or an SSD is slower than from vRAM, caching algorithms keep the most actively used model weights in the fastest available tier. As a result, researchers and developers can experiment with larger models, such as 70B or even 120B parameter LLMs, on hardware that was previously insufficient, avoiding the need for costly server-grade GPUs like the H100.
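The tiering behavior described above resembles a multi-level LRU cache: recently touched weight blocks are promoted to the fastest tier, and the least recently used blocks are demoted down the hierarchy. A minimal sketch of that idea follows; the `TieredCache` class, the tier names, and the capacities are hypothetical, since GreenBoost's actual caching policy has not been published.

```python
from collections import OrderedDict

# Illustrative sketch only; GreenBoost's real policy is not public.
# Tier names and capacities here are hypothetical.
class TieredCache:
    """Keep recently used weight blocks in the fastest tier,
    demoting least-recently-used blocks down the hierarchy."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest first,
        # e.g. [("vram", 2), ("ram", 4), ("nvme", 8)]
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def access(self, block_id):
        """Fetch a block and promote it to the fastest tier."""
        found_tier = None
        for name, _, store in self.tiers:
            if block_id in store:
                del store[block_id]
                found_tier = name
                break
        self._insert(0, block_id)
        # Tier the block was served from (None means a cold load).
        return found_tier

    def _insert(self, level, block_id):
        _, cap, store = self.tiers[level]
        store[block_id] = True
        store.move_to_end(block_id)
        # Overflow: push the least-recently-used block one tier down.
        # (The deepest tier is allowed to grow in this sketch.)
        if len(store) > cap and level + 1 < len(self.tiers):
            victim, _ = store.popitem(last=False)
            self._insert(level + 1, victim)
```

Under this scheme a repeatedly accessed weight block keeps being served from the "vram" tier, while blocks that fall idle drift toward RAM and then NVMe, which matches the access-cost ordering the driver exploits.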
Early benchmarks show that a system with 64GB of RAM and a fast PCIe 4.0 NVMe drive can present the GPU with a combined 'effective vRAM' of over 80GB. Inference does incur a performance penalty whenever data is fetched from the slower tiers, but the primary benefit is functionality: models that previously could not be loaded at all can now run. The driver is particularly impactful for the open-source AI community, where model sizes are growing rapidly while access to high-vRAM hardware remains limited.
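The 80GB+ figure is consistent with simple back-of-envelope accounting: all of the card's vRAM, most of system RAM (minus an OS reserve), plus an NVMe swap budget. The reserve fraction and NVMe budget below are assumptions for illustration, not numbers reported for GreenBoost.

```python
def effective_vram_gb(vram_gb, ram_gb, nvme_budget_gb, ram_reserve=0.25):
    """Rough 'effective vRAM' estimate: all GPU vRAM, plus system RAM
    minus an OS reserve, plus a fixed NVMe swap budget.
    The reserve fraction and NVMe budget are illustrative assumptions."""
    return vram_gb + ram_gb * (1 - ram_reserve) + nvme_budget_gb

# RTX 4090 (24GB) + 64GB RAM + a 16GB NVMe budget
print(effective_vram_gb(24, 64, 16))  # -> 88.0, in line with "over 80GB"
```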
- Creates a unified memory pool from GPU vRAM, system RAM, and NVMe SSD storage
- Enables consumer GPUs (e.g., RTX 4090) to run LLMs requiring 40GB+ of memory
- Open-source nature allows free experimentation with larger models without expensive hardware upgrades
Why It Matters
Democratizes access to cutting-edge AI by letting developers run massive models on affordable consumer hardware.