2x 512GB RAM M3 Ultra Mac Studios
A developer is stress-testing cutting-edge AI models like DeepSeek V3.2 on a $25k dual Mac Studio setup.
A developer is conducting real-world performance testing of the latest open-source large language models (LLMs) on an exceptionally powerful Apple hardware setup. The system consists of two M3 Ultra Mac Studios, each equipped with 512GB of unified memory, representing a $25,000 investment in hardware. This configuration provides a total of 1TB of RAM, creating a unique testbed for running memory-intensive AI models locally, bypassing cloud API costs and latency.
The developer has already successfully loaded and run DeepSeek V3.2 in its 8-bit quantized (Q8) format using Exo, a framework for distributing model inference across multiple machines. They are currently troubleshooting the loading of the GLM 5.1 model in its Q4 quantized version on each machine. They are also awaiting the release, and subsequent community optimization, of the anticipated Kimi 2.6 model for Apple's MLX framework, including memory-mapped (mmap) weight loading. This public testing initiative crowdsources model requests and provides valuable, practical data on the capabilities and limitations of running state-of-the-art AI on Apple Silicon.
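The quantization levels above map directly to memory footprints, which explains why some models need both machines. A minimal back-of-the-envelope sketch (the 671B parameter count is DeepSeek V3's published figure; the overhead factor for KV cache and runtime buffers is a rough assumption, not a measured value):

```python
# Rough estimate of model weight memory at different quantization
# levels, illustrating why a ~671B-parameter model in Q8 exceeds a
# single 512GB Mac Studio but fits across two, while Q4 fits on one.

def weight_memory_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.1) -> float:
    """Approximate memory in GB: params * bytes-per-weight, times a
    rough overhead factor for KV cache and runtime buffers (assumed)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight * overhead

q8 = weight_memory_gb(671, 8)  # ~738 GB -> needs both 512GB machines
q4 = weight_memory_gb(671, 4)  # ~369 GB -> fits on one machine
print(f"Q8: {q8:.0f} GB, Q4: {q4:.0f} GB")
```

This is why Q8 runs are distributed across the pair while the Q4 build of GLM is attempted per-machine.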
- A developer is testing AI models on a dual M3 Ultra Mac Studio rig with 1TB of total unified memory.
- They have successfully run DeepSeek V3.2 (Q8) and are working on loading GLM 5.1, while awaiting Kimi 2.6 optimizations.
- The $25k setup serves as a public benchmark platform, taking requests to stress-test the latest open-source LLMs locally.
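The mmap loading technique mentioned above lets the OS page model weights in from disk on first access instead of copying the whole file into RAM up front. A minimal sketch of the mechanism using Python's standard library (this is not MLX's loader, just an illustration of OS-level memory mapping with a tiny stand-in weights file):

```python
# Sketch of memory-mapped weight loading: pages of the file are
# faulted in on demand rather than bulk-copied into process memory.
import mmap
import os
import struct
import tempfile

# Write a small stand-in "weights" file of float64 values.
values = [0.5, -1.25, 3.0]
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"{len(values)}d", *values))

# Map the file read-only; no bulk copy happens at mmap() time.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        loaded = list(struct.unpack(f"{len(values)}d", mm[:]))

print(loaded)  # [0.5, -1.25, 3.0]
```

For a multi-hundred-gigabyte checkpoint, this approach shortens load times and lets the OS evict and re-fault pages under memory pressure.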
Why It Matters
This testing provides crucial real-world data on the feasibility and performance of running advanced AI models on high-end consumer hardware, helping developers judge whether such hardware investments are justified for local AI workloads.