13 months since the DeepSeek moment, how far have we come running models locally?
From $6,000 to $600: local AI now matches last year's frontier-model performance at a tenth of the cost.
Thirteen months after a Hugging Face engineer's viral tweet demonstrated running the frontier-level DeepSeek R1 at Q8 precision for approximately $6,000, the local AI landscape has transformed. Today, the significantly more capable Qwen3-27B runs at comparable speeds (around 5 tokens/second) on hardware costing just $600, a 90% cost reduction. More impressive still, the stronger Qwen3.5-35B-A3B reaches usable speeds of 17-20 tokens/second at Q4/Q5 precision, reflecting parallel breakthroughs in model quality and inference efficiency from labs such as Qwen and DeepSeek.
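A back-of-the-envelope memory calculation shows why quantization drives these cost drops. The sketch below estimates weight-only memory at common GGUF quantization levels; the bits-per-weight figures are rough averages for typical Q4_K_M/Q5_K_M/Q8_0 mixes (an assumption, not exact values for any particular release), and KV cache and runtime overhead are excluded.

```python
# Rough estimate of model weight memory at common GGUF quantization levels.
# Bits-per-weight values are approximate averages for mixed-precision quant
# schemes (Q4_K_M, Q5_K_M, Q8_0); real files vary by a few percent.
BITS_PER_WEIGHT = {"Q4": 4.8, "Q5": 5.7, "Q8": 8.5}

def weight_memory_gib(params_billions: float, quant: str) -> float:
    """Approximate GiB needed for the weights alone (no KV cache)."""
    total_bits = BITS_PER_WEIGHT[quant] * params_billions * 1e9
    return total_bits / 8 / 2**30

for quant in ("Q4", "Q5", "Q8"):
    print(f"35B at {quant}: ~{weight_memory_gib(35, quant):.0f} GiB")

# A ~670B model (DeepSeek R1 class) at Q8 needs hundreds of GiB,
# while a 35B model at Q4/Q5 fits in roughly 20-25 GiB, i.e. the
# memory budget of consumer-grade hardware.
print(f"670B at Q8: ~{weight_memory_gib(670, 'Q8'):.0f} GiB")
```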
This rapid progress in model compression and hardware optimization suggests we're approaching a tipping point for local AI. On the current trajectory, within the next year we could see 4-billion-parameter models running locally that match or exceed the capabilities of today's larger cloud models like Kimi 2.5. This democratization of AI compute has profound implications for privacy-sensitive applications, edge computing, and reducing dependency on cloud API costs, and it could reshape how developers and enterprises deploy AI.
- Cost dropped 90%: $6,000 hardware requirement reduced to $600 for similar performance
- Speed increased roughly 4x: From 5 tps with DeepSeek R1 to 17-20 tps with Qwen3.5-35B-A3B (a throughput-measurement sketch follows this list)
- Model quality improved: Current local models surpass last year's frontier models despite smaller size
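To check these throughput numbers on your own machine, here is a minimal sketch using llama-cpp-python, one common runtime for GGUF quantizations; the model filename and generation settings are placeholder assumptions rather than a recommended configuration.

```python
# Minimal local throughput check with llama-cpp-python
# (pip install llama-cpp-python).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen-moe-q4_k_m.gguf",  # hypothetical filename: use your own GGUF
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
    verbose=False,
)

prompt = "Explain quantization in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256, echo=False)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

A short prompt keeps prompt-processing time negligible, so the printed rate approximates pure generation throughput.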
Why It Matters
Democratizes AI development, enables private local deployment, and reduces cloud dependency for enterprises and developers.