Viral Wire

Kimi K2.5 model runs on RTX 3060 with Intel Optane

MEXC News / AI Weekly / Startup Fortune May 24, 2026

⚡1 trillion parameters on a $300 GPU, at 4 tokens per second.

Deep Dive

AI enthusiast APFrisco has demonstrated Moonshot AI's 1-trillion-parameter Kimi K2.5 model running on a single consumer-grade Nvidia RTX 3060 graphics card. The key enabler was 768GB of Intel Optane Persistent Memory, which supplied the capacity to hold the model's weights that the card's 12GB of VRAM cannot. With the weights held in persistent memory, the setup achieved local inference at approximately four tokens per second, a notable result for a system built around a consumer GPU.
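The capacity side of the claim can be checked with simple arithmetic. The article does not say what numeric precision the weights were stored in, so the sketch below treats precision as a parameter. It is a rough estimate under stated assumptions: it counts weights only (ignoring KV cache and runtime overhead) and uses decimal gigabytes for simplicity. Under those assumptions, 1 trillion weights need about 2,000GB at 16 bits and 1,000GB at 8 bits, so fitting in 768GB implies an average of at most about 6.1 bits per weight, i.e. some form of quantization (an inference, not something the source states).

```python
# Rough check of whether a model's weights fit in a given memory capacity.
# Assumptions: weights dominate memory use (KV cache, activations, and runtime
# overhead are ignored) and capacities are in decimal gigabytes.

def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at an average precision."""
    return num_params * bits_per_weight / 8


def max_bits_per_weight(num_params: float, capacity_bytes: float) -> float:
    """Largest average precision (bits per weight) that fits in capacity_bytes."""
    return capacity_bytes * 8 / num_params


PARAMS = 1e12        # 1 trillion parameters, per the article
PMEM_BYTES = 768e9   # 768GB of Optane PMem, per the article
VRAM_BYTES = 12e9    # RTX 3060 with 12GB of VRAM

# Precisions below are chosen for illustration; the article does not state one.
for bits in (16, 8, 4):
    need = weight_bytes(PARAMS, bits)
    print(f"{bits:>2}-bit: {need / 1e9:,.0f} GB, fits in PMem: {need <= PMEM_BYTES}")

print(f"max average precision for 768GB: {max_bits_per_weight(PARAMS, PMEM_BYTES):.3f} bits")
print(f"share of 4-bit weights that fit in VRAM: {VRAM_BYTES / weight_bytes(PARAMS, 4):.1%}")
```

Even at an assumed 4 bits per weight, the 12GB card could hold only about 2.4% of the weights, which is why the bulk of the model has to live in the Optane tier.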

This feat highlights the potential of persistent memory to bridge the gap between consumer and enterprise AI workloads. While 4 tokens per second is too slow for interactive, real-time use, it is fast enough for offline experimentation, model evaluation, and fine-tuning research. The approach could democratize access to trillion-parameter models, which have traditionally required multi-GPU servers built around data-center accelerators such as Nvidia's H100. APFrisco's work opens the door for hobbyists and startups to explore frontier-scale models without buying a GPU cluster.
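The article does not explain how roughly 4 tokens per second is reachable when most weights sit outside the GPU. A common rule of thumb is that token generation is memory-bandwidth bound: each new token requires reading the weights it actually uses. The sketch below applies that rule under two assumptions not stated in the source: 4-bit weights, and that Kimi K2.5 keeps the mixture-of-experts design of Kimi K2 (1 trillion total parameters, roughly 32 billion active per token). It also ignores compute, PCIe transfers, and caching, so it is a back-of-envelope estimate rather than a description of APFrisco's setup.

```python
# Back-of-envelope decode speed for memory-bandwidth-bound inference.
# Assumptions (not from the article): each generated token reads every *active*
# weight exactly once, and nothing else limits speed (compute, PCIe, and
# caching are ignored).

def bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Weight bytes that must be read to generate one token."""
    return active_params * bits_per_weight / 8


def tokens_per_second(bandwidth_bytes_per_s: float, active_params: float,
                      bits_per_weight: float) -> float:
    """Upper bound on decode speed for a given effective memory bandwidth."""
    return bandwidth_bytes_per_s / bytes_per_token(active_params, bits_per_weight)


def required_bandwidth(target_tps: float, active_params: float,
                       bits_per_weight: float) -> float:
    """Effective bandwidth (bytes/s) needed to sustain target_tps."""
    return target_tps * bytes_per_token(active_params, bits_per_weight)


BITS = 4              # assumed average precision
DENSE_PARAMS = 1e12   # hypothetical case: every weight used for every token
MOE_ACTIVE = 32e9     # assumed active parameters per token (Kimi K2-style MoE)

for label, active in (("dense 1T", DENSE_PARAMS), ("MoE, 32B active", MOE_ACTIVE)):
    bw = required_bandwidth(4.0, active, BITS)
    print(f"{label}: {bw / 1e9:,.0f} GB/s needed for 4 tokens/s")
```

Under these assumptions a dense 1T model would need about 2,000GB/s of weight bandwidth to reach 4 tokens per second, while a 32B-active MoE needs about 64GB/s. That gap suggests sparse activation, not raw capacity alone, is likely what makes the reported speed plausible.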

Key Points

Ran Moonshot AI's 1-trillion-parameter Kimi K2.5 model on a single RTX 3060 GPU
Used 768GB of Intel Optane Persistent Memory for weight storage
Achieved ~4 tokens per second inference speed on consumer hardware

Why It Matters

Enables trillion-parameter model inference on consumer GPUs, drastically lowering barriers to AI research.
