I trained a language model on CPU in 1.2 hours with no matrix multiplications — here's what I learned
A 13.6M-parameter model with ternary weights trains on a 2-thread CPU and reaches a 6.80 validation loss.
Deep Dive
Developer changcheng967 built FlashLM v3, a 13.6M-parameter language model. It uses ternary weights ({-1, 0, +1}) and was trained on a 2-thread CPU in just 1.2 hours on 32M tokens. The model learns syntax but not semantics, and profiling revealed that 86% of training time went to the inefficient output layer. The code is MIT-licensed and available on Hugging Face.
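The ternary-weight trick is easiest to see in code: when every weight is -1, 0, or +1, a linear layer needs no multiplications at all, only additions and subtractions. Below is a minimal NumPy sketch of that idea, not the FlashLM v3 implementation; the function names, shapes, and quantization threshold are assumptions for illustration.

```python
# Sketch of a matmul-free ternary linear layer (illustrative, not FlashLM v3 code).
import numpy as np

def quantize_ternary(w: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Round full-precision weights to {-1, 0, +1} using a fixed threshold (assumed)."""
    q = np.zeros_like(w, dtype=np.int8)
    q[w > threshold] = 1
    q[w < -threshold] = -1
    return q

def ternary_linear(x: np.ndarray, w_ternary: np.ndarray) -> np.ndarray:
    """Compute x @ w_ternary.T using only adds and subtracts per output unit."""
    out = np.zeros((x.shape[0], w_ternary.shape[0]), dtype=x.dtype)
    for j, row in enumerate(w_ternary):
        pos = x[:, row == 1].sum(axis=1)   # contributions from +1 weights
        neg = x[:, row == -1].sum(axis=1)  # contributions from -1 weights
        out[:, j] = pos - neg              # zero weights contribute nothing
    return out

# Toy usage with assumed sizes: a vocab-sized output projection like this
# touches every vocabulary entry for every token, which is why such a layer
# can dominate training time even with add-only arithmetic.
rng = np.random.default_rng(0)
hidden, vocab = 256, 8000
w = quantize_ternary(rng.normal(scale=0.1, size=(vocab, hidden)))
h = rng.normal(size=(4, hidden)).astype(np.float32)
logits = ternary_linear(h, w)
print(logits.shape)  # (4, 8000)
```

The toy output projection above also hints at why that layer dominated training: its cost scales with vocabulary size per token, regardless of how cheap each individual operation is.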
Why It Matters
This project highlights where compute actually goes in efficient model designs (here, the output projection) and pushes the boundaries of what's possible on minimal, CPU-only hardware.