Tiny 25M model trained from scratch on 8GB VRAM using new open-source repo

r/LocalLLaMA May 30, 2026

⚡A developer built an open-source project to train a 25M parameter model on just 8GB VRAM.

Deep Dive

Developer epoyraz (u/tevlon) answered the call for more accessible model training with a new open-source project: train-a-model-from-scratch. The repo provides a full pipeline to train a TinyStories language model from scratch using just 8GB of VRAM. The resulting 25M-parameter model is not a full LLM but a functional demonstration of building transformers on budget-friendly consumer GPUs. The trained weights are available on HuggingFace for anyone to test.
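To put the 8GB figure in perspective, here is a rough back-of-the-envelope sketch of the fixed training state for a 25M-parameter model. It assumes fp32 weights and an Adam-style optimizer with two moment buffers per parameter; neither is confirmed by the source, and the function name is purely illustrative. Activations, which depend on batch size and sequence length, are not counted.

```python
def training_state_mib(n_params, bytes_per_value=4, optimizer_buffers=2):
    """Rough size of weights + gradients + optimizer buffers, in MiB.

    Assumptions (not from the repo): fp32 values (4 bytes each) and an
    Adam-style optimizer that keeps two extra buffers per parameter.
    Activation memory is deliberately left out.
    """
    copies = 1 + 1 + optimizer_buffers  # weights, gradients, optimizer state
    return n_params * bytes_per_value * copies / (1024 ** 2)


estimate = training_state_mib(25_000_000)
print(f"{estimate:.0f} MiB of fixed training state")
```

Under these assumptions the fixed state comes to well under 1 GiB, so most of an 8GB card would be left for activations and framework overhead.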

During development, epoyraz tested several optimizations: mHC was too small for the task, BitNet showed no memory savings during training, and TurboQuant proved unnecessary. Multi-Token Prediction (MTP), which trains the model to predict several upcoming tokens at each position rather than only the next one, did work and was integrated, but it slowed overall training. The project is aimed at hobbyists and researchers who want to understand every step of training a small transformer without expensive hardware. It also lays a foundation for community-driven improvements in low-resource training.
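The MTP idea mentioned above can be illustrated with a minimal pure-Python sketch. This is not the repo's implementation: the function names, the one-head-per-future-offset layout, and the unweighted averaging of losses are all assumptions made for illustration.

```python
import math


def mtp_targets(tokens, n_future):
    """For each position t, collect the next n_future tokens as targets.

    Positions without a full window of future tokens are dropped.
    """
    last = len(tokens) - n_future
    return [tokens[t + 1 : t + 1 + n_future] for t in range(last)]


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mtp_loss(head_logits, targets):
    """Average cross-entropy over all positions and prediction heads.

    head_logits[t][k] is the (hypothetical) logit vector produced by
    head k at position t; targets[t][k] is the token k + 1 steps ahead.
    """
    total, count = 0.0, 0
    for logits_t, targets_t in zip(head_logits, targets):
        for logits_k, target in zip(logits_t, targets_t):
            total += cross_entropy(logits_k, target)
            count += 1
    return total / count


tokens = [0, 1, 2, 3]
targets = mtp_targets(tokens, n_future=2)
uniform = [[[0.0] * 4 for _ in range(2)] for _ in targets]
print(targets, mtp_loss(uniform, targets))
```

With `n_future=1` this reduces to ordinary next-token prediction. Each additional head adds its own loss term at every position, which may be one reason the technique costs extra training time.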

Key Points

25M parameter TinyStories model trained from scratch on only 8GB VRAM
Multi-Token Prediction (MTP) technique integrated despite slower training
Open-source GitHub repo enables others to replicate and experiment

Why It Matters

Democratizes small-scale LLM training from scratch, making it accessible on budget consumer GPUs.
