Tiny 25M model trained from scratch on 8GB VRAM using new open-source repo
A developer built an open-source project to train a 25M parameter model on just 8GB VRAM.
Developer epoyraz (u/tevlon) has released train-a-model-from-scratch, an open-source project aimed at making model training more accessible. The repo provides a full pipeline for training a language model on the TinyStories dataset from scratch using just 8GB of VRAM. The resulting 25M-parameter model is not a full LLM but a working demonstration that transformers can be built on budget consumer GPUs. The trained weights are available on HuggingFace for anyone to test.
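To put the 25M-parameter figure in context against an 8GB card, here is a rough back-of-envelope estimate of the memory needed for weights, gradients, and optimizer state. It is only a sketch: it assumes fp32 weights and gradients and an Adam-style optimizer that keeps two fp32 moments per parameter, none of which the source confirms for this repo. The function name `training_state_bytes` is made up for illustration.

```python
def training_state_bytes(n_params,
                         bytes_per_weight=4,           # assumption: fp32 weights
                         bytes_per_grad=4,             # assumption: fp32 gradients
                         optimizer_bytes_per_param=8): # assumption: Adam, two fp32 moments
    """Estimate memory for weights + gradients + optimizer state (activations excluded)."""
    return n_params * (bytes_per_weight + bytes_per_grad + optimizer_bytes_per_param)


n_params = 25_000_000  # model size reported in the article
state = training_state_bytes(n_params)
print(f"{state / 1e6:.0f} MB")  # prints "400 MB"
```

Under these assumptions the persistent training state is a small share of 8GB. Most of the remaining budget goes to activations, which scale with batch size and sequence length and are not modeled here.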
During development, epoyraz tested several optimizations: mHC was too small for the task, BitNet showed no memory savings during training, and TurboQuant proved unnecessary. Multi-Token Prediction (MTP), by contrast, did help the model learn to predict several tokens at a time, though it slowed overall training. The project is aimed at hobbyists and researchers who want to understand every step of training a small transformer without expensive hardware. It also lays a foundation for future community-driven improvements in low-resource training.
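The article does not describe how the repo implements MTP, so the following is only a generic sketch of the idea: instead of predicting just the next token, each position gets k prediction heads, one for each of the next k tokens, and the losses are averaged. The helper names (`mtp_targets`, `mtp_loss`) and the plain-Python data layout are hypothetical, chosen for readability rather than speed.

```python
import math


def mtp_targets(tokens, k):
    """For each position t, return the next k tokens (t+1 .. t+k)."""
    return [tokens[t + 1 : t + 1 + k] for t in range(len(tokens) - k)]


def softmax_cross_entropy(logits, target):
    """Cross-entropy of one target token under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mtp_loss(head_logits, targets):
    """Average cross-entropy over all positions and all k heads.

    head_logits[t][j] is the logit vector of head j at position t;
    targets[t][j] is the token that head should predict (token t+1+j).
    """
    total, count = 0.0, 0
    for pos_logits, pos_targets in zip(head_logits, targets):
        for logits, target in zip(pos_logits, pos_targets):
            total += softmax_cross_entropy(logits, target)
            count += 1
    return total / count


tokens = [5, 2, 7, 1, 3]          # toy sequence, vocabulary size 8
targets = mtp_targets(tokens, k=2)
# Untrained stand-in: every head outputs uniform logits over the vocabulary.
uniform = [[[0.0] * 8 for _ in range(2)] for _ in targets]
loss = mtp_loss(uniform, targets)  # equals log(8) for uniform predictions
```

With k heads, every training step evaluates k output projections and k loss terms rather than one. That extra per-step work is one plausible reason for the slower training reported above, although the source does not state the cause.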
- 25M parameter TinyStories model trained from scratch on only 8GB VRAM
- Multi-Token Prediction (MTP) technique integrated despite slower training
- Open-source GitHub repo enables others to replicate and experiment
Why It Matters
Makes from-scratch training of small language models accessible on budget consumer GPUs.