Open Source

Undergraduate student launches Dhi-5B, trained from scratch

This student-built model takes on far better-funded models with a tenth of the budget.

Deep Dive

An undergraduate student has built Dhi-5B, a 5-billion-parameter multimodal language model trained from scratch for only $1,200. It was trained with a custom codebase using techniques such as FlashAttention-3, in five stages that include pre-training on 40B tokens and a vision-extension stage. The 4B-parameter base variant is available now, with instruct and vision versions coming soon. Early evaluations show it competing with models that cost about 10x more, which points to unusually high compute efficiency.
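As a rough sanity check on these figures, the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token (C ≈ 6ND) gives an order-of-magnitude compute estimate. This is a minimal sketch under stated assumptions, not the author's accounting: it assumes the 6ND rule applies to this model, that the 40B-token pre-training stage dominates compute, and that the whole $1,200 is attributed to those tokens. None of these are stated in the source, and the helper names are hypothetical.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate dense-transformer training compute via the C ~ 6*N*D rule of thumb."""
    return 6.0 * params * tokens


def dollars_per_billion_tokens(budget_usd: float, tokens: float) -> float:
    """Spread the whole budget over the token count (assumption: all spend goes to these tokens)."""
    return budget_usd / (tokens / 1e9)


# Figures from the article; which parameter count pre-training used is not stated,
# so both the 4B base and the 5B multimodal sizes are shown.
TOKENS = 40e9
BUDGET_USD = 1200.0

for label, n_params in [("4B base", 4e9), ("5B multimodal", 5e9)]:
    print(f"{label}: ~{training_flops(n_params, TOKENS):.2e} FLOPs")

print(f"~${dollars_per_billion_tokens(BUDGET_USD, TOKENS):.0f} per billion tokens")
```

Under those assumptions, pre-training lands on the order of 10^21 FLOPs, and the budget works out to roughly $30 per billion tokens. That per-token figure is an upper bound for pre-training alone, since the $1,200 also covers the other four stages.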

Why It Matters

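It dramatically lowers the barrier to state-of-the-art AI development: if a student can train a competitive multimodal model from scratch for about $1,200, building such models is no longer limited to well-funded labs, which could democratize model creation.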