Training a 144M Spiking Neural Network for text generation from scratch — no transformer teacher, no distillation
A novel 144M-parameter SNN language model trained for just $10 shows emergent 97% sparsity and unique interpretability.
An independent researcher has trained Nord AI, a 144-million-parameter Spiking Neural Network (SNN) language model, from scratch for approximately $10 on a rented NVIDIA A5000 GPU. The researcher describes it as only the second SNN language model trained from scratch (after SpikeGPT), built on a novel architecture based on neither transformers nor RWKV. The model was trained on the FineWeb-Edu dataset, and its development challenges the high-cost paradigm of modern AI by demonstrating that alternative, brain-inspired architectures can be explored with minimal resources. The code is openly available on GitHub and the model on Hugging Face, and the researcher is inviting feedback from the neuromorphic computing community.
Technically, Nord AI exhibits several striking properties that emerge during training without any explicit regularization. It reaches 97-98% inference sparsity, meaning only 2-3% of neurons fire per token, which could translate into large efficiency gains on neuromorphic hardware. Early qualitative comparisons with the 124M-parameter GPT-2 Small suggest Nord AI has an edge in topic coherence, possibly because sparse activation acts as a relevance filter. Crucially, the SNN architecture provides inherent interpretability: spike-rate analysis shows which model blocks are active while processing a prompt (e.g., Block 4 firing at 9.8% for core processing vs. Block 0 at 0.6% for noise filtering); a minimal probe for such statistics is sketched below. The model also incorporates Spike-Timing-Dependent Plasticity (STDP), a biological learning rule that could enable online learning during conversation. Text fluency currently lags GPT-2 and the training loss is still at 4.5 (if that is token-level cross-entropy in nats, it corresponds to a perplexity of roughly 90), but this proof of concept opens a new path toward efficient, interpretable, and biologically plausible language models.
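To make the sparsity and per-block spike-rate claims concrete, here is a minimal probe sketch. The Nord AI codebase is not reproduced here, so the `spike_stats` helper, the tensor shapes, and the synthetic six-block setup are assumptions for illustration: the sketch simply treats each block's output as a binary spike tensor and reports firing rates.

```python
import torch

def spike_stats(spikes_per_block):
    """Per-block firing rates and overall sparsity from binary spike tensors.

    spikes_per_block: list of tensors, one per block, each of shape
    (batch, seq_len, num_neurons) with entries in {0, 1}.
    """
    rates = [s.float().mean().item() for s in spikes_per_block]
    sparsity = 1.0 - sum(rates) / len(rates)
    return rates, sparsity

# Synthetic demo (hypothetical shapes, not Nord AI's actual code):
# 6 blocks, 512 neurons, roughly 2-3% of neurons firing per token.
torch.manual_seed(0)
blocks = [(torch.rand(1, 128, 512) < 0.025).int() for _ in range(6)]
rates, sparsity = spike_stats(blocks)
for i, r in enumerate(rates):
    print(f"Block {i}: {100 * r:.2f}% firing")
print(f"Overall sparsity: {100 * sparsity:.1f}%")
```

A probe like this is what makes the reported per-block numbers (9.8% vs. 0.6%) directly measurable from spike activity rather than inferred from attention maps or gradients.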
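The post does not specify which STDP variant Nord AI uses; the sketch below shows the standard pair-based rule, in which a synapse is strengthened when the presynaptic neuron fires shortly before the postsynaptic one and weakened in the reverse order. All constants (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`) are illustrative textbook defaults, not values from the model.

```python
import math

def stdp_delta_w(dt_ms, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight update.

    dt_ms = t_post - t_pre in milliseconds. Positive dt (pre fires
    before post) potentiates the synapse; negative dt depresses it,
    each with an exponentially decaying window.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    elif dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0

for dt in (-40, -10, -1, 1, 10, 40):
    print(f"dt = {dt:+4d} ms -> dw = {stdp_delta_w(dt):+.5f}")
```

Because the update depends only on local spike timing rather than a global gradient, a rule of this form is what makes weight adjustment during inference, i.e. online learning mid-conversation, plausible.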
- Trained a 144M-parameter Spiking Neural Network language model from scratch for only ~$10 on a rented GPU.
- Achieves 97-98% inference sparsity naturally, with no explicit sparsity loss term, promising major efficiency gains.
- Offers unique interpretability via spike-rate analysis and uses a biological STDP rule for potential online learning.
Why It Matters
Pioneers a low-cost, efficient, and interpretable path for AI that diverges from expensive transformer models and maps naturally onto neuromorphic hardware.