I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R]
An 18-year-old developer trained a 1.088B-parameter SNN from scratch, a scale at which prior work had reported direct SNN training to fail.
An 18-year-old independent developer has achieved a significant milestone in neuromorphic computing by successfully training a 1.088-billion-parameter Spiking Neural Network (SNN) from scratch. This challenges the prevailing assumption in research papers like SpikeBERT that training SNNs of this scale directly from random initialization fails due to vanishing gradients, forcing researchers to rely on ANN-to-SNN conversion or distillation techniques. The developer trained the model for 27,000 steps, reaching a loss of 4.4, before exhausting their computational budget.
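For context, direct SNN training is hard because the spike threshold is non-differentiable, so gradient-based training typically substitutes a surrogate gradient in the backward pass. The sketch below shows that standard trick in PyTorch with a leaky integrate-and-fire (LIF) layer; the class names, surrogate shape, and constants are illustrative assumptions, not the author's released architecture.

```python
import torch

# Minimal sketch of surrogate-gradient training for a spiking layer.
# Hypothetical names; NOT the released code, just the standard trick that
# makes direct SNN training feasible: the hard spike function has zero
# gradient almost everywhere, so the backward pass uses a smooth surrogate.

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        return (membrane_potential > 0).float()          # hard threshold spike

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Fast-sigmoid surrogate: large near threshold, small far from it,
        # so gradients can still flow through deep stacks of spiking layers.
        surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2
        return grad_output * surrogate


class LIFLayer(torch.nn.Module):
    """Leaky integrate-and-fire layer unrolled over timesteps."""

    def __init__(self, in_features, out_features, beta=0.9, threshold=1.0):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)
        self.beta, self.threshold = beta, threshold

    def forward(self, x):                                 # x: (T, B, in_features)
        mem = torch.zeros(x.size(1), self.linear.out_features, device=x.device)
        spikes = []
        for t in range(x.size(0)):
            mem = self.beta * mem + self.linear(x[t])     # leaky integration
            spk = SpikeFn.apply(mem - self.threshold)     # non-differentiable fire
            mem = mem - spk * self.threshold              # soft reset after a spike
            spikes.append(spk)
        return torch.stack(spikes)                        # (T, B, out_features)
```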
The model exhibited notable emergent properties. It maintained roughly 93% activation sparsity, meaning only about 7% of neurons fired per token, which implies far lower memory and energy use at inference than dense models such as GPT. Around step 25,000, it began producing structurally correct Russian text without being explicitly targeted at that language, an unexpected form of cross-lingual generalization. And once the architecture scaled past 600 million parameters, the network routed 39% of its activations into a persistent memory module on its own, suggesting it learned to exploit that pathway as capacity grew.
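The two headline numbers are simple per-token statistics. As a hedged illustration of how they could be computed from recorded activations (the tensor layout and the `memory_gate` name are assumptions, not the post's actual instrumentation):

```python
import torch

# Back-of-envelope computation of the two statistics quoted above from a
# recorded spike tensor. Shapes and the `memory_gate` name are illustrative.

def firing_sparsity(spikes: torch.Tensor) -> float:
    """spikes: binary tensor (timesteps, batch, neurons); returns fraction silent."""
    firing_rate = spikes.float().mean().item()   # fraction of (neuron, step) pairs that fired
    return 1.0 - firing_rate                     # ~0.93 would mean ~7% of neurons fire per token

def memory_routing_fraction(memory_gate: torch.Tensor) -> float:
    """memory_gate: gate values in [0, 1] deciding whether an activation is
    written to the persistent memory module instead of the feed-forward path."""
    return (memory_gate > 0.5).float().mean().item()   # ~0.39 in the reported run
```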
While the text generation is not yet fluent and the loss is high due to the truncated training, this experiment serves as a critical proof-of-concept. It validates that pure, large-scale SNNs can converge, opening the door for more research into their direct training. The developer has open-sourced the full 12GB training checkpoint, code, and architecture details, inviting collaboration to refine surrogate gradients and explore deployment on neuromorphic hardware platforms.
- Trained a 1.088B-parameter Spiking Neural Network (SNN) from random initialization, a regime that prior work (e.g., SpikeBERT) reported to fail due to vanishing gradients.
- Achieved 93% sparsity (only a 7% neuron firing rate), promising large gains in inference efficiency and memory usage over traditional dense models (see the rough estimate after this list).
- Demonstrated emergent behaviors: cross-lingual Russian text generation and autonomous re-routing of 39% of activations to memory at scale.
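To put the sparsity figure in perspective, here is a rough, assumption-laden estimate (not a benchmark) of effective work per token if an event-driven backend only performs synaptic operations for neurons that actually spike:

```python
# Rough illustration of why 93% sparsity matters for inference cost.
# Assumes the common ~2 ops per parameter per token estimate for dense models;
# on event-driven hardware, effective operations scale with the firing rate.

params = 1.088e9                           # model size reported in the post
dense_ops_per_token = 2 * params           # dense baseline estimate
firing_rate = 0.07                         # ~7% of neurons fire per token
sparse_ops_per_token = dense_ops_per_token * firing_rate

print(f"dense:  {dense_ops_per_token:.2e} ops/token")
print(f"sparse: {sparse_ops_per_token:.2e} ops/token "
      f"(~{dense_ops_per_token / sparse_ops_per_token:.0f}x fewer)")
```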
Why It Matters
This breakthrough could enable ultra-low-power AI inference on specialized neuromorphic chips, drastically reducing the cost and energy footprint of large language models.