Covenant/Covenant-72B: Largest model yet trained on decentralized permissionless GPU nodes
The 72-billion-parameter Covenant-72B was trained across permissionless GPU nodes using a low-communication method that works around the bandwidth bottlenecks of decentralized training.
Covenant AI has achieved a significant milestone in decentralized AI by successfully training Covenant-72B, a 72-billion-parameter large language model, across a network of permissionless GPU nodes. This approach bypasses the need for expensive, centralized supercomputing clusters typically required for such massive models. The core technical breakthrough enabling this feat is their novel training method called SparseLoco.
SparseLoco is built on the existing DiLoCo (Distributed Low-Communication) framework but introduces critical optimizations to address the fundamental bandwidth bottleneck of decentralized training. It drastically reduces how often the distributed nodes need to synchronize. It also applies aggressive top-K sparsification, so only the largest-magnitude updates (the 'top K' values) are communicated between nodes, slashing data transfer requirements. Combined with a local AdamW optimizer on each node, this makes training a model of Covenant-72B's scale over a fragmented, decentralized network feasible.
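To make the pattern concrete, here is a minimal PyTorch sketch of this style of training: each node runs H local AdamW steps, then exchanges only a top-K-sparsified pseudo-gradient (its drift from the last synchronized weights). The constants H and K_FRACTION, the plain averaging outer step, and the dense all-reduce are illustrative assumptions for readability, not Covenant AI's published SparseLoco implementation, which would exchange only the surviving (index, value) pairs.

```python
import torch
import torch.distributed as dist

H = 500            # local steps between syncs (illustrative, not from the announcement)
K_FRACTION = 0.01  # fraction of pseudo-gradient entries kept (illustrative)

def topk_sparsify(delta: torch.Tensor, k_fraction: float) -> torch.Tensor:
    """Zero all but the k largest-magnitude entries of delta."""
    flat = delta.flatten()
    k = max(1, int(k_fraction * flat.numel()))
    _, idx = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.view_as(delta)

def train(model: torch.nn.Module, batches, outer_rounds: int) -> None:
    """DiLoCo-style loop: many cheap local steps, rare sparsified syncs."""
    inner_opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # local optimizer
    batches = iter(batches)
    for _ in range(outer_rounds):
        # Snapshot the last globally agreed-upon weights.
        synced = [p.detach().clone() for p in model.parameters()]

        # H purely local AdamW steps: zero network traffic in this phase.
        for _ in range(H):
            x, y = next(batches)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()
            inner_opt.step()
            inner_opt.zero_grad()

        # Pseudo-gradient = drift from the synced snapshot; sparsify, average
        # across nodes, and apply. A real system would transmit only the kept
        # (index, value) pairs; the dense all_reduce keeps the sketch short.
        for p, s in zip(model.parameters(), synced):
            delta = topk_sparsify(p.detach() - s, K_FRACTION)
            dist.all_reduce(delta, op=dist.ReduceOp.SUM)
            delta /= dist.get_world_size()
            p.data.copy_(s + delta)  # averaging stands in for the outer optimizer
```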
- Covenant-72B is a 72-billion-parameter model, making it one of the largest trained via decentralized methods.
- The SparseLoco method uses aggressive top-K sparsification and reduced sync frequency to overcome network bandwidth limits (see the back-of-envelope sketch after this list).
- Training ran on permissionless GPU nodes, challenging the need for centralized supercomputers such as NVIDIA's dedicated clusters.
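The bandwidth claim is easy to sanity-check with rough numbers. Assuming, purely for illustration, bf16 values and a 1% top-K fraction (neither figure comes from the announcement):

```python
# Rough per-sync communication cost for a 72B-parameter model.
params = 72e9                     # Covenant-72B parameter count
dense_gb = params * 2 / 1e9       # bf16 full sync: 2 bytes/param -> 144 GB
kept = 0.01 * params              # entries surviving top-K at an assumed 1%
sparse_gb = kept * (2 + 4) / 1e9  # 2-byte value + 4-byte index per kept entry
print(dense_gb, sparse_gb)        # 144.0 vs ~4.3 GB per sync, before the further
                                  # savings from syncing only every H local steps
```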
Why It Matters
This demonstrates that large-scale AI training can be decentralized, lowering costs and barriers to entry for model development.