Covenant/Covenant-72B: Largest model yet trained on decentralized permissionless GPU nodes
The 72-billion-parameter Covenant-72B was trained across permissionless GPU nodes using a low-communication method that works around the bandwidth bottlenecks of decentralized training.
Covenant AI has achieved a significant milestone in decentralized AI by successfully training Covenant-72B, a 72-billion-parameter large language model, across a network of permissionless GPU nodes. This approach bypasses the need for expensive, centralized supercomputing clusters typically required for such massive models. The core technical breakthrough enabling this feat is their novel training method called SparseLoco.
SparseLoco is built on the existing DiLoCo (Distributed Low-Communication) framework but introduces critical optimizations to address the fundamental bandwidth bottleneck of decentralized training. It drastically reduces how often the distributed nodes need to synchronize. It also applies aggressive top-K sparsification, so only the largest-magnitude updates (the 'top K' values) are communicated between nodes, slashing data transfer requirements. Combined with a local AdamW optimizer on each node, this makes training a model of Covenant-72B's scale over a fragmented, decentralized network feasible.
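To make the pattern concrete, here is a minimal PyTorch sketch of this style of training: each node runs H local AdamW steps, then exchanges only a top-K-sparsified pseudo-gradient (its drift from the last synchronized weights). The constants H and K_FRACTION, the plain averaging outer step, and the dense all-reduce are illustrative assumptions for readability, not Covenant AI's published SparseLoco implementation, which would exchange only the surviving (index, value) pairs.

```python
import torch
import torch.distributed as dist

H = 500            # local steps between syncs (illustrative, not from the announcement)
K_FRACTION = 0.01  # fraction of pseudo-gradient entries kept (illustrative)

def topk_sparsify(delta: torch.Tensor, k_fraction: float) -> torch.Tensor:
    """Zero all but the k largest-magnitude entries of delta."""
    flat = delta.flatten()
    k = max(1, int(k_fraction * flat.numel()))
    _, idx = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.view_as(delta)

def train(model: torch.nn.Module, batches, outer_rounds: int) -> None:
    """DiLoCo-style loop: many cheap local steps, rare sparsified syncs."""
    inner_opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # local optimizer
    batches = iter(batches)
    for _ in range(outer_rounds):
        # Snapshot the last globally agreed-upon weights.
        synced = [p.detach().clone() for p in model.parameters()]

        # H purely local AdamW steps: zero network traffic in this phase.
        for _ in range(H):
            x, y = next(batches)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()
            inner_opt.step()
            inner_opt.zero_grad()

        # Pseudo-gradient = drift from the synced snapshot; sparsify, average
        # across nodes, and apply. A real system would transmit only the kept
        # (index, value) pairs; the dense all_reduce keeps the sketch short.
        for p, s in zip(model.parameters(), synced):
            delta = topk_sparsify(p.detach() - s, K_FRACTION)
            dist.all_reduce(delta, op=dist.ReduceOp.SUM)
            delta /= dist.get_world_size()
            p.data.copy_(s + delta)  # averaging stands in for the outer optimizer
```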
- Covenant-72B is a 72-billion-parameter model, making it one of the largest trained via decentralized methods.
- The SparseLoco method uses aggressive top-K sparsification and reduced sync frequency to overcome network bandwidth limits (see the back-of-envelope sketch after this list).
- Training ran on permissionless GPU nodes, challenging the need for centralized supercomputers such as NVIDIA's dedicated clusters.
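The bandwidth claim is easy to sanity-check with rough numbers. Assuming, purely for illustration, bf16 values and a 1% top-K fraction (neither figure comes from the announcement):

```python
# Rough per-sync communication cost for a 72B-parameter model.
params = 72e9                     # Covenant-72B parameter count
dense_gb = params * 2 / 1e9       # bf16 full sync: 2 bytes/param -> 144 GB
kept = 0.01 * params              # entries surviving top-K at an assumed 1%
sparse_gb = kept * (2 + 4) / 1e9  # 2-byte value + 4-byte index per kept entry
print(dense_gb, sparse_gb)        # 144.0 vs ~4.3 GB per sync, before the further
                                  # savings from syncing only every H local steps
```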
Why It Matters
This demonstrates that large-scale AI training can be decentralized, lowering costs and barriers to entry for model development.