Research & Papers

Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

New distributed training system hits 1.2 EFLOPS, compressing weeks of training to just hours.

Deep Dive

A team of researchers has broken a major bottleneck in AI-for-Science by developing a system that trains billion-parameter physics models orders of magnitude faster. They introduce two key innovations: MatRIS-MoE, a large Mixture-of-Experts model designed as a universal Machine Learning Interatomic Potential (uMLIP), and Janus, a high-dimensional distributed training framework engineered specifically for these models. uMLIPs are foundation models pre-trained on diverse datasets to perform quantum-accurate simulations of materials and molecules, but training them has been prohibitively slow: the training loss is fit to atomic forces, which are themselves derivatives of the predicted energy, so optimization requires second-order derivatives, and the computational overhead grows rapidly with model and system size.
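To see concretely where the second-order derivatives come from: an interatomic potential predicts an energy, the forces are its negative spatial gradient, and fitting those forces means differentiating through that gradient with respect to the model's parameters. Below is a minimal toy sketch using a 1D harmonic energy and finite differences; the function names and the finite-difference approach are illustrative assumptions, not from the paper.

```python
# Toy 1D pair energy E(x; k) = 0.5 * k * x^2 (hypothetical model with
# a single trainable parameter k).

def energy(x, k):
    return 0.5 * k * x * x

def force(x, k, h=1e-5):
    # Physics: F = -dE/dx, here via a central finite difference.
    return -(energy(x + h, k) - energy(x - h, k)) / (2 * h)

def force_loss(k, x=1.0, f_ref=-2.0):
    # Squared error between the predicted and a reference force.
    return (force(x, k) - f_ref) ** 2

def dloss_dk(k, h=1e-5):
    # Training gradient dLoss/dk: because the loss already contains a
    # derivative in x, this is a mixed second-order derivative of the
    # energy -- the step that makes uMLIP training expensive at scale.
    return (force_loss(k + h) - force_loss(k - h)) / (2 * h)
```

With E = 0.5*k*x^2 the force at x = 1 is -k, so the loss is (2 - k)^2 and its gradient at k = 1 is -2; frameworks like PyTorch or JAX compute the same quantity with nested automatic differentiation rather than finite differences.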

Janus tackles this with hardware-aware optimizations that let it run efficiently across two exascale supercomputers. The system reached a peak of 1.2 EFLOPS in single precision, representing 24-35.5% of theoretical peak, while sustaining over 90% parallel efficiency. That performance leap compresses the training timeline for a billion-parameter uMLIP from weeks to hours. The work sets a new benchmark for AI4S foundation models and supplies the high-performance computing infrastructure needed to accelerate discoveries in chemistry, materials science, and drug design by making rapid, large-scale physical simulation feasible.
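The headline figures can be sanity-checked with simple arithmetic: the reported fraction-of-peak range implies the combined theoretical peak of the machines, and parallel efficiency is just achieved throughput relative to perfect linear scaling. A small sketch, using the article's numbers; the node counts in the usage note are invented for illustration.

```python
# Reported figures (EFLOPS = 10^18 floating-point operations per second).
sustained_eflops = 1.2
frac_low, frac_high = 0.24, 0.355   # reported fraction of theoretical peak

# Implied theoretical peak, back-solved from the reported fractions.
implied_peak_high = sustained_eflops / frac_low    # 5.0 EFLOPS
implied_peak_low = sustained_eflops / frac_high    # ~3.4 EFLOPS

def parallel_efficiency(throughput_n_nodes, n_nodes, throughput_one_node):
    """Weak-scaling efficiency: throughput achieved on n nodes divided
    by perfect linear scaling of the single-node throughput."""
    return throughput_n_nodes / (n_nodes * throughput_one_node)
```

For example, `parallel_efficiency(920.0, 1000, 1.0)` gives 0.92, i.e. the >90% regime the paper reports: a thousand nodes deliver 92% of the throughput that perfect scaling would predict.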

Key Points
  • Introduced Janus, a first-of-its-kind distributed training framework for universal Machine Learning Interatomic Potentials (uMLIPs), solving the lack of parallel frameworks for models requiring second-order derivatives.
  • Achieved 1.2 EFLOPS performance on Exascale supercomputers with >90% parallel efficiency, compressing billion-parameter model training from weeks to hours.
  • The MatRIS-MoE model and Janus framework together set a new high-water mark for AI-for-Science foundation models, enabling rapid, quantum-accurate simulations across the periodic table.
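For readers unfamiliar with the Mixture-of-Experts design behind MatRIS-MoE: a learned router activates only a small subset of the model's experts for each input, which is how a billion-parameter total can keep per-sample compute affordable. The following is a generic top-1 routing sketch in plain Python, not MatRIS-MoE's actual architecture; the gate and experts are deliberately toy-sized.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_weights, experts):
    # Toy linear gate: one scalar weight per expert.
    logits = [w * x for w in gate_weights]
    probs = softmax(logits)
    top = max(range(len(experts)), key=lambda i: probs[i])
    # Only the selected expert runs; its output is scaled by the
    # gate probability. All other experts' parameters stay idle.
    return probs[top] * experts[top](x), top
```

Real MoE layers use learned gating networks, top-k routing, and load-balancing losses; this stripped-down version only shows the conditional-computation pattern that keeps sparse models cheap at inference and training time.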

Why It Matters

This breakthrough drastically accelerates the discovery of new materials, chemicals, and pharmaceuticals by making high-fidelity atomic-scale simulation practical.