Leveraging SIMD for Accelerating Large-number Arithmetic

New algorithm restructures large-number arithmetic for SIMD-capable CPUs, speeding up cryptographic and scientific workloads.

Deep Dive

Researchers Subhrajit Das, Abhishek Bichhawat, and Yuvraj Patel from IIT Jodhpur and IIT Delhi have introduced DigitsOnTurbo (DoT), an approach to accelerating large-number arithmetic by restructuring computations around independent, data-parallel operations. Traditional large-number algorithms have inherent dependencies between steps (the carry chain in addition is a familiar example), which limits how well they map to SIMD (single instruction, multiple data) units on modern CPUs. DoT reworks the arithmetic pipeline so that more of the work can run in parallel. The reported results: up to 1.85x speedup for addition and subtraction and up to 2.3x for multiplication compared to prior SIMD implementations.

When integrated into state-of-the-art libraries, DoT delivers up to 4x speedup for addition and subtraction and up to 2x for multiplication. These improvements carry over to real-world workloads: up to 19.3% higher end-to-end throughput for scientific computations, and up to 7.9% lower latency and up to 5.9% higher throughput for cryptographic implementations. The work, published on arXiv, targets a critical bottleneck in high-performance computing and cryptography, where large-number arithmetic is foundational.
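
To make the idea of "independent, data-parallel operations" concrete, here is a minimal sketch of the general problem, not the DoT algorithm itself (this summary does not describe DoT's internals). It contrasts schoolbook addition, where every limb waits on the carry from the previous one, with a well-known two-phase formulation in which the per-limb work is independent and the carries are resolved with a single integer addition over small bitmasks. The 32-bit limb layout and all function names are assumptions chosen for illustration.

```python
# Illustrative only: this is NOT the DoT algorithm from the paper.
# Numbers are little-endian lists of 32-bit limbs (an assumed layout).

BASE_BITS = 32
MASK = (1 << BASE_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative int into `count` little-endian 32-bit limbs."""
    return [(n >> (BASE_BITS * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Recombine little-endian limbs into a Python int."""
    return sum(limb << (BASE_BITS * i) for i, limb in enumerate(limbs))


def add_ripple(a, b):
    """Schoolbook addition: each limb depends on the previous limb's carry."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> BASE_BITS
    return out, carry


def add_two_phase(a, b):
    """Independent per-limb sums, then carry resolution on bitmasks."""
    n = len(a)

    # Phase 1: no limb depends on another, so this maps to vector operations.
    raw = [x + y for x, y in zip(a, b)]
    sums = [r & MASK for r in raw]

    # Bit i of `gen`: limb i overflows on its own.
    # Bit i of `prop`: limb i is all ones, so it forwards an incoming carry.
    gen = sum((r >> BASE_BITS) << i for i, r in enumerate(raw))
    prop = sum(int(s == MASK) << i for i, s in enumerate(sums))

    # Phase 2: one integer add ripples carries through the propagate bits.
    # Bit i of `carry_mask` is the carry into limb i; bit n is the carry out.
    carry_mask = ((gen << 1) + prop) ^ prop

    # Phase 3: apply the carries, again independently per limb.
    out = [(s + ((carry_mask >> i) & 1)) & MASK for i, s in enumerate(sums)]
    return out, (carry_mask >> n) & 1
```

For example, adding 1 to 2**128 - 1 with four limbs gives four zero limbs and a carry-out of 1 in both versions. In a real SIMD implementation, phases 1 and 3 would each be a handful of vector instructions and the masks would come from vector comparisons; the Python lists here only stand in for vector registers.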

Key Points
  • DigitsOnTurbo (DoT) restructures large-number arithmetic for SIMD parallelism, achieving up to 1.85x speedup for addition/subtraction and up to 2.3x for multiplication over prior SIMD methods.
  • When integrated into state-of-the-art libraries, DoT yields up to 4x speedup for addition/subtraction and up to 2x for multiplication.
  • End-to-end gains include up to 19.3% higher throughput for scientific computing, and up to 7.9% lower latency and up to 5.9% higher throughput for cryptography.

Why It Matters

Large-number arithmetic is foundational to cryptography and scientific computing, so restructuring it to use the SIMD units of modern CPUs translates into measurable end-to-end gains in both areas.