LinkedIn's DuaLip-GPU: PyTorch powers 10x faster extreme-scale optimization
LinkedIn rebuilt its optimization solver with GPU-accelerated PyTorch, achieving order-of-magnitude speedups.
LinkedIn's DuaLip was a distributed solver for linear programs (LPs) that optimized decisions like job matching, metric balancing, and email volume. These problems involve hundreds of millions of users and trillions of decision variables. The original implementation ran on a CPU-bound Scala/Spark stack, which limited hardware utilization and made it difficult to extend to new problem formulations. To overcome these bottlenecks, LinkedIn's team rebuilt the core execution engine in PyTorch, taking advantage of its GPU acceleration and flexible define-by-run paradigm. The result, DuaLip-GPU, replaces expensive matrix factorizations with first-order methods that rely on sparse matrix-vector operations and blockwise projections, all optimized for multi-GPU clusters.
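To make "sparse matrix-vector operations and blockwise projections" concrete, here is a minimal single-device sketch of a first-order dual method in PyTorch. It is not LinkedIn's code. It assumes a ridge-regularized LP (minimize c·x + (gamma/2)·||x||² subject to Ax ≤ b, with each user's block of x constrained to the probability simplex) and plain projected gradient ascent on the dual. The function names `project_simplex_rows` and `dual_ascent`, the parameters `gamma`, `step`, and `iters`, and the toy data are all hypothetical.

```python
import torch

def project_simplex_rows(v):
    """Project every row of v onto the probability simplex (sort-based, batched)."""
    n = v.shape[1]
    u, _ = torch.sort(v, dim=1, descending=True)
    css = u.cumsum(dim=1) - 1.0
    k = torch.arange(1, n + 1, dtype=v.dtype, device=v.device)
    # The condition holds for a prefix of each sorted row, so the count is the pivot.
    rho = ((u - css / k) > 0).sum(dim=1, keepdim=True)
    tau = css.gather(1, rho - 1) / rho.to(v.dtype)
    return torch.clamp(v - tau, min=0.0)

def dual_ascent(A, b, c, num_blocks, gamma=0.1, step=0.05, iters=200):
    """Projected gradient ascent on the dual of the assumed ridge-regularized LP."""
    At = A.t().coalesce()  # sparse transpose, reused every iteration
    lam = torch.zeros(A.shape[0], dtype=c.dtype, device=c.device)

    def primal(lam):
        # Sparse mat-vec A^T lam, then one independent projection per block.
        z = -(c + torch.sparse.mm(At, lam.unsqueeze(1)).squeeze(1)) / gamma
        return project_simplex_rows(z.reshape(num_blocks, -1)).reshape(-1)

    for _ in range(iters):
        x = primal(lam)
        # Dual gradient is the constraint residual A x - b (second sparse mat-vec).
        grad = torch.sparse.mm(A, x.unsqueeze(1)).squeeze(1) - b
        lam = torch.clamp(lam + step * grad, min=0.0)  # keep multipliers nonnegative
    return primal(lam), lam

# Toy problem (assumed data): 3 users x 2 items, item 0 has total capacity 1.5.
indices = torch.tensor([[0, 0, 0], [0, 2, 4]])
A = torch.sparse_coo_tensor(indices, torch.ones(3), (1, 6)).coalesce()
b = torch.tensor([1.5])
c = torch.tensor([-1.0, -0.2, -0.9, -0.3, -0.8, -0.4])  # negative cost = value
x, lam = dual_ascent(A, b, c, num_blocks=3)
```

Each iteration uses only two sparse matrix-vector products and a batched projection, with no matrix factorization, which is the kind of workload that maps well onto GPUs. A multi-GPU version would additionally need to shard the blocks across devices, which this sketch does not attempt.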
Initial benchmarks show order-of-magnitude speedups compared to the CPU baseline, with efficient scaling across multiple GPUs and a 50% reduction in engineering overhead for new constraints. By exposing the hot path as an explicit dataflow over PyTorch tensors, the system retains the mathematical guarantees of linear programming while gaining the performance and flexibility of modern deep learning frameworks. This approach demonstrates that PyTorch can serve as a powerful engine for non-deep-learning optimization at web scale, potentially influencing how other platforms tackle similar extreme-scale decision problems.
- DuaLip-GPU replaces LinkedIn's CPU-bound Scala/Spark solver with PyTorch for GPU acceleration.
- Handles linear programs with up to trillions of decision variables and sparse constraints.
- Achieves order-of-magnitude speedups and efficient multi-GPU scaling while reducing engineering complexity.
Why It Matters
Shows how PyTorch can accelerate non-deep-learning optimization at web scale, reducing cost and latency for LinkedIn's decision systems.