LinkedIn's DuaLip-GPU: PyTorch powers 10x faster extreme-scale optimization

PyTorch Blog June 01, 2026

⚡LinkedIn rebuilt its optimization solver with GPU-accelerated PyTorch, achieving order-of-magnitude speedups.

Deep Dive

LinkedIn's original DuaLip was a distributed solver for linear programs (LPs) that optimized decisions such as job matching, metric balancing, and email volume. These problems involve hundreds of millions of users and trillions of decision variables. The original implementation ran on a CPU-bound Scala/Spark stack, which limited hardware utilization and made the solver difficult to extend to new problem formulations. To overcome these bottlenecks, LinkedIn's team rebuilt the core execution engine in PyTorch, taking advantage of its GPU acceleration and flexible define-by-run paradigm. The result, DuaLip-GPU, replaces expensive matrix factorizations with first-order methods that rely on sparse matrix-vector operations and blockwise projections, all optimized for multi-GPU clusters.
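
To make the phrase "sparse matrix-vector operations and blockwise projections" concrete, here is a minimal sketch of one first-order scheme of that general shape: projected dual gradient ascent on a ridge-regularized LP. It is an illustration under stated assumptions, not DuaLip-GPU's code or API. The problem form (minimize c·x + (gamma/2)·||x||² subject to Ax ≤ b, with each user's block of x constrained to x ≥ 0 and sum(x) ≤ 1), the function names `project_simplex_leq` and `dual_ascent`, and every parameter value are hypothetical. It uses plain Python lists so that it runs without dependencies; in a tensor framework the two loops over `A` would become sparse matrix-vector products and the per-block loop a batched projection.

```python
from typing import List, Tuple

# Sparse constraint matrix as (row, col, value) triples (hypothetical layout).
Triples = List[Tuple[int, int, float]]


def project_simplex_leq(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {x >= 0, sum(x) <= 1}."""
    clipped = [max(t, 0.0) for t in v]
    if sum(clipped) <= 1.0:
        return clipped
    # Otherwise the answer lies on {x >= 0, sum(x) = 1}: sort and threshold.
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        running += uk
        t = (running - 1.0) / k
        if uk - t > 0:
            theta = t
    return [max(t - theta, 0.0) for t in v]


def dual_ascent(
    c: List[float],
    A: Triples,
    b: List[float],
    blocks: List[Tuple[int, int]],
    gamma: float = 0.1,
    step: float = 0.05,
    iters: int = 500,
) -> Tuple[List[float], List[float]]:
    """Projected gradient ascent on the dual of the regularized LP (sketch)."""
    lam = [0.0] * len(b)  # one multiplier per coupling constraint
    x = [0.0] * len(c)
    for _ in range(iters):
        # Sparse transpose product: g = c + A^T lam
        g = list(c)
        for i, j, a in A:
            g[j] += a * lam[i]
        # Blockwise projection: each block is updated independently
        for start, end in blocks:
            x[start:end] = project_simplex_leq(
                [-gj / gamma for gj in g[start:end]]
            )
        # Sparse matrix-vector product: Ax
        Ax = [0.0] * len(b)
        for i, j, a in A:
            Ax[i] += a * x[j]
        # Ascent step on the multipliers, kept nonnegative
        lam = [max(l + step * (r - bi), 0.0) for l, r, bi in zip(lam, Ax, b)]
    return x, lam
```

As a toy usage, take two users with two items each, costs `c = [-1.0, -0.5, -0.8, -0.1]`, blocks `[(0, 2), (2, 4)]`, and a single capacity constraint `A = [(0, 0, 1.0), (0, 2, 1.0)]`, `b = [1.0]`, which says the first item can be given out at most once in total. Calling `dual_ascent(c, A, b, blocks)` with the defaults settles near `x = [0, 1, 1, 0]` with a multiplier of about 0.6: the contested item goes to the second user, and the first user falls back to the other item. No matrix is ever factorized; each iteration is two passes over the nonzeros of `A` plus independent per-block projections, which is the property that makes this family of methods amenable to GPU parallelism.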

Initial benchmarks show order-of-magnitude speedups compared to the CPU baseline, with efficient scaling across multiple GPUs and a 50% reduction in engineering overhead for new constraints. By exposing the hot path as an explicit dataflow over PyTorch tensors, the system retains the mathematical guarantees of linear programming while gaining the performance and flexibility of modern deep learning frameworks. This approach demonstrates that PyTorch can serve as a powerful engine for non-deep-learning optimization at web scale, potentially influencing how other platforms tackle similar extreme-scale decision problems.

Key Points

DuaLip-GPU replaces LinkedIn's CPU-bound Scala/Spark solver with PyTorch for GPU acceleration.
Handles linear programs with up to trillions of decision variables and sparse constraints.
Achieves order-of-magnitude speedups and efficient multi-GPU scaling while reducing engineering complexity.

Why It Matters

Shows how PyTorch can accelerate non-deep-learning optimization at web scale, reducing cost and latency for LinkedIn's decision systems.
