Research & Papers

CvxCluster: Solving Large, Complex, Granular Resource Allocation Problems 100-1000x Faster

New convex optimization method outperforms MIP solvers by up to 2500x on Azure traces

Deep Dive

Cluster resource allocation is a computationally hard problem, traditionally tackled with mixed integer programming (MIP) or heuristic search. Researchers from Stanford (Obi Nnorom Jr., Stephen Boyd, Philip Levis) propose CvxCluster, a novel method that reformulates the problem as a convex optimization task. This shift from discrete to continuous optimization enables dramatically faster solution times. The algorithm operates in two stages: first, it solves a convex relaxation to compute principled resource prices for each machine. Second, it uses these prices to drive a lightweight greedy procedure that places tasks efficiently.

In experiments using real Azure traces, CvxCluster scaled to over 100,000 servers and sustained arrival rates 500,000 times the baseline. It ran 100 to 2,500 times faster than a state-of-the-art MIP solver while remaining within 3% of the optimal objective. The method supports complex constraints such as job anti-affinity, machine types, and GPU servers. The key insight: by making the problem continuous, convex optimization can find solutions that are just as good or better than prior heuristics, but orders of magnitude faster. This could transform how cloud providers handle resource allocation in large-scale data centers.

Key Points
  • CvxCluster uses a two-stage convex optimization algorithm, replacing slow MIP solvers.
  • Achieves 100-2500x speedup on Azure traces with 100,480 servers, within 3% of optimal.
  • Supports complex constraints like job anti-affinity, machine types, and GPU servers.

Why It Matters

Cloud operators can allocate resources 100x faster, enabling larger scale and lower latency for cluster management.