Research & Papers

Gupta et al. crack rectangular matrix multiplication in low-bandwidth model

New bounds: Θ(d√n) for d ≤ √n, O(d^{4/3}) for narrow matrices

Deep Dive

In a new paper on arXiv, researchers Chetan Gupta, Jukka Suomela, and Hossein Vahidi tackle rectangular matrix multiplication in the low-bandwidth distributed computing model. Here, n computers each hold part of two input matrices and can only exchange O(log n)-bit messages per round. The goal: each computer outputs its designated portion of the product. Prior work focused on square n×n multiplication (achieving O(n^{4/3}) rounds over semirings). This paper extends to rectangular instances ⟨a,b,c⟩ (a×b times b×c) without sparsity assumptions.

For two natural aspect ratios, ⟨n,d,n⟩ and ⟨d,n,d⟩ with d ≤ n, the authors prove strikingly different complexity behaviors. For ⟨d,n,d⟩ (tall, skinny matrices), they achieve Õ(d^{4/3}) rounds—a natural generalization of the square case. For ⟨n,d,n⟩ (wide matrices), a phase transition emerges: when d ≤ √n, the complexity is exactly Θ(d√n) (matching lower and upper bounds). For d ≥ √n, the best upper bound becomes O(d^{2/3} n^{2/3}), suggesting fundamentally different communication constraints. This work provides tight bounds for an important regime and opens questions for further rectangular cases.

Key Points
  • For ⟨d,n,d⟩ (narrow matrices), complexity is Õ(d^{4/3})—matching generalized square case.
  • For ⟨n,d,n⟩ (wide matrices) with d ≤ √n, complexity is Θ(d√n) with matching upper/lower bounds.
  • For d ≥ √n in wide case, upper bound is O(d^{2/3} n^{2/3}), showing a phase transition in round complexity.

Why It Matters

New bounds for rectangular matrix multiplication in low-bandwidth distributed systems, critical for scaling ML and scientific computing.