Research & Papers

Better Neural Network Expressivity: Subdividing the Simplex

A new proof shows ReLU networks can compute all continuous piecewise linear functions with roughly log₃(n) hidden layers rather than log₂(n).

Deep Dive

A team of researchers has published a paper on arXiv, 'Better Neural Network Expressivity: Subdividing the Simplex,' that sharpens our understanding of ReLU neural network expressivity. The work disproves a conjecture from NeurIPS 2021 stating that ⌈log₂(n+1)⌉ hidden layers are optimal for computing all continuous piecewise linear (CPWL) functions on ℝⁿ, which is exactly the class of functions ReLU networks can represent. The authors demonstrate that ⌈log₃(n-1)⌉+1 hidden layers suffice, a significant reduction in required depth.

The key technical step is proving that ReLU networks with just two hidden layers can exactly represent the maximum function of five inputs, which the conjecture predicted would require three. More generally, they show that computing the maximum of n≥4 numbers requires only ⌈log₃(n-2)⌉+1 hidden layers. This new upper bound nearly matches the ⌈log₃(n)⌉ lower bound established in prior ICLR 2025 work for networks whose weights are decimal fractions. The construction has an elegant geometric interpretation: polyhedral subdivisions of a simplex into simpler polytopes.

For practitioners, this means the depth needed for full CPWL expressivity is lower than previously assumed. The result could influence the design of future architectures, potentially enabling more efficient training and inference for models that rely on CPWL functions, which includes most modern deep learning systems built on ReLU activations.
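
To make the base-2 versus base-3 intuition concrete, here is a minimal NumPy sketch (ours, not the paper's construction) of the classic exact ReLU gadget for the maximum of two numbers, stacked in a binary tree to give the familiar ⌈log₂(n)⌉-style depth; the paper's five-input gadget plays the analogous role with base 3. Function names are our own.

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def max2(a, b):
        # Exact max of two numbers as a one-hidden-layer ReLU network.
        # Hidden units: ReLU(a - b), ReLU(b), ReLU(-b); the output layer
        # uses max(a, b) = b + ReLU(a - b) with b = ReLU(b) - ReLU(-b).
        return relu(a - b) + relu(b) - relu(-b)

    def max_tree(xs):
        # Classic base-2 scheme: one hidden layer per round of pairwise
        # maxima, so the max of n numbers costs about ceil(log2(n))
        # hidden layers. The paper's five-input gadget is what improves
        # the base of the logarithm from 2 to 3.
        xs = list(xs)
        while len(xs) > 1:
            paired = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
            if len(xs) % 2:  # odd element passes through to the next round
                paired.append(xs[-1])
            xs = paired
        return xs[0]

    vals = [3.0, -1.5, 7.2, 0.4, 5.9]
    assert np.isclose(max_tree(vals), max(vals))

Inverting the paper's bound, each additional hidden layer roughly triples the number of inputs whose maximum is exactly representable, versus the doubling of the tree scheme above.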

Key Points
  • Disproves the Hertrich et al. (NeurIPS'21) conjecture that ⌈log₂(n+1)⌉ hidden layers are optimal for CPWL functions.
  • Proves that ⌈log₃(n-1)⌉+1 hidden layers suffice, lowering the best known bound on required depth (compared numerically in the sketch after this list).
  • Shows a ReLU network with two hidden layers can exactly compute the max of 5 inputs, the key gadget behind the new bound.
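
To see the size of the gap, here is a small Python sketch (ours; the helper names are our own) that evaluates both bounds exactly:

    def ceil_log(x, base):
        # Smallest integer k with base**k >= x, computed in integers
        # to avoid floating-point rounding near exact powers of base.
        k, power = 0, 1
        while power < x:
            power *= base
            k += 1
        return k

    def old_depth(n):
        # Previously known sufficient depth for CPWL functions on R^n.
        return ceil_log(n + 1, 2)      # ceil(log2(n+1))

    def new_depth(n):
        # The paper's improved sufficient depth.
        return ceil_log(n - 1, 3) + 1  # ceil(log3(n-1)) + 1

    for n in (4, 16, 64, 256, 1024):
        print(f"n={n:5d}   old={old_depth(n):2d}   new={new_depth(n):2d}")

At n = 1024, for example, the sufficient depth drops from 11 hidden layers to 8.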

Why It Matters

Lowers the proven depth needed for fully expressive ReLU networks, potentially enabling shallower, faster models with the same power.