Includes common layers such as linear projections, attention, pooling, normalization, and positional encodings?

Includes common layers such as linear projections, attention, pooling, normalization, and positional encodings.

Shifts theoretical focus from PAC learnability to inductive biases, scalability, and optimization performance?

Shifts theoretical focus from PAC learnability to inductive biases, scalability, and optimization performance.

Research & Papers

All Feedforward Neural Networks Proven to Have Finite Sample Complexity

arXiv stat.ML May 11, 2026

⚡A groundbreaking proof shows MLPs, CNNs, GNNs, and even transformers are PAC-learnable.

Deep Dive

A new mathematical theorem proves that a vast class of feedforward neural networks—including MLPs, CNNs, GNNs, and transformers with fixed sequence length—all possess finite sample complexity in the agnostic Probably Approximately Correct (PAC) learning model. The proof, from Kratsios, Cousins, Sáez de Ocáriz Borde, Jun Kim, and Brugiapaglia, shows that any fixed feedforward architecture whose layers are definable in an o-minimal structure (a framework for tame geometry) guarantees distribution-free learnability even with unbounded parameters. This covers virtually all standard operations: linear projections, residual connections, attention mechanisms, pooling, normalization layers, and admissible positional encodings.

The key implication is that finite-sample PAC learnability is no longer a differentiator between architectures. Instead, the paper repositions it as a baseline property. The authors argue this shifts the conversation toward inductive biases, geometric priors, scalability, and optimization behavior—areas that truly distinguish modern architectures. This result unifies previous architecture-specific VC-dimension arguments under a single, elegant mathematical umbrella, offering a foundational result for theoretical machine learning.

Key Points

Covers all standard feedforward architectures: MLPs, CNNs, GNNs, and transformers with fixed sequence length.
Includes common layers such as linear projections, attention, pooling, normalization, and positional encodings.
Shifts theoretical focus from PAC learnability to inductive biases, scalability, and optimization performance.

Why It Matters

This theorem redefines neural network theory, making finite sample complexity a baseline property, not a differentiator.

Read Original Article

All Feedforward Neural Networks Proven to Have Finite Sample Complexity

Why It Matters

Related Articles

Stay Ahead in AI