Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions
New activation family outperforms GELU on GPT-2, ResNet-56, and BERT-small.
Eylon Krause's new paper introduces Geometric Monomial (GEM), a family of activation functions that achieves ReLU-like performance using purely rational arithmetic and C^{2N} smoothness. The family has three variants: GEM (base), E-GEM (epsilon-parameterized, giving arbitrarily close L^p approximation of ReLU), and SE-GEM (a piecewise variant that eliminates dead neurons). An ablation over N found N=1 optimal for standard-depth networks, shrinking the accuracy gap to GELU on CIFAR-100 + ResNet-56 from 6.10% to 2.12%. The smoothness parameter exposes a CNN-transformer tradeoff: N=1 works best for deep CNNs, N=2 for transformers.
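The summary does not give the exact GEM formulas, but the ingredients it names (purely rational arithmetic, an epsilon knob, a smoothness order N, and nonzero gradients on the negative side) can be illustrated with a minimal PyTorch sketch. The module below is a hypothetical piecewise-rational activation assumed for illustration, not the paper's definition: its negative branch epsilon·x / (epsilon + x^{2N}) agrees with the identity to order 2N at the origin, so the two pieces join with C^{2N} smoothness.

```python
import torch
import torch.nn as nn


class RationalSmoothReLU(nn.Module):
    """Illustrative epsilon-parameterized, piecewise-rational activation.

    NOTE: the exact GEM / E-GEM / SE-GEM formulas are not given in this
    summary; this module only demonstrates the ingredients described there
    (rational arithmetic, an epsilon knob, C^{2N} smoothness at the origin,
    and a nonzero gradient for negative inputs).
    """

    def __init__(self, N: int = 1, eps: float = 1e-4):
        super().__init__()
        self.N = N      # smoothness order: the pieces join C^{2N}-smoothly at x = 0
        self.eps = eps  # smaller eps -> tighter approximation of ReLU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Negative branch: eps * x / (eps + x^{2N}) matches the identity to
        # order 2N at 0 and decays toward 0 as x -> -inf while keeping a
        # nonzero gradient, so neurons never go fully dead.
        neg = self.eps * x / (self.eps + x.pow(2 * self.N))
        return torch.where(x >= 0, x, neg)
```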
On the reported benchmarks, GEM variants match or beat GELU. SE-GEM (epsilon=10^{-4}) edged out GELU on CIFAR-10 + ResNet-56 (92.51% vs. 92.44%), GEM achieved the lowest perplexity on GPT-2 (124M) (72.57 vs. 73.76 for GELU), and E-GEM (epsilon=10) reached the best validation loss on BERT-small (6.656). The epsilon parameterization shows a scale-dependent optimum: small epsilon suits deep CNNs and large transformers, while large epsilon suits small transformers such as BERT-small because of their limited depth and unconstrained gradients.
- The GEM family combines purely rational arithmetic with C^{2N} smoothness and outperforms GELU on GPT-2 (perplexity 72.57 vs. 73.76).
- SE-GEM achieves 92.51% accuracy on CIFAR-10 + ResNet-56, beating GELU's 92.44%.
- Optimal smoothness N=1 for CNNs, N=2 for transformers; epsilon parameterization enables scale-dependent tuning.
Why It Matters
GEM offers a drop-in replacement for GELU that delivers competitive or better accuracy across CNNs and transformers, using only rational arithmetic.
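To make the drop-in claim concrete, here is a hedged usage sketch that swaps GELU for the illustrative activation above in a toy transformer-style MLP block. The module name, layer sizes, and the N=2 / epsilon=10 choice (the regime the summary associates with small transformers) are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Toy transformer-style MLP block (dimensions are arbitrary); the only change
# from a standard block is replacing nn.GELU() with the sketch defined above.
mlp = nn.Sequential(
    nn.Linear(256, 1024),
    RationalSmoothReLU(N=2, eps=10.0),  # illustrative: N=2 / large-eps regime for small transformers
    nn.Linear(1024, 256),
)

x = torch.randn(8, 256)
print(mlp(x).shape)  # -> torch.Size([8, 256])
```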