Genetic programming replaces ViT normalization with layer-specific scalar functions, capturing 91.6% of its variance without full retraining

arXiv cs.CV May 15, 2026

⚡ No full retraining needed: layer-specific scalar functions replace normalization in ViTs

Deep Dive

Deploying Vision Transformers (ViTs) on edge devices has been hampered by the computational cost of layer normalization, whose mean and variance statistics must be reduced across the whole feature vector, creating a global reduction bottleneck. Recent work replaced normalization with homogeneous scalar approximations, but these fit individual layers poorly and required expensive retraining. In a new paper, Kieran Carrigg and colleagues propose a genetic programming (GP) framework that evolves layer-specific scalar functions directly from pre-trained weights. Their post-training re-alignment strategy adapts each layer individually, eliminating the need for full model retraining.
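
To make the bottleneck concrete, here is a minimal sketch in plain Python. It contrasts standard layer normalization, which needs two reductions over the full feature vector before it can emit any output, with an element-wise scalar function that touches each value independently. The `scalar_replacement` function and its `tanh(alpha * v)` form are hypothetical placeholders chosen for illustration; they are not the expressions evolved in the paper.

```python
import math


def layer_norm(x, eps=1e-6):
    """Standard layer normalization over one token's feature vector.

    Two reductions over the whole vector (mean, then variance) must
    finish before any single output element can be produced.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def scalar_replacement(x, alpha=0.5):
    """Hypothetical element-wise stand-in for normalization.

    Each output depends only on its own input, so no reduction is needed.
    tanh(alpha * v) is an assumed placeholder, NOT a function from the paper.
    """
    return [math.tanh(alpha * v) for v in x]


token = [0.5, -1.2, 3.0, 0.1]
print(layer_norm(token))          # every element depends on all four inputs
print(scalar_replacement(token))  # every element depends on one input only
```

Because the element-wise version has no cross-element dependency, an accelerator could in principle stream values through it without first gathering statistics for the whole vector.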

Results show the evolved expressions capture 91.6% of the target normalization variance (R²) versus just 70.2% for one-size-fits-all baselines. The modified ViT recovers 84.25% Top-1 accuracy on ImageNet-1K in only 20 epochs, preserving performance while removing the global reduction bottleneck. The result is a favorable trade-off between arithmetic complexity and off-chip memory traffic, which removes a key barrier to efficient ViT inference on edge accelerators such as mobile GPUs and FPGAs.
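
For readers unfamiliar with the metric, the sketch below shows how an R² score of this kind is conventionally computed: one minus the ratio of residual error to the total variance of the target. The `target` and `approx` values are made-up toy numbers, not data from the paper, and the paper's exact evaluation protocol is not described in this summary.

```python
def r_squared(target, approx):
    """Coefficient of determination: the fraction of the target's variance
    that the approximation explains (1.0 means a perfect fit)."""
    mean_t = sum(target) / len(target)
    ss_res = sum((t - a) ** 2 for t, a in zip(target, approx))  # residual error
    ss_tot = sum((t - mean_t) ** 2 for t in target)             # total variance
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only: outputs of a true normalization layer
# and of a hypothetical scalar approximation on the same inputs.
target = [-1.0, 0.0, 1.0, 2.0]
approx = [-0.8, 0.1, 0.9, 1.7]

print(f"R^2 = {r_squared(target, approx):.3f}")
```

Under this definition, the reported figures would mean the layer-specific functions leave roughly 8% of the normalization output's variance unexplained, compared with about 30% for a single shared function.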

Key Points

Genetic programming evolves heterogeneous scalar functions for each ViT layer, directly from pre-trained weights without full retraining
Captures 91.6% of normalization variance (R²) vs 70.2% for homogeneous approximations
Recovers 84.25% Top-1 ImageNet-1K accuracy in only 20 epochs, eliminating the global reduction bottleneck

Why It Matters

Removing the normalization bottleneck could let Vision Transformers run efficiently on edge accelerators, opening the way to real-time computer vision on mobile and IoT devices.
