VeloxNet: Efficient Spatial Gating for Lightweight Embedded Image Classification
A new lightweight CNN architecture with just 399K parameters outperforms MobileNet and EfficientNet on aerial disaster datasets.
A research team led by Md Meftahul Ferdaus from the University of New Orleans has developed VeloxNet, a novel convolutional neural network architecture designed specifically for resource-constrained embedded vision applications. The core innovation replaces the traditional 'fire modules' in SqueezeNet with gated multi-layer perceptron (gMLP) blocks. Each block contains a spatial gating unit (SGU) that applies learned spatial projections and multiplicative gating, enabling the network to capture global spatial dependencies across an entire feature map in a single layer. This contrasts with standard convolutions, which are limited to local receptive fields defined by small kernel sizes.
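The spatial gating mechanism described above can be illustrated with a minimal NumPy sketch of a gMLP-style spatial gating unit. This is an assumption-laden illustration of the general technique from the gMLP literature, not VeloxNet's actual implementation (which has not been released); the function name, shapes, and the learned matrices `W` and `b` are hypothetical placeholders.

```python
import numpy as np

def spatial_gating_unit(x, W, b):
    """Minimal gMLP-style spatial gating unit (illustrative sketch only).

    x: (n_tokens, d_channels) feature map flattened over spatial positions
    W: (n_tokens, n_tokens) learned spatial projection -- mixes ALL positions,
       giving a global receptive field in one layer (unlike a small conv kernel)
    b: (n_tokens, 1) learned bias for the spatial projection
    """
    # Split channels into a content path u and a gating path v
    u, v = np.split(x, 2, axis=-1)
    # Normalize the gating path across channels (layer-norm style)
    v = (v - v.mean(axis=-1, keepdims=True)) / np.sqrt(
        v.var(axis=-1, keepdims=True) + 1e-6
    )
    # Learned spatial projection: every output position sees every input position
    v = W @ v + b
    # Multiplicative gating of the content path by the spatially mixed gate
    return u * v

# Example: 4 spatial positions, 8 channels -> gated output of 4 channels
out = spatial_gating_unit(
    np.random.randn(4, 8), np.random.randn(4, 4) * 0.1, np.ones((4, 1))
)
```

Because `W` acts along the token (spatial) dimension rather than the channel dimension, a single such layer captures dependencies across the entire feature map, which is the contrast with local convolutions drawn in the paragraph above.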
VeloxNet was rigorously evaluated against eleven established baselines, including MobileNet variants, ShuffleNet, EfficientNet, and recent vision transformers, on three critical aerial image datasets: AIDER for emergency response, CDD for comprehensive disaster assessment, and LDD for levee defect detection. The results are striking: VeloxNet achieved a 46.1% reduction in parameters compared to SqueezeNet (down to just 399,366) while simultaneously boosting weighted F1 scores by 6.32% on AIDER, 30.83% on CDD, and 2.51% on LDD. This breaks the usual efficiency-accuracy trade-off: the model becomes both smaller and more accurate at the same time.
The architecture's design directly addresses the pressing need for deployable AI in field operations like aerial disaster monitoring and infrastructure inspection, where devices have strict limits on power, memory, and latency. By providing global context modeling with fewer parameters, VeloxNet enables more intelligent on-device processing without relying on cloud connectivity. The team has committed to releasing the source code publicly upon the paper's acceptance, paving the way for integration into real-world embedded vision systems.
- Reduces model parameters by 46.1% vs. SqueezeNet, down to 399,366 total parameters.
- Improves classification performance (weighted F1 score) by up to 30.83% on the Comprehensive Disaster Dataset (CDD).
- Uses Spatial Gating Units (SGUs) for global spatial modeling, replacing local convolutional fire modules.
Why It Matters
Enables more accurate, real-time image analysis on drones and edge devices for critical infrastructure and disaster response.