PrimeSVT: Automated pruning cuts memory 26.68% for Spiking Vision Transformers
New framework structurally prunes SViTs with minimal accuracy loss, enabling embedded deployment.
Spiking Vision Transformers (SViTs) promise energy-efficient AI, but their large model sizes hinder deployment on embedded devices. Existing compression methods rely on unstructured pruning, which needs specialized hardware accelerators to exploit irregular sparsity patterns efficiently. They also require manual tuning for each network, which makes the process hard to scale. To address these challenges, a team of researchers from Vienna and Norway introduces PrimeSVT, an automated, memory-aware structured pruning framework for pre-trained SViTs. The framework first analyzes each layer's size (number of parameters) and its robustness under different pruning rates. It then applies a prioritized compression policy: layers are pruned sequentially from largest to smallest while respecting user-defined constraints on accuracy and memory savings. Within each layer, PrimeSVT uses channel-wise filter pruning based on L2-norm values, removing non-significant weights as whole channels so that the pruned model stays compatible with widely used computing architectures.
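The two ideas in that description, ranking channels by L2 norm and visiting layers from largest to smallest under accuracy and memory constraints, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the single fixed pruning rate, the `evaluate` callback, and the choice to treat the lowest-norm channels as the non-significant ones are all hypothetical. The sketch also ignores the fact that removing output channels from one layer shrinks the input of the next.

```python
import math


def l2_norm(channel):
    """L2 norm of one output channel (one row of the weight matrix)."""
    return math.sqrt(sum(w * w for w in channel))


def prune_channels(weight, rate):
    """Remove the fraction `rate` of channels with the smallest L2 norm.

    `weight` is a list of rows, one row per output channel. Whole rows are
    dropped, so the result is a smaller dense matrix (structured pruning).
    """
    n_drop = int(len(weight) * rate)
    ranked = sorted(range(len(weight)), key=lambda i: l2_norm(weight[i]))
    dropped = set(ranked[:n_drop])
    return [row for i, row in enumerate(weight) if i not in dropped]


def param_count(weight):
    return sum(len(row) for row in weight)


def prioritized_prune(layers, rate, evaluate, baseline_acc, max_drop, target_saving):
    """Prune layers from largest to smallest (hypothetical policy sketch).

    A pruned layer is kept only if accuracy stays within `max_drop` of
    `baseline_acc`; the loop stops once `target_saving` is reached.
    `evaluate` is an assumed callback that returns accuracy for a model dict.
    """
    total = sum(param_count(w) for w in layers.values())
    pruned = dict(layers)
    saving = 0.0
    order = sorted(layers, key=lambda name: param_count(layers[name]), reverse=True)
    for name in order:
        candidate = dict(pruned)
        candidate[name] = prune_channels(pruned[name], rate)
        if baseline_acc - evaluate(candidate) <= max_drop:
            pruned = candidate  # accept: accuracy constraint still holds
        saving = 1 - sum(param_count(w) for w in pruned.values()) / total
        if saving >= target_saving:
            break
    return pruned, saving
```

In a real pipeline, `evaluate` would run the pruned SViT on a validation set, and the pruning rate would be chosen per layer from the robustness analysis rather than fixed in advance.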
Experimental results demonstrate that PrimeSVT reduces memory usage by 26.68% in a single automated pruning pass, with minimal accuracy degradation. The pruned model achieves 70.3% accuracy without fine-tuning and 72.9% with fine-tuning, compared to the original 73.3%. That is a drop of 0.4 percentage points after fine-tuning (3.0 points before it), against an acceptable margin of 3%. Because the pruning is structured, it eliminates the need for custom hardware accelerators, making SViTs practical for embedded AI applications such as edge vision systems. By automating the pruning process, PrimeSVT also reduces design time and improves scalability, a significant step toward deploying spiking neural networks in resource-constrained environments.
- PrimeSVT automates structured pruning for SViTs, removing the need for specialized hardware accelerators.
- Using a prioritized compression policy, it prunes the largest layers first, saving 26.68% memory with a minimal accuracy drop.
- It achieves 70.3% accuracy without fine-tuning and 72.9% with fine-tuning, versus the original 73.3%.
Why It Matters
Makes Spiking Vision Transformers deployable on resource-constrained devices without custom hardware.