Research & Papers

Ask the Expert: Collaborative Inference for Vision Transformers with Near-Edge Accelerators

This breakthrough makes powerful Vision Transformers viable on phones and IoT devices.

Deep Dive

Researchers have developed a collaborative inference framework for Vision Transformers (ViTs) on edge devices. It pairs a lightweight on-device ViT with specialized "expert" models hosted on a nearby accelerator, and a routing mechanism forwards only low-confidence samples to the experts. On CIFAR-100, the method improves expert accuracy by 4.12% on their target subsets and overall accuracy by 2.76%. Crucially, it cuts latency by up to 45% versus edge-only execution and reduces energy consumption by 46% compared to full offload.
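
The paper's exact routing rule isn't detailed here, but confidence gating of this kind is commonly implemented as a threshold on the on-device model's top softmax probability. Here is a minimal sketch of that idea (the function names and the 0.8 threshold are illustrative, not from the paper):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, threshold=0.8):
    """Keep confident predictions on-device; offload the rest.

    Returns "edge" when the lightweight model's top class probability
    meets the threshold, otherwise "expert" (send to the accelerator).
    """
    probs = softmax(logits)
    return "edge" if max(probs) >= threshold else "expert"

# A sharply peaked prediction stays on the device.
print(route([8.0, 0.5, 0.2]))   # edge
# Near-uniform logits signal low confidence, so the sample is offloaded.
print(route([1.1, 1.0, 0.9]))   # expert
```

Raising the threshold sends more samples to the experts, trading latency and energy for accuracy; the reported gains suggest the paper tunes this trade-off so only a minority of samples leave the device.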

Why It Matters

This enables complex AI vision applications on resource-constrained devices, from smartphones to drones, without sacrificing performance.