Research & Papers

FusionSense cuts edge AI energy up to 33x with smart multimodal sensing

New near-sensor framework slashes data transmission while boosting accuracy

Deep Dive

Autonomous systems and smart industrial deployments increasingly split computation across near-sensor, edge, and cloud resources, but tight energy, latency, and reliability budgets demand runtime adaptivity. As multimodal sensor suites (cameras, LiDAR, depth) proliferate, most prior approaches either fuse modalities on powerful servers or apply uni-modal near-sensor filters that ignore cross-modal dependencies, leading to redundant transmissions or missed events. FusionSense, presented by Sanggeon Yun et al. and accepted to ISLPED 2026, addresses this with a three-step procedure: (i) a server-side fusion model first learns the downstream task, (ii) filter-out-safe (FoS) labels quantify how necessary each modality is to the fused decision, and (iii) an edge-side fusion model is compacted by injecting near-sensor predictions as auxiliary signals. The result is a runtime decision layer that jointly reduces compute and communication while scaling linearly with sensor count.
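Step (ii) can be illustrated with a minimal sketch. It assumes one plausible reading of a filter-out-safe label: a modality is safe to drop for a given sample if removing it alone leaves the fused decision unchanged. That criterion, the function name `fos_labels`, the toy model `toy_fusion`, and the scores are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical representation: modality name -> score, or None if filtered out.
Sample = Dict[str, Optional[float]]
FusionModel = Callable[[Sample], int]


def fos_labels(fusion_model: FusionModel, sample: Sample) -> Dict[str, bool]:
    """Mark each modality True if dropping it alone keeps the fused decision.

    Assumed criterion for illustration only; the paper's definition may differ.
    """
    reference = fusion_model(sample)  # decision with every modality present
    labels: Dict[str, bool] = {}
    for modality in sample:
        masked = dict(sample)
        masked[modality] = None  # simulate filtering this modality out
        labels[modality] = fusion_model(masked) == reference
    return labels


def toy_fusion(sample: Sample) -> int:
    """Toy stand-in for a trained fusion model: report an event (1) if any
    available modality score exceeds 0.5, otherwise 0."""
    scores = [v for v in sample.values() if v is not None]
    return int(any(s > 0.5 for s in scores))


samples: List[Sample] = [
    {"rgb": 0.9, "depth": 0.8},  # both modalities see the event
    {"rgb": 0.9, "depth": 0.1},  # only RGB sees the event
    {"rgb": 0.2, "depth": 0.1},  # no event
]
all_labels = [fos_labels(toy_fusion, s) for s in samples]
for sample, labels in zip(samples, all_labels):
    print(sample, "->", labels)
```

In this toy setup, RGB is not safe to drop only in the second sample, where it alone carries the event. The sketch needs one extra fusion evaluation per modality, so its labeling cost grows with the number of sensors.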

On a dual-modality (RGB+Depth/LiDAR) setup with SynDrone, FusionSense sustains task quality at substantially higher data-reduction rates than uni-modal filters. The reported end-to-end gains are large: up to 33x lower energy at 1% prevalence of events of interest (FoI), 11x at 10%, a 92.3% reduction in quality loss at a fixed 30% data reduction, and roughly 1.5x higher energy savings than the best prior filtering baseline. For drones, robots, and smart cameras, this means multiple sensors can run continuously while preserving battery life and detection accuracy, which is critical for applications such as surveillance, autonomous navigation, and industrial inspection, where cloud connectivity is unreliable or costly.
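The link between event prevalence and energy savings can be made concrete with a toy per-frame energy model. This is a back-of-the-envelope sketch, not the paper's energy model: it assumes the filter never misses an event, and the function name, parameters, and cost values are all hypothetical and are not fitted to the reported results.

```python
def energy_savings(prevalence: float,
                   false_positive_rate: float,
                   filter_cost: float,
                   downstream_cost: float) -> float:
    """Ratio of always-transmit energy to filtered energy, per frame.

    prevalence: fraction of frames that contain an event of interest
    false_positive_rate: fraction of uninteresting frames the filter passes
    filter_cost: energy to run the near-sensor filter on one frame
    downstream_cost: energy to transmit and fully process one frame
    Costs share one arbitrary unit. Assumes the filter has perfect recall.
    """
    # Frames passed on: all true events plus false alarms.
    pass_rate = prevalence + (1.0 - prevalence) * false_positive_rate
    baseline = downstream_cost  # every frame is sent and processed
    filtered = filter_cost + pass_rate * downstream_cost
    return baseline / filtered


# Hypothetical costs, chosen only to show the trend.
for p in (0.01, 0.10):
    ratio = energy_savings(p, false_positive_rate=0.05,
                           filter_cost=0.02, downstream_cost=1.0)
    print(f"prevalence {p:.0%}: about {ratio:.1f}x lower energy")
```

Under this model the savings can never exceed 1/prevalence, and they shrink as events become more common because more frames must be sent regardless of the filter. This is consistent with the reported gain being larger at 1% prevalence than at 10%.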

Key Points
  • Introduces filter-out-safe (FoS) labels to dynamically assess each sensor modality's importance for fusion decisions
  • Achieves up to 33x lower energy consumption at 1% event prevalence and 11x at 10% on the SynDrone benchmark
  • Reduces quality loss by 92.3% at a fixed 30% data reduction, with roughly 1.5x higher energy savings than the best prior filtering baseline

Why It Matters

Brings continuous multimodal AI within reach of energy-constrained edge devices while largely preserving task quality.