MuteBench reveals which multimodal AI models survive sensor failures
New benchmark tests 6 fusion architectures on 125K clinical samples across two failure modes.
MuteBench evaluates how multimodal fusion models handle sensor failures. It uses 9 datasets from 7 clinical domains, 6 fusion architectures, and two failure modes: losing an entire modality, or losing a contiguous time segment within a modality. The paper finds that architecture family is the strongest predictor of robustness, outweighing parameter count. Channel-independent models tolerate whole-modality missingness well but can be sensitive to within-modality gaps, especially on short sequences. Curriculum modality dropout protects reliably only up to the maximum dropout rate used in training. Channel count, sequence length, and modality alignment jointly determine which failure mode poses the greater threat. A PTB-XL case study suggests that diffusion-based imputation can improve downstream classification under within-modality missingness, with larger gains for models whose expert routing is most sensitive to corrupted inputs, though broader validation remains open.
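To make the two failure modes and the curriculum-dropout ceiling concrete, here is a minimal sketch. It is not MuteBench's code. It assumes each sample is a dict mapping hypothetical modality names (such as "ecg" and "ppg") to NumPy arrays shaped (channels, time), that a missing value is represented by zero-filling, and that the curriculum ramps the dropout probability linearly to a cap `max_rate`. All function names here are illustrative.

```python
import numpy as np


def drop_modality(sample, modality):
    """Whole-modality missing: zero every channel and time step of one modality.

    Assumption: zero-filling stands in for 'absent'; a real benchmark may use masks instead.
    """
    out = {name: x.copy() for name, x in sample.items()}  # leave the input untouched
    out[modality] = np.zeros_like(out[modality])
    return out


def drop_segment(sample, modality, start, length):
    """Within-modality missing: zero a contiguous window [start, start + length)
    across all channels of one modality, clipped to the sequence end."""
    out = {name: x.copy() for name, x in sample.items()}
    x = out[modality]  # shape (channels, time); edits below modify the copy in place
    end = min(start + length, x.shape[-1])
    x[..., start:end] = 0.0
    return out


def curriculum_drop_rate(epoch, total_epochs, max_rate):
    """Hypothetical linear curriculum: modality-dropout probability rises from 0
    at the first epoch to max_rate at the last epoch and never exceeds it."""
    if total_epochs <= 1:
        return max_rate
    return max_rate * min(epoch / (total_epochs - 1), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = {"ecg": rng.standard_normal((12, 1000)), "ppg": rng.standard_normal((1, 1000))}
    no_ppg = drop_modality(sample, "ppg")           # failure mode 1
    gappy_ecg = drop_segment(sample, "ecg", 200, 150)  # failure mode 2
    print([round(curriculum_drop_rate(e, 5, 0.5), 3) for e in range(5)])
```

Because the schedule is capped at `max_rate`, a model trained this way never sees missingness rates above that cap. One plausible reading of the dropout finding above is that test-time rates beyond the cap fall outside the training distribution.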
- Benchmark evaluates 6 fusion architectures on 9 clinical datasets with 125K samples across two failure modes.
- Architecture family is the strongest predictor of robustness, outweighing model parameter count.
- In a PTB-XL ECG case study, diffusion-based imputation improved downstream classification under within-modality missingness.
Why It Matters
Helps engineers choose and design AI systems that stay reliable when medical sensors drop data mid-use.