We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file.
LocalLLaMA community researchers tested the same INT8-quantized ONNX model across five Snapdragon chipsets, revealing dramatic accuracy differences. The Snapdragon 8 Gen 3 scored 91.8%, while the 4 Gen 2 plummeted to 71.2%, a gap of more than 20 percentage points. The cause is hardware-specific: differences in NPU precision handling, operator fusion, and memory-constrained fallbacks to CPU. This highlights a critical gap in AI testing, as cloud-based benchmarks fail to catch these real-world, on-device performance issues.
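To see how "the same INT8 weights" can behave differently per chip, consider rounding behavior alone. The sketch below is purely illustrative (not taken from the tests above, and not any specific Snapdragon's behavior): it quantizes the same values with two rounding modes that real accelerators can plausibly differ on, round-to-nearest versus truncation toward zero, and shows the reconstructed values diverge.

```python
# Hypothetical sketch: two accelerators quantize the same FP32 values to
# int8 with the same scale, but disagree on rounding. All names and
# numbers here are illustrative, not from the benchmark in the article.

def quantize(x, scale, mode):
    q = x / scale
    if mode == "nearest":
        q = round(q)        # round to nearest (ties to even)
    else:
        q = int(q)          # truncate toward zero (cheaper in silicon)
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale):
    return q * scale

scale = 0.05
weights = [0.126, -0.374, 0.999, -0.051, 0.275]

for mode in ("nearest", "truncate"):
    recovered = [dequantize(quantize(w, scale, mode), scale) for w in weights]
    err = sum(abs(a - b) for a, b in zip(weights, recovered))
    print(mode, [round(r, 3) for r in recovered], f"total_abs_err={err:.3f}")
```

Across millions of weights and activations, per-layer error differences like this compound, which is one mechanism behind identical model files scoring differently on different NPUs; operator fallbacks to CPU add another.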
Why It Matters
Developers must test AI models on actual target hardware, not just cloud GPUs, to ensure consistent performance for end users.