NVIDIA Jetson Orin Nano: Host inference 2.1x faster than on-sensor MLC
Wire-level latency measurements point to the I2C read protocol, not the silicon, as the bottleneck.
A new pre-registered measurement study challenges the conventional wisdom that on-sensor machine learning cores (MLC) always deliver lower latency than host-side inference. Researchers Akul Swami and Dnyaneshwar Sonawane used a Saleae Logic Pro 8 logic analyzer to capture wire-level interrupt-to-decision latency on an NVIDIA Jetson Orin Nano paired with an STMicroelectronics LSM6DSOX IMU. They tested three pipelines (a host-side decision-tree classifier, the standard MLC bank-switch read protocol, and an MLC binary-fast variant) under three conditions: idle, I2C bus contention, and CPU stress. The protocol was pre-registered, with 12 externally timestamped Zenodo amendments filed before confirmatory data collection. Of 4,860 trials, 4,770 were included (98.15% acceptance).
Results show host inference consistently outperforming the on-sensor MLC: median latency was 321.7 μs versus 681.5 μs at idle (2.1x faster) and 574.5 μs versus 1,325.4 μs under I2C contention (2.3x faster). The dominant latency contributor is not the silicon's classification speed but the three-transaction I2C read protocol required to pull MLC decisions off the sensor. The team also characterized a reproducible 706.5 ms MLC decision cadence that bounds full stimulus-to-decision latency. For edge AI professionals deploying low-latency sensor fusion pipelines, the study is evidence that host-side inference can be faster once the I2C overhead of reading MLC results is accounted for. Full code, data, and pre-registration are available on GitHub.
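The ratios quoted above follow directly from the reported medians. A minimal Python sketch of the arithmetic, using only the figures in this summary (the function and variable names are illustrative, not taken from the study's code):

```python
# Median interrupt-to-decision latencies quoted in this summary, in microseconds.
MEDIANS_US = {
    "idle": {"host": 321.7, "mlc": 681.5},
    "i2c_contention": {"host": 574.5, "mlc": 1325.4},
}

def speedup(host_us, mlc_us):
    """How many times faster the host path is than the MLC path."""
    return mlc_us / host_us

def acceptance_rate(included, total):
    """Percentage of trials that passed the inclusion criteria."""
    return 100.0 * included / total

if __name__ == "__main__":
    for condition, m in MEDIANS_US.items():
        ratio = speedup(m["host"], m["mlc"])
        print(f"{condition}: host is {ratio:.1f}x faster")
    print(f"acceptance: {acceptance_rate(4770, 4860):.2f}%")
```

Note that the comparison is between medians; the summary does not report the spread of each distribution.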
- The host decision-tree classifier posted a median latency of 321.7 μs versus 681.5 μs for the MLC at idle (2.1x faster).
- I2C three-transaction read protocol, not the MLC classification, is the dominant latency bottleneck.
- A reproducible 706.5 ms MLC decision cadence bounds the full stimulus-to-decision delivery time.
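To see why a three-transaction read can outweigh the classification itself, a back-of-envelope wire-time model helps. The sketch below is illustrative only. It assumes a 400 kHz bus, 7-bit addressing, single-byte registers, and a write/read/write breakdown of the three transactions. None of these parameters are stated in this summary, and the model ignores START/STOP conditions, clock stretching, and host driver overhead, so it is not expected to reproduce the measured latencies.

```python
# Back-of-envelope I2C wire-time model.
# Every parameter here is an assumption, not a value from the study.

CLOCKS_PER_BYTE = 9          # 8 data bits + 1 ACK/NACK bit per byte on I2C
ASSUMED_BUS_HZ = 400_000     # assumed Fast-mode clock; the real bus speed may differ

def transaction_time_us(num_bytes, bus_hz):
    """Minimum wire time for num_bytes on the bus, in microseconds.

    Ignores START/STOP conditions, clock stretching, and driver overhead.
    """
    return num_bytes * CLOCKS_PER_BYTE * 1e6 / bus_hz

# Byte counts for single-register accesses (assumed layout):
WRITE_BYTES = 3  # device address + register address + data byte
READ_BYTES = 4   # device address + register address, repeated START,
                 # device address again + data byte

def direct_read_us(bus_hz=ASSUMED_BUS_HZ):
    """One plain register read."""
    return transaction_time_us(READ_BYTES, bus_hz)

def bank_switch_read_us(bus_hz=ASSUMED_BUS_HZ):
    """Assumed three-transaction sequence: select bank, read, restore bank."""
    return (transaction_time_us(WRITE_BYTES, bus_hz)
            + transaction_time_us(READ_BYTES, bus_hz)
            + transaction_time_us(WRITE_BYTES, bus_hz))

if __name__ == "__main__":
    print(f"direct read:      {direct_read_us():.1f} us")
    print(f"bank-switch read: {bank_switch_read_us():.1f} us")
```

Under these assumptions the bank-switched sequence costs 2.5 times the wire time of a single register read, before any software overhead is counted.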
Why It Matters
For edge AI developers, this study challenges the assumption that on-sensor ML always has lower latency: the I2C overhead of reading the decision off the sensor can be the real bottleneck.