Researchers use YOLOv11 on vibration spectrograms to detect bearing faults with 99.5% mAP
Computer vision meets industrial fault detection: YOLO models now analyze time-frequency images of vibration signals, reaching up to 99.5% mAP on benchmark datasets.
A team led by Po-Heng Chou has published a method for bearing fault monitoring that treats vibration signals as images and applies object detection to them. The framework first applies the continuous wavelet transform (CWT) to convert raw vibration data into time-frequency spectrograms, which makes weak, non-stationary fault signatures easier to see. YOLOv9, YOLOv10, and YOLOv11 are then used to detect and identify localized fault-related energy regions in the time-frequency domain. In experiments on three benchmark datasets (CWRU, PU, and IMS), the approach achieves mean average precision (mAP) of 99.4%, 97.8%, and 99.5%, respectively, outperforming conventional time-series models, modern vision backbones, and representations based on the short-time Fourier transform (STFT).
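The first stage, turning a one-dimensional vibration signal into a time-frequency image, can be sketched in a few lines. The example below is only an illustration, not the authors' implementation: the complex Morlet wavelet, the `w0 = 6` parameter, the synthetic burst signal, and the function names `morlet` and `cwt_magnitude` are all assumptions made here. The output of a CWT is often called a scalogram, and the code uses that name.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet at dimensionless time t (small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_magnitude(signal, fs, freqs, w0=6.0):
    """Return |CWT| as a list of rows, one row per analysis frequency.

    Uses direct correlation with a truncated wavelet: simple to read,
    but far too slow for real recordings.
    """
    dt = 1.0 / fs
    n = len(signal)
    rows = []
    for f in freqs:
        # Scale whose Morlet centre frequency is approximately f.
        scale = w0 / (2.0 * math.pi * f)
        # Truncate the wavelet at +/- 4 scales, where its envelope is negligible.
        half = int(math.ceil(4.0 * scale / dt))
        kernel = [morlet(k * dt / scale, w0).conjugate() / math.sqrt(scale)
                  for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            acc = 0j
            for j, kv in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:  # treat samples outside the record as zero
                    acc += signal[idx] * kv
            row.append(abs(acc) * dt)
        rows.append(row)
    return rows


fs = 1000.0  # sampling rate in Hz (illustrative)
# Synthetic non-stationary signal: a 120 Hz burst between samples 150 and 249.
signal = [math.sin(2 * math.pi * 120.0 * n / fs) if 150 <= n < 250 else 0.0
          for n in range(400)]
freqs = [60.0, 120.0, 240.0]
scalogram = cwt_magnitude(signal, fs, freqs)

# Inside the burst, the 120 Hz row should carry the most energy.
peak_row = max(range(len(freqs)), key=lambda r: scalogram[r][200])
print("dominant frequency at sample 200:", freqs[peak_row])
```

Each row of `scalogram` is one horizontal line of the time-frequency image. A short burst shows up as a compact bright patch, which is the kind of localized energy region an object detector can be trained to box.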
Beyond raw accuracy, detecting localized regions gives a more interpretable link between time-frequency energy distributions and characteristic bearing fault frequencies. This makes the framework well suited to real-world industrial settings, where noise and signal variability often challenge traditional methods. The work was submitted to IEEE Sensors Letters and is available on arXiv (2509.03070). By applying computer vision techniques to a classical signal processing problem, the researchers point to an approach that could extend to other predictive maintenance applications, such as gearbox or motor monitoring.
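For readers unfamiliar with the term, the characteristic bearing fault frequencies follow from bearing geometry and shaft speed through the classical kinematic formulas. The sketch below computes them for a stationary outer race and no slip. The geometry and speed values are illustrative assumptions, not taken from the paper, and the function name `bearing_fault_frequencies` is hypothetical.

```python
import math


def bearing_fault_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Classical kinematic fault frequencies in Hz for a rolling-element bearing
    with a stationary outer race and a rotating inner race, assuming no slip.

    ball_d and pitch_d must share the same length unit.
    """
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        "FTF": 0.5 * shaft_hz * (1.0 - ratio),             # cage (fundamental train)
        "BPFO": 0.5 * n_balls * shaft_hz * (1.0 - ratio),  # ball pass, outer race
        "BPFI": 0.5 * n_balls * shaft_hz * (1.0 + ratio),  # ball pass, inner race
        "BSF": 0.5 * (pitch_d / ball_d) * shaft_hz * (1.0 - ratio ** 2),  # ball spin
    }


# Illustrative values: 9 balls, 7.94 mm ball diameter, 39.04 mm pitch diameter,
# shaft turning at 1797 rpm.
shaft_hz = 1797.0 / 60.0
fault = bearing_fault_frequencies(shaft_hz, n_balls=9, ball_d=7.94, pitch_d=39.04)
for name, hz in fault.items():
    print(f"{name}: {hz:.1f} Hz")
```

A defect on a given component tends to produce energy at the matching frequency and its harmonics, so a detected region in the time-frequency image can be checked against these values.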
- The framework converts vibration signals to CWT spectrograms, then uses YOLOv9, YOLOv10, and YOLOv11 to detect fault-related energy regions in the time-frequency domain.
- It achieves 99.4% mAP on CWRU, 97.8% on PU, and 99.5% on IMS, outperforming STFT-based representations, modern vision backbones, and conventional time-series baselines.
- It provides interpretable links between detected regions and characteristic bearing fault frequencies, aiding diagnostics in noisy industrial environments.
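One way to picture the interpretability claim is to map a detected bounding box back to physical units and ask which fault harmonics fall inside it. The sketch below is a hypothetical illustration, not the paper's procedure. It assumes a box in the normalized `(x_center, y_center, width, height)` form commonly used for YOLO labels, a linear frequency axis with the highest frequency at the top of the image, and a made-up outer-race frequency of 107.4 Hz. The helper names `box_to_time_freq` and `harmonics_in_band` are invented for this example.

```python
def box_to_time_freq(box, duration_s, f_min, f_max):
    """Map a normalized (x_center, y_center, width, height) box to
    ((t_start, t_end), (f_low, f_high)).

    Assumes time runs left to right over duration_s seconds and frequency is
    linear, with f_max at the top row of the image and f_min at the bottom.
    """
    xc, yc, w, h = box
    t_start = (xc - w / 2.0) * duration_s
    t_end = (xc + w / 2.0) * duration_s
    span = f_max - f_min
    f_high = f_max - (yc - h / 2.0) * span  # top edge of the box
    f_low = f_max - (yc + h / 2.0) * span   # bottom edge of the box
    return (t_start, t_end), (f_low, f_high)


def harmonics_in_band(fault_hz, f_low, f_high, max_order=10):
    """Return the harmonic orders k for which k * fault_hz lies in [f_low, f_high]."""
    return [k for k in range(1, max_order + 1) if f_low <= k * fault_hz <= f_high]


# Hypothetical detection on a 1-second image covering 0 to 1000 Hz.
box = (0.5, 0.25, 0.2, 0.1)
(t_start, t_end), (f_low, f_high) = box_to_time_freq(box, 1.0, 0.0, 1000.0)
orders = harmonics_in_band(107.4, f_low, f_high)  # 107.4 Hz is a made-up BPFO
print(f"time {t_start:.2f}-{t_end:.2f} s, band {f_low:.0f}-{f_high:.0f} Hz, harmonics {orders}")
```

If the image uses a logarithmic frequency axis, as many CWT plots do, the vertical mapping would need to change accordingly.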
Why It Matters
Enables more accurate, interpretable predictive maintenance for industrial machinery using off-the-shelf object detection models.