Research & Papers

ECG-Lens: Benchmarking ML & DL Models on PTB-XL Dataset

A new benchmark reports that a complex CNN reaches 80% accuracy and 90% ROC-AUC on raw ECG data, ahead of the traditional ML models it was tested against.

Deep Dive

A team of researchers including Saloni Garg and Ukant Jadia has published a benchmark study, 'ECG-Lens,' comparing traditional machine learning and deep learning models on automated electrocardiogram (ECG) classification. The study, published on arXiv, tested three ML algorithms (Decision Tree, Random Forest, and Logistic Regression) against three DL architectures: a simple CNN, an LSTM network, and the authors' own complex CNN, also called 'ECG-Lens.' All models were evaluated on the PTB-XL dataset, which contains 12-lead recordings from healthy subjects and from patients with various cardiac conditions, using accuracy, precision, recall, F1-score, and ROC-AUC.
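For readers less familiar with the metrics listed above, the following is a minimal sketch of how they can be computed for a single binary label. It is an illustration only: the function name `binary_metrics`, the 0.5 threshold, and the toy labels and scores are assumptions, and the study's actual evaluation code and label setup are not described here.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Compute accuracy, precision, recall, F1 and ROC-AUC for one binary label.

    y_true  : list of 0/1 ground-truth labels
    y_score : list of predicted scores in [0, 1]
    The threshold is an illustrative assumption, not a value from the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # ROC-AUC as the probability that a random positive example is scored
    # higher than a random negative one (ties count as half a win).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        roc_auc = wins / (len(pos) * len(neg))
    else:
        roc_auc = float("nan")  # undefined when one class is missing

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "roc_auc": roc_auc}
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, while ROC-AUC summarizes ranking quality across all thresholds, which is one reason the two headline numbers in a benchmark like this can differ.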

The deep learning models were trained directly on raw ECG signals, so they learn discriminative features themselves rather than relying on manual feature engineering. To improve performance and increase the diversity of training samples, the researchers applied data augmentation based on the Stationary Wavelet Transform (SWT), a technique that preserves the essential characteristics of the physiological signals. The ECG-Lens model outperformed all other models in the comparison, reaching a classification accuracy of 80% and a ROC-AUC of 90%.
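To make the idea of SWT-based augmentation more concrete, here is a minimal sketch for a single ECG lead. It is not the authors' procedure: the Haar wavelet, the single decomposition level, the circular boundary handling, the detail-scaling perturbation, and all function names are assumptions chosen to keep the example short and dependency-free.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_swt(signal):
    """One-level stationary (undecimated) Haar transform, circular boundary.

    Unlike the ordinary DWT, nothing is downsampled, so both coefficient
    lists have the same length as the input signal.
    """
    n = len(signal)
    approx = [(signal[i] + signal[(i + 1) % n]) / SQRT2 for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / SQRT2 for i in range(n)]
    return approx, detail

def haar_iswt(approx, detail):
    """Invert haar_swt by averaging the two estimates available per sample."""
    n = len(approx)
    out = []
    for i in range(n):
        from_right = (approx[i] + detail[i]) / SQRT2          # pair (i, i+1)
        from_left = (approx[i - 1] - detail[i - 1]) / SQRT2   # pair (i-1, i)
        out.append(0.5 * (from_right + from_left))
    return out

def swt_augment(signal, detail_scale_range=(0.9, 1.1), rng=None):
    """Return a perturbed copy of one lead (hypothetical augmentation).

    The coarse waveform (approximation coefficients) is left untouched and
    only the fine detail is rescaled by a random factor.
    """
    rng = rng or random.Random()
    approx, detail = haar_swt(signal)
    scale = rng.uniform(*detail_scale_range)
    return haar_iswt(approx, [scale * d for d in detail])
```

For a 12-lead recording, `swt_augment` would be applied to each lead separately. In practice a wavelet library would normally replace the hand-written transform, and the wavelet family and number of levels would be tuned to the signal.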

The authors conclude that deep architectures such as CNNs are more effective than the traditional ML methods tested at processing raw, multi-lead biomedical time-series data. The findings give developers and clinicians a practical reference point for selecting and improving automated diagnostic tools, and for guiding future condition-specific models for cardiovascular disease monitoring and diagnosis.

Key Points
  • ECG-Lens, a complex CNN, achieved 80% accuracy and 90% ROC-AUC on the PTB-XL dataset.
  • The benchmark pitted 3 DL models against 3 traditional ML algorithms, with DL models trained on raw 12-lead ECG signals.
  • Data augmentation using Stationary Wavelet Transform (SWT) was used to enhance model performance and training sample diversity.

Why It Matters

The study provides a concrete benchmark for developing more accurate automated tools that diagnose heart conditions from ECG data, which could in turn improve healthcare outcomes.