Research & Papers

Deep Learning using Rectified Linear Units (ReLU)

A 2026 revision to a 2018 paper sets the record straight on ReLU's origins and performance.

Deep Dive

A 2026 revision to a widely cited 2018 paper by Abien Fred Agarap serves a dual purpose: correcting the historical record and providing robust empirical validation for the Rectified Linear Unit (ReLU). The author formally addresses a common citation error, clarifying that his original 2018 work investigated ReLU only as the classification layer, while the integration of piecewise linear activations into deep learning architectures was established earlier by Nair and Hinton in 2010. This correction matters for accurate attribution in a field that evolves rapidly.
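For context, the sketch below (not code from the paper) defines the three activation functions under comparison and their gradients, illustrating why the saturating functions' gradients vanish for large inputs while ReLU's does not; the values and printout are purely illustrative.

```python
import numpy as np

def relu(x):
    # Piecewise linear: max(0, x); gradient is 1 for x > 0, so it does not saturate.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Saturating: gradient sigma(x) * (1 - sigma(x)) peaks at 0.25 and vanishes for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

# Gradients at a large pre-activation value show the saturation gap.
x = 10.0
print("ReLU gradient:   ", 1.0 if x > 0 else 0.0)              # 1.0
print("Tanh gradient:   ", 1.0 - np.tanh(x) ** 2)              # ~8e-9
print("Sigmoid gradient:", sigmoid(x) * (1.0 - sigmoid(x)))    # ~4.5e-5
```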

Alongside the citation fix, the paper delivers a rigorous, statistically sound performance analysis. The study compares ReLU against traditional saturating functions—Hyperbolic Tangent (Tanh) and Logistic (Sigmoid)—across image classification, text classification, and image reconstruction tasks. To ensure robustness, the evaluation used 10 independent randomized trials and assessed significance with the non-parametric Kruskal-Wallis H test. The results are stark: Sigmoid failed to converge in deep convolutional vision tasks, yielding accuracies equivalent to random chance due to the vanishing gradient problem. In contrast, ReLU and Tanh showed stable convergence, with ReLU achieving superior mean accuracy and F1-scores on classification tasks.
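A minimal sketch of the kind of significance check described above, using SciPy's `kruskal` function on per-trial accuracies; the numbers below are placeholders, not results from the paper.

```python
from scipy.stats import kruskal

# Hypothetical test accuracies over 10 independent randomized trials
# (placeholder values, not the paper's reported numbers).
relu_acc    = [0.92, 0.91, 0.93, 0.92, 0.90, 0.93, 0.92, 0.91, 0.92, 0.93]
tanh_acc    = [0.89, 0.88, 0.90, 0.89, 0.88, 0.90, 0.89, 0.89, 0.88, 0.90]
sigmoid_acc = [0.10, 0.11, 0.09, 0.10, 0.10, 0.11, 0.10, 0.09, 0.10, 0.11]

# Non-parametric Kruskal-Wallis H test: does at least one group's
# accuracy distribution differ from the others?
h_stat, p_value = kruskal(relu_acc, tanh_acc, sigmoid_acc)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null: activation choice has a statistically significant effect.")
```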

The findings reaffirm the theoretical advantage of non-saturating activation functions for training deep neural networks. While Tanh performed best for image reconstruction (measured by peak signal-to-noise ratio), ReLU's overall dominance in classification underscores its role as a cornerstone of modern deep learning. The paper concludes that the performance differences among activation functions are statistically significant, reinforcing the practical importance of choices like ReLU in contemporary architecture design.
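For reference, peak signal-to-noise ratio, the reconstruction metric mentioned above, can be computed with the standard definition below; this is a generic sketch (assuming images scaled to [0, 1]), not code from the paper.

```python
import numpy as np

def psnr(original, reconstruction, max_val=1.0):
    # PSNR = 10 * log10(MAX^2 / MSE); higher means a closer reconstruction.
    mse = np.mean((original - reconstruction) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)

# Toy usage with a random image and a noisy "reconstruction" in [0, 1].
rng = np.random.default_rng(0)
img = rng.random((28, 28))
noisy = np.clip(img + 0.05 * rng.standard_normal(img.shape), 0.0, 1.0)
print(f"PSNR: {psnr(img, noisy):.2f} dB")
```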

Key Points
  • Corrects widespread misattribution, tracing ReLU's deep learning integration to Nair & Hinton (2010), not the 2018 paper.
  • Empirical comparison using 10 randomized trials shows ReLU achieved highest accuracy/F1-score; Sigmoid failed completely in deep vision tasks.
  • Uses the Kruskal-Wallis H test to confirm statistically significant performance differences, validating theory on non-saturating functions.

Why It Matters

Ensures accurate attribution in AI history and provides robust statistical evidence for fundamental architectural choices that underpin modern neural networks.