Abstract
Understanding optimization in deep learning is a fundamental problem, and recent findings have challenged the long-held belief that gradient descent trains deep networks stably. In this study, we examine the instability of gradient descent during the training of deep networks. By training various modern deep networks with gradient descent, we provide empirical evidence that a significant portion of the optimization progress is made through oscillating gradients: gradients that exhibit a high negative correlation between adjacent iterations. We further make the following observations about these gradient oscillations (GO): (i) GO appears at different training stages for networks with different architectures; (ii) with a large learning rate, GO consistently emerges across all layers of the network; and (iii) with a small learning rate, GO is more prominent in the input layers than in the output layers. These findings suggest that GO is an inherent characteristic of training different types of neural networks and may inspire the design of novel optimizers.
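The oscillation pattern described in the abstract can be reproduced in its simplest form without a deep network. The sketch below (an illustration under simplified assumptions, not the paper's experimental setup) runs gradient descent on a one-dimensional quadratic f(x) = 0.5·a·x², whose gradient is a·x. With a learning rate larger than 2/a, each step overshoots the minimum, so adjacent gradients point in opposite directions, yielding the high negative correlation the abstract refers to; with a small learning rate the gradients stay aligned.

```python
def run_gd(a, eta, x0, steps):
    """Run gradient descent on f(x) = 0.5 * a * x**2; return the gradients seen."""
    x, grads = x0, []
    for _ in range(steps):
        g = a * x          # gradient of 0.5 * a * x**2
        grads.append(g)
        x = x - eta * g    # gradient descent update
    return grads

def adjacent_correlation(grads):
    """Mean sign agreement of adjacent gradients: +1 aligned, -1 opposed."""
    pairs = list(zip(grads, grads[1:]))
    return sum(1 if g1 * g2 > 0 else -1 for g1, g2 in pairs) / len(pairs)

# Large step size (eta > 2/a): gradients flip sign every iteration.
print(adjacent_correlation(run_gd(a=1.0, eta=2.5, x0=1.0, steps=20)))  # -1.0
# Small step size: gradients keep the same sign while decaying.
print(adjacent_correlation(run_gd(a=1.0, eta=0.5, x0=1.0, steps=20)))  # 1.0
```

In a deep network the analogue is measuring the cosine similarity of full-parameter gradients at consecutive iterations; this toy case only shows why a learning rate beyond the curvature-determined stability threshold produces sign-flipping gradients rather than smooth decay.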
Original language | English (US) |
---|---|
Title of host publication | 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798350328141 |
DOIs | |
State | Published - 2023 |
Event | 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023 - Monticello, United States Duration: Sep 26 2023 → Sep 29 2023 |
Publication series
Name | 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023 |
---|---|
Conference
Conference | 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023 |
---|---|
Country/Territory | United States |
City | Monticello |
Period | 9/26/23 → 9/29/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.