RMSPROP CONVERGES WITH PROPER HYPER-PARAMETER

Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun

Research output: Contribution to conference > Paper > peer-review

8 Scopus citations

Abstract

Despite the existence of divergence examples, RMSprop remains one of the most popular algorithms in machine learning. Towards closing the gap between theory and practice, we prove that RMSprop converges with a proper choice of hyper-parameters under certain conditions. More specifically, we prove that when the hyper-parameter β2 is close enough to 1, RMSprop and its random shuffling version converge to a bounded region in general, and to critical points in the interpolation regime. It is worth mentioning that our results do not depend on the "bounded gradient" assumption, which is often the key assumption utilized by existing theoretical work for Adam-type adaptive gradient methods. Removing this assumption allows us to establish a phase transition from divergence to non-divergence for RMSprop. Finally, based on our theory, we conjecture that in practice there is a critical threshold β2*, such that RMSprop generates reasonably good results only if 1 > β2 ≥ β2*. We provide empirical evidence for such a phase transition in our numerical experiments.
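For readers unfamiliar with the role of β2, the following is a minimal sketch of the standard RMSprop update on a scalar parameter. It is not code from the paper: the function names, the learning rate, the `eps` term, and the quadratic test objective f(x) = 0.5 x² are all assumptions chosen for illustration, and this toy problem does not reproduce the divergence examples or the phase transition discussed in the abstract.

```python
import math


def rmsprop_step(x, v, grad, lr=0.01, beta2=0.99, eps=1e-8):
    """One RMSprop update on a scalar parameter (illustrative sketch).

    v is the running average of squared gradients; beta2 controls how
    slowly that average forgets old gradients (closer to 1 = longer memory).
    """
    v = beta2 * v + (1.0 - beta2) * grad * grad
    x = x - lr * grad / (math.sqrt(v) + eps)
    return x, v


def run(beta2, steps=2000, x0=1.0):
    """Run RMSprop on the hypothetical objective f(x) = 0.5 * x**2.

    The gradient of this objective at x is simply x.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        x, v = rmsprop_step(x, v, grad=x, beta2=beta2)
    return x


if __name__ == "__main__":
    # Compare a few beta2 values; the values themselves are arbitrary.
    for beta2 in (0.9, 0.99, 0.999):
        print(beta2, run(beta2))
```

The only place β2 enters is the running average `v`, which is why the choice of β2 changes how strongly each step is rescaled by recent gradient magnitudes.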

Original language: English (US)
State: Published - 2021
Event: 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online
Duration: May 3 2021 to May 7 2021

Conference

Conference: 9th International Conference on Learning Representations, ICLR 2021
City: Virtual, Online
Period: 5/3/21 to 5/7/21

Bibliographical note

Funding Information:
M. Hong is supported by NSF grant CMMI-1727757. Ruichen Li from Peking University helped check some of the proof of Theorem 4.3. We thank all anonymous reviewers for their feedback. We also want to thank Eduard Gorbunov and Juntang Zhuang for pointing out, on OpenReview, some mistakes in the earlier versions.

Publisher Copyright:
© 2021 ICLR 2021 - 9th International Conference on Learning Representations. All rights reserved.
