Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization

Ziyi Chen, Yi Zhou, Yingbin Liang, Zhaosong Lu

Research output: Contribution to journal › Conference article › peer-review

Abstract

Various optimal gradient-based algorithms have been developed for smooth nonconvex optimization. However, many nonconvex machine learning problems do not belong to the class of smooth functions and therefore the existing algorithms are sub-optimal. Instead, these problems have been shown to satisfy certain generalized-smooth conditions, which have not been well understood in the existing literature. In this paper, we propose a notion of α-symmetric generalized-smoothness that extends the existing notions and covers many important functions such as high-order polynomials and exponential functions. We study the fundamental properties and establish descent lemmas for the functions in this class. Then, to solve such a large class of nonconvex problems, we design a special deterministic normalized gradient descent algorithm that achieves the optimal iteration complexity O(ϵ^{-2}), and also prove that the popular SPIDER variance reduction algorithm achieves the optimal sample complexity O(ϵ^{-3}) in the stochastic setting. Our results show that solving generalized-smooth nonconvex problems is as efficient as solving smooth nonconvex problems.
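As an illustration only, and not the specific algorithm analyzed in the paper, the basic normalized gradient descent update x_{t+1} = x_t - γ ∇f(x_t)/‖∇f(x_t)‖ can be sketched as follows. The objective, gradient function, step size gamma, iteration budget, and the quartic test function in the usage line are all hypothetical placeholders chosen for the sketch.

import numpy as np

def normalized_gradient_descent(grad_f, x0, gamma=0.1, num_iters=1000, eps=1e-12):
    # Generic normalized gradient descent sketch (illustrative only):
    # each iteration moves a fixed length gamma in the negative gradient
    # direction, the basic idea behind normalized-GD methods that are robust
    # to the rapidly growing smoothness constants of generalized-smooth objectives.
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad_f(x)
        norm = np.linalg.norm(g)
        if norm < eps:  # (approximately) stationary point reached
            break
        x = x - gamma * g / norm
    return x

# Hypothetical usage on f(x) = ||x||^4, whose gradient is 4 ||x||^2 x;
# high-order polynomials like this are generalized-smooth but not globally smooth.
x_star = normalized_gradient_descent(lambda x: 4.0 * np.dot(x, x) * x, x0=[2.0, -1.0])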

Original language: English (US)
Pages (from-to): 4953-5001
Number of pages: 49
Journal: Proceedings of Machine Learning Research
Volume: 202
State: Published - 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: Jul 23, 2023 - Jul 29, 2023

Bibliographical note

Publisher Copyright:
© 2023 Proceedings of Machine Learning Research. All rights reserved.
