Conditional gradient, also known as Frank-Wolfe (FW), algorithms have well-documented merits in machine learning and signal processing applications. Unlike projection-based methods, momentum cannot, in general, improve the convergence rate of FW. This limitation motivates the present work, which deals with heavy ball momentum and its impact on FW. Specifically, it is established that heavy ball offers a unifying perspective on primal-dual (PD) convergence and enjoys a tighter per-iteration PD error rate for multiple choices of step sizes, where the PD error can serve as a stopping criterion in practice. In addition, it is asserted that restart, a scheme typically employed jointly with Nesterov's momentum, can further tighten this PD error bound. Numerical results demonstrate the usefulness of heavy ball momentum in FW iterations.
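The abstract's central idea, heavy ball momentum inside FW iterations, can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact algorithm: the momentum and step-size schedules (`delta_k`, `eta_k`), the function names, and the toy least-squares-over-simplex problem are all assumptions chosen for the sketch. The key ingredient is that the linear minimization oracle (LMO) is fed an exponentially averaged gradient rather than the fresh gradient.

```python
import numpy as np

# Hedged sketch of Frank-Wolfe with heavy ball momentum.
# Illustrative problem (an assumption, not from the paper):
#   min_x 0.5 * ||A x - b||^2  subject to  x in the probability simplex.

def lmo_simplex(g):
    """LMO over the simplex: argmin_{v in simplex} <g, v> is a vertex."""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

def heavy_ball_fw(grad, lmo, x0, iters=200):
    x = x0.copy()
    g_bar = grad(x)  # running (heavy ball) average of gradients
    for k in range(1, iters + 1):
        delta_k = 2.0 / (k + 1)  # momentum weight: one common choice, an assumption
        g_bar = (1.0 - delta_k) * g_bar + delta_k * grad(x)
        v = lmo(g_bar)           # FW vertex computed from the averaged gradient
        eta_k = 2.0 / (k + 2)    # standard FW step size
        x = x + eta_k * (v - x)  # convex combination keeps x feasible
    return x

# Small synthetic instance to exercise the sketch.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [0.3, 0.7]      # a point inside the simplex
b = A @ x_true
grad = lambda x: A.T @ (A @ x - b)

x_hat = heavy_ball_fw(grad, lmo_simplex, np.ones(10) / 10)
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2)
```

Because each iterate is a convex combination of the previous iterate and a simplex vertex, feasibility is maintained without any projection, which is the main computational appeal of FW-type methods that the abstract refers to.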
| Original language | English (US) |
| Title of host publication | Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
| Editors | Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan |
| Publisher | Neural information processing systems foundation |
| Number of pages | 12 |
| State | Published - 2021 |
| Event | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online; Dec 6 2021 → Dec 14 2021 |
| Name | Advances in Neural Information Processing Systems |
| Conference | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
| Period | 12/6/21 → 12/14/21 |
Bibliographical note (Funding Information): This work is supported by NSF grants 1901134, 2126052, and 2128593. The authors would also like to thank the anonymous reviewers for their feedback.
© 2021 Neural information processing systems foundation. All rights reserved.