We consider a class of popular distributed non-convex optimization problems, in which agents connected by a network G collectively optimize a sum of smooth (possibly non-convex) local objective functions. We address the following question: if the agents can only access the gradients of local functions, what are the fastest rates that any distributed algorithms can achieve, and how to achieve those rates. First, we show that there exist difficult problem instances, such that it takes a class of distributed first-order methods at least O(1/√ϵ (G) × L /ϵ) communication rounds to achieve certain ϵ-solution [where ϵ (G) denotes the spectral gap of the graph Laplacian matrix, and L is some Lipschitz constant]. Second, we propose (near) optimal methods whose rates match the developed lower rate bound (up to a ploylog factor). The key in the algorithm design is to properly embed the classical polynomial filtering techniques into modern first-order algorithms. To the best of our knowledge, this is the first time that lower rate bounds and optimal methods have been developed for distributed non-convex optimization problems.
Bibliographical noteFunding Information:
Manuscript received October 21, 2018; revised May 28, 2019 and July 21, 2019; accepted July 23, 2019. Date of publication September 23, 2019; date of current version November 5, 2019. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Marco Moretti. This work was supported in part by National Science Foundation under Grants CMMI-1727757 and CCF-1526078 and in part by an AFOSR under Grant 15RT0767. This paper was presented in part at the Asilomar Conference on Signal, Systems and Computer, Pacific Grove, CA, November 2019 . (Corresponding author: Mingyi Hong.) The authors are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55414 USA (e-mail: email@example.com; firstname.lastname@example.org).
- Lower complexity bound
- Non-convex distributed optimization
- Optimal convergence rate