Multicomposite nonconvex optimization for training deep neural networks

Ying Cui, Ziyu He, Jong-Shi Pang

Research output: Contribution to journal › Article › peer-review

12 Scopus citations


We present in this paper a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the empirical deep neural network training problem formulated as a multicomposite optimization problem with coupled nonconvexity and nondifferentiability. This is the first time to our knowledge that such a sharp kind of stationary solution is provably computable for a nonsmooth deep neural network. Allowing for arbitrary finite numbers of input samples and training layers, an arbitrary number of neurons within each layer, and arbitrary piecewise activation functions, the proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection with enhancements, and the dual semismooth Newton method, each for a particular purpose in an overall computational scheme. While a routine implementation of the semismooth Newton method would be computationally expensive, we show that careful linear algebraic implementation greatly reduces the computational and storage costs for problems of arbitrary dimensions. Contrary to existing stochastic approaches, which provide at best very weak guarantees on the solutions computed in practical implementation, our rigorous deterministic treatment provides a guarantee of the stationarity properties of the computed solutions with reference to the optimization problems being solved. Numerical results from a MATLAB implementation demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).

Original language: English (US)
Pages (from-to): 1693-1723
Number of pages: 31
Journal: SIAM Journal on Optimization
Issue number: 2
State: Published - 2020

Bibliographical note

Publisher Copyright:
© 2020 Society for Industrial and Applied Mathematics.


Keywords

  • Deep neural network
  • Exact penalization
  • Majorization-minimization
  • Nonconvexity
  • Nondifferentiability
  • Semismooth Newton method


