Provably global convergence of actor-critic: A case for linear quadratic regulator with ergodic cost

Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

Research output: Contribution to journal › Conference article › peer-review

65 Scopus citations

Abstract

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
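
For intuition, the sketch below illustrates the alternating (bilevel) structure described in the abstract on a toy LQR instance: an inner critic step that evaluates the quadratic value matrix of the current linear policy, followed by an outer actor step that updates the policy gain with a gradient-style rule. This is only an idealized, model-based illustration, not the paper's algorithm: the system matrices A, B, Q, R, the learning rate, and the exact policy-evaluation critic are hypothetical simplifications, whereas the paper's actor-critic uses online, sample-based critic updates.

import numpy as np

# Toy LQR instance (hypothetical values): x_{t+1} = A x_t + B u_t,
# per-step cost x' Q x + u' R u, linear policy u = -K x.
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.eye(1)

def critic(K, iters=500):
    # Critic (inner / lower-level problem): evaluate the current policy by
    # iterating the policy-evaluation recursion
    #   P <- Q + K' R K + (A - B K)' P (A - B K),
    # an idealized stand-in for the paper's online, sample-based critic.
    Acl = A - B @ K
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return P

def actor_step(K, P, lr=0.1):
    # Actor (outer / upper-level problem): gradient-style update of the policy
    # gain K using the critic's quadratic estimate P.
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return K - lr * E

K = np.zeros((1, 2))  # zero gain is stabilizing for this (stable) A
for _ in range(500):
    P = critic(K)         # critic update
    K = actor_step(K, P)  # actor update

print("learned gain K:", K)

Under these simplifying assumptions the gain K approaches the optimal policy at a linear rate, mirroring the convergence guarantee stated in the abstract.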

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
Volume: 32
State: Published - 2019
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: Dec 8, 2019 - Dec 14, 2019

Bibliographical note

Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.
