A Finite Sample Analysis of the Actor-Critic Algorithm

Zhuoran Yang, Kaiqing Zhang, Mingyi Hong, Tamer Basar

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

2 Scopus citations

Abstract

We study the finite-sample performance of the batch actor-critic algorithm for reinforcement learning with nonlinear function approximation. Specifically, in the critic step, we estimate the action-value function corresponding to the actor's policy within a parametrized function class, while in the actor step, the policy is updated using the policy gradient estimated from the critic, so as to maximize the objective function defined as the expected discounted cumulative reward. Under this setting, we show that, for the parameter sequence generated by the actor steps, the gradient norm of the objective function at any limit point is close to zero up to a fundamental error. In particular, this error corresponds to the statistical rate of policy evaluation with nonlinear function approximation. For the special class of linear functions, and as the number of samples goes to infinity, our result recovers the classical convergence results for the online actor-critic algorithm, which are based on the asymptotic behavior of two-time-scale stochastic approximation.
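The abstract gives no pseudocode, so the following is only a minimal sketch of a generic batch actor-critic loop under stated assumptions, not the authors' algorithm or analysis. It covers the linear special case the abstract mentions, using one-hot (tabular) features on a hypothetical 2-state, 2-action MDP; the rewards, transition rule, step size, batch sizes, and every function name (`step`, `critic_step`, `actor_gradient`, `batch_actor_critic`) are illustrative assumptions. The critic fits Q^pi on a batch of transitions by fitted Q evaluation; the actor then takes a gradient ascent step (the objective is a reward to maximize) using the policy gradient theorem, grad J(theta) = (1 - gamma)^(-1) E_{s ~ d_pi}[sum_a grad pi(a|s) Q^pi(s, a)], with the critic's estimate in place of Q^pi.

```python
import math
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# Taking action a moves to state a with probability 0.8, else to the other state.
N_STATES, N_ACTIONS = 2, 2
REWARD = [[0.0, 1.0], [0.5, 2.0]]  # REWARD[s][a]; action 1 pays more in both states
GAMMA = 0.9


def step(s, a, rng):
    """Sample (reward, next_state) for taking action a in state s."""
    s_next = a if rng.random() < 0.8 else 1 - a
    return REWARD[s][a], s_next


def zero_table():
    """Return an N_STATES x N_ACTIONS table of zeros."""
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def softmax_policy(theta):
    """Tabular softmax policy: pi[s][a] proportional to exp(theta[s][a])."""
    pi = []
    for row in theta:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(t - m) for t in row]
        z = sum(exps)
        pi.append([e / z for e in exps])
    return pi


def critic_step(pi, rng, n_samples=2000, n_iters=200):
    """Batch evaluation of Q^pi by fitted Q evaluation with one-hot features.
    With one-hot features, the least-squares fit of Q(s, a) to the targets
    r + gamma * V(s') is the per-(s, a) average of those targets, so the batch
    is first aggregated into counts, reward sums, and next-state counts."""
    counts, r_sum = zero_table(), zero_table()
    next_counts = [[[0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    for _ in range(n_samples):
        s, a = rng.randrange(N_STATES), rng.randrange(N_ACTIONS)  # uniform coverage
        r, s_next = step(s, a, rng)
        counts[s][a] += 1
        r_sum[s][a] += r
        next_counts[s][a][s_next] += 1
    q = zero_table()
    for _ in range(n_iters):
        v = [sum(pi[s][a] * q[s][a] for a in range(N_ACTIONS)) for s in range(N_STATES)]
        q = [[(r_sum[s][a] + GAMMA * sum(next_counts[s][a][t] * v[t] for t in range(N_STATES)))
              / max(counts[s][a], 1)
              for a in range(N_ACTIONS)]
             for s in range(N_STATES)]
    return q


def sample_visited_state(pi, rng, s0=0):
    """Draw s from the normalized discounted state-visitation distribution of pi
    started at s0: stop at each time step with probability 1 - gamma."""
    s = s0
    while rng.random() < GAMMA:
        a = rng.choices(range(N_ACTIONS), weights=pi[s])[0]
        _, s = step(s, a, rng)
    return s


def actor_gradient(pi, q, rng, n_states=200):
    """Monte Carlo estimate of grad J using the critic's q in place of Q^pi.
    For tabular softmax, d/d theta[s][b] of sum_a pi(a|s) Q(s, a)
    equals pi(b|s) * (Q(s, b) - V(s))."""
    grad = zero_table()
    for _ in range(n_states):
        s = sample_visited_state(pi, rng)
        v = sum(pi[s][a] * q[s][a] for a in range(N_ACTIONS))
        for b in range(N_ACTIONS):
            grad[s][b] += pi[s][b] * (q[s][b] - v) / ((1 - GAMMA) * n_states)
    return grad


def batch_actor_critic(n_rounds=30, step_size=0.5, seed=0):
    """Alternate a critic step (evaluate the current policy on a fresh batch)
    and an actor step (gradient ascent on J)."""
    rng = random.Random(seed)
    theta = zero_table()
    for _ in range(n_rounds):
        pi = softmax_policy(theta)
        q = critic_step(pi, rng)
        grad = actor_gradient(pi, q, rng)
        theta = [[theta[s][a] + step_size * grad[s][a] for a in range(N_ACTIONS)]
                 for s in range(N_STATES)]
    return softmax_policy(theta)


if __name__ == "__main__":
    pi = batch_actor_critic()
    for s in range(N_STATES):
        print(f"state {s}: pi(a=1 | s) = {pi[s][1]:.3f}")
```

Because the features here are one-hot, the critic's least-squares regression collapses to per-pair averaging; a nonlinear critic, as in the general setting the abstract describes, would replace that step with a regression over the chosen function class, and the gap between the fitted and true Q^pi is what feeds into the actor's gradient error.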

Original language: English (US)
Title of host publication: 2018 IEEE Conference on Decision and Control, CDC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2759-2764
Number of pages: 6
ISBN (Electronic): 9781538613955
State: Published, Jan 18 2019
Event: 57th IEEE Conference on Decision and Control, CDC 2018, Miami, United States
Duration: Dec 17 2018 to Dec 19 2018

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2018-December
ISSN (Print): 0743-1546

Conference

Conference: 57th IEEE Conference on Decision and Control, CDC 2018
Country: United States
City: Miami
Period: 12/17/18 to 12/19/18

Bibliographical note

Funding Information:
Z. Yang is with the Dept. of Operations Research and Financial Engineering, Princeton University (zy6@princeton.edu). M. Hong is with the Dept. of Electrical and Computer Engineering, University of Minnesota (mhong@umn.edu). K. Zhang and T. Başar are with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign ({kzhang66, basar1}@illinois.edu). K. Zhang and T. Başar were supported in part by US Army Research Office (ARO) Grant W911NF-16-1-0485, and in part by Office of Naval Research (ONR) MURI Grant N00014-16-1-2710. M. Hong was supported in part by NSF grant CMMI-1727757.

