Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang

Research output: Contribution to journal, Conference article, peer-review

26 Scopus citations

Abstract

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in various engineering applications, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. A group of agents aims to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, by exchanging local estimates with their neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, our theoretical understanding of them remains limited. Existing results either assume i.i.d. data samples or impose an 'additional' projection step to control the 'gradient' bias incurred by Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a neighborhood of the optimum. The resultant error bounds are the first of their type, in the sense that they hold under the most practical assumptions; this is made possible by a novel multi-step Lyapunov analysis.
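The decentralized TD(0) setting described in the abstract can be illustrated with a short Python sketch. It is not the authors' code: the update form (each agent first averages its neighbors' estimates with a doubly stochastic mixing matrix, then takes a local TD(0) step driven only by its private reward), the four-agent ring topology, the three-state Markov chain, the features, the state-only rewards, and all function names (`sample_trajectory`, `decentralized_td0`, `centralized_td0`, `network_average`, `max_disagreement`) are hypothetical choices made for illustration. All agents observe the same Markovian state trajectory, matching the shared-environment setting, while each agent sees only its own reward.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sample_trajectory(P, s0, T, rng):
    """Markovian samples: one state trajectory s_0..s_T shared by all agents."""
    states = [s0]
    for _ in range(T):
        states.append(rng.choices(range(len(P)), weights=P[states[-1]])[0])
    return states


def decentralized_td0(states, phi, rewards, W, alpha, gamma, theta0):
    """Hypothetical decentralized TD(0): mix neighbors' estimates, then a local TD step.

    rewards[k][s] is agent k's private reward in state s (assumed state-only).
    """
    n_agents, d = len(W), len(phi[0])
    theta = [list(t) for t in theta0]
    for s, s_next in zip(states[:-1], states[1:]):
        f, f_next = phi[s], phi[s_next]
        new_theta = []
        for k in range(n_agents):
            # Consensus: weighted average of current estimates (W[k][j] = 0 for non-neighbors).
            mixed = [sum(W[k][j] * theta[j][i] for j in range(n_agents)) for i in range(d)]
            # Local TD error using agent k's own estimate and its private reward.
            delta = rewards[k][s] + gamma * dot(f_next, theta[k]) - dot(f, theta[k])
            new_theta.append([m + alpha * delta * fi for m, fi in zip(mixed, f)])
        theta = new_theta
    return theta


def centralized_td0(states, phi, reward, alpha, gamma, theta0):
    """Single-agent TD(0) with linear features, used only as a reference."""
    theta = list(theta0)
    for s, s_next in zip(states[:-1], states[1:]):
        f, f_next = phi[s], phi[s_next]
        delta = reward[s] + gamma * dot(f_next, theta) - dot(f, theta)
        theta = [t + alpha * delta * fi for t, fi in zip(theta, f)]
    return theta


def network_average(theta):
    """Coordinate-wise mean of the agents' parameter vectors."""
    n = len(theta)
    return [sum(t[i] for t in theta) / n for i in range(len(theta[0]))]


def max_disagreement(theta):
    """Largest coordinate-wise deviation of any agent from the network average."""
    avg = network_average(theta)
    return max(abs(t[i] - avg[i]) for t in theta for i in range(len(avg)))


# Toy problem (all numbers are illustrative assumptions, not from the paper).
P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]  # transitions under the fixed policy
PHI = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]               # 2 features for 3 states
REWARDS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.5]]
N_AGENTS = 4
# Ring topology: each agent weights itself and its two neighbors by 1/3 (doubly stochastic).
W = [[1 / 3 if (j - k) % N_AGENTS in (0, 1, N_AGENTS - 1) else 0.0
      for j in range(N_AGENTS)] for k in range(N_AGENTS)]

states = sample_trajectory(P, 0, 5000, random.Random(0))
theta0 = [[3.0 * k, -3.0 * k] for k in range(N_AGENTS)]  # deliberately different starts
theta = decentralized_td0(states, PHI, REWARDS, W, alpha=0.05, gamma=0.5, theta0=theta0)
print("disagreement before/after:", max_disagreement(theta0), max_disagreement(theta))
print("network-average estimate:", network_average(theta))
```

Because the mixing matrix is doubly stochastic and the TD error is affine in the parameter, the network-average estimate in this sketch evolves exactly like centralized TD(0) run on the agent-averaged reward (up to floating-point rounding), while the consensus step shrinks the disagreement between agents to a level that scales with the step size `alpha`. This informally mirrors the "neighborhood of the optimum" behavior described in the abstract; it does not reproduce the paper's bounds.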

Original language: English (US)
Pages (from-to): 4485-4495
Number of pages: 11
Journal: Proceedings of Machine Learning Research
Volume: 108
State: Published, 2020
Event: 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, Virtual, Online
Duration: Aug 26 2020 to Aug 28 2020

Bibliographical note

Publisher Copyright:
Copyright © 2020 by the author(s)
