On-Policy Reinforcement Learning via Ensemble Gaussian Processes with Application to Resource Allocation

Konstantinos Polyzos, Qin Lu, Alireza Sadeghi, Georgios B. Giannakis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

4 Scopus citations

Abstract

Reinforcement learning (RL) is an interactive decision-making tool with well-documented merits for resource allocation tasks in uncertain environments, such as those emerging with the Internet of Things. While it can attain state-of-the-art performance in several application domains, RL using deep neural networks can be less attractive when the training datasets involved are prohibitively large. Aiming at sample efficiency, this contribution adopts nonparametric value function models using Gaussian processes (GPs). Relying on the temporal-difference update rule, a novel GP-SARSA approach is developed, where the action selection is guided by Thompson sampling to balance exploration and exploitation. Targeting also computational scalability, the advocated approach leverages random features that replace GP-SARSA's nonparametric function learning with a parametric approximate model. Adaptation to unknown dynamics is accomplished through an ensemble (E) of GP-SARSA learners, whose weights are updated in a data-driven fashion. Performance of the proposed (E)GP-SARSA is evaluated on a practical resource allocation problem.
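The ingredients named in the abstract (random features, a temporal-difference update, Thompson sampling, and data-driven ensemble weights) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes an RBF kernel approximated by random Fourier features, treats the bootstrapped SARSA target as a noisy regression label (a simplification), and reweights learners by their Gaussian predictive likelihood. All class, function, and parameter names are hypothetical.

```python
import numpy as np

class RFSarsaLearner:
    """Bayesian linear Q-model on random Fourier features (approximates an RBF-kernel GP)."""

    def __init__(self, input_dim, num_features=50, lengthscale=1.0,
                 noise_var=0.1, gamma=0.9, seed=0):
        self.rng = np.random.default_rng(seed)
        # Spectral samples of the RBF kernel: frequencies ~ N(0, I / lengthscale^2)
        self.freqs = self.rng.normal(0.0, 1.0 / lengthscale,
                                     size=(num_features, input_dim))
        self.num_features = num_features
        self.noise_var = noise_var
        self.gamma = gamma
        # Posterior over weights theta ~ N(mean, cov), starting from a N(0, I) prior
        self.mean = np.zeros(2 * num_features)
        self.cov = np.eye(2 * num_features)

    def features(self, state, action):
        z = np.concatenate([np.atleast_1d(state), np.atleast_1d(action)]).astype(float)
        proj = self.freqs @ z
        return np.concatenate([np.sin(proj), np.cos(proj)]) / np.sqrt(self.num_features)

    def thompson_action(self, state, actions):
        # Thompson sampling: draw one Q-function from the posterior, act greedily on it
        theta = self.rng.multivariate_normal(self.mean, self.cov)
        q = [theta @ self.features(state, a) for a in actions]
        return actions[int(np.argmax(q))]

    def update(self, s, a, r, s_next, a_next):
        phi = self.features(s, a)
        # SARSA-style bootstrapped target (assumption: used as a regression label)
        target = r + self.gamma * (self.mean @ self.features(s_next, a_next))
        err = target - phi @ self.mean
        cov_phi = self.cov @ phi
        var = phi @ cov_phi + self.noise_var
        # Predictive likelihood of the target before the update, used for ensemble weights
        likelihood = np.exp(-0.5 * err ** 2 / var) / np.sqrt(2 * np.pi * var)
        # Recursive Bayesian least-squares update of the weight posterior
        gain = cov_phi / var
        self.mean = self.mean + gain * err
        self.cov = self.cov - np.outer(gain, cov_phi)
        return likelihood


def ensemble_action(learners, weights, state, actions, rng):
    # Assumption: pick one learner with probability equal to its weight, then Thompson-sample
    idx = rng.choice(len(learners), p=weights)
    return learners[idx].thompson_action(state, actions)


def ensemble_step(learners, weights, transition):
    # Multiply each weight by its learner's predictive likelihood, then renormalize
    liks = np.array([m.update(*transition) for m in learners])
    new_w = weights * np.maximum(liks, 1e-300)  # guard against underflow to zero
    return new_w / new_w.sum()
```

A possible usage, with three learners that differ only in kernel lengthscale: build `learners = [RFSarsaLearner(input_dim=2, lengthscale=l, seed=i) for i, l in enumerate([0.5, 1.0, 2.0])]`, start from uniform `weights`, choose actions with `ensemble_action`, and call `ensemble_step(learners, weights, (s, a, r, s_next, a_next))` after each transition. Learners whose kernel explains the observed targets better gain weight over time.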

Original language: English (US)
Title of host publication: 55th Asilomar Conference on Signals, Systems and Computers, ACSSC 2021
Editors: Michael B. Matthews
Publisher: IEEE Computer Society
Pages: 1018-1022
Number of pages: 5
ISBN (Electronic): 9781665458283
State: Published - 2021
Event: 55th Asilomar Conference on Signals, Systems and Computers, ACSSC 2021 - Virtual, Pacific Grove, United States
Duration: Oct 31 2021 to Nov 3 2021

Publication series

Name: Conference Record - Asilomar Conference on Signals, Systems and Computers
Volume: 2021-October
ISSN (Print): 1058-6393

Conference

Conference: 55th Asilomar Conference on Signals, Systems and Computers, ACSSC 2021
Country/Territory: United States
City: Virtual, Pacific Grove
Period: 10/31/21 to 11/3/21

Bibliographical note

Funding Information:
This work was supported in part by ARO grant W911NF2110297, and NSF grants 2126052 and 1901134. The work of Konstantinos D. Polyzos was also supported by the Onassis Foundation Scholarship.

Publisher Copyright:
© 2021 IEEE.

Keywords

  • Gaussian process
  • Reinforcement learning
  • Thompson sampling
  • resource allocation
