Abstract
Reinforcement learning (RL) is an interactive decision-making tool with well-documented merits for resource allocation tasks in uncertain environments, such as those emerging with the Internet-of-Things. While RL using deep neural networks can attain state-of-the-art performance in several application domains, it can be less attractive when the training datasets required are prohibitively large. Aiming at sample efficiency, this contribution adopts nonparametric value function models based on Gaussian processes (GPs). Relying on the temporal-difference update rule, a novel GP-SARSA approach is developed, in which action selection is guided by Thompson sampling to balance exploration and exploitation. Targeting computational scalability as well, the advocated approach leverages random features that replace GP-SARSA's nonparametric function learning with a parametric approximate model. Adaptation to unknown dynamics is accomplished through an ensemble (E) of GP-SARSA learners, whose weights are updated in a data-driven fashion. The performance of the proposed (E)GP-SARSA is evaluated on a practical resource allocation problem.
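To make the abstract's ingredients concrete, the following is a minimal illustrative sketch (not the authors' code) of a random-feature approximation to GP-SARSA with Thompson sampling: the GP value model is replaced by Bayesian linear regression over random Fourier features, the temporal-difference rule drives a rank-one posterior update, and actions are chosen greedily under a single posterior sample. The class name `RFGPSarsa`, the RBF kernel, the one-hot action encoding, and all hyperparameter values are assumptions made for illustration only; the ensemble weighting of (E)GP-SARSA is omitted.

```python
# Illustrative sketch only: SARSA with a random-feature (parametric)
# approximation of a GP value model and Thompson-sampling action selection.
import numpy as np

class RFGPSarsa:
    def __init__(self, state_dim, num_actions, num_features=100,
                 lengthscale=1.0, noise_var=0.1, gamma=0.95, seed=0):
        self.num_actions = num_actions
        self.D = num_features
        self.gamma = gamma
        self.noise_var = noise_var
        rng = np.random.default_rng(seed)
        # Random Fourier features approximating an RBF kernel over (state, action);
        # the action enters through a one-hot encoding (an assumption here).
        in_dim = state_dim + num_actions
        self.W = rng.normal(0.0, 1.0 / lengthscale, size=(self.D, in_dim))
        self.b = rng.uniform(0.0, 2 * np.pi, size=self.D)
        # Gaussian posterior over the linear weights theta: N(mean, Sigma).
        self.mean = np.zeros(self.D)
        self.Sigma = np.eye(self.D)

    def _phi(self, s, a):
        # Random-feature map of the state-action pair.
        x = np.concatenate([np.asarray(s, dtype=float),
                            np.eye(self.num_actions)[a]])
        return np.sqrt(2.0 / self.D) * np.cos(self.W @ x + self.b)

    def select_action(self, s):
        # Thompson sampling: draw one weight vector from the posterior,
        # then act greedily with respect to the sampled Q-function.
        theta = np.random.multivariate_normal(self.mean, self.Sigma)
        q = [theta @ self._phi(s, a) for a in range(self.num_actions)]
        return int(np.argmax(q))

    def update(self, s, a, r, s_next, a_next, done):
        # SARSA-style temporal difference cast as Bayesian linear regression:
        # the reward is modeled as r ~ h^T theta + noise, with
        # h = phi(s,a) - gamma * phi(s',a') (gamma term dropped at termination).
        phi = self._phi(s, a)
        h = phi - (0.0 if done else self.gamma) * self._phi(s_next, a_next)
        # Rank-one recursive update of the Gaussian posterior over theta.
        Sh = self.Sigma @ h
        denom = self.noise_var + h @ Sh
        gain = Sh / denom
        self.mean = self.mean + gain * (r - h @ self.mean)
        self.Sigma = self.Sigma - np.outer(gain, Sh)
```

Used in a loop, `select_action` supplies the exploration-exploitation trade-off via posterior sampling, while `update` is called once per transition (s, a, r, s', a'); an ensemble version would maintain several such learners and reweight them from observed rewards.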
Original language | English (US) |
---|---|
Title of host publication | 55th Asilomar Conference on Signals, Systems and Computers, ACSSC 2021 |
Editors | Michael B. Matthews |
Publisher | IEEE Computer Society |
Pages | 1018-1022 |
Number of pages | 5 |
ISBN (Electronic) | 9781665458283 |
DOIs | |
State | Published - 2021 |
Event | 55th Asilomar Conference on Signals, Systems and Computers, ACSSC 2021 - Virtual, Pacific Grove, United States; Duration: Oct 31 2021 → Nov 3 2021 |
Publication series
Name | Conference Record - Asilomar Conference on Signals, Systems and Computers |
---|---|
Volume | 2021-October |
ISSN (Print) | 1058-6393 |
Conference
Conference | 55th Asilomar Conference on Signals, Systems and Computers, ACSSC 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Pacific Grove |
Period | 10/31/21 → 11/3/21 |
Bibliographical note
Funding Information: This work was supported in part by ARO grant W911NF2110297, and NSF grants 2126052 and 1901134. The work of Konstantinos D. Polyzos was also supported by the Onassis Foundation Scholarship.
Publisher Copyright:
© 2021 IEEE.
Keywords
- Gaussian process
- Reinforcement learning
- Thompson sampling
- resource allocation