Abstract
We propose a new algorithm for learning the model parameters of a partially observable Markov decision process (POMDP) based on coupled canonical polyadic decomposition (CPD). Coupled CPD for a set of tensors extends CPD for individual tensors: it has improved identifiability properties and admits an analogous simultaneous diagonalization (SD) algorithm that recovers the latent factors uniquely and efficiently. We explain how to form a set of three-way tensors from the trajectory of a POMDP under a stationary memoryless policy, so that coupled CPD can then be applied to recover the model parameters with identifiability and computational guarantees.
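As a rough illustration of what "forming a set of three-way tensors from the trajectory" could look like, the sketch below simulates a toy POMDP under a stationary memoryless policy (the action depends only on the current observation) and accumulates one empirical co-occurrence tensor per action over consecutive observation triples. The abstract does not specify the construction used in the paper, so the choice of triples, the array layouts, the numerical values, and the function names `simulate_pomdp` and `cooccurrence_tensors` are all assumptions made for illustration. The decomposition step itself is not shown.

```python
import numpy as np


def simulate_pomdp(T, O, policy, n_steps, rng):
    """Sample an observation/action trajectory from a toy POMDP.

    Assumed layouts (hypothetical, not taken from the paper):
      T[a, s, s2]  : probability of moving from state s to s2 under action a
      O[s, o]      : probability of observing o in state s
      policy[o, a] : stationary memoryless policy, action given observation
    """
    n_states, n_obs = O.shape
    n_actions = policy.shape[1]
    s = rng.integers(n_states)
    obs, acts = [], []
    for _ in range(n_steps):
        o = rng.choice(n_obs, p=O[s])
        a = rng.choice(n_actions, p=policy[o])
        obs.append(o)
        acts.append(a)
        s = rng.choice(n_states, p=T[a, s])
    return np.array(obs), np.array(acts)


def cooccurrence_tensors(obs, acts, n_obs, n_actions):
    """Build one empirical three-way tensor per action.

    Entry [a, i, j, k] estimates P(o_{t-1}=i, o_t=j, o_{t+1}=k | a_t=a).
    This particular choice of triples is an assumption for illustration.
    """
    tensors = np.zeros((n_actions, n_obs, n_obs, n_obs))
    for t in range(1, len(obs) - 1):
        tensors[acts[t], obs[t - 1], obs[t], obs[t + 1]] += 1
    totals = tensors.sum(axis=(1, 2, 3), keepdims=True)
    return tensors / np.maximum(totals, 1)  # avoid division by zero


# Toy model: 2 hidden states, 3 observations, 2 actions (made-up numbers).
n_states, n_obs, n_actions = 2, 3, 2
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.3, 0.7]]])
O = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
policy = np.array([[0.8, 0.2],
                   [0.5, 0.5],
                   [0.3, 0.7]])

rng = np.random.default_rng(0)
obs, acts = simulate_pomdp(T, O, policy, 5000, rng)
tensors = cooccurrence_tensors(obs, acts, n_obs, n_actions)
print(tensors.shape)
```

The resulting array holds one `n_obs × n_obs × n_obs` tensor per action. A coupled decomposition would then be run jointly over this set rather than on each tensor separately.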
Original language | English (US) |
---|---|
Title of host publication | 2019 IEEE Data Science Workshop, DSW 2019 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 295-299 |
Number of pages | 5 |
ISBN (Electronic) | 9781728107080 |
State | Published - Jun 2019 |
Event | 2019 IEEE Data Science Workshop, DSW 2019 - Minneapolis, United States. Duration: Jun 2 2019 → Jun 5 2019 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- coupled CPD
- partially observable Markov decision process
- reinforcement learning
- tensor decomposition