DUAL ASCENT and PRIMAL-DUAL ALGORITHMS for INFINITE-HORIZON NONSTATIONARY MARKOV DECISION PROCESSES

Research output: Contribution to journalArticlepeer-review

Abstract

Infinite-horizon nonstationary Markov decision processes (MDPs) extend their stationary counterparts by allowing temporal variations in immediate costs and transition probabilities. Bellman's characterization of optimality and equivalent primal-dual linear programming formulations for these MDPs include a countably infinite number of variables and equations. Simple policy iteration, also viewed as a primal simplex algorithm, is the state of the art in solving these MDPs. It produces a sequence of policies whose costs-To-go converge monotonically from above to optimal. This suffers from two limitations. A cost-improving policy update is computationally expensive and an optimality gap is missing. We propose two dual-based approaches to address these concerns. The first, called dual ascent, maintains approximate costs-To-go (dual variables) and corresponding nonnegative errors in Bellman's equations. The dual variables are iteratively increased such that errors vanish asymptotically. This guarantees that dual variables converge monotonically from below to optimal. This has two limitations. It does not maintain a sequence of policies (primal variables). Hence, it does not provide a decision-making strategy at termination and does not offer an upper bound on the optimal costs-To-go. The second approach, termed the primal-dual method, addresses these limitations. It maintains a primal policy, dual approximations of its costs-To-go, the corresponding nonegative Bellman's errors, and inherits monotonic dual value convergence. The key is a so-called rebalancing step, which leads to a duality gap-based stopping criterion and also primal value convergence. Computational experiments demonstrate the benefits of primal-dual over dual ascent and that primal-dual is orders of magnitude faster than simple policy iteration.

Original languageEnglish (US)
Pages (from-to)1391-1415
Number of pages25
JournalSIAM Journal on Optimization
Volume33
Issue number3
DOIs
StatePublished - 2023
Externally publishedYes

Bibliographical note

Publisher Copyright:
© 2023 Society for Industrial and Applied Mathematics Publications. All rights reserved.

Keywords

  • Bellman's equations
  • dynamic programming
  • value convergence

Fingerprint

Dive into the research topics of 'DUAL ASCENT and PRIMAL-DUAL ALGORITHMS for INFINITE-HORIZON NONSTATIONARY MARKOV DECISION PROCESSES'. Together they form a unique fingerprint.

Cite this