A linear programming approach to nonstationary infinite-horizon Markov decision processes

Archis Ghate, Robert L. Smith

Research output: Contribution to journalArticlepeer-review

33 Scopus citations

Abstract

Nonstationary infinite-horizon Markov decision processes (MDPs) generalize the most well-studied class of sequential decision models in operations research, namely, that of stationary MDPs, by relaxing the restrictive assumption that problem data do not change over time. Linear programming (LP) has been very successful in obtaining structural insights and devising solution methods for stationary MDPs. However, an LP approach for nonstationary MDPs is currently missing. This is because the LP formulation of a nonstationary infinite-horizon MDP includes countably infinite variables and constraints, and research on such infinite-dimensional LPs has traditionally faced several hurdles. For instance, duality results may not hold; an extreme point may not be a basic feasible solution; and in the context of a simplex algorithm, a pivot operation may require infinite data and computations, and a sequence of improving extreme points need not converge in value to optimal. In this paper, we tackle these challenges and establish (1) weak and strong duality, (2) complementary slackness, (3) a basic feasible solution characterization of extreme points, (4) a one-to-one correspondence between extreme points and deterministic Markovian policies, and (5) we devise a simplex algorithm for an infinite-dimensional LP formulation of nonstationary infinite-horizon MDPs. Pivots in this simplex algorithm use finite data, perform finite computations, and generate a sequence of improving extreme points that converges in value to optimal. Moreover, this sequence of extreme points gets arbitrarily close to the set of optimal extreme points. We also prove that decisions prescribed by these extreme points are eventually exactly optimal in all states of the nonstationary infinite-horizon MDP in early periods.

Original languageEnglish (US)
Pages (from-to)413-425
Number of pages13
JournalOperations research
Volume61
Issue number2
DOIs
StatePublished - Mar 2013
Externally publishedYes

Keywords

  • Dynamic programming
  • Infinite linear programs
  • Simplex algorithm

Fingerprint

Dive into the research topics of 'A linear programming approach to nonstationary infinite-horizon Markov decision processes'. Together they form a unique fingerprint.

Cite this