A FAMILY OF s-RECTANGULAR ROBUST MDPs: RELATIVE CONSERVATIVENESS, ASYMPTOTIC ANALYSES, AND FINITE-SAMPLE PROPERTIES

Sivaramakrishnan Ramani, Archis Ghate

Research output: Contribution to journalArticlepeer-review

Abstract

We introduce a family of s-rectangular robust Markov decision processes (s-RMDPs) indexed with \rho ∊ [1, ∞]. In each state, the ambiguity set of transition probability mass functions (pmfs) across actions equals a sublevel set of the \ell\rho-norm of a vector of distances from reference pmfs. Setting \rho = ∞ recovers (s, a)-RMDPs. For any s-RMDP from this family, there is an (s, a)RMDP whose robust optimal value is at least as good, and vice versa. This occurs because s- and (s, a)-RMDPs can employ different ambiguity set radii, casting doubt on the belief that (s, a)RMDPs are more conservative than s-RMDPs. If the distance is lower semicontinuous and convex, then, for any s-RMDP, there exists an (s, a)-RMDP with an identical robust optimal value. We also study data-driven s-RMDPs, where the reference pmf is constructed from state transition samples. If the distance satisfies a Pinsker-type inequality, the robust optimal and out-of-sample values both converge with sample-size to the true optimal. We derive rates of convergence and sample complexity when the distance satisfies a concentration inequality. Under this concentration inequality, the robust optimal value provides a probabilistic lower bound on the out-of-sample value. An artifact of the analyses behind these guarantees is the surprising conclusion that (s, a)-RMDPs might be the least conservative among all s-RMDPs within our family. The asymptotic and finite sample properties also extend for a class of nonrectangular RMDPs.

Original languageEnglish (US)
Pages (from-to)1540-1568
Number of pages29
JournalSIAM Journal on Optimization
Volume34
Issue number2
DOIs
StatePublished - Jun 2024
Externally publishedYes

Bibliographical note

Publisher Copyright:
© 2024 Society for Industrial and Applied Mathematics.

Keywords

  • distributionally robust optimization
  • dynamic programming
  • probabilistic performance guarantees
  • sample complexity
  • value convergence

Fingerprint

Dive into the research topics of 'A FAMILY OF s-RECTANGULAR ROBUST MDPs: RELATIVE CONSERVATIVENESS, ASYMPTOTIC ANALYSES, AND FINITE-SAMPLE PROPERTIES'. Together they form a unique fingerprint.

Cite this