## Abstract

We introduce a family of s-rectangular robust Markov decision processes (s-RMDPs) indexed by \rho ∈ [1, ∞]. In each state, the ambiguity set of transition probability mass functions (pmfs) across actions equals a sublevel set of the \ell_\rho-norm of a vector of distances from reference pmfs. Setting \rho = ∞ recovers (s, a)-RMDPs. For any s-RMDP from this family, there is an (s, a)-RMDP whose robust optimal value is at least as good, and vice versa. This occurs because s- and (s, a)-RMDPs can employ different ambiguity set radii, casting doubt on the belief that (s, a)-RMDPs are more conservative than s-RMDPs. If the distance is lower semicontinuous and convex, then, for any s-RMDP, there exists an (s, a)-RMDP with an identical robust optimal value. We also study data-driven s-RMDPs, where the reference pmf is constructed from state transition samples. If the distance satisfies a Pinsker-type inequality, the robust optimal and out-of-sample values both converge with sample size to the true optimal. We derive rates of convergence and sample complexity when the distance satisfies a concentration inequality. Under this concentration inequality, the robust optimal value provides a probabilistic lower bound on the out-of-sample value. An artifact of the analyses behind these guarantees is the surprising conclusion that (s, a)-RMDPs might be the least conservative among all s-RMDPs within our family. The asymptotic and finite-sample properties also extend to a class of nonrectangular RMDPs.
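As an illustrative sketch (the notation here is ours, not taken from the paper): writing \bar{p}_{s,a} for the reference pmf at state s and action a, d for the distance between pmfs, and \theta_s for the radius, the s-rectangular ambiguity set described above takes the form

```latex
\mathcal{P}_s \;=\; \Bigl\{\, (p_{s,a})_{a \in \mathcal{A}} \;:\;
  \bigl\| \bigl( d(p_{s,a}, \bar{p}_{s,a}) \bigr)_{a \in \mathcal{A}} \bigr\|_{\rho} \le \theta_s \,\Bigr\},
  \qquad \rho \in [1, \infty].
```

For \rho = ∞ the norm constraint decouples into a separate constraint d(p_{s,a}, \bar{p}_{s,a}) \le \theta_s for each action, which is how this family recovers (s, a)-rectangularity as stated in the abstract.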

| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 1540-1568 |
| Number of pages | 29 |
| Journal | SIAM Journal on Optimization |
| Volume | 34 |
| Issue number | 2 |
| DOIs | |
| State | Published - Jun 2024 |
| Externally published | Yes |

### Bibliographical note

Publisher Copyright: © 2024 Society for Industrial and Applied Mathematics.

## Keywords

- distributionally robust optimization
- dynamic programming
- probabilistic performance guarantees
- sample complexity
- value convergence