We consider a discrete-time Markov Decision Process (MDP) under the discounted payoff criterion in the presence of additional discounted cost constraints. We study the sensitivity of optimal Stationary Randomized (SR) policies in this setting with respect to the upper bound on the discounted cost constraint functionals. We show that such sensitivity analysis leads to an improved version of the Feinberg-Shwartz algorithm (Math Oper Res 21(4):922-945, 1996) for finding optimal policies that are ultimately stationary and deterministic.
- Finite models
- Linear programming
- Randomized policies
- Stationary deterministic policies