Convex synthesis of randomized policies for controlled Markov chains with density safety upper bound constraints

Mahmoud El Chamie, Yue Yu, Behcet Acikmese

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

18 Scopus citations

Abstract

The main objective of this paper is to synthesize optimal decision-making policies for a finite-horizon Markov Decision Process (MDP) while satisfying a safety constraint that imposes an upper bound on the state probability density function (pdf) of the underlying Markov Chain (MC) at all time steps. The classical approach based on state-action frequencies for constrained MDPs yields decision policies that satisfy the safety constraint for stationary distributions (i.e., asymptotically), but that do not necessarily provide safety during the transient regime. This paper introduces a new synthesis method for randomized Markovian policies for finite-horizon MDPs, in which safety constraint satisfaction is guaranteed for both the transient and the stationary distributions, independent of the initial state (i.e., the policies are safe in the worst case). An efficient Linear Programming (LP) based synthesis algorithm is proposed, which produces a convex set of feasible policies and ensures that the expected total reward is above a computable lower bound. A simulation example of a swarm of autonomous agents is also presented to demonstrate the practical importance of having safe policies.
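The gap between stationary and transient safety described in the abstract can be illustrated with a small check. The sketch below is not the paper's LP synthesis algorithm; it only propagates the state distribution of a fixed Markov chain, x_{t+1} = M x_t, and reports the first time step at which an upper bound d is violated. The 3-state matrix `M`, the bound `d`, and the function names are hypothetical values chosen for illustration, and the column-stochastic convention is an assumption.

```python
def propagate(M, x):
    """One step x_{t+1} = M x_t, with M column-stochastic (assumed convention)."""
    n = len(x)
    return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]


def first_violation(M, x0, d, horizon, tol=1e-12):
    """Return the first t in 0..horizon with some x_t[i] > d[i], else None."""
    x = list(x0)
    for t in range(horizon + 1):
        if any(xi > di + tol for xi, di in zip(x, d)):
            return t
        x = propagate(M, x)
    return None


# Hypothetical chain: column j holds the transition probabilities out of state j.
# States 1 and 2 always jump to state 0; state 0 spreads back out.
M = [
    [0.2, 1.0, 1.0],
    [0.4, 0.0, 0.0],
    [0.4, 0.0, 0.0],
]
d = [0.6, 0.6, 0.6]  # hypothetical density upper bound per state

# The stationary distribution (5/9, 2/9, 2/9) respects the bound ...
stationary = [5 / 9, 2 / 9, 2 / 9]
print(first_violation(M, stationary, d, horizon=50))

# ... but a safe initial distribution can still violate it during the transient.
x0 = [0.0, 0.5, 0.5]
print(first_violation(M, x0, d, horizon=10))
```

In this toy chain the initial distribution satisfies the bound, yet all probability mass lands on state 0 after one step, so a constraint checked only on the stationary distribution would miss the violation.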

Original language: English (US)
Title of host publication: 2016 American Control Conference, ACC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6290-6295
Number of pages: 6
ISBN (Electronic): 9781467386821
State: Published - Jul 28 2016
Externally published: Yes
Event: 2016 American Control Conference, ACC 2016 - Boston, United States
Duration: Jul 6 2016 to Jul 8 2016

Publication series

Name: Proceedings of the American Control Conference
Volume: 2016-July
ISSN (Print): 0743-1619

Other

Other: 2016 American Control Conference, ACC 2016
Country/Territory: United States
City: Boston
Period: 7/6/16 to 7/8/16

Bibliographical note

Publisher Copyright:
© 2016 American Automatic Control Council (AACC).
