Abstract
The main objective of this paper is to synthesize optimal decision-making policies for a finite-horizon Markov Decision Process (MDP) while satisfying a safety constraint that imposes an upper bound on the state probability density function (pdf) of the underlying Markov Chain (MC) at all time steps. The classical state-action frequency approach for constrained MDPs yields decision policies that satisfy the safety constraint for stationary distributions (i.e., asymptotically), but not necessarily during the transient regime. This paper introduces a new synthesis method for randomized Markovian policies for finite-horizon MDPs in which safety constraint satisfaction is guaranteed for both the transient and the stationary distributions, independent of the initial state (i.e., the policies are safe under a worst-case analysis). An efficient Linear Programming (LP)-based synthesis algorithm is proposed that produces a convex set of feasible policies and ensures that the expected total reward is above a computable lower bound. A simulation example of a swarm of autonomous agents is also presented to demonstrate the practical importance of having safe policies.
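For intuition, the sketch below shows one common way a finite-horizon constrained-MDP LP over state-action occupation measures can be set up, with the state pdf bounded at every time step rather than only asymptotically. It is an illustrative approximation, not the paper's exact formulation: all problem data (transition kernel `P`, rewards `R`, density bound `d_max`, horizon `T`) are hypothetical, and it fixes a known initial distribution instead of handling the worst-case initial state treated in the paper.

```python
# Minimal sketch (assumptions labeled): occupation-measure LP for a
# finite-horizon MDP with a per-time-step state-density bound.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
S, A, T = 4, 2, 10                            # states, actions, horizon (hypothetical sizes)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, :] = transition pdf (hypothetical data)
R = rng.random((S, A))                        # rewards r(s, a) (hypothetical data)
x0 = np.full(S, 1.0 / S)                      # known initial state distribution (assumption)
d_max = np.full(S, 0.5)                       # hypothetical per-state safety bound

# Decision variables: x[t][s, a] = Pr(state = s, action = a at time t).
x = [cp.Variable((S, A), nonneg=True) for _ in range(T)]

constraints = [cp.sum(x[0], axis=1) == x0]    # initial distribution
for t in range(T - 1):
    # Flow conservation: next state marginal equals the pushed-forward mass.
    inflow = sum(x[t][s, a] * P[s, a] for s in range(S) for a in range(A))
    constraints.append(cp.sum(x[t + 1], axis=1) == inflow)
for t in range(T):
    # Safety: state pdf bounded at every step, not just in steady state.
    constraints.append(cp.sum(x[t], axis=1) <= d_max)

# Maximize the expected total reward over the horizon.
objective = cp.Maximize(sum(cp.sum(cp.multiply(R, x[t])) for t in range(T)))
prob = cp.Problem(objective, constraints)
prob.solve()

# Recover a randomized Markovian policy pi_t(a|s) = x_t(s,a) / sum_a x_t(s,a).
policy = [xt.value / np.maximum(xt.value.sum(axis=1, keepdims=True), 1e-12)
          for xt in x]
print("expected total reward:", prob.value)
```

The last step normalizes each time step's state-action mass, mirroring the standard state-action frequency construction the abstract refers to; the feasible set of such LPs is convex, which is consistent with the paper's claim of a convex set of feasible policies.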
Original language | English (US) |
---|---|
Title of host publication | 2016 American Control Conference, ACC 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 6290-6295 |
Number of pages | 6 |
ISBN (Electronic) | 9781467386821 |
State | Published - Jul 28 2016 |
Externally published | Yes |
Event | 2016 American Control Conference, ACC 2016 - Boston, United States |
Duration | Jul 6 2016 → Jul 8 2016 |
Publication series
Name | Proceedings of the American Control Conference |
---|---|
Volume | 2016-July |
ISSN (Print) | 0743-1619 |
Other
Other | 2016 American Control Conference, ACC 2016 |
---|---|
Country/Territory | United States |
City | Boston |
Period | 7/6/16 → 7/8/16 |
Bibliographical note
Publisher Copyright: © 2016 American Automatic Control Council (AACC).