Abstract
We consider a delivery drone that must complete pick-up and delivery tasks that arrive stochastically during a mission. Because a delivery drone is often equipped with a camera, it can also gather useful information by monitoring the environment while carrying out these tasks. Motivated by this multi-use of drones, we address a persistent monitoring problem in which the drone’s high-level decision making is modeled as a Markov decision process (MDP) with unknown transition probabilities. The reward function is designed to reflect the value of the information gathered about the environment, and the pick-up and delivery tasks are defined by bounded-time temporal logic specifications. We use a reinforcement learning (RL) algorithm that maximizes the expected sum of rewards while ensuring that the dynamically arriving temporal logic specifications are satisfied with a desired probability in every episode during learning. We present simulation results and discuss the quality of the proposed method.
Original language | English (US) |
---|---|
Title of host publication | AIAA Scitech 2021 Forum |
Publisher | American Institute of Aeronautics and Astronautics Inc, AIAA |
Pages | 1-13 |
Number of pages | 13 |
ISBN (Print) | 9781624106095 |
State | Published - Jan 4 2021 |
Event | AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2021 - Virtual, Online; Duration: Jan 11 2021 → Jan 15 2021 |
Publication series
Name | AIAA Scitech 2021 Forum |
---|---|
Conference
Conference | AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2021 |
---|---|
City | Virtual, Online |
Period | 1/11/21 → 1/15/21 |
Bibliographical note
Publisher Copyright: © 2021, American Institute of Aeronautics and Astronautics Inc, AIAA. All Rights Reserved.