Imitation learning via kernel mean embedding

Kee Eung Kim, Hyun Soo Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Imitation learning refers to the problem where an agent learns a policy that mimics the demonstrations provided by an expert, without any information on the cost function of the environment. Classical approaches to imitation learning usually rely on a restrictive class of cost functions that best explains the expert's demonstrations, exemplified by linear functions of pre-defined features on states and actions. We show that the kernelization of a classical algorithm naturally reduces imitation learning to a distribution learning problem, where the imitation policy tries to match the state-action visitation distribution of the expert. Closely related to our approach is the recent work on leveraging generative adversarial networks (GANs) for imitation learning, but our reduction to distribution learning is much simpler, robust to scarce expert demonstrations, and sample efficient. We demonstrate the effectiveness of our approach on a wide range of high-dimensional control tasks.
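
The idea of matching two state-action visitation distributions through kernel mean embeddings can be illustrated with the squared maximum mean discrepancy (MMD) between two sample sets. The sketch below is not the paper's algorithm: the Gaussian RBF kernel, the bandwidth, the biased estimator, the function names, and the synthetic Gaussian "state-action" vectors are all assumptions chosen for illustration.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel between two equal-length feature vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y), i.e. the inner product
    of the two empirical kernel mean embeddings."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(expert, learner, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets.
    It is the squared RKHS distance between their empirical mean embeddings."""
    return (mean_kernel(expert, expert, bandwidth)
            + mean_kernel(learner, learner, bandwidth)
            - 2.0 * mean_kernel(expert, learner, bandwidth))


def sample_pairs(rng, n, dim, shift=0.0):
    """Hypothetical stand-in for concatenated (state, action) vectors."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
expert = sample_pairs(rng, 100, 4)            # "expert" visitation samples
close = sample_pairs(rng, 100, 4)             # learner with the same distribution
far = sample_pairs(rng, 100, 4, shift=2.0)    # learner with a shifted distribution

print("MMD^2 (same distribution):   ", mmd_squared(expert, close))
print("MMD^2 (shifted distribution):", mmd_squared(expert, far))
```

Under these assumptions, a learner whose samples come from the expert's distribution yields a smaller discrepancy than one whose samples are shifted, so a quantity of this kind could serve as a signal for how closely an imitation policy's visitation distribution matches the expert's.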

Original language: English (US)
Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher: AAAI press
Pages: 3415-3422
Number of pages: 8
ISBN (Electronic): 9781577358008
State: Published - 2018
Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: Feb 2 2018 to Feb 7 2018

Publication series

Name: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Bibliographical note

Funding Information:
Kee-Eung Kim is supported by IITP/MSIT (2017-0-01778) and DAPA/ADD via KAIST HSVRC. Hyun Soo Park is supported by MnDrive Robotics, Sensing, and Advanced Manufacturing and Oculus/Facebook Research.
