Building relational world models for reinforcement learning

Trevor Walker, Lisa Torrey, Jude Shavlik, Richard Maclin

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

1 Scopus citation

Abstract

Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capacity to exploit the relational nature of the domain. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward states, using inductive logic programming to learn their preimages: logical definitions of the regions of state space that lead to the high-reward states via some action. These learned preimages are chained together to form an MDP that abstractly represents the domain. AMBIL estimates the reward and transition probabilities of this MDP from past experience. Since the induced MDPs are small, AMBIL uses value iteration to quickly estimate the Q-value of each action in the induced states and determine a policy. AMBIL can employ complex background knowledge and supports relational representations. Empirical evaluation on both synthetic domains and a sub-task of the RoboCup soccer domain shows significant performance gains over standard Q-learning.
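The value-iteration step the abstract describes can be sketched on a toy abstract MDP. This is a minimal illustration only: the states (regions of state space), actions, transition probabilities, and rewards below are hypothetical stand-ins, not the preimages AMBIL actually learns via inductive logic programming.

```python
# Toy abstract MDP of the kind AMBIL induces: a few abstract states
# (regions of state space) chained toward a high-reward region.
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "far_from_goal": {
        "advance": [(0.8, "near_goal", 0.0), (0.2, "far_from_goal", 0.0)],
        "hold":    [(1.0, "far_from_goal", 0.0)],
    },
    "near_goal": {
        "advance": [(0.7, "goal", 1.0), (0.3, "far_from_goal", 0.0)],
        "hold":    [(1.0, "near_goal", 0.0)],
    },
    "goal": {},  # absorbing high-reward region (no outgoing actions)
}

def value_iteration(transitions, gamma=0.9, tol=1e-6):
    """Compute Q-values and a greedy policy for a small abstract MDP."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # absorbing state keeps value 0
                continue
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Q-values for each (state, action) pair, then the greedy policy.
    q_values = {
        s: {
            a: sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
            for a, outcomes in actions.items()
        }
        for s, actions in transitions.items()
    }
    policy = {s: max(qa, key=qa.get) for s, qa in q_values.items() if qa}
    return q_values, policy

q_values, policy = value_iteration(transitions)
```

Because the abstract MDP has only a handful of induced states, the iteration converges almost instantly, which is why AMBIL can afford to re-solve the model as new experience updates the estimated transition probabilities and rewards. Here the greedy policy chooses "advance" in both non-absorbing states, since that action chains toward the high-reward region.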

Original language: English (US)
Title of host publication: Inductive Logic Programming - 17th International Conference, ILP 2007, Revised Selected Papers
Pages: 280-291
Number of pages: 12
DOIs
State: Published - Mar 10 2008
Externally published: Yes
Event: 17th International Conference on Inductive Logic Programming, ILP 2007 - Corvallis, OR, United States
Duration: Jun 19 2007 to Jun 21 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4894 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 17th International Conference on Inductive Logic Programming, ILP 2007
Country: United States
City: Corvallis, OR
Period: 6/19/07 to 6/21/07
