Abstract
We present a novel formulation for providing advice to a reinforcement learner that employs support-vector regression as its function approximator. Our new method extends a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), that accepts advice concerning a single action of a reinforcement learner. In KBKR, users can say that in some set of states, an action's value should be greater than some linear expression of the current state. In our new technique, which we call Preference KBKR (Pref-KBKR), the user can provide advice in a more natural manner by recommending that some action is preferred over another in the specified set of states. Specifying preferences essentially means that users are giving advice about policies rather than Q values, which is a more natural way for humans to present advice. We present the motivation for preference advice and a proof of the correctness of our extension to KBKR. In addition, we show empirical results that our method can make effective use of advice on a novel reinforcement-learning task, based on the RoboCup simulator, which we call Breakaway. Our work demonstrates the significant potential of advice-giving techniques for addressing complex reinforcement learning problems, while further demonstrating the use of support-vector regression for reinforcement learning.
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05 |
Pages | 819-824 |
Number of pages | 6 |
Volume | 2 |
State | Published - Dec 1 2005 |
Event | 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05 - Pittsburgh, PA, United States Duration: Jul 9 2005 → Jul 13 2005 |
Other
Other | 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05 |
---|---|
Country/Territory | United States |
City | Pittsburgh, PA |
Period | 7/9/05 → 7/13/05 |