Non-stationary policy learning in 2-player zero sum games

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

A key challenge in multiagent environments is the construction of agents that are able to learn while acting in the presence of other agents that are simultaneously learning and adapting. These domains require on-line learning methods without the benefit of repeated training examples, as well as the ability to adapt to the evolving behavior of other agents in the environment. The difficulty is further exacerbated when the agents are in an adversarial relationship, demanding that a robust (i.e. winning) non-stationary policy be rapidly learned and adapted. We propose an on-line sequence learning algorithm, ELPH, based on a straightforward entropy pruning technique that is able to rapidly learn and adapt to non-stationary policies. We demonstrate the performance of this method in a non-stationary learning environment of adversarial zero-sum matrix games.
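The abstract does not spell out how ELPH works, so the following is only a minimal sketch of the general idea of entropy-based pruning in on-line sequence prediction, not the authors' implementation. It assumes a short memory of recent opponent moves, treats every subset of that memory as a hypothesis with its own histogram of next moves, discards hypotheses whose histogram becomes too unpredictable, and predicts from the lowest-entropy hypothesis that matches. The class name, `window`, and `max_entropy` are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict, deque
from itertools import combinations


def entropy(counts):
    """Shannon entropy in bits of a Counter of outcome frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


class EntropyPrunedPredictor:
    """Illustrative on-line sequence predictor (hypothetical, not the paper's code)."""

    def __init__(self, window=3, max_entropy=1.0):
        self.max_entropy = max_entropy          # assumed pruning threshold, in bits
        self.memory = deque(maxlen=window)      # assumed short-term memory of moves
        self.hypotheses = defaultdict(Counter)  # pattern -> counts of the next move

    def _patterns(self):
        # Each pattern is a non-empty subset of (lag, symbol) pairs,
        # where lag 0 is the most recently observed symbol.
        items = tuple(enumerate(reversed(self.memory)))
        for r in range(1, len(items) + 1):
            yield from combinations(items, r)

    def predict(self):
        """Most frequent next symbol under the lowest-entropy matching hypothesis."""
        best = None
        for pattern in self._patterns():
            counts = self.hypotheses.get(pattern)
            if counts:
                h = entropy(counts)
                if best is None or h < best[0]:
                    best = (h, counts)
        return best[1].most_common(1)[0][0] if best else None

    def observe(self, symbol):
        """Update every matching hypothesis, prune noisy ones, then shift the memory."""
        for pattern in list(self._patterns()):
            counts = self.hypotheses[pattern]
            counts[symbol] += 1
            if entropy(counts) > self.max_entropy:
                del self.hypotheses[pattern]    # too unpredictable to keep
        self.memory.append(symbol)


# Usage: learn a strictly cyclic rock-paper-scissors opponent on-line.
predictor = EntropyPrunedPredictor(window=3)
for move in ["R", "P", "S"] * 10:
    predictor.observe(move)
print(predictor.predict())  # "R", the move that follows "S" in the cycle
```

In a zero-sum matrix game such as rock-paper-scissors, the learner would then play the best response to the predicted move. Because a pruned hypothesis starts from an empty histogram if its pattern reappears, this kind of scheme can drop stale statistics when the opponent's policy shifts, although the threshold used here is an arbitrary illustrative value.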

Original language: English (US)
Title of host publication: Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05
Pages: 789-794
Number of pages: 6
Volume: 2
State: Published - Dec 1 2005
Event: 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05 - Pittsburgh, PA, United States
Duration: Jul 9 2005 → Jul 13 2005

Other

Other: 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05
Country: United States
City: Pittsburgh, PA
Period: 7/9/05 → 7/13/05
