Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning

Shenghui Chen, Yue Yu, David Fridovich-Keil, Ufuk Topcu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.

Original language: English (US)
Title of host publication: 2023 62nd IEEE Conference on Decision and Control, CDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2202-2207
Number of pages: 6
ISBN (Electronic): 9798350301243
State: Published - 2023
Externally published: Yes
Event: 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore
Duration: Dec 13 2023 to Dec 15 2023

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 62nd IEEE Conference on Decision and Control, CDC 2023
Country/Territory: Singapore
City: Singapore
Period: 12/13/23 to 12/15/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.
