TY - JOUR
T1 - Sampled fictitious play for approximate dynamic programming
AU - Epelman, Marina
AU - Ghate, Archis
AU - Smith, Robert L.
PY - 2011/12
Y1 - 2011/12
N2 - Sampled fictitious play (SFP) is a recently proposed iterative learning mechanism for computing Nash equilibria of non-cooperative games. For games of identical interests, every limit point of the sequence of mixed strategies induced by the empirical frequencies of best response actions that players in SFP play is a Nash equilibrium. Because discrete optimization problems can be viewed as games of identical interests wherein Nash equilibria define a type of local optimum, SFP has recently been employed as a heuristic optimization algorithm with promising empirical performance. However, there have been no guarantees of convergence to a globally optimal Nash equilibrium established for any of the problem classes considered to date. In this paper, we introduce a variant of SFP and show that it converges almost surely to optimal policies in model-free, finite-horizon stochastic dynamic programs. The key idea is to view the dynamic programming states as players, whose common interest is to maximize the total multi-period expected reward starting in a fixed initial state. We also offer empirical results suggesting that our SFP variant is effective in practice for small to moderate sized model-free problems.
AB - Sampled fictitious play (SFP) is a recently proposed iterative learning mechanism for computing Nash equilibria of non-cooperative games. For games of identical interests, every limit point of the sequence of mixed strategies induced by the empirical frequencies of best response actions that players in SFP play is a Nash equilibrium. Because discrete optimization problems can be viewed as games of identical interests wherein Nash equilibria define a type of local optimum, SFP has recently been employed as a heuristic optimization algorithm with promising empirical performance. However, there have been no guarantees of convergence to a globally optimal Nash equilibrium established for any of the problem classes considered to date. In this paper, we introduce a variant of SFP and show that it converges almost surely to optimal policies in model-free, finite-horizon stochastic dynamic programs. The key idea is to view the dynamic programming states as players, whose common interest is to maximize the total multi-period expected reward starting in a fixed initial state. We also offer empirical results suggesting that our SFP variant is effective in practice for small to moderate sized model-free problems.
KW - Computational game theory
KW - Simulation optimization
KW - Stochastic dynamic optimization
UR - http://www.scopus.com/inward/record.url?scp=79955058933&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955058933&partnerID=8YFLogxK
U2 - 10.1016/j.cor.2011.01.023
DO - 10.1016/j.cor.2011.01.023
M3 - Article
AN - SCOPUS:79955058933
SN - 0305-0548
VL - 38
SP - 1705
EP - 1718
JO - Computers and Operations Research
JF - Computers and Operations Research
IS - 12
ER -