Semiparametric contextual bandits

Akshay Krishnamurthy, Steven Wu, Vasilis Syrgkanis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve Õ(d√T) regret over T rounds, when the linear function is d-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenewald et al. (2017). Via an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches and our proofs require new concentration inequalities for self-normalized martingales.
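The semiparametric model described above can be illustrated with a toy simulation: each round's reward is a linear function of the chosen action's features plus an action-independent confounder. The sketch below (not the paper's algorithm; the uniform action choice, confounder scale, and least-squares fit are illustrative assumptions) shows the action-centering idea behind doubly-robust-style estimation: regressing rewards on features centered by their per-round mean cancels the confounder in expectation, so the linear parameter is still recovered.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 20000          # feature dimension, actions, rounds
theta = rng.normal(size=d)      # true linear parameter

Z = np.zeros((T, d))
r = np.zeros(T)
for t in range(T):
    X = rng.normal(size=(K, d))      # known action features for this round
    f = 5.0 * rng.normal()           # non-linear, action-independent confounder
    a = rng.integers(K)              # uniformly random action (illustrative policy)
    # Centered feature: E[X[a] - mean] = 0 under the uniform policy,
    # so the confounder f contributes zero-mean noise to the regression.
    Z[t] = X[a] - X.mean(axis=0)
    r[t] = X[a] @ theta + f + 0.1 * rng.normal()

theta_hat = np.linalg.lstsq(Z, r, rcond=None)[0]
print(np.linalg.norm(theta_hat - theta))   # small despite the large confounder
```

A naive regression of `r` on the raw features `X[a]` would absorb the confounder into its noise without this cancellation guarantee; centering is what makes the estimate robust to the non-linear term.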

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Jennifer Dy, Andreas Krause
Publisher: International Machine Learning Society (IMLS)
Pages: 4330-4349
Number of pages: 20
ISBN (Electronic): 9781510867963
State: Published - Jan 1 2018
Externally published: Yes
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 - Jul 15 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 6

Other

Other: 35th International Conference on Machine Learning, ICML 2018
Country: Sweden
City: Stockholm
Period: 7/10/18 - 7/15/18

Cite this

Krishnamurthy, A., Wu, S., & Syrgkanis, V. (2018). Semiparametric contextual bandits. In J. Dy, & A. Krause (Eds.), 35th International Conference on Machine Learning, ICML 2018 (pp. 4330-4349). (35th International Conference on Machine Learning, ICML 2018; Vol. 6). International Machine Learning Society (IMLS).