Semiparametric contextual bandits

Akshay Krishnamurthy, Steven Wu, Vasilis Syrgkanis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve Õ(d√T) regret over T rounds, when the linear function is d-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenewald et al. (2017). Via an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches and our proofs require new concentration inequalities for self-normalized martingales.
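
To make the reward model concrete, the following is a minimal, hypothetical sketch (it is not the paper's algorithm; the simulation setup, the centering scheme, and names such as theta_star and p_t are illustrative assumptions). It simulates rewards of the form ⟨x_{t,a}, θ⟩ plus a non-linear, action-independent confounder, and shows that a least-squares fit on action-centered features, in the spirit of the doubly-robust-style estimator the abstract mentions, can still recover the linear parameter.

```python
# Hypothetical illustration (not the paper's algorithm): simulate the
# semiparametric reward model r_t(a) = <x_{t,a}, theta*> + f_t, where f_t is
# non-linear in the round's features but does not depend on the chosen action,
# and check that regressing rewards on action-centered features recovers theta*.
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 20000                      # feature dimension, actions, rounds
theta_star = rng.normal(size=d)             # unknown linear parameter

X_cent, y = [], []
for t in range(T):
    X_t = rng.normal(size=(K, d))           # known action features for round t
    f_t = 3.0 * np.sin(X_t.sum())           # non-linear, action-independent confounder
    p_t = np.full(K, 1.0 / K)               # any known action-sampling distribution
    a_t = rng.choice(K, p=p_t)
    r_t = X_t[a_t] @ theta_star + f_t + rng.normal(scale=0.1)
    mu_t = p_t @ X_t                        # E_{a ~ p_t}[x_{t,a}]
    X_cent.append(X_t[a_t] - mu_t)          # centered features are mean-zero given the round
    y.append(r_t)

X_cent, y = np.array(X_cent), np.array(y)
theta_hat = np.linalg.lstsq(X_cent, y, rcond=None)[0]
print("estimation error:", np.linalg.norm(theta_hat - theta_star))
```

The point of the sketch is that the confounder multiplies a mean-zero centered feature vector, so it drops out of the estimating equations in expectation and does not bias the estimate of the linear component.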

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Jennifer Dy, Andreas Krause
Publisher: International Machine Learning Society (IMLS)
Pages: 4330-4349
Number of pages: 20
ISBN (Electronic): 9781510867963
State: Published - 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 - Jul 15 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 6

Other

Other: 35th International Conference on Machine Learning, ICML 2018
Country/Territory: Sweden
City: Stockholm
Period: 7/10/18 - 7/15/18

Bibliographical note

Publisher Copyright:
© CURRAN-CONFERENCE. All rights reserved.

