Abstract
This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve Õ(d√T) regret over T rounds, when the linear function is d-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenewald et al. (2017). Via an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches and our proofs require new concentration inequalities for self-normalized martingales.
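To make the reward model in the abstract concrete, here is a minimal sketch, not the paper's algorithm. It assumes a hypothetical parameter vector `theta`, randomly drawn action features, and a sine-shaped confounder chosen purely for illustration. It shows that an action-independent term shifts every action's reward by the same amount, so it changes neither the reward gaps between actions nor the identity of the best action.

```python
import math
import random


def linear_part(theta, z):
    """Inner product <theta, z> of the parameter and an action's features."""
    return sum(t * x for t, x in zip(theta, z))


def semiparametric_reward(theta, z, confounder, noise=0.0):
    """Linear term plus an action-independent confounder plus optional noise."""
    return linear_part(theta, z) + confounder + noise


rng = random.Random(0)
d, num_actions = 3, 4

# Hypothetical values, chosen only for illustration.
theta = [0.5, -0.2, 0.1]
features = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(num_actions)]

# Assumed non-linear confounder: depends on the context, not on the action.
context = rng.uniform(-1, 1)
confounder = math.sin(5 * context)

rewards = [semiparametric_reward(theta, z, confounder) for z in features]
linear_parts = [linear_part(theta, z) for z in features]

# The confounder cancels in pairwise differences between actions.
gap_rewards = rewards[0] - rewards[1]
gap_linear = linear_parts[0] - linear_parts[1]

best_by_reward = max(range(num_actions), key=lambda a: rewards[a])
best_by_linear = max(range(num_actions), key=lambda a: linear_parts[a])
```

In this noiseless sketch the two reward gaps agree up to floating-point error, and the best action is the same under either ranking. This is why regret in this setting depends only on the linear term, even though a learner observing a single reward per round cannot separate the two components directly.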
Original language | English (US) |
---|---|
Title of host publication | 35th International Conference on Machine Learning, ICML 2018 |
Editors | Jennifer Dy, Andreas Krause |
Publisher | International Machine Learning Society (IMLS) |
Pages | 4330-4349 |
Number of pages | 20 |
ISBN (Electronic) | 9781510867963 |
State | Published - 2018 |
Event | 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden. Duration: Jul 10 2018 → Jul 15 2018 |
Publication series
Name | 35th International Conference on Machine Learning, ICML 2018 |
---|---|
Volume | 6 |
Other
Other | 35th International Conference on Machine Learning, ICML 2018 |
---|---|
Country/Territory | Sweden |
City | Stockholm |
Period | 7/10/18 → 7/15/18 |
Bibliographical note
Publisher Copyright: © CURRAN-CONFERENCE. All rights reserved.