Randomized allocation with arm elimination in a bandit problem with covariates

Wei Qian, Yuhong Yang

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Motivated by applications in personalized web services and clinical research, we consider a multi-armed bandit problem in which the mean reward of each arm depends on some covariates. We propose a multi-stage randomized allocation algorithm with arm elimination. It combines flexible modeling of the reward functions with a theoretical guarantee that the cumulative regret attains the minimax rate. When the smoothness parameter of the reward functions is unknown, the algorithm is equipped with a smoothness-parameter selector based on histogram estimation and Lepski’s method. Under a “self-similarity” condition, it is shown to maintain the minimax regret rate up to a logarithmic factor.
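The abstract does not spell out the allocation rule, so the following is only a minimal Python sketch of the general idea it describes. It is not the authors' algorithm. The sketch makes several assumptions: covariates lie in [0, 1]^d, rewards lie in [0, 1], the covariate space is split into histogram cells of a fixed, hand-chosen side length `h`, and each cell runs its own arm elimination using Hoeffding-style confidence bounds. Within a cell, the sketch plays the empirically best surviving arm most of the time and randomizes among survivors with probability `explore_prob`. The class `HistogramEliminationBandit`, the helper `bin_index`, and the parameters `h`, `delta`, and `explore_prob` are hypothetical. The sketch omits the paper's multi-stage schedule and its Lepski-based choice of `h`.

```python
import math
import random
from collections import defaultdict


def bin_index(x, h):
    """Map a covariate vector x in [0, 1]^d to the histogram cell of side ~h containing it."""
    m = max(1, int(round(1 / h)))  # bins per coordinate
    return tuple(min(int(xi * m), m - 1) for xi in x)  # clamp x_i = 1.0 into the last bin


class HistogramEliminationBandit:
    """Hypothetical sketch: per-cell arm elimination plus randomized allocation."""

    def __init__(self, n_arms, h, delta=0.05, explore_prob=0.1, seed=0):
        self.h = h
        self.delta = delta
        self.explore_prob = explore_prob
        self.rng = random.Random(seed)
        # Per cell: reward sums, pull counts, and the set of arms not yet eliminated.
        self.sums = defaultdict(lambda: [0.0] * n_arms)
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.active = defaultdict(lambda: set(range(n_arms)))

    def _radius(self, n):
        # Hoeffding confidence radius for the mean of n rewards in [0, 1].
        return float("inf") if n == 0 else math.sqrt(math.log(2 / self.delta) / (2 * n))

    def _eliminate(self, cell):
        arms, sums, counts = self.active[cell], self.sums[cell], self.counts[cell]
        if any(counts[a] == 0 for a in arms):
            return  # wait until every surviving arm has been tried in this cell
        means = {a: sums[a] / counts[a] for a in arms}
        best_lower = max(means[a] - self._radius(counts[a]) for a in arms)
        # Drop arms whose upper bound falls below the best lower bound (never empties the set).
        self.active[cell] = {a for a in arms if means[a] + self._radius(counts[a]) >= best_lower}

    def choose(self, x):
        cell = bin_index(x, self.h)
        arms = sorted(self.active[cell])
        sums, counts = self.sums[cell], self.counts[cell]
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.explore_prob:
            return self.rng.choice(arms)  # randomized allocation among surviving arms
        return max(arms, key=lambda a: sums[a] / counts[a])  # greedy among survivors

    def update(self, x, arm, reward):
        cell = bin_index(x, self.h)
        self.sums[cell][arm] += reward
        self.counts[cell][arm] += 1
        self._eliminate(cell)


# Usage: two arms on [0, 1]; arm 1 pays off more often when x > 0.5, arm 0 otherwise.
rng = random.Random(1)
bandit = HistogramEliminationBandit(n_arms=2, h=0.25, seed=2)
for _ in range(5000):
    x = (rng.random(),)
    arm = bandit.choose(x)
    p = 0.7 if (arm == 1) == (x[0] > 0.5) else 0.3
    bandit.update(x, arm, 1.0 if rng.random() < p else 0.0)
```

Eliminating arms cell by cell lets different covariate regions settle on different arms. The randomized step keeps sampling every surviving arm, so its estimates continue to improve. In the paper, the cell width would be tied to the smoothness of the reward functions instead of being fixed by hand as it is here.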

Original language: English (US)
Pages (from-to): 242-270
Number of pages: 29
Journal: Electronic Journal of Statistics
Volume: 10
Issue number: 1
State: Published - 2016

Keywords

  • Adaptive estimation
  • Contextual bandit problem
  • MABC
  • Nonparametric bandit
  • Regret bound
