Dynamic learning of patient response types: An application to treating chronic diseases

Diana M. Negoescu, Kostas Bimpikis, Margaret L. Brandeau, Dan A. Iancu

Research output: Contribution to journal, Article, peer-review

24 Scopus citations


Currently available medication for treating many chronic diseases is often effective only for a subgroup of patients, and biomarkers accurately assessing whether an individual belongs to this subgroup typically do not exist. In such settings, physicians learn about the effectiveness of a drug primarily through experimentation—i.e., by initiating treatment and monitoring the patient’s response. Precise guidelines for discontinuing treatment are often lacking or left entirely to the physician’s discretion. We introduce a framework for developing adaptive, personalized treatments for such chronic diseases. Our model is based on a continuous-time, multi-armed bandit setting where drug effectiveness is assessed by aggregating information from several channels: by continuously monitoring the state of the patient, but also by (not) observing the occurrence of particular infrequent health events, such as relapses or disease flare-ups. Recognizing that the timing and severity of such events provide critical information for treatment decisions is a key point of departure in our framework compared with typical (bandit) models used in healthcare. We show that the model can be analyzed in closed form for several settings of interest, resulting in optimal policies that are intuitive and may have practical appeal. We illustrate the effectiveness of the methodology by developing a set of efficient treatment policies for multiple sclerosis, which we then use to benchmark several existing treatment guidelines.
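As an illustration of how (not) observing infrequent health events can inform treatment decisions, the following is a minimal sketch of a Bayesian belief update for one of the information channels only. It is not the model from the paper: it assumes, purely for illustration, that relapses arrive as a Poisson process whose rate is lower for responders than for non-responders. The function name `update_belief` and all rates and priors are hypothetical, and the continuous monitoring channel is omitted.

```python
import math


def update_belief(prior, n_events, duration, rate_responder, rate_nonresponder):
    """Posterior probability that the patient is a responder.

    Assumption (illustrative only): relapses follow a Poisson process with
    rate `rate_responder` for responders and `rate_nonresponder` otherwise.
    """

    def likelihood(rate):
        # Poisson likelihood of n_events in `duration`, dropping the factor
        # duration**n_events / n_events! that is common to both hypotheses.
        return rate ** n_events * math.exp(-rate * duration)

    weight_responder = prior * likelihood(rate_responder)
    weight_nonresponder = (1 - prior) * likelihood(rate_nonresponder)
    return weight_responder / (weight_responder + weight_nonresponder)


# Hypothetical numbers: 0.3 relapses/year if the drug works, 1.0 otherwise.
prior = 0.5
after_quiet_year = update_belief(prior, 0, 1.0, 0.3, 1.0)
after_relapse_year = update_belief(prior, 1, 1.0, 0.3, 1.0)
```

Under these assumed rates, a year without relapses raises the belief that the drug is working, while a year with one relapse lowers it. A stopping rule could then compare the belief against a threshold, though the form of the optimal policy is derived in the paper itself and is not reproduced here.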

Original language: English (US)
Pages (from-to): 3469-3488
Number of pages: 20
Journal: Management Science
Issue number: 8
State: Published - Aug 2018

Bibliographical note

Funding Information:
History: Accepted by Noah Gans, stochastic models and simulation. Funding: M. L. Brandeau was supported by the National Institute on Drug Abuse [Grant R01-DA15612]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/mnsc.2017.2793.

Publisher Copyright:
© 2017 INFORMS.


  • Adaptive treatment
  • Continuous time
  • Dynamic programming
  • Multiarmed bandits
  • Optimal control
  • Stochastic model applications

