A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data

Julian Wolfson, Sunayan Bandyopadhyay, Mohamed Elidrisi, Gabriela Vazquez-Benitez, David M. Vock, Donald Musgrove, Gediminas Adomavicius, Paul E. Johnson, Patrick J. O'Connor

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model, which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system.
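
The abstract does not spell out the estimator, so the following is only a rough sketch of how a Naive Bayes classifier can be made to respect censoring; it should not be read as the method developed in the paper. It assumes a fixed prediction horizon `tau`, estimates the prior probability of an event by `tau` with a Kaplan-Meier curve, and fits Gaussian class-conditional densities using inverse-probability-of-censoring weights, so that subjects censored before `tau` do not distort the estimates. The function names (`kaplan_meier`, `ipcw_labels`, `fit`, `predict_risk`), the Gaussian assumption, and the toy data are all illustrative choices, not taken from the source.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data.

    times:  observed follow-up time per subject
    events: 1 if the event was observed at that time, 0 if censored
    """
    steps = []  # (event time, estimated survival just after it)
    surv = 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - n_events / at_risk
        steps.append((t, surv))

    def S(t, left_limit=False):
        # left_limit=True returns S(t-), the value just before time t.
        value = 1.0
        for s, v in steps:
            if s < t or (s == t and not left_limit):
                value = v
        return value

    return S


def ipcw_labels(times, events, tau):
    """Binary status at horizon tau plus inverse-probability-of-censoring weights."""
    # Censoring survival G(t) = P(C > t): Kaplan-Meier with the indicator flipped.
    G = kaplan_meier(times, [1 - e for e in events])
    labels, weights = [], []
    for t, e in zip(times, events):
        if e == 1 and t <= tau:      # event observed by tau
            labels.append(1)
            weights.append(1.0 / G(t, left_limit=True))
        elif t > tau:                # known to be event-free at tau
            labels.append(0)
            weights.append(1.0 / G(tau))
        else:                        # censored by tau: status treated as unknown
            labels.append(0)
            weights.append(0.0)
    return labels, weights


def weighted_gaussian(xs, ws):
    """Weighted mean and variance (variance floored for numerical stability)."""
    total = sum(ws)  # assumes each class has at least one positively weighted subject
    mean = sum(w * x for x, w in zip(xs, ws)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / total
    return mean, max(var, 1e-6)


def fit(X, times, events, tau):
    """Kaplan-Meier prior for the event by tau; weighted Gaussian per feature and class."""
    labels, weights = ipcw_labels(times, events, tau)
    prior_event = 1.0 - kaplan_meier(times, events)(tau)
    prior_event = min(max(prior_event, 1e-12), 1.0 - 1e-12)  # keep log() finite
    params = {}
    for c in (0, 1):
        ws = [w if y == c else 0.0 for y, w in zip(labels, weights)]
        params[c] = [weighted_gaussian([row[j] for row in X], ws)
                     for j in range(len(X[0]))]
    return prior_event, params


def predict_risk(model, x):
    """Posterior probability of the event by tau for covariate vector x."""
    prior_event, params = model
    log_post = {}
    for c, prior in ((0, 1.0 - prior_event), (1, prior_event)):
        log_lik = sum(-0.5 * (math.log(2 * math.pi * v) + (xj - m) ** 2 / v)
                      for xj, (m, v) in zip(x, params[c]))
        log_post[c] = math.log(prior) + log_lik
    top = max(log_post.values())  # subtract the max before exponentiating
    event = math.exp(log_post[1] - top)
    return event / (event + math.exp(log_post[0] - top))


# Toy data: one covariate, follow-up time, event indicator (1 = event, 0 = censored).
X = [[1.0], [1.2], [0.9], [1.1], [3.0], [3.2], [2.9], [3.1]]
times = [10, 9, 8, 4, 2, 3, 1, 7]
events = [0, 1, 0, 0, 1, 1, 1, 0]
model = fit(X, times, events, tau=5)
print(round(predict_risk(model, [3.0]), 3), round(predict_risk(model, [1.0]), 3))
```

In this toy example the subject censored at time 4 receives weight zero, while the subjects still under observation after `tau` are up-weighted to stand in for it. A Cox proportional hazards model, the comparator named in the abstract, would instead use every subject's partial follow-up through the partial likelihood.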

Original language: English (US)
Pages (from-to): 2941-2957
Number of pages: 17
Journal: Statistics in Medicine
Volume: 34
Issue number: 21
DOIs: https://doi.org/10.1002/sim.6526
State: Published - Sep 20 2015

Keywords

  • Electronic health records
  • Machine learning
  • Naive Bayes
  • Risk prediction
  • Survival analysis

Cite this

Wolfson, J., Bandyopadhyay, S., Elidrisi, M., Vazquez-Benitez, G., Vock, D. M., Musgrove, D., ... O'Connor, P. J. (2015). A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data. Statistics in Medicine, 34(21), 2941-2957. https://doi.org/10.1002/sim.6526

@article{b6f98f989d214f27a23bbb7d64100d7f,
title = "A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data",
keywords = "Electronic health records, Machine learning, Naive Bayes, Risk prediction, Survival analysis",
author = "Julian Wolfson and Sunayan Bandyopadhyay and Mohamed Elidrisi and Gabriela Vazquez-Benitez and Vock, {David M.} and Donald Musgrove and Gediminas Adomavicius and Johnson, {Paul E.} and O'Connor, {Patrick J.}",
year = "2015",
month = "9",
day = "20",
doi = "10.1002/sim.6526",
language = "English (US)",
volume = "34",
pages = "2941--2957",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "21",

}

TY - JOUR

T1 - A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data

AU - Wolfson, Julian

AU - Bandyopadhyay, Sunayan

AU - Elidrisi, Mohamed

AU - Vazquez-Benitez, Gabriela

AU - Vock, David M.

AU - Musgrove, Donald

AU - Adomavicius, Gediminas

AU - Johnson, Paul E.

AU - O'Connor, Patrick J.

PY - 2015/9/20

Y1 - 2015/9/20

KW - Electronic health records

KW - Machine learning

KW - Naive Bayes

KW - Risk prediction

KW - Survival analysis

UR - http://www.scopus.com/inward/record.url?scp=84938292482&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938292482&partnerID=8YFLogxK

U2 - 10.1002/sim.6526

DO - 10.1002/sim.6526

M3 - Article

C2 - 25980520

AN - SCOPUS:84938292482

VL - 34

SP - 2941

EP - 2957

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 21

ER -