TY - JOUR
T1 - Multiple predictively equivalent risk models for handling missing data at time of prediction
T2 - With an application in severe hypoglycemia risk prediction for type 2 diabetes
AU - Ma, Sisi
AU - Schreiner, Pamela J.
AU - Seaquist, Elizabeth R.
AU - Ugurbil, Mehmet
AU - Zmora, Rachel
AU - Chow, Lisa S.
N1 - Publisher Copyright: © 2020 Elsevier Inc.
PY - 2020/3
Y1 - 2020/3
N2 - The presence of missing data at the time of prediction limits the application of risk models in clinical and research settings. Common ways of handling missing data at the time of prediction include measuring the missing value and employing statistical methods. Measuring the missing value incurs additional cost, whereas previously reported statistical methods result in reduced performance compared to when all variables are measured. To tackle these challenges, we introduce a new strategy, the MMTOP algorithm (Multiple models for Missing values at Time Of Prediction), which does not require measuring additional data elements or data imputation. Specifically, at model construction time, the MMTOP constructs multiple predictively equivalent risk models utilizing different risk factor sets. The collection of models is stored to be queried at prediction time. To predict an individual's risk in the presence of incomplete data, the MMTOP selects the risk model based on measurement availability for that individual from the collection of predictively equivalent models and makes the risk prediction with the selected model. We illustrate the MMTOP with severe hypoglycemia (SH) risk prediction based on data from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) study. We identified 77 predictively equivalent models for SH with a cross-validated c-index of 0.77 ± 0.03. These models are based on 77 distinct risk factor sets containing 12–17 risk factors. In terms of handling missing data at the time of prediction, the MMTOP outperforms all four tested competitor methods and maintains consistent performance as the number of missing variables increases.
AB - The presence of missing data at the time of prediction limits the application of risk models in clinical and research settings. Common ways of handling missing data at the time of prediction include measuring the missing value and employing statistical methods. Measuring the missing value incurs additional cost, whereas previously reported statistical methods result in reduced performance compared to when all variables are measured. To tackle these challenges, we introduce a new strategy, the MMTOP algorithm (Multiple models for Missing values at Time Of Prediction), which does not require measuring additional data elements or data imputation. Specifically, at model construction time, the MMTOP constructs multiple predictively equivalent risk models utilizing different risk factor sets. The collection of models is stored to be queried at prediction time. To predict an individual's risk in the presence of incomplete data, the MMTOP selects the risk model based on measurement availability for that individual from the collection of predictively equivalent models and makes the risk prediction with the selected model. We illustrate the MMTOP with severe hypoglycemia (SH) risk prediction based on data from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) study. We identified 77 predictively equivalent models for SH with a cross-validated c-index of 0.77 ± 0.03. These models are based on 77 distinct risk factor sets containing 12–17 risk factors. In terms of handling missing data at the time of prediction, the MMTOP outperforms all four tested competitor methods and maintains consistent performance as the number of missing variables increases.
KW - Missing data
KW - Risk factors
KW - Risk modeling
KW - T2DM
UR - http://www.scopus.com/inward/record.url?scp=85079174125&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079174125&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2020.103379
DO - 10.1016/j.jbi.2020.103379
M3 - Article
C2 - 32001388
AN - SCOPUS:85079174125
SN - 1532-0464
VL - 103
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 103379
ER -