Abstract
Missing values pose a significant challenge in data analytics, especially in clinical studies, where data are typically missing-not-at-random (MNAR). Applying techniques designed for missing-at-random (MAR) data, such as imputation, to MNAR data can introduce bias. In this work, we propose pattern-wise analysis, a collection of methods for building predictive models in the presence of MNAR missing values. This methodology constructs an individual model for each missingness pattern. We show that even the simplest pattern-wise method, Per-Pattern Modeling (PPM), outperforms models built on data sets completed by the most popular imputation methods. However, PPM faces difficulty when the number of missingness patterns is too high or when individual patterns have too few observations. We develop variants of PPM that overcome these challenges from three complementary perspectives: (i) a model selection perspective, where PPM can select which patterns to build models for; (ii) a distributional perspective, where the training data set is expanded in a distribution-preserving fashion; and (iii) a causal perspective, where a causal structure for the MNAR mechanism is assumed and exploited to convert the problem from MNAR to MAR. Evaluation of the proposed methods on both synthetic MNAR data and a real-world clinical data set of sepsis patients shows notable improvement over traditional approaches.
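The core idea of Per-Pattern Modeling can be illustrated with a minimal Python sketch. It assumes missing values are encoded as NaN and uses ordinary least-squares linear regression as the base model; the base learner is our assumption, not necessarily the one used in the paper, and the function names `missingness_pattern`, `fit_per_pattern`, and `predict_per_pattern` are hypothetical. Each row's missingness pattern is the set of features it observes, and one model is fit per pattern using only those features, so no imputation is performed.

```python
import numpy as np


def missingness_pattern(row):
    """Return a hashable pattern: True where the feature is observed (not NaN)."""
    return tuple(bool(v) for v in ~np.isnan(row))


def fit_per_pattern(X, y):
    """Fit one least-squares linear model per missingness pattern (hypothetical sketch).

    Only the observed columns of each pattern are used, so no values are imputed.
    Returns a dict mapping pattern -> coefficients (intercept first).
    """
    patterns = [missingness_pattern(r) for r in X]
    models = {}
    for pat in set(patterns):
        rows = [i for i, p in enumerate(patterns) if p == pat]
        cols = np.flatnonzero(pat)              # indices of observed features
        Xp = X[np.ix_(rows, cols)]              # sub-matrix: pattern rows, observed columns
        A = np.column_stack([np.ones(len(rows)), Xp])  # prepend intercept column
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models[pat] = coef
    return models


def predict_per_pattern(models, X):
    """Predict each row with the model trained on its own missingness pattern."""
    preds = []
    for row in X:
        pat = missingness_pattern(row)
        coef = models[pat]                      # KeyError if the pattern was never seen in training
        x_obs = row[np.array(pat)]              # keep only the observed features
        preds.append(coef[0] + x_obs @ coef[1:])
    return np.array(preds)


# Usage: two patterns, (x1, x2) fully observed and x2 missing.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [1.0, np.nan], [2.0, np.nan], [4.0, np.nan]])
y = np.array([9.0, 8.0, 19.0, 6.0, 7.0, 9.0])  # y = 1 + 2*x1 + 3*x2, or y = 5 + x1 when x2 is missing
models = fit_per_pattern(X, y)
preds = predict_per_pattern(models, np.array([[0.0, 0.0], [5.0, np.nan]]))
```

Because a model exists only for patterns seen during training, an unseen pattern raises a `KeyError`, and patterns with few rows yield poorly determined fits. These are the two difficulties that the PPM variants described above are meant to address.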
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 9th IEEE International Conference on Big Knowledge, ICBK 2018 |
Editors | Ong Yew Soon, Huanhuan Chen, Xindong Wu, Charu Aggarwal |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 415-422 |
Number of pages | 8 |
ISBN (Electronic) | 9781538691243 |
DOIs | |
State | Published - Dec 24 2018 |
Event | 9th IEEE International Conference on Big Knowledge, ICBK 2018 - Singapore, Singapore. Duration: Nov 17 2018 → Nov 18 2018 |
Publication series
Name | Proceedings - 9th IEEE International Conference on Big Knowledge, ICBK 2018 |
---|---|
Conference
Conference | 9th IEEE International Conference on Big Knowledge, ICBK 2018 |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 11/17/18 → 11/18/18 |
Bibliographical note
Funding Information: This work is partially supported by NIH award LM011972 and NSF award IIS 1602394. The views expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
Publisher Copyright:
© 2018 IEEE
Keywords
- Informative Missing
- Missing Not At Random
- Missing Value
- No Imputation
- Pattern-Wise Learning