Don't do imputation: Dealing with informative missing values in EHR data analysis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Missing values pose a significant challenge in data analytics, especially in clinical studies, where data are typically missing-not-at-random (MNAR). Applying techniques (e.g., imputation) that were designed for missing-at-random (MAR) data to MNAR data can lead to bias. In this work, we propose pattern-wise analysis, a collection of methods for building predictive models in the presence of MNAR missing values. This methodology constructs an individual model for each missingness pattern. We show that even the simplest pattern-wise method, Per-Pattern Modeling (PPM), outperforms models built on data sets completed by the most popular imputation methods. However, PPM faces difficulty when the number of missingness patterns is too high or when individual missingness patterns have too few observations. We developed variants of PPM to overcome these challenges from three complementary perspectives: (i) a model selection perspective, where PPM can select the patterns for which to build models; (ii) a distributional perspective, where the training data set is expanded in a distribution-preserving fashion; and (iii) a causal perspective, where a causal structure for the MNAR mechanism is assumed and exploited to convert the problem from MNAR to MAR. Evaluation of the proposed methods on both synthetic MNAR data and a real-world clinical data set of sepsis patients shows notable improvement over traditional approaches.
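The per-pattern idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which base learner PPM uses, so a nearest-centroid classifier stands in for it here, and all function names (`pattern_of`, `fit_per_pattern`, `predict_per_pattern`) and the toy data are hypothetical. The sketch only shows the routing step: rows are grouped by which features are missing, one model is fit per group on the observed features alone, and no value is ever imputed.

```python
import math
from collections import defaultdict


def pattern_of(row):
    """Return the missingness pattern of a row: True marks a missing feature."""
    return tuple(v is None or (isinstance(v, float) and math.isnan(v)) for v in row)


def observed(row, pattern):
    """Keep only the features that the pattern marks as observed."""
    return [v for v, missing in zip(row, pattern) if not missing]


def fit_centroids(X, y):
    """Stand-in base learner (an assumption): one mean vector per class label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict_centroid(centroids, features):
    """Predict the label whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, centroids[label])),
    )


def fit_per_pattern(rows, labels):
    """Fit one base model per missingness pattern, using only that pattern's rows."""
    by_pattern = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        p = pattern_of(row)
        by_pattern[p][0].append(observed(row, p))
        by_pattern[p][1].append(label)
    return {p: fit_centroids(X, y) for p, (X, y) in by_pattern.items()}


def predict_per_pattern(models, row):
    """Route a row to the model trained on its own missingness pattern."""
    p = pattern_of(row)
    if p not in models:
        raise KeyError(f"no model was trained for missingness pattern {p}")
    return predict_centroid(models[p], observed(row, p))


# Invented toy data: two features per row, None marks a missing measurement.
rows = [
    (1.0, 80.0), (1.2, 85.0), (4.0, 120.0), (4.5, 118.0),
    (None, 82.0), (None, 125.0), (None, 78.0), (None, 130.0),
]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

models = fit_per_pattern(rows, labels)
prediction_partial = predict_per_pattern(models, (None, 128.0))
prediction_full = predict_per_pattern(models, (1.1, 83.0))
```

In this sketch a row whose pattern never appeared in training raises a `KeyError`, which mirrors the difficulty the abstract notes for PPM when patterns are numerous or sparsely observed.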

Original language: English (US)
Title of host publication: Proceedings - 9th IEEE International Conference on Big Knowledge, ICBK 2018
Editors: Ong Yew Soon, Huanhuan Chen, Xindong Wu, Charu Aggarwal
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 415-422
Number of pages: 8
ISBN (Electronic): 9781538691243
DOIs
State: Published - Dec 24 2018
Event: 9th IEEE International Conference on Big Knowledge, ICBK 2018 - Singapore, Singapore
Duration: Nov 17 2018 → Nov 18 2018

Publication series

Name: Proceedings - 9th IEEE International Conference on Big Knowledge, ICBK 2018

Conference

Conference: 9th IEEE International Conference on Big Knowledge, ICBK 2018
Country: Singapore
City: Singapore
Period: 11/17/18 → 11/18/18

Keywords

  • Informative Missing
  • Missing Not At Random
  • Missing Value
  • No Imputation
  • Pattern-Wise Learning
