A novel method for handling Missing Not at Random Data in the electronic health records

Xinpeng Shen, Sisi Ma, Pedro J. Caraballo, Prashanthi Vemuri, Gyorgy J Simon

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Causal inference aims to estimate the causal relationships and effect sizes among treatments and outcomes. Electronic health record (EHR) data is a valuable healthcare data source that can support causal inference. However, a large percentage of EHR data is missing, and these values are missing not at random (MNAR). Ignoring MNAR can lead to severe biases, to the extent that the causal structure underlying the data becomes distorted. We proposed a new causal inference methodology that addresses the MNAR problem and thus helps preserve the causal structure. We compared the performance of our proposed method with that of a traditional causal inference method, structural equation modeling (SEM). We evaluated these methods on their accuracy in estimating causal effect sizes and on their ability to converge at all, using both simulation studies and real-world EHR data sets. We demonstrated that imputation under an improper missingness mechanism distorted the causal structure to such a degree that SEM found it incompatible with the data and failed to converge. Even when only 20 to 30% of the values were missing, SEM failed to converge in as many as 50% of the runs. The proposed causal inference method achieved a higher convergence rate and more accurate estimation of latent treatment effects on both the synthetic data and a real EHR data set. Our new methodology incorporates knowledge of the missing data mechanism. It significantly mitigated the biases associated with MNAR in the EHR data set and substantially outperformed SEM under the improper missing data mechanism.
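To make the MNAR bias described above concrete, here is a minimal illustrative sketch. It is not the authors' method and does not involve SEM. It assumes a simple linear model Y = 2·X + noise, and it compares the complete-case regression slope when outcomes are missing completely at random (MCAR) with the slope when high outcome values are the ones that go missing (MNAR). The function names `ols_slope` and `simulate`, the missingness threshold, and all parameters are hypothetical choices made for this illustration.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical simulation: complete-case slope under MCAR vs. MNAR outcomes."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_effect * x + rng.gauss(0.0, 1.0) for x in xs]

    # MCAR: each outcome is dropped with probability 0.3, independent of any value.
    mcar = [(x, y) for x, y in zip(xs, ys) if rng.random() >= 0.3]

    # MNAR: an outcome is dropped when its own value is high (y > 1.0),
    # so whether a value is missing depends on the unobserved value itself.
    mnar = [(x, y) for x, y in zip(xs, ys) if y <= 1.0]

    return {
        "mcar_slope": ols_slope(*zip(*mcar)),
        "mnar_slope": ols_slope(*zip(*mnar)),
        "mnar_missing_frac": 1 - len(mnar) / n,
    }


if __name__ == "__main__":
    print(simulate())
```

With these settings roughly a third of the outcomes are missing under MNAR. The MCAR complete-case slope stays close to the true effect of 2, while the MNAR complete-case slope is pulled noticeably toward zero. The missingness rates are similar, so the bias comes from the mechanism rather than from the amount of missing data.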

Original language: English (US)
Title of host publication: Proceedings - 2022 IEEE 10th International Conference on Healthcare Informatics, ICHI 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 21-26
Number of pages: 6
ISBN (Electronic): 9781665468459
State: Published - 2022
Event: 10th IEEE International Conference on Healthcare Informatics, ICHI 2022 - Rochester, United States
Duration: Jun 11 2022 to Jun 14 2022

Publication series

Name: Proceedings - 2022 IEEE 10th International Conference on Healthcare Informatics, ICHI 2022

Conference

Conference: 10th IEEE International Conference on Healthcare Informatics, ICHI 2022
Country/Territory: United States
City: Rochester
Period: 6/11/22 to 6/14/22

Bibliographical note

Funding Information:
This work is supported by the National Institutes of Health (NIH) grants AG056366, TR002494, and LM011972.

Publisher Copyright:
© 2022 IEEE.

Keywords

  • causal inference
  • missing imputation
  • MNAR

