Abstract
Post-acute sequelae of SARS-CoV-2 infection (PASC), also known as Long COVID, is an emerging medical condition in the aftermath of the COVID-19 pandemic. Research on this disease is limited by its newness and the lack of reliable controls, which can hinder model development. The National COVID Cohort Collaborative (N3C; https://ncats.nih.gov/n3c) contains Electronic Health Record (EHR) data for 7 million COVID-positive patients from 76 sites across the United States, of whom fifty thousand are Long COVID patients. For this study, we model our risk factor analysis as a Positive Unlabeled (PU) problem, in which we treat Long COVID patients as the positive sample and the rest of the COVID-positive patients as unlabeled data. We first curate reliable controls using a PU modeling technique called bagging. We then use this cohort of positive samples and curated negative samples to model risk factors for Long COVID. We apply an attention-based deep learning approach using Long Short-Term Memory (LSTM) networks to historical diagnosis data recorded prior to COVID-19 infection, first to predict Long COVID and then to extract the model's attention values to score the input diagnoses for each patient. Using this process, we achieve an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.93 (0.88 F1 score) for the prediction task, significantly outperforming the same model trained on randomly selected controls. We then use a scoring process to rank the input diagnoses for each correctly classified patient with attention values extracted from the trained model, and find the temporal distribution of the top diagnosis codes. Represented graphically, this distribution becomes a helpful tool for physicians to investigate diagnosis patterns that affect Long COVID and to evaluate model trustworthiness.
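The control-curation step can be illustrated with a minimal sketch of PU bagging. The abstract does not specify the base learner, the number of rounds, or the selection threshold, so everything here is an assumption made for illustration: the small logistic regression stands in for whatever classifier the authors used, and the function names `pu_bagging_scores` and `select_reliable_negatives` are hypothetical. In each round, a bootstrap sample of unlabeled patients is treated as provisional negatives, a classifier is trained against the known positives, and the unlabeled patients left out of that sample are scored. Unlabeled patients with the lowest average score are kept as reliable controls.

```python
import numpy as np


def fit_logreg(X, y, lr=0.1, epochs=200):
    """Fit a tiny logistic regression by gradient descent (stand-in base learner)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Return the predicted probability of the positive class for each row."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def pu_bagging_scores(X_pos, X_unl, n_rounds=50, seed=0):
    """Average out-of-bag positive-class score for every unlabeled row.

    n_rounds and the bootstrap size (equal to the number of positives)
    are illustrative choices, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_pos, n_unl = len(X_pos), len(X_unl)
    score_sum = np.zeros(n_unl)
    score_cnt = np.zeros(n_unl)
    for _ in range(n_rounds):
        # Bootstrap unlabeled rows and treat them as provisional negatives.
        idx = rng.choice(n_unl, size=n_pos, replace=True)
        X = np.vstack([X_pos, X_unl[idx]])
        y = np.concatenate([np.ones(n_pos), np.zeros(n_pos)])
        w = fit_logreg(X, y)
        # Score only the unlabeled rows that were not used for training.
        oob = np.setdiff1d(np.arange(n_unl), idx)
        score_sum[oob] += predict_proba(w, X_unl[oob])
        score_cnt[oob] += 1
    # Rows that were never out-of-bag get NaN rather than a misleading 0.
    return np.where(score_cnt > 0, score_sum / np.maximum(score_cnt, 1), np.nan)


def select_reliable_negatives(scores, k):
    """Indices of the k unlabeled rows with the lowest average score."""
    order = np.argsort(scores)                 # NaN entries sort last
    order = order[~np.isnan(scores[order])]    # drop rows that were never scored
    return order[:k]


if __name__ == "__main__":
    # Synthetic stand-in for patient feature vectors (not N3C data).
    rng = np.random.default_rng(1)
    X_pos = rng.normal(2.0, 0.5, size=(40, 2))
    X_unl = np.vstack([
        rng.normal(2.0, 0.5, size=(20, 2)),    # hidden positives
        rng.normal(-2.0, 0.5, size=(80, 2)),   # true negatives
    ])
    scores = pu_bagging_scores(X_pos, X_unl)
    print(select_reliable_negatives(scores, 40))
```

Scoring only out-of-bag rows keeps each unlabeled patient's score independent of the rounds in which that patient was used as a provisional negative, which is the usual motivation for bagging in this setting.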
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 22nd IEEE International Conference on Machine Learning and Applications, ICMLA 2023 |
Editors | M. Arif Wani, Mihai Boicu, Moamar Sayed-Mouchaweh, Pedro Henriques Abreu, Joao Gama |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 430-436 |
Number of pages | 7 |
ISBN (Electronic) | 9798350345346 |
State | Published - 2023 |
Event | 22nd IEEE International Conference on Machine Learning and Applications, ICMLA 2023 - Jacksonville, United States; Duration: Dec 15 2023 → Dec 17 2023 |
Publication series
Name | Proceedings - 22nd IEEE International Conference on Machine Learning and Applications, ICMLA 2023 |
---|---|
Conference
Conference | 22nd IEEE International Conference on Machine Learning and Applications, ICMLA 2023 |
---|---|
Country/Territory | United States |
City | Jacksonville |
Period | 12/15/23 → 12/17/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- Deep learning
- positive-unlabeled learning
- self-supervised learning