Abstract
Type 1 diabetes (T1D) outcome prediction plays a vital role in identifying novel risk factors, ensuring early patient care and designing cohort studies. TEDDY is a longitudinal cohort study that collects a vast amount of multi-omics and clinical data from its participants to explore the progression and markers of T1D. However, missing data in the omics profiles make outcome prediction difficult. TEDDY collected time-series gene expression for less than 6% of enrolled participants. Additionally, for the participants whose gene expression was collected, 79% of the time steps are missing. This study introduces an advanced bioinformatics framework for gene expression imputation and islet autoimmunity (IA) prediction. The imputation model generates synthetic data for participants with partially or entirely missing gene expression. The prediction model integrates the synthetic gene expression with other risk factors to achieve better predictive performance. Comprehensive experiments on TEDDY datasets show that: (1) Our pipeline can effectively integrate synthetic gene expression with family history, HLA genotype and SNPs to better predict IA status at 2 years (sensitivity 0.622, AUC 0.715) compared with the individual datasets and state-of-the-art results in the literature (AUC 0.682). (2) The synthetic gene expression contains predictive signals as strong as the true gene expression, reducing reliance on expensive and long-term longitudinal data collection. (3) Time-series gene expression is crucial to the proposed improvement and shows significantly better predictive ability than cross-sectional gene expression. (4) Our pipeline is robust to limited data availability.
Availability: Code is available at https://github.com/compbiolabucf/TEDDY
Original language | English (US) |
---|---|
Article number | bbac537 |
Journal | Briefings in Bioinformatics |
Volume | 24 |
Issue number | 1 |
DOIs | |
State | Published - Jan 1 2023 |
Bibliographical note
Funding Information: This work is supported by U24DK097771 from the National Institute of Diabetes, Digestive and Kidney Diseases via the NIDDK Information Network’s (dkNET) New Investigator Pilot Program in Bioinformatics.
Publisher Copyright:
© 2022 The Author(s). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].
Keywords
- autoencoders
- incomplete time-series gene expression
- islet autoimmunity prediction
- long short-term memory
- multi-omics
- type-1 diabetes
PubMed: MeSH publication types
- Journal Article
- Research Support, N.I.H., Extramural