Incomplete time-series gene expression in integrative study for islet autoimmunity prediction

Khandakar Tanvir Ahmed, Sze Cheng, Qian Li, Jeongsik Yong, Wei Zhang

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Type 1 diabetes (T1D) outcome prediction plays a vital role in identifying novel risk factors, ensuring early patient care and designing cohort studies. TEDDY is a longitudinal cohort study that collects a vast amount of multi-omics and clinical data from its participants to explore the progression and markers of T1D. However, missing data in the omics profiles make outcome prediction difficult. TEDDY collected time-series gene expression for fewer than 6% of enrolled participants. Additionally, for the participants whose gene expression was collected, 79% of the time steps are missing. This study introduces a bioinformatics framework for gene expression imputation and islet autoimmunity (IA) prediction. The imputation model generates synthetic data for participants with partially or entirely missing gene expression. The prediction model integrates the synthetic gene expression with other risk factors to achieve better predictive performance. Comprehensive experiments on TEDDY datasets show that: (1) Our pipeline can effectively integrate synthetic gene expression with family history, HLA genotype and SNPs to better predict IA status at 2 years (sensitivity 0.622, AUC 0.715) compared with the individual datasets and state-of-the-art results in the literature (AUC 0.682). (2) The synthetic gene expression contains predictive signals as strong as the true gene expression, reducing reliance on expensive and long-term longitudinal data collection. (3) Time-series gene expression is crucial to the proposed improvement and shows significantly better predictive ability than cross-sectional gene expression. (4) Our pipeline is robust to limited data availability. Availability: Code is available at https://github.com/compbiolabucf/TEDDY
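
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how sensitivity and AUC are conventionally computed for a binary outcome such as IA status. It is not taken from the authors' repository; the function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from TEDDY): 1 = IA positive, 0 = IA negative.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("sensitivity:", sensitivity(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Sensitivity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports both.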

Original language: English (US)
Article number: bbac537
Journal: Briefings in Bioinformatics
Volume: 24
Issue number: 1
State: Published - Jan 1 2023

Bibliographical note

Funding Information:
This work is supported by U24DK097771 from the National Institute of Diabetes and Digestive and Kidney Diseases via the NIDDK Information Network’s (dkNET) New Investigator Pilot Program in Bioinformatics.

Publisher Copyright:
© 2022 The Author(s). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].

Keywords

  • autoencoders
  • incomplete time-series gene expression
  • islet autoimmunity prediction
  • long short-term memory
  • multi-omics
  • type-1 diabetes

PubMed: MeSH publication types

  • Journal Article
  • Research Support, N.I.H., Extramural
