Abstract
Recently, a non-parametric method was proposed to impute the genetic component of a trait for a large set of genotyped individuals based on a separate genome-wide association study (GWAS) summary dataset of the same trait (from the same population). The imputed trait may contain linear, non-linear and epistatic effects of genetic variants and can thus be used for downstream linear or non-linear association analyses and machine learning tasks. Here, we propose an extension of the method to impute both genetic and environmental components of a trait using both single nucleotide polymorphism (SNP)-trait and omics-trait association summary data. We illustrate an application to a subset of UK Biobank individuals (n ≈ 80K) with both body mass index (BMI) GWAS data and metabolomic data. We divided the whole dataset into two equally sized, non-overlapping training and test datasets; we used the training data to build SNP-BMI and metabolite-BMI association summary data and to impute BMI on the test data. We compared the performance of the original and new imputation methods. As with the original method, the BMI values imputed by the new method largely retained SNP-BMI association information; however, they retained more information about BMI-environment associations and were more highly correlated with the observed BMI values.
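The evaluation design described in the abstract (an equal, non-overlapping split, with association summary data built on the training half) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the helper names `split_half` and `marginal_summary`, the simulated genotypes and metabolites, and the use of per-predictor simple linear regression as the summary statistic are all assumptions made for the example. The imputation step itself is not shown, because the abstract does not specify it.

```python
import numpy as np


def split_half(n, rng):
    """Return two equally sized, non-overlapping index arrays (hypothetical helper)."""
    perm = rng.permutation(n)
    half = n // 2
    return perm[:half], perm[half:2 * half]


def marginal_summary(X, y):
    """Simple linear regression of y on each column of X, one column at a time.

    Returns per-column slope estimates, standard errors and z-scores,
    i.e. the kind of marginal association summary data a GWAS reports.
    """
    Xc = X - X.mean(axis=0)           # centre predictors (absorbs the intercept)
    yc = y - y.mean()                 # centre the trait
    n = X.shape[0]
    sxx = (Xc ** 2).sum(axis=0)       # per-column sum of squares
    beta = Xc.T @ yc / sxx            # per-column least-squares slope
    resid_ss = (yc ** 2).sum() - beta ** 2 * sxx
    se = np.sqrt(resid_ss / (n - 2) / sxx)
    return beta, se, beta / se


# Synthetic stand-ins (assumed sizes and effect scales, for illustration only).
rng = np.random.default_rng(0)
n, n_snps, n_mets = 2000, 50, 10
G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)   # SNP dosages 0/1/2
M = rng.normal(size=(n, n_mets))                           # metabolite levels
y = (G @ rng.normal(scale=0.05, size=n_snps)
     + M @ rng.normal(scale=0.3, size=n_mets)
     + rng.normal(size=n))                                 # simulated trait

# Equal, non-overlapping training and test halves.
train, test = split_half(n, rng)

# Summary data are built from the training half only.
snp_beta, snp_se, snp_z = marginal_summary(G[train], y[train])
met_beta, met_se, met_z = marginal_summary(M[train], y[train])
```

In this sketch the test half is never used when computing the summary statistics, which mirrors the separation between the summary dataset and the individuals whose trait is imputed.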
Original language | English (US) |
---|---|
Pages (from-to) | 2693-2703 |
Number of pages | 11 |
Journal | Human molecular genetics |
Volume | 32 |
Issue number | 17 |
State | Published - Sep 1 2023 |
Bibliographical note
Publisher Copyright: © 2023 The Author(s).
PubMed: MeSH publication types
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, N.I.H., Extramural