Statistical inference with large-scale trait imputation

Jingchen Ren, Wei Pan

Research output: Contribution to journal › Article › peer-review


A nonparametric method called LS-imputation was recently proposed for large-scale trait imputation based on a GWAS summary dataset and a large set of genotyped individuals. The imputed trait values, along with the genotypes, can be treated as an individual-level dataset for downstream genetic analyses, including those that cannot be done with GWAS summary data. However, because the covariance matrix of the imputed trait values is often too large to calculate, the current method imposes the working assumption that the imputed trait values are independent and identically distributed, which is known to be incorrect. Here we propose a “divide and conquer/combine” strategy to estimate and account for the covariance matrix of the imputed trait values via batches, thus relaxing the incorrect working assumption. In marginal association analyses of the UK Biobank data, the new method improved on the original in some cases, but overall the original method performed well, which is explained by the nearly constant variances of, and mostly weak correlations among, the imputed trait values.
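The batch-wise idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the covariance is estimated, so the sketch assumes a hypothetical callback `cov_of_batch` that returns a covariance matrix for the imputed trait values in one batch, and it assumes that correlations across batches are ignored. The function names `gls_slope` and `divide_and_combine`, and the simulated data, are likewise illustrative. Each batch gets a generalized least squares (GLS) estimate of a single SNP's marginal effect, and the batch estimates are combined by inverse-variance weighting.

```python
import numpy as np

def gls_slope(x, y, cov):
    """GLS slope of y on one SNP x (both assumed centered, no intercept)."""
    w = np.linalg.solve(cov, x)   # cov^{-1} x, without forming the inverse
    info = x @ w                  # x' cov^{-1} x
    beta = (w @ y) / info         # x' cov^{-1} y / x' cov^{-1} x (cov symmetric)
    return beta, 1.0 / info       # estimate and its variance

def divide_and_combine(x, y, batches, cov_of_batch):
    """Combine per-batch GLS estimates by inverse-variance weighting.

    cov_of_batch is a hypothetical callback returning the covariance of the
    imputed trait values within a batch; cross-batch covariance is ignored.
    """
    num, info = 0.0, 0.0
    for idx in batches:
        beta_b, var_b = gls_slope(x[idx], y[idx], cov_of_batch(idx))
        num += beta_b / var_b
        info += 1.0 / var_b
    return num / info, 1.0 / info

# Simulated toy data (not from the paper): 12 individuals in 3 batches of 4.
rng = np.random.default_rng(0)
n = 12
batches = np.array_split(np.arange(n), 3)
x = rng.normal(size=n)
y = rng.normal(size=n)

# Block-diagonal covariance: one positive-definite block per batch.
full_cov = np.zeros((n, n))
for idx in batches:
    a = rng.normal(size=(len(idx), len(idx)))
    full_cov[np.ix_(idx, idx)] = a @ a.T + 4.0 * np.eye(len(idx))

beta_dc, var_dc = divide_and_combine(
    x, y, batches, lambda idx: full_cov[np.ix_(idx, idx)]
)
beta_full, var_full = gls_slope(x, y, full_cov)
```

When the covariance really is block-diagonal across batches, as in this toy setup, the combined estimate coincides with GLS on the full matrix, while only small per-batch systems need to be solved.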

Original language: English (US)
Pages (from-to): 625-641
Number of pages: 17
Journal: Statistics in Medicine
Issue number: 4
State: Published - Feb 20 2024

Bibliographical note

Publisher Copyright:
© 2023 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.


Keywords

  • GWAS
  • LS-imputation
  • SNP
  • least squares
  • linear models

PubMed: MeSH publication types

  • Journal Article

