Association analysis and meta-analysis of multiallelic variants for large-scale sequence data

Yu Jiang, Sai Chen, Xingyan Wang, Mengzhen Liu, William G. Iacono, John K. Hewitt, John E. Hokanson, Kenneth Krauter, Markku Laakso, Kevin W. Li, Sharon M. Lutz, Matthew McGue, Anita Pandit, Gregory J.M. Zajac, Michael Boehnke, Goncalo R. Abecasis, Scott I. Vrieze, Bibo Jiang, Xiaowei Zhan, Dajiang J. Liu

Research output: Contribution to journal, Article (peer-reviewed)


There is great interest in using large sequence datasets to understand the impact of rare variants on human disease. In deep sequence datasets of more than 10,000 samples, ~10% of variant sites are observed to be multi-allelic. Many multi-allelic variants have been shown to be functional and disease-relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not handle them properly and can produce highly misleading association results. We discuss practical issues and methods for encoding multi-allelic sites, conducting single-variant and gene-level association analyses, and performing meta-analysis of multi-allelic variants. We evaluated these methods through extensive simulations and a large meta-analysis of ~18,000 samples for the cigarettes-per-day phenotype. We showed that our joint modeling approach provided unbiased estimates of genetic effects, greatly improved the power of single-variant association tests relative to other methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches. Software packages implementing these methods are available online.
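The encoding, joint-modeling, and meta-analysis steps named in the abstract can be illustrated with a minimal Python sketch. It assumes one plausible setup: alleles coded 0 = reference and 1..K = alternates, one dosage column per alternate allele, a quantitative trait fitted by ordinary least squares without covariates, and per-allele fixed-effect inverse-variance meta-analysis. The function names `encode_dosages`, `joint_allele_effects`, and `ivw_meta` are hypothetical and are not taken from the authors' software packages; the paper's actual models (covariates, binary traits, gene-level tests, covariance-aware meta-analysis) may differ.

```python
"""Illustrative sketch of joint per-allele modeling at one multi-allelic site.

Assumptions (not from the paper): 0 = reference allele, 1..K = alternate
alleles, quantitative trait, plain OLS with no covariates. All function
names here are hypothetical.
"""
import math


def encode_dosages(genotype, n_alt):
    """Turn an allele-index pair such as (1, 2) into K alternate-allele dosages."""
    dosages = [0] * n_alt
    for allele in genotype:
        if allele > 0:
            dosages[allele - 1] += 1
    return dosages


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design: an allele may be unobserved or collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def joint_allele_effects(genotypes, y, n_alt):
    """Fit y = b0 + sum_k beta_k * dosage_k in ONE model; return (betas, SEs)."""
    X = [[1.0] + encode_dosages(g, n_alt) for g in genotypes]
    p = n_alt + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    coef = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(c * x for c, x in zip(coef, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (len(y) - p)  # residual variance
    se = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(p)]
    return coef[1:], se[1:]  # drop the intercept


def ivw_meta(estimates):
    """Fixed-effect inverse-variance meta-analysis of one allele's (beta, se) pairs.

    Simplification: covariances between allele estimates are ignored.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))
```

For example, with two studies reporting `(0.5, 0.1)` and `(0.3, 0.1)` for the same alternate allele, `ivw_meta` returns a pooled estimate of 0.4 with a standard error of about 0.071. Fitting all alternate alleles in a single model keeps the reference allele as the common baseline; by contrast, splitting a site into separate biallelic records lumps carriers of the other alternate alleles in with reference carriers, which is one way allele effects can be misestimated.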

Original language: English (US)
Article number: 585
Issue number: 5
State: Published, May 2020

Bibliographical note

Funding Information:
Funding: DL and YJ were supported by R01HG008983 from the National Institutes of Health. MB was supported by R01HG000376 from the National Institutes of Health. WGI was supported by DA05147 and DA036216 from the National Institutes of Health. SML was supported by K01HL125858 from the National Institutes of Health. The CADD study was supported by DA011015 from the National Institutes of Health. The COPDGene Study (NCT00608764) is supported by National Heart, Lung, and Blood Institute (NHLBI) grants R01 HL084323 and HL089897, and is also supported by the COPD Foundation through contributions made to an Industry Advisory Board composed of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, GlaxoSmithKline, Siemens, and Sunovion. The funding sources played no role in the design of the study.

Publisher Copyright:
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.

Keywords

  • GWAS
  • Meta-analysis
  • Multi-allelic variants
  • Smoking
