Both genome-wide association studies and next-generation sequencing data analyses are widely employed to identify common and/or rare genetic variants associated with disease susceptibility. Rare variants generally have large effects, but they are hard to detect because of their low frequencies. Many existing statistical methods for rare-variant association studies employ a weighted combination scheme, which usually assigns subjective or suboptimal weights based on ad hoc assumptions (e.g., ignoring dependence between rare variants). In this study, we analytically derived optimal weights for both common and rare variants and proposed a general and novel approach to test association between an optimally weighted combination of variants (G-TOW) in a gene or pathway and a continuous or dichotomous trait, while easily adjusting for covariates. Results of the simulation studies show that G-TOW properly controls type I error rates and is the most powerful test among the methods we compared when testing the effects of either both rare and common variants or rare variants only. We also illustrate the effectiveness of G-TOW using the Genetic Analysis Workshop 17 (GAW17) data. Additionally, we applied G-TOW and other competing methods to test disease-associated genes in real schizophrenia data. G-TOW successfully identified FYN and VPS39, two genes reported in existing publications to be associated with schizophrenia; both genes are missed by the weighted sum statistic and the sequence kernel association test. The simulation studies and the real data analysis indicate that G-TOW is a powerful test.
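To make the general idea of testing a weighted combination of variants concrete, here is a minimal Python sketch. It is not the G-TOW statistic derived in this paper: it assumes no covariates (the trait and genotypes are only mean-centred), and it deliberately assumes independent variants, which is exactly the simplification G-TOW is designed to avoid. The helper names (`center`, `weighted_combination_stat`, `permutation_pvalue`) and the toy data are hypothetical. Each variant's weight is taken to be its marginal least-squares slope on the trait, and significance is assessed by permuting trait values.

```python
import random
from typing import List, Sequence


def center(values: Sequence[float]) -> List[float]:
    """Subtract the sample mean (stands in for adjusting for an intercept only)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def weighted_combination_stat(
    y: Sequence[float], genotypes: Sequence[Sequence[float]]
) -> float:
    """Association score for a weighted sum of variants (illustrative only).

    genotypes[j][i] is the minor-allele count of variant j in subject i.
    Weight w_j is the marginal least-squares slope of centred y on centred x_j,
    so T = sum_i yc_i * sum_j w_j xc_ij = sum_j (xc_j . yc)^2 / (xc_j . xc_j).
    Correlation between variants is ignored here (an assumption G-TOW avoids).
    """
    yc = center(y)
    stat = 0.0
    for x in genotypes:
        xc = center(x)
        ss = sum(v * v for v in xc)
        if ss == 0.0:  # monomorphic variant: carries no information, skip it
            continue
        score = sum(a * b for a, b in zip(xc, yc))
        stat += score * score / ss
    return stat


def permutation_pvalue(
    y: Sequence[float],
    genotypes: Sequence[Sequence[float]],
    n_perm: int = 999,
    seed: int = 0,
) -> float:
    """Permutation p-value: shuffling trait values breaks any genotype-trait link."""
    rng = random.Random(seed)
    observed = weighted_combination_stat(y, genotypes)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # small tolerance so permutations tying the observed value are counted
        if weighted_combination_stat(y_perm, genotypes) >= observed - 1e-12:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 20 subjects, a rare variant carried by the first
    # three subjects (who also have elevated trait values), plus a common
    # variant with an alternating genotype pattern.
    rare = [1 if i < 3 else 0 for i in range(20)]
    common = [i % 2 for i in range(20)]
    trait = [3.0 if i < 3 else 0.0 for i in range(20)]
    print(weighted_combination_stat(trait, [rare, common]))
    print(permutation_pvalue(trait, [rare, common]))
```

With these slope weights the combined score reduces to a sum of per-variant squared score contributions, so a rare variant whose carriers track the trait can dominate the statistic despite its low frequency. The permutation p-value avoids relying on an asymptotic null distribution, which can be unreliable when variants are very rare.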
Funding Information:
Q. S. and S. Z. are funded by the National Human Genome Research Institute of the National Institutes of Health under award number R15HG008209. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. X. W. is supported by the University of North Texas Foundation through a contribution from Dr. Linda Truitt Creagh. The content is solely the responsibility of the authors and does not necessarily represent the views of the University of North Texas Foundation or Dr. Linda Truitt Creagh. The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome data set was supported in part by NIH grant R01 MH059490 and used sequencing data from the 1000 Genomes Project (http://www.1000genomes.org). The schizophrenia data and analyses presented in this publication are based on study data downloaded from the dbGaP website under dbGaP accession phs000473.v2.p2. The high-performance computing infrastructure at the University of North Texas was used to obtain the results presented in this publication.
© 2019 Wiley Periodicals, Inc.
Keywords:
- association studies
- common variants
- optimal weights
- rare variants