On Sample Size and Power Calculation for Variant Set-Based Association Tests

Baolin Wu, James S. Pankow

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


Sample size and power calculations are an important part of designing new sequence-based association studies. The recently developed SEQPower and SPS programs use computationally intensive Monte Carlo simulations to estimate power empirically for a series of variant set association (VSA) tests, including the sequence kernel association test (SKAT). It is desirable to develop methods that compute power quickly and accurately without intensive Monte Carlo simulation. We show that the power computed for SKAT with the existing analytical approach can be inflated, especially at small significance levels, which are often of primary interest in large-scale whole genome and exome sequencing projects. We propose a new χ²-approximation-based approach that computes sample size and power accurately and efficiently. In addition, we propose and implement a more accurate "exact" method to compute power, which is more efficient than the Monte Carlo approach, although it generally requires more computation than the χ² approximation. The exact approach can produce very accurate results and can be used to verify alternative approximation approaches. We implement the proposed methods in publicly available R programs that can be readily adapted when planning sequencing projects.
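To make the Monte Carlo baseline mentioned in the abstract concrete, here is a minimal sketch of empirical power estimation for a generic weighted quadratic-form statistic Q = Σ w_j Z_j², the general shape that kernel-type statistics such as SKAT take. It is an illustration under stated assumptions, not the SEQPower or SPS code and not the authors' R programs: the function names, the weights, and the per-component means are all hypothetical, and the components are assumed to be independent unit-variance normals.

```python
import random


def simulate_q(weights, means, rng):
    """Draw one Q = sum_j w_j * Z_j**2 with independent Z_j ~ N(mean_j, 1).

    Assumption: independent unit-variance components; all means zero
    corresponds to the null hypothesis.
    """
    return sum(w * rng.gauss(m, 1.0) ** 2 for w, m in zip(weights, means))


def monte_carlo_power(weights, means, alpha, n_null, n_alt, seed=0):
    """Estimate power at level alpha by simulation (hypothetical helper)."""
    rng = random.Random(seed)
    zeros = [0.0] * len(weights)

    # Step 1: simulate the null distribution and take its empirical
    # (1 - alpha) quantile as the critical value.
    null_draws = sorted(simulate_q(weights, zeros, rng) for _ in range(n_null))
    idx = min(n_null - 1, int((1.0 - alpha) * n_null))
    critical = null_draws[idx]

    # Step 2: simulate under the alternative and count rejections.
    rejections = sum(
        simulate_q(weights, means, rng) > critical for _ in range(n_alt)
    )
    return rejections / n_alt


if __name__ == "__main__":
    # Hypothetical weights and effect sizes, chosen only for illustration.
    power = monte_carlo_power(
        weights=[1.0, 0.5, 0.25],
        means=[2.0, 1.0, 0.5],
        alpha=0.05,
        n_null=20000,
        n_alt=5000,
    )
    print(f"estimated power: {power:.3f}")
```

The empirical critical value is only reliable when the number of null draws is much larger than 1/alpha, so the cost grows quickly at the small significance levels used in genome-wide studies. That is the practical motivation for analytical alternatives such as the χ² approximation and the "exact" method the abstract describes.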

Original language: English (US)
Pages (from-to): 136-143
Number of pages: 8
Journal: Annals of Human Genetics
Issue number: 2
State: Published - Mar 1 2016

Bibliographical note

Funding Information:
This research was supported in part by NIH grants GM083345 and CA134848. We are grateful to the University of Minnesota Supercomputing Institute for assistance with the computations. We want to thank the editor and reviewer for the constructive comments that have greatly improved the presentation of the paper. The ARIC Study is carried out as a collaborative study supported by contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C), R01HL087641, R01HL59367, and R01HL086694; contract U01HG004402; and contract HHSN268200625226C. The authors thank the staff and participants of the ARIC study for their important contributions. Infrastructure was partly supported by Grant Number UL1RR025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research. Support for exome chip genotyping in the ARIC Study was provided by the National Institutes of Health (NIH) American Recovery and Reinvestment Act of 2009 (ARRA) (5RC2HL102419).

Publisher Copyright:
© 2016 John Wiley & Sons Ltd/University College London.


  • Sample size
  • Sequence kernel association test
  • Sequencing study


