Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce ‘annotation principal components’, multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
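As an illustration of the 'annotation principal components' idea, the following minimal sketch summarizes a group of annotation scores by their first principal component, giving one score per variant. The abstract does not say how the components are computed, so the standardization, the power-iteration solver, the sign convention and the function names here are all assumptions made for illustration, not the STAAR implementation.

```python
import math

def standardize(columns):
    """Center each annotation column and scale it to unit variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        # A constant annotation carries no information; map it to zeros.
        out.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return out

def first_annotation_pc(columns, iters=200):
    """Return one score per variant: the first PC of the annotation columns.

    `columns` holds one list per annotation (hypothetical input layout),
    each with one score per variant.
    """
    z = standardize(columns)
    k, n = len(z), len(z[0])
    # Correlation matrix between annotations (k x k).
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    # Power iteration for the leading eigenvector (the PC loadings).
    # A non-uniform start makes an exactly orthogonal start less likely.
    v = [1.0 / (j + 1) for j in range(k)]
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # The sign of a PC is arbitrary; orient it so the loadings sum to >= 0.
    if sum(v) < 0:
        v = [-x for x in v]
    # Project each variant's standardized annotations onto the loadings.
    return [sum(v[j] * z[j][i] for j in range(k)) for i in range(n)]

# Two hypothetical, perfectly correlated annotations for four variants.
pc = first_annotation_pc([[1, 2, 3, 4], [2, 4, 6, 8]])
```

In this toy input the two annotations carry the same information, so the single component preserves the ordering of the variants while replacing two columns with one.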
Bibliographical note
Funding Information:
This work was supported by grant nos. R35-CA197449, P01-CA134294, U19-CA203654 and R01-HL113338 (to X. Lin), U01-HG009088 (to X. Lin, S.R.S. and B.M.N.), R01-HL142711 (to P.N. and G.M.P.), K01-HL125751 and R03-HL141439 (to G.M.P.), R35-HL135824 (to C.J.W.), 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1TR001881 and DK063491 (to J.I.R. and X.G.), HHSN268201800002I (to G.R.A.), R35-GM127131 and R01-MH101244 (to S.R.S.), U01-HL72518, HL087698, HL49762, HL59684, HL58625, HL071025, HL112064, NR0224103 and M01-RR000052 (to the Johns Hopkins General Clinical Research Center), R01-HL093093, R01-HL133040 (to D.E.W.), NO1-HC-25195, HHSN268201500001I, 75N92019D00031 and R01-HL092577-06S1 (to R.S.V. and L.A.C.), the Evans Medical Foundation and the Jay and Louis Coffman Endowment from the Department of Medicine, Boston University School of Medicine (to R.S.V.), HHSN268201800001I (to K.M.R., A.T.K., M.P.C. and J.G.B.), U01-HL137162 (to K.M.R. and M.P.C.), R35-HL135818 and R01-HL113338 (to S.R.), R01-HL113323, U01-DK085524, R01-HL045522, R01-MH078143, R01-MH078111 and R01-MH083824 (to J.M.P., M.C.M., J.E.C. and J.B.), R01-HL92301, R01-HL67348, R01-NS058700, R01-AR48797 and R01-AG058921 (to N.D.P. and D.W.B.), R01-DK071891 (to N.D.P., B.I.F. and D.W.B.), M01-RR07122 and F32-HL085989 (to the General Clinical Research Center of the Wake Forest University School of Medicine), the American Diabetes Association, P60-AG10484 (to the Claude Pepper Older Americans Independence Center of Wake Forest University Health Sciences), U01-HL137181 (to J.R.O.), R01-HL093093 (to S.T.M.), 1U24CA237617 and 5U24HG009446 (to X.S.L.), HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C and HHSN268201600004C (to C.L.K.), U01-HL072524, R01-HL104135-04S1, U01-HL054472, U01-HL054473, U01-HL054495, U01-HL054509 and R01-HL055673-18S1 (to M.R.I., S.A. and D.K.A.), Swedish Research Council grant no. 201606830 (to G.H.), grant nos. HHSN268201800010I, HHSN268201800011I, HHSN268201800012I, HHSN268201800013I, HHSN268201800014I and HHSN268201800015I (to A.C.), HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700005I and HHSN268201700004I (to E.B.), and R01-HL134320 (to C.M.B.). WGS for the TOPMed program was supported by the NHLBI. Centralized read mapping and genotype calling, along with variant quality metrics and filtering, were provided by the TOPMed Informatics Research Center (no. 3R01HL-117626-02S1; contract no. HHSN268201800002I). Phenotype harmonization, data management, sample identity quality control and general study coordination were provided by the TOPMed Data Coordinating Center (no. 3R01HL-120393-02S1; contract no. HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The full study-specific acknowledgements are detailed in the Supplementary Note.
S.A. reports equity in and employment by 23andMe. L.A.C. spends part of her time consulting for the Dyslipidemia Foundation, a nonprofit company, as a statistical consultant. X.S.L. is a cofounder, board member and scientific advisory board member of GV20 Oncotherapy, a member of the scientific advisory board of 3DMedCare, a consultant for Genentech and a recipient of research grants from Sanofi and Takeda, all unrelated to the present work. For B.D.M., the Amish Research Program receives partial support from Regeneron Pharmaceuticals. M.E.M. reports a grant from Regeneron Pharmaceuticals that is unrelated to the present work. B.M.P. serves on the steering committee of the Yale Open Data Access Project funded by Johnson & Johnson. S.R. reports interests in Jazz Pharmaceuticals, Eisai and Respircardia, all unrelated to the present work. Z.W. cofounded Rgenta Therapeutics and directs its scientific advisory board. B.M.N. is on the scientific advisory board of Deep Genomics, and is a consultant for CAMP4 Therapeutics, Takeda and Biogen. S.R.S. is a consultant to NGM Biopharmaceuticals and Inari Agriculture. He is also on the scientific advisory board of Veritas Genetics. G.R.A. is an employee of Regeneron Pharmaceuticals and owns stock and stock options for Regeneron Pharmaceuticals. The spouse of C.J.W. works at Regeneron Pharmaceuticals. P.N. reports grants from Amgen, Apple and Boston Scientific, and consulting income from Apple and Blackstone Life Sciences, all unrelated to the present work. X. Lin is a consultant to AbbVie Pharmaceuticals.
© 2020, The Author(s), under exclusive licence to Springer Nature America, Inc.