Does the inclusion of rare variants improve risk prediction?

Erin Austin, Wei Pan, Xiaotong T Shen

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


Every known link between a genetic variant and blood pressure improves our understanding, and potentially the risk assessment, of related diseases such as hypertension. Genetic data have become increasingly comprehensive and are available for ever larger numbers of samples. Because whole-genome sequencing data are now available, statistical genetic models must evolve to meet the challenge of using both rare variants (RVs) and common variants (CVs) to link previously unidentified genomic loci to disease-related traits. Penalized regression has two features, variable selection and proportional coefficient shrinkage, that allow researchers to build models tailored to hypothesized characteristics of the genotype-phenotype map. This work uses the Genetic Analysis Workshop 18 (GAW18) data to investigate how a spectrum of penalized regressions performs when predicting systolic blood pressure (SBP) first from only CVs and then from only RVs. Next, combinations of CVs and RVs are used to model SBP, and the impact on prediction is quantified. The study demonstrates that, compared with maximum likelihood estimation, penalized regression improves blood pressure prediction for every combination of CVs and RVs considered. More importantly, models using both types of variants predict SBP better than those using only CVs or only RVs: the predictive mean squared error was reduced by up to 11.5% when RVs were added to CV-only penalized regression models. In particular, elastic net regression with equally weighted LASSO and ridge components can exploit large numbers of single-nucleotide polymorphisms (SNPs) to improve prediction.
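The abstract contrasts maximum likelihood estimation with LASSO, ridge and elastic net fits and scores them by predictive mean squared error. The sketch below is not the authors' code or data. It is a minimal pure-Python coordinate-descent elastic net, assuming the glmnet-style penalty λ(α‖β‖₁ + (1 − α)/2 ‖β‖₂²), in which α = 0.5 is read as "equally weighted LASSO and ridge components" and λ = 0 reduces to least squares (the maximum likelihood fit under Gaussian errors). It also assumes the "reduced by up to 11.5%" figure is a relative reduction, 100 × (MSE_CV − MSE_CV+RV) / MSE_CV. The helper names (`simulate`, `standardize`, `predictive_mse`, `percent_reduction`), the numbers of variants, minor allele frequencies, effect sizes and noise level are all hypothetical toy choices.

```python
import random

N_CV, N_RV = 10, 40  # hypothetical numbers of common and rare variants


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0): the LASSO shrinkage step."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=50):
    """Cyclic coordinate descent for
        (1 / 2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    alpha = 1 is the LASSO, alpha = 0 is ridge, lam = 0 is plain least squares.
    X (list of rows) and y are assumed centered, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # current residuals y - X b (b starts at zero)
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # monomorphic column: no information, keep b_j = 0
            # correlation of column j with the partial residual that excludes b_j
            rho = sum(row[j] * (r + row[j] * b[j]) for row, r in zip(X, resid)) / n
            new_bj = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def standardize(X_train, X_test):
    """Center and scale columns using training-set means and SDs."""
    n, p = len(X_train), len(X_train[0])
    means = [sum(row[j] for row in X_train) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X_train) / n) ** 0.5 or 1.0
           for j in range(p)]

    def scale(X):
        return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

    return scale(X_train), scale(X_test)


def predictive_mse(X_train, y_train, X_test, y_test, lam, alpha=0.5):
    """Fit on the training split; return mean squared error on the held-out split."""
    Xtr, Xte = standardize(X_train, X_test)
    y_bar = sum(y_train) / len(y_train)
    b = elastic_net(Xtr, [v - y_bar for v in y_train], lam, alpha)
    preds = [y_bar + sum(x * bj for x, bj in zip(row, b)) for row in Xte]
    return sum((yhat - obs) ** 2 for yhat, obs in zip(preds, y_test)) / len(y_test)


def percent_reduction(mse_before, mse_after):
    """Relative reduction (in %) of predictive MSE, e.g. from CV-only to CV+RV."""
    return 100.0 * (mse_before - mse_after) / mse_before


def simulate(n, seed=0):
    """Toy data: minor-allele counts (0/1/2) and an SBP-like trait.
    MAFs, effect sizes and noise level are made up for illustration only."""
    rng = random.Random(seed)
    maf = [0.3] * N_CV + [0.01] * N_RV
    beta = [1.5] * 3 + [0.0] * (N_CV - 3) + [6.0] * 5 + [0.0] * (N_RV - 5)
    X, y = [], []
    for _ in range(n):
        g = [int(rng.random() < m) + int(rng.random() < m) for m in maf]
        X.append(g)
        y.append(120.0 + sum(gj * bj for gj, bj in zip(g, beta)) + rng.gauss(0.0, 5.0))
    return X, y


if __name__ == "__main__":
    X, y = simulate(500)
    X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

    def cv_only(M):
        return [row[:N_CV] for row in M]  # the first N_CV columns are the toy CVs

    mse_mle = predictive_mse(X_tr, y_tr, X_te, y_te, lam=0.0)  # unpenalized fit
    mse_cv = predictive_mse(cv_only(X_tr), y_tr, cv_only(X_te), y_te, lam=0.5)
    mse_all = predictive_mse(X_tr, y_tr, X_te, y_te, lam=0.5)
    print(f"unpenalized, CVs + RVs: {mse_mle:.2f}")
    print(f"elastic net, CVs only:  {mse_cv:.2f}")
    print(f"elastic net, CVs + RVs: {mse_all:.2f}")
    print(f"reduction from adding RVs: {percent_reduction(mse_cv, mse_all):.1f}%")
```

Standardizing each genotype column before penalizing matters for rare variants: on the raw 0/1/2 scale a variant with minor allele frequency 0.01 has so little variance that the same λ would shrink its coefficient essentially to zero. glmnet standardizes predictors by default for a similar reason. The split sizes, λ and α values here are illustrative rather than tuned; in practice λ would be chosen by cross-validation.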

Original language: English (US)
Article number: S94
Journal: BMC Proceedings
State: Published - Jun 17 2014

Bibliographical note

Funding Information:
This research was supported by National Institutes of Health (NIH) grants R01HL65462, R01HL105397, and R01GM081535. The authors appreciate the valuable feedback from the GAW18 Machine Learning and Data Mining Group including AB, H-HH, SK, AL, and group leader RC. The GAW18 whole genome sequencing data were provided by the T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples) Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The GAW is supported by NIH grant R01 GM031575. This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.

Publisher Copyright:
© 2014 Austin et al.; licensee BioMed Central Ltd.

