Does the inclusion of rare variants improve risk prediction?

Erin Austin, Wei Pan, Xiaotong T Shen

Research output: Contribution to journal › Article

1 Scopus citation

Abstract

Every known link between a genetic variant and blood pressure improves the understanding and potentially the risk assessment of related diseases such as hypertension. Genetic data have become increasingly comprehensive and available for an increasing number of samples. The availability of whole-genome sequencing data means that statistical genetic models must evolve to meet the challenge of using both rare variants (RVs) and common variants (CVs) to link previously unidentified genome loci to disease-related traits. Penalized regression has two features, variable selection and proportional coefficient shrinkage, that allow researchers to build models tailored to hypothesized characteristics of the genotype-phenotype map. The following work uses the Genetic Analysis Workshop 18 data to investigate the performance of a spectrum of penalized regressions using at first only CVs or only RVs to predict systolic blood pressure (SBP). Next, combinations of CVs and RVs are used to model SBP, and the impact on prediction is quantified. The study demonstrates that penalized regression improves blood pressure prediction for any combination of CVs and RVs compared with maximum likelihood estimation. More significantly, models using both types of variants provide better predictions of SBP than those using only CVs or only RVs. The predictive mean squared error was reduced by up to 11.5% when RVs were added to CV-only penalized regression models. Elastic net regression with equally weighted LASSO and ridge components, in particular, can use large numbers of single-nucleotide polymorphisms to improve prediction.
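The comparison described in the abstract (a CV-only penalized model against a model using both CVs and RVs, scored by predictive mean squared error) can be sketched as follows. This is a minimal illustration only, not the authors' code and not the Genetic Analysis Workshop 18 data. The simulated genotypes, sample sizes, allele frequencies, effect sizes, and the penalty strength `alpha` are all hypothetical. The penalty is written in the common form (1/2n)·RSS + alpha·(l1_ratio·|β|₁ + ½(1 − l1_ratio)·|β|₂²), which is an assumption about parameterization; `l1_ratio=0.5` stands in for the equally weighted LASSO and ridge components.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO part of the update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha, l1_ratio=0.5, n_sweeps=100):
    """Cyclic coordinate descent for elastic net with an unpenalized intercept."""
    n, p = len(X), len(X[0])
    # Center columns and response so the intercept can be recovered afterwards.
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # variant not observed in this sample; leave at zero
            # Correlation of column j with the partial residual (j's fit added back).
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = (soft_threshold(rho, alpha * l1_ratio)
                        / (col_sq[j] + alpha * (1.0 - l1_ratio)))
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_beta
    intercept = y_mean - sum(m * b for m, b in zip(col_mean, beta))
    return intercept, beta


def predict(intercept, beta, X):
    return [intercept + sum(x * b for x, b in zip(row, beta)) for row in X]


def pmse(y_true, y_pred):
    """Predictive mean squared error on held-out samples."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def simulate(n, mafs, effects, rng):
    """Hypothetical additive model: minor-allele counts (0/1/2) plus Gaussian noise."""
    X = [[(rng.random() < maf) + (rng.random() < maf) for maf in mafs]
         for _ in range(n)]
    y = [120.0 + sum(x * e for x, e in zip(row, effects)) + rng.gauss(0.0, 5.0)
         for row in X]
    return X, y


def main():
    rng = random.Random(0)
    n_cv, n_rv = 20, 20                      # assumed variant counts
    mafs = [0.3] * n_cv + [0.02] * n_rv      # assumed minor allele frequencies
    effects = [0.0] * (n_cv + n_rv)
    effects[0:3] = [2.0, 2.0, 2.0]           # a few causal CVs (assumed)
    effects[n_cv:n_cv + 3] = [8.0, 8.0, 8.0]  # a few causal RVs (assumed)
    X, y = simulate(200, mafs, effects, rng)
    X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
    for label, cols in (("CV only", n_cv), ("CV + RV", n_cv + n_rv)):
        b0, b = fit_elastic_net([r[:cols] for r in X_train], y_train, alpha=0.5)
        error = pmse(y_test, predict(b0, b, [r[:cols] for r in X_test]))
        print(f"{label}: PMSE = {error:.2f}")


if __name__ == "__main__":
    main()
```

In practice `alpha` would be tuned, for example by cross-validation, and the relative PMSE change between the two fits is the kind of quantity the abstract reports. Any numbers produced by this toy simulation say nothing about the study's actual results.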

Original language: English (US)
Article number: S94
Journal: BMC Proceedings
Volume: 8
State: Published - Jun 17 2014
