TY - JOUR
T1 - Genetic ancestry inference using support vector machines, and the active emergence of a unique American population
N1 - This article has been corrected since online publication, and a corrigendum is also printed in this issue.
AU - Haasl, Ryan J.
AU - McCarty, Catherine A.
AU - Payseur, Bret A.
PY - 2013/5
Y1 - 2013/5
N2 - We use genotype data from the Marshfield Clinical Research Foundation Personalized Medicine Research Project to investigate genetic similarity and divergence between Europeans and the sampled population of European Americans in Central Wisconsin, USA. To infer recent genetic ancestry of the sampled Wisconsinites, we train support vector machines (SVMs) on the positions of Europeans along top principal components (PCs). Our SVM models partition continent-wide European genetic variance into eight regional classes, which is an improvement over the geographically broader categories of recent ancestry reported by personal genomics companies. After correcting for misclassification error associated with the SVMs (<10%, in all cases), we observe a >14% discrepancy between insular ancestries reported by Wisconsinites and those inferred by SVM. Values of FST as well as Mantel tests for correlation between genetic and European geographic distances indicate minimal divergence between Europe and the local Wisconsin population. However, we find that individuals from the Wisconsin sample show greater dispersion along higher-order PCs than individuals from Europe. Hypothesizing that this pattern is characteristic of nascent divergence, we run computer simulations that mimic the recent peopling of Wisconsin. Simulations corroborate the pattern in higher-order PCs, demonstrate its transient nature, and show that admixture accelerates the rate of divergence between the admixed population and its parental sources relative to drift alone. Together, empirical and simulation results suggest that genetic divergence between European source populations and European Americans in Central Wisconsin is subtle but already under way.
AB - We use genotype data from the Marshfield Clinical Research Foundation Personalized Medicine Research Project to investigate genetic similarity and divergence between Europeans and the sampled population of European Americans in Central Wisconsin, USA. To infer recent genetic ancestry of the sampled Wisconsinites, we train support vector machines (SVMs) on the positions of Europeans along top principal components (PCs). Our SVM models partition continent-wide European genetic variance into eight regional classes, which is an improvement over the geographically broader categories of recent ancestry reported by personal genomics companies. After correcting for misclassification error associated with the SVMs (<10%, in all cases), we observe a >14% discrepancy between insular ancestries reported by Wisconsinites and those inferred by SVM. Values of FST as well as Mantel tests for correlation between genetic and European geographic distances indicate minimal divergence between Europe and the local Wisconsin population. However, we find that individuals from the Wisconsin sample show greater dispersion along higher-order PCs than individuals from Europe. Hypothesizing that this pattern is characteristic of nascent divergence, we run computer simulations that mimic the recent peopling of Wisconsin. Simulations corroborate the pattern in higher-order PCs, demonstrate its transient nature, and show that admixture accelerates the rate of divergence between the admixed population and its parental sources relative to drift alone. Together, empirical and simulation results suggest that genetic divergence between European source populations and European Americans in Central Wisconsin is subtle but already under way.
KW - Admixture
KW - Genetic ancestry
KW - Population structure
KW - Principal component analysis
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=84876664978&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876664978&partnerID=8YFLogxK
U2 - 10.1038/ejhg.2012.258
DO - 10.1038/ejhg.2012.258
M3 - Article
C2 - 23211701
AN - SCOPUS:84876664978
SN - 1018-4813
VL - 21
SP - 554
EP - 562
JO - European Journal of Human Genetics
JF - European Journal of Human Genetics
IS - 5
ER -