Genetic ancestry inference using support vector machines, and the active emergence of a unique American population

This article has been corrected since online publication and a corrigendum is also printed in this issue.

Ryan J. Haasl, Catherine A. McCarty, Bret A. Payseur

Research output: Contribution to journal › Article

6 Scopus citations

Abstract

We use genotype data from the Marshfield Clinical Research Foundation Personalized Medicine Research Project to investigate genetic similarity and divergence between Europeans and the sampled population of European Americans in Central Wisconsin, USA. To infer the recent genetic ancestry of the sampled Wisconsinites, we train support vector machines (SVMs) on the positions of Europeans along the top principal components (PCs). Our SVM models partition continent-wide European genetic variance into eight regional classes, an improvement over the geographically broader categories of recent ancestry reported by personal genomics companies. After correcting for misclassification error associated with the SVMs (<10% in all cases), we observe a >14% discrepancy between insular ancestries reported by Wisconsinites and those inferred by SVM. Values of FST, as well as Mantel tests for correlation between genetic and European geographic distances, indicate minimal divergence between Europe and the local Wisconsin population. However, we find that individuals from the Wisconsin sample show greater dispersion along higher-order PCs than individuals from Europe. Hypothesizing that this pattern is characteristic of nascent divergence, we run computer simulations that mimic the recent peopling of Wisconsin. The simulations corroborate the pattern in higher-order PCs, demonstrate its transient nature, and show that admixture accelerates the rate of divergence between the admixed population and its parental sources relative to drift alone. Together, the empirical and simulation results suggest that genetic divergence between European source populations and European Americans in Central Wisconsin is subtle but already under way.
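The abstract relies on Mantel tests to ask whether genetic distances track European geographic distances. As an illustration only, here is a minimal, standard-library Python sketch of a generic one-sided permutation Mantel test. The function names, the Pearson correlation on upper-triangle entries, the 999-permutation default, and the fixed seed are assumptions chosen for this sketch; they are not the authors' code or settings.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Off-diagonal upper-triangle entries of m, with rows/cols relabelled by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic: Matrix, geographic: Matrix,
                permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Illustrative one-sided Mantel test (hypothetical helper, not the paper's code).

    Returns the observed correlation between two symmetric distance matrices and
    a permutation p-value for a positive association.
    """
    n = len(genetic)
    identity = list(range(n))
    geo = _upper_triangle(geographic, identity)
    r_obs = _pearson(_upper_triangle(genetic, identity), geo)

    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        # Permute rows and columns of one matrix together, which keeps its
        # internal structure but breaks any link to the other matrix.
        r_perm = _pearson(_upper_triangle(genetic, perm), geo)
        if r_perm >= r_obs:
            at_least_as_large += 1

    # Count the observed arrangement itself, so p is never exactly zero.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return r_obs, p_value
```

For example, six sampling sites on a line whose genetic distances are exactly proportional to geographic distance give a correlation of essentially 1 and a small p-value. Permuting whole matrix labels, rather than individual matrix entries, is what makes the test valid for distance data, where entries sharing a row are not independent.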

Original language: English (US)
Pages (from-to): 554-562
Number of pages: 9
Journal: European Journal of Human Genetics
Volume: 21
Issue number: 5
State: Published - May 2013

Keywords

  • Admixture
  • Genetic ancestry
  • Population structure
  • Principal component analysis
  • Support vector machine
