Effect of sample stratification on dairy GWAS results

Li Ma, George R. Wiggans, Shengwen Wang, Tad S. Sonstegard, Jing Yang, Brian A. Crooker, John B. Cole, Curtis P. Van Tassell, Thomas J. Lawlor, Yang Da

Research output: Contribution to journal › Article › peer-review

29 Scopus citations


Background: Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on the results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach.

Results: Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10-15 generations of ancestors. Genome and phenotype stratifications overlapped strikingly with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows, and also contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. The EMMAX method produced the most severe reduction in the number of significant effects, while the PCA method using 20 principal components and the GLS method yielded similar significance levels. Removing the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three correction methods, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA's Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45 kb upstream from PIGY on BTA6, and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows.

Conclusions: Genetic selection and extensive use of artificial insemination contributed to overlapping genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.
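To make the PCA approach named in the abstract concrete, the following is a minimal sketch of principal-component correction in a single-SNP association test. It is not the authors' implementation: the function names, the simulated two-group data, and the choice of two components (the study used 20) are all assumptions made for illustration. The simulated phenotype depends only on group membership, so the tested SNP has no true effect.

```python
import numpy as np


def simulate_stratified_sample(seed=0, n_per_group=200, n_snps=300):
    """Simulate two subpopulations (hypothetical data, not from the study).

    Allele frequencies and the phenotype mean both differ by group,
    but no SNP has a causal effect on the phenotype.
    """
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_per_group)
    freq = np.where(group == 0, 0.2, 0.8)  # per-individual allele frequency
    # Genotypes coded 0/1/2 as counts of one allele.
    genotypes = rng.binomial(2, freq[:, None], size=(group.size, n_snps))
    phenotype = 2.0 * group + rng.normal(size=group.size)
    return genotypes.astype(float), phenotype


def top_principal_components(genotypes, n_components):
    """Leading PC scores of an individuals-by-SNPs genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0  # avoid dividing by zero for monomorphic SNPs
    u, s, _ = np.linalg.svd(centered / scale, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def snp_t_statistic(phenotype, snp, covariates):
    """t-statistic for the SNP in an OLS fit on intercept, SNP, covariates."""
    n = phenotype.shape[0]
    design = np.column_stack([np.ones(n), snp, covariates])
    beta, _, _, _ = np.linalg.lstsq(design, phenotype, rcond=None)
    residuals = phenotype - design @ beta
    sigma2 = residuals @ residuals / (n - design.shape[1])
    cov_beta = sigma2 * np.linalg.pinv(design.T @ design)
    return beta[1] / np.sqrt(cov_beta[1, 1])


genotypes, phenotype = simulate_stratified_sample()
pcs = top_principal_components(genotypes, n_components=2)
no_covariates = np.empty((phenotype.shape[0], 0))

t_uncorrected = snp_t_statistic(phenotype, genotypes[:, 0], no_covariates)
t_corrected = snp_t_statistic(phenotype, genotypes[:, 0], pcs)
print(f"uncorrected t = {t_uncorrected:.2f}, PC-corrected t = {t_corrected:.2f}")
```

In this simulation the uncorrected test picks up the group difference as a spurious SNP effect, and including the principal components as covariates absorbs it. The same mechanism can also absorb a real effect when the causal allele frequency tracks the stratification, which is the concern the abstract raises about the elite cows.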

Original language: English (US)
Article number: 536
Journal: BMC Genomics
Issue number: 1
State: Published - Oct 6 2012

Bibliographical note

Funding Information:
This project was supported by National Research Initiative Competitive Grants no. 2008-35205-18846 and 2011-67015-30333 from the USDA National Institute of Food and Agriculture; Holstein Association USA; project MN-16-043 of the Agricultural Experiment Station at the University of Minnesota; and the Minnesota Supercomputer Institute. DNA samples were contributed by T Lawlor (Holstein Association USA), M Cowan (Genetic Visions), R Wilson (Genex Cooperative), C Dechow (Pennsylvania State University), the Cooperative Dairy DNA Repository (USDA/ARS), H Blackburn (National Center for Genetic Resources Preservation, USDA/ARS), L Hansen (University of Minnesota), D Spurlock (Iowa State University), A de Vries (University of Florida), and B Cassell (Virginia Polytechnic Institute and State University). The authors wish to thank two anonymous reviewers and the Associate Editor for constructive comments and suggestions.

