Detecting population-differentiation copy number variants in human population tree by sparse group selection

Huanan Zhang, David Roe, Rui Kuang

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Copy-number variants (CNVs) account for a substantial proportion of human genetic variation. Understanding CNV diversity across populations is a computational challenge because CNV patterns are often shared by several related populations and occur only in a subgroup of individuals within each population. This paper introduces a tree-guided sparse group selection algorithm (treeSGS) to detect population-differentiation CNV markers of subgroups across populations organized by a phylogenetic tree of human populations. The treeSGS algorithm detects CNV markers of the populations associated with nodes at all levels of the tree, so that the evolutionary relations among the populations are incorporated for more accurate detection of population-differentiation CNVs. We applied the treeSGS algorithm to the 1,179 samples from the 11 populations in the HapMap3 CNV data. The treeSGS algorithm accurately identifies CNV markers of each population and of the collections of populations organized under the branches of the human population tree, as validated by consistency among family trios and by SNP characterizations of the CNV regions. Further comparison of the detected CNV markers with population-differentiation CNVs reported in the 1000 Genomes Project data and in other recent studies also shows that treeSGS can significantly improve the current annotations of population-differentiation CNV markers. TreeSGS package is available at
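The abstract does not give the treeSGS objective, so the following is only an illustrative sketch of generic tree-guided sparse group shrinkage, not the authors' implementation. It assumes a weight vector with one entry per individual, hypothetical index groups for each tree node (leaf populations and their parent), and hypothetical penalty strengths `lam_l1` and `lam_group`. Element-wise soft-thresholding selects a subgroup of individuals, and group-wise shrinkage applied from the smallest node to the largest can zero out whole populations or branches.

```python
import numpy as np

def soft_threshold(w, lam):
    # Element-wise L1 shrinkage: small entries become exactly zero.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def group_shrink(w, idx, lam):
    # Shrink the sub-vector w[idx] toward zero by its L2 norm;
    # the whole group is zeroed when its norm is at most lam.
    out = w.copy()
    norm = np.linalg.norm(w[idx])
    scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
    out[idx] = scale * w[idx]
    return out

def tree_sparse_group_prox(w, node_groups, lam_l1, lam_group):
    # Assumption: node_groups lists, for each tree node, the indices of
    # the individuals under that node. Groups are processed from the
    # smallest (deepest) node to the largest (closest to the root).
    out = soft_threshold(np.asarray(w, dtype=float), lam_l1)
    for idx in sorted(node_groups, key=len):
        out = group_shrink(out, np.asarray(idx), lam_group)
    return out

# Toy example (hypothetical data): six individuals, two leaf
# populations, and one parent node covering both.
node_groups = [[0, 1, 2], [3, 4, 5], [0, 1, 2, 3, 4, 5]]
w = np.array([3.0, 0.2, -2.5, 0.1, -0.1, 0.3])
result = tree_sparse_group_prox(w, node_groups, lam_l1=0.5, lam_group=0.5)
print(result)
```

In this toy run the second population is removed entirely, while within the first population only the individuals with strong signal keep nonzero weights, which mirrors the "subgroup within a population" pattern the abstract describes.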

Original language: English (US)
Article number: 8168351
Pages (from-to): 538-549
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Issue number: 2
State: Published - Mar 1 2019

Bibliographical note

Publisher Copyright:
© 2017 IEEE.


  • DNA copy number variants
  • Machine learning algorithms
  • bioinformatics
  • computational biology
  • group LASSO
  • population genetics
  • tree algorithms

