SubPatCNV: Approximate subspace pattern mining for mapping copy-number variations

Nicholas Johnson, Huanan Zhang, Gang Fang, Vipin Kumar, Rui Kuang

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Background: Many DNA copy-number variations (CNVs) are known to lead to phenotypic variation and pathogenesis. Although CNVs are often common to only a small number of samples in the studied population or patient cohort, previous work has not focused on using advanced data mining techniques to identify CNV regions that occur only in subsets of samples, which is what is needed to reliably answer questions such as "Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals?"

Results: We introduce SubPatCNV, a tool for mining CNV subspace patterns that can identify all aberrant CNV regions specific to arbitrary sample subsets larger than a support threshold. By design, SubPatCNV implements a variant of the approximate association pattern mining algorithm under a spatial constraint on the positional CNV probe features. In a benchmark test, SubPatCNV was applied to identify population-specific germline CNVs in samples from four HapMap populations. In experiments on the TCGA ovarian cancer dataset, SubPatCNV discovered many large aberrant CNV events in patient subgroups and reported regions enriched with cancer-relevant genes. On both the HapMap and the TCGA data, it was observed that, by employing approximate pattern mining, SubPatCNV more effectively identifies CNV subspace patterns that are consistent within a subgroup from high-density array data.

Conclusions: SubPatCNV, available through http://sourceforge.net/projects/subpatcnv/, is a unique, scalable, open-source software tool that provides the flexibility to identify CNV regions specific to sample subgroups of different sizes from high-density CNV array data.
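The abstract describes the mining task only at a high level, so the following is a minimal brute-force sketch of the general idea (contiguous probe intervals that are aberrant, up to some tolerance, in at least a threshold number of samples), not SubPatCNV's actual algorithm. The function name `mine_contiguous_patterns`, the 0/1 per-probe encoding, and the single per-sample error tolerance `max_error` are assumptions for illustration; the published method's tolerance criteria and search strategy may differ. A fractional threshold such as "more than 30% of the individuals" would translate here into a sample count, roughly `min_support = ceil(0.3 * n_samples)`.

```python
from itertools import accumulate


def mine_contiguous_patterns(matrix, min_support, max_error=0.1, min_length=2):
    """Hypothetical sketch, not SubPatCNV's algorithm: list contiguous probe
    intervals shared, up to a per-sample error tolerance, by enough samples.

    matrix      -- list of samples, each a list of 0/1 probe calls
                   (1 = aberrant probe, e.g. a deletion; assumed encoding)
    min_support -- minimum number of supporting samples
    max_error   -- largest fraction of non-aberrant probes a supporting
                   sample may have inside the interval (assumed criterion)
    min_length  -- shortest interval, in probes, to report
    Returns (start, end, supporters) tuples for half-open intervals [start, end).
    """
    if not matrix:
        return []
    n_probes = len(matrix[0])
    # Per-sample prefix sums give the aberrant-probe count of any interval in O(1).
    prefix = [[0] + list(accumulate(row)) for row in matrix]
    patterns = []
    for start in range(n_probes):
        for end in range(start + min_length, n_probes + 1):
            length = end - start
            # A sample supports the interval if its mismatches stay within tolerance.
            supporters = [s for s, p in enumerate(prefix)
                          if length - (p[end] - p[start]) <= max_error * length]
            if len(supporters) >= min_support:
                patterns.append((start, end, supporters))
    # Keep an interval only if no strictly larger interval has the same supporters.
    return [a for a in patterns
            if not any((b[0], b[1]) != (a[0], a[1]) and b[2] == a[2]
                       and b[0] <= a[0] and a[1] <= b[1] for b in patterns)]


# Toy example: 4 samples x 6 probes. Samples 0-2 share a deletion over probes 1-4;
# sample 1 has one non-aberrant call (probe 3) inside it, sample 3 has no deletion.
calls = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
result = mine_contiguous_patterns(calls, min_support=3, max_error=0.25)
print(result)  # [(1, 5, [0, 1, 2])]
```

The tolerance is what lets sample 1 join the pattern despite its single mismatch; with `max_error=0.0` the three-sample pattern shrinks to probes 1-2. Because this sketch enumerates every interval and rescans every sample, its cost grows with the square of the probe count times the sample count, so a tool meant for high-density arrays would need a far more efficient search than is attempted here.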

Original language: English (US)
Article number: 16
Journal: BMC Bioinformatics
Volume: 16
Issue number: 1
State: Published - Jan 16 2015

Keywords

  • Approximate pattern mining
  • Cancer
  • DNA copy-number variations
  • HapMap
