Identification of co-occurring insertions in cancer genomes using association analysis

Michael Steinbach, Sean Landman, Vipin Kumar, Haoyu Yu

Research output: Contribution to journal › Article › peer-review


Collections of tumour genomes created by insertional mutagenesis experiments, e.g., the Retroviral Tagged Cancer Gene Database (RTCGD), can be analysed to find connections between mutations of specific genes and cancer. Such connections are found by identifying the locations of insertions, or groups of insertions, that occur frequently across the collection of tumour genomes. Recent work has employed a kernel density approach to find such commonly occurring insertions or co-occurring pairs of insertions. Unfortunately, this approach is extremely computationally intensive for pairs of insertions and even less tractable for triples and larger sets. We present a technique that can efficiently find commonly co-occurring sets of insertions (or other genomic features) of any size by applying Association Analysis (AA), i.e., frequent pattern mining techniques from data mining. A comparison with the kernel density approach on RTCGD is provided, as well as results of the association approach on two other tumour data sets.
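
To make the idea of frequent pattern mining over tumours concrete, the following is a minimal sketch, not the authors' implementation. It assumes each tumour has already been reduced to a set of insertion-locus labels (the abstract does not say how insertions are mapped to such labels), and it uses a plain Apriori-style level-wise search as a stand-in for whichever association analysis algorithm the paper employs. The function name `frequent_itemsets`, the labels `locus_A`, `locus_B`, `locus_C`, and the toy data are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search. Returns {frozenset_of_items: support_count}.

    transactions: iterable of sets (here, one set of locus labels per tumour).
    min_support:  minimum number of transactions that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Build size-k candidates by joining frequent (k-1)-itemsets and keep
        # only those whose every (k-1)-subset is itself frequent (pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count support for the surviving candidates.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1

    return result


# Hypothetical toy data: five tumours, each a set of insertion-locus labels.
tumours = [
    {"locus_A", "locus_B", "locus_C"},
    {"locus_A", "locus_B"},
    {"locus_A", "locus_C"},
    {"locus_B", "locus_C"},
    {"locus_A", "locus_B", "locus_C"},
]

patterns = frequent_itemsets(tumours, min_support=3)
for itemset, support in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), support)
```

The pruning step is what keeps the search manageable as set size grows: a set of loci is counted only if all of its smaller subsets already met the support threshold, so triples and larger sets are examined only where the pairs justify it.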

Original language: English (US)
Pages (from-to): 65-82
Number of pages: 18
Journal: International Journal of Data Mining and Bioinformatics
Issue number: 1
State: Published - 2014


  • AA
  • Association analysis
  • Bioinformatics
  • Cancer genomes
  • Frequent patterns
  • Insertion identification
  • Kernel density estimation
  • Mutagenesis experiments
  • Mutations
  • Oncogenes
  • Tumours

