Abstract
Collections of tumour genomes created by insertional mutagenesis experiments, e.g., the Retroviral Tagged Cancer Gene Database (RTCGD), can be analysed to find connections between mutations of specific genes and cancer. Such connections are found by identifying the locations of insertions, or groups of insertions, that occur frequently across the collection of tumour genomes. Recent work has employed a kernel density approach to find such commonly occurring insertions or co-occurring pairs of insertions. Unfortunately, this approach is extremely computationally intensive for pairs of insertions and even less tractable for triples and larger sets. We present a technique that can efficiently find commonly co-occurring sets of insertions (or other genomic features) of any size by applying Association Analysis (AA), i.e., frequent pattern mining, techniques from data mining. We provide a comparison to the kernel density approach on RTCGD, as well as results of the association approach on two other tumour data sets.
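To illustrate the general idea of frequent pattern mining referred to in the abstract, the following is a minimal Apriori-style sketch, not the authors' implementation. It assumes each tumour has already been reduced to a set of discrete insertion labels (for example, by mapping insertion positions to genomic bins or nearby genes, a preprocessing step the abstract does not describe). The function name `frequent_itemsets`, the labels `A` to `D`, and the support threshold are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for frequent itemsets.

    transactions: iterable of sets of hashable labels (one set per tumour).
    min_support:  minimum number of tumours that must contain an itemset.
    Returns a dict mapping frozenset -> support count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many tumours contain each single label.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Build size-k candidates from pairs of frequent (k-1)-itemsets,
        # keeping only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the support of each surviving candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


# Hypothetical toy data: each set lists the insertion labels seen in one tumour.
tumours = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C", "D"},
    {"B", "D"},
]
result = frequent_itemsets(tumours, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The pruning step is what makes larger sets affordable in this kind of search: a candidate triple is only counted if every pair inside it is already known to be frequent, so most combinations are never examined.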
Original language | English (US) |
---|---|
Pages (from-to) | 65-82 |
Number of pages | 18 |
Journal | International Journal of Data Mining and Bioinformatics |
Volume | 10 |
Issue number | 1 |
State | Published - 2014 |
Keywords
- AA
- Association analysis
- Bioinformatics
- Cancer genomes
- Frequent patterns
- Insertion identification
- Kernel density estimation
- Mutagenesis experiments
- Mutations
- Oncogenes
- Tumours