TY - GEN
T1 - Integrative biomarker discovery for breast cancer metastasis from gene expression and protein interaction data using error-tolerant pattern mining
AU - Gupta, Rohit
AU - Agrawal, Smita
AU - Rao, Navneet
AU - Tian, Ze
AU - Kuang, Rui
AU - Kumar, Vipin
PY - 2010
Y1 - 2010
N2 - Biomarker discovery for complex diseases is a challenging problem. Most of the existing approaches identify individual genes as disease markers, thereby missing the interactions among genes. Moreover, often only a single biological data source is used to discover biomarkers. These factors account for the discovery of inconsistent biomarkers. In this paper, we propose a novel error-tolerant pattern mining approach for integrated analysis of gene expression and protein interaction data. This integrated approach incorporates constraints from the protein interaction network and efficiently discovers patterns (groups of genes) in a bottom-up fashion from the gene expression data. We call these patterns active sub-network biomarkers. To illustrate the efficacy of our proposed approach, we used four breast cancer gene expression data sets and a human protein interaction network and showed that active sub-network biomarkers are more biologically plausible and that the genes discovered are more reproducible across studies. Finally, through pathway analysis, we also showed a substantial enrichment for known cancer genes and hence were able to generate relevant hypotheses for understanding the molecular mechanisms of breast cancer metastasis.
AB - Biomarker discovery for complex diseases is a challenging problem. Most of the existing approaches identify individual genes as disease markers, thereby missing the interactions among genes. Moreover, often only a single biological data source is used to discover biomarkers. These factors account for the discovery of inconsistent biomarkers. In this paper, we propose a novel error-tolerant pattern mining approach for integrated analysis of gene expression and protein interaction data. This integrated approach incorporates constraints from the protein interaction network and efficiently discovers patterns (groups of genes) in a bottom-up fashion from the gene expression data. We call these patterns active sub-network biomarkers. To illustrate the efficacy of our proposed approach, we used four breast cancer gene expression data sets and a human protein interaction network and showed that active sub-network biomarkers are more biologically plausible and that the genes discovered are more reproducible across studies. Finally, through pathway analysis, we also showed a substantial enrichment for known cancer genes and hence were able to generate relevant hypotheses for understanding the molecular mechanisms of breast cancer metastasis.
UR - http://www.scopus.com/inward/record.url?scp=80052672832&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052672832&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80052672832
SN - 9781617381119
T3 - 2nd International Conference on Bioinformatics and Computational Biology 2010, BICoB 2010
SP - 171
EP - 176
BT - 2nd International Conference on Bioinformatics and Computational Biology 2010, BICoB 2010
T2 - 2nd International Conference on Bioinformatics and Computational Biology 2010, BICoB 2010
Y2 - 24 March 2010 through 26 March 2010
ER -