TY - JOUR
T1 - Transfer learning across cancers on DNA copy number variation analysis
AU - Zhang, Huanan
AU - Tian, Ze
AU - Kuang, Rui
N1 - Copyright:
Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
N2 - DNA copy number variations (CNVs) are prevalent in all types of tumors. It is still a challenge to study how CNVs play a role in driving tumorgenic mechanisms that are either universal or specific in different cancer types. To address the problem, we introduce a transfer learning framework to discover common CNVs shared across different tumor types as well as CNVs specific to each tumor type from genome-wide CNV data measured by array CGH and SNP genotyping array. The proposed model, namely Transfer Learning with Fused LASSO (TLFL), detects latent CNV components from multiple CNV datasets of different tumor types to distinguish the CNVs that are common across the datasets and those that are specific in each dataset. Both the common and type-specific CNVs are detected as latent components in matrix factorization coupled with fused LASSO on adjacent CNV probe features. TLFL considers the common latent components underlying the multiple datasets to transfer knowledge across different tumor types. In simulations and experiments on real cancer CNV datasets, TLFL detected better latent components that can be used as features to improve classification of patient samples in each individual dataset compared with the model without the knowledge transfer. In cross-dataset analysis on bladder cancer and cross-domain analysis on breast cancer and ovarian cancer, TLFL also learned latent CNV components that are both predictive of tumor stages and correlate with known cancer genes.
AB - DNA copy number variations (CNVs) are prevalent in all types of tumors. It is still a challenge to study how CNVs play a role in driving tumorgenic mechanisms that are either universal or specific in different cancer types. To address the problem, we introduce a transfer learning framework to discover common CNVs shared across different tumor types as well as CNVs specific to each tumor type from genome-wide CNV data measured by array CGH and SNP genotyping array. The proposed model, namely Transfer Learning with Fused LASSO (TLFL), detects latent CNV components from multiple CNV datasets of different tumor types to distinguish the CNVs that are common across the datasets and those that are specific in each dataset. Both the common and type-specific CNVs are detected as latent components in matrix factorization coupled with fused LASSO on adjacent CNV probe features. TLFL considers the common latent components underlying the multiple datasets to transfer knowledge across different tumor types. In simulations and experiments on real cancer CNV datasets, TLFL detected better latent components that can be used as features to improve classification of patient samples in each individual dataset compared with the model without the knowledge transfer. In cross-dataset analysis on bladder cancer and cross-domain analysis on breast cancer and ovarian cancer, TLFL also learned latent CNV components that are both predictive of tumor stages and correlate with known cancer genes.
KW - Cancer Genomics
KW - DNA Copy Number
KW - Fused LASSO Components
KW - Transfer Learning
UR - http://www.scopus.com/inward/record.url?scp=84894672836&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894672836&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2013.58
DO - 10.1109/ICDM.2013.58
M3 - Conference article
AN - SCOPUS:84894672836
SN - 1550-4786
SP - 1283
EP - 1288
JO - Proceedings - IEEE International Conference on Data Mining, ICDM
JF - Proceedings - IEEE International Conference on Data Mining, ICDM
M1 - 6729635
T2 - 13th IEEE International Conference on Data Mining, ICDM 2013
Y2 - 7 December 2013 through 10 December 2013
ER -