TY - GEN
T1 - A novel combinatorial score for feature selection with P-Tree in DNA microarray data analysis
AU - Wang, Yan
AU - Lu, Tingda
AU - Perrizo, William
PY - 2010/12/1
Y1 - 2010/12/1
N2 - DNA microarray experiments gather information from tissue and cell samples by generating thousands of gene expression measurements. Many researchers study gene expression differences, which are useful in disease diagnosis, outcome prediction, and cancer type classification. In mining high-dimensional microarray data, feature selection is an important pre-processing stage. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose a novel score that combines the label correlation with the correlation between features. We design a Combinatorial Score feature selection algorithm in the P-Tree structure and combine it with the K-Nearest-Neighbor algorithm for breast cancer clinical metastasis time prediction. Our experiments suggest that the Combinatorial Score feature selection algorithm can find a subset of genes with high computational efficiency and significant performance for breast cancer clinical metastasis prediction.
AB - DNA microarray experiments gather information from tissue and cell samples by generating thousands of gene expression measurements. Many researchers study gene expression differences, which are useful in disease diagnosis, outcome prediction, and cancer type classification. In mining high-dimensional microarray data, feature selection is an important pre-processing stage. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose a novel score that combines the label correlation with the correlation between features. We design a Combinatorial Score feature selection algorithm in the P-Tree structure and combine it with the K-Nearest-Neighbor algorithm for breast cancer clinical metastasis time prediction. Our experiments suggest that the Combinatorial Score feature selection algorithm can find a subset of genes with high computational efficiency and significant performance for breast cancer clinical metastasis prediction.
UR - http://www.scopus.com/inward/record.url?scp=84883643376&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84883643376&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84883643376
SN - 9781617386077
T3 - 19th International Conference on Software Engineering and Data Engineering 2010, SEDE 2010
SP - 295
EP - 299
BT - 19th International Conference on Software Engineering and Data Engineering 2010, SEDE 2010
T2 - 19th International Conference on Software Engineering and Data Engineering 2010, SEDE 2010
Y2 - 16 June 2010 through 18 June 2010
ER -