TY - JOUR
T1 - A three-stage framework for gene expression data analysis by L1-norm support vector regression
AU - Kim, Hyunsoo
AU - Zhou, Jeff X.
AU - Morse, Herbert C.
AU - Park, Haesun
PY - 2005
Y1 - 2005
AB - The identification of discriminative genes for categorical phenotypes in microarray gene expression data analysis has been studied extensively, especially for disease diagnosis. Recent biological experiments have also dealt with continuous phenotypes. For example, the extent of programmed cell death (apoptosis) can be measured by the level of the caspase 3 enzyme. An effective gene selection method for continuous phenotypes is therefore desirable. In this paper, we describe a three-stage framework for gene expression data analysis based on L1-norm support vector regression (L1-SVR). The first stage ranks genes by recursive multiple feature elimination based on L1-SVR. In the second stage, the minimal gene subset is determined as the one for which kernel regression yields the lowest ten-fold cross-validation error. In the last stage, the final non-linear regression model is built with the minimal gene subset and with optimal parameters found by leave-one-out cross-validation. The experimental results show a significant improvement over the current state-of-the-art approach, a two-stage process consisting of gene selection based on L1-SVR followed by the third stage of the proposed method.
UR - http://www.scopus.com/inward/record.url?scp=38449116819&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38449116819&partnerID=8YFLogxK
U2 - 10.1504/IJBRA.2005.006902
DO - 10.1504/IJBRA.2005.006902
M3 - Article
C2 - 18048121
AN - SCOPUS:38449116819
SN - 1744-5485
VL - 1
SP - 51
EP - 62
JO - International Journal of Bioinformatics Research and Applications
JF - International Journal of Bioinformatics Research and Applications
IS - 1
ER -