TY - JOUR
T1 - Invisible fence methods and the identification of differentially expressed gene sets
AU - Jiang, Jiming
AU - Nguyen, Thuan
AU - Rao, J. Sunil
PY - 2011
Y1 - 2011
N2 - The fence method (Jiang et al. 2008; Ann. Statist. 36, 1669-1692) is a recently developed strategy for model selection. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from amongst those within the fence according to a criterion which can be made flexible. The construction of the fence can be made adaptive to improve finite sample performance. We extend the fence method to situations where a true model may not exist or may not be among the candidate models. Furthermore, another look at the fence methods leads to a new procedure, known as invisible fence (IF). A fast algorithm is developed for IF in the case of a subtractive measure of lack-of-fit. The main focus of the current paper is microarray gene-set analysis. In particular, Efron and Tibshirani (2007; Ann. Appl. Statist. 1, 107-129) proposed a gene set analysis (GSA) method based on testing the significance of gene-sets. In typical microarray experiments the number of genes is much larger than the number of microarrays. This special feature presents a real challenge to applying IF to microarray gene-set analysis. We show how to solve this problem in this paper, and carry out an extensive Monte Carlo simulation study that compares the performance of IF and GSA in identifying differentially expressed gene-sets. The results show that IF outperforms GSA uniformly across all the cases considered, and in most cases significantly. Furthermore, we demonstrate both theoretically and empirically the consistency property of IF, while pointing out the inconsistency of GSA under certain situations. An application in tracking pathway involvement in late- versus earlier-stage colon cancers is considered.
AB - The fence method (Jiang et al. 2008; Ann. Statist. 36, 1669-1692) is a recently developed strategy for model selection. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from amongst those within the fence according to a criterion which can be made flexible. The construction of the fence can be made adaptive to improve finite sample performance. We extend the fence method to situations where a true model may not exist or may not be among the candidate models. Furthermore, another look at the fence methods leads to a new procedure, known as invisible fence (IF). A fast algorithm is developed for IF in the case of a subtractive measure of lack-of-fit. The main focus of the current paper is microarray gene-set analysis. In particular, Efron and Tibshirani (2007; Ann. Appl. Statist. 1, 107-129) proposed a gene set analysis (GSA) method based on testing the significance of gene-sets. In typical microarray experiments the number of genes is much larger than the number of microarrays. This special feature presents a real challenge to applying IF to microarray gene-set analysis. We show how to solve this problem in this paper, and carry out an extensive Monte Carlo simulation study that compares the performance of IF and GSA in identifying differentially expressed gene-sets. The results show that IF outperforms GSA uniformly across all the cases considered, and in most cases significantly. Furthermore, we demonstrate both theoretically and empirically the consistency property of IF, while pointing out the inconsistency of GSA under certain situations. An application in tracking pathway involvement in late- versus earlier-stage colon cancers is considered.
KW - Fast algorithm
KW - Finite sample performance
KW - Invisible fence
KW - Limited bootstrap
KW - Microarray gene set analysis
KW - Model selection
KW - Signal-consistency
KW - Subtractive measure
UR - http://www.scopus.com/inward/record.url?scp=84858063817&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858063817&partnerID=8YFLogxK
U2 - 10.4310/SII.2011.v4.n3.a14
DO - 10.4310/SII.2011.v4.n3.a14
M3 - Article
AN - SCOPUS:84858063817
SN - 1938-7989
VL - 4
SP - 403
EP - 415
JO - Statistics and its Interface
JF - Statistics and its Interface
IS - 3
ER -