TY - JOUR
T1 - Incorporating prior knowledge of predictors into penalized classifiers with multiple penalty terms
AU - Tai, Feng
AU - Pan, Wei
N1 - Funding Information: This research was partially supported by NIH grant HL65462 and a UM AHC Faculty Research Development grant.
PY - 2007/7/15
Y1 - 2007/7/15
N2 - Motivation: In the context of sample (e.g. tumor) classifications with microarray gene expression data, many methods have been proposed. However, almost all the methods ignore existing biological knowledge and treat all the genes equally a priori. On the other hand, because some genes have been identified by previous studies to have biological functions or to be involved in pathways related to the outcome (e.g. cancer), incorporating this type of prior knowledge into a classifier can potentially improve both the predictive performance and interpretability of the resulting model. Results: We propose a simple and general framework to incorporate such prior knowledge into building a penalized classifier. As two concrete examples, we apply the idea to two penalized classifiers, nearest shrunken centroids (also called PAM) and penalized partial least squares (PPLS). Instead of treating all the genes equally a priori as in standard penalized methods, we group the genes according to their functional associations based on existing biological knowledge or data, and adopt group-specific penalty terms and penalization parameters. Simulated and real data examples demonstrate that, if prior knowledge on gene grouping is indeed informative, our new methods perform better than the two standard penalized methods, yielding higher predictive accuracy and screening out more irrelevant genes.
UR - http://www.scopus.com/inward/record.url?scp=34547887978&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547887978&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btm234
DO - 10.1093/bioinformatics/btm234
M3 - Article
C2 - 17483507
AN - SCOPUS:34547887978
SN - 1367-4803
VL - 23
SP - 1775
EP - 1782
JO - Bioinformatics
JF - Bioinformatics
IS - 14
ER -