TY - GEN
T1 - A theoretical characterization of linear SVM-based feature selection
AU - Hardin, Douglas
AU - Tsamardinos, Ioannis
AU - Aliferis, Constantin F.
PY - 2004/12/1
Y1 - 2004/12/1
N2 - Most prevalent techniques in Support Vector Machine (SVM) feature selection are based on the intuition that features whose weights are close to zero are not required for optimal classification. In this paper we show that, in the sample limit, the irrelevant variables (in a theoretical and optimal sense) will indeed be given zero weight by a linear SVM, in both the soft- and the hard-margin case. However, SVM-based methods also have certain theoretical disadvantages. We present examples where the linear SVM may assign zero weights to strongly relevant variables (i.e., variables required for optimal estimation of the distribution of the target variable) and where weakly relevant features (i.e., features that are superfluous for optimal prediction given other features) may receive non-zero weights. We contrast and theoretically compare these results with Markov-Blanket-based feature selection algorithms, which do not have such disadvantages in a broad class of distributions and can also be used for causal discovery.
UR - http://www.scopus.com/inward/record.url?scp=14344264951&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=14344264951&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:14344264951
SN - 1581138385
SN - 9781581138382
T3 - Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004
SP - 377
EP - 384
BT - Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004
A2 - Greiner, R.
A2 - Schuurmans, D.
T2 - Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004
Y2 - 4 July 2004 through 8 July 2004
ER -