Binary support vector machines (SVMs) have proven to deliver high performance. In multiclass classification, however, issues remain with respect to variable selection. One particularly challenging issue is classification and variable selection when the number of variables is on the order of thousands, greatly exceeding the training sample size, as often occurs in genomic classification. To meet this challenge, this article proposes a novel multiclass support vector machine that performs classification and variable selection simultaneously through an L1-norm penalized sparse representation. The proposed methodology, together with the developed regularization solution path, permits variable selection in such a situation. A statistical learning theory is developed to quantify the generalization error of the methodology, in an attempt to gain insight into the basic structure of sparse learning when the number of variables greatly exceeds the sample size. The operating characteristics of the methodology are examined on both simulated and benchmark data and compared against several competitors in terms of prediction accuracy. The numerical results suggest that the proposed methodology is highly competitive.
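The core idea of combining a hinge-loss classifier with an L1 penalty can be sketched in the binary case; the article's multiclass formulation and regularization solution path are not reproduced here. The following is a generic proximal-subgradient illustration, with all variable names, the data-generating setup, and the tuning constants (`lam`, `lr`, iteration count) hypothetical.

```python
# Sketch only: binary hinge loss + L1 penalty by proximal subgradient
# descent, illustrating simultaneous classification and variable
# selection when the number of variables p far exceeds the sample size n.
# This is NOT the article's multiclass algorithm or its solution path.
import numpy as np

rng = np.random.default_rng(0)
n, p = 80, 400                      # sample size far below dimension
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 1.5]    # only 3 informative variables
y = np.sign(X @ beta_true + 0.1 * rng.standard_normal(n))

lam, lr = 0.02, 0.01                # hypothetical penalty weight, step size
beta = np.zeros(p)
for _ in range(2000):
    margin = y * (X @ beta)
    # subgradient of the average hinge loss over margin violators
    grad = -(X * y[:, None])[margin < 1].sum(axis=0) / n
    beta -= lr * grad
    # soft-thresholding: the proximal step for the L1 penalty,
    # which zeroes out small coefficients (variable selection)
    beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)

selected = np.flatnonzero(beta)
print(f"{selected.size} of {p} variables have nonzero coefficients")
```

The soft-thresholding step is what sets many coefficients exactly to zero, so the fitted classifier and the selected variable set are obtained in one pass rather than by fitting first and screening afterward.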
Bibliographical note and funding information:
Lifeng Wang is Postdoctoral Fellow, Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104 (E-mail: email@example.com). Xiaotong Shen is Professor, School of Statistics, University of Minnesota, Minneapolis, MN 55455 (E-mail: firstname.lastname@example.org). This research was supported in part by National Science Foundation grants IIS-0328802 and DMS-06-04394. The authors thank the joint editor, the associate editor, and three anonymous referees for helpful comments and suggestions.
- High dimension, low sample size
- Margin classification
- Variable selection