Many applications of machine learning involve the analysis of sparse high-dimensional data, in which the number of input features exceeds the number of data samples. Standard inductive learning methods may not be sufficient for such data, which motivates nonstandard learning settings. This paper investigates a new learning methodology called learning through contradictions, or Universum support vector machine (U-SVM). U-SVM incorporates a priori knowledge about the application data into the learning process in the form of additional Universum samples. The paper examines the possible advantages of U-SVM over standard SVM and describes the practical conditions necessary for U-SVM to be effective. These conditions are based on analysis of the univariate histograms of projections of the training samples onto the normal vector of the (standard) SVM decision boundary. Several empirical comparisons illustrate the practical utility of the proposed approach.
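The histogram-of-projections diagnostic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a linear kernel, a plain stochastic subgradient solver for the soft-margin SVM (`train_linear_svm` is a hypothetical helper, not the solver used in the paper), synthetic data with more features than samples, and a "random averaging" Universum built from pairs of opposite-class samples (one common construction, chosen here for illustration). Each sample is projected as f(x) = w·x + b, which is proportional to its position along the normal vector w and is scaled so that the margin borders sit at f = -1 and f = +1. The sketch only computes the histograms; the actual effectiveness conditions are the ones stated in the paper.

```python
import random

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01, seed=0):
    """Hypothetical helper: soft-margin linear SVM trained by stochastic
    subgradient descent on the primal objective
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b))."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    idx = list(range(n))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Regularizer subgradient, split evenly across the n samples of an epoch.
            grad_w = [wj / n for wj in w]
            grad_b = 0.0
            if margin < 1:  # hinge loss is active for this sample
                grad_w = [gw - C * y[i] * xj for gw, xj in zip(grad_w, X[i])]
                grad_b = -C * y[i]
            w = [wj - lr * gw for wj, gw in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b

def project(X, w, b):
    """Signed value f(x) = w.x + b for each sample (position along the normal w)."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]

def histogram(values, lo=-3.0, hi=3.0, bins=12):
    """Equal-width univariate histogram on [lo, hi]; out-of-range values go to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = min(max(int((v - lo) / width), 0), bins - 1)
        counts[k] += 1
    return counts

# Synthetic high-dimensional setting: more features (d) than samples (n).
rng = random.Random(42)
d, n = 50, 20
y = [1 if i % 2 == 0 else -1 for i in range(n)]
# The classes differ only by a mean shift of +/-1 in the first 5 coordinates.
X = [[rng.gauss(0.0, 1.0) + (label if j < 5 else 0.0) for j in range(d)] for label in y]

w, b = train_linear_svm(X, y)
proj_train = project(X, w, b)

# Hypothetical Universum: each point averages one positive and one negative sample.
pos = [x for x, label in zip(X, y) if label == 1]
neg = [x for x, label in zip(X, y) if label == -1]
pairs = [(rng.choice(pos), rng.choice(neg)) for _ in range(20)]
U = [[(a + c) / 2 for a, c in zip(p, q)] for p, q in pairs]
proj_u = project(U, w, b)

for name, vals in [("train +1", [v for v, l in zip(proj_train, y) if l == 1]),
                   ("train -1", [v for v, l in zip(proj_train, y) if l == -1]),
                   ("universum", proj_u)]:
    print(f"{name:>10}: {histogram(vals)}")
```

Because f is affine, a Universum point formed by averaging two samples projects exactly to the midpoint of its parents' projections, so with this construction such points tend to fall between the two margin borders, near the decision boundary. Comparing the three printed histograms shows how the training and Universum samples are distributed along the normal direction.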
Bibliographical note: Funding Information
Manuscript received February 24, 2010; accepted May 6, 2011. Date of publication June 30, 2011; date of current version August 3, 2011. This work was supported in part by the National Science Foundation under Grant ECCS-0802056, and by BioInformatics and Computational Biology, a grant from the University of Minnesota, Rochester.
- Learning through contradiction
- Universum SVM
- model selection
- support vector machines (SVMs)