We investigate practical selection of hyper-parameters for support vector machines (SVM) regression (that is, ε-insensitive zone and regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than re-sampling approaches commonly used in SVM applications. In particular, we describe a new analytical prescription for setting the value of insensitive zone ε, as a function of training sample size. Good generalization performance of the proposed parameter selection is demonstrated empirically using several low- and high-dimensional regression problems. Further, we point out the importance of Vapnik's ε-insensitive loss for regression problems with finite samples. To this end, we compare generalization performance of SVM regression (using proposed selection of ε-values) with regression using 'least-modulus' loss (ε=0) and standard squared loss. These comparisons indicate superior generalization performance of SVM regression under sparse sample settings, for various types of additive noise.
Bibliographical noteFunding Information:
The authors thank Dr V. Vapnik for many useful discussions. This work was supported, in part, by NSF grant ECS-0099906.
- Complexity control
- Loss function
- Parameter selection
- Prediction accuracy
- Support vector machine regression
- VC theory