Generalization error for multi-class margin classification

Xiaotong Shen, Lifeng Wang

Research output: Contribution to journal › Article › peer-review

13 Scopus citations


In this article, we study rates of convergence of the generalization error of multi-class margin classifiers. In particular, we develop an upper bound theory quantifying the generalization error of various large margin classifiers. The theory permits a treatment of general margin losses, convex or nonconvex, in the presence or absence of a dominating class. Three main results are established. First, for any fixed margin loss, there may be a trade-off between the ideal and actual generalization performances with respect to the choice of the class of candidate decision functions, which is governed by the trade-off between the approximation and estimation errors. In fact, different margin losses lead to different ideal or actual performances in specific cases. Second, we demonstrate, in a problem of linear learning, that the convergence rate can be arbitrarily fast in the sample size n, depending on the joint distribution of the input/output pair. This goes beyond the anticipated rate O(n⁻¹). Third, we establish rates of convergence of several margin classifiers in feature selection, with the number of candidate variables p allowed to greatly exceed the sample size n, though growing no faster than exp(n).
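To make the setting concrete, the sketch below fits a linear multi-class margin classifier with a convex margin loss (the Crammer-Singer multi-class hinge) plus an l1 penalty, on synthetic data in the "small n, large p" regime the abstract describes. This is an illustrative toy implementation by subgradient descent, not the estimator or theory developed in the paper; all names and parameter values here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "small n, large p" setup: only the first few
# coordinates carry signal (sparse learning).
n, p, k = 60, 200, 3                      # samples, features, classes
X = rng.standard_normal((n, p))
W_true = np.zeros((k, p))
W_true[:, :5] = rng.standard_normal((k, 5))  # sparse true weights
y = (X @ W_true.T).argmax(axis=1)            # labels from a linear rule

# Convex multi-class hinge (margin) loss with an l1 penalty,
# minimized by plain subgradient descent -- a sketch only.
lam, lr = 0.01, 0.1
W = np.zeros((k, p))
for _ in range(500):
    scores = X @ W.T                                     # (n, k) decision values
    margins = scores - scores[np.arange(n), y][:, None] + 1.0
    margins[np.arange(n), y] = 0.0
    worst = margins.argmax(axis=1)                       # most-violating class
    active = margins[np.arange(n), worst] > 0            # margin violations
    G = np.zeros_like(W)
    for i in np.flatnonzero(active):                     # hinge subgradient
        G[worst[i]] += X[i]
        G[y[i]] -= X[i]
    G = G / n + lam * np.sign(W)                         # l1 subgradient
    W -= lr * G

train_acc = ((X @ W.T).argmax(axis=1) == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```

Because p > n, the training data are easily separated by a linear rule; the interesting question studied in the paper is how the generalization error of such margin classifiers behaves as n and p grow.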

Original language: English (US)
Pages (from-to): 307-330
Number of pages: 24
Journal: Electronic Journal of Statistics
State: Published - 2007


Keywords

  • Convex and nonconvex losses
  • Import vector machines
  • Small n and large p
  • Sparse learning
  • Support vector machines
  • ψ-learning


