TY - JOUR

T1 - Minimax nonparametric classification - Part I

T2 - Rates of convergence

AU - Yang, Yuhong

PY - 1999

Y1 - 1999

N2 - This paper studies minimax aspects of nonparametric classification. We first study minimax estimation of the conditional probability of a class label, given the feature variable. This function, say f, is assumed to be in a general nonparametric class. We show the minimax rate of convergence under square L2 loss is determined by the massiveness of the class as measured by metric entropy. The second part of the paper studies minimax classification. The loss of interest is the difference between the probability of misclassification of a classifier and that of the Bayes decision. As is well known, an upper bound on risk for estimating f gives an upper bound on the risk for classification, but the rate is known to be suboptimal for the class of monotone functions. This suggests that one does not have to estimate f well in order to classify well. However, we show that the two problems are in fact of the same difficulty in terms of rates of convergence under a sufficient condition, which is satisfied by many function classes including Besov (Sobolev), Lipschitz, and bounded variation. This is somewhat surprising in view of a result of Devroye, Gyorfi, and Lugosi (1996).

AB - This paper studies minimax aspects of nonparametric classification. We first study minimax estimation of the conditional probability of a class label, given the feature variable. This function, say f, is assumed to be in a general nonparametric class. We show the minimax rate of convergence under square L2 loss is determined by the massiveness of the class as measured by metric entropy. The second part of the paper studies minimax classification. The loss of interest is the difference between the probability of misclassification of a classifier and that of the Bayes decision. As is well known, an upper bound on risk for estimating f gives an upper bound on the risk for classification, but the rate is known to be suboptimal for the class of monotone functions. This suggests that one does not have to estimate f well in order to classify well. However, we show that the two problems are in fact of the same difficulty in terms of rates of convergence under a sufficient condition, which is satisfied by many function classes including Besov (Sobolev), Lipschitz, and bounded variation. This is somewhat surprising in view of a result of Devroye, Gyorfi, and Lugosi (1996).

UR - http://www.scopus.com/inward/record.url?scp=0033321586&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033321586&partnerID=8YFLogxK

U2 - 10.1109/18.796368

DO - 10.1109/18.796368

M3 - Article

AN - SCOPUS:0033321586

VL - 45

SP - 2271

EP - 2284

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

SN - 0018-9448

IS - 7

ER -