TY - GEN
T1 - Estimation of false negatives in classification
AU - Mane, Sandeep
AU - Srivastava, Jaideep
AU - Hwang, San Yih
AU - Vayghan, Jamshid
PY - 2004
Y1 - 2004
N2 - In many classification problems, such as spam detection and network intrusion detection, a large number of unlabeled test instances are predicted negative by the classifier. However, the high cost of, and limits on, an expert's time prevent further analysis of the "predicted false" class instances to separate the false negatives from the true negatives. A systematic method is thus required to estimate the number of false negatives. A capture-recapture based method can be used to obtain a maximum-likelihood (ML) estimate of false negatives when two or more independent classifiers are available. When independence does not hold, log-linear models can be applied to obtain an estimate of false negatives. However, as shown in this paper, the weaker the dependencies among the classifiers, the better the estimate obtained for false negatives. Thus, ideally, independent classifiers should be used to estimate the false negatives in an unlabeled dataset. Experimental results on the spam dataset from the UCI Machine Learning Repository are presented.
AB - In many classification problems, such as spam detection and network intrusion detection, a large number of unlabeled test instances are predicted negative by the classifier. However, the high cost of, and limits on, an expert's time prevent further analysis of the "predicted false" class instances to separate the false negatives from the true negatives. A systematic method is thus required to estimate the number of false negatives. A capture-recapture based method can be used to obtain a maximum-likelihood (ML) estimate of false negatives when two or more independent classifiers are available. When independence does not hold, log-linear models can be applied to obtain an estimate of false negatives. However, as shown in this paper, the weaker the dependencies among the classifiers, the better the estimate obtained for false negatives. Thus, ideally, independent classifiers should be used to estimate the false negatives in an unlabeled dataset. Experimental results on the spam dataset from the UCI Machine Learning Repository are presented.
UR - http://www.scopus.com/inward/record.url?scp=19544363935&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=19544363935&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2004.10048
DO - 10.1109/ICDM.2004.10048
M3 - Conference contribution
AN - SCOPUS:19544363935
SN - 0769521428
SN - 9780769521428
T3 - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
SP - 475
EP - 478
BT - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
A2 - Rastogi, R.
A2 - Morik, K.
A2 - Bramer, M.
A2 - Wu, X.
T2 - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
Y2 - 1 November 2004 through 4 November 2004
ER -