Many real-world problems involve learning models for rare classes in situations where there are no gold standard labels for training samples but imperfect labels are available for all instances. In this paper, we present RAPT, a three step predictive modeling framework for classifying rare class in such problem settings. The first step of the proposed framework learns a classifier that jointly optimizes precision and recall by only using imperfectly labeled training samples. We also show that, under certain assumptions on the imperfect labels, the quality of this classifier is almost as good as the one constructed using perfect labels. The second and third steps of the framework make use of the fact that imperfect labels are available for all instances to further improve the precision and recall of the rare class. We evaluate the RAPT framework on two real-world applications of mapping forest fires and urban extent from earth observing satellite data. The experimental results indicate that RAPT can be used to identify forest fires and urban areas with high precision and recall by using imperfect labels, even though obtaining expert annotated samples on a global scale is infeasible in these applications.
|Original language||English (US)|
|Number of pages||14|
|Journal||IEEE Transactions on Knowledge and Data Engineering|
|State||Published - Nov 1 2017|
Bibliographical noteFunding Information:
This research was supported by a U of M Dissertation Fellowship, US National Science Foundation Grants IIS-1029711 and IIS-0905581, and NASA grant NNX12AP37G.
© 2017 IEEE.
- Classification algorithms
- machine learning algorithms