Abstract
This paper presents a novel incremental feature selection method based on minimum uncertainty and feature sample elimination (referred to as MUSE). Feature selection is an important step in machine learning. Prior incremental feature selection approaches have attempted to maximize class relevance while simultaneously minimizing redundancy with previously selected features; one example is the minimum Redundancy Maximum Relevance (mRMR) feature selection method. The proposed approach differs from the prior mRMR approach in how the redundancy of the current feature with previously selected features is reduced. In the proposed approach, the feature samples are divided into a pre-specified number of bins; this step is referred to as feature quantization. A novel uncertainty score for each feature is computed by summing the conditional entropies of its bins, and the feature with the lowest uncertainty score is selected. The impurity of each bin is then computed as the minimum of the probabilities of Class 1 and Class 2 within that bin. The feature samples corresponding to bins with impurities below a threshold are discarded and are not used for selection of subsequent features. The significance of the MUSE feature selection method is demonstrated using two datasets, arrhythmia and handwritten digit recognition (Gisette), as well as seizure prediction datasets from five dogs and two humans. It is shown that the proposed method outperforms the prior mRMR feature selection method in most cases. For the arrhythmia dataset, the proposed method achieves 30 percent higher sensitivity at the expense of a 7 percent loss in specificity. For the Gisette dataset, the proposed method achieves 15 percent higher accuracy for Class 2 at the expense of 3 percent lower accuracy for Class 1. For seizure prediction across the five dogs and two humans, the proposed method achieves a higher area under the curve (AUC) for all subjects.
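The following is a minimal sketch of the selection loop described in the abstract, not the authors' implementation. It assumes equal-width binning, binary 0/1 labels, an unweighted sum of per-bin conditional entropies as the uncertainty score, and a fixed impurity threshold for sample elimination; the names `muse_select`, `n_bins`, and `impurity_threshold` are illustrative and do not come from the paper.

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a two-class distribution with Class-1 probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def muse_select(X, y, n_features, n_bins=10, impurity_threshold=0.1):
    """Greedy MUSE-style selection sketch (labels assumed to be 0/1).

    Each round: quantize every remaining feature into equal-width bins,
    score it by the sum of the bins' conditional entropies, keep the
    feature with the lowest score, then drop the samples falling in its
    low-impurity bins before scoring the next feature.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    active = np.ones(len(y), dtype=bool)      # samples still used for scoring
    remaining = list(range(X.shape[1]))       # candidate feature indices
    selected = []

    for _ in range(min(n_features, len(remaining))):
        best = None                           # (score, feature index, bin labels)
        for j in remaining:
            x, ya = X[active, j], y[active]
            # Feature quantization: equal-width bins over the active samples.
            edges = np.linspace(x.min(), x.max(), n_bins + 1)
            bins = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
            # Uncertainty score: sum of the conditional entropies of the bins
            # (one plausible reading of the abstract; bins are not weighted here).
            score = sum(
                binary_entropy(np.mean(ya[bins == b] == 1))
                for b in range(n_bins) if np.any(bins == b)
            )
            if best is None or score < best[0]:
                best = (score, j, bins)

        _, j, bins = best
        selected.append(j)
        remaining.remove(j)

        # Sample elimination: discard samples in bins whose impurity
        # min(P(Class 1), P(Class 2)) falls below the threshold.
        ya = y[active]
        keep = np.ones(bins.shape[0], dtype=bool)
        for b in range(n_bins):
            mask = bins == b
            if np.any(mask):
                p1 = np.mean(ya[mask] == 1)
                if min(p1, 1 - p1) < impurity_threshold:
                    keep[mask] = False
        idx = np.flatnonzero(active)
        active[idx[~keep]] = False
        if not active.any():                  # nothing left to score
            break

    return selected
```

In this sketch the impurity threshold plays the role that explicit redundancy terms play in mRMR: samples already explained by low-impurity (nearly pure) bins of a selected feature no longer influence the scores of later candidates.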
Original language | English (US) |
---|---|
Pages (from-to) | 1750-1764 |
Number of pages | 15 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 31 |
Issue number | 9 |
DOIs | |
State | Published - Sep 2019 |
Bibliographical note
Funding Information: The authors are grateful to Dr. Benjamin H. Brinkmann and Dr. Gregory A. Worrell from the Mayo Clinic for providing the ground truth of the testing data from the American Epilepsy Society Seizure Prediction Challenge database.
Publisher Copyright:
© 2018 IEEE.
Keywords
- Feature selection
- conditional entropy
- impurity
- mutual information
- redundancy
- relevance
- sample elimination
- uncertainty score