MUSE: Minimum Uncertainty and Sample Elimination Based Binary Feature Selection

Zisheng Zhang, Keshab K. Parhi

Research output: Contribution to journal › Article

5 Scopus citations

Abstract

This paper presents a novel incremental feature selection method based on minimum uncertainty and feature sample elimination (referred to as MUSE). Feature selection is an important step in machine learning. Past incremental feature selection approaches have attempted to increase class relevance while simultaneously minimizing redundancy with previously selected features. One example of such an approach is the minimum Redundancy Maximum Relevance (mRMR) feature selection method. The proposed approach differs from the prior mRMR approach in how the redundancy of the current feature with previously selected features is reduced. In the proposed approach, the feature samples are divided into a pre-specified number of bins; this step is referred to as feature quantization. A novel uncertainty score for each feature is computed by summing the conditional entropies of the bins, and the feature with the lowest uncertainty score is selected. For each bin, its impurity is computed by taking the minimum of the probability of Class 1 and of Class 2. The feature samples corresponding to the bins with impurities below a threshold are discarded and are not used for selection of the subsequent features. The significance of the MUSE feature selection method is demonstrated using two datasets, arrhythmia and handwritten digit recognition (Gisette), as well as seizure prediction datasets from five dogs and two humans. It is shown that the proposed method outperforms the prior mRMR feature selection method in most cases. For the arrhythmia dataset, the proposed method achieves 30% higher sensitivity at the expense of a 7% loss of specificity. For the Gisette dataset, the proposed method achieves 15% higher accuracy for Class 2 at the expense of 3% lower accuracy for Class 1. For seizure prediction among the five dogs and two humans, the proposed method achieves a higher area under the curve (AUC) for all subjects.
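The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the binning scheme, how the per-bin entropies are weighted, or the threshold value, so equal-width bins, weighting each bin's entropy by its share of samples, and the defaults `n_bins=8` and `threshold=0.05` are all assumptions. The function names (`bin_stats`, `uncertainty_score`, `muse_select`) are hypothetical, and labels are assumed to be 1 and 2.

```python
import numpy as np

def bin_stats(feature, labels, n_bins):
    """Quantize one feature into equal-width bins (an assumption) and return
    per-sample bin indices, per-bin sample counts, and per-bin P(Class 1)."""
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    # Use interior edges only so that bin indices fall in 0..n_bins-1.
    idx = np.digitize(feature, edges[1:-1])
    counts = np.bincount(idx, minlength=n_bins).astype(float)
    ones = np.bincount(idx, weights=(labels == 1).astype(float), minlength=n_bins)
    p1 = np.divide(ones, counts, out=np.zeros(n_bins), where=counts > 0)
    return idx, counts, p1

def uncertainty_score(counts, p1):
    """Sum over bins of the class entropy within each bin, weighted by the
    bin's share of samples (the weighting is an assumption)."""
    p = np.clip(p1, 1e-12, 1 - 1e-12)  # avoid log(0) in pure or empty bins
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(np.sum(counts / counts.sum() * h))

def muse_select(X, y, n_select, n_bins=8, threshold=0.05):
    """Greedy selection: pick the lowest-uncertainty feature, then discard
    samples that fall in bins whose impurity is below the threshold."""
    active = np.ones(len(y), dtype=bool)  # samples still in play
    selected = []
    for _ in range(n_select):
        best = None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            idx, counts, p1 = bin_stats(X[active, j], y[active], n_bins)
            score = uncertainty_score(counts, p1)
            if best is None or score < best[0]:
                best = (score, j, idx, p1)
        if best is None:  # no unselected features remain
            break
        _, j, idx, p1 = best
        selected.append(j)
        # Impurity of a bin: min(P(Class 1), P(Class 2)) within that bin.
        impurity = np.minimum(p1, 1 - p1)
        keep = impurity[idx] >= threshold
        rows = np.flatnonzero(active)
        active[rows[~keep]] = False  # eliminate samples in near-pure bins
        if not active.any():  # every sample has been resolved
            break
    return selected
```

In this sketch, samples in nearly pure bins are treated as already resolved by the chosen feature, so later features are scored only on the samples that remain ambiguous. That is one reading of how the abstract's sample elimination reduces redundancy with previously selected features.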

Original language: English (US)
Journal: IEEE Transactions on Knowledge and Data Engineering
State: Accepted/In press - Aug 16 2018

Keywords

  • Feature selection
  • conditional entropy
  • impurity
  • mutual information
  • redundancy
  • relevance
  • sample elimination
  • uncertainty score
