MUSE

Minimum Uncertainty and Sample Elimination Based Binary Feature Selection

Zisheng Zhang, Keshab K Parhi

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

This paper presents a novel incremental feature selection method based on minimum uncertainty and feature sample elimination (referred to as MUSE). Feature selection is an important step in machine learning. In an incremental feature selection approach, past approaches have attempted to increase class relevance while simultaneously minimizing redundancy with previously selected features. One example of such an approach is the minimum Redundancy Maximum Relevance (mRMR) feature selection method. The proposed approach differs from the prior mRMR approach in how the redundancy of the current feature with previously selected features is reduced. In the proposed approach, the feature samples are divided into a pre-specified number of bins; this step is referred to as feature quantization. A novel uncertainty score for each feature is computed by summing the conditional entropies of the bins, and the feature with the lowest uncertainty score is selected. For each bin, its impurity is computed by taking the minimum of the probabilities of Class 1 and Class 2. The feature samples corresponding to bins with impurities below a threshold are discarded and are not used for selection of the subsequent features. The significance of the MUSE feature selection method is demonstrated using two datasets, arrhythmia and handwritten digit recognition (Gisette), as well as datasets for seizure prediction from five dogs and two humans. It is shown that the proposed method outperforms the prior mRMR feature selection method in most cases. For the arrhythmia dataset, the proposed method achieves 30% higher sensitivity at the expense of a 7% loss of specificity. For the Gisette dataset, the proposed method achieves 15% higher accuracy for Class 2 at the expense of 3% lower accuracy for Class 1. For seizure prediction among the five dogs and two humans, the proposed method achieves a higher area under the curve (AUC) for all subjects.

Original language: English (US)
Journal: IEEE Transactions on Knowledge and Data Engineering
DOI: 10.1109/TKDE.2018.2865778
State: Accepted/In press - Aug 16 2018

Keywords

  • conditional entropy
  • Feature selection
  • impurity
  • mutual information
  • redundancy
  • relevance
  • sample elimination
  • uncertainty score

Cite this

@article{45e74deb12e0456093b9eda63a3e7011,
title = "MUSE: Minimum Uncertainty and Sample Elimination Based Binary Feature Selection",
abstract = "This paper presents a novel incremental feature selection method based on minimum uncertainty and feature sample elimination (referred to as MUSE). Feature selection is an important step in machine learning. In an incremental feature selection approach, past approaches have attempted to increase class relevance while simultaneously minimizing redundancy with previously selected features. One example of such an approach is the minimum Redundancy Maximum Relevance (mRMR) feature selection method. The proposed approach differs from the prior mRMR approach in how the redundancy of the current feature with previously selected features is reduced. In the proposed approach, the feature samples are divided into a pre-specified number of bins; this step is referred to as feature quantization. A novel uncertainty score for each feature is computed by summing the conditional entropies of the bins, and the feature with the lowest uncertainty score is selected. For each bin, its impurity is computed by taking the minimum of the probabilities of Class 1 and Class 2. The feature samples corresponding to bins with impurities below a threshold are discarded and are not used for selection of the subsequent features. The significance of the MUSE feature selection method is demonstrated using two datasets, arrhythmia and handwritten digit recognition (Gisette), as well as datasets for seizure prediction from five dogs and two humans. It is shown that the proposed method outperforms the prior mRMR feature selection method in most cases. For the arrhythmia dataset, the proposed method achieves 30{\%} higher sensitivity at the expense of a 7{\%} loss of specificity. For the Gisette dataset, the proposed method achieves 15{\%} higher accuracy for Class 2 at the expense of 3{\%} lower accuracy for Class 1. For seizure prediction among the five dogs and two humans, the proposed method achieves a higher area under the curve (AUC) for all subjects.",
keywords = "conditional entropy, Feature selection, impurity, mutual information, redundancy, relevance, sample elimination, uncertainty score",
author = "Zhang, Zisheng and Parhi, {Keshab K.}",
year = "2018",
month = "8",
day = "16",
doi = "10.1109/TKDE.2018.2865778",
language = "English (US)",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - MUSE

T2 - Minimum Uncertainty and Sample Elimination Based Binary Feature Selection

AU - Zhang, Zisheng

AU - Parhi, Keshab K

PY - 2018/8/16

Y1 - 2018/8/16

N2 - This paper presents a novel incremental feature selection method based on minimum uncertainty and feature sample elimination (referred to as MUSE). Feature selection is an important step in machine learning. In an incremental feature selection approach, past approaches have attempted to increase class relevance while simultaneously minimizing redundancy with previously selected features. One example of such an approach is the minimum Redundancy Maximum Relevance (mRMR) feature selection method. The proposed approach differs from the prior mRMR approach in how the redundancy of the current feature with previously selected features is reduced. In the proposed approach, the feature samples are divided into a pre-specified number of bins; this step is referred to as feature quantization. A novel uncertainty score for each feature is computed by summing the conditional entropies of the bins, and the feature with the lowest uncertainty score is selected. For each bin, its impurity is computed by taking the minimum of the probabilities of Class 1 and Class 2. The feature samples corresponding to bins with impurities below a threshold are discarded and are not used for selection of the subsequent features. The significance of the MUSE feature selection method is demonstrated using two datasets, arrhythmia and handwritten digit recognition (Gisette), as well as datasets for seizure prediction from five dogs and two humans. It is shown that the proposed method outperforms the prior mRMR feature selection method in most cases. For the arrhythmia dataset, the proposed method achieves 30% higher sensitivity at the expense of a 7% loss of specificity. For the Gisette dataset, the proposed method achieves 15% higher accuracy for Class 2 at the expense of 3% lower accuracy for Class 1. For seizure prediction among the five dogs and two humans, the proposed method achieves a higher area under the curve (AUC) for all subjects.

AB - This paper presents a novel incremental feature selection method based on minimum uncertainty and feature sample elimination (referred to as MUSE). Feature selection is an important step in machine learning. In an incremental feature selection approach, past approaches have attempted to increase class relevance while simultaneously minimizing redundancy with previously selected features. One example of such an approach is the minimum Redundancy Maximum Relevance (mRMR) feature selection method. The proposed approach differs from the prior mRMR approach in how the redundancy of the current feature with previously selected features is reduced. In the proposed approach, the feature samples are divided into a pre-specified number of bins; this step is referred to as feature quantization. A novel uncertainty score for each feature is computed by summing the conditional entropies of the bins, and the feature with the lowest uncertainty score is selected. For each bin, its impurity is computed by taking the minimum of the probabilities of Class 1 and Class 2. The feature samples corresponding to bins with impurities below a threshold are discarded and are not used for selection of the subsequent features. The significance of the MUSE feature selection method is demonstrated using two datasets, arrhythmia and handwritten digit recognition (Gisette), as well as datasets for seizure prediction from five dogs and two humans. It is shown that the proposed method outperforms the prior mRMR feature selection method in most cases. For the arrhythmia dataset, the proposed method achieves 30% higher sensitivity at the expense of a 7% loss of specificity. For the Gisette dataset, the proposed method achieves 15% higher accuracy for Class 2 at the expense of 3% lower accuracy for Class 1. For seizure prediction among the five dogs and two humans, the proposed method achieves a higher area under the curve (AUC) for all subjects.

KW - conditional entropy

KW - Feature selection

KW - impurity

KW - mutual information

KW - redundancy

KW - relevance

KW - sample elimination

KW - uncertainty score

UR - http://www.scopus.com/inward/record.url?scp=85051792682&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051792682&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2018.2865778

DO - 10.1109/TKDE.2018.2865778

M3 - Article

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

ER -