Group Learning for High-Dimensional Sparse Data

Vladimir Cherkassky, Hsiang Han Chen, Han Tai Shiao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a new methodology for supervised learning with sparse data, i.e., when the number of input features (d) is (much) larger than the number of training samples (n). Under the proposed approach, all d available input features are split into t subsets, effectively resulting in a larger number (t*n) of labeled training samples in a lower-dimensional input space (of dimensionality d/t). This modified training data is then used to estimate a classifier for making predictions in the lower-dimensional space; in this paper, a standard SVM is used to train the classifier. During testing (prediction), the group of t predictions made by the SVM classifier is combined via intelligent post-processing rules in order to make a prediction for a test input in the original d-dimensional space. The novelty of our approach lies in the design and empirical validation of these post-processing rules under the Group Learning setting. We demonstrate that such post-processing rules effectively reflect general (common-sense) a priori knowledge about the application data. Specifically, we propose two different post-processing schemes and demonstrate their effectiveness in two real-life application domains: handwritten digit recognition and seizure prediction from iEEG signals. These empirical results show superior performance of the Group Learning approach for sparse data, under both balanced and unbalanced classification settings.
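
The procedure outlined in the abstract can be summarized in a short sketch. This is only an illustration, not the authors' implementation: it assumes the d features are split into t contiguous, equal-sized subsets and that the t per-group predictions are combined by a simple majority vote, whereas the paper proposes two more refined post-processing rules.

    # Minimal sketch of the Group Learning scheme described in the abstract.
    # Assumed details (not from the paper): contiguous equal-width feature
    # splits and majority-vote post-processing.
    import numpy as np
    from sklearn.svm import SVC

    def split_features(X, t):
        # Split the d columns of X into t subsets of width d/t (d must be divisible by t).
        return np.split(X, t, axis=1)

    def fit_group_svm(X_train, y_train, t, **svm_params):
        # Stack the t feature subsets into a (t*n, d/t) training set;
        # each subset keeps the original labels of its samples.
        X_stacked = np.vstack(split_features(X_train, t))
        y_stacked = np.tile(y_train, t)
        return SVC(**svm_params).fit(X_stacked, y_stacked)

    def predict_group_svm(clf, X_test, t):
        # One low-dimensional prediction per feature subset, then a majority
        # vote per test point (assumes binary labels coded as -1/+1; ties give 0).
        votes = np.stack([clf.predict(g) for g in split_features(X_test, t)], axis=0)
        return np.sign(votes.sum(axis=0))

For example, with n = 100 training samples and d = 1,000 features, choosing t = 10 turns the original 100-by-1,000 problem into 1,000 stacked samples of dimensionality 100, after which the SVM is trained and applied as above.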

Original language: English (US)
Title of host publication: 2019 International Joint Conference on Neural Networks, IJCNN 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728119854
DOIs: 10.1109/IJCNN.2019.8852183
State: Published - Jul 2019
Event: 2019 International Joint Conference on Neural Networks, IJCNN 2019 - Budapest, Hungary
Duration: Jul 14 2019 - Jul 19 2019

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
Volume: 2019-July

Conference

Conference: 2019 International Joint Conference on Neural Networks, IJCNN 2019
Country: Hungary
City: Budapest
Period: 7/14/19 - 7/19/19

Keywords

  • binary classification
  • digit recognition
  • feature selection
  • Group Learning
  • histogram of projections
  • iEEG
  • seizure prediction
  • SVM
  • unbalanced data

Cite this

Cherkassky, V., Chen, H. H., & Shiao, H. T. (2019). Group Learning for High-Dimensional Sparse Data. In 2019 International Joint Conference on Neural Networks, IJCNN 2019 [8852183] (Proceedings of the International Joint Conference on Neural Networks; Vol. 2019-July). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IJCNN.2019.8852183
