TY - GEN
T1 - Predictive learning with sparse heterogeneous data
AU - Cherkassky, Vladimir S
AU - Cai, Feng
AU - Liang, Lichen
PY - 2009/11/18
Y1 - 2009/11/18
N2 - Many applications of machine learning involve sparse and heterogeneous data. For example, estimation of predictive (diagnostic) models using patients' data from clinical studies requires effective integration of genetic, clinical and demographic data. Typically, all heterogeneous inputs are encoded and mapped onto a single feature vector, which is then used for estimating (training) a predictive model. This approach, known as standard inductive learning, is used in most application studies. More recently, several new learning methodologies have emerged. In particular, when training data can be naturally separated into several groups (or structured), we can view learning (estimation) for each group as a separate task, leading to the Multi-Task Learning framework. Similarly, a setting where training data is structured, but the objective is to estimate a single predictive model (for all groups), leads to Learning with Structured Data and the SVM+ methodology recently proposed by Vapnik. This paper demonstrates the advantages and limitations of these new approaches for modeling heterogeneous data (relative to standard inductive SVM) via empirical comparisons using several publicly available medical data sets.
UR - http://www.scopus.com/inward/record.url?scp=70449364748&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449364748&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2009.5179036
DO - 10.1109/IJCNN.2009.5179036
M3 - Conference contribution
AN - SCOPUS:70449364748
SN - 9781424435531
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 544
EP - 551
BT - 2009 International Joint Conference on Neural Networks, IJCNN 2009
T2 - 2009 International Joint Conference on Neural Networks, IJCNN 2009
Y2 - 14 June 2009 through 19 June 2009
ER -