Predictive learning in the presence of heterogeneity and limited training data

Anuj Karpatne, Ankush Khandelwal, Shyam Boriah, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

A large number of real-world domains exhibit heterogeneity in their data, meaning that different partitions of the data show different relationships between the explanatory and response variables. This increases the overall model complexity of predictive learning in the presence of heterogeneity. Additionally, many real-world domains lack sufficient training data, which makes the learning algorithm prone to over-fitting, especially when the model complexity is large. However, there often exists a structure among the data instances and their partitions that can be leveraged to reduce the model complexity while still addressing heterogeneity. In this paper, we present a framework for learning robust predictive models in real-world heterogeneous datasets that lack a sufficient number of training samples. We demonstrate the usefulness of our framework in the domain of remote sensing for forest cover estimation. Through a series of comparative experiments with baseline approaches, we show that our framework: (a) captures meaningful information about heterogeneity in the data, (b) improves prediction performance by addressing data heterogeneity, (c) is robust to over-fitting in the presence of limited training data, and (d) is robust to the choice of the number of partitions used for representing heterogeneity.
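The trade-off described above (more partitions capture heterogeneity but multiply the parameters that must be estimated from scarce samples) can be illustrated with a minimal sketch. This is not the authors' framework; it is a generic, hypothetical example that assumes the partition label z of every instance is already known and that each partition follows a linear model. Each partition's weights are shrunk toward one shared global model, so the penalty lam_shrink interpolates between a single global model (large values, low complexity) and fully independent per-partition models (values near zero, high complexity). The names fit_ridge, fit_partitioned and predict, and the synthetic data, are illustrative assumptions only.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Ridge regression: argmin_w ||y - X w||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def fit_partitioned(X, y, z, n_parts, lam_global=1e-3, lam_shrink=1.0):
    """Hypothetical sketch: one linear model per partition, shrunk toward a global model.

    For each partition k this minimizes
        ||y_k - X_k w_k||^2 + lam_shrink * ||w_k - w_global||^2,
    whose closed-form solution is
        w_k = (X_k^T X_k + lam_shrink * I)^{-1} (X_k^T y_k + lam_shrink * w_global).
    A partition with no samples simply inherits w_global.
    """
    w_global = fit_ridge(X, y, lam_global)
    d = X.shape[1]
    W = np.empty((n_parts, d))
    for k in range(n_parts):
        Xk, yk = X[z == k], y[z == k]
        A = Xk.T @ Xk + lam_shrink * np.eye(d)
        b = Xk.T @ yk + lam_shrink * w_global
        W[k] = np.linalg.solve(A, b)
    return w_global, W


def predict(X, z, W):
    """Dot product of each sample with the weight vector of its own partition."""
    return np.einsum("ij,ij->i", X, W[z])


# Synthetic, made-up data: two partitions whose first two weights have opposite signs.
rng = np.random.default_rng(0)
n, d = 40, 3
X = rng.normal(size=(n, d))
z = np.arange(n) % 2  # assumed-known partition labels, 20 samples each
true_W = np.array([[1.0, -2.0, 0.5], [-1.0, 2.0, 0.5]])
y = predict(X, z, true_W) + 0.1 * rng.normal(size=n)

# Nearly independent per-partition models vs. a strongly shared model.
w_global, W_free = fit_partitioned(X, y, z, n_parts=2, lam_shrink=1e-8)
_, W_shared = fit_partitioned(X, y, z, n_parts=2, lam_shrink=1e8)
```

With lam_shrink near zero the two weight vectors approach the separate per-partition least-squares fits; with a very large lam_shrink both collapse onto w_global. Intermediate values let partitions with few samples borrow strength from the shared model, which is one simple way to limit over-fitting when data are scarce.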

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2014, SDM 2014
Editors: Mohammed J. Zaki, Arindam Banerjee, Srinivasan Parthasarathy, Pang-Ning Tan, Zoran Obradovic, Chandrika Kamath
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 253-261
Number of pages: 9
ISBN (Electronic): 9781510811515
State: Published - 2014
Event: 14th SIAM International Conference on Data Mining, SDM 2014 - Philadelphia, United States
Duration: Apr 24 2014 to Apr 26 2014

Publication series

Name: SIAM International Conference on Data Mining 2014, SDM 2014
Volume: 1

Other

Other: 14th SIAM International Conference on Data Mining, SDM 2014
Country: United States
City: Philadelphia
Period: 4/24/14 to 4/26/14

Bibliographical note

Funding Information:
This research was supported in part by the National Science Foundation under Grants IIS-1029711 and IIS-0905581, as well as the Planetary Skin Institute. Access to computing facilities was provided by the University of Minnesota Supercomputing Institute.

