High performance data mining

Vipin Kumar, Mahesh V. Joshi, Eui Hong Han, Pang Ning Tan, Michael Steinbach

Research output: Contribution to journalArticle

2 Scopus citations

Abstract

Recent times have seen an explosive growth in the availability of various kinds of data. It has resulted in an unprecedented opportunity to develop automated data-driven techniques of extracting useful knowledge. Data mining, an important step in this process of knowledge discovery, consists of methods that discover interesting, non-trivial, and useful patterns hidden in the data [SAD+93, CHY96]. The field of data mining builds upon the ideas from diverse fields such as machine learning, pattern recognition, statistics, database systems, and data visualization. But, techniques developed in these traditional disciplines are often unsuitable due to some unique characteristics of today's data-sets, such as their enormous sizes, high-dimensionality, and heterogeneity. There is a necessity to develop effective parallel algorithms for various data mining techniques. However, designing such algorithms is challenging, and the main focus of the paper is a description of the parallel formulations of two important data mining algorithms: discovery of association rules, and induction of decision trees for classification. We also briefly discuss an application of data mining to the analysis of large data sets collected by Earth observing satellites that need to be processed to better understand global scale changes in biosphere processes and patterns.

Original languageEnglish (US)
Pages (from-to)111-125
Number of pages15
JournalLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume2565
StatePublished - Dec 1 2003

Fingerprint Dive into the research topics of 'High performance data mining'. Together they form a unique fingerprint.

  • Cite this