Parallel formulations of decision-tree classification algorithms

Anurag Srivastava, Eui Hong Han, Vipin Kumar, Vineet Singh

Research output: Contribution to journal › Article › peer-review

90 Scopus citations

Abstract

Classification decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing and fraud detection. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherent dynamic nature of the computation. In this paper, we present parallel formulations of a classification decision tree learning algorithm based on induction. We describe two basic parallel formulations: one is based on the Synchronous Tree Construction Approach and the other on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of these methods and propose a hybrid method that employs the good features of both. We also analyze the computation and communication costs of the proposed hybrid method. Moreover, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.

Original language: English (US)
Pages (from-to): 237-261
Number of pages: 25
Journal: Data Mining and Knowledge Discovery
Volume: 3
Issue number: 3
State: Published - 1999

Bibliographical note

Funding Information:
A significant part of this work was done while Anurag Srivastava and Vineet Singh were at IBM TJ Watson Research Center. This work was supported by NSF grant ASC-9634719, Army Research Office contract DA/DAAH04-95-1-0538, a Cray Research Inc. Fellowship, and an IBM partnership award, the content of which does not necessarily reflect the policy of the government, and no official endorsement should be inferred. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute, Cray Research Inc., and NSF grant CDA-9414015.

Keywords

  • Classification
  • Data mining
  • Decision trees
  • Parallel processing
  • Scalability
