Vertical data processing for mining big data: A predicate tree approach

Mohammad Hossain, Maninder Singh, Sameer Abufardeh

    Research output: Contribution to journal › Conference article › peer-review

    Abstract

    Time is a critical factor in processing very large volumes of data, a.k.a. ‘Big Data’. Many existing data mining algorithms (supervised and unsupervised) become impractical because of their ubiquitous use of horizontal processing, i.e., row-by-row processing of stored data. Processing time for big data is further increased by its high dimensionality (number of features) and high cardinality (number of records). To address this processing-time issue, we propose a vertical approach based on predicate trees (pTrees). Our approach structures data into columns of bit slices, which range in number from a few to hundreds and are processed vertically, i.e., column by column. We tested and compared our vertical approach with the traditional (horizontal) approach using three basic Boolean operations, namely addition, subtraction and multiplication, on 10 data sizes ranging from half a billion bits to 5 billion bits. The results are analyzed with respect to processing time and speed gain for both approaches. They show that our vertical approach outperformed the traditional approach for all three operations (add, subtract and multiply) across all data sizes, with speed gains between 24% and 96%. We conclude that our approach, being in a data-mining-ready format, is best suited to operations involving complex computations in big data applications, where it can achieve significant speed gains.
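
To make the idea of column-wise bit-slice processing concrete, the following is a minimal Python sketch of vertical addition. It is an illustration under stated assumptions, not the authors' pTree implementation: it uses flat, uncompressed bit slices (no tree structure), stores each slice as a Python integer acting as a bit vector over records, and the function names `to_bit_slices`, `from_bit_slices`, and `vertical_add` are hypothetical.

```python
# Illustrative sketch only: flat bit slices, not the paper's pTree structure.
# Slice k is one Python int whose bit i holds bit k of record i.

def to_bit_slices(values, width):
    """Decompose non-negative integers into `width` vertical bit slices."""
    slices = []
    for k in range(width):
        s = 0
        for i, v in enumerate(values):
            s |= ((v >> k) & 1) << i  # place bit k of record i at position i
        slices.append(s)
    return slices


def from_bit_slices(slices, n_records):
    """Rebuild the per-record integers from vertical bit slices."""
    return [
        sum(((s >> i) & 1) << k for k, s in enumerate(slices))
        for i in range(n_records)
    ]


def vertical_add(a_slices, b_slices):
    """Add two columns slice by slice with a ripple carry.

    Each step applies AND/OR/XOR to whole slices, so all records are
    processed at once for a given bit position.
    """
    carry = 0
    out = []
    for a, b in zip(a_slices, b_slices):
        out.append(a ^ b ^ carry)            # sum bit for every record
        carry = (a & b) | (carry & (a ^ b))  # carry bit for every record
    out.append(carry)                        # final carry becomes the top slice
    return out


a = [3, 5, 7, 0]
b = [1, 6, 7, 2]
sum_slices = vertical_add(to_bit_slices(a, 3), to_bit_slices(b, 3))
result = from_bit_slices(sum_slices, len(a))  # [4, 11, 14, 2]
```

The loop in `vertical_add` runs once per bit position rather than once per record, which is the property a vertical layout relies on: the number of slice-level operations depends on the bit width of the column, not on the number of rows.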

    Original language: English (US)
    Pages (from-to): 68-77
    Number of pages: 10
    Journal: EPiC Series in Computing
    Volume: 64
    State: Published - 2019
    Event: 28th International Conference on Software Engineering and Data Engineering, SEDE 2019 - San Diego, United States
    Duration: Sep 30 2019 to Oct 2 2019
