On producing high and early result throughput in multijoin query plans

Justin K. Levandoski, Mohamed E. Khalefa, Mohamed F. Mokbel

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

This paper introduces an efficient framework for producing high and early result throughput in multijoin query plans. While most previous research focuses on optimizing for cases involving a single join operator, this work takes a radical step by addressing query plans with multiple join operators. The proposed framework consists of two main methods: a flush algorithm and an operator state manager. The framework assumes that incoming data are processed with a symmetric hash join, a common method for producing early results. In this way, our methods can be applied to a group of previous join operators (optimized for single-join queries) when they take part in multijoin query plans. Specifically, our framework can be applied by 1) employing a new flushing policy that writes in-memory data to disk once the memory allotment is exhausted, in a way designed to increase early result throughput in multijoin queries, and 2) employing a state manager that adaptively switches operators in the plan between joining in-memory data and disk-resident data in order to improve early result throughput. Extensive experimental results show that the proposed methods outperform state-of-the-art join operators optimized for both single-join and multijoin query plans.
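The symmetric hash join that the framework assumes can be sketched as follows. This is a minimal Python illustration of the general technique under stated assumptions, not the authors' implementation: the class and method names (`SymmetricHashJoin`, `insert`, `cleanup`), the partition count, the tuple-count memory budget, and the "flush the largest partition" rule are all hypothetical. In particular, the eviction rule is a generic placeholder and not the paper's multijoin-aware flushing policy, and the adaptive state manager is not modeled; disk-resident matches are produced only in one final `cleanup` pass.

```python
from collections import defaultdict
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # (join_key, payload)


class SymmetricHashJoin:
    """Equijoin of two streams under a memory budget (illustrative sketch).

    Each arrival is inserted into its own side's hash table and probed against
    the other side's table, so a match is emitted as soon as both halves have
    arrived. When the in-memory tuple count exceeds `memory_limit`, one whole
    partition (both sides) is flushed to a simulated disk, and later arrivals
    for that partition bypass memory.
    """

    def __init__(self, num_partitions: int = 4, memory_limit: int = 8):
        self.p = num_partitions
        self.memory_limit = memory_limit
        # mem[side][partition][key] -> payloads; side 0 = left, side 1 = right
        self.mem = [[defaultdict(list) for _ in range(num_partitions)] for _ in range(2)]
        # disk[side][partition] -> (key, payload, was_in_memory_at_flush)
        self.disk = [[[] for _ in range(num_partitions)] for _ in range(2)]
        self.flushed = set()
        self.in_memory = 0

    def _part(self, key: Any) -> int:
        return hash(key) % self.p

    @staticmethod
    def _emit(side: int, key: Any, mine: Any, other: Any) -> Tuple:
        # Always report results as (key, left_payload, right_payload).
        return (key, mine, other) if side == 0 else (key, other, mine)

    def insert(self, side: int, row: Row) -> List[Tuple]:
        """Add one tuple and return the join results it produces immediately."""
        key, payload = row
        part = self._part(key)
        if part in self.flushed:
            # Partition already on disk: defer matching to cleanup().
            self.disk[side][part].append((key, payload, False))
            return []
        self.mem[side][part][key].append(payload)
        self.in_memory += 1
        # .get() does not create empty entries in the defaultdict.
        matches = self.mem[1 - side][part].get(key, [])
        out = [self._emit(side, key, payload, o) for o in matches]
        if self.in_memory > self.memory_limit:
            self._flush_one()
        return out

    def _flush_one(self) -> None:
        # Hypothetical placeholder policy: evict the largest in-memory
        # partition. This is NOT the paper's multijoin-aware flush policy.
        sizes = {
            part: sum(len(v) for s in range(2) for v in self.mem[s][part].values())
            for part in range(self.p)
            if part not in self.flushed
        }
        victim = max(sizes, key=sizes.get)
        for s in range(2):
            for key, payloads in self.mem[s][victim].items():
                self.disk[s][victim].extend((key, pl, True) for pl in payloads)
            self.mem[s][victim].clear()
        self.in_memory -= sizes[victim]
        self.flushed.add(victim)

    def cleanup(self) -> List[Tuple]:
        """Disk-resident phase: produce the matches not emitted in memory."""
        out = []
        for part in self.flushed:
            right_by_key = defaultdict(list)
            for key, pl, was_mem in self.disk[1][part]:
                right_by_key[key].append((pl, was_mem))
            for key, lpl, l_mem in self.disk[0][part]:
                for rpl, r_mem in right_by_key.get(key, []):
                    # Two tuples that were both in memory at flush time were
                    # already joined by insert(); skip them to avoid duplicates.
                    if not (l_mem and r_mem):
                        out.append((key, lpl, rpl))
        return out
```

Tagging each flushed tuple with whether it was in memory at flush time lets the cleanup pass skip exactly the pairs already emitted, so every match is reported once. A multijoin-aware policy would replace only `_flush_one`, and a state manager would decide when to interleave `cleanup`-style disk joins with in-memory processing rather than deferring them to the end.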

Original language: English (US)
Article number: 5590243
Pages (from-to): 1888-1902
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 23
Issue number: 12
State: Published - 2011

Bibliographical note

Funding Information:
This work is supported in part by the US National Science Foundation under Grants IIS-0811998, IIS-0811935, IIS-0952977, and CNS-0708604, and by Microsoft Research Gifts.

Keywords

  • Database management
  • query processing
  • systems
