Assuring demanded read performance of data deduplication storage with backup datasets

Young Jin Nam, Dongchul Park, David H Du

Research output: Chapter in Book/Report/Conference proceedingConference contribution

42 Scopus citations

Abstract

Data deduplication has been widely adopted in contemporary backup storage systems. It not only saves storage space considerably, but also shortens the data backup time significantly. Since the major goal of the original data deduplication lies in saving storage space, its design has been focused primarily on improving write performance by removing as many duplicate data as possible from incoming data streams. Although fast recovery from a system crash relies mainly on read performance provided by deduplication storage, little investigation into read performance improvement has been made. In general, as the amount of deduplicated data increases, write performance improves accordingly, whereas associated read performance becomes worse. In this paper, we newly propose a deduplication scheme that assures demanded read performance of each data stream while achieving its write performance at a reasonable level, eventually being able to guarantee a target system recovery time. For this, we first propose an indicator called cache aware Chunk Fragmentation Level (CFL) that estimates degraded read performance on the fly by taking into account both incoming chunk information and read cache effects. We also show a strong correlation between this CFL and read performance in the backup datasets. In order to guarantee demanded read performance expressed in terms of a CFL value, we propose a read performance enhancement scheme called selective duplication that is activated whenever the current CFL becomes worse than the demanded one. The key idea is to judiciously write non-unique (shared) chunks into storage together with unique chunks unless the shared chunks exhibit good enough spatial locality. We quantify the spatial locality by using a selective duplication threshold value. Our experiments with the actual backup datasets demonstrate that the proposed scheme achieves demanded read performance in most cases at the reasonable cost of write performance.

Original languageEnglish (US)
Title of host publicationProceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012
Pages201-208
Number of pages8
DOIs
StatePublished - 2012
Event2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012 - Washington, DC, United States
Duration: Aug 7 2012Aug 9 2012

Publication series

NameProceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012

Other

Other2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012
CountryUnited States
CityWashington, DC
Period8/7/128/9/12

Keywords

  • data deduplication
  • read performance
  • storage

Fingerprint Dive into the research topics of 'Assuring demanded read performance of data deduplication storage with backup datasets'. Together they form a unique fingerprint.

  • Cite this

    Nam, Y. J., Park, D., & Du, D. H. (2012). Assuring demanded read performance of data deduplication storage with backup datasets. In Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012 (pp. 201-208). [6298180] (Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2012). https://doi.org/10.1109/MASCOTS.2012.32