ADMAD: Application-driven metadata aware de-duplication archival storage system

Chuanyi Liu, Yingping Lu, Chunhui Shi, Guanlin Lu, David H.C. Du, Dong Sheng Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

45 Scopus citations

Abstract

Current storage systems hold a large amount of duplicated or redundant data. Data deduplication, which uses lossless data compression schemes to minimize duplicated data at the inter-file level, has therefore received broad attention in recent years. Research challenges remain in current approaches and storage systems, however, such as how to chunk files more efficiently and better exploit the similarity and identity found within specific applications, and how to store the chunks effectively and reliably on secondary storage devices. In this paper, we propose ADMAD, an Application-Driven Metadata Aware De-duplication Archival Storage System, which uses metadata from different levels of the I/O path to direct the partitioning of files into more Meaningful data Chunks (MC) and thereby maximally reduce inter-file duplication. Because these chunks vary in length, storing them directly on storage devices may cause heavy fragmentation and a high percentage of random disk accesses, which is very inefficient. In ADMAD, chunks are therefore further packaged into fixed-size Objects that serve as the storage units, both to speed up I/O and to ease data management. Preliminary experiments demonstrate that the proposed system further reduces the required storage space compared with current methods (by 20% to nearly 50% across several datasets) and substantially improves write performance (by about 50%-70% on average).
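The two ideas in the abstract, cutting files at metadata-derived boundaries and packing the resulting variable-length chunks into fixed-size objects, can be illustrated with a small Python sketch. This is not the ADMAD implementation: the names `split_on_boundaries`, `ObjectStore`, `store_file`, and `restore_file` are hypothetical, the boundary offsets are assumed to come from some application-level metadata parser, SHA-256 is an assumed choice of chunk fingerprint, and objects are modeled as fixed-capacity containers without padding.

```python
import hashlib


def split_on_boundaries(data: bytes, boundaries: list[int]) -> list[bytes]:
    """Cut data at the given offsets (assumed to come from file metadata)."""
    inner = sorted(b for b in set(boundaries) if 0 < b < len(data))
    cuts = [0] + inner + [len(data)]
    return [data[a:b] for a, b in zip(cuts, cuts[1:])]


class ObjectStore:
    """Deduplicating store that packs chunks into fixed-capacity objects."""

    def __init__(self, object_size: int = 64):
        self.object_size = object_size   # hypothetical capacity in bytes
        self.objects = [bytearray()]     # each object holds packed chunks
        self.index = {}                  # digest -> (object id, offset, length)

    def put_chunk(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in self.index:         # duplicate chunk: store nothing new
            return digest
        if len(chunk) > self.object_size:
            raise ValueError("chunk larger than object capacity")
        # Open a new object when the chunk does not fit in the current one.
        if len(self.objects[-1]) + len(chunk) > self.object_size:
            self.objects.append(bytearray())
        obj_id = len(self.objects) - 1
        offset = len(self.objects[obj_id])
        self.objects[obj_id] += chunk
        self.index[digest] = (obj_id, offset, len(chunk))
        return digest

    def get_chunk(self, digest: str) -> bytes:
        obj_id, offset, length = self.index[digest]
        return bytes(self.objects[obj_id][offset:offset + length])


def store_file(store: ObjectStore, data: bytes, boundaries: list[int]) -> list[str]:
    """Return the file's recipe: the ordered digests of its chunks."""
    return [store.put_chunk(c) for c in split_on_boundaries(data, boundaries)]


def restore_file(store: ObjectStore, recipe: list[str]) -> bytes:
    """Rebuild a file from its recipe."""
    return b"".join(store.get_chunk(d) for d in recipe)
```

In this sketch, two files that share a header chunk store that chunk only once, and each file is rebuilt from its list of digests. Writes are appended to the current object, which loosely mirrors the motivation given in the abstract for avoiding fragmented, random disk accesses.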

Original language: English (US)
Title of host publication: Proceedings - 5th IEEE International Workshop on Storage Network Architecture and Parallel I/Os, SNAPI 2008
Pages: 29-35
Number of pages: 7
DOIs: https://doi.org/10.1109/SNAPI.2008.11
State: Published - 2008
Event: 5th IEEE International Workshop on Storage Network Architecture and Parallel I/Os, SNAPI 2008 - Baltimore, MD, United States
Duration: Sep 22 2008 → Sep 22 2008

Publication series

Name: Proceedings - 5th IEEE International Workshop on Storage Network Architecture and Parallel I/Os, SNAPI 2008

Cite this

Liu, C., Lu, Y., Shi, C., Lu, G., Du, D. H. C., & Wang, D. S. (2008). ADMAD: Application-driven metadata aware de-duplication archival storage system. In Proceedings - 5th IEEE International Workshop on Storage Network Architecture and Parallel I/Os, SNAPI 2008 (pp. 29-35). [4685844] (Proceedings - 5th IEEE International Workshop on Storage Network Architecture and Parallel I/Os, SNAPI 2008). https://doi.org/10.1109/SNAPI.2008.11