Data deduplication is an effective way to improve storage space utilization. The data produced by deduplication is persistently stored in data chunks or data containers (a container consists of a few hundred or a few thousand data chunks). Restoring the original data is rather slow because of data fragmentation and read amplification. To speed up the restore process, data chunk rewrite schemes (a rewrite stores a duplicate data chunk again) have been proposed to improve data chunk locality and reduce the number of container reads needed to restore the original data. However, rewrites decrease the deduplication ratio, since more storage space is used to store the duplicate data chunks. To remedy this, we focus on reducing the data fragmentation and read amplification of container-based deduplication systems. We first propose a flexible container referenced count based rewrite scheme, which achieves a better tradeoff between the deduplication ratio and the number of required container reads than capping, an existing rewrite scheme. To further improve the accuracy of rewrite candidate selection, we propose a sliding look-back window based design, which makes more accurate rewrite decisions by considering the caching effect, data chunk localities, and data chunk closeness in the current and future windows. According to our evaluation, our proposed approach always achieves higher restore performance than capping, especially when the reduction in deduplication ratio is small.
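The reference-count idea in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function name `select_rewrites`, the fixed `threshold`, and the representation of a segment as `(chunk_id, container_id)` pairs are all assumptions made for illustration, and the paper's scheme is described as flexible rather than fixed. The sketch only shows the general tradeoff: duplicate chunks that point to rarely referenced old containers are rewritten, which saves container reads at restore time but costs storage space.

```python
from collections import Counter

def select_rewrites(segment, threshold):
    """Pick duplicate chunks to rewrite (hypothetical helper, for illustration).

    segment:   list of (chunk_id, container_id) pairs in backup-stream order;
               container_id is None for a unique (new) chunk.
    threshold: assumed fixed cutoff; an old container referenced by fewer
               than `threshold` chunks in this segment is considered not
               worth a read, so its chunks are rewritten instead.
    """
    # Count how many duplicate chunks in this segment reference each old container.
    ref_counts = Counter(cid for _, cid in segment if cid is not None)
    # Rewrite duplicates whose container is referenced too few times.
    return [chunk for chunk, cid in segment
            if cid is not None and ref_counts[cid] < threshold]

segment = [("a", "C1"), ("b", "C1"), ("c", "C1"),
           ("d", "C2"), ("e", None), ("f", "C3"), ("g", "C3")]
rewrites = select_rewrites(segment, threshold=2)
```

In this example only the chunk stored in `C2` is selected, so a restore of the segment would no longer need to read `C2`. Raising the threshold selects more chunks for rewriting, which lowers the deduplication ratio further.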
|Original language||English (US)|
|Title of host publication||Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019|
|Number of pages||14|
|State||Published - 2019|
|Event||17th USENIX Conference on File and Storage Technologies, FAST 2019 - Boston, United States; Duration: Feb 25 2019 → Feb 28 2019|
Bibliographical note
Funding Information:
We thank all the members of the CRIS group for providing useful comments to improve our design. We would also like to thank our shepherd, Keith Smith, for his useful comments, suggestions, and help with the paper revision. This work was partially supported by NSF awards 1421913, 1439622, 1525617, and 1812537.