A Lookahead Read Cache: Improving Read Performance for Deduplication Backup Storage

Dongchul Park, Ziqi Fan, Young Jin Nam, David H.C. Du

Research output: Contribution to journal › Article › peer-review

19 Scopus citations


Data deduplication (dedupe for short) is a special data compression technique. It has been widely adopted to save backup time as well as storage space, particularly in backup storage systems. Consequently, most dedupe research has focused on improving dedupe write performance. However, dedupe read performance in backup storage is also crucial for storage recovery. This paper designs a new dedupe storage read cache for backup applications that improves read performance by exploiting a special characteristic: the read sequence is the same as the write sequence. To improve cache utilization, it looks ahead for future references within a moving window and evicts from the cache the victims with the fewest future accesses. Moreover, to further improve read cache performance, it maintains a small log buffer to judiciously cache data chunks that will be accessed in the future. Extensive experiments with real-world backup workloads demonstrate that the proposed read cache scheme improves read performance by up to 64.3%.
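
The lookahead eviction idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `simulate_lookahead_cache`, the item-level cache granularity, the `capacity` and `window` parameters, and the tie-breaking rule are all assumptions made for illustration, and the log buffer mentioned in the abstract is left out. The sketch relies only on the stated property that the read sequence is known in advance because it matches the write sequence.

```python
from collections import Counter


def simulate_lookahead_cache(read_sequence, capacity, window):
    """Count cache hits for a read sequence that is known in advance.

    Hypothetical sketch: on a miss with a full cache, evict the cached
    item that is referenced least often in the next `window` reads.
    """
    cache = set()
    hits = 0
    for i, item in enumerate(read_sequence):
        if item in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # Count references inside the moving lookahead window
            # that starts right after the current position.
            future = Counter(read_sequence[i + 1 : i + 1 + window])
            # Victim: fewest future references; ties are broken by item
            # value only to keep the sketch deterministic (an assumption).
            victim = min(cache, key=lambda c: (future[c], c))
            cache.remove(victim)
        cache.add(item)
    return hits


if __name__ == "__main__":
    reads = ["a", "b", "c", "a", "b", "d", "a", "b"]
    print(simulate_lookahead_cache(reads, capacity=2, window=4))
```

With a small `window`, the policy sees only near-term reuse; a larger window gives it more information about future references at the cost of scanning more of the read sequence on each miss.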

Original language: English (US)
Pages (from-to): 26-40
Number of pages: 15
Journal: Journal of Computer Science and Technology
Issue number: 1
State: Published - Jan 1 2017

Bibliographical note

Funding Information:
This work is partially supported by the National Science Foundation Awards of USA under Grant Nos. 121756, 1305237, 142191 and 1439622.

Publisher Copyright:
© 2017, Springer Science+Business Media New York.


Keywords

  • backup
  • dedupe
  • deduplication
  • read cache

