Large-Scale Analysis of Docker Images and Performance Implications for Container Storage Systems

Nannan Zhao, Vasily Tarasov, Hadeel Albahar, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Arnab K. Paul, Keren Chen, Ali R. Butt

Research output: Contribution to journal › Article › peer-review

31 Scopus citations


Docker containers have become a prominent solution for supporting modern enterprise applications because they offer isolation, low overhead, and efficient packaging of an application's execution environment. Containers are created from images, which are shared between users via a registry. The amount of data that registries store is massive. For example, Docker Hub, a popular public registry, stores at least half a million public images. In this article, we analyze over 167 TB of uncompressed Docker Hub images, characterize them using multiple metrics, and evaluate the potential of file-level deduplication. Our analysis supports informed decisions when designing storage for containers in general and Docker registries in particular. For example, only 3 percent of the files in images are unique while the rest are redundant copies, which means file-level deduplication has great potential to save storage space. Furthermore, we carry out a comprehensive analysis of both small I/O request performance and copy-on-write performance for multiple popular container storage drivers. Our findings can motivate and help improve the design of data reduction and caching methods for images, pulling optimizations for registries, and storage drivers.
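As a rough illustration of what file-level deduplication measures, the sketch below hashes file contents and counts how many distinct contents remain. It is not the authors' tooling: the function name `dedup_stats`, the sample paths, and the choice of SHA-256 are assumptions made for this example, and "unique" here simply means "distinct content", which may differ from the exact definition used in the article.

```python
import hashlib


def dedup_stats(files):
    """Summarize file-level redundancy.

    `files` is an iterable of (path, content_bytes) pairs, e.g. files
    extracted from image layers (hypothetical input format).
    Returns (total_files, distinct_files, total_bytes, distinct_bytes).
    """
    seen = {}  # content digest -> size of the first copy encountered
    total_files = 0
    total_bytes = 0
    for _path, content in files:
        total_files += 1
        total_bytes += len(content)
        digest = hashlib.sha256(content).hexdigest()
        # Keep only one copy per distinct content.
        seen.setdefault(digest, len(content))
    return total_files, len(seen), total_bytes, sum(seen.values())


# Made-up sample: the same shell binary shipped in three layers.
sample = [
    ("layer1/bin/sh", b"shell-binary"),
    ("layer2/bin/sh", b"shell-binary"),
    ("layer2/etc/os-release", b"NAME=demo"),
    ("layer3/bin/sh", b"shell-binary"),
]

total_files, distinct_files, total_bytes, distinct_bytes = dedup_stats(sample)
saved_fraction = 1 - distinct_bytes / total_bytes
print(total_files, distinct_files, total_bytes, distinct_bytes)
```

In this toy input, storing one copy per distinct content keeps 2 of 4 files; a real registry would additionally need to map each path back to its stored copy, which this sketch omits.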

Original language: English (US)
Article number: 9242268
Pages (from-to): 918-930
Number of pages: 13
Journal: IEEE Transactions on Parallel and Distributed Systems
Issue number: 4
State: Published - Apr 1 2021
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1990-2012 IEEE.


Keywords

  • Containers
  • Docker
  • Docker Hub
  • container images
  • container registry
  • container storage drivers
  • deduplication

