HatS: A heterogeneity-aware tiered storage for Hadoop

K. R. Krish, Ali Anwar, Ali R. Butt

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

66 Scopus citations

Abstract

Hadoop has become the de facto large-scale data processing framework for modern analytics applications. A major obstacle to sustaining high performance and scalability in Hadoop is managing data growth while meeting ever-higher I/O demand. To this end, a promising trend in storage systems is to use hybrid and heterogeneous devices, such as Solid State Disks (SSDs), RAM disks, and Network Attached Storage (NAS), which can help achieve very high I/O rates at acceptable cost. However, the Hadoop Distributed File System (HDFS) is unable to exploit such heterogeneous storage, because it assumes that the underlying devices are homogeneous storage blocks and disregards their individual I/O characteristics, which leads to performance degradation. In this paper, we present hatS, a Heterogeneity-Aware Tiered Storage, which is a novel redesign of HDFS into a multi-tiered storage system that seamlessly integrates heterogeneous storage technologies into the Hadoop ecosystem. hatS also proposes data placement and retrieval policies that improve the utilization of the storage devices based on their characteristics, such as I/O throughput and capacity. We evaluate hatS using an actual implementation on a medium-sized cluster consisting of HDDs and two types of SSDs (i.e., SATA SSD and PCIe SSD). Experiments show that hatS achieves 32.6% higher read bandwidth, on average, than HDFS for the test Hadoop jobs (such as Grep and TestDFSIO) by directing 64% of the I/O accesses to the SSD tiers. We also evaluate our approach with trace-driven simulations using synthetic Facebook workloads, and show that, compared to the standard setup, hatS improves the average I/O rate by 36%, which results in a 26% improvement in job completion time.
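The abstract states that the placement and retrieval policies are driven by per-device characteristics such as I/O throughput and capacity, but it does not spell them out. As a rough illustration only, the Python sketch below shows one way such tier-aware policies could look. The `Device` model, `place_block`, `pick_read_replica`, the example throughput figures, and the tie-breaking order (new tier first, then faster device, then more free space) are all hypothetical assumptions; this is not the policy implemented in hatS.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical model of one storage device attached to a node."""
    node: str          # node hosting the device
    tier: str          # e.g. "pcie_ssd", "sata_ssd", "hdd" (illustrative labels)
    read_mbps: float   # assumed sustained read throughput, MB/s
    capacity: int      # capacity in blocks
    used: int = 0      # blocks already stored

    @property
    def free(self) -> int:
        return self.capacity - self.used


def place_block(devices, replication=3):
    """Hypothetical tier-aware placement: choose up to `replication` devices
    on distinct nodes, preferring a tier that holds no replica yet, then the
    faster device, then the device with more free space."""
    chosen = []
    for _ in range(replication):
        used_nodes = {d.node for d in chosen}
        used_tiers = {d.tier for d in chosen}
        candidates = [d for d in devices if d.free > 0 and d.node not in used_nodes]
        if not candidates:
            break  # not enough eligible nodes left; fewer replicas are placed
        best = min(
            candidates,
            key=lambda d: (d.tier in used_tiers, -d.read_mbps, -d.free),
        )
        best.used += 1
        chosen.append(best)
    return chosen


def pick_read_replica(replicas):
    """Hypothetical retrieval policy: read from the fastest replica's device."""
    return max(replicas, key=lambda d: d.read_mbps)


if __name__ == "__main__":
    cluster = [
        Device("n1", "pcie_ssd", 1500.0, 1),
        Device("n2", "sata_ssd", 500.0, 2),
        Device("n3", "hdd", 120.0, 10),
        Device("n4", "hdd", 120.0, 10),
    ]
    block = place_block(cluster)
    print([d.tier for d in block])
    print(pick_read_replica(block).tier)
```

In this toy cluster the first block gets one replica on each tier and is read from the PCIe SSD. Once the small SSD devices fill up, later blocks fall back to the HDDs and can end up with fewer replicas than requested, which a real system would have to handle explicitly (for example by relaxing the distinct-node constraint or rejecting the write).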

Original language: English (US)
Title of host publication: Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2014
Publisher: IEEE Computer Society
Pages: 502-511
Number of pages: 10
ISBN (Print): 9781479927838
State: Published - 2014
Externally published: Yes
Event: 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014 - Chicago, IL, United States
Duration: May 26 2014 - May 29 2014

Publication series

Name: Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2014

Conference

Conference: 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014
Country/Territory: United States
City: Chicago, IL
Period: 5/26/14 - 5/29/14

Keywords

  • Hadoop Distributed File System (HDFS)
  • Tiered storage
  • data placement and retrieval policy
