TY - GEN
T1 - Redefining data locality for cross-data center storage
AU - Oh, Kwangsung
AU - Raghavan, Ajaykrishna
AU - Chandra, Abhishek
AU - Weissman, Jon
PY - 2015/6/16
Y1 - 2015/6/16
AB - Many cloud applications exploit the diversity of storage options in a data center to achieve desired cost, performance, and durability tradeoffs. It is common to see applications using a combination of memory, local disk, and archival storage tiers within a single data center to meet their needs. Using Amazon as an example, hot data can be kept in memory using ElastiCache, and colder data in cheaper, slower storage such as S3. For user-facing applications, a recent trend is to exploit multiple data centers for data placement to reduce the latency of access from users to their data. The conventional wisdom is that co-location of computation and storage within the same data center is key to application performance, so applications running within a data center are often still limited to accessing local data. In this paper, using experiments on the Amazon, Microsoft, and Google clouds, we show that this assumption is false, and that accessing data in nearby data centers may be faster than local access at different, or even the same, points in the storage hierarchy. This can lead not only to better performance, but also to reduced cost, simpler consistency policies, and a reconsideration of data locality in a multi-data-center environment. This argues for an expansion of cloud storage tiers to consider non-local storage options, and has interesting implications for the design of a distributed storage system.
KW - Data locality
KW - In-memory storage
KW - Multi-tiered storage
KW - Multi-DCs
KW - Wide-area storage
UR - http://www.scopus.com/inward/record.url?scp=84979695533&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979695533&partnerID=8YFLogxK
U2 - 10.1145/2756594.2756596
DO - 10.1145/2756594.2756596
M3 - Conference contribution
AN - SCOPUS:84979695533
T3 - BigSystem 2015 - Proceedings of the 2nd International Workshop on Software-Defined Ecosystems, Part of HPDC 2015
SP - 15
EP - 22
BT - BigSystem 2015 - Proceedings of the 2nd International Workshop on Software-Defined Ecosystems, Part of HPDC 2015
PB - Association for Computing Machinery, Inc
T2 - 2nd International Workshop on Software-Defined Ecosystems, BigSystem 2015
Y2 - 16 June 2015
ER -