TY - JOUR
T1 - Trends in big data analytics
AU - Kambatla, Karthik
AU - Kollias, Giorgos
AU - Kumar, Vipin
AU - Grama, Ananth
PY - 2014/7
Y1 - 2014/7
N2 - One of the major applications of future generation parallel and distributed systems is in big-data analytics. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size. Beyond their sheer magnitude, these datasets and associated applications' considerations pose significant challenges for method and software development. Datasets are often distributed and their size and privacy considerations warrant distributed techniques. Data often resides on platforms with widely varying computational and network capabilities. Considerations of fault-tolerance, security, and access control are critical in many applications (Dean and Ghemawat, 2004; Apache hadoop). Analysis tasks often have hard deadlines, and data quality is a major concern in yet other applications. For most emerging applications, data-driven models and methods, capable of operating at scale, are as-yet unknown. Even when known methods can be scaled, validation of results is a major issue. Characteristics of hardware platforms and the software stack fundamentally impact data analytics. In this article, we provide an overview of the state-of-the-art and focus on emerging trends to highlight the hardware, software, and application landscape of big-data analytics.
AB - One of the major applications of future generation parallel and distributed systems is in big-data analytics. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size. Beyond their sheer magnitude, these datasets and associated applications' considerations pose significant challenges for method and software development. Datasets are often distributed and their size and privacy considerations warrant distributed techniques. Data often resides on platforms with widely varying computational and network capabilities. Considerations of fault-tolerance, security, and access control are critical in many applications (Dean and Ghemawat, 2004; Apache hadoop). Analysis tasks often have hard deadlines, and data quality is a major concern in yet other applications. For most emerging applications, data-driven models and methods, capable of operating at scale, are as-yet unknown. Even when known methods can be scaled, validation of results is a major issue. Characteristics of hardware platforms and the software stack fundamentally impact data analytics. In this article, we provide an overview of the state-of-the-art and focus on emerging trends to highlight the hardware, software, and application landscape of big-data analytics.
KW - Analytics
KW - Big-data
KW - Data centers
KW - Distributed systems
UR - http://www.scopus.com/inward/record.url?scp=84901640093&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901640093&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2014.01.003
DO - 10.1016/j.jpdc.2014.01.003
M3 - Article
AN - SCOPUS:84901640093
VL - 74
SP - 2561
EP - 2573
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
SN - 0743-7315
IS - 7
ER -