TY - JOUR
T1 - Scalable computational geometry in MapReduce
AU - Li, Yuan
AU - Eldawy, Ahmed
AU - Xue, Jie
AU - Knorozova, Nadezda
AU - Mokbel, Mohamed F.
AU - Janardan, Ravi
N1 - Publisher Copyright:
© 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2019/8/1
Y1 - 2019/8/1
N2 - Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop; a suite of scalable and efficient MapReduce algorithms for various fundamental computational geometry operations, namely polygon union, Voronoi diagram, skyline, convex hull, farthest pair, and closest pair, which present a set of key components for other geometric algorithms. For each computational geometry operation, CG_Hadoop has two versions, one for the Apache Hadoop system and one for the SpatialHadoop system, a Hadoop-based system that is more suited for spatial operations. These proposed algorithms form the nucleus of a comprehensive MapReduce library of computational geometry operations. Extensive experimental results run on a cluster of 25 machines over datasets of size up to 3.8B records show that CG_Hadoop achieves up to 14x and 115x better performance than traditional algorithms when using Hadoop and SpatialHadoop systems, respectively.
AB - Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop; a suite of scalable and efficient MapReduce algorithms for various fundamental computational geometry operations, namely polygon union, Voronoi diagram, skyline, convex hull, farthest pair, and closest pair, which present a set of key components for other geometric algorithms. For each computational geometry operation, CG_Hadoop has two versions, one for the Apache Hadoop system and one for the SpatialHadoop system, a Hadoop-based system that is more suited for spatial operations. These proposed algorithms form the nucleus of a comprehensive MapReduce library of computational geometry operations. Extensive experimental results run on a cluster of 25 machines over datasets of size up to 3.8B records show that CG_Hadoop achieves up to 14x and 115x better performance than traditional algorithms when using Hadoop and SpatialHadoop systems, respectively.
KW - Computational Geometry
KW - Distributed Systems
KW - Hadoop
KW - MapReduce
KW - Output-sensitive Algorithms
UR - http://www.scopus.com/inward/record.url?scp=85060150253&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060150253&partnerID=8YFLogxK
U2 - 10.1007/s00778-018-0534-5
DO - 10.1007/s00778-018-0534-5
M3 - Article
AN - SCOPUS:85060150253
SN - 1066-8888
VL - 28
SP - 523
EP - 548
JO - VLDB Journal
JF - VLDB Journal
IS - 4
ER -