Large Scale Analytics of Vector+Raster Big Spatial Data

Ahmed Eldawy, David A Haynes, Lyuye Niu, Zhiba Su

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Significant increases in the volume of big spatial data have driven researchers and practitioners to build specialized systems to process and analyze this data. Existing efforts focus on either big raster data, e.g., remote sensing data or medical images, or big vector data, e.g., geotagged tweets or trajectories. However, when raster and vector data are mixed, one dataset must first be converted to the other's representation through a vector-to-raster or raster-to-vector transformation, which is extremely inefficient for large datasets. In this paper, we advocate a third approach that mixes the raw representations of both vector and raster data in the query processor. As a case study, we apply this approach to the zonal statistics problem, which computes statistics over a raster layer for each polygon in a vector layer. We propose a novel method, called the Scanline method, which does not require a conversion between raster and vector. Experimental evaluation on real datasets as large as 840 billion pixels shows up to three orders of magnitude speedup over the baseline methods.
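To make the zonal statistics problem concrete, the following is a minimal Python sketch that aggregates raster pixels inside one polygon without rasterizing the polygon into a mask and without vectorizing the raster. It is not the authors' implementation: the abstract does not describe the Scanline method's internals, so this sketch assumes a generic scanline approach in which each pixel row is intersected with the polygon edges. The function name `zonal_stats`, the list-of-rows raster, and the pixel-coordinate convention are all assumptions made for illustration.

```python
from math import ceil


def zonal_stats(raster, polygon):
    """Aggregate the pixels whose centers lie inside `polygon`.

    raster  : list of rows; pixel (row r, col c) covers [c, c+1) x [r, r+1)
    polygon : list of (x, y) vertices in the same pixel coordinates
    (Hypothetical interface, chosen only for this sketch.)
    """
    total, count = 0.0, 0
    lo, hi = float("inf"), float("-inf")
    n = len(polygon)
    for r, row in enumerate(raster):
        y = r + 0.5  # horizontal scanline through this row's pixel centers
        xs = []
        for i in range(n):
            (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
            # Half-open test: skips horizontal edges and counts a vertex once.
            if (y1 <= y < y2) or (y2 <= y < y1):
                xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        xs.sort()
        # Sorted crossings pair up into spans that lie inside the polygon.
        for x_in, x_out in zip(xs[0::2], xs[1::2]):
            first = max(0, ceil(x_in - 0.5))             # center >= x_in
            last = min(len(row) - 1, ceil(x_out - 0.5) - 1)  # center < x_out
            for c in range(first, last + 1):
                v = row[c]
                total += v
                count += 1
                lo, hi = min(lo, v), max(hi, v)
    if count == 0:
        return {"count": 0, "sum": 0.0, "min": None, "max": None, "mean": None}
    return {"count": count, "sum": total, "min": lo, "max": hi,
            "mean": total / count}


if __name__ == "__main__":
    grid = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]
    square = [(1, 1), (3, 1), (3, 3), (1, 3)]
    print(zonal_stats(grid, square))
```

Only the raster rows that the polygon actually crosses contribute any work beyond the edge test, which is the intuition behind avoiding a full conversion of either dataset; a distributed system would additionally have to partition the raster and polygons, which this sketch does not attempt.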

Original language: English (US)
Title of host publication: GIS
Subtitle of host publication: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
Editors: Siva Ravada, Erik Hoel, Roberto Tamassia, Shawn Newsam, Goce Trajcevski
Publisher: Association for Computing Machinery
ISBN (Print): 9781450354905
DOIs: https://doi.org/10.1145/3139958.3140042
State: Published - Nov 7 2017
Event: 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017 - Redondo Beach, United States
Duration: Nov 7 2017 to Nov 10 2017

Publication series

Name: GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
Volume: 2017-November

Other

Other: 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017
Country: United States
City: Redondo Beach
Period: 11/7/17 to 11/10/17

Keywords

  • Big Spatial Data
  • Clipping
  • Raster
  • Satellite
  • Vector

Cite this

Eldawy, A., Haynes, D. A., Niu, L., & Su, Z. (2017). Large Scale Analytics of Vector+Raster Big Spatial Data. In S. Ravada, E. Hoel, R. Tamassia, S. Newsam, & G. Trajcevski (Eds.), GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems [62] (GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems; Vol. 2017-November). Association for Computing Machinery. https://doi.org/10.1145/3139958.3140042

@inproceedings{18b5cf49907c412cafb5aff9e83e9026,
title = "Large Scale Analytics of Vector+Raster Big Spatial Data",
abstract = "Significant increases in the volume of big spatial data have driven researchers and practitioners to build specialized systems to process and analyze this data. Existing efforts focus on either big raster data, e.g., remote sensing data or medical images, or big vector data, e.g., geotagged tweets or trajectories. However, when raster and vector data mix, one dataset must be converted to the other representation requiring vector-to-raster or raster-to-vector transformation before processing, which is extremely inefficient for large datasets. In this paper, we advocate a third approach that mixes the raw representations of both vector and raster data in the query processor. As a case study, we apply this to the zonal statistics problem, which computes the statistics over a raster layer for each polygon in a vector layer. We propose a novel method, called Scanline method, which does not require a conversion between raster and vector. Experimental evaluation on real datasets as large as 840 billion pixels shows up to three orders of magnitude speedup over the baseline methods.",
keywords = "Big Spatial Data, Clipping, Raster, Satellite, Vector",
author = "Ahmed Eldawy and Haynes, {David A} and Lyuye Niu and Zhiba Su",
year = "2017",
month = "11",
day = "7",
doi = "10.1145/3139958.3140042",
language = "English (US)",
isbn = "9781450354905",
series = "GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems",
publisher = "Association for Computing Machinery",
editor = "Siva Ravada and Erik Hoel and Roberto Tamassia and Shawn Newsam and Goce Trajcevski",
booktitle = "GIS",

}

TY - GEN

T1 - Large Scale Analytics of Vector+Raster Big Spatial Data

AU - Eldawy, Ahmed

AU - Haynes, David A

AU - Niu, Lyuye

AU - Su, Zhiba

PY - 2017/11/7

Y1 - 2017/11/7

AB - Significant increases in the volume of big spatial data have driven researchers and practitioners to build specialized systems to process and analyze this data. Existing efforts focus on either big raster data, e.g., remote sensing data or medical images, or big vector data, e.g., geotagged tweets or trajectories. However, when raster and vector data mix, one dataset must be converted to the other representation requiring vector-to-raster or raster-to-vector transformation before processing, which is extremely inefficient for large datasets. In this paper, we advocate a third approach that mixes the raw representations of both vector and raster data in the query processor. As a case study, we apply this to the zonal statistics problem, which computes the statistics over a raster layer for each polygon in a vector layer. We propose a novel method, called Scanline method, which does not require a conversion between raster and vector. Experimental evaluation on real datasets as large as 840 billion pixels shows up to three orders of magnitude speedup over the baseline methods.

KW - Big Spatial Data

KW - Clipping

KW - Raster

KW - Satellite

KW - Vector

UR - http://www.scopus.com/inward/record.url?scp=85040979149&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85040979149&partnerID=8YFLogxK

U2 - 10.1145/3139958.3140042

DO - 10.1145/3139958.3140042

M3 - Conference contribution

AN - SCOPUS:85040979149

SN - 9781450354905

T3 - GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems

BT - GIS

A2 - Ravada, Siva

A2 - Hoel, Erik

A2 - Tamassia, Roberto

A2 - Newsam, Shawn

A2 - Trajcevski, Goce

PB - Association for Computing Machinery

ER -