GeoFlink: A Framework for the Real-time Processing of Spatial Streams

7 Apr 2020  ·  Salman Ahmed Shaikh, Komal Mariam, Hiroyuki Kitagawa, Kyoung-Sook Kim

Apache Flink is an open-source system for the scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which is a requirement of many applications dealing with spatial data. Other scalable spatial data processing platforms, including GeoSpark, Spatial Hadoop, GeoMesa and Parallel Secondo, do not support streaming workloads and can only handle static/batch workloads. Hence this work presents GeoFlink, which extends Apache Flink to support spatial data types, indexes and continuous queries. To enable efficient processing of continuous spatial queries and effective data distribution among the Flink cluster nodes, a grid-based index is introduced. The grid index enables the pruning of spatial objects that cannot be part of a spatial query result, thus enabling efficient query processing. It also helps preserve spatial data proximity, resulting in effective data distribution. GeoFlink currently supports spatial range, spatial $k$NN and spatial join queries. An extensive experimental study on real spatial data streams shows that GeoFlink achieves several orders of magnitude higher query performance than other ordinary distributed approaches.
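
The pruning idea behind a grid index can be illustrated with a minimal Python sketch. This is not GeoFlink's API or implementation: the class `UniformGrid`, the function `range_query`, and all parameters are hypothetical names chosen for illustration, and the sketch assumes a uniform square grid and a circular range query over 2D points. Each point is mapped to a grid cell, and any point whose cell lies outside the cells overlapped by the query's bounding box is discarded without a distance computation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UniformGrid:
    """Hypothetical uniform grid; not part of GeoFlink."""
    min_x: float       # x coordinate of the grid origin
    min_y: float       # y coordinate of the grid origin
    cell_size: float   # side length of each square cell

    def cell_of(self, x, y):
        """Return the (column, row) index of the cell containing (x, y)."""
        return (math.floor((x - self.min_x) / self.cell_size),
                math.floor((y - self.min_y) / self.cell_size))

    def candidate_cells(self, qx, qy, r):
        """Cells overlapped by the bounding box of the query circle."""
        cx0, cy0 = self.cell_of(qx - r, qy - r)
        cx1, cy1 = self.cell_of(qx + r, qy + r)
        return {(cx, cy)
                for cx in range(cx0, cx1 + 1)
                for cy in range(cy0, cy1 + 1)}


def range_query(grid, points, qx, qy, r):
    """Return the points within distance r of (qx, qy), pruning by cell first."""
    cells = grid.candidate_cells(qx, qy, r)
    result = []
    for x, y in points:
        if grid.cell_of(x, y) not in cells:
            continue  # pruned: this cell cannot contain a result
        if math.hypot(x - qx, y - qy) <= r:
            result.append((x, y))
    return result


grid = UniformGrid(min_x=0.0, min_y=0.0, cell_size=1.0)
points = [(0.5, 0.5), (1.2, 0.9), (5.0, 5.0), (2.9, 2.9)]
matches = range_query(grid, points, qx=1.0, qy=1.0, r=1.0)
```

In this sketch the point (5.0, 5.0) is rejected by the cell check alone, while (2.9, 2.9) falls in a candidate cell and is rejected only by the exact distance test. The same cell identifier could serve as a partitioning key so that nearby objects land on the same worker, which is the data-distribution benefit the abstract describes; how GeoFlink actually does this is not specified here.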

Categories


Databases  ·  Distributed, Parallel, and Cluster Computing
