2.2 Defining Geospatial Big Data
Spatial data, also known as geospatial data, geo-information or geodata, has many definitions depending on the
background of the author. All of them take the geographic location of the described phenomena as the
basic criterion. The digital representations of continuous space can be grouped into four or five types. Traditionally,
two types of geospatial data are considered, vector and raster (Elek, 2006); owing to the development of information
technology, data types of higher abstraction, such as point clouds and graph networks, are now also available. Analysts
also examine an additional, particular kind of location-aware data: social media-like data, which requires its own
approach to collection and processing. In line with Big Data theory, geospatial big data is defined as data whose volume,
variety and update frequency exceed the capability of spatial computing technology (Lee and Kang, 2015; Li et al., 2015;
Kambatla et al., 2014). Table 1 collects the main characteristics of geospatial big data for each data type:
representation formats, GIS operations, volume, velocity, variety and visualization aspects. The table is meant to give a
better understanding of the main attributes of geospatial data, because it is hard to delineate the margin at which data
start to “exceed the capability of spatial computing technology”. The amount of data that can be processed is use-case
specific; Evans et al. (2014) give some good examples in which the authors tried to identify the differences between
Geospatial Data and Geospatial Big Data.
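As a rough illustration of why the margin is use-case specific, the following minimal sketch estimates the uncompressed size of a raster and compares it with an assumed processing capacity. The function names, the scene dimensions and the 512 MiB capacity are hypothetical values chosen for illustration; they are not taken from the cited works.

```python
def raster_size_bytes(rows, cols, bands, bytes_per_sample):
    """Uncompressed size of a raster: one sample per cell and band."""
    return rows * cols * bands * bytes_per_sample


def exceeds_capacity(size_bytes, capacity_bytes):
    """True when the dataset is larger than the capacity assumed for one machine."""
    return size_bytes > capacity_bytes


# Hypothetical scene: 10,000 x 10,000 cells, 4 bands, 16-bit samples.
scene = raster_size_bytes(10_000, 10_000, 4, 2)
# Hypothetical capacity: 512 MiB of memory available for processing.
capacity = 512 * 1024 ** 2

print(scene, exceeds_capacity(scene, capacity))
```

The same scene would fall below the margin on a machine with more memory, which is the sense in which the threshold depends on the use case and the available technology.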
Vector
  Formats: point, line, polygon multi
  GIS operations: overlapping, vector geoprocessing
  Volume: available amount of vector data, for instance nation-wide cadastre, or land cover, roads, waterways, utility network, etc.
  Velocity: real-time monitoring and rapid response are emerging
  Variety: in GIS processing, several types of the previously mentioned data need to be combined to extract the relevant information
  Visualization: thanks to OGC standards and web-GIS platforms, non-researchers are able to use GIS data for many different purposes

3D representation
  Formats: point cloud, TIN or triangular mesh
  GIS operations: 3D modeling, urban modeling, simulation, flight from above camera view, visibility operations, semi-automatic point cloud feature detection, classification, terrestrial laser scanning, BIM
  Volume: available amount of point cloud data or TIN for the creation of DSM, DTM modelling, feature extraction and simulation requires huge computational capacity
  Velocity: time-sensitive 3D data requires rapid processing: disaster management and simulation
  Visualization: 3D view perspective together with thematic content with reduced information are essential to spreading information for different levels of end-users

Raster
  Formats: grid
  GIS operations: Local, Focal, Zonal, Global Map Algebra processing, image analysis
  Volume: available free series of terrestrial, aerial and satellite multispectral and hyperspectral imagery; airplane, UAV, earth observation data requires huge computational capacity using raster image processing methods
  Velocity: real-time spatio-temporal earth observation data processing is needed more than ever, independently from the extent of the processing
  Visualization: to deliver results of earth observation monitoring and processing, novel solutions in visualization are also needed to make information human readable

Network
  Formats: graph nodes, edges, line
  GIS operations: routing, network analysis, allocation, Geo-business
  Volume: trillions of edges and nodes for graph processing, available from location-based networks and also from social media; the big data concept was originally made for text-based and graph-like structures
  Velocity: real-time monitoring of moving objects, transportation decision support is needed
  Visualization: in network analysis and routing, visualization techniques are indispensable to serve it in real time

Geolocation-aware media
  Formats: text: post, tweets, web-logs, check-ins; media: GPS tracks from smart phones, UAV video, geoPDF; profiles: name, geocodes
  GIS operations: disaster management decision support, crowdsourcing, human geography, sociology, crime mapping uses geoprocessing, GEOINT techniques
  Volume: data mining, geostatistical techniques and predictive modelling are traditionally considered as big data processing methods, which require computing capacity even on web content analysis of text-based media files
  Velocity: real-time social media and information flow is faster than ever; the geospatial sector is already taking part
  Visualization: for location-based social media, visualization is the basis which exceeds the traditional barriers of the GIS sector; novel solutions are rising every day to collect and analyse geolocated web content
Table 1. Geospatial Big Data characteristics
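The Local, Focal, Zonal and Global Map Algebra operations listed for raster data in Table 1 can be illustrated with a minimal pure-Python sketch. The function names and the small `elevation` and `zones` grids are hypothetical; the focal window is assumed to be 3x3 and clipped at the raster edges, and the global operation is represented here by a simple statistic over all cells.

```python
def local_add(a, b):
    """Local: each output cell depends only on the same cell of the inputs."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def focal_mean(grid):
    """Focal: each output cell is the mean of its 3x3 neighbourhood, clipped at edges."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def zonal_mean(values, zones):
    """Zonal: one statistic per zone, where zones is a grid of zone identifiers."""
    sums, counts = {}, {}
    for vrow, zrow in zip(values, zones):
        for v, z in zip(vrow, zrow):
            sums[z] = sums.get(z, 0) + v
            counts[z] = counts.get(z, 0) + 1
    return {z: sums[z] / counts[z] for z in sums}


def global_max(grid):
    """Global: the result may depend on every cell of the raster."""
    return max(max(row) for row in grid)


# Hypothetical 3x3 rasters used only for illustration.
elevation = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zones = [[1, 1, 2], [1, 2, 2], [3, 3, 3]]

print(local_add(elevation, elevation)[0])
print(focal_mean(elevation)[1][1])
print(zonal_mean(elevation, zones)[3])
print(global_max(elevation))
```

The four functions differ only in how many input cells feed one output value, which is also what drives their computational cost on large rasters.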
In light of the previously mentioned definitions, the characteristics of Big Data and Geospatial Big Data
presented in the table are reasonable. We do not intend to identify other characteristics of Big Data, because they are
not closely related to the topic of geoprocessing. To process Big Data, distributed computing environments and techniques
have been introduced and applied to handle time-consuming operations. Our related work targeted feasibility aspects
together with a conceptual framework for benchmarking experimental processing.
3 PREVIOUS AND RELATED WORKS
3.1 Distributed computing
A distributed computing environment is a software system in which computational and storage components are located on
networked computers that communicate and coordinate their actions by passing messages through “network socket”
endpoints within the network. Components interact with each other to achieve a common goal. Three significant
characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.
This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. doi:10.5194/isprsannals-III-4-111-2016
Plainly, a distributed system is a collection of
computers within the same network, working together as one larger computer. This architecture provides massive
computational power and storage capacity. Note that, unlike in parallel computing systems, processes running in a
distributed system do not share memory with each other. Processes in a distributed system communicate through
message queues. Two architectural models are suggested for distributed computing systems:
● Client-Server model: clients send communication or processing requests to the server,
which distributes those requests to the processing and storage units as needed to do the actual work
and returns the results to the client.
● Peer-to-Peer model: all units involved in the
distributed system act as client and server at the same time, with no distinction between client
and server processes.
The technology used in distributed computing of geospatial data is similar to that of any other distributed
computing process. Several solutions have been introduced to accelerate geoprocessing methods, which are usually
time-consuming. Given the available amount of data and the particular procedures needed to derive information from it,
geographical information systems constantly call for novel solutions from the IT sector.
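The client-server model and the message-passing style of communication described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the architecture of any particular system: threads connected by local socket pairs stand in for networked computers, messages are newline-terminated JSON, and all names (`send_msg`, `recv_msg`, `worker`, `server`, `run_job`) are hypothetical. The components exchange data only through messages and never through shared variables.

```python
import json
import socket
import threading


def send_msg(sock, payload):
    """Serialize a message as one JSON line and send it over the socket."""
    sock.sendall((json.dumps(payload) + "\n").encode("utf-8"))


def recv_msg(sock):
    """Read bytes until a newline arrives and decode one JSON message."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return json.loads(buf.decode("utf-8"))


def worker(sock):
    """Processing unit: sums the chunk it receives and replies with a partial result."""
    chunk = recv_msg(sock)["values"]
    send_msg(sock, {"partial": sum(chunk)})
    sock.close()


def server(client_sock, worker_socks):
    """Server: splits the client's job across the workers and merges their replies."""
    values = recv_msg(client_sock)["values"]
    n = len(worker_socks)
    for i, ws in enumerate(worker_socks):
        send_msg(ws, {"values": values[i::n]})  # round-robin split of the job
    total = sum(recv_msg(ws)["partial"] for ws in worker_socks)
    send_msg(client_sock, {"total": total})


def run_job(values, n_workers=3):
    """Client: submit a job to the server and wait for the merged result."""
    client_end, server_end = socket.socketpair()
    pairs = [socket.socketpair() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(w,)) for _, w in pairs]
    threads.append(threading.Thread(
        target=server, args=(server_end, [s for s, _ in pairs])))
    for t in threads:
        t.start()
    send_msg(client_end, {"values": values})
    result = recv_msg(client_end)["total"]
    for t in threads:
        t.join()
    for sock in [client_end, server_end] + [s for s, _ in pairs]:
        sock.close()
    return result


if __name__ == "__main__":
    print(run_job(list(range(1, 101))))  # sum of the integers 1..100
```

In a peer-to-peer arrangement, each unit would run both the requesting and the serving role instead of having a single coordinating server; the message format could stay the same.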
3.2 Distributed geospatial computing