2.2 Defining Geospatial Big Data

Spatial data, also known as geospatial data, geo-information or geodata, has many definitions depending on the background of the author. All of them emphasize the geographic location of the phenomena being described as the basic criterion. The digital representations of continuous space can be grouped into four or five types. Traditionally, two types of geospatial data are considered, vector and raster (Elek, 2006); owing to the development of information technology, we now also have data types of higher abstraction, such as point clouds and graph networks. Analysts also examine an additional, particular kind of location-aware data: social media-like data, which requires a particular approach to collection and processing as well. In line with Big Data theory, geospatial big data is defined as data whose volume, variety and update frequency exceed the capability of spatial computing technology (Lee and Kang, 2015; Li et al., 2015; Kambatla et al., 2014). Table 1 collects the main characteristics of geospatial big data for each data type: representation formats, GIS operations, volume, velocity, variety and visualization aspects. The table is meant to give a better understanding of the main attributes of geospatial data, because it is hard to delineate the point at which data start to “exceed the capability of spatial computing technology”. The amount of data that can be processed is use-case specific; there are some good examples, such as Evans et al. (2014), where the authors tried to identify the differences between Geospatial Data and Geospatial Big Data.
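As an illustration of the representation types named above, the following minimal Python sketch encodes each one with plain data structures. All class and field names (`VectorFeature`, `RasterGrid`, `PointCloud`, `NetworkGraph`) are hypothetical and chosen for this example only; they do not come from the cited works or from any GIS library, and real formats carry far more metadata (coordinate reference system, no-data values, and so on).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord2D = Tuple[float, float]
Coord3D = Tuple[float, float, float]


@dataclass
class VectorFeature:
    """Vector: discrete geometry given by coordinate pairs plus attributes."""
    geometry_type: str                 # e.g. "Point", "LineString", "Polygon"
    coordinates: List[Coord2D]
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class RasterGrid:
    """Raster: regular grid of cell values anchored at an upper-left origin."""
    origin: Coord2D                    # (x, y) of the upper-left corner
    cell_size: float
    cells: List[List[float]]           # rows of cell values, top row first

    def value_at(self, x: float, y: float) -> float:
        # Map a coordinate to a (row, col) index; no bounds checking here.
        col = int((x - self.origin[0]) / self.cell_size)
        row = int((self.origin[1] - y) / self.cell_size)
        return self.cells[row][col]


@dataclass
class PointCloud:
    """Point cloud: unordered set of 3D points, e.g. from laser scanning."""
    points: List[Coord3D]


@dataclass
class NetworkGraph:
    """Graph network: located nodes connected by weighted, directed edges."""
    nodes: Dict[str, Coord2D]
    edges: List[Tuple[str, str, float]]   # (from_node, to_node, cost)

    def neighbours(self, node_id: str) -> List[str]:
        return [b for a, b, _ in self.edges if a == node_id]


# Illustrative instances with made-up values.
road = VectorFeature("LineString", [(0.0, 0.0), (1.0, 1.0)], {"name": "example road"})
elevation = RasterGrid(origin=(0.0, 2.0), cell_size=1.0, cells=[[1.0, 2.0], [3.0, 4.0]])
scan = PointCloud([(0.5, 0.5, 10.2), (0.6, 0.5, 10.4)])
network = NetworkGraph({"a": (0.0, 0.0), "b": (1.0, 1.0)}, [("a", "b", 1.4)])
```

The sketch only shows how differently the same space is described in each type, which is one reason why the GIS operations and processing costs differ per data type.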
| Data type | Formats | GIS operations | Volume | Velocity | Variety | Visualization |
| --- | --- | --- | --- | --- | --- | --- |
| Vector | point, line, polygon, multi | overlapping, vector geoprocessing | available amount of vector data, for instance nation-wide cadastre, land cover, roads, waterways, utility networks, etc. | real-time monitoring and rapid response are emerging | in GIS processing, several of the previously mentioned data types need to be combined to extract the relevant information | thanks to OGC standards and web-GIS platforms, non-researchers are able to use GIS data for many different purposes |
| 3D representation | point cloud, TIN or triangular mesh | 3D modeling, urban modeling, simulation, flight from above (camera view), visibility operations, semi-automatic point cloud feature detection, classification, terrestrial laser scanning, BIM | the available amount of point cloud or TIN data for the creation of DSM, DTM modelling, feature extraction and simulation requires huge computational capacity | time-sensitive 3D data requires rapid processing | disaster management and simulation | a 3D perspective view together with thematic content with reduced information is essential for spreading information to different levels of end users |
| Raster | grid | Local, Focal, Zonal and Global Map Algebra processing, image analysis | available free series of terrestrial, aerial and satellite multispectral and hyperspectral imagery (airplane, UAV, earth observation); the data requires huge computational capacity with raster image processing methods | real-time spatio-temporal earth observation data processing is needed more than ever | independently of the extent of the processing, to deliver results of earth observation monitoring and processing | novel visualization solutions are also needed to make information human readable |
| Network | graph nodes, edges, line | routing, network analysis, allocation, geo-business | trillions of edges and nodes for graph processing are available from location-based networks and from social media; the big data concept was originally made for text-based and graph-like structures | real-time monitoring of moving objects, transportation | decision support is needed in network analysis and routing | visualization techniques are indispensable to serve it in real time |
| Geolocation-aware media | text: posts, tweets, web logs, check-ins; media: GPS tracks from smartphones, UAV video, geoPDF; profiles: name, geocodes | disaster management decision support, crowdsourcing, human geography, sociology and crime mapping use geoprocessing and GEOINT techniques | data mining, geostatistical techniques and predictive modelling are traditionally considered big data processing methods, which require computing capacity even for web content analysis of text-based media files | real-time social media and information flow is faster than ever | the geospatial sector is already taking part in location-based social media | visualization is the basis, which exceeds the traditional barriers of the GIS sector; novel solutions arise every day for collecting and analysing geolocated web content |

Table 1. Geospatial Big Data characteristics

Given the previously mentioned definitions, the characteristics of Big Data and Geospatial Big Data presented in the table are reasonable. We do not intend to identify other characteristics of Big Data, because they are not closely related to the geoprocessing topic. To process Big Data, distributed computing environments and techniques have been introduced and applied to handle time-consuming operations. In our related work, feasibility aspects were targeted together with a conceptual framework for benchmarking experimental processing.

3 PREVIOUS AND RELATED WORKS

3.1 Distributed computing

A distributed computing environment is a software system whose computational and storage components reside on networked computers and communicate and coordinate their actions by passing messages through “network socket” endpoints within the network. Components interact with each other to achieve a common goal.
Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components. Plainly, a distributed system is a collection of computers within the same network, working together as one larger computer. Massive computational power and storage capacity have been gained due to this architecture. We must note that processes running in a distributed system do not share memory with each other, unlike in parallel computing systems. Processes in a distributed system communicate through message queues. Two architectural models are suggested for distributed computing systems:
● Client-Server model: clients initiate communication or processing jobs with the server, which distributes the requests to the processing and storage units as necessary to do the real work and returns the results to the client.
● Peer-to-Peer model: all units involved in the distributed system are client and server at the same time, without any distinction between client and server processes.
The technology used in distributed computing of geospatial data is similar to that of any other distributed computing process. Several solutions have been introduced to accelerate geoprocessing methods, which are usually time consuming. Given the available amount of data and the particular procedures needed to derive information from it, geographical information systems constantly call for novel solutions from the IT sector.

This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. doi:10.5194isprsannals-III-4-111-2016
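The client-server model and the message-queue communication described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation from the paper: threads stand in for separate networked machines, `queue.Queue` objects stand in for the message queues between them, and the function names (`processing_unit`, `serve_request`) as well as the point-in-bounding-box job are hypothetical. The units exchange data only through messages, never through shared variables.

```python
import queue
import threading


def processing_unit(inbox, outbox):
    """Worker: counts the points of each received chunk that fall in a bounding box."""
    while True:
        message = inbox.get()
        if message is None:            # shutdown message sent by the server
            break
        chunk_id, points, bbox = message
        xmin, ymin, xmax, ymax = bbox
        count = sum(1 for x, y in points
                    if xmin <= x <= xmax and ymin <= y <= ymax)
        outbox.put((chunk_id, count))  # reply with a partial result


def serve_request(points, bbox, n_units=3):
    """Server: splits a client request, distributes it and gathers the results."""
    inboxes = [queue.Queue() for _ in range(n_units)]
    outbox = queue.Queue()
    units = [threading.Thread(target=processing_unit, args=(inbox, outbox))
             for inbox in inboxes]
    for unit in units:
        unit.start()

    # Scatter: every unit receives one slice of the input as a message.
    for chunk_id, inbox in enumerate(inboxes):
        inbox.put((chunk_id, points[chunk_id::n_units], bbox))

    # Gather: collect exactly one partial result per unit.
    partial = dict(outbox.get() for _ in range(n_units))

    for inbox in inboxes:
        inbox.put(None)
    for unit in units:
        unit.join()
    return sum(partial.values())


# Client side: submit a job and receive the combined answer.
points = [(0.5, 0.5), (2.0, 2.0), (0.1, 0.9), (5.0, 0.2), (0.7, 0.3)]
total = serve_request(points, bbox=(0.0, 0.0, 1.0, 1.0))
```

In a real deployment the queues would be replaced by network sockets or a message broker, and the server would also have to handle the independent failure of units, which this sketch ignores.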

3.2 Distributed geospatial computing