
The following Python snippet opens the big_lidar database, puts a LiDAR file into GridFS, and inserts the corresponding tile document:

from pymongo import MongoClient
import gridfs

# open mongodb big_lidar database
client = MongoClient()
db = client.test_database
big_lidar = db.big_lidar

# add file to GridFS
file = open(file_name, 'rb')
gfs = gridfs.GridFS(db)
gf = gfs.put(file, filename=file_name)
file.close()

# add one tile to big_lidar database
tile = {'type': s, 'version': v, 'id': i, 'date': d,
        'loc': loc, 'filename': file_name, 'gridfs_id': gf}
big_lidar.insert(tile)

Figure 4 shows a graphical result of the operation: a visualization of all stored tiles' bounding boxes on top of a map in the desktop GIS system Quantum GIS. Since the bounding boxes are stored as BSON objects, it is straightforward to export them as GeoJSON files. A simple status report and an aggregation in the MongoDB console confirm the successful storage of 1351 tiles and a total of 447,728,000,747 points in the database:

> db.big_lidar.stats()
{ "ns" : "AHN2.big_lidar", "count" : 1351, "size" : 670096,
  "avgObjSize" : 496, "storageSize" : 696320, ... }
> db.big_lidar.aggregate([{ $group : { _id: null, total: { $sum : "$num_points" } } }])
{ "_id" : null, "total" : NumberLong(447728000747) }
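The tile documents store the bounding box in the loc field as a GeoJSON polygon. As a minimal sketch of how such a polygon could be assembled from a tile's extent (the helper name and the min/max lon/lat arguments are our own illustration, not from the paper):

```python
def bounding_polygon(min_lon, min_lat, max_lon, max_lat):
    """Return a closed GeoJSON Polygon covering a tile's bounding box."""
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # GeoJSON requires the ring to be closed
    ]
    return {"type": "Polygon", "coordinates": [ring]}

# Example extent in lon/lat, roughly matching the AHN2 area used later
loc = bounding_polygon(5.1, 53.3, 5.5, 53.5)
```

Because the result is a plain dictionary, it can be stored directly as the loc attribute of a tile document and later exported as GeoJSON.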

6. SPATIAL QUERY

MongoDB, like any NoSQL database, allows queries on the attributes of a document. When an index is created on an attribute, queries on it are accelerated. As a specialty, MongoDB also supports spatial indices and spatial queries. Hence we can perform spatial queries on the bounding boxes of the LiDAR tiles.

Figure 4: Bounding polygons of the tiles of the AHN2 dataset displayed over a map.

This contribution has been peer-reviewed. Editors: M. Brédif, G. Patanè, and T. Dokken. doi:10.5194/isprsarchives-XL-3-W3-577-2015

We show a very simple example of an interactive query on the Python console. The query uses a polygon to describe the search area. The database returns the tile metadata, including the GridFS index. Using the GridFS index, the actual point cloud data can be read from the database. It is simply the LiDAR point cloud file that was previously put into the database; it is important to note that no conversion or alteration of the files was done.

tiles = list(big_lidar.find({
    'loc': {'$geoWithin': {'$geometry': {
        'type': 'Polygon',
        'coordinates': [[[5.1, 53.3], [5.5, 53.3], [5.5, 53.5],
                         [5.1, 53.5], [5.1, 53.3]]]}}}}))
len(tiles)
4
tiles[0]['filename']
u'g01cz1.laz'
gf_id = tiles[0]['gridfs_id']
gf = gfs.get(gf_id)
las_file = open('export_' + file_name, 'wb')
las_file.write(gf.read())
las_file.close()

In Figure 5 we give a visual representation of the query operation. Four tiles were retrieved by the spatial query. In the example we attach the filenames as attributes and plot the bounding boxes over a base map, using the filenames as labels. The leftmost tile corresponds to the point cloud visualized in Figure 1.
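The $geoWithin query above can also be assembled programmatically. A small sketch, assuming the same loc field layout as in the tile documents; the helper name is our own, and the commented create_index call shows the 2dsphere index that pymongo would use to accelerate such queries:

```python
def geo_within_query(points):
    """Build a MongoDB $geoWithin query on 'loc' from a lon/lat ring."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring, as GeoJSON requires
    return {"loc": {"$geoWithin": {"$geometry": {
        "type": "Polygon", "coordinates": [ring]}}}}

query = geo_within_query([[5.1, 53.3], [5.5, 53.3],
                          [5.5, 53.5], [5.1, 53.5]])
# With a live connection, the query would be issued as:
#   big_lidar.create_index([("loc", "2dsphere")])  # spatial index on loc
#   tiles = list(big_lidar.find(query))
```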

7. TIMING EXPERIMENTS

As we move file-centric storage from the operating system's file system to a database, the performance of the transfer operations with respect to the time they need to complete is of interest. We have therefore performed some timing experiments for simple storage and retrieval of a single tile of about 40 MB. The experiments were performed on Amazon EC2, a well-known high-performance computing environment (Akioka and Muraoka, 2010). We used an EC2 small instance in all experiments. We separated local transfer from remote transfer. Local transfer is between the operating system's file system and the database on the same machine. Remote transfer occurs between the EC2 instance and a machine outside the Amazon cluster, connected via the internet. To better assess the measured times, we give comparisons to alternative transfer methods. We compare local transfer to the timing measured for a direct file system transfer (file copy). Remote transfer is compared to transfer with Dropbox, a leading personal cloud storage service (Drago et al., 2012). Figure 6 shows the results of local transfer. We can see that there is some overhead compared to a file system copy. This is particularly the case for storing files (put), less so for retrieving them (get). Figure 7 shows the timings for remote transfer. The database performs at least on par with a common cloud storage system. This obviously depends on the internet connection; all experiments were performed on the same computers at nearly the same time.

Figure 6: Transfer times for local transfer of a ~40 MB tile, in seconds: OS file copy 0.036, GridFS put 1.515, GridFS get 0.268.

Figure 7: Transfer times for remote transfer of a ~40 MB tile, in seconds: GridFS upload 95.8, Dropbox upload 116.1, GridFS download 6.2, Dropbox download 13.1.

Figure 5: Example result of a spatial query visualized using QGIS. In the top image the red box represents the search polygon.
The bottom image shows the tiles that are completely within the search polygon. The MongoDB spatial query delivered four tiles on the coast of the Netherlands.
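The local timing experiment can be reproduced with a minimal harness. This sketch times an OS file copy of a small dummy payload (1 MB instead of the 40 MB LAZ tile); in the actual experiments, the same wrapper would surround the gfs.put and gfs.get calls from the earlier listings:

```python
import os
import tempfile
import time

def time_transfer(action):
    """Return the wall-clock time in seconds taken by one transfer."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start

# Stand-in for the experiment: copy a dummy tile via the file system.
tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "tile.laz")
dst_path = os.path.join(tmpdir, "copy.laz")
payload = b"\0" * (1 << 20)  # 1 MB dummy payload
with open(src_path, "wb") as f:
    f.write(payload)

def copy_tile():
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read())

elapsed = time_transfer(copy_tile)
```

For the GridFS measurements, the action passed to time_transfer would be a closure over gfs.put(...) or gfs.get(...), so that exactly the transfer itself is timed.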

8. CONCLUSIONS