# open the mongodb big_lidar database
from pymongo import MongoClient
import gridfs
client = MongoClient()
db = client.test_database
big_lidar = db.big_lidar
# add the file to GridFS
file = open(file_name, "rb")
gfs = gridfs.GridFS(db)
gf = gfs.put(file, filename=file_name)
file.close()
# add one tile to the big_lidar database
tile = {"type": s, "version": v, "id": i, "date": d, "loc": loc,
        "filename": file_name, "gridfs_id": gf}
big_lidar.insert(tile)
Figure 4 shows a graphical result of the operation: a visualization of all stored tiles' bounding boxes on top of a map, using the desktop GIS system Quantum GIS. Since the bounding boxes are stored as BSON objects, it is straightforward to export them as GeoJSON files.
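Because the loc attribute already holds a GeoJSON geometry, such an export can be scripted in a few lines. The following is a minimal sketch, assuming the big_lidar collection object from above; the output file name bounding_boxes.geojson is chosen here only for illustration.

# export all stored bounding polygons as a GeoJSON FeatureCollection (sketch)
import json
features = []
for tile in big_lidar.find({}, {"loc": 1, "filename": 1}):
    features.append({"type": "Feature",
                     "geometry": tile["loc"],   # already a GeoJSON polygon
                     "properties": {"filename": tile["filename"]}})
with open("bounding_boxes.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)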
We show a simple status report and an aggregation on the database in the MongoDB console. This confirms the successful
storage of 1351 tiles and a total of 447728000747 points in the database.
> db.big_lidar.stats()
{
    "ns" : "AHN2.big_lidar",
    "count" : 1351,
    "size" : 670096,
    "avgObjSize" : 496,
    "storageSize" : 696320,
    ...
}
> db.big_lidar.aggregate([ { $group : { _id: null,
    total: { $sum : "$num_points" } } } ])
{ "_id" : null, "total" : NumberLong(447728000747) }
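The same aggregation can also be issued from Python. The following is a minimal sketch, assuming the big_lidar collection object from above and a pymongo version whose aggregate call returns a cursor.

# sum the num_points attribute over all tiles (sketch)
pipeline = [{"$group": {"_id": None, "total": {"$sum": "$num_points"}}}]
total = list(big_lidar.aggregate(pipeline))[0]["total"]
print(total)   # total number of points, as reported above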
6. SPATIAL QUERY
MongoDB, like any NoSQL database, allows for queries on the attributes of a document. When an index is created on a certain attribute, queries are accelerated. As a specialty, MongoDB also supports spatial indices and spatial queries.
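As a minimal sketch, a 2dsphere index on the loc attribute can be created from pymongo as follows (assuming the big_lidar collection object from Section 5):

# create a spherical geospatial index on the bounding polygons (sketch)
import pymongo
big_lidar.create_index([("loc", pymongo.GEOSPHERE)])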
Figure 4: Bounding Polygons of the tiles of the AHN2 dataset displayed over a map.
Hence we can perform spatial queries on the bounding boxes of the LiDAR tiles. We show a simple example of an interactive query on the Python console. The query uses a polygon to describe the search area. The database returns the tile metadata, including the GridFS index. Using the GridFS index, the actual point cloud data can be read from the database. It is simply the LiDAR point cloud file that was previously put into the database; it is important to note that no conversion or alteration of the files was done.
>>> tiles = list(big_lidar.find({"loc": {"$geoWithin": {"$geometry":
...     {"type": "Polygon", "coordinates": [[[5.1, 53.3], [5.5, 53.3],
...     [5.5, 53.5], [5.1, 53.5], [5.1, 53.3]]]}}}}))
>>> len(tiles)
4
>>> tiles[0]["filename"]
u'g01cz1.laz'
>>> gf_id = tiles[0]["gridfs_id"]
>>> gf = gfs.get(gf_id)
>>> las_file = open("export_" + file_name, "wb")
>>> las_file.write(gf.read())
>>> las_file.close()
In Figure 5 we give a visual representation of the query operation. Four tiles were retrieved by the spatial query. In the example we attach the filenames as attributes and plot the bounding boxes over a base map, using the filenames as labels. The leftmost tile corresponds to the point cloud visualized in Figure 1.
7. TIMING EXPERIMENTS
As we move file-centric storage from the operating system's file system to a database, the time the transfer operations need to complete is of interest. We have therefore performed some timing experiments for simple storage and retrieval of a single tile of about 40 MB. The experiments were performed on Amazon EC2, a well-known high-performance computing environment (Akioka and Muraoka, 2010). We used an EC2 small instance in all experiments. We separated local transfer from remote transfer.
Local transfer is between the operating system's file system and the database on the same machine. Remote transfer occurs between the EC2 instance and a machine outside the Amazon cluster, connected via the Internet. To better assess the measured times, we give comparisons to alternative transfer methods. We compare local transfer to the timing measured for a direct file system transfer (file copy). Remote transfer is compared to transfer with Dropbox, a leading personal cloud storage service (Drago et al., 2012).
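A single put/get measurement of the kind used in these experiments can be scripted as in the following sketch; the tile file name is illustrative, and gfs is the GridFS handle from Section 5.

# time storing and retrieving a single ~40 MB tile via GridFS (sketch)
import time
start = time.time()
with open("tile.laz", "rb") as f:          # illustrative file name
    gf_id = gfs.put(f, filename="tile.laz")
print("put: %.3f s" % (time.time() - start))
start = time.time()
data = gfs.get(gf_id).read()
print("get: %.3f s" % (time.time() - start))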
Figure 6 shows the results of local transfer. We can see that there is some overhead compared to a file system copy. This is particularly the case for storing files (put), less so for retrieving them (get). Figure 7 shows the timings for remote transfer. The database performs at least on par with a common cloud storage system; this obviously depends on the Internet connection. All experiments were performed on the same computers at nearly the same time.
Figure 6: Transfer times for local transfer of a ~40 MB tile, in seconds (OS file copy: 0.036; GridFS put: 1.515; GridFS get: 0.268).
Figure 7: Transfer times for remote transfer of a ~40 MB tile, in seconds (GridFS upload: 95.8; Dropbox upload: 116.1; GridFS download: 6.2; Dropbox download: 13.1).
Figure 5: Example result of a spatial query visualized using QGIS. In the top image the red box represents the search
polygon. The bottom image shows the tiles that are completely within the search polygon. The MongoDB spatial
query delivered four tiles on the coast of the Netherlands.
8. CONCLUSIONS