
SANY D2.3.4 Specification of the Sensor Service Architecture V3 Doc.V3.1 Copyright © 2007-2009 SANY Consortium Page 207 of 233

- Different data providers may implement different data models for their Sensor Observation Services because their requirements and/or intentions differ. While this is still compliant with the SensorSA, it may impose an additional burden on applications that have to use those different data sources.
- Data models used internally by an organisation may not be feasible or appropriate for publication, or for making the data available for a specific purpose.
- Organisations may need to provide an aggregated view of data collected by different providers, e.g. for implementing federated data pools.

Cascading SOS can be used to facilitate cleaner and more robust implementation architectures in these cases. The intermediate SOS server provides a single interface to all the underlying data sources. This results in a clean separation between data access and processing on the client side, and the aggregation, transformation and/or filtering of the data required for a specific purpose, which takes place in the intermediate SOS.
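As a rough illustration of the aggregation and transformation role of the intermediate SOS, the following Python sketch maps records from two hypothetical providers with different data models onto one common observation model and exposes them through a single query method. The provider record formats, field names, the `CascadingSOS` class and its `get_observations` method are assumptions made for this example only; no actual SOS operations (such as GetObservation requests or O&M encoding) are modelled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """Common observation model exposed by the intermediate (cascading) SOS."""
    procedure: str
    observed_property: str
    feature_of_interest: str
    sampling_time: datetime
    value: float


def adapt_provider_a(record):
    """Hypothetical provider A: dicts with its own field names, temperature in kelvin."""
    return Observation(
        procedure=record["sensor"],
        observed_property="air_temperature",
        feature_of_interest=record["station"],
        sampling_time=datetime.fromisoformat(record["time"]),
        value=record["temp_k"] - 273.15,  # convert kelvin to degrees Celsius
    )


def adapt_provider_b(record):
    """Hypothetical provider B: tuples of (station, sensor, ISO time, degrees Celsius)."""
    station, sensor, iso_time, celsius = record
    return Observation(sensor, "air_temperature", station,
                       datetime.fromisoformat(iso_time), celsius)


class CascadingSOS:
    """Single access point over several sources, each with its own adapter."""

    def __init__(self):
        self._sources = []  # list of (fetch, adapt) pairs

    def register(self, fetch, adapt):
        """fetch() returns raw records; adapt(record) maps one record to an Observation."""
        self._sources.append((fetch, adapt))

    def get_observations(self, observed_property):
        """Aggregate all sources, transform to the common model, then filter."""
        result = []
        for fetch, adapt in self._sources:
            for record in fetch():
                obs = adapt(record)
                if obs.observed_property == observed_property:
                    result.append(obs)
        return sorted(result, key=lambda o: o.sampling_time)
```

Adding a further provider only requires a new adapter function; client applications keep calling `get_observations` unchanged, which is the decoupling between client-side processing and source-specific data models described above.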

10.8.3 Data pre-processing

In a sensor network, data processing occurs on various occasions. The classical use case is pulling a data set from a service, processing it as required for the application scenario, and possibly storing the result somewhere. This use case is described in detail in section 10.9. However, for some common, more lightweight data processing tasks, the application scenario can be optimized by processing the data on the fly when they are accessed. In such a scenario a cascading SOS acts as a service that provides access to derived data, so that the client does not first have to fetch all of the source data and apply the calculations itself. While this is not feasible for all types of data processing operations (e.g. lengthy calculations), it simplifies application architectures where it can be applied. A typical scenario is the calculation of mean values for time series data. While the sensors may deliver, for example, half-hour mean values, an application may require daily mean values for its operation. This can be solved by a cascading SOS that calculates the daily mean values on the fly, using the half-hour mean values as the data source.
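The daily-mean example can be sketched as follows. This is a minimal Python illustration, not part of the SensorSA specification: the function name `daily_means`, the input as `(timestamp, value)` pairs and the rule of skipping incomplete days are assumptions. Averaging the half-hour means yields the true daily mean only when all intervals have equal length and the day is complete, which is why the sketch requires 48 values per day by default.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import fmean
from typing import Iterable


def daily_means(half_hour_means: Iterable[tuple[datetime, float]],
                expected_per_day: int = 48) -> dict[date, float]:
    """Derive daily mean values from half-hour mean values.

    Days are taken from each timestamp's own calendar date (no time zone
    handling). Days with fewer than `expected_per_day` values are skipped,
    because a mean over an incomplete day could be biased.
    """
    by_day = defaultdict(list)
    for timestamp, value in half_hour_means:
        by_day[timestamp.date()].append(value)
    # With equal-length intervals, the mean of the interval means is the daily mean.
    return {
        day: fmean(values)
        for day, values in sorted(by_day.items())
        if len(values) >= expected_per_day
    }
```

In a cascading SOS, a function like this would run when a client requests the derived daily series, with the half-hour values fetched from the source SOS as input.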

10.8.4 Multi-level sensor data storage

Some of the scenarios described in section 4.5 include SOS interfaces provided directly by the sensors, or by data loggers connected to the sensors. These devices are typically located in remote places near where the observations are taken, not in a typical data centre environment. If applications (e.g. GUI clients or data processing applications) accessed these devices directly, it would be very hard to meet those applications' requirements regarding availability, fault tolerance, performance, etc. In addition, such devices usually have tight storage constraints, which makes long-term storage of observations problematic. To remedy these problems a cascading SOS can be used, as illustrated in Figure 10-25. It fetches and stores the data provided by the sensors or data loggers, and all client applications access this service instead of accessing the sensors directly. The cascading SOS can be located in a data centre, where it is much easier to meet availability and performance requirements. Long-term data storage is also easier to implement in that scenario.

Figure 10-25: Multi-level sensor data storage (diagram elements: Cascading SOS, User Application)

10.8.5 Caching of data

In most of the previously described scenarios, caching the data from the source SOS at the cascading SOS is either a primary aspect or at least a “nice to have” feature. Different approaches can be taken for this caching process. Depending on the requirements of a specific application, each approach has its benefits and weaknesses, or may not be applicable at all. A caching approach can be broken down into a few aspects of its operation, which are described in the following.
The first distinction can be made on the source of the event that triggers the re-fetching of the data from the source SOS:

- Data retrieval from the source SOS can be triggered by the request that the client makes to the cascading SOS. At this moment, the cascading SOS has to decide whether the data available in its cache is valid. If it is invalid, the data has to be updated by reloading it from the source SOS.
- The re-fetching can be triggered by the source SOS itself. By using event-based interaction patterns (see section 6.3.3), the source SOS can notify the cascading SOS of new or updated data. The new data values can be included in the event notification itself, or the cascading SOS may fetch the data from the source SOS using conventional SOS operations in response to the event.
- The re-fetching can be triggered by events that depend on neither the source nor the client. An example would be a time schedule that controls when data is fetched from the source SOS servers.

Another classification can be made on how to determine whether data in the cache is still current, i.e. whether the cache is still valid:

- The cascading SOS can ask the source SOS whether the data is still current. The SOS specification currently provides no operations or metadata to support this approach in a generic way, but it can be realised using the O&M and SOS specifications if both the source SOS and the cascading SOS agree on a common way of implementing it.
- The data in the source SOS can contain information on whether it is current. Depending on the granularity required, this can either be encoded in the SensorML description of a procedure, if it remains the same for all observations made using this procedure, or be encoded using O&M together with the data values, if each observation can have different constraints for determining whether it is still current.
- It may be determined at the level of the cascading SOS implementation. If the source SOS provides no information at all about how long its observations remain current, it may, depending on the application scenario, be possible to define this at the level of the cascading SOS itself.

To update data in its cache, a cascading SOS has to identify each observation. Since the current O&M and SOS specifications do not provide a generic identifier that can be used for this purpose, a work-around currently has to be implemented. The implementation approach again depends on the application scenario. An example of such a solution would be an artificial unique key that identifies an observation, e.g. consisting of the result time, the sampling time, and the identifiers of the feature of interest, observed property and procedure. In many sensor network scenarios this may be sufficient to identify an observation for caching purposes.

Another important aspect that has to be handled is the deletion of data. Although in many scientific applications data is not deleted but instead archived and “logically” replaced by “newer” values, there may be applications that require the ability to delete an observation. The current SOS specification does not support this, so work-around solutions again have to be implemented.
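A minimal Python sketch of a request-triggered cache that combines several of the options above: observations are identified by the artificial composite key, validity is decided at the level of the cascading SOS through a fixed time-to-live, and an `invalidate` hook stands in for event-based notifications from the source SOS. The `fetch` callable, the TTL policy and the convention that a missing value at the source means the observation was deleted are hypothetical choices for this illustration; none of them is defined by the O&M or SOS specifications.

```python
import time
from dataclasses import dataclass
from typing import Callable, NamedTuple, Optional


class ObservationKey(NamedTuple):
    """Artificial unique key for an observation (a work-around, not an O&M identifier)."""
    result_time: str
    sampling_time: str
    feature_of_interest: str
    observed_property: str
    procedure: str


@dataclass
class CacheEntry:
    value: float
    fetched_at: float  # clock reading when the value was loaded from the source SOS


class ObservationCache:
    """Request-triggered cache whose validity rule (a TTL) lives in the cascading SOS."""

    def __init__(self, fetch: Callable[[ObservationKey], Optional[float]],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical call to the source SOS; None = not found
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # ObservationKey -> CacheEntry

    def get(self, key: ObservationKey) -> Optional[float]:
        """Serve from the cache if still valid, otherwise reload from the source SOS."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry.fetched_at < self._ttl:
            return entry.value
        value = self._fetch(key)
        if value is None:
            # Hypothetical deletion work-around: the source no longer returns
            # the observation, so the cached copy is dropped as well.
            self._entries.pop(key, None)
            return None
        self._entries[key] = CacheEntry(value, now)
        return value

    def invalidate(self, key: ObservationKey) -> None:
        """Hook for an event notification from the source SOS: forces a reload."""
        self._entries.pop(key, None)
```

Passing the `clock` function explicitly keeps the expiry logic testable; a schedule-triggered variant would simply call `get` or `invalidate` from a timer instead of from client requests.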

10.8.6 Event-based interaction in cascaded scenarios