
- Sensor Available Event: depending on the implementation of a cascading SOS, it may automatically include new sensors when they become available. In this case the cascading SOS has to act on the "Sensor Available" event in the sensor network and add the new sensor to its configuration.
- Sensor Unavailable Event: some of the scenarios for cascading SOSes require that data in the cascading SOS remains available even if the source SOS is no longer active.
- Sensor Properties Changed Event: this requires the cascading SOS to update its metadata accordingly.
- New Sensor Data Event: depending on the replication strategy implemented in a cascading SOS as described before, this event may force an update of the data in the cache of a cascading SOS.

A possible dispatch of these events in a cascading SOS is sketched below.
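The following is a minimal Python sketch of how a cascading SOS implementation might react to these four events. The event type names, the handler, and the methods on the SOS object are all hypothetical, since the SensorSA does not prescribe an implementation interface.

def on_sensor_event(sos, event):
    # Hypothetical dispatcher for the sensor network events listed above.
    if event.type == "SensorAvailable":
        sos.add_sensor(event.sensor_id)        # extend the SOS configuration
    elif event.type == "SensorUnavailable":
        pass                                   # cached data stays available
    elif event.type == "SensorPropertiesChanged":
        sos.update_metadata(event.sensor_id)   # keep metadata consistent
    elif event.type == "NewSensorData":
        sos.refresh_cache(event.sensor_id)     # per the replication strategy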

10.9. Processing and Fusion Support

10.9.1 Processing Chains

10.9.1.1 Introduction

Processing in general, and fusion in particular, often follows a multi-step pattern. First, the input data to be processed or fused must be discovered using meta-information that characterises these data and that is compatible with the processing algorithm to be used. Then, the input data must be fetched from different places using the appropriate access methods and protocols. Next, the fetched input data must often be pre-processed to deal with the unit and format conversions needed to match the inputs expected by the processing algorithm. At this point, the processing per se can be performed and outputs are produced. Those outputs must often be post-processed, again to deal with unit and format conversion, before the processing results are stored. Then, the converted output data must be stored in various places using the appropriate access methods and protocols. Finally, data rendering could be performed in preparation for later visualisation. In the SensorSA, this multi-step processing pattern is supported by a service processing chain, which is itself exposed as a service. The sketch below summarises the pattern.
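The following Python sketch merely summarises the steps above; every name in it (the catalogue, source and sink objects, the conversion functions, the algorithm) is a hypothetical stand-in, since the SensorSA realises each step as a call to a service rather than as a local function.

def run_processing_chain(catalogue, sources, sinks, pre, algorithm, post, render):
    # 1. Discover input data via meta-information held in a catalogue.
    datasets = catalogue.find_datasets(compatible_with=algorithm)
    # 2. Fetch the input data from the services holding them (e.g. SOS).
    fetched = [source.fetch(ds) for source, ds in zip(sources, datasets)]
    # 3. Pre-process: unit/format conversion to match the algorithm's inputs.
    inputs = [pre(data) for data in fetched]
    # 4. Perform the processing per se.
    raw_outputs = algorithm(inputs)
    # 5. Post-process: unit/format conversion of the outputs.
    outputs = [post(data) for data in raw_outputs]
    # 6. Store the converted outputs in the target services (e.g. SOS).
    for sink, data in zip(sinks, outputs):
        sink.store(data)
    # 7. Optionally render the results in preparation for visualisation.
    return render(outputs)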

10.9.1.2 Processing Chain Service

Referring to the processing flow illustrated in Figure 10-26, the main process has three inputs and two outputs. The input data are fetched from three Sensor Observation Service (SOS) servers and the processing results are stored in two SOS servers. All of the processing, i.e. pre-processing, main processing, and post-processing, is performed by instances of the Processing Service (see section 8.4.2).

Figure 10-26: Processing Flow (data fetched from n SOS servers is pre-processed, processed and post-processed by WPS instances, and the results are stored in m SOS servers)

The multi-step processing pattern described above can be implemented by an instance of a Processing Service (PS), called "processing chain" in Figure 10-27. To a client, the processing chain exhibits a PS interface (front-end interface). As a back-end interface, it uses a number of other services in order to execute the processing chain:

- The discovery of input data can be accomplished using a catalogue service (see section 8.2).
- The input data fetching and output data storing can be accomplished using a Sensor Observation Service (see section 8.2.2), a Feature Access Service (see Table 8-11) or an FTP service.
- The input data pre-processing and output data post-processing can be done using a Processing Service (see section 8.4.2).
- Finally, the data rendering could be achieved using a Map and Diagram Service (see section 8.4.2), e.g. for the generation of isolines/contours.

The processing chain is opaque, i.e. not modifiable by the user, and is likely to be implemented using BPEL. Whenever possible, i.e. mainly for data pre-processing and data post-processing, parallel execution is performed using the BPEL flow activity (a local analogue is sketched after Figure 10-27). This approach is expected to cover a wide range of processing needs with only moderate modifications to the BPEL source code. All the inputs needed to access the individual services composing the chain must be provided as input to the processing chain.

Temporary storage (e.g. an FTP server) is needed in order to store intermediate results that are passed by reference from one service to another. If each PS instance has its own FTP server to store its outputs, then the number of data transfers across the Internet can be reduced to a minimum, but cannot be eliminated. Nevertheless, the Processing Chain Service (PCS) must provide its own FTP server to store the outputs of the Processing Services that do not support stored outputs, and to store its execute response, which can be updated to provide process execution status information (e.g. percentage completion). Storing the execute response is the WPS mechanism for implementing asynchronous process execution. To avoid running out of file storage space, some form of garbage collection must be implemented on the FTP servers of the underlying Processing Service instances. For example, all output files older than a pre-defined age (e.g. 1 day) could be removed on a regular basis, as sketched after Figure 10-27.

Figure 10-27: Processing Chain as an Instance of a Processing Service
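The BPEL flow activity fans out the pre-processing (or post-processing) of independent inputs concurrently. A minimal local analogue in Python, where the pre-processing function and the fetched input objects are hypothetical stand-ins for Processing Service calls:

from concurrent.futures import ThreadPoolExecutor

def pre_process_all(pre_process, fetched_inputs):
    # Apply the (hypothetical) pre-processing step to every fetched input
    # in parallel, analogous to a BPEL flow activity fanning out WPS calls.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(pre_process, fetched_inputs))

For the garbage collection mentioned above, a minimal sketch follows, assuming the intermediate results are visible to the service as files in a local directory exposed by its FTP server; the directory path and the one-day threshold are illustrative only.

import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60            # pre-defined age: 1 day (example)
OUTPUT_DIR = Path("/srv/ftp/ps-outputs")  # hypothetical FTP-served directory

def collect_garbage():
    # Remove intermediate result files older than the pre-defined age;
    # intended to run on a regular basis (e.g. from a scheduler).
    now = time.time()
    for f in OUTPUT_DIR.iterdir():
        if f.is_file() and now - f.stat().st_mtime > MAX_AGE_SECONDS:
            f.unlink()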
10.9.1.3 Advanced Topics for Processing Chains

10.9.1.3.1. Continuous Feeding

One particular case of a processing chain arises when the input is a continuous flow of data (e.g. temporal fusion). In this case, the data fetching step must be repeated on a regular basis and the complete chain must be executed each time, producing new incremental results. This cyclic execution of the processing steps can be handled by the processing chain itself, but the cycle period and the stop condition (e.g. total number of cycles or total duration) must be provided in the processing chain Execute request as additional input parameters.

The cyclic execution of the processing chain assumes that all the services in the chain are able to operate incrementally, i.e. using only the data fetched in the current processing cycle. It also assumes that all the processing can be completed during the cycle period, i.e. before a new cycle begins. However, there are cases where the main data processing step is stateful or simply requires data that was acquired in previous cycles. The main data processing step may then have to be designed to support incremental execution. This means that the service hosting this data processing must be able to create, save, and restore the context (algorithm state, data cache, etc.) needed to relate successive executions. A context identifier must therefore be assigned by the data processing service in the first cycle and be provided by the processing chain as an additional parameter in the Execute request of the following cycles. Also, in order to initiate the data processing, the first cycle may require a much larger amount of data fetching (e.g. historical data). Finally, it is up to the data processing service to decide if it must cache data provided in previous processing cycles (e.g. for algorithm tuning or retraining).

Regarding the data fetching step, the SOS specification defines an optional GetResult operation (see section 8.2.2) that could be of interest in this continuous feeding use case, if supported by the SOS instances providing the data. It could be used in all processing cycles except probably the first one. The cyclic execution and context handling are sketched below.
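A minimal Python sketch of the cyclic execution, assuming a hypothetical chain object whose execute method wraps the Execute request; the cycle period and the stop condition (here a total number of cycles) would arrive as additional input parameters of the chain's own Execute request.

import time

def run_continuous_chain(chain, cycle_period_s, total_cycles):
    context_id = None  # assigned by the data processing service in cycle 1
    for cycle in range(total_cycles):  # stop condition: total number of cycles
        started = time.time()
        # The first cycle may fetch a much larger amount of data
        # (e.g. historical data); later cycles operate incrementally.
        response = chain.execute(
            first_cycle=(cycle == 0),
            context_id=context_id,  # relates successive executions
        )
        context_id = response.context_id  # returned by the first cycle
        # Sleep for the remainder of the cycle period; the sketch assumes
        # processing completes before the next cycle begins.
        time.sleep(max(0.0, cycle_period_s - (time.time() - started)))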
10.9.1.3.2. Event-Triggered Processing

It could be of interest to trigger the execution of the processing chain upon reception of a particular event. The processing chain is armed by the Execute operation, but only really starts when the event is actually received. Depending on the option chosen, once the execution is complete the processing chain could automatically re-arm itself or require a new Execute operation. The information needed to define the triggering event (e.g. topic) and the stop condition (e.g. event count or event topic) must be provided in the processing chain Execute request as additional input parameters.

Although it is easy to imagine such an event-triggered processing chain, it is actually not straightforward to implement. The natural and most efficient approach would be to have the processing chain passively waiting for the event to be pushed by the event producer. This means that the event must be addressed and delivered to a particular instance of the processing chain, which cannot be done without support from the BPEL environment hosting the processing chain. The BPEL engine may support the invocation of an asynchronous service, where the service is able to call back the instance of the BPEL workflow that made the service invocation. In this case, using WS-Addressing information in the SOAP header of the service request (e.g. ReplyTo and MessageId elements) and of the callback request (e.g. RelatesTo element), the BPEL engine is able to find the target workflow instance. However, if for example WS-Notification is used by the processing chain to receive event notifications, the Notification Producer or Notification Broker will not provide the correlation information needed by the BPEL engine to find the particular instance of the processing chain.

A workable but less efficient approach would be to have the processing chain actively polling for the availability of events and pulling them from a pull point, as illustrated in the figure below. The polling interval could be specified in the processing chain Execute request as an additional input parameter. The processing chain first requests the creation of a pull point from a PullPoint Factory. Then, the processing chain subscribes to a Notification Producer or Notification Broker and provides the topics of interest as well as the endpoint reference of the newly created pull point. Next, the notifications generated by the Notification Producer or Notification Broker are pushed to the pull point and can be retrieved (pulled) by the processing chain at polling time. By specifying the number of notification messages in the GetMessages operation, the processing chain may decide to pull one notification at a time, i.e. execute the complete processing chain for each notification. Alternatively, by not specifying the number of notification messages in the GetMessages operation, the processing chain may flush all the notification messages accumulated by the pull point during the polling period.

Figure 10-28: Reception of Notifications by Processing Chain Instance (1: CreatePullPoint to the PullPoint Factory; 2: Subscribe to the Notification Broker; 3: Notify pushed to the PullPoint; 4: GetMessages; 5: Unsubscribe; 6: DestroyPullPoint)

When the processing chain stop condition has been reached (e.g. event count or particular event topic), the processing chain must unsubscribe from the Notification Producer or Notification Broker and then destroy the pull point. This polling pattern is sketched below.
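A minimal Python sketch of the polling pattern in Figure 10-28; the factory, broker and pull point objects are hypothetical client-side wrappers around the WS-Notification operations named in the figure.

import time

def run_event_triggered_chain(factory, broker, topics, polling_interval_s,
                              max_events, execute_chain):
    pull_point = factory.create_pull_point()           # 1: CreatePullPoint
    subscription = broker.subscribe(                   # 2: Subscribe
        consumer=pull_point.endpoint_reference, topics=topics)
    received = 0
    while received < max_events:        # stop condition: event count
        time.sleep(polling_interval_s)  # poll at the requested interval
        # 4: GetMessages; asking for one message at a time executes the
        # complete chain per notification, whereas omitting the count would
        # flush everything accumulated during the polling period.
        # (3: Notify is performed by the broker pushing into the pull point.)
        for notification in pull_point.get_messages(count=1):
            received += 1
            execute_chain(notification)
    subscription.unsubscribe()                         # 5: Unsubscribe
    pull_point.destroy()                               # 6: DestroyPullPoint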

10.9.1.3.3. Discovery

Meta-information is needed to discover input data that can be used for a processing chain, i.e. data that are compatible with the processing algorithm used in the chain. Another approach could be to discover the Web Processing Service (WPS) to call as part of the processing chain so as to match the available input data. This requires that appropriate meta-information about the Processing Service be available in some catalogue. One possible way of providing information about the Processing Service is to describe it using SensorML (Botts, 2005). The catalogue service (see section 8.4.1) offers a broker mechanism that could be used to link processes/tasks of fusion services to data sets. These links could be established manually or automatically during harvesting. The processing chain could use this mechanism to discover (from service to data set, or from data set to service) the compatibility between fusion services and fusion data sets.

10.9.2 Uncertainty Handling in Processing Chains