
Copyright © 2010 Open Geospatial Consortium

inputs needed to retrieve the Rule as well as a description of the rule and its assumptions.

9.2.3 Authoritative Data Source Directory

Catalog, or registry, services that provide sophisticated capabilities to discover, organize and access relevant data sources are a key part of supporting feature and decision fusion. One currently popular term for this kind of information service is an Authoritative Data Source. An Authoritative Data Source Directory (ADSD) was investigated in the OWS-7 FDF Thread (OGC 10-086). An ADSD is a resource capable of organizing and discovering a wide variety of data types, such as web sites, books and pictures/images, as well as available web services. The directory is intended to identify and query for data sources based on socio-cultural themes, geographic area (either coordinates or a geographic name), temporal relevance, and data quality (e.g., precision, fitness for use). The ADSD concept was implemented as an OGC Catalog Service supporting all interfaces of CSW 2.0.2 plus extensions developed to support ADSD. OpenSearch was adopted as a metaphor for the extensions.
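The query dimensions listed above (theme, geographic area, temporal relevance, data quality) map naturally onto an OpenSearch-style request. The following Python sketch builds such a request URL. The endpoint and the parameter names `q`, `bbox`, `placeName`, `start`, `end` and `minQuality` are hypothetical placeholders, not the extension parameters actually defined for the OWS-7 ADSD.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; the actual OWS-7 ADSD
# extension parameters are not given in this section.
ADSD_ENDPOINT = "https://adsd.example.org/opensearch"


def build_adsd_query(theme, bbox=None, place_name=None, start=None, end=None,
                     min_quality=None):
    """Build an OpenSearch-style query URL for an ADSD-like catalog.

    theme: socio-cultural search terms
    bbox: (west, south, east, north) in decimal degrees, the ordering
          used by the OpenSearch Geo extension's box parameter
    place_name: geographic name, as an alternative to coordinates
    start, end: ISO 8601 dates bounding temporal relevance
    min_quality: hypothetical data-quality threshold (e.g. fitness for use)
    """
    params = {"q": theme}
    if bbox is not None:
        west, south, east, north = bbox
        if south > north:
            raise ValueError("south must not exceed north")
        params["bbox"] = ",".join(str(v) for v in (west, south, east, north))
    if place_name is not None:
        params["placeName"] = place_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    if min_quality is not None:
        params["minQuality"] = min_quality
    # urlencode percent-encodes values (e.g. "," becomes %2C, " " becomes +).
    return f"{ADSD_ENDPOINT}?{urlencode(params)}"


print(build_adsd_query("ethnic groups", bbox=(10.0, 50.0, 11.0, 51.0),
                       start="2010-01-01", end="2010-06-30", min_quality=0.8))
```

Coordinates and a place name are kept as separate, optional parameters, mirroring the "either coordinates or geographic name" choice described for the directory.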

9.3 Phase 2 Study results on Object/Feature

9.3.1 Structuring Unstructured Information

9.3.1.1 Workshop discussion

Unstructured information was a topic of discussion in the Fusion Standards Study Workshop. The notion of data having “no structure” was challenged, and the discussion led to use of the definition from Wikipedia contained in RFI-2: “Unstructured data is data for which there is no data model, or at least no data model that exposes any of the semantics of the data” (see Section 8.3.3). Several topics were identified in the workshop:

• One challenge is for an information system to offer services on data for which it does not have an information model.
• Recent web-based collaborative and mashup techniques support moving less-structured information to more-structured information through human-based interactions.
• Techniques based on open standards are needed for gathering unstructured, public information.
• Methods for adding context and meaning to information in a structured fashion are needed, based on open standards for information models.

In structuring information about an unstructured information item, one creates a model of the item; that is, one selects tags that help convey the meaning of the item. These tags can be completely arbitrary, and the “model” (the list of tags) can be changed on the fly at any time. The tags attached to an information item can be of more or less arbitrary types, so they can include simple types (integers, strings, etc.) as well as geospatial or temporal tags. The tags can then be used to search for and retrieve the item. Search requests can make use of any of the attached tags, including geospatial and temporal constraints, and can even look inside the unstructured items (e.g., at the internals of an HTML document) where that might help in the discovery/access process.
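The tagging approach described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a design from the study: an item's “model” is just a dictionary of tags that can change at any time, the `Point` and `Period` value types and the `location`/`period` tag names are hypothetical, and the search combines exact tag matches, a bounding-box test, a time-overlap test and a plain substring search inside the unstructured content.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Point:
    """Hypothetical geospatial tag value (decimal degrees)."""
    lon: float
    lat: float


@dataclass(frozen=True)
class Period:
    """Hypothetical temporal tag value: an inclusive date range."""
    start: date
    end: date


@dataclass
class InfoItem:
    """An unstructured item plus its ad hoc model: a free-form tag dict."""
    uri: str
    content: str
    tags: dict = field(default_factory=dict)  # any names, any value types


def search(items, bbox=None, during=None, text=None, **tag_equals):
    """Return the items that satisfy every supplied constraint.

    bbox: (west, south, east, north), tested against a "location" tag
    during: Period that must overlap the item's "period" tag
    text: substring looked for inside the unstructured content itself
    tag_equals: exact matches on simple tags (strings, integers, ...)
    """
    hits = []
    for item in items:
        if any(item.tags.get(k) != v for k, v in tag_equals.items()):
            continue
        if bbox is not None:
            west, south, east, north = bbox
            loc = item.tags.get("location")
            if not (isinstance(loc, Point)
                    and west <= loc.lon <= east and south <= loc.lat <= north):
                continue
        if during is not None:
            p = item.tags.get("period")
            # Two inclusive ranges overlap if each starts before the other ends.
            if not (isinstance(p, Period)
                    and p.start <= during.end and during.start <= p.end):
                continue
        if text is not None and text.lower() not in item.content.lower():
            continue
        hits.append(item)
    return hits


page = InfoItem("http://example.org/report.html",
                "<html><body>Market day in the old town</body></html>")
# Tags can be added, renamed or dropped at any time; no schema is fixed.
page.tags.update(language="en", location=Point(13.4, 52.5),
                 period=Period(date(2010, 3, 1), date(2010, 3, 31)))
```

A production system would index the tags rather than scan every item; the linear scan only keeps the sketch short.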

9.3.1.2 Registries for structuring information

Management of the unstructured information and the newly associated information can be done using several alternative methods. The new information can be added directly to the unstructured item, or a registry/repository approach can be used. In the latter case, the unstructured item is placed in the repository and the added information is held in the registry along with a link to the original data. The OGC CSW-ebRIM standard can be used in this latter case (see Section 8.4.5). The advantages and disadvantages of each approach are fairly clear. In the embedded case, the clear advantage is that all of the information is in one information package. In the external description approach, the clear advantage is that it can support multiple descriptions, perhaps for different applications, without cluttering a given package with extraneous information.

CSW-ebRIM can be used for the management of unstructured data. CSW-ebRIM is an OGC standard that builds on the OASIS ebRIM (ebXML Registry Information Model) standard. CSW-ebRIM uses the registry/repository (Reg-Rep) pattern, in which registry objects reference and point to associated repository items. Think of the repository items as the unstructured information items, and the related registry objects as descriptions that expose their semantics. Each repository item (e.g., an HTML document) is identified by a URN, which is a URI, and can be readily retrieved from the registry using a simple HTTP GET request (e.g., from a browser).

In particular, it would be useful to develop an ebRIM V3.0 model for feature fusion. This would use ebRIM Association, Package and ClassificationScheme registry objects to model feature associations for feature fusion. Version 3.0 of ebRIM is selected because it is the basis of the OGC CSW-ebRIM specification, for which several OGC members have developed commercial implementations.
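The registry/repository separation can be illustrated with a small in-memory model. This Python sketch is loosely inspired by ebRIM concepts (an ExtrinsicObject-like description pointing at a repository item, and typed Associations) but is not the CSW-ebRIM interface. The class names, the object types `FusionDescription` and `SocioCulturalDescription`, the `RelatedTo` association label and all URNs are hypothetical. It shows one unmodified HTML item carrying two independent descriptions, which is the advantage of the external description approach.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryObject:
    """Description of a repository item, loosely after an ebRIM ExtrinsicObject."""
    id: str
    object_type: str        # hypothetical type names
    repository_item: str    # URN of the described repository item
    slots: dict = field(default_factory=dict)  # free-form name -> value metadata


@dataclass(frozen=True)
class Association:
    """Typed link between two registry objects, loosely after an ebRIM Association."""
    source: str
    target: str
    association_type: str


class RegRep:
    """Minimal in-memory registry/repository; not the CSW-ebRIM interface."""

    def __init__(self):
        self.repository = {}    # URN -> original item bytes, never modified
        self.registry = {}      # registry object id -> RegistryObject
        self.associations = []  # list of Association

    def put_item(self, urn, data):
        self.repository[urn] = data

    def describe(self, obj):
        # A description must point at an item that is actually in the repository.
        if obj.repository_item not in self.repository:
            raise KeyError(obj.repository_item)
        self.registry[obj.id] = obj

    def associate(self, source_id, target_id, association_type):
        for obj_id in (source_id, target_id):
            if obj_id not in self.registry:
                raise KeyError(obj_id)
        self.associations.append(Association(source_id, target_id, association_type))

    def descriptions_of(self, urn, object_type=None):
        """Every registry description of one item, optionally filtered by type."""
        return [o for o in self.registry.values()
                if o.repository_item == urn
                and (object_type is None or o.object_type == object_type)]

    def get_item(self, urn):
        """Stand-in for resolving the item's URN with a plain HTTP GET."""
        return self.repository[urn]


rr = RegRep()
page = "urn:example:doc:market-report"
rr.put_item(page, b"<html><body>Market day in the old town</body></html>")
rr.describe(RegistryObject("urn:example:reg:d1", "FusionDescription", page,
                           {"featureType": "Market"}))
rr.describe(RegistryObject("urn:example:reg:d2", "SocioCulturalDescription", page,
                           {"theme": "commerce"}))
rr.associate("urn:example:reg:d1", "urn:example:reg:d2", "RelatedTo")
```

Because descriptions live only in the registry, adding a third description for another application never touches the stored HTML bytes.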
The Galdos implementation of CSW-ebRIM enables automated transformation of the registry's output using XSLT scripts, each script being associated with a given registry object type (e.g., audio clip). Whenever a request is made for such an object, the transformation script is retrieved and automatically applied to the registry object and its associated repository items. In this manner one can generate, say, an Atom feed in which the registry descriptions are embedded together with the content of the repository item.

9.3.1.3 Adding geographic structure with TJS