A GEO label for the Sensor Web

The GEO label is a visual metadata summary designed to improve the understandability of geospatial metadata. The amount of sensor data collected via and published in Sensor Webs is steadily increasing, and thus the published metadata is becoming more diverse, more complex, and harder to understand. To mitigate this issue, we transfer the GEO label into the Sensor Web architecture in an encompassing way, by using Sensor Web metadata as the label’s data source and by integrating labels into Sensor Web metadata standards. We conclude (a) that the sensor description metadata standards provide appropriate data fields to complete all facets of the GEO label except for user-generated information, and (b) that integrating labels into the standards is possible via inline integration and references. The extension mechanisms of a GEO label prototype implementation are successfully used to demonstrate a Sensor Web Label. We end with an extensive description of new research directions for the novel Sensor Web Label.


Introduction
The Global Earth Observation System of Systems (GEOSS) is a distributed 'system of systems' which provides access to earth observation data [8]. GEOSS is currently estimated to contain tens of millions of dataset records¹, which makes it extremely challenging for users to discover datasets that fit their particular needs. To tackle this challenge, in 2009 the GEO Science and Technology Committee proposed to establish a GEO label, a label "related to the scientific relevance, quality, acceptance and societal needs for activities in support of GEOSS"².
As an answer to this call, the FP7 research project GeoViQua (http://geoviqua.org) developed a GEO label [14,15] as a visual metadata summary which can be integrated in discovery websites or catalogues to help users quickly grasp the availability of information and determine fitness-for-use [15].
In parallel to the development of GEOSS, an ever-increasing amount of heterogeneous sensor data is becoming available online, driven by developments such as the Internet of Things (IoT) and Smart Cities. We therefore see a strong demand for improving the understanding of sensor metadata and the discovery of sensor observations. This can be achieved by transferring the GEO label concepts into the sensor web [3], adapting the sources for the label facets' information, and integrating the label with research on the discovery of sensors. In the remainder of this work, we describe how the GEO label can be applied to Sensor Web metadata models to mitigate the challenges of data discovery, considering the expected increase in the availability of sensor observation data.

¹ https://www.earthobservations.org/documents/geo_xi/5_3_GEOSS_Highlights_Massacand_Desconnets.pdf
² http://www.earthobservations.org/ts.php?id=91

Related Work
The GEO label represents a visual summary of the availability of metadata for a dataset [15]. It comprises eight informational facets, each of which can take one of three availability states. The facets are producer profile, producer comments, lineage information, standards compliance, quality information, user feedback, expert review, and citations information. The label itself does not evaluate the quality or content of the metadata; it uses iconic depictions, colour, and direction to visually convey the availability of quality information, enabling at-a-glance intercomparison of datasets. Furthermore, the label is an interactive interface that provides summary hover-over text and links to external sites with detailed, structured "drilldown" metadata. Figure 1 illustrates a classic GEO label for a fictitious dataset. The GEO label API is a web service interface that encapsulates the generation of labels. It accepts XML metadata documents, either directly or by reference, and returns to the client a label in Scalable Vector Graphics (SVG) [6] format, which supports interactivity.
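To make the interactive behaviour concrete, the following Python sketch (standard library only) renders a set of facets into an SVG in which each facet carries its hover-over text as an SVG title element and, where a drilldown page exists, is wrapped in a hyperlink to it. The simple row layout, the state names, and the colours are assumptions for illustration; they do not reproduce the actual GEO label design or the output of the GEO label API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)        # serialize SVG as the default namespace
ET.register_namespace("xlink", XLINK_NS)

# Hypothetical state names and colours; the real GEO label uses its own icons and palette.
STATE_COLOURS = {
    "available": "#2e7d32",
    "higher_level": "#f9a825",
    "not_available": "#bdbdbd",
}


@dataclass
class Facet:
    name: str
    state: str
    hover_text: str
    drilldown_url: Optional[str] = None


def render_label(facets: List[Facet], size: int = 40) -> str:
    """Render facets as a row of squares; hover text goes into SVG <title> elements."""
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": str(size * len(facets)), "height": str(size)})
    for i, facet in enumerate(facets):
        parent = svg
        if facet.drilldown_url:
            # Clicking the facet opens its drilldown page.
            parent = ET.SubElement(svg, f"{{{SVG_NS}}}a",
                                   {f"{{{XLINK_NS}}}href": facet.drilldown_url})
        group = ET.SubElement(parent, f"{{{SVG_NS}}}g")
        # Browsers show the <title> child of a group as a tooltip on hover.
        ET.SubElement(group, f"{{{SVG_NS}}}title").text = f"{facet.name}: {facet.hover_text}"
        ET.SubElement(group, f"{{{SVG_NS}}}rect", {
            "x": str(i * size), "y": "0",
            "width": str(size), "height": str(size),
            "fill": STATE_COLOURS[facet.state],
        })
    return ET.tostring(svg, encoding="unicode")
```

Keeping hover text and links inside the SVG itself, rather than in separate HTML, is what allows a single returned document to be embedded in any client that can display SVG.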
Jirka et al. [11] and Förster et al. [7] identified the challenges for the discovery of sensors, such as the dynamic structure of sensor networks, user context and domain, and the duality of sensor instances and sensor services. Interoperability is crucial for discovery mechanisms to work, since no single platform can be assumed [11]. Because the existing standards are complex, as they accommodate the requirements of different domains, profiles are defined to simplify uptake and increase interoperability. The SensorML Profile for Discovery [9] is a profile for SensorML 1.0.1. It specifies a subset of the standard, effectively taking options and judgement calls away from implementers. It covers the identification, classification, temporal validity, capabilities, contact, location, interfaces, inputs, and outputs of a stationary sensor and its components.

Figure 1. A classic GEO label for a fictitious dataset.

Use Case and Requirements
The sensor web spans a wide variety of applications, but as an introduction to the Sensor Web Label (SWL), a GEO label adjusted to Sensor Web applications, we choose a classical scenario: a network of stationary in-situ sensors for environmental observations, which is also the scenario the SensorML Profile for Discovery is designed for. The Integrated Ocean Observing System³ is an example, which is driven by public actors using the OGC Sensor Web Enablement (SWE) suite of standards⁴. SensorML, a part of SWE, provides an interoperable data model and encoding for describing the components involved in measuring any kind of observation, be it in-situ or remote, and the processes around the observation, such as pre- or post-processing. Two versions of SensorML are available [1,18]. They are used to describe sensor stations in data provisioning services, such as the OGC Sensor Observation Service (SOS) [2], or in catalogues, such as the Sensor Instance Registry (SIR) [12]; the SIR defines a catalogue API specific to the sensor web based on SensorML. These service specifications provide a standardized self-description operation, whose response is the so-called Capabilities document. SensorML and service capabilities are the main sources for the SWL.
There are three ways to integrate the SWL in a distributed sensor web architecture, and an SWL framework should support all of them: (i) dynamic integration on the client side, e.g., a desktop GIS generates a label for a dataset based on embedded metadata or a reference to a metadata document; (ii) dynamic integration on the server side, e.g., a broker or portal generates a label on the fly and extends the metadata in its response with an inline or referenced label; and (iii) static integration into the sensor and service metadata, e.g., as a reference to an online or offline resource for the label or inline within a metadata document. Because a GEO label API server is itself part of the service-oriented architecture, the provenance of a label must be transparent and its generation reproducible.

Label Transformations
The standards supported by the GEO label, namely ISO, FGDC, and the GeoViQua Quality Model, are XML-based. The specific XML fields from which the availability and the contents of the label's facets are derived are defined using XPath [4] expressions. XPath is a query language for XML documents; here it is used to select and evaluate the elements needed to build the label's facets. The expressions for one standard are grouped into transformations, which are stored as JSON files and contain fields to (i) check information availability, (ii) build a hover-over text from a string template whose placeholders are filled via XPaths, and (iii) add a drilldown URL hyperlink to the facet. The original mappings are available online⁵, and Figure 2 shows an excerpt of a transformer file. Multiple transformations can be applied to the same metadata document.
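The three parts of a transformation can be sketched in Python with the standard library's ElementTree module. The JSON keys (`availability_path`, `hover_template`, `hover_paths`, `drilldown_url`) and the drilldown URL are hypothetical and do not follow the actual transformer file format. ElementTree also supports only a subset of XPath: `//` must be written as `.//` relative to the root element, and neither `|` unions nor functions such as `count()` are available, so the count is computed in Python.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, Optional

NS = {
    "sml": "http://www.opengis.net/sensorml/2.0",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical transformation for one facet; key names are illustrative only.
TRANSFORMATION = json.loads("""
{
  "facet": "producer_profile",
  "availability_path": ".//sml:contact/gmd:CI_ResponsibleParty",
  "hover_template": "{count} responsible party(ies); first: {name}",
  "hover_paths": {"name": ".//gmd:individualName/gco:CharacterString"},
  "drilldown_url": "https://example.org/drilldown/producer_profile"
}
""")


def apply_transformation(t: dict, root: ET.Element, ns: Dict[str, str] = NS) -> dict:
    """Evaluate one facet transformation against a parsed metadata document."""
    # (i) Availability: the facet is available if the path matches anything.
    matches = root.findall(t["availability_path"], ns)
    if not matches:
        return {"facet": t["facet"], "available": False, "hover": None, "drilldown": None}
    # (ii) Hover-over text: fill the template placeholders from further paths.
    values = {}
    for key, path in t["hover_paths"].items():
        element: Optional[ET.Element] = root.find(path, ns)
        values[key] = element.text.strip() if element is not None and element.text else "n/a"
    hover = t["hover_template"].format(count=len(matches), **values)
    # (iii) Drilldown hyperlink for the facet.
    return {"facet": t["facet"], "available": True, "hover": hover,
            "drilldown": t["drilldown_url"]}
```

Applying several transformations to one document is then a matter of calling `apply_transformation` once per transformation and collecting the facet results.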

Sensor Web Mappings for the GEO label
SensorML sensor descriptions and SOS service metadata are the relevant sources for the SWL within the use case. The respective XPaths for a label transformation are given in Table 1; hover-over texts are described in plain text to improve readability. In the absence of a comparable profile for SensorML 2.0, elements similar to those in the Discovery Profile were selected.
For some facets, the sensor web standards lack suitable fields to provide the required information. In these cases, the transformer files of the original GEO label can be applied, because their transformations use a flexible XPath root ("//"), so that user comments or ISO quality information embedded inline can be discovered and interpreted. For the same reason, the given paths can also match information in sub-components, i.e., in SensorML documents describing a system composed of components. This is invisible to the label user, but could be made transparent in the drilldown information pages. For example, the following full XPath shows the location of a GeoViQua producer comment embedded in the extension element of a SensorML component:
sml:PhysicalSystem/sml:components/*/sml:component/*/sml:extension/*/gvq:GVQ_DiscoveredIssue
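The effect of the flexible root can be checked on a small, made-up SensorML 2.0 document. The GeoViQua namespace URI and the wrapper element `gvq:IssueContainer` below are placeholders, not taken from the GeoViQua schema, and ElementTree again stands in for a full XPath engine.

```python
import xml.etree.ElementTree as ET

NS = {
    "sml": "http://www.opengis.net/sensorml/2.0",
    # Placeholder namespace URI for the GeoViQua quality model (assumption).
    "gvq": "http://example.org/geoviqua",
}

DOC = """<sml:PhysicalSystem xmlns:sml="http://www.opengis.net/sensorml/2.0"
    xmlns:gvq="http://example.org/geoviqua">
  <sml:components>
    <sml:ComponentList>
      <sml:component>
        <sml:PhysicalComponent>
          <sml:extension>
            <!-- gvq:IssueContainer is a hypothetical wrapper element -->
            <gvq:IssueContainer>
              <gvq:GVQ_DiscoveredIssue>Example issue text</gvq:GVQ_DiscoveredIssue>
            </gvq:IssueContainer>
          </sml:extension>
        </sml:PhysicalComponent>
      </sml:component>
    </sml:ComponentList>
  </sml:components>
</sml:PhysicalSystem>"""

root = ET.fromstring(DOC)

# Full path from the text, written relative to the root element (sml:PhysicalSystem).
full_path = "sml:components/*/sml:component/*/sml:extension/*/gvq:GVQ_DiscoveredIssue"
# Flexible path as used by the original transformers ("//" becomes ".//" here).
flexible_path = ".//gvq:GVQ_DiscoveredIssue"
# A rigid path that only inspects the system's own extension misses the component.
rigid_path = "sml:extension/*/gvq:GVQ_DiscoveredIssue"

full_hits = root.findall(full_path, NS)
flexible_hits = root.findall(flexible_path, NS)
rigid_hits = root.findall(rigid_path, NS)
```

The flexible path finds the same element as the full path without knowing the component structure, which is exactly why it also picks up information from sub-components.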

Table 1. XPaths and hover-over texts for the SWL facets per standard: SensorML 1.0.1 (Discovery Profile), SensorML 2.0, and SOS Capabilities.

Producer profile
  SensorML 1.0.1: the hover text contains the number of parties and the first party's name and organisation.
  SensorML 2.0: //sml:contact/gmd:CI_ResponsibleParty; the hover text contains the number of parties and the first party's role as well as name and organisation.
  SOS Capabilities: //sos:ServiceProvider; the hover text contains the provider's name and a URL.
Producer comments
  Not available in these standards; fallback mechanism to GEO label transformers.
Lineage information
  SensorML 1.0.1: //sml:ProcessChain | //sml:history; covers two aspects of lineage (process steps, system history); the hover text contains the number of process steps and the number of history records for the sensor.
  SensorML 2.0: //sml:method | //sml:history | //sml:connections; covers three aspects of lineage (algorithm, system history, process steps); the hover text contains the number of process steps and the number of history records for the sensor.
  SOS Capabilities: not available; fallback mechanism to GEO label transformers.
Standards compliance
  SensorML 1.0.1: //sml:SensorML; the hover text contains the standard name and version based on the root element name and XML attribute.
  SensorML 2.0: the hover text contains the standard name and version as a fixed value.
  SOS Capabilities: //sos:Capabilities; the hover text contains the service type and versions.
Quality information
  SensorML 1.0.1: //sml:output//swe:quality; the hover text differentiates between contents of output quality: text, category, quantity. The child elements of output which contain the quality element can be any scalar or range type.
  SensorML 2.0: //sml:capabilities//swe:quality | //sml:output//swe:quality; the hover text differentiates between quality types: text, category, quantity. The child elements which contain the quality element can be any scalar or range type.
  SOS Capabilities: not available; fallback mechanism to GEO label transformers.
User feedback, expert reviews, citations information
  Not available in these standards; fallback mechanism to GEO label transformers, e.g., for user feedback //gvq:GVQ_FeedbackCollection/gvq:summary or for citations //gmd:identificationInfo/gmd:MD_DataIdentification/gmd:referenceDoc.

Labels within Metadata Documents
Integration points have been identified for all relevant sensor metadata standards to support the third integration mechanism, i.e., adding sensor labels into existing metadata documents, either (a) inline, so that the whole label information is available as SVG encoded in XML, or (b) as a reference to an online resource, such as a call to the GEO label API. Schematron [10], a rule-based schema language for XML, can be used to check documents for such integrations. Figure 3 shows a rule reporting whether a SensorML 1.0.1 document contains a label reference as part of the element sml:documentation.
The rule tests whether the format element of an sml:Document contains the character string "geolabel" or whether the attribute xlink:role within an sml:onlineResource is set. The URI for the role attribute must be provided by an authoritative body defining the semantic relationship between the metadata document and the label. Figure 4 shows an XML example fulfilling both tests. The sml:onlineResource element contains the actual link and a human-readable title.
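For readers without a Schematron processor at hand, the two tests can be approximated in Python. This is not the Schematron rule of Figure 3 but a hypothetical equivalent. It assumes that "is set" means the role equals a label role URI, and that URI is a placeholder, since its definition is left to an authoritative body. The substring test for "geolabel" is as naive as any plain string match.

```python
import xml.etree.ElementTree as ET
from typing import List

SML101 = "http://www.opengis.net/sensorML/1.0.1"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"sml": SML101}
# Placeholder: no authoritative role URI for labels is defined yet.
LABEL_ROLE = "http://example.org/def/role/geolabel"


def find_label_references(root: ET.Element, label_role: str = LABEL_ROLE) -> List[str]:
    """Return the hrefs of sml:documentation entries that look like label references."""
    hrefs = []
    for document in root.iterfind(".//sml:documentation/sml:Document", NS):
        fmt = document.findtext("sml:format", default="", namespaces=NS)
        for resource in document.iterfind("sml:onlineResource", NS):
            role = resource.get(f"{{{XLINK}}}role")
            # Test 1: the format string mentions "geolabel";
            # test 2: the xlink:role marks the online resource as a label.
            if "geolabel" in fmt or role == label_role:
                hrefs.append(resource.get(f"{{{XLINK}}}href"))
    return hrefs
```

A document passes the check if the returned list is non-empty; a false positive only requires the word "geolabel" to appear in an unrelated format string.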
Embedding SVG in SensorML 1.0.1 documents is not possible, because no field or element could be identified that can hold arbitrary XML.

SensorML 2.0 also provides an sml:documentation element, but its actual content is encoded using a gmd:CI_OnlineResource. A label can be signalled in a similar way by moving the string match to the resource name. Figure 5 shows a SensorML 2.0 snippet with an SWL as a reference. The more recent standard also provides extension points which can hold any kind of XML content, which makes the inline integration of an SVG-based SWL possible. Figure 6 shows a Schematron document for reporting embedded labels using a specific element <geolabel /> either in an extension or within SVG metadata; Figure 7 is the corresponding example document.

Integration into SOS 2.0 Capabilities documents is possible because extension elements are provided. To limit the file size of the Capabilities document, which is intended for quick exploration of services, an implementation would probably use referenced labels in favour of inline labels. In either case, the extension elements within each "offering" (a kind of layer with exactly one procedure, i.e., sensor) can be used. Because there is no documentation field, both inline and referenced labels must be integrated using a dedicated <geolabel> element, as shown in Figure 8; the corresponding Schematron is similar to the second pattern in Figure 6 and is omitted for brevity.
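Detecting inline labels in SensorML 2.0 extension points can be sketched in the same way. The function below mirrors the two patterns described for Figure 6: a <geolabel> element directly in an extension, or one inside the metadata of an embedded SVG. Matching <geolabel> by local name, regardless of namespace, is an assumption, as is the test document.

```python
import xml.etree.ElementTree as ET
from typing import List

SML2 = "http://www.opengis.net/sensorml/2.0"
SVG = "http://www.w3.org/2000/svg"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_inline_labels(root: ET.Element) -> List[ET.Element]:
    """Find inline labels in the sml:extension elements of a SensorML 2.0 document.

    Returns either the <geolabel> marker itself (pattern 1) or the embedded
    <svg> whose <metadata> contains a <geolabel> marker (pattern 2).
    """
    labels = []
    for extension in root.iter(f"{{{SML2}}}extension"):
        for child in extension:
            if local_name(child.tag) == "geolabel":
                labels.append(child)
            elif child.tag == f"{{{SVG}}}svg":
                metadata = child.find(f"{{{SVG}}}metadata")
                if metadata is not None and any(
                        local_name(m.tag) == "geolabel" for m in metadata):
                    labels.append(child)
    return labels
```

The same loop would work for SOS 2.0 offerings if the extension element tag is swapped; that adaptation is not shown here.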

Implementation Prototype
The SWL mappings are implemented in the open source software project GEO-label-java⁶, which implements the GEO label API. The software reads the transformation description files and creates internal objects from the contained XPaths. When a metadata document is analysed, the XPaths are evaluated and the results are used to generate a label based on a template. New transformation description files (the full transformer files are available online⁷) contain the XPaths that support the new metadata standards. To reduce processing effort, the files were extended with a new field holding an XPath expression that checks whether the contained transformation is applicable to a given document. The software prototype also implements a caching mechanism to increase performance and decrease server load when generating labels: the URLs provided to the API and the generation time are stored as the key in an associative array, and the generated label is stored as the value. Naturally, the cache only works for metadata documents provided by reference via URL, and highly dynamic aspects of the label, such as the number of user ratings, are not captured by cached labels.

⁶ https://github.com/52North/GEO-label-java/
⁷ https://github.com/52North/GEO-label-java/tree/master/server/src/main/resources/transformations
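The two implementation features, the applicability check and the cache, might look roughly as follows in Python. This is not code from GEO-label-java. The field name `applicability_path`, the dummy document node that lets a path test the root element itself, and the expiry policy based on the stored generation time (`max_age_s`) are assumptions. The sketch keeps the URL as the key and stores the generation time next to the label, a simplification of the key layout described above.

```python
import time
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Transformation:
    name: str
    applicability_path: str  # hypothetical field name
    namespaces: Dict[str, str]


def applicable(transformations: List[Transformation], root: ET.Element) -> List[Transformation]:
    """Keep only the transformations whose applicability path matches the document."""
    # Wrap the root in a dummy node so that a path can test the root element itself,
    # emulating an XPath evaluated from the document node.
    holder = ET.Element("document")
    holder.append(root)
    return [t for t in transformations
            if holder.find(t.applicability_path, t.namespaces) is not None]


class LabelCache:
    """Cache of generated labels for metadata documents passed by URL.

    The URL is the key; the generation time is kept next to the label so that
    entries older than max_age_s are regenerated (the expiry policy is an assumption).
    """

    def __init__(self, generate: Callable[[str], str], max_age_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._generate = generate
        self._max_age_s = max_age_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._max_age_s:
            return entry[1]  # fresh cached label
        label = self._generate(url)
        self._entries[url] = (now, label)
        return label
```

Injecting the clock keeps the expiry logic testable without waiting in real time.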

Discussion and Conclusion
We present a holistic integration of an adapted metadata-based label into sensor web standards and applications. The described mappings demonstrate that, apart from user-generated information, the sensor description standards provide appropriate fields to support a label. The pieces of information missing from SOS can be ascribed to it being a service standard and can be filled by the accompanying sensor description.
IoT sensor webs are expected to grow extensively in the future. However, an evaluation of two popular IoT platforms showed that they lack a sophisticated metadata system that could provide information for an SWL⁸.
This work goes beyond the original GEO label by analysing means to integrate labels into metadata standards. All requirements are met in theory and are supported by a software prototype. Integration into metadata documents works better in recent standards, whose extension elements allow inline embedding of labels, whereas all standards have suitable structures to include references to online resources, i.e., URLs.
A more suitable way to signal that an inline SVG document represents an SWL will have to be determined in real-world deployments. Integration by reference is more practical for scenarios without connectivity restrictions, as it increases the document size less and bears less risk of outdated dynamic label information. The referenced SWL requires global identifiers or well-defined practices for declaring that a given online resource actually returns a label, because the current rules rely on string matching, which is error-prone.
The prototype successfully applies the transformation file concept of the GEO label API implementations to add a completely new set of metadata standards as a data source for the label. The caching mechanism requires a more fine-grained structure for the individual facets, to distinguish information that is likely to change and should not be cached at all, such as user-generated feedback, from information with a lower update frequency that can be cached, such as the producer profile and compliance with standards.
The most crucial step for an SWL lies beyond research: the label will only see high usage if it is provided on a cross-domain, cross-technology, and cross-platform discovery portal with high visibility and acceptance amongst users. It is debatable whether such a platform will exist in the near future, but we expect a usable and practical metadata quality label to be a valuable feature that can help the cause, and this work lays the foundation for such a label in all sensor web domains.

Future Work
Next steps can be divided into (a) extending the data sources for labels, and (b) adapting the SWL further to meet sensor web requirements. With respect to additional data sources, the Semantic Sensor Network Ontology (SSNO) [5] is a metadata model that also provides data fields suitable for the facets and drilldown information elements of a proposed SWL. In general, and this also applies to the original GEO label, semantic web technologies bear a high potential, because ontologies would allow creating generic rules to determine facet availability and drilldown information. Linked data would allow following references to an arbitrary depth to harvest distributed information sources. However, this also poses new challenges to implementations, because other protocols, encodings, and query languages must be considered, e.g., for transformation descriptions. Within the existing standards, the SWL data sources should be extended by leveraging SensorML 2.0's parent mechanism, which is based on a type definition feature (sml:typeOf), because information to fill the label could be found at any point in the description hierarchy. Here, we see future work in supporting a generic "parent of parent" mechanism that replaces the fixed parent document with an XPath to resolve parent documents recursively.
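The envisioned "parent of parent" resolution can be sketched as a loop that follows sml:typeOf links until no further parent is declared, guarding against cycles and unbounded chains. The loader callback, the `max_depth` limit, and the assumption that sml:typeOf appears as a direct child of the process element carrying an xlink:href are ours; a real implementation would fetch parent documents over the network.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

SML2 = "http://www.opengis.net/sensorml/2.0"
XLINK = "http://www.w3.org/1999/xlink"


def resolve_type_chain(root: ET.Element, load: Callable[[str], ET.Element],
                       root_url: Optional[str] = None,
                       max_depth: int = 10) -> List[ET.Element]:
    """Follow sml:typeOf links recursively: [document, parent, parent of parent, ...]."""
    chain = [root]
    seen = {root_url} if root_url else set()
    current = root
    for _ in range(max_depth):
        type_of = current.find(f"{{{SML2}}}typeOf")
        href = type_of.get(f"{{{XLINK}}}href") if type_of is not None else None
        if href is None or href in seen:
            break  # no further parent, or a cycle back to a visited document
        seen.add(href)
        current = load(href)
        chain.append(current)
    return chain


def find_in_chain(chain: List[ET.Element], path: str,
                  namespaces: Dict[str, str]) -> Optional[ET.Element]:
    """Return the first match, preferring the most specific description."""
    for document in chain:
        match = document.find(path, namespaces)
        if match is not None:
            return match
    return None
```

Searching the chain from the most specific document outwards means that an instance description can override information inherited from its sensor type.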
Given the current lack of standardization for IoT, a third path to extend data sources is the work conducted in the SensorThings working group at the OGC⁹. The group is working on a standard to support lightweight IoT applications, and the current draft does contain a metadata structure similar to a capabilities document¹⁰. These developments must be monitored so that an SWL can support IoT platforms and act as a bridge between different sensor webs.
With respect to adjusting the label more drastically to sensor web requirements, we see the following avenues of future research. The mapping between SWE standards and GEO label facets shows that relevant parts of the discovery profile, which were selected to facilitate the identification of fitness-for-purpose in sensor web applications, could not be mapped to the SWL. Keywords, identification, and classification are, unlike in ISO standards, not mandatory in SensorML. Valid time and up-to-dateness of metadata and data, a definition of interfaces, and legal constraints are more important in a highly dynamic and real-time-oriented infrastructure, but they are not represented by the current label. Therefore, we propose to develop new facets based on these data fields for a second iteration of the SWL and to evaluate their usefulness in comparison with the GEO label's facets. Such work should include a survey along the lines of the studies conducted by Lush et al. [13,14], but focused on the sensor web domain and its specifics and targeting both producers and users.
We also see potential to develop the labels further with respect to the checks they apply. Going beyond XPaths, GEO label API implementations could validate documents against their schemas and against metadata profiles. As a concrete step, an implementation could evaluate provided metadata documents against a machine-readable profile definition, e.g., in Schematron, and use profile validity as an advanced check for standards conformance. This would greatly increase the value of integrating the SWL in sensor web catalogues such as the SIR. Finally, more sources of and incentives for user-generated metadata (comments, reviews) are required to bring the SWL to its full potential across all facets.

⁹ http://www.opengeospatial.org/projects/groups/sensorthings
¹⁰ http://ogc-iot.github.io/ogc-iotapi/datamodel.html#capabilities (accessed on January 30, 2015)
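As a rough illustration of profile-based conformance checking, the sketch below evaluates Schematron-style assertions (a rule context plus a required child path) with ElementTree. It is not a Schematron processor and only tests for the presence of elements, and the three rules are hypothetical examples rather than the rules of the SensorML Profile for Discovery.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, NamedTuple

NS = {"sml": "http://www.opengis.net/sensorML/1.0.1"}


class Assertion(NamedTuple):
    context: str   # elements the rule applies to (like a Schematron rule context)
    required: str  # path that must match below each context element
    message: str   # reported when the assertion fails


# Hypothetical profile rules for illustration; not the actual Discovery Profile.
PROFILE = [
    Assertion(".//sml:System", "sml:identification", "System lacks sml:identification"),
    Assertion(".//sml:System", "sml:classification", "System lacks sml:classification"),
    Assertion(".//sml:System", "sml:validTime", "System lacks sml:validTime"),
]


def check_profile(root: ET.Element, rules: List[Assertion],
                  ns: Dict[str, str] = NS) -> List[str]:
    """Return failure messages; an empty list means the document conforms."""
    failures = []
    for rule in rules:
        for context in root.findall(rule.context, ns):
            if context.find(rule.required, ns) is None:
                failures.append(rule.message)
    return failures
```

The result could feed the standards compliance facet directly, for instance by reporting the number of failed assertions in its hover-over text.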