Louise Darroch ([email protected]), Justin Buck and James Ayliffe
July 2018
The British Oceanographic Data Centre (BODC) is located in the UK and is a national facility responsible for looking after and distributing data concerning the marine environment. It primarily deals with biological, chemical, physical and geophysical data and holds over 118,000 data series in its databases, which are discoverable via web interfaces and web services (e.g. Sensor Web Enablement Sensor Observation Services). BODC hosts the NERC Vocabulary Server (NVS), which publishes over 240 vocabulary collections (lists of standardised terms) represented in the Simple Knowledge Organisation System (SKOS) and Linked Data Resource Description Framework (RDF). BODC is responsible for hosting and maintaining the vocabulary collections of instrument models (static entities), L22 (SeaVoX Device Catalogue), and of instrument categories, L05 (SeaDataNet device categories).
A persistent identifier for instrument instances would be of benefit to BODC for the following reasons:
Compiling metadata about datasets can be time-consuming, especially as we often receive only subsets of the information. Identifiers that unambiguously identify the instruments used and their associated metadata will help in this process.
Much of BODC’s metadata is standardised to allow for semantic interoperability, maximise discovery and allow machines to assess if data is fit for purpose. A globally unique identifier with standardised metadata element sets will enhance semantic interoperability and improve discoverability through our search portals or web services. It will also allow BODC to easily link data to instances and their associated metadata (in machine readable form), allowing end-users to assess if data is fit for purpose. We will also be able to link instances to instrument model entities in L22 completing the semantic chain.
Globally unique identifiers with machine-readable associated metadata can be integrated into automated workflows, such as the transfer of data from source formats into BODC formats (e.g. glider data in APDS) or the application of calibrations during QA/QC of a ship's underway dataset (the ship's underway workflow is in development at BODC).
As part of the EU's SenseOCEAN project, BODC developed a system to make sensors and their data discoverable and accessible on the web through standardised APIs (sensor nodes) using OGC SWE and W3C's Linked Data/SSN (Marine Linked Systems). The system currently relies on Universally Unique Identifiers (UUIDs) to identify sensor instances, match an instance's metadata to its data and publish the information on the web. A globally unique identifier will help harmonise our sensor nodes with the rest of the world, a key aspect of the semantic sensor web. Sensor catalogues are actively being developed by different groups (EMSO, BODC, AWI, Ifremer, NOC-OBE for the PAP site, etc.) and OGC SWE enabled sensors will also output standardised metadata descriptions. Semantic interoperability of these repository outputs will enable the use and ingestion of the metadata regardless of which group hosts it.
BODC has the capability to mint and publish Digital Object Identifiers (DOIs) that are registered at DataCite. Expanding the service to the publication of instrument instances will offer our users a wider service and extend our reputation in the community. Publishing the PID makes BODC-hosted instances of sensors globally unique, which is a fundamental requirement of the PID for sensors concept.
Integrating persistent identifiers into BODC will contribute to FAIR findability and the provenance criteria of reusability.
- Must allow tools to leverage the content for the semantic web by:
- enhancing semantic interoperability by accommodating community specific identifiers such as controlled vocabularies which are used extensively in the marine community. These may be in URI form.
- allowing for relationships so users can organise information in discovery catalogues or machine-readable ontologies e.g. instance belongs to instrument class; LongName has manufacturer etc.
- Metadata elements must allow the identification of the instance to be unambiguous
- The identifier must be web resolvable
- URI needs to resolve to a landing page that includes content negotiation to resolve machine readable metadata (depending on the call made)
- Handling evolution and versions of instrument instances – who will be responsible? How will this be determined?
- Handling deprecation of instrument instances – who will be responsible? How will this be determined?
- The potential for different interpretations of an instrument – how do we constrain this?
- The resource involved in implementing and handling legacy (i.e. linkages between instruments and legacy datasets will probably require funding)
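The content negotiation requirement listed above (a URI resolving to a landing page or to machine-readable metadata, depending on the call made) could be sketched as follows. This is a minimal illustration, not a description of any existing BODC or DataCite service: the list of offered media types and the function names are assumptions made for the example. It also ignores some finer points of HTTP negotiation, such as a specific media type taking precedence over a wildcard.

```python
from typing import List, Optional, Tuple

# Hypothetical representations a PID landing page might offer (an assumption).
AVAILABLE = ["text/html", "application/ld+json", "text/turtle"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an HTTP Accept header into (media type, q-value) pairs."""
    parsed = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        parsed.append((media_type, q))
    return parsed


def negotiate(accept_header: str, available: List[str] = AVAILABLE) -> Optional[str]:
    """Return the offered media type with the highest q-value, or None."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(accept_header or "*/*"):
        for offered in available:
            matches = media_type in ("*/*", offered) or (
                media_type.endswith("/*") and offered.startswith(media_type[:-1])
            )
            if matches and q > best_q:
                best, best_q = offered, q
    return best


# A Linked Data client asking for Turtle gets RDF; a call without a
# preference falls back to the human-readable landing page.
print(negotiate("application/ld+json;q=0.8, text/turtle"))
print(negotiate(""))
```

A resolver built along these lines would return the landing page to a browser and RDF to a harvesting tool from the same identifier, which is the behaviour the requirement describes.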
This is a list of metadata that are specific to BODC (essentially our metadata requirements). They are also what we think should be included in a DOI metadata schema in order to locate instrument instances with little ambiguity, and what we think could be easily consumed by end-users in local applications. To keep things simple, we have not attempted to map them to DataCite properties and sub-properties.
Identifier | Description | Occurrence | Comment |
Persistent Identifier | Globally unique identifier | 1 | PID |
Alt Instance Identifier | An alternative identifier such as a local identifier e.g. an inventory number | 0..1 | free text |
Long Name | The full instrument instance name | 1 | free text |
Manufacturer Serial No. | The part number of the design | 0..1 | free text |
Model Name | The model design (e.g. 4135) that the instance is based on. | 0..1 | free text |
Manufacturer Name | The device manufacturer or developer (e.g. Chelsea, University of Washington etc.) | 0..n | standardised term or free text |
Device Type | A community-specific identification of the system/model (e.g. L22). Allows for broader relationships | 0..1 | standardised term or free text |
Device Category | A community-specific identifier for the classification of the instrument instance. | 0..n | standardised term or free text |
Owner Organisation | Organisation name (we need to think about compliance with GDPR) | 0..n | standardised term or free text |
Valid From | from when this instance is true | 0..1 | date |
Valid To | To when this instance is true | 0..1 | date |
Description | General description of the instance to help in identification | 0..1 | free text |
Output | Outputs will help with machine semantic interoperability | 0..n | standardised term or free text |
Publisher | The centre responsible for creating the DOI | 1 | standardised term or free text |
Publication Year | The year published | 1 | date |
Resource Type | Indicates this is an instrument PID (Specified by RDA group or PID provider) | 1..n | Controlled list of values: Physical object, Instrument PID |
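As an illustration of how the occurrence column above might be consumed in a local application, the sketch below encodes each element's occurrence as a (minimum, maximum) pair and checks a record against it. The `validate` function, the dictionary layout and every value in the example record are hypothetical; nothing here is an agreed schema or an existing BODC tool.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) occurrences per element; None stands for "n" (unbounded).
OCCURRENCE: Dict[str, Tuple[int, Optional[int]]] = {
    "Persistent Identifier": (1, 1),
    "Alt Instance Identifier": (0, 1),
    "Long Name": (1, 1),
    "Manufacturer Serial No.": (0, 1),
    "Model Name": (0, 1),
    "Manufacturer Name": (0, None),
    "Device Type": (0, 1),
    "Device Category": (0, None),
    "Owner Organisation": (0, None),
    "Valid From": (0, 1),
    "Valid To": (0, 1),
    "Description": (0, 1),
    "Output": (0, None),
    "Publisher": (1, 1),
    "Publication Year": (1, 1),
    "Resource Type": (1, None),
}


def validate(record: Dict[str, List[str]]) -> List[str]:
    """Return occurrence problems; an empty list means the record conforms."""
    problems = [f"{name}: unknown element" for name in record if name not in OCCURRENCE]
    for name, (low, high) in OCCURRENCE.items():
        count = len(record.get(name, []))
        if count < low:
            problems.append(f"{name}: expected at least {low}, found {count}")
        elif high is not None and count > high:
            problems.append(f"{name}: expected at most {high}, found {count}")
    return problems


# Entirely hypothetical instance; every value below is a placeholder.
example = {
    "Persistent Identifier": ["https://example.org/pid/instrument-0001"],
    "Long Name": ["Example oxygen optode, unit 1"],
    "Manufacturer Name": ["Example Manufacturer Ltd"],
    "Publisher": ["Example Data Centre"],
    "Publication Year": ["2018"],
    "Resource Type": ["Physical object", "Instrument PID"],
}

print(validate(example))  # prints []
```

Storing every element as a list, even those limited to one value, keeps the check uniform across the 0..1, 1, 0..n and 1..n cases.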
Identifier | Description | Occurrence | Comment |
Characteristic | An instrument’s characteristic. Requires sub-properties (e.g. weight) | 0..n | standardised term or free text |
Capability | An instrument’s capability. Requires sub-properties (e.g. accuracy) | 0..n | standardised term or free text |
Event | An event in an instrument’s lifecycle. Requires sub-properties (e.g. Calibration, Calibration Valid From etc.) | 0..n | standardised term or free text |
Sub Model Name | The sub-model or version of the design. | 0..1 | free text |
Device Type | The type of instrument (e.g. oxygen optode). This is more granular than the Device Category. | 0..n | standardised term or free text |
Instance Reference | A model design reference (e.g. Turner et al. 1994). Can apply to older instances that were not mass produced. | 0..n | free text |
Funding reference | Name of the funding body. Our instruments are owned by the National Capability Centre which is funded by the Natural Environment Research Council | 0..n | standardised term or free text |