Harmonizing the sharing of marine observation data considering data quality information (oral)
MINKE, an EU project funded under Horizon 2020 (GA No. 101008724), is about integrating key European Marine Metrology Research Infrastructures to coordinate their use and development and to propose an innovative framework for oceanographic data quality. Alright? No? I struggled in the beginning too, so let’s start with an example.
Imagine we have a coastline (see figure above, left) and would like to measure the water temperature. Ideally, we would have accurate sensors at all locations along the coast. However, accurate sensors are expensive, which means that we only have a few of them. With so few observations, however accurate, the values in between have to be estimated, which leads to an interpolation error. Alternatively, we could deploy many low-cost sensors, which deliver many but inaccurate measurements, resulting in a bias. In both scenarios, we don’t get a realistic picture of the water temperature and are limited in terms of either accuracy or completeness.

So what about combining the two approaches? It is possible, but it requires a data infrastructure that harmonizes such marine observation data. We also need to communicate data quality transparently, for example, by providing access to reports summarizing recent calibration efforts. The overall goal is to achieve a quality-conscious and interoperable exchange of marine observation data.

First, we reviewed relevant standards (e.g., from the OGC) and explored the status quo of current practices for describing data quality. Several standards exist, but the adoption rate is rather low. In addition, many datasets don’t include quality information, and important sources of information (e.g., calibration reports) are often available only in formats that are not machine-readable, such as PDF files, if at all. Based on these findings, we provide a first set of recommendations, including encoding metadata about observation datasets using ISO 19115¹, the Sensor Model Language (SensorML), and OGC Observations and Measurements (O&M), taking into account ongoing developments in O&M and the SensorThings API. We also suggest the use of vocabularies to ensure semantic interoperability in data quality descriptions (see figure below). The next step is to have a prototypical implementation of the recommendations as part of the MINKE project.
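To make the coastline example from the beginning of this section a bit more tangible, here is a small Python sketch of the trade-off. It is purely illustrative and not part of the MINKE work: the temperature curve, the sensor positions, the constant bias of 0.8 °C, the noise level, and all function names are assumptions made up for this example. It compares three strategies: a few exact sensors with linear interpolation, many biased low-cost sensors, and a simple combination in which the bias is estimated at the sites where both sensor types happen to be co-located.

```python
import bisect
import math
import random

TRUE_BIAS = 0.8    # assumed constant offset of the low-cost sensors (°C)
NOISE_SIGMA = 0.1  # assumed random noise of the low-cost sensors (°C)


def true_temperature(x):
    """Made-up 'real' water temperature (°C) at coastal position x in [0, 1]."""
    return 15.0 + 3.0 * math.sin(2 * math.pi * x)


def interpolate(xs, ys, x):
    """Linear interpolation of the points (xs, ys) at x; xs must be sorted."""
    i = bisect.bisect_right(xs, x) - 1
    i = max(0, min(i, len(xs) - 2))  # clamp to a valid segment
    weight = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + weight * (ys[i + 1] - ys[i])


def rmse(xs, ys, grid):
    """Root mean square error of the interpolated readings against the truth."""
    squared = [(interpolate(xs, ys, x) - true_temperature(x)) ** 2 for x in grid]
    return math.sqrt(sum(squared) / len(squared))


def compare_strategies(seed=42):
    rng = random.Random(seed)
    grid = [j / 200 for j in range(201)]  # positions where we evaluate the error

    # Many low-cost sensors: dense coverage, but biased and noisy readings.
    cheap_x = [i / 40 for i in range(41)]
    cheap_y = [
        true_temperature(x) + TRUE_BIAS + rng.gauss(0.0, NOISE_SIGMA)
        for x in cheap_x
    ]

    # Few accurate sensors: exact readings, placed at every 10th low-cost site.
    colocated = list(range(0, 41, 10))
    accurate_x = [cheap_x[i] for i in colocated]
    accurate_y = [true_temperature(x) for x in accurate_x]

    # Combination: estimate the bias where both sensor types are co-located
    # and subtract it from every low-cost reading.
    differences = [cheap_y[i] - a for i, a in zip(colocated, accurate_y)]
    bias_estimate = sum(differences) / len(differences)
    corrected_y = [y - bias_estimate for y in cheap_y]

    return {
        "accurate_only": rmse(accurate_x, accurate_y, grid),
        "low_cost_only": rmse(cheap_x, cheap_y, grid),
        "combined": rmse(cheap_x, corrected_y, grid),
        "bias_estimate": bias_estimate,
    }


if __name__ == "__main__":
    for name, value in compare_strategies().items():
        print(f"{name}: {value:.3f}")
```

In this toy setup the combined estimate should end up with a lower error than either strategy alone. Note that the correction only works because the script knows which readings come from which sensor type and how accurate they are, which is exactly the kind of quality information that needs to travel with the data.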
¹ ISO 19115-1:2014 defines the schema required for describing geographic information and services by means of metadata.
“The project is funded by the European Commission within the Horizon 2020 Programme (2014-2020) Grant Agreement No. 101008724”
Link to abstract
Communicating data quality through open reproducible research (short course)
The idea of the short course was to demonstrate a set of tools that can help researchers to publish open reproducible research, i.e., to publish research results in such a way that others can reproduce the same figures, tables, and numbers using the same data and analysis. With the help of a use case from the MINKE project, we wanted to show the benefits and usefulness of tools like git for versioning and collaborative software development, R for scripting computational analyses reproducibly, R Markdown for creating shareable and executable documents, Binder for sharing reproducible workflows, and The Whole Tale for publishing these workflows in a repository. Needless to say, presenting these tools interactively as a live demo is much more appealing and understandable than a set of static presentation slides. This brings me to my main point of criticism. Unfortunately, the EGU strongly restricts how instructors can run their short courses. You are not allowed to use your own laptop for an interactive session to demonstrate some tools. Instructors aren’t allowed to open the browser either. You can only present your static slides. In my opinion, this is not an adequate format for a short course, and the EGU should question whether these restrictions make it possible to offer high-quality short courses.
After some discussion, they made an exception and allowed me to show something in the browser, which was great because otherwise the short course would have been even shorter than the name suggests. Also, many of the tools I wanted to demonstrate work in the browser, which is just one of the many advantages of the tools. All the materials I used in the course are also available on GitHub (https://github.com/MarkusKonk/egu_shortcourse), including links to Binder and The Whole Tale. I hope that the participants took something away. Maybe they found some of the tools useful, maybe they got some food for thought. Both outcomes would be worth the effort.
Links to repository + abstract