Greetings Sensor Web Community,
My name is Christian Danowski and I plan to finish my Bachelor's degree in geoinformatics at the University of Applied Sciences Bochum in February next year. Before starting my Bachelor's thesis, I had the great opportunity to complete a twelve-week internship at 52°North (52N). I would like to share my experience with you in this blog article.
During the internship, my task was to develop a new data source component for the 52N Sensor Observation Service (SOS) within the context of the European project GEOWOW. The idea was to use a file system of NetCDF files as a data source instead of a database. The information provided to the SOS is therefore no longer stored in a database like PostGIS, but in the NetCDF files themselves. In the following paragraphs I will briefly explain the background of NetCDF and the steps of the development process.
NetCDF (Network Common Data Form) is a set of software libraries and data formats for creating, accessing and sharing array-oriented scientific data. It was developed by Unidata, a community of research and education institutions. The main goal of the NetCDF format is to store scientific data, such as meteorological phenomena like temperature or precipitation, in a self-describing, array-oriented way. A NetCDF file therefore contains both the data itself and enough metadata to describe the characteristics of that data. The actual data is stored in variables, which are multidimensional arrays, while the metadata is provided via these variables’ attributes. To ensure interoperability, providers and users must agree on certain variables and attributes; otherwise each provider might use different attributes and variables for the same information. This agreement is what the CF-Convention (Climate and Forecast Convention) provides. Both the NetCDF format and the CF-Convention have been adopted as OGC standards. For further details, please consult the websites about NetCDF and the CF-Convention.
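To make the "self-describing" idea more concrete, here is a minimal sketch in plain Python. It does not use any NetCDF library; the `Variable` class and the `describe` helper are hypothetical names introduced only for illustration, and the values are made up. The attribute names `standard_name` and `units` are the kind of attributes the CF-Convention standardizes.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical stand-in for a NetCDF variable: array data plus attributes."""
    name: str
    dimensions: tuple          # names of the axes the data array is defined on
    data: list                 # nested lists, one nesting level per dimension
    attributes: dict = field(default_factory=dict)


def describe(var):
    """Build a human-readable summary using only the variable's own metadata."""
    standard_name = var.attributes.get("standard_name", var.name)
    units = var.attributes.get("units", "unknown units")
    return f"{standard_name} [{units}] over {', '.join(var.dimensions)}"


# Illustrative variable: 1 time step, 2 latitudes, 2 longitudes (made-up values).
temperature = Variable(
    name="temp",
    dimensions=("time", "lat", "lon"),
    data=[[[280.1, 281.4], [279.8, 282.0]]],
    attributes={"standard_name": "air_temperature", "units": "K"},
)

print(describe(temperature))
```

Because the meaning of the array travels with it in the attributes, a consumer that only knows the agreed attribute names can interpret the data without any external documentation.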
Following the CF-Convention, a NetCDF file can be one of three feature types:
• Point Dataset: Allows storage of unconnected point collections; e.g. can be used for time series of stationary data or trajectories.
• Grid Dataset: Allows storage of data in a regular grid. Gridded datasets are also the most common feature type.
• Radial Dataset: Allows storage of data using polar coordinates.
During the internship I managed to support gridded datasets as an SOS data source. You might ask yourself why I chose gridded datasets instead of Point Datasets. A Point Dataset would actually match the concept of sensors much better, because you can define stations that ‘observe’ certain phenomena. However, gridded datasets are more common, and we were provided with plenty of test data in this form.
The following figure shows an exemplary visualization of a grid dataset using the free tool Panoply.
You can see the visualization of a temperature variable. Each value is embedded in a spatio-temporal coordinate system, so each grid cell has a unique geolocation and a time reference. From the SOS’s point of view, the value of a grid cell, its geolocation and its time reference represent the key information. Each grid cell of a gridded dataset therefore provides a complete observation to an SOS.
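The mapping from grid cells to observations can be sketched as follows. This is only an illustration in plain Python, not the actual SOS code: the function name `grid_to_observations`, the axis values and the temperature values are all made up, and the array is assumed to be indexed as `values[time][lat][lon]`.

```python
from itertools import product

# Hypothetical coordinate axes of a small gridded dataset.
times = ["2013-01-01T00:00:00Z", "2013-01-02T00:00:00Z"]
lats = [50.0, 51.0]
lons = [6.0, 7.0, 8.0]

# Made-up temperature values, indexed as temperature[time][lat][lon].
temperature = [
    [[1.5, 2.0, 2.5], [0.5, 1.0, 1.5]],
    [[2.5, 3.0, 3.5], [1.5, 2.0, 2.5]],
]


def grid_to_observations(times, lats, lons, values):
    """Yield one observation (time, location, value) per grid cell."""
    for t, y, x in product(range(len(times)), range(len(lats)), range(len(lons))):
        yield {
            "time": times[t],
            "lat": lats[y],
            "lon": lons[x],
            "value": values[t][y][x],
        }


observations = list(grid_to_observations(times, lats, lons, temperature))
print(len(observations))
print(observations[0])
```

Even this tiny 2 x 2 x 3 grid yields twelve observations, which hints at how quickly the number of observations grows for realistic grids.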
Combining NetCDF and SOS
To provide the information in the NetCDF files to an SOS, I had to implement the data source layer of the SOS. Over the past twelve weeks I managed to support the SOS core operations (GetCapabilities, DescribeSensor, GetObservation) and the GetFeatureOfInterest operation. For each of these operations I had to implement the corresponding DAO (data access object). During installation of the SOS, the user merely has to provide the path to a folder in which the NetCDF files are located.
• GetCapabilities / CacheFeederDAO: Returns which NetCDF files exist, their spatial and temporal extent, and which features of interest and phenomena (observableProperties) these files contain. This information is stored in a cache, with an ID created for each object. The remaining operations can then query the cache for relationships between NetCDF files and their contents.
• DescribeSensor: One NetCDF file is treated as one procedure/offering. This DAO returns which features of interest and observable properties the NetCDF file offers and encodes them in SensorML. Since all relevant information is available from the cache, this DAO does not need to access the data source itself.
• GetFeatureOfInterest: Each grid cell of a gridded NetCDF file is defined as a feature of interest, so each feature has its own geolocation. By this definition a single NetCDF file can easily have several thousand features, which may become inefficient with many different NetCDF files. Depending on the filters of the GetFeatureOfInterest request, certain features (grid cells) are determined and returned as GML-encoded objects. These features can be filtered spatially and by procedure, observable property and feature ID.
• GetObservation: This DAO has the task of querying the data source for the actual data. The observations are selected by applying the filters from the request and are then encoded in OGC ‘Observations and Measurements’. The same filters as in GetFeatureOfInterest and an additional temporal filter can be used. In the case of NetCDF files, the goal is to create multidimensional indexes from the filter parameters that point to the requested observations. An observation is stored in a variable that is embedded in a spatio-temporal coordinate system, so the indexes along the coordinate axes must be determined. The value of the variable at a specific multidimensional index represents the observation value.
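The index determination described for GetObservation can be sketched roughly as follows. This is a simplified illustration in plain Python and not the actual DAO implementation: the names `index_range` and `get_observations` are hypothetical, the coordinate axes are assumed to be sorted in ascending order, time is assumed to be stored as a number (e.g. hours since a reference time), and the data values are made up.

```python
from bisect import bisect_left, bisect_right


def index_range(axis, lower, upper):
    """Return (start, stop) so that axis[start:stop] lies within [lower, upper].

    Assumes the axis values are sorted in ascending order.
    """
    return bisect_left(axis, lower), bisect_right(axis, upper)


def get_observations(times, lats, lons, values, time_range, bbox):
    """Translate a temporal and a spatial filter into axis indexes and read the values."""
    min_lat, min_lon, max_lat, max_lon = bbox
    t0, t1 = index_range(times, time_range[0], time_range[1])
    y0, y1 = index_range(lats, min_lat, max_lat)
    x0, x1 = index_range(lons, min_lon, max_lon)
    # Each (t, y, x) triple is one multidimensional index into the variable.
    return [
        (times[t], lats[y], lons[x], values[t][y][x])
        for t in range(t0, t1)
        for y in range(y0, y1)
        for x in range(x0, x1)
    ]


# Hypothetical axes; time is given as hours since some reference time.
times = [0, 24, 48]
lats = [50.0, 51.0, 52.0]
lons = [6.0, 7.0, 8.0]
# Made-up values chosen so that each value encodes its own index: t*100 + y*10 + x.
values = [[[t * 100 + y * 10 + x for x in range(3)] for y in range(3)] for t in range(3)]

result = get_observations(
    times, lats, lons, values,
    time_range=(24, 48),
    bbox=(50.5, 6.0, 51.5, 7.0),
)
for observation in result:
    print(observation)
```

A binary search per axis keeps the lookup cheap even for long axes, and only the requested slice of the variable has to be read. Real datasets may use descending or irregular axes, which this sketch does not handle.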
I am grateful for the past weeks I spent at 52°North. I gained a lot of new experience from talking to the people there and working on the project.
In the near future I will continue working on the project while writing my Bachelor's thesis. The main goal will be to support Point Dataset files as well.
I hope to stay in contact with 52N after my degree.
Feel free to contact me: firstname.lastname@example.org
Please find the link to the SVN repository: 52n-sos-netCDF.