The core idea of my GSoC 2017 project “Simple Features for protobuf and others” is to define and implement a serialization API for spatial vector data that transparently serializes geometries based on the Simple Feature specification into a binary encoding using Protobuf. The Simple Feature Access Specification is a common standard that is widely used in geoinformatics for exchanging spatial features. The serialization API also supports decoding serialized binary data into preferred output models such as JTS and others. Protocol Buffers is used as the primary serialization framework. Other serialization frameworks, such as Avro, are being considered as well. I am currently implementing serialization support for raster data in the API, which will utilize raster data formats, such as GeoTIFF, and modeling libraries, such as GeoTools.
The core tasks and development of the first seven weeks of the project period can be summarized as follows:
- Protobuf serialization/deserialization support for simple features
- Avro serialization/deserialization support for simple features
- Protobuf serialization/deserialization support for raster/coverage type data
I will give a more detailed description of the project’s status below. A more technical description of the changes in the project can be found on the project wiki page and in its corresponding GitHub repository:
Protobuf serialization support for simple access features
Protocol Buffers is a data serialization format with built-in features such as binary serialization, RPC frameworks and an IDL. Its flexibility, efficiency and automated mechanism for serializing structured data in a smaller, faster and simpler manner make it a very good candidate for data serialization use cases. JTS Topology Suite is an open source Java library that provides an object model for vector-based geomatics. In this project I am using the LocationTech JTS library to realize the vector data model.
I have implemented a serialization handler for JTS and Protobuf with serialization support for the Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, Line, LinearRing and Triangle models. Since there is no JTS model for a TIN, it is excluded from the implementation for the moment; extending the JTS library to support TIN serialization will be considered later. As the Line and Triangle models do not inherit from the Geometry abstraction in JTS, I had to use method overloading to support serializing them. I have also implemented a deserialization handler that decodes the serialized binary data and generates a JTS model from it. Deserialization is supported for all of the models mentioned above, and unit tests with sample data have been added.
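The overloading approach described above can be illustrated with a minimal sketch. It is not the project's handler: `Geom`, `Pt`, `Segment` and `SketchSerializer` are hypothetical stand-ins for the JTS types and the real serializer, and a fixed `ByteBuffer` layout is used in place of the Protobuf encoding so that the example runs without external libraries.

```java
import java.nio.ByteBuffer;

final class SketchSerializer {
    private static final byte TYPE_POINT = 1;
    private static final byte TYPE_SEGMENT = 2;

    // Handles every type that extends the common abstraction.
    static byte[] serialize(Geom geometry) {
        if (geometry instanceof Pt) {
            Pt p = (Pt) geometry;
            return ByteBuffer.allocate(1 + 2 * Double.BYTES)
                    .put(TYPE_POINT).putDouble(p.x).putDouble(p.y).array();
        }
        throw new IllegalArgumentException("Unsupported geometry: " + geometry);
    }

    // Overload for a type that does not extend Geom.
    static byte[] serialize(Segment s) {
        return ByteBuffer.allocate(1 + 4 * Double.BYTES)
                .put(TYPE_SEGMENT)
                .putDouble(s.start.x).putDouble(s.start.y)
                .putDouble(s.end.x).putDouble(s.end.y).array();
    }

    // Reads the type tag first, then rebuilds the matching model object.
    static Object deserialize(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        byte type = in.get();
        switch (type) {
            case TYPE_POINT:
                return new Pt(in.getDouble(), in.getDouble());
            case TYPE_SEGMENT:
                Pt start = new Pt(in.getDouble(), in.getDouble());
                Pt end = new Pt(in.getDouble(), in.getDouble());
                return new Segment(start, end);
            default:
                throw new IllegalArgumentException("Unknown type tag: " + type);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Segment(new Pt(0, 0), new Pt(3, 4)));
        Segment decoded = (Segment) deserialize(bytes);
        System.out.println(decoded.end.x + ", " + decoded.end.y);
    }
}

// Hypothetical stand-in for the JTS Geometry abstraction.
abstract class Geom {}

final class Pt extends Geom {
    final double x, y;
    Pt(double x, double y) { this.x = x; this.y = y; }
}

// Like the Line model mentioned above, this type sits outside the Geom hierarchy.
final class Segment {
    final Pt start, end;
    Segment(Pt start, Pt end) { this.start = start; this.end = end; }
}
```

Because `Segment` is unrelated to `Geom`, the compiler selects the right `serialize` overload from the static argument type, so callers use the same method name for both kinds of model.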
Figure 1 – Component view of Protobuf serialization of Simple Features
Avro serialization support for simple access features
Avro is another popular serialization framework. It provides rich data structures that are transported in a compact binary data format. One of the differences between Avro and Thrift (or Protobuf) is that Avro data is coupled with its schema, so the schemas need to be compared before transport happens. Dynamic schemas and untagged data make Avro appealing for data serialization use cases. Avro provides two options for serializing data: a static approach with code generation and a dynamic approach without code generation. The dynamic approach provides the extra flexibility of generic data handling, but has performance implications. The performance analysis done here shows that the penalty is minor and that the benefit is a simplified code base. In this implementation the static approach is used; the dynamic approach will be considered in later tasks. I have implemented the serialization and deserialization handlers for JTS and Avro with support for the Point, LineString and Polygon models. Unit tests with sample data have also been added.
Figure 2 – Component view of Avro serialization of Simple Features
Protobuf serialization support for Raster/Coverage type data
GeoTIFF is an open file format based on the TIFF format. It is used as an interchange format for georeferenced raster data. I have evaluated a few popular Java libraries that are capable of generating a coverage model out of a GeoTIFF file, such as the GDAL Java library, Apache Commons Imaging and the GeoTools GeoTIFF plugin. I chose the GeoTools GeoTIFF plugin because of its ease of use and better documentation. Each cell in the raster has a row and column number, and we need some meta information to transform them into real-world coordinates. This data is stored either in a world file or in the header of the image file itself. I have implemented serialization of coverage data for both options.
Figure 3 – Component view of Protobuf serialization of Raster/Coverage type data
Using the serialized data, grid rows and columns are transformed into real-world coordinates on the fly when deserializing. One of the future tasks will be to provide the option of deserializing them based on a preferred Coordinate Reference System.
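As a rough illustration of the grid-to-world step, the sketch below applies the six-parameter affine transform that a world file stores, in the file's line order A, D, B, E, C, F. The class `WorldFileTransform` and the sample values are hypothetical and are not taken from the project, where this conversion would normally be handled by the raster library.

```java
final class WorldFileTransform {
    // World file parameters: a = pixel width, d and b = rotation terms,
    // e = pixel height (usually negative), c and f = world x and y of the
    // center of the upper-left pixel.
    final double a, d, b, e, c, f;

    WorldFileTransform(double a, double d, double b, double e, double c, double f) {
        this.a = a; this.d = d; this.b = b; this.e = e; this.c = c; this.f = f;
    }

    // Parses the six numeric lines of a world file in the order A, D, B, E, C, F.
    static WorldFileTransform parse(String worldFileText) {
        String[] tokens = worldFileText.trim().split("\\s+");
        if (tokens.length != 6) {
            throw new IllegalArgumentException("Expected 6 values, got " + tokens.length);
        }
        double[] v = new double[6];
        for (int i = 0; i < 6; i++) {
            v[i] = Double.parseDouble(tokens[i]);
        }
        return new WorldFileTransform(v[0], v[1], v[2], v[3], v[4], v[5]);
    }

    // Maps a grid cell (column, row) to world coordinates {x, y}.
    double[] toWorld(int col, int row) {
        double x = a * col + b * row + c;
        double y = d * col + e * row + f;
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        // Hypothetical sample: 10 m pixels, no rotation, north-up image.
        String text = "10.0\n0.0\n0.0\n-10.0\n500005.0\n4199995.0\n";
        WorldFileTransform t = parse(text);
        double[] xy = t.toWorld(2, 3);
        System.out.println(xy[0] + ", " + xy[1]); // prints "500025.0, 4199965.0"
    }
}
```

The negative E value reflects that row numbers grow downward while world y coordinates grow upward. Reprojecting the result into another Coordinate Reference System would be a separate step on top of this transform.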
Tasks for the next weeks
As of now, I have implemented serialization support for simple features using Protobuf and Avro, and basic support for serializing raster/coverage type data. My next tasks are to continue the Avro serialization for the remaining simple features, support on-the-fly coordinate transformation based on a given CRS, and finally integrate everything into a reusable and easily extensible serialization API.