As you may know if you have been following my blog posts or status reports recently, I’m currently a Google Summer of Code 2012 student for 52°North. My project primarily deals with providing bindings for OpenStreetMap data so that it can be used within the Web Processing Service (WPS). Yet here I am talking about xmlcodegen, a component that is part of Geotools… why?
It turns out that the WPS relies a lot on Geotools for handling various map data types. This makes sense, as Geotools itself is intended to provide an easy way to interact with various geospatial data formats. By making use of Geotools, WPS can tap into its features and capabilities, and thus read and manipulate all these data formats without having to reinvent the wheel for each file format. The same goes for any other projects or packages that make use of Geotools.
As such, when the task came to implement bindings in WPS for OpenStreetMap data, I naturally looked into the capabilities of Geotools to see if such functionality was already provided. It turned out that it was not; however, mechanisms existed to easily implement such functionality, and this was helped by the fact that OpenStreetMap data is primarily in XML format.
This blog post is mostly dedicated to my experiences in generating bindings for Geotools from XML-based file formats. It is meant to supplement Geotools’ articles by covering some of the problems I encountered and the solutions I found for them. I will also give a brief overview of the progress of the project thus far.
Disclaimer: I don’t in any way claim to know all there is to know about Geotools’ xmlcodegen process, nor do I claim to be the authoritative source for solutions to the issues I address.
What is XML? Why is knowing this important in the context of OpenStreetMap?
XML is a markup language. In the context of OpenStreetMap data, it is primarily used as a means to transport and convey map data in a format that is meaningful to both humans and machines. As such, it tends to be considerably more verbose than other file formats (consider a 303GB uncompressed OSM XML planet file, compared to its 21GB OSM PBF relative); however, this verbosity is mitigated by the fact that several mature libraries and tools already exist to handle this file format. Consider, for example, all of the ways in Java to interact with XML: