OpenStreetMap (OSM) is a community-driven project that aims to provide a free, editable map of the entire world. This is in stark contrast to other mapping services, which tend to charge for API usage. Recently, major vendors such as Wikimedia, Apple, and Foursquare have started using OSM to power the mapping services in their products, which attests to the growing completeness and popularity of the project. The project's data is licensed under Creative Commons Attribution-ShareAlike 2.0.
As such, OSM has the potential to be a rich and useful data source for 52°North’s Web Processing Service (WPS), which aims to standardize geoprocessing on the web. In particular, we would like to provide on-the-fly transformation of OSM data into one or more other popular GIS formats, such as Shapefile or GML. This is the motivation for the Google Summer of Code 2012 project: On-demand transformation of OSM Data into common GIS formats.
The basic architecture of WPS looks like this:
This workflow shows the three generalized yet discrete components of WPS: the input mechanisms, or Parsers (green); the internal processing facilities, or Algorithms (orange); and the output mechanisms, or Generators (red). Each stage of the conversion maps onto one of these components: the initial import of OSM data should be handled by one or more Parsers; any manipulation or pruning of the data should be handled by one or more Algorithms; and the export should be handled by one or more Generators. Part of the task is therefore already complete, namely the export, since WPS already has mechanisms to output Shapefile or GML data from its internal data format.
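To make the Parser, Algorithm and Generator roles concrete, here is a small Python sketch of such a chain. All names are hypothetical. The real 52°North WPS is a Java framework with its own interfaces, and this does not reproduce them. It only shows how a parsed dataset flows through zero or more processing steps before being serialized.

```python
from typing import Any, Callable, Dict, List

# Hypothetical Python stand-ins for the three WPS roles; the real 52°North
# WPS is a Java framework whose interfaces are not reproduced here.
Record = Dict[str, Any]
Parser = Callable[[str], List[Record]]
Algorithm = Callable[[List[Record]], List[Record]]
Generator = Callable[[List[Record]], str]


def run_pipeline(raw: str, parser: Parser, algorithms: List[Algorithm],
                 generator: Generator) -> str:
    """Parse the raw input, apply each algorithm in order, then serialize."""
    data = parser(raw)
    for algorithm in algorithms:
        data = algorithm(data)
    return generator(data)


# Toy components: parse "id,tag" lines, drop untagged records, emit CSV.
def toy_parser(raw: str) -> List[Record]:
    records = []
    for line in raw.strip().splitlines():
        record_id, tag = line.split(",", 1)
        records.append({"id": int(record_id), "tag": tag or None})
    return records


def drop_untagged(records: List[Record]) -> List[Record]:
    return [r for r in records if r["tag"]]


def csv_generator(records: List[Record]) -> str:
    return "\n".join(f"{r['id']},{r['tag']}" for r in records)


# prints "1,highway" and "3,building" on separate lines
print(run_pipeline("1,highway\n2,\n3,building", toy_parser,
                   [drop_untagged], csv_generator))
```

Keeping the algorithms as an ordered list mirrors the idea that pruning or manipulation steps can be added without touching the Parser or Generator.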
WPS already uses an internal data format, IData, as a generalized datastore for all of its pluggable processes, which makes the task at hand significantly easier. We now simply need to convert OSM data into the internal format, and WPS will take care of the export. This leaves three questions. How do we input the OSM data? What formats does OSM data come in? And what must be taken into account for the conversion to go smoothly?
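An OSM way stores only references to node IDs, not coordinates. Converting OSM data into a geometry-based internal format therefore means resolving those references first. The Python sketch below shows that step. `Feature` and `way_to_feature` are hypothetical names, and `Feature` is only a stand-in for WPS's Java `IData`, not its actual API. Treating every closed way as a polygon is a deliberate simplification, because real OSM area detection also depends on tags.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (lon, lat)


@dataclass
class Feature:
    """Hypothetical stand-in for an entry in WPS's internal data format."""
    osm_id: int
    geometry_type: str  # "LineString" or "Polygon" in this sketch
    coordinates: List[Coord]
    tags: Dict[str, str] = field(default_factory=dict)


def way_to_feature(way_id: int, node_refs: List[int], tags: Dict[str, str],
                   nodes: Dict[int, Coord]) -> Feature:
    """Resolve a way's node references into coordinates.

    Simplification: any closed way (first ref == last ref, at least 4 refs)
    becomes a Polygon; real OSM area detection also looks at the tags.
    """
    missing = [ref for ref in node_refs if ref not in nodes]
    if missing:
        raise KeyError(f"way {way_id} references unknown nodes {missing}")
    coords = [nodes[ref] for ref in node_refs]
    closed = len(node_refs) >= 4 and node_refs[0] == node_refs[-1]
    return Feature(way_id, "Polygon" if closed else "LineString",
                   coords, dict(tags))
```

Raising on missing nodes rather than skipping them matters for extracts cut from a larger map, where a way can reference nodes outside the extract.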
The first two questions are fairly easy to answer. Preliminary research shows several ways in which OSM data can be stored and/or accessed, and we will focus on two of them in particular:
- Direct flat-file access, in the form of XML-based *.osm files and binary *.pbf (Protocolbuffer Binary Format) files
  - Input into WPS can be done with any of the existing XML libraries, followed by manual conversion to the internal data format
- Overpass API access, through the OSM3S server
  - Input into WPS is done transparently; the internal data format has functions that map directly to the API
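For the flat-file route, Python's standard `xml.etree.ElementTree` is enough to read the nodes and ways of a small *.osm extract. The sketch below is minimal, and `parse_osm_xml` is a hypothetical helper. It loads the whole document into memory and ignores relations and node tags. It also cannot read *.pbf files, which need a Protocol Buffers based reader instead.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def parse_osm_xml(text: str) -> Tuple[Dict[int, Tuple[float, float]], List[dict]]:
    """Read nodes (as id -> (lon, lat)) and ways from an in-memory OSM XML string."""
    root = ET.fromstring(text)
    nodes = {}
    for node in root.iter("node"):
        nodes[int(node.get("id"))] = (float(node.get("lon")), float(node.get("lat")))
    ways = []
    for way in root.iter("way"):
        ways.append({
            "id": int(way.get("id")),
            # <nd ref="..."/> children list the way's nodes in order
            "refs": [int(nd.get("ref")) for nd in way.iter("nd")],
            # <tag k="..." v="..."/> children hold the way's attributes
            "tags": {t.get("k"): t.get("v") for t in way.iter("tag")},
        })
    return nodes, ways


SAMPLE = """<osm version="0.6">
  <node id="1" lat="51.0" lon="7.0"/>
  <node id="2" lat="51.0" lon="7.1"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""

nodes, ways = parse_osm_xml(SAMPLE)
print(nodes, ways)
```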
The last question is the main problem this project has to solve. Flat-file planet OSM data is large, on the order of tens to hundreds of gigabytes. Iterating through the entire map takes a significant amount of time, so some preprocessing will be needed to convert the map into a form that can be quickly accessed and queried. At the same time, we cannot discount flat-file access entirely, as some users may want to supply their own OSM data for processing by WPS.
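One possible form of preprocessing is a single streaming pass over a large *.osm file that keeps only the region of interest. It is shown here only as an illustration, not necessarily the approach the project will adopt. The sketch uses `xml.etree.ElementTree.iterparse` and clears finished elements, so memory use follows the size of the result rather than the size of the input. The function name and the `(min_lon, min_lat, max_lon, max_lat)` bounding-box order are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Tuple

BBox = Tuple[float, float, float, float]  # assumed order: (min_lon, min_lat, max_lon, max_lat)


def index_nodes_in_bbox(source: BinaryIO, bbox: BBox) -> Dict[int, Tuple[float, float]]:
    """Stream an OSM XML file once, keeping only nodes inside bbox as id -> (lon, lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    index = {}
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the <osm> root
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == "node":
            lon, lat = float(elem.get("lon")), float(elem.get("lat"))
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                index[int(elem.get("id"))] = (lon, lat)
        if elem.tag in ("node", "way", "relation"):
            root.clear()  # discard finished top-level elements to bound memory
    return index


if __name__ == "__main__":
    sample = (b'<osm version="0.6"><node id="1" lat="51.0" lon="7.0"/>'
              b'<node id="2" lat="10.0" lon="10.0"/></osm>')
    print(index_nodes_in_bbox(io.BytesIO(sample), (6.0, 50.0, 8.0, 52.0)))
    # {1: (7.0, 51.0)}
```

Filtering ways the same way would also require the node index built here, since a way's position is only known through its node references.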
We aim to explore a variety of solutions to this problem in order to make OSM a viable data source for WPS.
A proposed final workflow for the project, following the original WPS architecture, is: