As I mentioned in a previous post (please read it before continuing), I set out to extend the Open Geospatial Consortium (OGC) Web Processing Service (WPS) so that it can process continuous data streams and send intermediate results back to the client. Both capabilities are realized with Full Streaming WPS, a new approach that brings near real-time geoprocessing to WPS. This post lays out the foundations of the approach, gives implementation details and pointers to public resources, and includes a demo.
Current interoperable ways of processing continuous data streams are addressed by the Sensor Web Enablement (SWE) OGC initiative. Processing in SWE is called Event Processing, which enables filtering and aggregation of data streams via an Event Pattern Markup Language (EML) (Everding et al., 2009).
More sophisticated data stream processing can be performed by triggering WPS processes when a condition is met (Everding and Foerster, 2011). However, depending on the context, sequential WPS triggering can be suboptimal. For example, a WPS process whose inputs are a data stream and a big static data set would perform poorly, because the static data set must be sent again with every execution.
It is thus desirable to enable the WPS to handle incoming data streams and load the big data set only once. Full Streaming WPS helps to solve this issue.
Full Streaming WPS
WPS is based on a sequential request-response mechanism with three steps. A client sends a request uploading data to the server, the server executes a process with the data and, finally, sends the result back to the client (Foerster et al., 2012). A WPS is an Output Streaming WPS when the last two steps run in parallel, and it is a Full Streaming WPS when all three steps run in parallel.
Full Streaming WPS is able to receive input data streams, process them and, meanwhile, send intermediate results back to the client as an output stream.
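The difference from the sequential mechanism can be illustrated with a small sketch. It is not the 52°North implementation: the function names and the toy "doubling" process are hypothetical, and Python generators interleave the steps on a single thread rather than running them truly in parallel. What it does show is the key property: each intermediate result is delivered before the next input chunk has even been received.

```python
from typing import Callable, Iterable, Iterator, List

Chunk = List[float]


def full_streaming(chunks: Iterable[Chunk],
                   process: Callable[[Chunk], Chunk]) -> Iterator[Chunk]:
    """Yield one intermediate result per input chunk, as soon as it is ready."""
    for chunk in chunks:       # input is consumed lazily, chunk by chunk
        yield process(chunk)   # result is handed over before the next chunk is read


def incoming_stream(log: List[str]) -> Iterator[Chunk]:
    """Hypothetical input stream that records when each chunk is received."""
    for i in range(3):
        log.append(f"received chunk {i}")
        yield [float(i), float(i) + 0.5]


log: List[str] = []
results = full_streaming(incoming_stream(log), lambda c: [v * 2 for v in c])
for n, result in enumerate(results):
    log.append(f"delivered result {n}: {result}")

print("\n".join(log))
```

The printed log alternates between "received" and "delivered" entries, whereas a sequential WPS would receive every chunk first and deliver a single result at the end.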
Introducing a new spatial data format: The Playlist
Transporting and accessing data streams implies keeping track of all data chunks that compose the stream to avoid loss of data. A well-known data format that helps to transport multimedia data chunks is the Playlist. It is a plain text file that contains Uniform Resource Identifiers (URIs) pointing to data chunks and can be updated when new chunks are available.
The original Playlist format is used in the multimedia realm and was defined by Apple in an open specification called HTTP Live Streaming. It was selected over other Playlist-like formats, such as the Media Presentation Description (MPD) from DASH (Dynamic Adaptive Streaming over HTTP), for two reasons: it is publicly available and it is simple.
The spatial version of the Playlist is a file with the txt extension, in which every line is either an informational tag or a URI pointing to a data chunk. Informational tags are prefixed with a pound sign (“#”). There are currently three:
- #SPATIAL-DATA-PLAYLIST Opens the Playlist.
- #EXCEPTION:URI Communicates to Playlist readers an exception that occurred during the process. The URI points to an exception report, as used by OGC Web services. If the Exception tag appears in a Playlist, it must be followed by the End tag so that the client can assemble the data it has already downloaded and inform the user.
- #PLAYLIST-END Closes the Playlist and tells Playlist readers to stop.
Finally, the streaming-based WPS supports service chaining, since one process’s output Playlist can in turn be another process’s input Playlist.
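To make the format concrete, here is a minimal parser sketch for the three tags described above. It is not taken from the published client or server code; the `Playlist` class, the `parse_playlist` function, the example.org URIs, and the choice to silently ignore unknown tags are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HEADER_TAG = "#SPATIAL-DATA-PLAYLIST"
EXCEPTION_TAG = "#EXCEPTION:"
END_TAG = "#PLAYLIST-END"


@dataclass
class Playlist:
    chunk_uris: List[str] = field(default_factory=list)
    exception_uri: Optional[str] = None
    closed: bool = False


def parse_playlist(text: str) -> Playlist:
    """Parse the text of a spatial Playlist into chunk URIs and status flags."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != HEADER_TAG:
        raise ValueError("not a spatial Playlist: missing opening tag")
    playlist = Playlist()
    for line in lines[1:]:
        if line == END_TAG:
            playlist.closed = True
            break                      # readers stop at the End tag
        if line.startswith(EXCEPTION_TAG):
            playlist.exception_uri = line[len(EXCEPTION_TAG):]
        elif line.startswith("#"):
            continue                   # unknown tag: ignored (assumption)
        else:
            playlist.chunk_uris.append(line)
    return playlist


sample = """#SPATIAL-DATA-PLAYLIST
http://example.org/chunks/0.gml
http://example.org/chunks/1.gml
#PLAYLIST-END
"""
parsed = parse_playlist(sample)
print(parsed.chunk_uris, parsed.closed)
```

A reader of an open Playlist would call such a function repeatedly on freshly downloaded text until `closed` becomes true.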
A new mime type
The Playlist also introduces a new mime type (application/x-ogc-playlist) for identifying it in Web services workflows, according to the Web Processing Service Best Practices Discussion Paper.
The Playlist content’s format must be specified by appending the data chunks’ mime type to the Playlist’s mime type. A plus (“+”) sign must be placed in between. For example, the mime type for a Playlist containing Geography Markup Language (GML) v.18.104.22.168 files could be:
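As a hedged sketch of this composition rule, assuming `text/xml` purely as a stand-in chunk mime type (the exact mime type string for GML chunks is not asserted here), building and splitting such a mime type might look like the following. The function names are hypothetical.

```python
PLAYLIST_MIME = "application/x-ogc-playlist"


def playlist_mime_type(chunk_mime: str) -> str:
    """Append the chunks' mime type to the Playlist's, separated by a plus sign."""
    return f"{PLAYLIST_MIME}+{chunk_mime}"


def chunk_mime_type(mime: str) -> str:
    """Recover the chunks' mime type from a composed Playlist mime type."""
    prefix = PLAYLIST_MIME + "+"
    if not mime.startswith(prefix):
        raise ValueError(f"not a Playlist mime type: {mime!r}")
    return mime[len(prefix):]


composed = playlist_mime_type("text/xml")   # "text/xml" is an illustrative value
print(composed)
print(chunk_mime_type(composed))
```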
Road traffic monitoring
When monitoring road traffic, it is important to have access to vehicle locations along a road network. GPS devices are usually employed for measuring locations every so often, but it is likely that these locations do not lie on any road because of GPS inaccuracy. Therefore, processing the raw GPS locations is a must for determining what roads the vehicles are on. This is known as map matching.
Think of vehicles sending raw GPS locations every five seconds. Instead of displaying inaccurate locations, a map matching algorithm can be applied first, so that only the corrected locations are displayed. In the following screencast, a basic algorithm for snapping points to lines is used. The algorithm is published as a Full Streaming WPS process, which supports input data streams via an input Playlist and makes intermediate results available to the client by means of an output Playlist.
As you can see, the road network (which may be large) is sent only once and reused every time a new chunk of data becomes available in the input Playlist and the algorithm is executed again. This way, the locations the client displays lie on a road and are available, almost immediately, for further analysis.
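For readers unfamiliar with point snapping, the idea can be sketched in a few lines of plain Python. This is not the code of the published process, only a minimal illustration under simple assumptions: planar coordinates, roads given as lists of vertices, and a hypothetical `max_distance` threshold beyond which a point is left unmatched.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def snap_to_segment(p: Point, a: Point, b: Point) -> Point:
    """Return the point on segment a-b that is closest to p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return a                                   # degenerate segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))                      # clamp to the segment ends
    return (ax + t * dx, ay + t * dy)


def snap_point_to_lines(p: Point,
                        lines: Sequence[Sequence[Point]],
                        max_distance: float) -> Optional[Point]:
    """Snap p to the nearest line, or return None if none is within max_distance."""
    best, best_dist = None, max_distance
    for line in lines:
        for a, b in zip(line, line[1:]):           # consecutive vertex pairs
            candidate = snap_to_segment(p, a, b)
            dist = math.dist(p, candidate)
            if dist <= best_dist:
                best, best_dist = candidate, dist
    return best


roads = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 5.0), (10.0, 5.0)]]
print(snap_point_to_lines((3.0, 1.0), roads, max_distance=100.0))
```

In a streaming setting, `roads` would be loaded once and this function would be applied to every point of every incoming chunk.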
Processing big Web Feature Service (WFS) data
WFS is a standardized service for accessing (and, optionally, editing) feature data. When the data exposed by a WFS is relatively large, performance can degrade to the point of making the service unusable in practical terms. That is why WFS 2.0 introduces a mechanism to page the results, allowing clients to retrieve chunks of data rather than the whole data set at once.
An input Playlist could be used for keeping track of every chunk (i.e., page) by storing its URL, enabling a Full Streaming WPS to process chunks as soon as they are available. This way, processing big WFS data will no longer be cumbersome.
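A sketch of how such an input Playlist could be generated is shown below. It relies on the `count` and `startIndex` parameters that WFS 2.0 defines for response paging; the endpoint URL, the feature type name, and the function names are hypothetical, and the total number of features is assumed to be known in advance (for example from a prior `resultType=hits` request).

```python
from typing import List
from urllib.parse import urlencode


def wfs_page_urls(base_url: str, type_name: str,
                  total_features: int, page_size: int) -> List[str]:
    """Build one GetFeature URL per page of results."""
    urls = []
    for start in range(0, total_features, page_size):
        params = {
            "service": "WFS",
            "version": "2.0.0",
            "request": "GetFeature",
            "typeNames": type_name,
            "count": page_size,       # features per page
            "startIndex": start,      # offset of the first feature in the page
        }
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


def build_playlist(urls: List[str]) -> str:
    """Wrap the page URLs in a closed spatial Playlist."""
    return "\n".join(["#SPATIAL-DATA-PLAYLIST", *urls, "#PLAYLIST-END"]) + "\n"


pages = wfs_page_urls("http://example.org/wfs", "osm:roads",
                      total_features=250, page_size=100)
print(build_playlist(pages))
```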
Sensor Observation Service (SOS) and Sensor Event Service (SES)
Sensor data are the most suitable input for Full Streaming WPS. An SES could notify clients that new data are available and, at the same time, trigger a Playlist update by appending the URL for accessing the data via SOS. The road traffic monitoring example could also be implemented in this way.
How to implement a Streaming-based WPS process
If you would like to implement your own Streaming-based WPS process, please follow these two tutorials: Implementing an Output Streaming WPS and Implementing a Full Streaming WPS. All available resources are listed in the Resources section of this post.
In addition to the screencast, a demo environment has been set up for you to see Full Streaming WPS and Output Streaming WPS in action. The server-side extends the 52°North WPS framework, whereas the client-side extends the QGIS WPS client plugin.
Install the version of the QGIS WPS client provided by this post. So far, the official version of the plugin does not include the streaming capabilities, although a patch has already been sent to the author.
Once you install the QGIS WPS client, add the demo server (see Resources) to the Server Connections.
Testing the Output Streaming WPS
Run the OutputStreamingSimplifyDouglasPeucker process. As input for the parameter FEATURES you can choose any line layer. Additionally, you must set the tolerance (depending on the reference system of your data) and the number of chunks you would like to get from the process, e.g., 50 (although it depends on the number of features your layer has). Watch this screencast if you need further guidance.
Testing the Full Streaming WPS
Download the sample road network provided by this post and load it into QGIS (make sure you set the reference system EPSG:31467 for the QGIS project). Run the FullStreamingSnapPointsToLines process choosing the road network for the parameter Lines. Enter 100 (meters) for MaxDistance and 20000 (milliseconds) for MaxTimeIdle. The parameter MaxTimeIdle defines how long the server will wait for input Playlist updates. If that time is exceeded, the server will finish the process and notify the client via an exception. Finally, points are expected to be in a Playlist. You can try out two types of Playlist:
- Static Playlist: A Playlist that is already closed, so no more data will be appended to it. Copy this Playlist URL (http://downloads.tuxfamily.org/tuxgis/geoblogs/streaming_based_wps/sample_playlist.txt) to the text field corresponding to the parameter Points. Don’t expect results in a particular order, since the server will download the chunks from the Playlist asynchronously, process them in parallel, and make them available via the output Playlist. Moreover, the client will display results as soon as it can download them.
- Dynamic Playlist: A Playlist that is open and will be updated every 5 seconds for about 2 minutes. You can get the URL of a dynamic Playlist from http://geotux.pythonanywhere.com/generate_playlist. Just copy the URL that is returned to the text field corresponding to the parameter Points.
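The interplay between a dynamic Playlist and MaxTimeIdle can be sketched as a polling loop. This is an illustration only, not the server's actual code: the `read_playlist` function, its parameters, and the injected `fetch`, `clock`, and `sleep` callables are assumptions, and a real reader would download the Playlist over HTTP inside `fetch`.

```python
import time
from typing import Callable, Iterator


def read_playlist(fetch: Callable[[], str],
                  max_time_idle_ms: float,
                  poll_interval: float = 1.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield chunk URIs as they appear, until the End tag or an idle timeout."""
    seen = 0
    last_update = clock()
    while True:
        lines = [ln.strip() for ln in fetch().splitlines() if ln.strip()]
        uris = [ln for ln in lines if not ln.startswith("#")]
        for uri in uris[seen:]:        # only chunks not delivered yet
            yield uri
        if len(uris) > seen:
            seen = len(uris)
            last_update = clock()      # the Playlist was updated
        if "#PLAYLIST-END" in lines:
            return                     # closed Playlist: stop reading
        if clock() - last_update > max_time_idle_ms / 1000.0:
            raise TimeoutError("no Playlist update within MaxTimeIdle")
        sleep(poll_interval)


# Simulated dynamic Playlist: each call to fetch returns a newer snapshot.
snapshots = iter([
    "#SPATIAL-DATA-PLAYLIST\nhttp://example.org/p0.gml\n",
    "#SPATIAL-DATA-PLAYLIST\nhttp://example.org/p0.gml\nhttp://example.org/p1.gml\n",
    "#SPATIAL-DATA-PLAYLIST\nhttp://example.org/p0.gml\nhttp://example.org/p1.gml\n#PLAYLIST-END\n",
])
received = list(read_playlist(lambda: next(snapshots),
                              max_time_idle_ms=20000, poll_interval=0.0))
print(received)
```

If the Playlist stays open and stops growing, the loop raises `TimeoutError` once the idle time exceeds the limit, which mirrors the server finishing the process and notifying the client via an exception.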
In a couple of posts I’ve presented an implementation of Streaming-based WPS, which has two modalities: Output Streaming WPS and Full Streaming WPS. The former aims to reduce latency of basic WPS processes, whereas the latter enables WPS to process continuous data streams while sending intermediate results back to the client.
The implementation is based on the 52°North WPS framework on the server-side, and on the Quantum GIS WPS client plugin on the client-side. A number of resources have been published, including tutorials, screencasts, source code, and a demo.
If you have comments, suggestions, or bug reports, please leave a comment or send an email to the 52°North Geoprocessing community’s mailing list.
- T. Everding, J. Echterhoff, and S. Jirka. (2009). Event Processing in Sensor Webs. Presented at the Proceedings of Geoinformatik 2009, Osnabrueck, Germany, March 2009. ifgiPrints, Institute for Geoinformatics (pp. 11-19). Muenster, Germany.
- T. Everding and T. Foerster. (2011). An Event Driven Architecture for Decision Support. In A. Schwering, E. Pebesma, and K. Behncke (Eds.), Geochange (pp. 7-13). Presented at the Geoinformatik 2011, Muenster, Germany: AKA Verlag. Retrieved from http://ifgi.uni-muenster.de/~tfoer_01/articles/Geoinformatik2011_EverdingFoerster.pdf
- T. Foerster, B. Baranski, and H. Borsutzky. (2012). Live Geoinformation with Standardized Geoprocessing Services. In J. Gensel, D. Josselin, and D. Vandenbroucke (Eds.), Bridging the Geographic Information Sciences (pp. 99–118). Berlin, Heidelberg: Springer Berlin Heidelberg. Retrieved from http://www.springerlink.com/index/10.1007/978-3-642-29063-3_6
- Source code:
  - Server-side (extending the 52°North WPS Framework) https://svn.52north.org/svn/geoprocessing/main/WPS/branches/StreamingBasedWPS/
  - Client-side (extending the QGIS WPS client) http://downloads.tuxfamily.org/tuxgis/geoblogs/streaming_based_wps/wps.zip (Tested on QGIS 1.7.3 and 1.8 on GNU/Linux)
- Demo environment:
  - Demo server http://geoprocessing.demo.52north.org:8081/streamingBasedWPS/WebProcessingService
  - Playlist provider http://geotux.pythonanywhere.com/generate_playlist
  - Road network data (source: OpenStreetMap) http://downloads.tuxfamily.org/tuxgis/geoblogs/streaming_based_wps/road_network.zip