Maintainers of 52°North web services currently have no information about how their services are used. User tracking is an important tool for understanding user interests and requirements and for ensuring that web services are working properly. But for OGC services, mere web usage statistics are not sufficient, because they lack thematic information. An administrator of a 52°North SOS does not know which procedures are requested most often, or for which areas and time frames people request data. Are users mostly interested in the last week, or do they request historical data as well? Which sensor stations rarely get any queries?
The same is true for the 52°North WPS. How long do execute requests of a specific WPS process take on average? What is the distribution of a numeric input parameter, such as the size of a buffer operation? How large is the biggest shapefile created by a routing algorithm within the last month? Which of the many vector output formats offered by a certain process is used most?
Other questions are relevant for both services. How large is the average response document? At what hours of the day are GetCapabilities documents requested most often? Which coarse global regions do users of a service come from?
To answer these questions, this Google Summer of Code project will develop a set-up using the ELK stack (though without the “L”, Logstash) to analyze SOS and WPS requests and responses and capture “usage statistics”, and to integrate these usage statistics into the administrative backends of the services. Elasticsearch (the “E” in ELK) is an open source search engine built on top of Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today, both open source and proprietary. Kibana (the “K”) is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch indices, and to run advanced data analysis and visualize the results in a variety of charts, tables, and maps.
Currently some underlying services from the SOS project are being extracted into another project called Iceland, which will serve as a common base structure for further projects. You can view the source code on GitHub. The date of the first Iceland build is still undetermined, but I will keep an eye on further developments, and in the second phase I will probably be able to integrate my solution into the Iceland repository.
Both integrations will use the Elasticsearch Java API, which provides easy access to the underlying database, to index the collected data.
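To make the indexing step more concrete, here is a minimal sketch of how one request could be flattened into a key/value document before it is handed to Elasticsearch. The class name, the method and all field names are assumptions made up for this illustration, not part of the SOS, the WPS or the Elasticsearch API. The actual call to the Elasticsearch client is deliberately left out, so the snippet runs with the JDK alone.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class StatisticsDocument {

    /**
     * Builds the key/value document describing one incoming request.
     * All field names are hypothetical and chosen only for this sketch.
     */
    public static Map<String, Object> forRequest(String service, String operation,
                                                 String country, Instant receivedAt) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("service", service);          // e.g. "SOS" or "WPS"
        doc.put("operation", operation);      // e.g. "GetObservation"
        doc.put("country", country);          // coarse origin of the user
        doc.put("@timestamp", receivedAt.toString()); // ISO-8601 timestamp
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = forRequest(
                "SOS", "GetObservation", "DE", Instant.parse("2015-06-01T12:00:00Z"));
        // In the real integration this map would be passed to the
        // Elasticsearch client instead of being printed.
        System.out.println(doc);
    }
}
```

A flat map keeps the collected statistics independent of the service that produced them, so the same structure could describe both SOS and WPS requests.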
This project needs to integrate seamlessly with two large software projects without affecting their functionality or performance:
- Sensor Observation Service – utilize the existing SosEventBus mechanism to collect statistics. My Elasticsearch listener should subscribe to the RequestEvent events, which are emitted from the AbstractRequestOperator#receiveRequest method and contain the received request (e.g. a GetObservationRequest). Many statistical metrics, such as the size of the generated document or the processing time, would require a ResponseEvent, which does not exist yet. Adding it would also be useful for other listeners in the future.
- Web Processing Service – find entry points for easy statistics collection in the servlet classes. The main hook points could be the WebProcessingService class and the RequestHandler#handle method.
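The listener idea from the list above can be sketched as follows. This is only an illustration under stated assumptions: SimpleEventBus and RequestEvent are hypothetical stand-ins written for this post, not the real SosEventBus API, and the bounded queue is one possible way to keep statistics collection from slowing down request handling, not a decision that has already been made.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class StatisticsListenerSketch {

    /** Hypothetical stand-in for the event emitted when a request arrives. */
    public static final class RequestEvent {
        public final String operation;
        public RequestEvent(String operation) { this.operation = operation; }
    }

    /** Hypothetical, minimal event bus; the real SosEventBus may differ. */
    public static final class SimpleEventBus {
        private final List<Consumer<RequestEvent>> listeners = new ArrayList<>();
        public void register(Consumer<RequestEvent> listener) { listeners.add(listener); }
        public void submit(RequestEvent event) {
            for (Consumer<RequestEvent> listener : listeners) {
                listener.accept(event);
            }
        }
    }

    /** Listener that only enqueues events; indexing would run on another thread. */
    public static final class StatisticsListener implements Consumer<RequestEvent> {
        public final BlockingQueue<RequestEvent> pending;
        public final AtomicInteger dropped = new AtomicInteger();

        public StatisticsListener(int capacity) {
            this.pending = new ArrayBlockingQueue<>(capacity);
        }

        @Override
        public void accept(RequestEvent event) {
            // offer() never blocks: when the queue is full the event is
            // dropped, so request handling is not delayed by statistics.
            if (!pending.offer(event)) {
                dropped.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        SimpleEventBus bus = new SimpleEventBus();
        StatisticsListener listener = new StatisticsListener(2);
        bus.register(listener);
        bus.submit(new RequestEvent("GetObservation"));
        bus.submit(new RequestEvent("GetCapabilities"));
        bus.submit(new RequestEvent("DescribeSensor")); // queue is full, dropped
        System.out.println("queued=" + listener.pending.size()
                + " dropped=" + listener.dropped.get());
    }
}
```

Dropping events under load trades completeness of the statistics for predictable request latency. A background worker would drain the queue and index the events.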
You will find more information on the official wiki page of this project and in the upcoming blog posts about the technical details and use cases.
About me
I’m a 25-year-old Hungarian computer engineer, and this year I will finish my master’s degree at the Technical University in Budapest. I have also studied at the Karlsruhe Institute of Technology (KIT) in Karlsruhe, Germany and at the Universidade do Porto in Porto, Portugal.