A Python package to query and analyze enviroCar’s trajectory data
enviroCar is an open Citizen Science platform to collect, share and analyze floating car data for traffic quality and environmental monitoring. Its main components are the server back-end and the Android App. See envirocar.org for more details. This article focuses on the Python package envirocar-py, which enables users to query enviroCar’s open data and perform detailed analysis in Python.
In general, eXtended Floating Car Data (XFCD) provide spatio-temporal profiles of vehicles equipped with GPS receivers and with various sensors that measure car- and engine-related parameters. The enviroCar Python package allows users to query and download XFCD via the enviroCar REST API. The received data are stored in a flat GeoPandas GeoDataFrame. Each track is represented as a chain of measurement data points with some metadata, such as measurement time and coordinates. The data frame provides not only sensor measurements but also estimated CO2 emissions based on GPS recordings. For this estimation, we calculated the energy demand from the vehicle movement, which we extracted from the GPS recordings. Table 1 shows all GeoDataFrame variables requested via envirocar-py. A Jupyter Notebook on variable description in the envirocar-py package provides further details on the data and variables. Note that not all variables are available for all tracks, e.g. “Energy Consumption.value” is only stored for electric vehicles.
|column name|data type|
|CO2 Emission (GPS-based).value|float64|
|O2 Lambda Voltage ER.value|float64|
|O2 Lambda Voltage.value|float64|
|Short-Term Fuel Trim 1.value|float64|
|Long-Term Fuel Trim 1.value|float64|
|O2 Lambda Current ER.value|float64|
|O2 Lambda Current.value|float64|
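Because not every variable is recorded for every track, it can help to check which columns are present before an analysis relies on them. The following is a minimal sketch, not part of envirocar-py: the helper name `split_by_availability` and the example column set are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def split_by_availability(columns: Iterable[str], wanted: Iterable[str]) -> Dict[str, List[str]]:
    """Sort the wanted column names into those present and those missing."""
    present = set(columns)
    result: Dict[str, List[str]] = {"available": [], "missing": []}
    for name in wanted:
        key = "available" if name in present else "missing"
        result[key].append(name)
    return result


# Hypothetical column set of a track recorded by a combustion-engine car.
example_columns = [
    "CO2 Emission (GPS-based).value",
    "O2 Lambda Voltage.value",
    "Short-Term Fuel Trim 1.value",
]
wanted = ["CO2 Emission (GPS-based).value", "Energy Consumption.value"]

report = split_by_availability(example_columns, wanted)
print(report)
```

With a downloaded GeoDataFrame you would pass its `columns` attribute as the first argument instead of the example list.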
The package requires Python 3.6 or later. It is available on PyPI and can be installed with the following command:
pip install envirocar-py --upgrade
To install envirocar-py in development mode, run the following from the root of a local copy of the source repository:
python setup.py develop
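To verify the setup before running any requests, a small check like the one below can be useful. It is a sketch using only the standard library; the function names are our own, and the module name `envirocar` is taken from the import statement used later in this article.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def envirocar_is_installed() -> bool:
    """Return True if the envirocar module can be found without importing it."""
    return importlib.util.find_spec("envirocar") is not None


print("Python supported:", python_is_supported())
print("envirocar importable:", envirocar_is_installed())
```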
Example snippet of enviroCar API request
To request enviroCar data from the API, import the envirocar module, as well as pandas and GeoPandas, into your Python script. You also need the coordinates of your area of interest, which you can look up on OpenStreetMap, for example.
After installing envirocar, pandas and geopandas, request enviroCar data by adding the following code snippet to your script:
# Import envirocar classes, pandas and geopandas to your Python script
import pandas as pd
import geopandas as gpd
from envirocar import TrackAPI, DownloadClient, BboxSelector, ECConfig

# Set configuration parameters by initializing the ECConfig class
config = ECConfig()

# Initialize an instance of the TrackAPI class which handles the API access
track_api = TrackAPI(api_client=DownloadClient(config=config))

# Define a bounding box of the area which you are interested in
bbox = BboxSelector([
    7.601165771484375,  # min_x
    51.94807412325402,  # min_y
    7.648200988769531,  # max_x
    51.97261482608728   # max_y
])

# Issue a query by calling the get_tracks method of the TrackAPI class
# which takes the bounding box and the number of tracks as arguments
track_df = track_api.get_tracks(bbox=bbox, num_results=10)  # requesting 10 tracks inside the bbox
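If you know a center point rather than the corner coordinates, you can derive the bounding box yourself. The sketch below is not part of envirocar-py; `bbox_around` is a hypothetical helper that uses a spherical Earth approximation and returns the coordinates in the same `[min_x, min_y, max_x, max_y]` order as the request above.

```python
import math
from typing import List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def bbox_around(lon: float, lat: float, radius_m: float) -> List[float]:
    """Return [min_x, min_y, max_x, max_y] in degrees around a center point."""
    if not -90.0 < lat < 90.0:
        raise ValueError("latitude must lie strictly between -90 and 90 degrees")
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    # Angular extent of the radius along a meridian ...
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    # ... and along the parallel at this latitude, which is shorter by cos(lat).
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return [lon - dlon, lat - dlat, lon + dlon, lat + dlat]


# Roughly the center of the bounding box used above, with a 2 km radius.
coords = bbox_around(lon=7.6247, lat=51.9603, radius_m=2000.0)
print(coords)
# With envirocar installed: bbox = BboxSelector(coords)
```

The approximation is adequate for city-scale areas; it does not handle boxes that cross the antimeridian or touch the poles.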
Check out the Jupyter notebooks in the examples folder on GitHub for examples and explanations, e.g. how to match recorded tracks to a street network (‘map matching’).
Example analytics workflows
Users, developers and university students have recorded, shared and analyzed the open data. Several examples of Python scripts that analyze the data with different scopes are available on GitHub (see links to certain analysis packages in the relevant paragraphs below).
Hot Spot Analysis
Hot spot analysis is a statistical tool for identifying clusters of a specific phenomenon in a dataset. In traffic management, phenomena of interest can be, e.g., traffic density or CO2 emissions.
The research project CITRAM carried out a hot spot analysis of CO2 emissions in the city of Hamm. We presented the results of this analysis, which was based on Getis-Ord statistics, in a blog post “Hot Spot Analysis of Floating Car Data”. A study project at the University of Münster applied various hot spot analysis workflows. These included different statistics (Getis-Ord, Moran), spatial references for features (points, polygons, rectangular grids), weights (network distance, travel times) and phenomena (speed, CO2 emissions, vehicle stopping time). Access the code on GitHub.
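To illustrate the idea behind Getis-Ord statistics, here is a minimal sketch of the Gi* statistic on a rectangular grid. It is not the code used in the projects mentioned above. It assumes binary weights in which each cell's neighborhood is the cell itself plus its adjacent cells (queen contiguity), whereas the study project also used network distances and travel times. The function name and the grid values are made up for illustration.

```python
import math
from typing import List


def getis_ord_gi_star(grid: List[List[float]]) -> List[List[float]]:
    """Gi* z-scores for a rectangular grid with binary queen-contiguity weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = max(0.0, sum(v * v for v in values) / n - mean * mean)
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("all values are identical; Gi* is undefined")

    scores = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            local_sum = 0.0
            w_sum = 0  # number of cells in the neighborhood, including (r, c)
            for rr in range(max(0, r - 1), min(rows, r + 2)):
                for cc in range(max(0, c - 1), min(cols, c + 2)):
                    local_sum += grid[rr][cc]
                    w_sum += 1
            # With binary weights, the sum of squared weights equals w_sum.
            spread = (n * w_sum - w_sum ** 2) / (n - 1)
            # spread is 0 if the neighborhood covers the whole grid; keep 0.0 then.
            if spread > 0:
                scores[r][c] = (local_sum - mean * w_sum) / (std * math.sqrt(spread))
    return scores


# Hypothetical 5 x 5 grid of mean CO2 values with a high-emission block
# in the upper-left corner.
HIGH, LOW = 5.0, 1.0
grid = [[HIGH if r < 3 and c < 3 else LOW for c in range(5)] for r in range(5)]

z = getis_ord_gi_star(grid)
hot_cells = [(r, c) for r in range(5) for c in range(5) if z[r][c] > 1.96]
print(hot_cells)
```

Cells with a z-score above 1.96 are candidates for hot spots at the 95 % level, and strongly negative scores indicate cold spots. A real analysis would also correct for multiple testing.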
If you are interested in determining traffic safety, try the traffic safety analysis package. This package provides multiple tools to determine indicators of traffic safety, such as lucky escapes, black spots, cold spots and speeding points. In addition, you can integrate OpenStreetMap and weather data to analyze spatio-temporal patterns of accidents. To dive deeper into (statistical) features of accidents, you can create statistical models with neural networks to compute probabilities, categories, trends and patterns. Check out the repository and relevant Jupyter Notebooks on GitHub.
You can also use data from the enviroCar platform to analyze fuel consumption along the car tracks. We presented an evaluation of a GPS-based fuel consumption model at EGU 2020. A dedicated Python library provides tools to estimate GPS-based fuel consumption and understand its sensitivity to different model parameters. The library is based on preliminary work done during a study project at the University of Münster.
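The model used by that library is not described in this article. To give an impression of how fuel consumption can be derived from GPS speed alone, the sketch below uses a generic longitudinal vehicle dynamics approach: tractive power from acceleration, rolling resistance and air drag, converted to fuel via an assumed engine efficiency. All parameter values and function names are illustrative assumptions, and road gradient and idling are ignored.

```python
from typing import Sequence

# Illustrative parameters (assumptions, not the values used by enviroCar).
MASS_KG = 1500.0
ROLLING_RESISTANCE = 0.012
DRAG_COEFFICIENT = 0.30
FRONTAL_AREA_M2 = 2.2
AIR_DENSITY_KG_M3 = 1.2
GRAVITY_M_S2 = 9.81
ENGINE_EFFICIENCY = 0.30
FUEL_ENERGY_J_PER_L = 32.0e6  # approximate energy content of gasoline
CO2_KG_PER_L = 2.3            # approximate CO2 per liter of gasoline


def tractive_power_w(speed_ms: float, accel_ms2: float) -> float:
    """Power at the wheels in watts; negative demand (braking) counts as zero."""
    force_n = (
        MASS_KG * accel_ms2
        + MASS_KG * GRAVITY_M_S2 * ROLLING_RESISTANCE
        + 0.5 * AIR_DENSITY_KG_M3 * DRAG_COEFFICIENT * FRONTAL_AREA_M2 * speed_ms ** 2
    )
    return max(0.0, force_n * speed_ms)


def estimate_fuel_l(speeds_ms: Sequence[float], dt_s: float) -> float:
    """Integrate fuel use in liters over speeds sampled every dt_s seconds."""
    energy_j = 0.0
    for v0, v1 in zip(speeds_ms, speeds_ms[1:]):
        accel = (v1 - v0) / dt_s
        v_mid = 0.5 * (v0 + v1)
        energy_j += tractive_power_w(v_mid, accel) * dt_s
    return energy_j / (ENGINE_EFFICIENCY * FUEL_ENERGY_J_PER_L)


# One minute at a constant 50 km/h, sampled once per second.
speeds = [50.0 / 3.6] * 61
fuel_l = estimate_fuel_l(speeds, dt_s=1.0)
co2_kg = fuel_l * CO2_KG_PER_L
print(fuel_l, co2_kg)
```

Varying a parameter such as `ENGINE_EFFICIENCY` or `MASS_KG` and re-running the estimate is a simple way to see how sensitive the result is to the model assumptions.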
Exploratory Data Analysis and Preprocessing
If the data have errors, outliers, missing values and noise (e.g. due to measurement distortions), analysis results will likely be of poorer quality and patterns may not be detected (i.e. ‘garbage in, garbage out’). Thus, we recommend that you first view and understand the data and then do some preprocessing. For this, you can use the functionality in the eda_quality repository on GitHub. It provides several tools specifically implemented for enviroCar data to help you gain insight into the data’s structure and information content, e.g. by visualizing tracks and viewing descriptive statistics. In addition, there are tools to determine the data quality and to apply corrections, e.g. by detecting outliers, duplicates and implausible values (e.g. negative speed values). The repository also provides some simple tools to prepare the data for machine learning. Check out the relevant Jupyter notebooks for more information.
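The kinds of checks described above can be sketched in a few lines of plain Python. This is not the API of the eda_quality repository. The function names, the 250 km/h plausibility limit and the sample values are assumptions chosen for illustration.

```python
import math
from typing import Hashable, List, Optional, Sequence


def find_implausible(speeds: Sequence[Optional[float]], max_speed: float = 250.0) -> List[int]:
    """Indices of missing, negative or unrealistically high speeds (km/h)."""
    return [
        i for i, v in enumerate(speeds)
        if v is None or math.isnan(v) or v < 0 or v > max_speed
    ]


def _quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between neighboring values."""
    pos = q * (len(sorted_values) - 1)
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def find_iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices of values more than k * IQR outside the quartiles."""
    ordered = sorted(values)
    q1, q3 = _quantile(ordered, 0.25), _quantile(ordered, 0.75)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < low or v > high]


def find_duplicates(records: Sequence[Hashable]) -> List[int]:
    """Indices of records that repeat an earlier record."""
    seen = set()
    duplicates = []
    for i, record in enumerate(records):
        if record in seen:
            duplicates.append(i)
        else:
            seen.add(record)
    return duplicates


# Hypothetical speed values in km/h and (time, lon, lat) records.
raw_speeds = [42.0, 45.5, -3.0, 47.0, None, 44.0, 310.0, 46.0]
implausible = find_implausible(raw_speeds)

plausible_speeds = [42.0, 45.5, 47.0, 44.0, 46.0, 120.0]
outliers = find_iqr_outliers(plausible_speeds)

points = [("10:00:00", 7.60, 51.95), ("10:00:05", 7.61, 51.95), ("10:00:05", 7.61, 51.95)]
duplicates = find_duplicates(points)

print(implausible, outliers, duplicates)
```

Whether a flagged value should be dropped, corrected or kept depends on the analysis. A speed of 120 km/h is an outlier in an urban track, for example, but not on a motorway.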