A closer look at the flexibility and scalability of a GeoNode deployment using Kubernetes and related cloud concepts.
In mid-2020, we started a project with Fraym – a data science company working with manifold types of datasets to execute projects in countries undergoing substantial societal change, places around the world where data has traditionally been hard to access. Spatial data, ranging from base data such as administrative boundaries to Earth Observation and satellite data, plays an important role. Since the majority of the datasets will be reused in other project contexts, a solution for the management and discovery of spatial and non-spatial datasets is essential for the effective execution of data analysis processes. In close cooperation with the experts at Fraym, we developed a data platform based on the GeoNode software stack. The main challenges boiled down to a set of requirements:
- Flexibility: supporting different dataset types such as raster, vector and tabular data
- Scalability: accommodating the large amount and variety of data (multiple terabytes, more than 100,000 individual datasets)
- Availability: taking uptime into account (e.g. on-demand bootstrapping vs 24/7 operations)
- Processability: promoting processing functionality close to the data
In this blog post, we take a closer look at the first two aspects: flexibility and scalability. We look into the details of our deployment concept using Kubernetes on Amazon Web Services (AWS) and related cloud concepts.
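To give a first impression of what scalability means in such a setup, here is a minimal sketch using the official Kubernetes Python client: resizing a component's Deployment on demand without redeploying the platform. The namespace and deployment names ("geonode", "geoserver") and the helper function are illustrative assumptions, not the exact configuration of the platform described here.

```python
# Minimal sketch: scale a Deployment on demand via the Kubernetes API.
# Namespace/deployment names below are hypothetical examples.
from kubernetes import client, config


def scale_deployment(name: str, namespace: str, replicas: int) -> None:
    """Patch the replica count of a Deployment in the given namespace."""
    # Reads ~/.kube/config; inside a pod, use config.load_incluster_config().
    config.load_kube_config()
    apps = client.AppsV1Api()
    # Only the replica count is patched; the rest of the spec is untouched.
    body = {"spec": {"replicas": replicas}}
    apps.patch_namespaced_deployment_scale(name=name, namespace=namespace, body=body)


if __name__ == "__main__":
    # E.g. add a second GeoServer instance under load (hypothetical names).
    scale_deployment("geoserver", "geonode", 2)
```

The same mechanism can scale back down to zero replicas, which is one way to realize the on-demand bootstrapping mentioned under the availability requirement above.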