Which Format? - Blog

In ILWIS 3 we had a lot of hassle with imports and exports. ILWIS 3 had its own format, and everything had to be converted to this format, or exported from ILWIS 3, to be usable in another package. Though there was a ‘use-as’ method (for raster), which basically used the data in its original format, it had its limitations. Furthermore, there was always a lot of work involved (for the programmers) in getting a format ‘correct’. The documentation of formats is often mediocre, so a lot of guessing about how to interpret a certain format took place. Using the GDAL library helped a lot, but implementing a new format was still a hassle. And then there were the service/remote data sources, which play an increasingly important role these days. ILWIS 3 wasn’t built for those, and supporting them was like trying to teach a turtle to fly.

In Ilwis-objects we have approached this quite differently. We basically say Ilwis4 has no format. Any format is acceptable as long as we have a ‘connector’ to it. A connector is a kind of bridge that translates the external representation of data (say a GeoTIFF file) into Ilwis-objects’ internal representation of data. There are also connectors for operations, which are conceptually similar to data connectors (translating external to internal representation), but which will not be covered today.

A connector is always tied to a ‘type’ of data source. ‘Type’ is a loose concept, because sometimes the type is another library. For example, the connector views the GDAL library as a data type: a type that has a certain representation of its data, and whose representation has to be translated to the internal data structures. That GDAL itself is another layer over many other physical formats is of no interest to Ilwis-objects. In the same way, we have connectors to PostGIS, spreadsheets, WFS and of course ILWIS 3. Undoubtedly, some other connectors will be created before the release of Ilwis4.
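To make the idea a bit more concrete, here is a minimal sketch in Python of what ‘one connector per type of data source’ could look like. It is not the Ilwis-objects API: the names `CONNECTORS`, `connector` and `load` are made up for illustration, CSV and JSON text stand in for real data sources, and a list of dictionaries stands in for the internal representation.

```python
import csv
import io
import json
from typing import Callable

# Hypothetical internal representation: a table as a list of string-valued rows.
Table = list[dict[str, str]]

# Hypothetical registry: one connector per 'type' of data source.
CONNECTORS: dict[str, Callable[[str], Table]] = {}


def connector(source_type: str):
    """Register a function as the connector for one data source type."""
    def register(func: Callable[[str], Table]) -> Callable[[str], Table]:
        CONNECTORS[source_type] = func
        return func
    return register


@connector("csv")
def load_csv(text: str) -> Table:
    # Translate the external CSV representation into the internal table.
    return list(csv.DictReader(io.StringIO(text)))


@connector("json")
def load_json(text: str) -> Table:
    # Translate a JSON array of objects into the same internal table.
    return [{key: str(value) for key, value in row.items()}
            for row in json.loads(text)]


def load(source_type: str, payload: str) -> Table:
    """Pick the connector by type; the caller never sees the external format."""
    try:
        translate = CONNECTORS[source_type]
    except KeyError:
        raise ValueError(f"no connector for data source type {source_type!r}") from None
    return translate(payload)
```

Both `load("csv", ...)` and `load("json", ...)` hand back the same kind of table, so the code that uses the data does not need to know where it came from. Supporting a new type of source means registering one more connector.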

As far as Ilwis4 is concerned, all the data sources supported are ‘native’. The connector of course has the same limitations as the data source it is based on. For example, you can store a raster in a spreadsheet (as long as it is not too big), but a spreadsheet is hardly an ideal medium for that. Some metadata might be lost (e.g. georeference or coordinate system). The GDAL ‘data-source’ doesn’t support writing to all its underlying physical formats, so some ‘exports’ aren’t possible.

Each Ilwis-object always has an input connector and an output connector. These can be different objects. If no output connector has been defined, the input connector is assumed to be the output connector. As such, this doesn’t sound that interesting, until you realize that it makes it very easy to transform one data format into another. For example, one can read from a WFS connection and write the results to a shape file without any need for ‘exports’.
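The input/output pairing described above can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions, not Ilwis-objects code: `DataObject`, `CsvConnector` and `JsonConnector` are hypothetical names, and CSV and JSON text play the roles of the WFS source and the shape file.

```python
import csv
import io
import json


class CsvConnector:
    """Hypothetical connector: CSV text <-> list of row dictionaries."""

    def load(self, text):
        return list(csv.DictReader(io.StringIO(text)))

    def store(self, rows):
        buffer = io.StringIO()
        fieldnames = list(rows[0]) if rows else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()


class JsonConnector:
    """Hypothetical connector: JSON text <-> list of row dictionaries."""

    def load(self, text):
        return json.loads(text)

    def store(self, rows):
        return json.dumps(rows)


class DataObject:
    """Holds data in memory; connectors decide how it is read and written."""

    def __init__(self, input_connector, output_connector=None):
        self.input_connector = input_connector
        # No output connector given: fall back to the input connector.
        self.output_connector = output_connector or input_connector
        self.rows = []

    def read(self, source):
        self.rows = self.input_connector.load(source)

    def write(self):
        return self.output_connector.store(self.rows)
```

With `DataObject(CsvConnector(), JsonConnector())`, a `read` followed by a `write` converts the data without a separate export step. With `DataObject(CsvConnector())`, the data goes back out in the format it came in.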

So what about the ‘internal format’? Where is it? What is it? Well, it exists only in memory. It’s a virtual format and, technically, it has a connector. Ilwis4 does everything in memory. There can be some caching if the memory is too small, but that takes place behind the scenes. This also means that every object you create is, in principle, temporary unless you indicate otherwise. This improves performance and keeps the amount of data that ends up on your disk/database/whatever to a minimum. You only keep/save what you need; the rest disappears. To be honest, I have not been completely truthful here. The temporary objects can be, and sometimes are, streamed to disk for technical reasons. Ilwis4 uses an internal serialization for this. That format is not meant as an interchange format, and it might change at any time if development requires it.

A special remark has to be made about the ‘container’ formats. Container formats are formats that are containers for other data sources, e.g. a GPX file or an HDF5 file. A GPX file stores (among other things) multiple layers/representations for the GPS track recorded by a device. In Ilwis4 each and every container format is represented as a catalog (which might contain other catalogs) in which you can find the data sources contained in the ‘container’. As far as Ilwis4 is concerned, a folder on a disk and a directory in an HDF5 file are the same thing (conceptually), i.e. containers for data.
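As a rough illustration of ‘a folder and a container file are the same thing’, the Python sketch below builds the same nested catalog from a folder on disk and from a ZIP archive. The ZIP archive is only a stand-in that the standard library happens to support; it is my choice for the example, not something Ilwis4 is stated to handle this way. The function names and the nested-dictionary catalog are hypothetical as well.

```python
import zipfile
from pathlib import Path


def catalog_from_folder(folder: Path) -> dict:
    """Catalog of a folder: sub-folders become sub-catalogs, files map to None."""
    catalog = {}
    for entry in sorted(folder.iterdir()):
        catalog[entry.name] = catalog_from_folder(entry) if entry.is_dir() else None
    return catalog


def catalog_from_zip(archive: Path) -> dict:
    """Catalog of a ZIP container, in the same shape as catalog_from_folder."""
    catalog = {}
    with zipfile.ZipFile(archive) as container:
        for name in sorted(container.namelist()):
            if name.endswith("/"):
                continue  # explicit directory entries carry no data themselves
            *folders, leaf = name.split("/")
            node = catalog
            for part in folders:
                # Walk (or create) the sub-catalog for each path component.
                node = node.setdefault(part, {})
            node[leaf] = None
    return catalog
```

Code that browses the result cannot tell whether the catalog came from a directory tree or from inside a single file, which is the point of treating every container as a catalog.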

A last important point is that some (external) data sources, depending on the type, can appear multiple times in a catalog, each time in a different view/role. For example, shape is a vector format and is displayed as such in the catalog. It is also there as a table, because each shape file is accompanied by an attribute table. So a possible alternative view on a shape file is to see it as a table. If we continue with this example, a shape file might also be seen as a coordinate system (if there is a .prj file), since the .prj file contains all the information for a coordinate system. So a third alternative is to view it as a coordinate system. This makes it possible to couple a shape file to another vector file in the role of coordinate system.

Here is a small example; a snippet from a catalog view.

If you ignore the .mpa file for the moment, you see a shape file in the role of vector data, tabular data and coordinate system.
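The same idea as a small Python sketch: given the file names in a folder, work out in which roles each shape file would show up. The role names and the `catalog_roles` function are hypothetical; the only facts used are that a shape file keeps its geometry in the .shp file, its attribute table in the .dbf file and, optionally, its coordinate system in the .prj file.

```python
from pathlib import PurePath

# Which part of a shape file set provides which role (role names are illustrative).
ROLE_BY_EXTENSION = {
    ".shp": "vector",
    ".dbf": "table",
    ".prj": "coordinate system",
}


def catalog_roles(filenames):
    """Return {name: [roles]} for every shape file found among the file names."""
    extensions_by_stem = {}
    for filename in filenames:
        path = PurePath(filename)
        extensions_by_stem.setdefault(path.stem, set()).add(path.suffix.lower())

    roles = {}
    for stem, extensions in extensions_by_stem.items():
        if ".shp" not in extensions:
            continue  # not a shape file; other connectors would handle it
        roles[stem] = [role for extension, role in ROLE_BY_EXTENSION.items()
                       if extension in extensions]
    return roles
```

A shape file without a .prj file would simply not be listed in the coordinate system role, while the vector and table roles remain available.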

Reading from and writing to different formats will always be somewhat problematic. There is only so much you can do within the limitations of all formats. Still, I feel we have found a good balance between supporting many different formats and the expressiveness of Ilwis-objects itself. We will see if it is sufficient.

Comments

Arseniy says

Thursday October 29th, 2015 at 11:27 AM

Thank you for the interesting news! In ILWIS 3 I like that operations with data are very logical. In ILWIS we have to control the data import and transformation processes. It's difficult for beginners, but we have fewer unexpected errors, e.g. with edge pixels. Moreover, beginners have to learn projections, coordinate systems, resampling methods, etc., though that's good. I have no problems with disk space or analysis time because I use ILWIS scripts. Nevertheless, there are some problems with 16-bit images and some others. I think that using internal formats is a good idea, because it will likely be simpler to write scripts and add-ons for ILWIS, and unexpected errors will show up at the import stage. In ILWIS 4, can we use old scripts? Can users integrate R scripts into ILWIS 4?

Martin Schouwenburg says

Thursday October 29th, 2015 at 01:41 PM

The intention at the moment is that we can read the old ILWIS scripts, though we will probably not be able to write them to the old format. Ilwis4 can do more than Ilwis3, so a write would be quite hard to make.
We will maybe have some integration with R (it is on the list at least), but not in the form of R scripts. Apart from the data connectors as described, there are also operation connectors. They do the same thing as data connectors: translating an external representation, in this case of an operation, to an internal one (and vice versa). In the case of R this is probably a good thing, as R scripts tend to be difficult to read for people who are not familiar with R (in my experience at least).

Chris M says

Thursday October 29th, 2015 at 11:17 PM

Indeed interesting news on the ever-increasing geospatial data supplies and related formats. One question: is the Ilwis4 “con” approach similar to the R and Matlab “open/close connection principle”, which I know a bit (and which is indeed better for reading in all kinds of unknown data sources/types and subsequently peeking at and processing what you need out of the foreign data set)? cm

Martin Schouwenburg says

Friday October 30th, 2015 at 08:08 AM

Not sure if I 100% understand what you mean. We only take what we need from a data source or processing source. For example, when we need a small window on a raster data set, we only read that section and no more. When we need only 5 methods from a processing library, we expose 5 methods and ignore the rest. The integration is automatic, as the plug-ins that do the translation have enough intelligence to figure out for themselves how to connect to things and when to discard those connections. The only thing that might be needed (depending on how the implementer of the plug-in wants to do it) is that in the preferences for the plug-in you have to tell it where the foreign library can be found. But even this can be automated if you do it correctly.
