GIS tools for validating geometries (1)

Algorithms for managing and processing GIS data are developed on the assumption that the geometry of the features meets certain specifications. When these algorithms are given data that does not respect the specifications, the software may simply crash or, worse, the operation may appear to succeed while producing a wrong result. The subject is complicated by the fact that there is almost no documentation on what each software package actually does. And if you think that a “Repair geometries” command will solve the problem, you are far from reality…
The main source of invalid geometries is the ESRI shapefile. This rather old format (from the early nineties) was not designed to enforce the topological constraints of a GIS. For example, two adjacent polygons can overlap without complaint, their shared boundary can be duplicated, there may be gaps between the two boundaries, …
Unfortunately, it has become a standard format for data exchange between GIS software packages, and even though software vendors and open source projects alike strive to offer powerful and much safer alternative formats (ESRI geodatabase, PostGIS, SpatiaLite…), users most often opt for the convenience of the shapefile.
But even for those who opt for these database formats, the problem is solved only for data created directly in them. When shapefiles are loaded into ESRI geodatabases (personal or file), a PostGIS database, a SpatiaLite database, etc., the geometries are copied as they are, with all their existing problems. The same care must be taken with any other format into which such data is imported.

The necessary work involves two steps, each handled by a GIS tool:

  • analysing the geometry to detect anomalies, usually with a “Check geometries” tool
  • correcting the detected defects, usually with a “Repair geometries” tool

The Check Geometry tool generates a report of all features with geometry problems in the layers provided. To solve these problems, the geometry repair tool then performs the correction automatically. This seems like magic, but it is a lot more complicated…

Although there are different definitions of a polygon, most current GIS software uses the definition given by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO), and provides validation functions to check compliance with it. There are small variations between implementations, but the validation of a two-dimensional polygon can be considered a solved problem at the theoretical level. A common definition, together with validation tools, allows GIS users to exchange data sets and to run spatial analysis operations on them (a valid input is a prerequisite for most operations).
However, if a polygon does not comply with the definition, it has to be fixed. Most validation tools give the user a list of errors and their locations, but the user must fix these problems manually. This is a very tedious and time-consuming task.
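
To make the notion of a non-compliant polygon concrete, here is a minimal Python sketch of one of the rules in the OGC definition: a ring must not cross itself. It is only an illustration, not the algorithm used by any particular GIS package. The function names are our own, every pair of edges is compared (which is slow on large rings), and only proper crossings are detected; edges that merely touch or overlap are ignored.

```python
from itertools import combinations

def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)

def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single point interior to both."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)

def ring_self_intersects(ring):
    """Check a closed ring (first vertex repeated at the end) for proper crossings."""
    edges = list(zip(ring[:-1], ring[1:]))
    return any(segments_cross(a, b, c, d)
               for (a, b), (c, d) in combinations(edges, 2))

square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]  # two edges cross at (1, 1)

print(ring_self_intersects(square))  # False
print(ring_self_intersects(bowtie))  # True
```

The "bow-tie" ring is the classic example of a polygon that many formats will store without complaint but that does not comply with the definition.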

Hence the obvious temptation to use the automatic geometry repair tools.

In this series of articles, we will look at how the main GIS software packages detect and correct invalid geometries. Let’s say right away that if you expect one that is 100% effective, you will be disappointed. But it is better to know what the possible shortcomings are than to pretend there are none.

The first problem in tackling this topic is the almost total lack of software documentation. We will therefore take a polygon layer containing anomalies and process it with the different software packages.

We will use a layer of Italian municipalities provided by ISTAT, the Italian Statistical Institute. This is the layer used in the SpatiaLite page on validating geometries: SQL functions based on liblwgeom support in version 4.0.0.

You can download this layer with the following link:

Geometry validation with ArcGIS

Our setup is as follows:
– we use ArcGIS 10.3 in English
– the command used is Toolbox -> Data Management Tools -> Features -> Check Geometry
– the layer to be tested is com2011.shp, the layer we have just downloaded

Once the command has run, the table listing the invalid records is loaded in ArcMap.

You will find that the command has not found any invalid geometry.

If we search the ArcGIS help, the only description of what the Check Geometry command does is on the page http://desktop.arcgis.com/en/desktop/latest/tools/data-management-toolbox/check-geometry.htm

According to that page, the problems detected are:

  • Short segment: some segments are shorter than allowed by the system units of the spatial reference associated with the geometry.
  • Null geometry: the feature has no geometry, or nothing in the SHAPE field.
  • Incorrect ring ordering: the polygon is topologically simple, but its rings may not be oriented correctly (exterior rings: clockwise, interior rings: counter-clockwise).
  • Incorrect segment orientation: individual segments are not consistently oriented. The end point of segment i must be the start point of segment i + 1.
  • Self-intersections: a polygon must not intersect itself.
  • Unclosed rings: the end point of the last segment in a ring must match the start point of the first segment.
  • Empty parts: the geometry has several parts and one of them is empty (has no geometry).
  • Duplicate vertex: the geometry has two or more vertices with identical coordinates.
  • Mismatched attributes: the Z or M coordinate at the end point of a line segment does not match the Z or M coordinate of the coincident vertex of the next segment.
  • Discontinuous parts: one of the parts of the geometry is composed of disconnected or discontinuous parts.
  • Empty Z values: the geometry has one or more vertices with an empty Z value (NaN, for example).
  • Bad envelope: the envelope does not match the coordinate extent of the geometry.
  • Bad dataset extent: the extent of the data set does not contain all the features.
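
Several of these checks are simple enough to sketch in a few lines of Python. The example below is an illustration only and makes no claim about what ArcGIS does internally. The function names and the `min_length` tolerance are hypothetical, "duplicate vertex" is taken to mean two consecutive identical vertices, and the orientation convention is the one quoted in the list above (exterior clockwise, interior counter-clockwise).

```python
import math

def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring[:-1], ring[1:]))

def check_ring(ring, is_exterior=True, min_length=1e-9):
    """Return the sorted names of the problems found in one ring of (x, y) tuples.

    min_length is an assumed tolerance, not the value any GIS package uses.
    """
    problems = set()
    ring = list(ring)
    if ring[0] != ring[-1]:
        problems.add("unclosed ring")
        ring.append(ring[0])  # close it so the remaining checks can still run
    for p, q in zip(ring[:-1], ring[1:]):
        length = math.dist(p, q)
        if length == 0:
            problems.add("duplicate vertex")
        elif length < min_length:
            problems.add("short segment")
    area = signed_area(ring)
    clockwise = area < 0
    # Exterior rings are expected clockwise, interior rings counter-clockwise.
    if area != 0 and clockwise != is_exterior:
        problems.add("incorrect ring ordering")
    return sorted(problems)

cw_square = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]
print(check_ring(cw_square))         # []
print(check_ring(cw_square[::-1]))   # ['incorrect ring ordering']
print(check_ring(cw_square[:-1]))    # ['unclosed ring']
```

Checks such as self-intersection or bad dataset extent need more context (all edges of the ring, or all features of the layer) and are left out of this sketch.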

Satisfied with our test, we will now load this shapefile into a SpatiaLite database. Note that the result would be exactly the same with a PostGIS database, since the SQL validation tools are the same. In the next article, we will use SpatiaLite because, if you already have ArcGIS or QGIS, you do not need to install PostgreSQL or anything else.
