Since version 10.3, a tool for finding duplicates in a layer has been available: Look for Duplicates. A companion tool, Remove the Identical Element, deletes the duplicates from a layer. Let’s see an example. We have a layer of objects digitized at a certain scale (the stars):
We are then given an update taken from a map at a more accurate scale (the squares).
We therefore end up with a resulting layer containing objects that were absent from our first layer, and others that now appear in duplicate.
If each object had a unique identifier shared by both sources, we could easily detect duplicates. In our example, however, the number of each item is only an identifier internal to its own table.
First of all, it is necessary to define what a duplicate is. In our case it is an object of the same category (attribute) located at the same place (geometry). In terms of category, the coding of the two tables is the same, so there is no problem. Location is another matter: because the two source layers were created at two different scales, it is almost impossible for the X and Y values to be exactly the same. We therefore need to define an acceptable margin of difference (tolerance) within which two objects are considered to be at the same location.
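The definition above can be sketched in a few lines of Python. This is only an illustration of the rule "same category and same place within a tolerance", not the tool's internal method; the function name, the tuple layout and the sample coordinates are hypothetical, and coordinates are assumed to be in a projected system in meters.

```python
import math

def is_duplicate(a, b, tolerance):
    """Return True if a and b share a category and lie within `tolerance`.

    a and b are hypothetical (category, x, y) tuples, with x and y
    expressed in the same projected units as the tolerance (here meters).
    """
    same_category = a[0] == b[0]
    distance = math.hypot(a[1] - b[1], a[2] - b[2])
    return same_category and distance <= tolerance

# Two made-up points about 0.22 m apart, same category
star = ("well", 100.0, 200.0)
square = ("well", 100.2, 200.1)
print(is_duplicate(star, square, tolerance=5.0))
```

With a tolerance of 0 the same pair would not match, which is why an exact comparison of X and Y is of little use here.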
The definition of this value is not always easy.
You can inspect your resulting layer at the places where you suspect duplicates and use the Measure tool to determine an empirical tolerance.
You can also start from the digitization scale, given that the accuracy of digitization is of the order of 0.1 mm on the map. Take the layer with the smaller scale (the less accurate one), for example 1:50 000. At this scale, the possible error is of the order of 5 m (0.1 mm × 50 000). Therefore 5 m will be the minimum value of your tolerance.
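The rule of thumb above is simple arithmetic, shown here as a small Python helper. The function name and the default of 0.1 mm are illustrative assumptions taken from the paragraph, not parameters of any tool.

```python
def min_tolerance_m(scale_denominator, digitizing_accuracy_mm=0.1):
    """Ground distance (in meters) covered by the digitizing accuracy.

    scale_denominator: 50000 for a 1:50 000 map.
    digitizing_accuracy_mm: accuracy on the map sheet, in millimeters.
    """
    return digitizing_accuracy_mm * scale_denominator / 1000.0

# For the 1:50 000 layer of the example this gives about 5 m
print(min_tolerance_m(50000))
```

When two layers are combined, apply it to the less accurate one, since its error dominates.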
The Look for Duplicates tool
This tool reports all the records of a table or feature class that have identical values in a list of fields, and generates a table that lists these identical records. If the Shape field is selected, the geometries of the features are compared.
The tool considers records to be identical if the values of the selected input fields are identical. If several fields are specified, the records are matched according to the values of the first field, then the values of the second field, and so on.
The XY Tolerance and Z Tolerance parameters are valid only if Shape is selected as one of the input fields.
If the Shape field is selected and the input features have M or Z values, these values are also used to determine the identical features.
You will find this tool in Toolbox -> Data Management -> General -> Look for Duplicates.
Define the input layer.
Define the output table. If you check "Output only duplicated records", the output table will contain only the duplicates, with two columns:
- IN_FID, which contains the identifier of the record in the input table.
- FEAT_SEQ, which contains a sequence number for each group of duplicates. All the records of the first duplicate have FEAT_SEQ = 1, those of the second have FEAT_SEQ = 2, etc.
Here is the result of the tool for our example:
Records 0 and 21 form one duplicate (FEAT_SEQ = 1), records 1 and 20 form another (FEAT_SEQ = 2), etc.
If we zoom in on the first duplicate:
We see that the two points are separated by 0.22 m (less than 5 m, therefore within the tolerance) and that they belong to the same category.
If you do not check the box that restricts the output to duplicates, the output table contains all the records: a FEAT_SEQ value appears only once if the record has no duplicate, and several times if it does.
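To read such a table, it is enough to group the records by FEAT_SEQ and keep the groups with more than one member. The sketch below does this on a plain list of (IN_FID, FEAT_SEQ) pairs; the first four pairs come from the example above, while record 2 with FEAT_SEQ = 3 is a made-up non-duplicate added for illustration, and the function name is hypothetical.

```python
from collections import defaultdict

def duplicate_groups(rows):
    """Group IN_FID values by FEAT_SEQ and keep only real duplicates.

    rows: iterable of (in_fid, feat_seq) pairs, as found in the output table.
    Returns a dict {feat_seq: [in_fid, ...]} for groups of 2 or more records.
    """
    groups = defaultdict(list)
    for in_fid, feat_seq in rows:
        groups[feat_seq].append(in_fid)
    return {seq: fids for seq, fids in groups.items() if len(fids) > 1}

rows = [(0, 1), (21, 1), (1, 2), (20, 2), (2, 3)]
print(duplicate_groups(rows))
```

A FEAT_SEQ value that appears only once is dropped, which reproduces what the "only duplicated records" option gives directly.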
The Remove the identical element tool
It works like the previous tool except that, instead of producing a table of duplicates, it deletes all but one of the identical records in each duplicate found.
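The "keep one, delete the rest" logic can be previewed on the duplicates table before anything is erased. The sketch below assumes that the record with the lowest IN_FID is the one kept; this is an assumption made for illustration, not a documented behavior of the tool, and the function name is hypothetical.

```python
def fids_to_delete(rows):
    """List the IN_FID values that would be removed, keeping one per group.

    rows: iterable of (in_fid, feat_seq) pairs.
    Assumption: within each FEAT_SEQ group, the lowest IN_FID is kept.
    """
    seen = set()
    to_delete = []
    for in_fid, feat_seq in sorted(rows):
        if feat_seq in seen:
            to_delete.append(in_fid)  # another record of a group already kept
        else:
            seen.add(feat_seq)        # first record of this group: keep it
    return to_delete

rows = [(0, 1), (21, 1), (1, 2), (20, 2)]
print(fids_to_delete(rows))
```

Comparing this list with the layer is a cheap way to check what would disappear.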
Like any tool that deletes data automatically, it has to be used carefully. Make a backup of your layer. It is enough, for example, to forget to change the default tolerance unit from degrees to meters for all the features of the same category to be considered duplicates... and erased!
And, obviously, run the duplicate search tool first and verify that its result matches the duplicates you expect.
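To get a feeling for the unit mix-up mentioned above, the following sketch converts a tolerance expressed in degrees into an approximate ground distance. It uses a spherical Earth with a mean radius of 6 371 km and measures along a meridian, which is a simplification; the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def degrees_to_meters(deg):
    """Approximate ground length of an arc of `deg` degrees along a meridian."""
    return math.radians(deg) * EARTH_RADIUS_M

# A tolerance of 5 left in degrees instead of meters
print(round(degrees_to_meters(5) / 1000), "km")
```

A value of 5 left in degrees therefore covers several hundred kilometers instead of 5 m, enough to match every object of a category over a whole region.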