Work optimization with ArcGIS: Introduction to Geodatabases, 1: Indexing

26 February 2019, Atilio Francois

Modelling a geodatabase helps to produce a comprehensive schema while reducing maintenance work. It is a necessary step for a good design and contributes directly to the optimization of the geodatabase. The problem is that this stage is often neglected. Even when it is done properly, the geodatabase grows during its life cycle and its performance tends to decrease.
The more features a geodatabase contains, the longer it takes to execute a query. That is why, in this series of articles, we describe the tools that help you tune the geodatabase so that it works optimally. Some tools are used only when creating the geodatabase, while others are run regularly.
These articles cover three topics. First, we look at indexing feature classes and how it speeds up queries. Second, we discuss compression and how it reduces the size of the geodatabase. Finally, we see how compacting speeds up queries on a frequently edited geodatabase.

Indexing is a feature that speeds up the data query, based on an attribute or collection of attributes, in a database table.
Compression is a process by which duplicated data in geodatabase datasets is simplified in order to reduce its size.
Compacting is a process by which a frequently edited geodatabase is cleaned of unused and orphaned items.

How to index a geodatabase

Indexing is the basic principle of database optimization. It is a powerful and effective tool for speeding up record searches. Without an index, a table is scanned entirely to retrieve a particular record. If a dataset has n records, the worst case is that the record we are looking for is the last one in the table, so all n records must be read to find it. Imagine a feature class with one million features: if reading each feature takes one millisecond, scanning all the data takes 1,000 seconds, or about 17 minutes.
Of course, the response time depends on the position of the record you are looking for. If it is located at the beginning of the table, it takes much less time to reach it.
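The scan-time estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `scan_seconds` is made up for the illustration, and the one-millisecond read time is the figure assumed in the text, not a measured value.

```python
def scan_seconds(records_read, ms_per_record=1):
    """Time spent reading `records_read` records one after another, in seconds."""
    return records_read * ms_per_record / 1000

total_records = 1_000_000  # feature count from the example above

worst_case = scan_seconds(total_records)  # the target is the last record
best_case = scan_seconds(1)               # the target is the first record

print(f"worst case: {worst_case:.0f} s = {worst_case / 60:.1f} min")
print(f"best case:  {best_case * 1000:.0f} ms")
```

The worst case comes out at 1,000 seconds, roughly 16.7 minutes, while a record sitting at the start of the table is reached after a single read.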
Indexing is roughly similar to filing your documents in alphabetical order. When looking for a document whose name starts with the letter D, you only look through the documents starting with D. To enable indexing, the geodatabase creates an additional table for the indexed attribute.
Indexing works similarly with almost any type of field: text, numbers, dates, and even spatial data types such as feature geometry. The indexes created on the “shape” columns are called spatial indexes, and they follow the same concept as attribute indexes.
Both reduce the search domain of a query to achieve better performance.
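To make the idea concrete, here is a minimal sketch in plain Python that compares a full scan with a lookup through a sorted side structure. It is a toy model, not how ArcGIS stores its indexes internally; the table contents and the function names are invented for the illustration.

```python
from bisect import bisect_left

def linear_scan(names, target):
    """Full-table scan: return (row_position, records_read)."""
    for reads, name in enumerate(names, start=1):
        if name == target:
            return reads - 1, reads
    return -1, len(names)

def build_index(names):
    """Toy attribute index: (value, row_position) pairs kept in sorted order."""
    return sorted((name, pos) for pos, name in enumerate(names))

def index_lookup(index, target):
    """Binary-search the index and return the row position, or -1 if absent."""
    i = bisect_left(index, (target, -1))
    if i < len(index) and index[i][0] == target:
        return index[i][1]
    return -1

# Hypothetical table: 100,000 distinct owner names in no particular order.
names = [f"OWNER_{(i * 7919) % 100_000:06d}" for i in range(100_000)]
index = build_index(names)

target = names[-1]  # worst case for the scan: the last record of the table
position, reads = linear_scan(names, target)
print(f"scan:  found at row {position} after reading {reads} records")
print(f"index: found at row {index_lookup(index, target)}")
```

The scan has to read every record before it reaches the last one, whereas the binary search over the sorted structure needs only about 17 comparisons for 100,000 entries. The price is the extra structure itself, which has to be built and kept in order.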

How to index an attribute

Let’s suppose you have started running attribute queries on your geodatabase. Which optimizations can you perform to get better performance? We will start by adding an attribute index. The question is, on which attribute should we create it? Usually, this question arises during the geodatabase modelling step, when the indexes are added to the entity-relationship diagram. Indexes are created on attributes that are frequently queried. If you skipped this step, you can still create them during the routine operation of your database. In the Cadastre geodatabase, the Owner Name field is a good candidate for an index if you often search by owner name. To create an attribute index, follow these steps:
1. Open ArcCatalog.
2. Locate the geodatabase in the Catalog Tree window.
3. Right-click the owners table and select Properties…
4. In the Feature Class Properties dialog box, select the Indexes tab.
5. The Attribute Indexes section shows the existing indexes for this feature class.

As you can see, there is an index named FDO_OBJECTID on the primary key. It is a very important index and cannot be removed, because the geodatabase uses it to uniquely identify each feature. When you click FDO_OBJECTID, the Fields section shows the field on which this index is created, as shown in the screenshot above.
6. Click Add… to add a new attribute index.
7. In the Add Attribute Index dialog box, type a name of your choice to identify the new index.
8. In the Available Fields list, select the field you are interested in (PRNAME in this example) and click the right arrow to add it to the list, as you can see in the screenshot below:

9. Click OK. The new index appears in the list of indexes for the table.
10. Click Apply and close the window by clicking OK.

Now, when you query the indexed attribute, ArcGIS will use this index to speed up the query.
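The same steps can also be scripted. The sketch below assumes an ArcGIS installation that provides the `arcpy` package and uses its `ListIndexes` function and `management.AddIndex` tool. The function name `add_attribute_index`, the `gp` parameter (a hook that lets the function be tried out without ArcGIS) and the index name `IDX_PRNAME` are hypothetical and chosen for the illustration only.

```python
def add_attribute_index(table, field, index_name, gp=None):
    """Create an attribute index on one field unless the name is already in use.

    `gp` defaults to the arcpy module; passing another object with the same
    methods is only meant for trying the function without ArcGIS installed.
    """
    if gp is None:
        import arcpy as gp  # requires an ArcGIS installation

    # Equivalent of step 5: look at the indexes that already exist.
    existing = [idx.name for idx in gp.ListIndexes(table)]
    if index_name in existing:
        return False

    # Equivalent of steps 6 to 10: add the index on the chosen field.
    gp.management.AddIndex(table, [field], index_name)
    return True

# Example call (hypothetical path, adjust to your own geodatabase):
# add_attribute_index(r"C:\data\Cadastre.gdb\Owners", "PRNAME", "IDX_PRNAME")
```

Checking the existing index names first avoids an error when the script is run a second time on the same table.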

The table used in this example has 23053 rows.
We built a small processing model with ModelBuilder that runs a query on name_lastname, such as ‘JEAN DUPONT *’, which matches the last owner in the table.
The model took 1.27 seconds to run without an additional index.
Once the index was added to the name_lastname field, the same model ran in 0.76 seconds, a reduction of about 40% in response time.
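For reference, the gain is simply the time saved divided by the original time. A short Python check using the two timings reported above (the variable names are only illustrative):

```python
time_without_index = 1.27  # seconds, measured without the additional index
time_with_index = 0.76     # seconds, measured with the index in place

gain_percent = (time_without_index - time_with_index) / time_without_index * 100
print(f"response time reduced by {gain_percent:.1f}%")
```

This gives a little over 40%, in line with the figure quoted.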

How to add a spatial index

When you create a feature class, a spatial index is automatically created and optimized for that feature class. At any time, you can delete and recreate the spatial index by performing the following steps:
1. Open ArcCatalog and navigate to the geodatabase.
2. Right-click the relevant feature class and select Properties.
3. Click the Indexes tab.
4. In the spatial index section, click Delete to delete the spatial index.
5. Click Create if you want to create the spatial index again.
6. Close ArcCatalog.
Removing and recreating the spatial index is a recommended practice on a frequently modified geodatabase, as it helps keep spatial queries consistent.
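If you rebuild spatial indexes regularly, the procedure can be scripted as well. The sketch below assumes the `arcpy` package and relies on its `Describe` function and the `RemoveSpatialIndex` and `AddSpatialIndex` tools. The function name `rebuild_spatial_index` and the `gp` parameter (a hook for trying the function without ArcGIS) are hypothetical.

```python
def rebuild_spatial_index(feature_class, gp=None):
    """Delete the spatial index of a feature class, if any, then recreate it.

    Returns True if an existing index was removed before the new one was added.
    """
    if gp is None:
        import arcpy as gp  # requires an ArcGIS installation

    # Equivalent of step 4: delete the spatial index when there is one.
    had_index = bool(gp.Describe(feature_class).hasSpatialIndex)
    if had_index:
        gp.management.RemoveSpatialIndex(feature_class)

    # Equivalent of step 5: create the spatial index again with default settings.
    gp.management.AddSpatialIndex(feature_class)
    return had_index

# Example call (hypothetical path, adjust to your own geodatabase):
# rebuild_spatial_index(r"C:\data\Cadastre.gdb\Parcels")
```

Such a function could be called from a scheduled maintenance script for each heavily edited feature class.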

Optimization of indexing

Although indexing is an excellent optimization tool, it can be counterproductive if implemented incorrectly. When you index a column, the geodatabase creates an additional hidden structure that must be maintained every time the data changes. The more indexes you have, the more work is needed when you update the geodatabase. Additional indexes can slow down write operations such as INSERT, UPDATE, and DELETE because the geodatabase has to update the corresponding indexes.
Avoid creating indexes on columns with very few distinct values, since they rarely improve performance. It is advisable to create indexes on columns with unique or almost unique values. You can estimate how effective an index will be with the following formula:

ind(a) = d(a) / n(a) × 100%

In this formula, a is the attribute being indexed and ind(a) is the indexing efficiency, 100% being the maximum and 0% the minimum. d(a) is the number of distinct values in attribute column a and n(a) is the total number of values of a. Note that if a is a primary key, ind(a) is 100%.
This also explains why CATEGORY fields have a low score on indexing performance with this formula.
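The score can be computed directly from the values of a column. The sketch below assumes that the efficiency is the ratio of distinct values to total values, expressed as a percentage, which is consistent with the definitions of d(a) and n(a) and with the 100% result for a primary key. The function name and the sample data are made up for the illustration.

```python
def index_efficiency(values):
    """Return ind(a) = d(a) / n(a) * 100 for the values of one column."""
    values = list(values)
    if not values:
        raise ValueError("the column has no values")
    distinct = len(set(values))  # d(a): number of distinct values
    total = len(values)          # n(a): total number of values
    return 100.0 * distinct / total

# Hypothetical columns of a 1,000-row table.
object_ids = range(1000)                 # primary key: every value is unique
categories = ["A", "B", "C", "D"] * 250  # CATEGORY-like field: 4 distinct values

print(f"primary key: {index_efficiency(object_ids):.1f}%")
print(f"category:    {index_efficiency(categories):.1f}%")
```

The primary key scores 100%, while the four-value category column scores well under 1%, which is why an index on such a field brings little benefit.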
