Job optimization with ArcGis- Geodatabases Introduction – 1-Indexation

26 February 201926 February 2019 Atilio Francois No Comments

Modelling a geodatabase helps to produce a comprehensive
scheme while reducing the maintenance work. This
is a necessary step to ensure a good design, and contributes directly to the
optimization of the geodatabase. The problem is that, often, we
neglect this stage. Even if we go through this step,
the geodatabase will grow during its life cycle and, therefore, its performance
will tend to decrease.
The
more features you have in a geodatabase, the longer the geodatabase uses to
execute a query. That’s why, in this series of
articles, we will describe the tools to help you to adjust the geodatabase so
as to work optimally. Some tools will only be used
when creating the geodatabase, while others will be run frequently.

These
articles will cover three topics. Firstly, we’ll learn about
indexing feature classes and how this can help to speed up the query.
Secondly, we will discuss the compression concept, where we
will learn how this can reduce the geodatabase size. Finally,
we’ll see how to compact geodatabases and help speed up queries for a
frequently edited geodatabase.

Indexing is a feature that speeds up the data query,
based on an attribute or collection of attributes, in a database table.

Compression is a process by which duplicated data in
geodatabase datasets is simplified in order to reduce its size.
Compacting is a process by which a geodatabase,
often edited, is cleaned of unused and orphaned items.

How to index a geodatabase

Indexing is the basic principle for optimizing databases.
It is a very powerful and effective tool that can help speed
up the records search. Without indexing, a table is
scanned entirely to retrieve a particular record. So,
if we have a dataset with n records,
the worst case scenario is that the record we are trying to locate is the last
record of that table, and so we need to search through the n records in order
to achieve our goal. Imagine a feature class with one
million entities, so if the time it takes to read each entity is one
millisecond, it means we’ll have to wait 17 minutes to scan all the data.

Of
course, the response time depends on the place of the recording you are looking
for. If it is located at the beginning of the table, it will take much
less time to reach it.
Indexing
is roughly similar to how you organize your files in alphabetical order.
When looking for a document, if it starts with the letter D,
you are only looking for documents starting with D. To enable indexing, the geodatabase
creates another table for the attribute to index.
Indexing
works similarly with almost any type of field: text, numbers, dates, and even
with spatial data type like entities geometry. The
indexes created on the “shape” columns are called spatial indexes,
which follow the same concept as the indices on the attributes.
Both
reduce the query search domain to achieve greater performance.

How to index an attribute

Let’s to suppose you have started to do attribute queries on
your geodatabase, which optimizations can you perform to get a better
performance ?. We will start by adding an
attribute index. The question is, on which attribute should we create an index?
Usually, this question must arise during the geodatabase modelling
step, where the indexes are added in the entity-relationship diagram.
Indexes are created on attributes that are frequently queried.
If you skipped this step, you can create them during the
routine operation of your database. In
the Cadastre geodatabase, the Owner Name field is a good candidate for creating
an index if you often search by Owner Name. To
create an attribute index, follow the following steps:
1. Open ArcCatalog.
2. Search for the geodatabase in the catalogue tree
window.
3. Right-click on your owner’s table and select
Properties …
4. In the Feature Class Properties dialog box, select
the Index tab.
5. The Attributes Index window shows the existing
indexes for this feature class.

As
you can see, there is an index of FDO_OBJECTID (the primary key), which is a
very important index and cannot be removed. The geodatabase uses this
index to uniquely identify each entity. When you click FDO_OBJECTID,
in the Fields section, you can see the field for which this index is created,
as shown in the screenshot above.
6. Click Add … to add a new attribute index.
7. In the Add an Index Attributes dialog box, type a
name of your choice to identify this new index.
8. In the Available Fields list, select the field that
interests you, in this example PRNAME, and click the right arrow to add it to
the list,as you can see in the screenshot below:

9. Click OK, the new index appears on the list of
indexes in the table.
10. Click Apply and exit the window by clicking OK

Now, when you query the Name_Name attribute, ArcGis will use
this index to speed up the query.

The table used in this example has 23053 rows.
We
built a small processing model with Model Builder, including a query of the
type name_lastname such as ‘JEAN DUPONT *’, corresponding to the last owner of
the table.
The
model took 1.27 seconds to run without an additional index.
Once
the index was added to the field name_lastname, the same model was executed in
0.76 seconds, a gain of 40% response time.

How to add a spatial index

When you create a feature class, a spatial index is
automatically created and optimized for that feature class. At
any time, you can delete and recreate the spatial index by performing the
following steps:
1.
Open ArcCatalog and navigate to the geodatabase.
2.
Right-click on the relevant feature class and select Properties
3.
Click the Index tab.
4.
In the spatial index section, click Delete to delete the spatial index.

5.
Click Create if you want to create the spatial index again.
6.
Close ArcCatalog.
Removing
and recreating the spatial index is a recommended exercise on an often modified
geodatabase, as it ensures the consistency of spatial queries.

Optimization of indexing

Although indexing is an
excellent tool for optimization, it can be counterproductive if it is
implemented incorrectly. When
you index a column, the geodatabase creates an additional hidden structure that
must be managed and updated frequently. The more indexes you have,
the more work is needed when you update the geodatabase. Additional
indexes can slow down update operations such as INSERT, UPDATE, and DELETE
because the geodatabase needs to regenerate the corresponding indexes.

Avoid
creating indexes on columns with very few distinct values, since they often
will not improve your performance. It’s advisable to create
indexes on columns with unique or almost unique values. You
can calculate the performance improvement percentage by using the following
formula:

In the previous formula, a is the attribute
of being indexed and ind (a) is the efficiency index of indexing;
100% being the maximum and 0% the lowest. d
(a) is the number of distinct values in attribute column a and n
(a) is the number of total values of a . Note
that if a is a primary key, ind (a) is 100%.
This
also explains why CATEGORY fields have a low score on indexing performance with
this formula.

Si cet article vous a intéressé et que vous pensez qu'il pourrait bénéficier à d'autres personnes, n'hésitez pas à le partager sur vos réseaux sociaux en utilisant les boutons ci-dessous. Votre partage est apprécié !