ArcMap: A Python script to geocode addresses with OpenStreetMap

 

At a time when QGIS offers free, unlimited and easy-to-use address geocoding (see the article Geocodages of addresses with QGis 2.8), geocoding with ArcGIS gets tangled up in credits, API keys and other devices that add cost and complexity for the same result. Fortunately, some doors cannot be locked, and there are still ways to do it without going through the checkout. Here is a small Python script that lets you geocode an address file with OpenStreetMap from ArcMap.

We owe the basis of this script to Riccardo. To geocode an address file, i.e. to take a record containing the address of a point and convert it into a point in a shapefile, we need:

1- the geopy Python module

2- a text file with the addresses to geocode

3- the ArcMap toolbox

How to install the geopy module

To download the version that was the latest on the day this article was written, click here

For a more recent version, go to the Python repository homepage (https://pypi.python.org/packages), search for geopy in the search box, and download the latest geopy version available as a source distribution (not as a Python wheel). Unzip the downloaded file and look for a directory named exactly geopy (with no other characters).

Copy this directory into the ArcGIS Python library directory. In principle, it is located on your C drive, under Python27 -> ArcGIS10.X -> Lib

 

You have installed the geopy module.
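A quick way to confirm that the copy worked is to try importing the module with the Python interpreter that ArcMap uses. The sketch below is only an illustration: the helper name module_available is hypothetical and is not part of geopy or arcpy.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Run this from the ArcMap Python window, or from the Python 2.7
# interpreter shipped with ArcGIS, to check the installation.
if module_available("geopy"):
    print("geopy is installed")
else:
    print("geopy was not found; check the Lib directory")
```

If the second message appears, the geopy directory was probably copied to the wrong place or under a different name.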

Preparation of the address file

The geocoding will be performed by a web service, in our case the OpenStreetMap Nominatim service. The web service can only interpret your request if the data is formatted in a way it understands, so we will give the addresses a minimal structure.

To get started, if you have not already loaded your addresses into a spreadsheet, do it! You must have at least two columns:

  • in the first you will put the address (number, street, …)
  • in the second you will put the zip code and the city or town

The first row of the table must contain the field names. Avoid special characters, spaces, etc.

For our two fields, let's suppose we name them address and city. You can have more fields, but for the time being we will only use these two.

 

Now save the table as a text file. Spreadsheets offer several text formats. In Excel, choose the tab-delimited text format. This choice sets not only the field separator but also the encoding of the file. The encoding is the rule that maps each character to the byte values stored in the file. If you have Notepad, you can check the encoding of the file. In this case you will see encoding: ANSI.

If you do not follow these instructions, the script may crash in ArcMap. You will then need either to modify the script so that it matches the encoding of your file, or to go back to the file and save it again as ANSI (Latin-1).

The address file is now ready to be geocoded.
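Before running the geocoding, it can be worth checking that the file really is tab-delimited and that its first row carries the expected field names. Here is a minimal sketch, assuming the two fields are called address and city as above; the function name check_address_header is hypothetical.

```python
import io


def check_address_header(path, required=("address", "city"), encoding="latin-1"):
    """Read the first row of a tab-delimited file and report missing fields.

    Returns the list of field names found and the list of required
    names that are absent.
    """
    with io.open(path, encoding=encoding) as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    missing = [name for name in required if name not in header]
    return header, missing
```

Calling check_address_header on your own file should return an empty list of missing fields. If the header comes back as a single long string, the file was probably saved with a separator other than the tab.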

The Python script

The script proposed here does not claim to be a finished tool. Many aspects are not taken into account. If it is enough for you, good! If not, it can serve as a basis to modify and adapt to your particular case.

A toolbox containing the script and the sample text file is available for download here.


 

Remember to configure the toolbox script path to the provided .py file.

Operation

For the script to work, we need the address file (built above) as well as an empty point shapefile.

The script could create the shapefile itself, but as it stands you have to create a point shapefile yourself, with the WGS84 geographic coordinate system and a text field named Name, which will store the address taken from each record of the text file. Give the field enough length (100?) to hold this address.
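If you would rather create the empty shapefile by script than by hand, the following sketch shows one possible way, using the standard CreateFeatureclass and AddField geoprocessing tools. The function name create_point_shapefile and its default values are assumptions made for this illustration; 4326 is the EPSG code of the WGS84 geographic coordinate system.

```python
def create_point_shapefile(folder, name, field_name="Name", field_length=100):
    """Create an empty WGS84 point shapefile with one text field.

    arcpy is imported inside the function so that this code can be
    loaded outside ArcGIS; calling it requires an ArcGIS installation.
    """
    import arcpy

    wgs84 = arcpy.SpatialReference(4326)  # geographic WGS84
    result = arcpy.CreateFeatureclass_management(
        folder, name, "POINT", spatial_reference=wgs84)
    shapefile = result.getOutput(0)  # full path of the new shapefile
    # Text field that will receive the address of each geocoded point
    arcpy.AddField_management(
        shapefile, field_name, "TEXT", field_length=field_length)
    return shapefile
```

The function takes the output folder and the shapefile name (for example a hypothetical points.shp) and returns the full path, which can then be given to the script as its first parameter.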

Barring a miracle, every time we geocode an address file, some addresses are not found by the geocoding service. These addresses are written to an output text file. When you launch the script, you get a configuration window with the following parameters:


Shapesor (output shape) is the point shapefile that will receive the geocoded addresses

address file is the text file containing the addresses

address is the name of the field containing the address (number, street, …)

city is the name of the field that contains the postal code and the city

file errors is the name of the file that will receive the addresses that could not be geocoded.

Script content

Here is the whole script:

import arcpy
import geopy
import csv
from geopy.geocoders import Nominatim

# Geocoding service: OpenStreetMap Nominatim
geolocator = Nominatim()

failed_text = ""
numbers_failed = 0

# Parameters from the script configuration window
outShp = arcpy.GetParameterAsText(0)   # output point shapefile
inFC = arcpy.GetParameterAsText(1)     # address text file
outFC = arcpy.GetParameterAsText(4)    # error file
address = arcpy.GetParameterAsText(2)  # name of the address field
place = arcpy.GetParameterAsText(3)    # name of the city field

cursor = arcpy.InsertCursor(outShp)
with open(outFC, 'w') as file:
    with open(inFC, 'rb') as csvfile:
        content = csv.DictReader(csvfile, delimiter='\t')
        for row in content:
            feature = cursor.newRow()
            vertex = arcpy.CreateObject("Point")
            # A space separates the street from the postal code in the query
            coord = geolocator.geocode(row[address].decode('latin1') + u" " + row[place].decode('latin1'), timeout=25)

            if coord is None:
                failed_text += row[address] + " " + row[place] + "\n"
                numbers_failed += 1

            if coord is not None:
                vertex.X = coord.longitude
                vertex.Y = coord.latitude
                feature.shape = vertex
                feature.Name = row[address]
                cursor.insertRow(feature)

    # The with block closes the error file once the text is written
    file.write(failed_text)
del cursor
print "failed geocodes: " + str(numbers_failed) + " !!! check the file " + outFC

# Add the result to the current map document and zoom to it
dataframe = arcpy.mapping.ListDataFrames(arcpy.mapping.MapDocument("CURRENT"))[0]
geocodelayer = arcpy.mapping.Layer(outShp)
arcpy.mapping.AddLayer(dataframe, geocodelayer, "BOTTOM")
layer_extent = geocodelayer.getExtent()
dataframe.extent = layer_extent

Now let’s see the different parts.

First of all, we have the choice of the geolocator:

from geopy.geocoders import Nominatim
geolocator = Nominatim()

If you want to use another geocoding service, this is the place to choose it. Remember that some services require an API key, a username or a password. In those cases you will need to add the appropriate code to initialize the connection.

Secondly, we retrieve the parameters provided in the script configuration window:

outShp = arcpy.GetParameterAsText(0)
inFC = arcpy.GetParameterAsText(1)
outFC = arcpy.GetParameterAsText(4)
address = arcpy.GetParameterAsText(2)
place = arcpy.GetParameterAsText(3)

You can edit the script to offer more options, such as the separator used in the address file. In this case you must add the parameter to the script configuration: right-click the script in the toolbox window -> Properties -> Parameters tab

 

and add a line of code with a variable name and the index of the new parameter (5 here, since indices 0 to 4 are already in use):

spacer = arcpy.GetParameterAsText(5)
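With such a parameter, the text typed in the dialog still has to be turned into a real delimiter character. The sketch below shows one possible convention, assuming the user types either a single character or a keyword such as tab; the function resolve_delimiter and its keyword table are hypothetical and not part of the original script.

```python
# Hypothetical keywords a user might type in the tool dialog
KEYWORDS = {"tab": "\t", "comma": ",", "semicolon": ";"}


def resolve_delimiter(text, default="\t"):
    """Turn the text of a script parameter into a one-character delimiter."""
    value = text.strip()
    if not value:
        return default  # parameter left empty: keep the default (tab)
    keyword = value.lower()
    if keyword in KEYWORDS:
        return KEYWORDS[keyword]
    if len(value) == 1:
        return value  # a literal character such as ";"
    raise ValueError("unsupported separator: %r" % (text,))
```

The value returned by arcpy.GetParameterAsText can be passed through resolve_delimiter, and the result given to csv.DictReader as its delimiter option.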

The line:

cursor = arcpy.InsertCursor(outShp)

sets up a cursor on the shapefile that allows the points of the found addresses to be inserted.

The line:

content = csv.DictReader(csvfile, delimiter='\t')

uses the Python csv module to read the contents of the address file into the content variable. The delimiter='\t' option tells the module that the field separator is the tab. If you use another separator (which I do not recommend), this is where you should specify it.

The lines:

feature = cursor.newRow()
vertex = arcpy.CreateObject("Point")

create a new point feature for the output shapefile.

The geocoding call uses the address and place variables, which hold the names of the address and city fields given in the script parameters.

coord = geolocator.geocode(row[address].decode('latin1') + u" " + row[place].decode('latin1'), timeout=25)

Depending on the encoding of the text file, you may need to change 'latin1'. There are many possibilities, but the most common are 'utf-8' and 'UTF-16LE'.
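If you do not know in advance how a file was encoded, one option is to try several encodings in turn instead of hard-coding one. This is a minimal sketch of that idea, not what the original script does; the helper name decode_field is hypothetical.

```python
def decode_field(raw, encodings=("utf-8", "latin-1")):
    """Decode a byte string by trying each encoding in turn.

    latin-1 accepts any byte sequence, so it should come last: it acts
    as the fallback when the stricter encodings fail.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode %r" % (raw,))
```

In the geocoding call, row[address].decode('latin1') could then be written decode_field(row[address]). Note that this is a heuristic: a Latin-1 file whose bytes happen to form valid UTF-8 would be decoded with the wrong encoding, although this is rare for ordinary addresses.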

If the geocoding result (coord) is empty, the address is written to the error file; if it is not empty, the point is created in the shapefile.

if coord is None:
    failed_text += row[address] + " " + row[place] + "\n"
    numbers_failed += 1

if coord is not None:
    vertex.X = coord.longitude
    vertex.Y = coord.latitude
    feature.shape = vertex
    feature.Name = row[address]
    cursor.insertRow(feature)

feature.Name refers to the Name field created in the shapefile at the start of the process. If you give the field a name other than Name, this line must be modified.
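The found/not found logic can also be separated from the ArcMap part, which makes it easier to test without a shapefile. The sketch below assumes a geocode argument that behaves like geolocator.geocode (it returns an object with latitude and longitude attributes, or None); the function name geocode_rows and the pause parameter are hypothetical additions. The pause is there because the public Nominatim service asks for no more than one request per second.

```python
import time


def geocode_rows(rows, geocode, address_field, place_field, pause=1.0):
    """Geocode each row and return two lists: (found, failed).

    found holds (query, longitude, latitude) tuples; failed holds the
    query strings for which the service returned nothing.
    """
    found, failed = [], []
    for row in rows:
        query = row[address_field] + " " + row[place_field]
        location = geocode(query)
        if location is None:
            failed.append(query)
        else:
            found.append((query, location.longitude, location.latitude))
        time.sleep(pause)  # wait between two requests to the service
    return found, failed
```

The found list can then be written to the shapefile with the insert cursor, and the failed list to the error file.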

The end of the script loads the shapefile into the active ArcMap document and zooms the data frame to the extent of the geocoded points.
