Anita Graser's Blog

https://anitagraser.com

  • 12 January 2020 – Movement data in GIS #27: extracting trip origin clusters from MovingPandas trajectories
    This post is a follow-up to the draft template for exploring movement data I wrote about in my previous post. Specifically, I want to address step 4: exploring patterns in trajectory and event data. The patterns I want to explore in this post are clusters of trip origins. The case study presented here is an extension of the MovingPandas ship data analysis notebook. The analysis consists of 4 steps:
    1. Splitting continuous GPS tracks into individual trips
    2. Extracting trip origins (start locations)
    3. Clustering trip origins
    4. Exploring clusters
    Since I have already removed AIS records with a speed over ground (SOG) value of zero from the dataset, we can use the split_by_observation_gap() function to split the continuous observations into individual trips. Trips that are shorter than 100 meters are automatically discarded as irrelevant clutter:
        traj_collection.min_length = 100
        trips = traj_collection.split_by_observation_gap(timedelta(minutes=5))
    The split operation results in 302 individual trips: passenger vessel trajectories are blue, high-speed craft green, tankers red, and cargo vessels orange; other vessel trajectories are gray. To extract trip origins, we can use the get_start_location …
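    Putting the steps above together, here is a minimal sketch of how the trip-splitting and origin-clustering workflow could look in MovingPandas. The input file, the MMSI/Timestamp column names, the get_start_locations() call, and the DBSCAN parameters are assumptions for illustration; only min_length and split_by_observation_gap() are quoted from the post itself.
        import geopandas as gpd
        import movingpandas as mpd
        import numpy as np
        import pandas as pd
        from datetime import timedelta
        from sklearn.cluster import DBSCAN

        # Hypothetical AIS input with an 'MMSI' id column and a 'Timestamp' column
        gdf = gpd.read_file('ais.gpkg')
        gdf['Timestamp'] = pd.to_datetime(gdf['Timestamp'])
        traj_collection = mpd.TrajectoryCollection(gdf.set_index('Timestamp'), 'MMSI')

        # Split continuous tracks into trips (these two calls are taken from the post);
        # trips shorter than 100 m are dropped as clutter
        traj_collection.min_length = 100
        trips = traj_collection.split_by_observation_gap(timedelta(minutes=5))

        # One origin point per trip; get_start_locations() is assumed to return a
        # GeoDataFrame of start points
        origins = trips.get_start_locations()

        # Cluster the origins with DBSCAN; eps is in degrees for EPSG:4326 data and
        # is a made-up value, so tune it for your dataset
        coords = np.column_stack([origins.geometry.x, origins.geometry.y])
        origins['cluster'] = DBSCAN(eps=0.01, min_samples=5).fit_predict(coords)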
  • 3 January 2020 – Movement data in GIS #26: towards a template for exploring movement data
    Exploring new datasets can be challenging. Addressing this challenge, there is a whole field called exploratory data analysis that focuses on exploring datasets, often with visual methods. Concerning movement data in particular, there's a comprehensive book on the visual analysis of movement by Andrienko et al. (2013) and a host of papers, such as the recent state of the art summary by Andrienko et al. (2017). However, while the literature does provide concepts, methods, and example applications, these have not yet translated into readily available tools for analysts to use in their daily work. To fill this gap, I'm working on a template for movement data exploration implemented in Python using MovingPandas. The proposed workflow consists of five main steps (step 2 is sketched in code after this list):
    1. Establishing an overview by visualizing raw input data records
    2. Putting records in context by exploring information from consecutive movement data records (such as time between records, speed, and direction)
    3. Extracting trajectories & events by dividing the raw continuous tracks into individual trajectories and/or events
    4. Exploring patterns in trajectory and event data by looking at groups of the trajectories or events
    5. Analyzing …
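    As a taste of step 2, here is a minimal sketch of putting records in context with MovingPandas. The file name, the 't' and 'trip_id' columns, and the manual time-gap computation are assumptions for illustration; add_speed() and add_direction() are the kind of enrichment functions MovingPandas provides.
        import geopandas as gpd
        import movingpandas as mpd
        import pandas as pd

        # Hypothetical input: GPS records with a 't' timestamp column and a 'trip_id' column
        gdf = gpd.read_file('tracks.gpkg')
        gdf['t'] = pd.to_datetime(gdf['t'])
        trajs = mpd.TrajectoryCollection(gdf.set_index('t'), 'trip_id')

        for traj in trajs.trajectories:
            traj.add_speed(overwrite=True)      # speed between consecutive records (CRS units/s)
            traj.add_direction(overwrite=True)  # direction of movement in degrees
            # time between consecutive records, derived from the DatetimeIndex
            traj.df['t_between_records'] = traj.df.index.to_series().diff()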
  • 7 December 2019 – Getting started with PySpark & GeoPandas on Databricks
    Over the last few years, many data analysis platforms have added spatial support to their portfolios. Just two days ago, Databricks published an extensive post on spatial analysis. I took their post as a sign that it is time to look into how PySpark and GeoPandas can work together to achieve scalable spatial analysis workflows. If you sign up for Databricks Community Edition, you get access to a toy cluster for experimenting with (Py)Spark. This considerably lowers the entry barrier to Spark since you don't need to bother with installing anything yourself. They also provide a notebook environment. I've followed the official Databricks GeoPandas example notebook but expanded it to read from a real geodata format (GeoPackage) rather than from CSV. I'm using test data from the MovingPandas repository: demodata_geolife.gpkg contains a handful of trajectories from the Geolife dataset, and demodata_grid.gpkg contains a simple 3×4 grid that covers the same geographic extent as the Geolife sample. Once the files are downloaded, we can use GeoPandas to read the GeoPackages. Note that the display() function is used to show the plot. The same applies to the grid data. When the GeoDataFrames are …
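    A minimal sketch of the reading step, assuming the two GeoPackages have been downloaded next to the notebook. The WKT detour for handing the points to Spark is my own simplification rather than the notebook's exact code; spark and display() are provided by the Databricks notebook environment.
        import geopandas as gpd

        # Read the MovingPandas test GeoPackages (local paths assumed)
        points = gpd.read_file('demodata_geolife.gpkg')
        grid = gpd.read_file('demodata_grid.gpkg')

        # One way to show the static plot in a Databricks notebook
        ax = points.plot()
        display(ax.figure)

        # Spark cannot handle shapely geometries directly, so serialize them to WKT
        # before creating the Spark DataFrame
        points['wkt'] = points.geometry.apply(lambda geom: geom.wkt)
        sdf = spark.createDataFrame(points.drop(columns='geometry'))
        display(sdf)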
  • 16 November 2019 – Movement data in GIS #25: moving object databases
    Recently there has been some buzz on Twitter about a new moving object database (MOD) called MobilityDB that builds on PostgreSQL and PostGIS (Zimányi et al. 2019). The MobilityDB Github repo was published in February 2019, but according to the following presentation at PgConf.Russia 2019, it has been under development for a few years. Of course, moving object databases have been around for quite a while. The two most commonly cited MODs are HermesDB (Pelekis et al. 2008), which comes as an extension for either PostgreSQL or Oracle and is developed at the University of Piraeus, and SECONDO (de Almeida et al. 2006), which is a stand-alone database system developed at the Fernuniversität Hagen. However, both MODs remain at the research prototype level and have not achieved broad adoption. It will be interesting to see if MobilityDB will be able to achieve the goal set in the title of Zimányi et al. (2019): to become “a mainstream moving object database system”. It's promising that they are building on PostGIS and using its mature spatial analysis functionality instead of reinventing the wheel. They also discuss why they decided that PostGIS trajectories (which I've written a …
  • 3 November 2019 – Folium vs. hvplot for interactive maps of Point GeoDataFrames
    In the previous post, I showed how Folium can be used to create interactive maps of GeoPandas GeoDataFrames. Today's post continues this theme. Specifically, it compares Folium to another dataviz library called hvplot. hvplot also recently added support for GeoDataFrames, so it's interesting to see how these different solutions compare. Minimum viable: the following snippets show the minimum code I found to put a GeoDataFrame of Points onto a map with either Folium or hvplot. Folium does not automatically zoom to the data extent, and I didn't find a way to add the whole GeoDataFrame of Points without looping through the rows individually. hvplot, on the other hand, registers the hvplot function directly with the GeoDataFrame. This makes it as convenient to use as the original GeoPandas plot function. It also zooms to the data extent. Standard interaction and zoom to area of interest: the following snippets ensure that the map is set to a useful extent and that the map tools enable panning and zooming. With Folium, we have to set the map center and the zoom. The map tools are Leaflet defaults, so panning and zooming work as expected. Since hvplot does not come with mouse wheel zoom enabled by …
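    A sketch of what the two "minimum viable" snippets could look like; the input file, the marker loop, and the map centre are assumptions rather than the post's exact code, and gdf.hvplot(geo=True) additionally needs GeoViews installed.
        import folium
        import geopandas as gpd
        import hvplot.pandas  # noqa: F401 -- registers the .hvplot accessor on (Geo)DataFrames

        gdf = gpd.read_file('points.gpkg')  # hypothetical Point GeoDataFrame in EPSG:4326

        # Folium: set the map centre/zoom manually and add the points one by one
        m = folium.Map(location=[48.2, 16.4], zoom_start=10)
        for _, row in gdf.iterrows():
            folium.Marker([row.geometry.y, row.geometry.x]).add_to(m)
        m  # last expression in a notebook cell renders the map

        # hvplot: a single call on the GeoDataFrame, which also zooms to the data extent
        gdf.hvplot(geo=True)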
  • 31 October 2019 – Interactive plots for GeoPandas GeoDataFrames of LineStrings
    GeoPandas makes it easy to create basic visualizations of GeoDataFrames. However, if we want interactive plots, we need additional libraries. Folium (which is built on Leaflet) is a great option. However, all examples for plotting GeoDataFrames that I found focused on point or polygon data. So here is what I found to work for GeoDataFrames of LineStrings. First, some imports:
        import pandas as pd
        import geopandas
        import folium
    Loading the data:
        graph = geopandas.read_file('data/population_test-routes-geom.csv')
        graph.crs = {'init': 'epsg:4326'}
    Creating the map using folium.Choropleth:
        m = folium.Map([48.2, 16.4], zoom_start=10)
        folium.Choropleth(
            graph[graph.geometry.length > 0.001],
            line_weight=3,
            line_color='blue'
        ).add_to(m)
        m
    I also tried using folium.PolyLine, which seemed like the more obvious choice but does not seem to accept GeoDataFrames as input. Instead, it expects a list of coordinate pairs, and of course it expects them to be in the opposite order that shapely's LineString.coords provides … Oh the joys of geodata! In any case, I had to limit the number of features that get plotted because Folium refuses to plot all 8778 features at once. I decided to filter by line length b …
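    For completeness, here is a sketch of the folium.PolyLine route mentioned above, reusing graph and m from the snippet before it. The coordinate order is flipped from (lon, lat) to (lat, lon); the limit of 100 features is an arbitrary placeholder, not the length filter the post actually uses.
        # PolyLine expects (lat, lon) pairs, while LineString.coords yields (lon, lat)
        for line in graph.geometry[:100]:  # arbitrary limit; Folium chokes on all 8778 features
            latlon = [(y, x) for x, y in line.coords]
            folium.PolyLine(latlon, weight=3, color='blue').add_to(m)
        m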
  • 11 September 2019 – Movement data in GIS #24: MovingPandas hands-on tutorials
    Last week, I had the pleasure of giving a movement data analysis workshop at the OpenGeoHub summer school at the University of Münster in Germany. The workshop materials consist of three Jupyter notebooks that have been designed to also support self-study outside of a workshop setting, so you can try them out as well! All materials are available on Github: Tutorial 0 provides an introduction to the MovingPandas Trajectory class. Tutorials 1 and 2 provide examples with real-world datasets covering one day of ship movement near Gothenburg and multiple years of gull migration, respectively. Here's a quick preview of the bird migration data analysis tutorial (click for full size): Tutorial 2: Bird migration data analysis. You can run all three Jupyter notebooks online using MyBinder (no installation required). Alternatively, or if you want to dig deeper, installation instructions are available on movingpandas.org. The OpenGeoHub summer school this year had a strong focus on spatial analysis with R and GRASS (sometimes mixing those two together). It was great to meet @mdsumner (author of R trip) and @edzerpebesma (author of R trajectories) for what might well have been the ultimate movement …
  • 7 July 2019 – Five QGIS network analysis toolboxes for routing and isochrones
    In the past, network analysis capabilities in QGIS were rather limited or not straightforward to use. This has changed! In QGIS 3.x, we now have a wide range of network analysis tools, both for use cases where you want to use your own network data and for use cases where you don't have access to appropriate data or just prefer to use an existing service. This blog post aims to provide an overview of the options:
    Based on local network data:
    • Default QGIS Processing network analysis tools
    • QNEAT3 plugin
    Based on web services:
    • Hqgis plugin (HERE)
    • ORS Tools plugin (openrouteservice.org)
    • TravelTime platform plugin (TravelTime platform)
    All five options provide Processing toolbox integration, but not at the same level. If you are a regular reader of this blog, you're probably also aware of the pgRoutingLayer plugin. However, I'm not including it in this list due to its dependency on PostGIS and its pgRouting extension. Processing network analysis tools: the default Processing network analysis tools are provided out of the box. They provide functionality to compute least cost paths and service areas (distance or time) based on your own network data (see the sketch after this excerpt). Inputs can be individual points or layer …
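    A minimal sketch (run from the QGIS 3.x Python console) of the default Processing shortest path tool mentioned above; the network layer path, the coordinates, and the parameter choices are made-up illustrations, and most optional parameters are left at their defaults.
        import processing

        result = processing.run(
            "native:shortestpathpointtopoint",
            {
                "INPUT": "data/network.gpkg|layername=roads",  # hypothetical road network layer
                "STRATEGY": 0,                                  # 0 = shortest path, 1 = fastest path
                "START_POINT": "16.35,48.20 [EPSG:4326]",
                "END_POINT": "16.45,48.25 [EPSG:4326]",
                "OUTPUT": "memory:",
            },
        )
        route_layer = result["OUTPUT"]  # an in-memory line layer with the least cost path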
  • 22 May 2019 – Movement data in GIS #23: trajectories in context
    Today's post continues where “Why you should be using PostGIS trajectories” leaves off. It's the result of a collaboration with Eva Westermeier. I had the pleasure of supervising her internship at AIT last year and also co-supervised her Master's thesis [0] on the topic of enriching trajectories with information about their geographic context. Context-aware analysis of movement data is crucial for different domains and applications, from transport to ecology. While there is a wealth of data, efficient and user-friendly contextual trajectory analysis is still hampered by a lack of appropriate conceptual approaches and practical methods. (Westermeier, 2018) Part of the work focused on evaluating different approaches to adding context information from vector datasets to trajectories in PostGIS, for example, adding land cover context to animal movement data or adding information on anchoring and harbor areas to vessel movement data. Classic point-based model vs. line-based model: the obvious approach is to intersect the trajectory points with context data. This is the classic point data model of contextual trajectories. It's straightforward to add context information in the point-base …
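    The post itself evaluates these approaches in PostGIS, but the point-based model it describes can be illustrated with a plain GeoPandas spatial join; the file names and the land cover 'class' column are made up for this sketch.
        import geopandas as gpd

        # Hypothetical inputs: trajectory points and a land cover polygon layer
        points = gpd.read_file('trajectory_points.gpkg')
        landcover = gpd.read_file('landcover.gpkg')[['class', 'geometry']]

        # Point-based model: each trajectory point inherits the attributes of the
        # context polygon it falls within
        points_with_context = gpd.sjoin(points, landcover, how='left', predicate='within')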
  • 3 May 2019 – Flow maps in QGIS – no plugins needed!
    If you've been following my posts, you'll no doubt have seen quite a few flow maps on this blog. This tutorial brings together many different elements to show you exactly how to create a flow map from scratch. It's the result of a collaboration with Hans-Jörg Stark from Switzerland, who collected the data. The flow data: the data presented in this post stems from a survey conducted among public transport users, especially commuters (available online at: https://de.surveymonkey.com/r/57D33V6). Among other questions, the questionnaire asks where the commuters start their journey and where they are heading. The answers had to be cleaned up to correct for different spellings, spelling errors, and multiple locations in one field. This cleaning and the following geocoding step were implemented in Python. Afterwards, the flow information was aggregated to count the number of nominations of each connection between different places. Finally, these connections (edges that contain start id, destination id, and number of nominations) were stored in a text file. In addition, the locations were stored in a second text file containing id, location name, and coordinates. Why was this data collected? …
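    The aggregation step described above boils down to a pandas groupby. The column and file names below are assumptions, since the post does not show the cleaning and geocoding code.
        import pandas as pd

        # Hypothetical cleaned survey responses, one origin/destination pair per row
        responses = pd.read_csv('responses_cleaned.csv')  # assumed columns: origin_id, destination_id

        # Count the number of nominations per connection and store the edge list
        edges = (
            responses.groupby(['origin_id', 'destination_id'])
            .size()
            .reset_index(name='nominations')
        )
        edges.to_csv('edges.txt', index=False)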