Anita Graser's Blog

https://anitagraser.com

  • 21 March 2020: MovingPandas v0.3 released!
    MovingPandas has come a long way since 2018, when I started to experiment with GeoPandas for trajectory data handling. This week, MovingPandas passed peer review and was approved for pyOpenSci. This technical review process was extremely helpful in ensuring code, project, and documentation quality. I would strongly recommend it to everyone working on new data science libraries! The latest v0.3 release is now available from conda-forge. All tutorials are available on MyBinder. New features include: support for GeoPandas 0.7; trajectory collection aggregation functions to generate flow maps …
  • 2 March 2020: Movement data in GIS #29: power your web apps with movement data using mobilitydb-sqlalchemy
    This is a guest post by Bommakanti Krishna Chaitanya (@chaitan94). Introduction: This post introduces mobilitydb-sqlalchemy, a tool I’m developing to make it easier for developers to use movement data in web applications. Many web developers use Object Relational Mappers such as SQLAlchemy to read/write Python objects from/to a database. mobilitydb-sqlalchemy integrates the moving objects database MobilityDB into SQLAlchemy and Flask. This is an important step towards dealing with trajectory data using appropriate spatiotemporal data structures rather than plain spatial points or polylines. To make it even better, mobilitydb-sqlalchemy also supports MovingPandas. This makes it possible to write MovingPandas trajectory objects directly to MobilityDB. For this post, I have made a demo application which you can find live at https://mobilitydb-sqlalchemy-demo.adonmo.com/. The code for this demo app is open source and available on GitHub. Feel free to explore both the demo app and the code! In the following sections, I will explain the most important parts of this demo app to show how to use mobilitydb-sqlalchemy in your own web app. If you want to reproduce this demo, you can clone the demo re …
  • 21 February 2020: Movement data in GIS #28: open geospatial tools for movement data exploration
    We recently published a new paper on “Open Geospatial Tools for Movement Data Exploration” (open access). If you liked Movement data in GIS #26: towards a template for exploring movement data, you will find even more information about the context, challenges, and recent developments in this paper. It also presents three open source stacks for movement data exploration and discusses their capabilities and limitations: (1) QGIS + PostGIS, a combination that will be familiar to most open source GIS users; (2) Jupyter + MovingPandas, less common so far, but Jupyter notebooks are quickly gaining popularity (even in the proprietary GIS world); (3) GeoMesa + Spark, for when datasets become too big to handle using other means. This post is part of a series. Read more about movement data in GIS. …
  • 2 February 2020: First working MovingPandas setup on Databricks
    In December, I wrote about GeoPandas on Databricks. Back then, I also tried to get MovingPandas working but without luck. (While GeoPandas can be installed using Databricks’ dbutils.library.installPyPI("geopandas"), this PyPI install just didn’t want to work for MovingPandas.) Now that MovingPandas is available from conda-forge, I gave it another try and … *spoiler alert* … it works! First of all, conda support on Databricks is in beta. It’s not included in the default runtimes. At the time of writing this post, “6.0 Conda Beta” is the latest runtime with conda. Once the cluster is up and connected to the notebook, a quick conda list shows the installed packages. Time to install MovingPandas! I went with a 100% conda-forge installation. This takes a looong time (almost half an hour)! When the installs are finally done, it gets serious: time to test the imports! Success! Now we can put the MovingPandas data structures to good use. But first we need to load some movement data. Of course, the points in this GeoDataFrame can be plotted. However, the plot isn’t automatically displayed once plot() is called on the GeoDataFrame. Instead, Databricks provides a display() function to display …
  • 12 January 2020: Movement data in GIS #27: extracting trip origin clusters from MovingPandas trajectories
    This post is a follow-up to the draft template for exploring movement data I wrote about in my previous post. Specifically, I want to address step 4: Exploring patterns in trajectory and event data. The patterns I want to explore in this post are clusters of trip origins. The case study presented here is an extension of the MovingPandas ship data analysis notebook. The analysis consists of 4 steps: (1) splitting continuous GPS tracks into individual trips; (2) extracting trip origins (start locations); (3) clustering trip origins; (4) exploring clusters. Since I have already removed AIS records with a speed over ground (SOG) value of zero from the dataset, we can use the split_by_observation_gap() function to split the continuous observations into individual trips. Trips that are shorter than 100 meters are automatically discarded as irrelevant clutter:
        traj_collection.min_length = 100
        trips = traj_collection.split_by_observation_gap(timedelta(minutes=5))
    The split operation results in 302 individual trips. Passenger vessel trajectories are blue, high-speed craft green, tankers red, and cargo vessels orange. Other vessel trajectories are gray. To extract trip origins, we can use the get_start_location …
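The idea behind an observation-gap split can be sketched in plain Python, independent of MovingPandas. This is only an illustration of the principle, not the library's implementation: the helper name `split_by_gap`, the `(timestamp, x, y)` record layout, and the sample data are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def split_by_gap(records, gap):
    """Split time-ordered (timestamp, x, y) records into trips.

    Hypothetical helper: a new trip starts whenever the time between two
    consecutive records is larger than `gap`.
    """
    trips = []
    current = []
    for rec in records:
        # Close the current trip if the pause before this record is too long.
        if current and rec[0] - current[-1][0] > gap:
            trips.append(current)
            current = []
        current.append(rec)
    if current:
        trips.append(current)
    return trips


# Made-up sample track with an 18-minute pause in the middle.
t0 = datetime(2020, 1, 12, 8, 0)
records = [
    (t0, 0.0, 0.0),
    (t0 + timedelta(minutes=1), 1.0, 0.0),
    (t0 + timedelta(minutes=2), 2.0, 0.0),
    (t0 + timedelta(minutes=20), 5.0, 5.0),
    (t0 + timedelta(minutes=21), 6.0, 5.0),
]

trips = split_by_gap(records, timedelta(minutes=5))
# The trip origin is simply the location of the first record of each trip.
origins = [trip[0][1:] for trip in trips]
print(len(trips), origins)
```

With a 5-minute threshold, the sample track falls apart into two trips, and the first record of each trip serves as its origin for a later clustering step.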
  • 3 January 2020: Movement data in GIS #26: towards a template for exploring movement data
    Exploring new datasets can be challenging. Addressing this challenge, there is a whole field called exploratory data analysis that focuses on exploring datasets, often with visual methods. Concerning movement data in particular, there’s a comprehensive book on the visual analysis of movement by Andrienko et al. (2013) and a host of papers, such as the recent state-of-the-art summary by Andrienko et al. (2017). However, while the literature does provide concepts, methods, and example applications, these have not yet translated into readily available tools for analysts to use in their daily work. To fill this gap, I’m working on a template for movement data exploration implemented in Python using MovingPandas. The proposed workflow consists of five main steps: (1) Establishing an overview by visualizing raw input data records; (2) Putting records in context by exploring information from consecutive movement data records (such as: time between records, speed, and direction); (3) Extracting trajectories & events by dividing the raw continuous tracks into individual trajectories and/or events; (4) Exploring patterns in trajectory and event data by looking at groups of the trajectories or events; (5) Analyzing …
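As a rough illustration of step 2 (putting records in context), the following sketch derives time between records, speed, and direction from consecutive positions using only the Python standard library. It is not taken from the template itself: the function names, the `(timestamp, lon, lat)` record layout, the spherical-Earth radius, and the sample coordinates are assumptions for this example.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lon1, lat1, lon2, lat2):
    """Initial bearing in degrees (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360


def consecutive_metrics(records):
    """Time gap, distance, speed and direction for each pair of
    consecutive (timestamp, lon, lat) records (hypothetical layout)."""
    metrics = []
    for (t1, lon1, lat1), (t2, lon2, lat2) in zip(records, records[1:]):
        dt = (t2 - t1).total_seconds()
        dist = haversine_m(lon1, lat1, lon2, lat2)
        metrics.append({
            "dt_s": dt,
            "dist_m": dist,
            # Speed is undefined for records that share a timestamp.
            "speed_ms": dist / dt if dt > 0 else float("nan"),
            "bearing_deg": bearing_deg(lon1, lat1, lon2, lat2),
        })
    return metrics


# Made-up sample: one step due north, then one step east.
t0 = datetime(2020, 1, 3, 12, 0)
records = [
    (t0, 16.40, 48.20),
    (t0 + timedelta(seconds=60), 16.40, 48.21),
    (t0 + timedelta(seconds=120), 16.41, 48.21),
]
for m in consecutive_metrics(records):
    print(m)
```

Unusually long time gaps, implausible speeds, or erratic direction changes in such per-record values are typical hints at data quality issues worth inspecting before trajectories are extracted.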
  • 7 December 2019: Getting started with PySpark & GeoPandas on Databricks
    Over the last few years, many data analysis platforms have added spatial support to their portfolio. Just two days ago, Databricks published an extensive post on spatial analysis. I took their post as a sign that it is time to look into how PySpark and GeoPandas can work together to achieve scalable spatial analysis workflows. If you sign up for Databricks Community Edition, you get access to a toy cluster for experimenting with (Py)Spark. This considerably lowers the entry barrier to Spark since you don’t need to bother with installing anything yourself. They also provide a notebook environment. I’ve followed the official Databricks GeoPandas example notebook but expanded it to read from a real geodata format (GeoPackage) rather than from CSV. I’m using test data from the MovingPandas repository: demodata_geolife.gpkg contains a handful of trajectories from the Geolife dataset, and demodata_grid.gpkg contains a simple 3×4 grid that covers the same geographic extent as the Geolife sample. Once the files are downloaded, we can use GeoPandas to read the GeoPackages. Note that the display() function is used to show the plot. The same applies to the grid data. When the GeoDataFrames are …
  • 16 November 2019: Movement data in GIS #25: moving object databases
    Recently there has been some buzz on Twitter about a new moving object database (MOD) called MobilityDB that builds on PostgreSQL and PostGIS (Zimányi et al. 2019). The MobilityDB GitHub repo was published in February 2019 but, according to a presentation at PgConf.Russia 2019, it has been under development for a few years. Of course, moving object databases have been around for quite a while. The two most commonly cited MODs are HermesDB (Pelekis et al. 2008), which comes as an extension for either PostgreSQL or Oracle and is developed at the University of Piraeus, and SECONDO (de Almeida et al. 2006), which is a stand-alone database system developed at the Fernuniversität Hagen. However, both MODs remain at the research prototype level and have not achieved broad adoption. It will be interesting to see if MobilityDB will be able to achieve the goal they have set in the title of Zimányi et al. (2019) to become “a mainstream moving object database system”. It’s promising that they are building on PostGIS and using its mature spatial analysis functionality instead of reinventing the wheel. They also discuss why they decided that PostGIS trajectories (which I’ve written a …
  • 3 November 2019: Folium vs. hvplot for interactive maps of Point GeoDataFrames
    In the previous post, I showed how Folium can be used to create interactive maps of GeoPandas GeoDataFrames. Today’s post continues this theme. Specifically, it compares Folium to another dataviz library called hvplot. hvplot also recently added support for GeoDataFrames, so it’s interesting to see how these different solutions compare. Minimum viable: The following snippets show the minimum code I found to put a GeoDataFrame of Points onto a map with either Folium or hvplot. Folium does not automatically zoom to the data extent and I didn’t find a way to add the whole GeoDataFrame of Points without looping through the rows individually. hvplot, on the other hand, registers the hvplot function directly with the GeoDataFrame. This makes it as convenient to use as the original GeoPandas plot function. It also zooms to the data extent. Standard interaction and zoom to area of interest: The following snippets ensure that the map is set to a useful extent and the map tools enable panning and zooming. With Folium, we have to set the map center and the zoom. The map tools are Leaflet defaults, so panning and zooming work as expected. Since hvplot does not come with mouse wheel zoom enabled by …
  • 31 October 2019: Interactive plots for GeoPandas GeoDataFrames of LineStrings
    GeoPandas makes it easy to create basic visualizations of GeoDataFrames. However, if we want interactive plots, we need additional libraries. Folium (which is built on Leaflet) is a great option. However, all examples for plotting GeoDataFrames that I found focused on point or polygon data. So here is what I found to work for GeoDataFrames of LineStrings. First, some imports:
        import pandas as pd
        import geopandas
        import folium
    Loading the data:
        graph = geopandas.read_file('data/population_test-routes-geom.csv')
        graph.crs = {'init': 'epsg:4326'}
    Creating the map using folium.Choropleth:
        m = folium.Map([48.2, 16.4], zoom_start=10)
        folium.Choropleth(
            graph[graph.geometry.length>0.001],
            line_weight=3,
            line_color='blue'
        ).add_to(m)
        m
    I also tried using folium.PolyLine, which seemed like the more obvious choice but does not seem to accept GeoDataFrames as input. Instead, it expects a list of coordinate pairs and, of course, it expects them in the opposite order from the one that Shapely.LineString.coords provides … Oh the joys of geodata! In any case, I had to limit the number of features that get plotted because Folium refuses to plot all 8778 features at once. I decided to filter by line length b …
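The coordinate-order mismatch mentioned above can be handled with a small conversion step. The sketch below assumes that the input is a sequence of (x, y) = (lon, lat) tuples, as a Shapely LineString's coords would yield for EPSG:4326 data, and that the target is the (lat, lon) order Leaflet-based tools use. The helper name `to_lat_lon` and the sample coordinates are made up for illustration.

```python
def to_lat_lon(coords):
    """Swap (x, y[, z]) = (lon, lat[, z]) tuples into (lat, lon) pairs.

    Hypothetical helper: any third coordinate (z) is dropped.
    """
    return [(y, x) for x, y, *_ in coords]


# Made-up line near Vienna in (lon, lat) order, as list(line.coords) would give.
line_coords = [(16.35, 48.20), (16.40, 48.22), (16.45, 48.21)]
locations = to_lat_lon(line_coords)
print(locations)

# With Folium installed, the result could then be passed on, for example:
#   folium.PolyLine(locations=locations).add_to(m)
```

Applied per feature, such a conversion would allow looping over the geometries of a GeoDataFrame, at the cost of giving up the one-call convenience of folium.Choropleth.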