Anita Graser's Blog

https://anitagraser.com

  • 25 June 2025: QGIS User Conf 2025 videos have landed!
    The QGISUC2025 team has done an awesome job recording and editing the conference presentations. All “presentation” type talks whose presenters agreed to publication are now available in a dedicated list on the QGIS YouTube channel. I also had the pleasure of presenting our Trajectools plugin, and you can watch the talk here: https://www.youtube.com/watch?v=T7haF1DPy2U Thank you to all the organizers, speakers, and participants for the great time! …
  • 17 May 2025: Speed up your analytics with the new MovingPandas 0.22 and Trajectools 2.6
    The latest releases of MovingPandas and Trajectools come with many “under the hood” changes that aim to make your movement analytics faster: Instead of immediately creating a GeoPandas GeoDataFrame and populating the geometry column with Point objects, MovingPandas now has “lazy geometry column creation” that holds off on this operation until (and only if) the geometries are actually needed. This way, for many operations, no geometry objects have to be generated at all. MovingPandas TrajectorySplitters now support parallel processing, and Trajectools uses parallel processing whenever available (e.g. for adding speed & direction metrics, detecting stops, splitting trajectories). When a minimum length is specified for trajectories, MovingPandas now avoids computing the total trajectory length and, instead, stops as soon as the threshold value has been reached (“early skip”). Trajectools now offers the option to skip computation of movement metrics (speed & direction). This way, we can skip unnecessary computations and leverage the lazy geometry column creation wherever applicable. Let’s have a look at some example performance measurements! Example 1: MovingPandas ValueChangeSplitter The …
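To illustrate the “early skip” idea described above, here is a minimal sketch in plain Python. It is not MovingPandas’ actual implementation: the function name `exceeds_min_length` and the use of planar (x, y) tuples are assumptions made purely for illustration.

```python
from math import hypot


def exceeds_min_length(points, min_length):
    """Return True as soon as the accumulated path length reaches min_length.

    Hypothetical helper: points is a list of planar (x, y) tuples.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += hypot(x1 - x0, y1 - y0)
        if total >= min_length:
            # Early skip: the remaining segments are never measured.
            return True
    return False


track = [(0, 0), (3, 4), (6, 8), (9, 12)]  # three segments of length 5
print(exceeds_min_length(track, 10))
print(exceeds_min_length(track, 20))
```

The benefit grows with trajectory length: for a long track that clears the threshold early, only the first few segments are ever measured.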
  • 29 March 2025: The quest for a fair TimeGPT benchmark
    At the end of yesterday’s TimeGPT for mobility post, we concluded that TimeGPT’s training set probably included a copy of the popular BikeNYC timeseries dataset and that, therefore, we were not looking at a fair comparison. Naturally, it’s hard to find mobility timeseries datasets online that can be publicized but haven’t been widely disseminated and therefore may have slipped past the scrapers of foundation model builders. So I scoured the Austrian open government data portal and came up with a bike-share dataset from Vienna. Dataset: SharedMobility.ai dataset published by Philipp Naderer-Puiu, covering 2019-05-05 to 2019-12-31. Here are eight of the 120 stations in the dataset. I’ve resampled the number of available bicycles to the maximum hourly value and made a cutoff in mid-August (before a larger data collection gap and the less busy autumn and winter seasons): Models: To benchmark TimeGPT, I computed different baseline predictions. I used statsforecast’s HistoricAverage, SeasonalNaive, and AutoARIMA models and computed predictions for horizons of 1 hour, 12 hours, and 24 hours. Here are examples of the 12-hour predictions: We can see how Historic Average is pretty much a straight …
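For readers unfamiliar with the two simplest baselines mentioned above, the following sketch shows what a historic average and a seasonal naive forecast compute. This is a plain-Python re-implementation of the ideas, not statsforecast’s API; the function names, the toy series, and the season length of 3 are assumptions chosen for illustration (hourly data would typically use a season length of 24).

```python
from statistics import mean


def historic_average(history, horizon):
    """Forecast every future step as the mean of all past observations."""
    avg = mean(history)
    return [avg] * horizon


def seasonal_naive(history, horizon, season_length):
    """Forecast each future step as the value observed one season earlier."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


# Toy series with a repeating pattern of length 3 (hypothetical values).
history = [2, 4, 6, 2, 4, 6]
print(historic_average(history, 4))
print(seasonal_naive(history, 4, season_length=3))
```

The historic average yields a flat line by construction, while the seasonal naive forecast simply repeats the last observed cycle.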
  • 28 March 2025: TimeGPT for mobility: Can foundation models outperform classic machine learning models for mobility predictions?
    tl;dr: Maybe. Preliminary results certainly are impressive. Introduction: Crowd and flow predictions have been very popular topics in mobility data science. Traditional forecasting methods rely on classic machine learning models like ARIMA, later followed by deep learning approaches such as ST-ResNet. More recently, foundation models for timeseries forecasting, such as TimeGPT, Chronos, and Lag-Llama, have been introduced. A key advantage of these models is their ability to generate zero-shot predictions, meaning that they can be applied directly to new tasks without requiring retraining for each scenario. In this post, I want to compare TimeGPT’s performance against traditional approaches for predicting city-wide crowd flows. Experiment setup: The experiment builds on the paper “Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction” by Zhang et al. (2017). The original repo referenced on the homepage does not exist anymore. Therefore, I forked https://github.com/topazape/ST-ResNet as a starting point. The goals of this experiment are to: Get an impression of how TimeGPT predicts mobility timeseries. Compare TimeGPT to classic machine learning (ML) and deep learning …
  • 10 March 2025: Analyzing GTFS Realtime Data for Public Transport Insights
    In today’s post, we (that is, Gaspard Merten from Université Libre de Bruxelles and yours truly) are going to dive deep into how to analyze public transport data, using both schedule and real-time information. This collaboration has been made possible by the EMERALDS project. Previously, I shared news about GTFS algorithms for Trajectools that add GTFS preprocessing tools (incl. route, segment, and stop layer extraction) to the QGIS Processing toolbox. Today, we’ll discuss how to handle realtime GTFS data and how we approach analytics that combine both data sources. About Realtime GTFS: Many of us have come to rely on real-time public transport updates in apps like Google Maps. These apps are powered by standardized data formats that ensure different systems can communicate. Google first introduced GTFS in 2005, a format designed to organize transit schedules, stop locations, and other static transit information. Then, in 2011, they introduced GTFS Realtime (GTFS-RT), which added the capability to include live updates on vehicle positions, delays, speeds, and much more. However, as the name suggests, GTFS Realtime is all about live data. This means that while GTFS …
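One small step in combining schedule and realtime data is putting both on the same time scale. The sketch below is not taken from the post: the helper names are hypothetical, and it assumes the observed arrival has already been converted to seconds since the start of the service day. It relies on the fact that GTFS schedule times are written as HH:MM:SS and may exceed 24:00:00 for trips that run past midnight.

```python
def gtfs_time_to_seconds(value):
    """Convert a GTFS schedule time such as '25:10:00' to seconds.

    GTFS allows hours >= 24 for trips continuing past midnight, so
    datetime.time cannot be used to parse these values directly.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def delay_seconds(scheduled, observed_seconds):
    """Positive result = vehicle is late, negative = early (hypothetical helper)."""
    return observed_seconds - gtfs_time_to_seconds(scheduled)


print(gtfs_time_to_seconds("25:10:00"))
print(delay_seconds("08:00:00", 28920))
```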
  • 1 March 2025: Trajectools is moving to Codeberg
    The Trajectools repository is migrating from GitHub to Codeberg. The new home for Trajectools is: https://codeberg.org/movingpandas/trajectools The GitHub repo remains as a writable mirror, for now, but the issue tracking is only active on Codeberg. Why the move? I am working on moving my projects to European infrastructure that better aligns with my values. Codeberg is a nonprofit and libre-friendly platform based in Germany. This will ensure that the projects are hosted on infrastructure that prioritizes user privacy and open-source ideals. What does this mean for users? No impact on functionality – Trajectools remains the same great tool for trajectory analysis, available through the recently updated QGIS Plugin Repo. Development continues – I’ll continue actively maintaining and improving the project. (If you want to file feature requests, please note that the issue tracker on the GitHub mirror has been deactivated and issues should be filed on Codeberg instead.) What does this mean for contributors? If you’re contributing to Trajectools, simply update your remotes to the new repository. The GitHub repo continues to accept PRs and the changes are synced between GitHub and Codeb …
  • 31 January 2025: Geocomputation with Python: now in print!
    Today, I’m super excited to share with you the announcement that our open source textbook “Geocomputation with Python” has finally arrived in print and is now available for purchase from Routledge.com, Amazon.com, Amazon.co.uk, and other booksellers. “Geocomputation with Python” (or geocompy for short) covers the entire range of standard GIS operations for both vector and raster data models. Each section and chapter builds on the previous. If you’re just starting out with Python to work with geographic data, we hope that the book will be an excellent place to start. Of course, you can still find the online version of the book at py.geocompx.org. The book is open-source and you can find the code on GitHub. This ensures that the content is reproducible, transparent, and accessible. It also lets you interact with the project by opening issues and submitting pull requests. …
  • 11 January 2025: Trajectools 2.4 release
    In this new release, you will find new algorithms, default output styles, and other usability improvements, in particular for working with public transport schedules in GTFS format, including: Added GTFS algorithms for extracting stops, fixes #43; Added default output styles for GTFS stops and segments c600060; Added Trajectory splitting at field value changes 286fdbd; Added option to add selected fields to output trajectories layer, fixes #53; Improved UI of the split by observation gap algorithm, fixes #36. Note: To use this new version of Trajectools, please upgrade your installation of MovingPandas to >= 0.21.2, e.g. using import pip; pip.main(['install', '--upgrade', 'movingpandas']) or conda install movingpandas==0.21.2 …
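Before upgrading, it can be handy to check whether the installed MovingPandas already satisfies the minimum version mentioned above. The following is a minimal sketch, not part of Trajectools: the helper names are hypothetical, and the simplified parser only handles plain numeric versions such as 0.21.2 (pre-release suffixes are cut off).

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.21.2' into (0, 21, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="0.21.2"):
    """Compare version tuples; simplified, ignores pre-release tags."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package="movingpandas"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("movingpandas is not installed")
else:
    print(version, "OK" if meets_minimum(version) else "needs upgrade")
```

This can be pasted into the QGIS Python console, since it only uses the standard library.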
  • 17 December 2024: Urban mobility insights with MovingPandas & CARTO in Snowflake
    Today, I want to point out a blog post over at https://carto.com/blog/urban-mobility-insights-with-movingpandas-carto-in-snowflake written together with my co-author and EMERALDS project team member Argyrios Kyrgiazos. For the technically inclined, the highlights are the presented UDFs in Snowflake to process and transform the trajectory data. For example, here’s a TemporalSplitter UDF:

        CREATE OR REPLACE FUNCTION CARTO_DATABASE.CARTO.TemporalSplitter(geom ARRAY, t ARRAY, mode STRING)
        RETURNS ARRAY
        LANGUAGE PYTHON
        RUNTIME_VERSION = 3.11
        PACKAGES = ('numpy','pandas', 'geopandas','movingpandas', 'shapely')
        HANDLER = 'udf'
        AS $$
        import numpy as np
        import pandas as pd
        import geopandas as gpd
        import movingpandas as mpd
        import shapely
        from shapely.geometry import shape, mapping, Point, Polygon
        from shapely.validation import make_valid
        from datetime import datetime, timedelta

        def udf(geom, t, mode):
            valid_df = pd.DataFrame(geom, columns=['geometry'])
            valid_df['t'] = pd.to_datetime(t)
            valid_df['geometry'] = valid_df['geometry'].apply(lambda x:shapely.wkt.loads(x))
            gdf = gpd.GeoDataFrame(valid_df, geometry='geometry', crs='epsg:4326')
            gdf = gdf.set_index('t')
            traj = mpd.Trajectory(gdf …
  • 23 November 2024: GeoParquet in QGIS – smaller & faster files for the win!
    tl;dr: Tired of working with large CSV files? Give GeoParquet a try! “Parquet is a powerful column-oriented data format, built from the ground up as a modern alternative to CSV files.” https://geoparquet.org/ (Geo)Parquet is both smaller and faster than CSV. Additionally, (Geo)Parquet columns are typed: text, numeric values, dates, and geometries retain their data types. GeoParquet also stores CRS information, and support in GIS solutions is growing. I’ll be giving a quick overview using AIS data in GeoPandas 1.0.1 (with pyarrow) and QGIS 3.38 (with GDAL 3.9.2). File size: The example AIS dataset for this demo contains ~10 million rows with 22 columns. I’ve converted the original zipped CSV into GeoPackage and GeoParquet using GeoPandas to illustrate the huge difference in file size: ~470 MB for GeoParquet and zipped CSV, 1.6 GB for CSV, and a whopping 2.6 GB for GeoPackage: Reading performance: Pandas and GeoPandas both support selective reading of files, i.e. we can specify the specific columns to be loaded. This does speed up reading, even from CSV files:

        Format       Whole file   Selected columns
        CSV          27.9 s       13.1 s
        GeoPackage   2 min 12 s   20.2 s
        GeoParquet   7.2 s        4.1 s

    Indeed, reading the whole GeoPackage is get …
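Selective reading, as mentioned above, can be sketched with pandas’ `usecols` parameter. The tiny in-memory CSV and its column names (`mmsi`, `timestamp`, `lon`, `lat`, `sog`) are made up for illustration and are not the AIS dataset from the post.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large AIS CSV file.
csv_text = (
    "mmsi,timestamp,lon,lat,sog\n"
    "1,2024-01-01 00:00:00,10.0,55.0,3.2\n"
    "2,2024-01-01 00:00:10,10.1,55.1,0.0\n"
)

# Only the listed columns are parsed and kept in memory.
selected = pd.read_csv(io.StringIO(csv_text), usecols=["timestamp", "lon", "lat"])
print(list(selected.columns))

# For GeoParquet, GeoPandas offers the analogous `columns` argument:
#   geopandas.read_parquet(path, columns=[...])
```

With a columnar format such as Parquet, unselected columns do not need to be read from disk at all, whereas a CSV reader still has to scan every row in full.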