Analyzing real-world geospatial networks in R

Author

Lucas van der Meer

Geospatial networks are graphs embedded in geographical space. Space is relevant in their analysis, since the graph topology alone does not contain all information. To facilitate such an integrated analysis, we created the R package sfnetworks, which combines the power of the sf package for spatial data science and the tidygraph package for standard graph analysis. At the UseR! Conference in Salzburg, we presented a simple example of how sfnetworks in combination with other R packages can be used to analyze a real-world spatial network. This notebook contains the code belonging to that talk.

Analysis

The analysis aims to find out how many restaurants we can reach from the conference venue, if we would travel by bicycle. It consists of five steps. First, we obtain the centerlines of streets and the locations of restaurants in Salzburg from OpenStreetMap, using the osmdata package. Then, we estimate a cycling travel time for each street segment, taking into account the gradient which we compute with the slopes package. We also assign each street segment a cycling suitability level, describing how well it is designed for cycling. The street centerlines are then converted into a routable network structure with the sfnetworks package. Finally, we compute a simple cumulative opportunities metric to quantify the accessibility of restaurants from the conference venue.

Prepare

We will use the following packages in our analysis. To be clear which function comes from which package, we will explicitly use namespaces in all code chunks.

library(sfnetworks)
library(sf)
library(tidygraph)
library(tidyverse)
library(osmdata)
library(slopes)
library(terra)
library(units)

1. Get OpenStreetMap data

In this step, we use the osmdata package to retrieve data from OpenStreetMap. This package basically provides an R interface to the Overpass API of OpenStreetMap. First, we set the conference venue and draw a circular buffer around it with a radius of 3km, which will serve as our area of interest.

Tip

If your queries are larger or more frequent, you should probably not overload the Overpass API. Instead, you can download an OpenStreetMap extract (e.g., from Geofabrik) and read the required data into R using the osmextract package.

venue = sf::st_sfc(sf::st_point(c(13.04205, 47.81415)), crs = 4326)
area = sf::st_buffer(venue, units::set_units(3, "km"))

Now we query all streets within our area of interest from OpenStreetMap. In OpenStreetMap, streets are those features that have a value for the highway key. In R, they will be represented by an sf data frame with linestring geometries. Since some streets may be drawn as a closed polygon (like a square or a roundabout) we convert polygon borders into linestrings, before extracting only the linestrings from the API response. Finally, we cut streets at the edge of our area of interest and make sure all of them are single linestrings, meaning that the different parts of multi-linestrings will be seen as different street segments (note that to do this properly we first need to cast everything to multi-linestrings).

streets = osmdata::opq(sf::st_bbox(area)) |>
  osmdata::add_osm_feature(key = "highway") |>
  osmdata::osmdata_sf() |>
  osmdata::osm_poly2line() |>
  purrr::pluck("osm_lines") |>
  sf::st_intersection(area) |>
  sf::st_cast("MULTILINESTRING") |>
  sf::st_cast("LINESTRING")

In a similar way, we also query all restaurant locations within our area of interest. They will serve later on as the destinations in our accessibility analysis. In OpenStreetMap, restaurants can be identified as those features having the value restaurant for the amenity key. They are represented as points instead of linestrings.

restaurants = osmdata::opq(sf::st_bbox(area)) |>
  osmdata::add_osm_feature(key = "amenity", value = "restaurant") |>
  osmdata::osmdata_sf() |>
  purrr::pluck("osm_points") |>
  sf::st_intersection(area)
par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(streets))
plot(sf::st_geometry(restaurants), col = "orange", pch = 20, cex = 0.5, add = TRUE)

2. Estimate cycling travel times

In this step, we estimate cycling travel times for each street segment. These are computed (obviously) as length / speed. First, we compute the length in kilometers of the street segments.

streets$length = sf::st_length(streets) |>
  units::set_units("km")
plot(dplyr::select(streets, length))

For the cycling speed, we define a default speed which we adjust based on the gradient of the street segment. To do so, we first need to estimate the gradient of each street segment. This requires a Digital Elevation Model (DEM), which basically is a spatial grid covering an area and storing an elevation value for each cell. We use a DEM for Austria with a 10m resolution, available as open data from the Austrian government. We included a cropped version of it in the data folder of this repository, which we load with the terra package. To estimate gradients of linestrings with this DEM, we use the slopes package.

file = here::here("data/dem.tif")

dem = terra::rast(file) |>
  terra::project("epsg:4326")

gradients = slopes::slope_raster(streets, dem, directed = TRUE) * 100

Now we can set a speed of each street segment, based on its gradient. We use a default speed of 20 km/h. An uphill gradient decreases this speed, and a downhill gradient increases this speed, all until a given minimum and maximum speed of respectively 5 km/h and 30 km/h. For reasons of simplicity, we will just model a linear relation between gradient and speed, adapted from Parkin & Rotheram (2010).

estimate_speed = function(gradient, default = 20, max = 30, min = 5) {
  ifelse(
    gradient < 0,
    min(default + 0.8 * abs(gradient), max), # downhill
    max(default - 1.4 * gradient, min) # uphill
  )
}

streets$speed = sapply(gradients, estimate_speed) |>
  units::set_units("km/h")
plot(dplyr::select(streets, speed))

Now we have the lengths and speeds computed, we can simply compute the estimated travel time of cyclists for each street segment, in minutes.

streets$time = units::set_units(streets$length / streets$speed, "min")
plot(dplyr::select(streets, time))

3. Estimate cycling suitability

In this step, we assign each street segment a cycling suitability level. This should give an indication of how suitable the infrastructure is for cycling, in terms of safety and comfort. A common approach is to use four different levels of traffic stress, first introduced by Mekuria, Furth & Nixon (2012) and later applied in practice with differing implementations. Another approach is the NetAScore, and open-source project developed at the University of Salzburg that allows to compute a continuous bicycle suitability index based on OpenStreetMap data.

In this notebook, however, we will just use a simplified workflow serving as an example. It defines three different levels of cycling suitability based on the available cycling infrastructure and the street classification. Separated bike lanes form the highest level. Painted bike lanes, shared lanes, and residential streets form the intermediate level. Everything else becomes the lowest level.

estimate_suitability = function(hw, cw, cwl, cwr, cwb, bc, ...) {
  lanes = c("lane", "shared_lane", "share_busway")
  if (hw %in% c("cycleway")) {
    1
  } else if (hw %in% c("residential", "living_street")) {
    2
  } else if (cw %in% lanes | cwl %in% lanes | cwr %in% lanes | cwb %in% lanes) {
    2
  } else if (hw %in% c("footway") & bc %in% c("yes", "designated")) {
    2
  } else {
    3
  }
}

cols = c(hw = "highway", cw = "cycleway", cwl = "cycleway:left",
         cwr = "cycleway:right", cwb = "cycleway:both", bc = "bicycle")

streets$level = streets |>
  sf::st_drop_geometry() |>
  dplyr::select(any_of(cols)) |>
  setNames(c("hw", "cw", "cwl", "cwr", "cwb", "bc")) |>
  purrr::pmap_dbl(estimate_suitability) |>
  factor(levels = c(1, 2, 3), labels = c("good", "medium", "low"))
plot(dplyr::select(streets, level))

4. Construct a routable street network

In this step, we convert the street centerlines into a routable network structure. This basically means that we make it explicit which street segments share an endpoint, such that we know how streets are connected, and hence, how someone can travel from A to B. This can be done very easily with the sfnetworks package. Note that for now we simply assume each street can be traveled in both directions, and the network is therefore undirected.

network = sfnetworks::as_sfnetwork(streets, directed = FALSE)
par(mar = c(0, 0, 0, 0))
plot(network, cex = 0.5)

However, streets in OpenStreetMap are not always mapped with a network structure in mind. It happens often that connected street segments cross each other without sharing an endpoint. This results in many disconnected sets of streets in the network, as shown below. Even our small network has over 6000 disconnected components, with the largest of them containing only around 20% of all nodes.

tidygraph::with_graph(network, tidygraph::graph_component_count())
[1] 6045
node_count_all = network |>
  tidygraph::with_graph(tidygraph::graph_order())

node_count_largest = network |>
  tidygraph::convert(tidygraph::to_largest_component) |>
  tidygraph::with_graph(tidygraph::graph_order())

round(node_count_largest / node_count_all * 100)
[1] 19

Luckily, the sfnetworks package contains functions to pre-process and clean networks. One of them is to_spatial_subdivision(), which connects streets when they share internal vertices rather than only endpoints. This results in a network that still has some disconnected components (which is to be expected since we cut the network at the buffer boundaries), but with the largest of them containing almost all nodes.

Tip

There are many more network pre-processing and cleaning functions in sfnetworks. See the sfnetworks documentation for details.

network = network |>
  tidygraph::convert(sfnetworks::to_spatial_subdivision)
Warning: to_spatial_subdivision assumes attributes are constant over geometries
par(mar = c(0, 0, 0, 0))
plot(network, cex = 0.5)

tidygraph::with_graph(network, tidygraph::graph_component_count())
[1] 313
node_count_all = network |>
  tidygraph::with_graph(tidygraph::graph_order())

node_count_largest = network |>
  tidygraph::convert(tidygraph::to_largest_component) |>
  tidygraph::with_graph(tidygraph::graph_order())

round(node_count_largest / node_count_all * 100)
[1] 96

Now we can select the largest component without having to worry we loose a lot of data.

network = network |>
  tidygraph::convert(tidygraph::to_largest_component)
par(mar = c(0, 0, 0, 0))
plot(network, cex = 0.5)

5. Compute accessibility to restaurants

In this step, we compute a simple cumulative opportunities accessibility metric to find out how many restaurants can be reached from the conference venue within a given time threshold. For that, we use sfnetworks to compute travel times between the venue and all restaurants, and select those restaurants that are accessible within the time threshold. The threshold we use is 15 minutes.

times = network |>
  sfnetworks::st_network_cost(venue, restaurants, weights = "time")

access = restaurants[which(times < units::set_units(15, "min")), ]

nrow(access)
[1] 240
par(mar = c(0, 0, 0, 0))
plot(network, cex = 0.5)
plot(sf::st_geometry(access), col = "orange", pch = 20, add = TRUE)

That is a lot of restaurants! But we used the whole network in the analysis, while some street segments are not suitable for cycling according to our index. If we only consider the first and second suitability level, results already look much different.

suitable_network = network |>
  tidygraph::activate("edges") |>
  dplyr::filter(level %in% c("medium", "good"))

times = suitable_network |>
  sfnetworks::st_network_cost(venue, restaurants, weights = "time")

access = restaurants[which(times < units::set_units(15, "min")), ]

nrow(access)
[1] 5
trimmed_suitable_network = suitable_network |>
  tidygraph::activate("nodes") |>
  dplyr::filter(!tidygraph::node_is_isolated())

par(mar = c(0, 0, 0, 0))
plot(network, col = "grey", cex = 0.4)
plot(trimmed_suitable_network, cex = 0.5, add = TRUE)
plot(sf::st_geometry(access), col = "orange", pch = 20, add = TRUE)

And if we find only separated bike lanes to be acceptable, we will unfortunately remain hungry.

suitable_network = network |>
  tidygraph::activate("edges") |>
  dplyr::filter(level %in% c("good"))

times = suitable_network |>
  sfnetworks::st_network_cost(venue, restaurants, weights = "time")

access = restaurants[which(times < units::set_units(15, "min")), ]

nrow(access)
[1] 0
trimmed_suitable_network = suitable_network |>
  tidygraph::activate("nodes") |>
  dplyr::filter(!tidygraph::node_is_isolated())

par(mar = c(0, 0, 0, 0))
plot(network, col = "grey", cex = 0.4)
plot(trimmed_suitable_network, cex = 0.5, add = TRUE)

Remarks

1. We simplified a lot

In this notebook we often used simple examples. In reality, accessibility of urban street networks is much more complex, with many factors at play.

2. We only touched upon the tip of the R iceberg

We showed a few R packages that are useful in the analysis of urban street networks. However, there are much more out there. The transport planning community in R is full of nice and talented people that share great work. Some examples:

  • dodgr: A package for fast route calculations and flow aggregations on street networks based on C++ code.
  • cpprouting: A package for fast route calculations and traffic assignment on street networks based on C++ code.
  • osrm: An R interface to the Open Source Routing Machine, a routing software for OpenStreetMap written in C++.
  • openrouteservice: An R interface to Openrouteservice, an open-source routing software written in Java.
  • gtfsrouter: A package for fast route calculations on public transport time tables based on C++ code.
  • r5r: An R interface to the R5 routing software for multi-modal routing.
  • opentripplanner: An R interface to the OpenTripPlanner routing software for multi-modal routing.
  • m4ra: A package for multi-modal routing based on C++ code (still in development).
  • gtfsio: A package to read, represent and write General Transit Feed Specification data on public transport time tables.
  • gtfstools: A package to edit and analyze General Transit Feed Specification data on public transport time tables.
  • stplanr: A multi-purpose package for sustainable transport planning with functions for origin-destination matrix analysis, routing, and more.
  • accessibility: A package that implements different ways to compute transport accessibility metrics.

Read also the Transportation chapter of the open-source book Geocomputation with R.

3. There is more to come

Open-source projects like sfnetworks are always in development. We are currently working on a new version of the package, hopefully to be released this fall. We are also working on a new package bikesuiter in which we are implementing different ways to analyze the bicycle suitability of street networks. Stay tuned!