vignettes/report2.Rmd
report2.Rmd
This report outlines progress on the SaferActive project, which aims to provide a strong evidence base for interventions that simultaneously boost walking and cycling and greatly improve road safety. An overview of the project’s context and aims can be found at the open access project page github.com/saferactive/saferactive. This is the 2nd quarterly report, building on the 1st SaferActive report, published July 2020. Overall we have been working on the following areas:
One of the objectives of the project was to develop software to enable access to data on traffic calming interventions.
We have progressed with the development of the trafficalmr
package for reproducible road safety analysis, supporting the project and other projects on road safety.
We would like feedback from stakeholders: is it easy to use? what additional functionality would you like to see? The list of functions provided by the package can be found here: https://saferactive.github.io/trafficalmr/reference/index.html
We analysed the DfT’s manual traffic count data to explore its potential to provide estimates of the spatio-teamporal estimates of cycling levels — measured in km (or billion km, bkm, for compatibility with international road safety research) cycled per year — over time. A key component of this work was the expansion of the geographical scope from our London case study, to cover the whole of England, Wales and Scotland.
Over the period 2000-2019, there is an uneven number of locations where annual average daily flow (AADF) estimates are available, as shown in Figure 3.1.
Figure 3.1 reveals the number of count points has fluctuated from year to year, with over 14,000 count points in 2008, but less than 7000 counts in 2014. In addition, there has been a high degree of mobility of cycle count points over the last two decades. There are >2000 locations that have been sampled just once over the years 2000-2019, and >4000 locations that have been sampled in just two of these years. Relatively few locations have been sampled in more than 11 years. This flux in the number and location of count points makes it more difficult to assess changes in cycling uptake, because we do not have a consistent dataset from one year to the next. However, sampling location selection has been more consistent over the period 2010-2019 so we have focused our analysis on this ten-year period.
Over the years 2010 to 2019, there are 3,049 count locations which had continuous readings for each of these 10 years. This represents 35% of the total counts in these years. Taking solely these 3,049 count locations, we are able to generate estimates of cycling uptake at the local authority level which are fully independent of year-to-year changes in the location and type of roads chosen for cycle counts. These uptake estimates are presented, relative to 2011 cycling levels, in Figure 3.2, showing an upward trend in cycling uptake in the first half of the decade.
The mean AADF per year, for the years 2010-2019, is shown in Figure 3.3. We include three different estimates of mean AADF: across all 13,303 locations with at least two years of data, across the 5,895 locations with at least 5 years of sampling, and across the 3,049 count locations with complete annual data.
We can see that not only is the temporal trend different if we include a greater number of cycle count points, but the mean AADF is also higher. One reason for this is that counts are made on roads of a wide range of types. Of the count points with continuous readings in the years 2010-2019, 75% are on ‘C’ and unclassified roads, 24% on ‘B’ roads, and less than 1% on principal or trunk ‘A’ roads. By comparison, for all count points with at least two years of data during this period, 46% of counts are on ‘C’ and unclassified roads, 16% on ‘B’ roads, 29% on principal ‘A’ roads and 9% on trunk ‘A’ roads.
This is relevant because mean AADF varies greatly by road type. For count points with at least two years of data, mean AADF ranges from 217 on Principal ‘A’ roads to 71 on ‘B’ roads, 38 on ‘C’ and unclassified roads, and just 12 on Trunk ‘A’ roads.
For count points with complete annual data (Figure 3.4), the mean increase in cycling uptake during the years 2010-2014 closely corresponds with counts from minor roads (‘B’, ‘C’ and unclassified roads). However, mean AADF is highest on Principal ‘A’ roads at 214 cycles per day, and lowest on Trunk ‘A’ roads at 2.4 cycles per day. On Trunk ‘A’ roads, zero cyclists were recorded at 34% of counts; these may represent locations on major national routes that are unsuitable for most cyclists.
The number of counts held on roads of different types also varies from year to year, although there is greater consistently in the decade 2010-2019 than in the previous decade. For count points with at least two years of data, Figure 3.5 shows the number of annual counts on roads of each type.
Generalized Additive Models were chosen for their flexibility to accommodate non-linear trends in spatial and temporal variables, and the ability to include random effects within the model.
They also allow the production of easily interpretable partial effects for each model parameter.
We used the function bam()
in the mgcv
R package (Wood et al. 2014) which is specifically designed for use with large datasets as it is less computationally intensive than other methods.
The models currently use the absolute count data as the response variable, and follow a negative binomial error distribution with a log link function.
Initially, GAM models were developed for London only. Using the raw hourly count data all cycle count points in London over the years 2010-2019, we produced a GAM model with smooth terms for year, day of the year, hour, space, road category, and an interaction term for year and space.
Preliminary results, based on a General Additive Model (GAM) are shown in Figure 3.9.
Building on results for London, we scaled-up the approach to provide nationwide spatio-temporal estimates of cycling. For the national GAM model we simplified the inputs, basing it on the AADF data rather than the raw hourly count data.
The model has smooth terms for the variables year, space and road category, each of which were found to be significant at p<0.05. This model uses all counts during the years 2010-2019, not just the counts from locations that were sampled every year. This allows us to make full use of the available data, with all road types represented.
The smooth term for year uses a cubic regression spline with 5 knots. The cubic spline utilises a low number of knots spread at even intervals through the range of parameter values. This helps to prevent overfitting.
British National Grid eastings and northings are represented in a duchon spline. This is a generalization of a thin plate spline (Duchon 1977; Wood 2003), which allows for two dimensional smoothing. We specified 100 knots for this spline, enabling the representation of complex spatial patterns in cycling levels across the UK.
Road category is included as a random effect in the model. Its inclusion is vital given that cycling levels vary greatly according to the type of road surveyed, and the number of counts from roads of each category varies from year to year.
The partial effects of year are shown in Figure 3.10.
The partial effects of space are shown in Figure 3.11.
The analysis above shows that the although the count data is ‘noisy’, it has potential to provide an indicator of relative change in cycling levels at regional to local authority levels. A next step, outlined in the final section of this report, is to assess how large the confidence intervals are associated with this dataset, by comparing the relatively sparse DfT count data with larger external datasets including TfL’s open cycle counter network data.
Building on methods we initially developed for the CyIPT project, we developed new techniques for allocating crashes to features on the road network, such as junctions and road sections. Preliminary results comparing CyIPT results and results from this project are shown in Figure 4.1.
The objective of this part of the project is to increase to measure to level of risk of collisions at a very high level of spatial detail. Ideally we seek to use historical data to identify specific roads and junctions which are especially dangerous and thus need to be fixed.
The first stage of the process is to separate crashes which occurred at junctions, and those that did not. Junctions are a known “hotspot” for collisions thus should be treated separately from roads in general. Fortunately the stats19 data indicated whether a collision occurred at or near a junction. The second stage of the process it to simplify the road and junction network for analysis. This is a complex multistage process, but the overall objective is to reduce the road network so simple an easily comprehensible from by excluding and aggregating unnecessary details. For example, a single road will be represented within the OpenStreetMap as many geometries. For our purposes, a single line representing the main centreline of the road is sufficient. Thus small details just a turning lanes, or alleyways are removed and road segments are combined into a single linestring. Road segments are grouped by road name and reference allowing for “human legible” road units to be constructed. In the case of very long roads (over 1km) roads are broken into 500m sections.
Junctions are similarly simplified in a three stage process:
osm_cleaning.R
.A reproducible example showing how the junction identification method works is shown in the code chunk below, which starts by installing the trafficalmr
package and uses a number of other functions developed for this project:
remotes::install_github("saferactive/trafficalmr")
library(trafficalmr)
road_data = tc_data_osm
nrow(tc_data_osm)
#> [1] 499
road_data_major = trafficalmr::osm_main_roads(road_data)
nrow(road_data_major) # find the main roads
#> [1] 125
road_data_junctions = osm_get_junctions(road_data_major)
road_data_clusters = cluster_junction(road_data_junctions, dist = 30)
plot(road_data$geometry, col = "grey")
plot(road_data_clusters, add = TRUE)
#> Warning in plot.sf(road_data_clusters, add = TRUE): ignoring all but the first
#> attribute
plot(road_data_junctions, add = TRUE)
plot(road_data_major$geometry, col = "black", add = TRUE)
Once matching is completed is it possible to map the number of crashes or casualties for each road and junction over the last 10 years.
The definition of a ‘junction’ depends on a number of factors, including the way that junctions are represented in the input data and, when aggregating nearby junction entities e.g. on a roundabout, the threshold distance beyond which two junctions are treated separately.
To do this in a reproducible way, the osm_get_junctions()
function was developed, details of which can be found on the trafficalmr website at saferactive.github.io/trafficalmr.
The cluster_junctions()
function was developed to cluster junctions.
Different geographic approaches for defining junctions using OSM data as an input are shown in Figure 4.4.
We have considered a range of scenarios that we plan to implement in the next months. These are at a conceptual level but each could be implemented based on the data we have. Scenarios could be implemented either as ‘global’ scenarios, as implemented in the Propensity to Cycle Tool (Lovelace et al. 2017), as area-wide interventions or (more challenging) as specific intervention on specific roads (e.g. Chapeltown Road in Leeds becomes one way for motor traffic freeing-up space for a 2 way cycleway and increased space for walking).
We have a series of improvements and further investigations in progress or planning, that will be needed to reach the next stage of the project.
Duchon, Jean. ‘Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev Spaces’. In Constructive Theory of Functions of Several Variables, edited by Walter Schempp and Karl Zeller, 85–100. Lecture Notes in Mathematics. Berlin, Heidelberg: Springer, 1977. https://doi.org/10.1007/BFb0086566.
Wood, Simon N. ‘Thin Plate Regression Splines’. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65, no. 1 (2003): 95–114. https://doi.org/10.1111/1467-9868.00374.
Wood, Simon N., Yannig Goude, and Simon Shaw. ‘Generalized Additive Models for Large Data Sets’. Journal of the Royal Statistical Society: Series C (Applied Statistics) 64, no. 1 (2015): 139–55. https://doi.org/10.1111/rssc.12068.
Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2017. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use 10 (1). https://doi.org/10/gfgzf7.