# 1 Introduction

This report outlines progress on the SaferActive project, which aims to provide a strong evidence base for interventions that simultaneously boost walking and cycling and greatly improve road safety. An overview of the project’s context and aims can be found at the open access project page github.com/saferactive/saferactive. This is the 2nd quarterly report, building on the 1st SaferActive report, published July 2020. Overall we have been working on the following areas:

• Development of the trafficalmr R package, adding functionality for cleaning and visualising road crash data (see saferactive.github.io/trafficalmr for details)
• Collecting and analysing data on cycling levels from the DfT’s traffic count data, recently updated to include 2019 data, and undertaking spatio-temporal analysis of the resulting estimates of cycling. This builds on our work in Report 1 (Section 8), which provided geographic estimates of cycling uptake.
• Developing new methods for the geographic analysis and visualisation of road safety outcomes
• Exploring additional datasets to use as ‘explanatory variables’ to assess the impacts of different interventions

# 2 Software development

One of the objectives of the project was to develop software to enable access to data on traffic calming interventions. We have progressed with the development of the trafficalmr package for reproducible road safety analysis, supporting this project and other projects on road safety. We would like feedback from stakeholders: Is it easy to use? What additional functionality would you like to see? The list of functions provided by the package can be found at https://saferactive.github.io/trafficalmr/reference/index.html

# 3 Analysis of cycle counter data

We analysed the DfT’s manual traffic count data to explore its potential to provide spatio-temporal estimates of cycling levels, measured in km cycled per year (or billion km, bkm, for compatibility with international road safety research). A key component of this work was the expansion of the geographical scope from our London case study to cover the whole of England, Wales and Scotland.

## 3.1 Year-to-year changes in the location and number of count points

Over the period 2000-2019, there is an uneven number of locations where annual average daily flow (AADF) estimates are available, as shown in Figure 3.1.

Figure 3.1 shows that the number of count points has fluctuated from year to year, with over 14,000 count points in 2008 but fewer than 7,000 in 2014. In addition, the locations of cycle count points have changed considerably over the last two decades. More than 2,000 locations were sampled just once over the years 2000-2019, and more than 4,000 locations were sampled in just two of these years. Relatively few locations were sampled in more than 11 years. This flux in the number and location of count points makes it more difficult to assess changes in cycling uptake, because we do not have a consistent dataset from one year to the next. However, the selection of sampling locations was more consistent over the period 2010-2019, so we have focused our analysis on this ten-year period.
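The kind of summary described above (how many count points were surveyed each year, and in how many years each location was sampled) can be sketched in base R. This is a minimal illustration on a small hypothetical table; the column names `count_point` and `year` and all values are assumptions, not the DfT data or its schema.

```r
# Hypothetical survey records: one row per count point per year surveyed
samples <- data.frame(
  count_point = c(1, 1, 1, 2, 3, 3, 4),
  year        = c(2010, 2011, 2012, 2010, 2011, 2015, 2019)
)

# Number of count points surveyed in each year
points_per_year <- table(samples$year)

# Number of distinct years in which each count point was surveyed
years_per_point <- tapply(
  samples$year, samples$count_point,
  function(y) length(unique(y))
)

# Frequency table: how many locations were sampled in 1, 2, 3, ... years
freq <- table(years_per_point)
print(points_per_year)
print(freq)
```

Applied to the full dataset, the second table would give the counts of locations sampled in exactly one year, two years, and so on.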

## 3.3 Mean normalised change in AADF at Local Authority level

As we have seen, the count point locations are not stable, and not all locations are surveyed every year. This presents difficulties when assessing change through time, since an apparent increase in cycling in a given year may simply be due to the selection of count points on busy roads that year.

To avoid problems associated with this, we assessed relative change at each count point, by calculating, for each year the point was surveyed, the relative divergence from the mean AADF across all years at that count point. Figure 3.6 shows these relative changes in weighted mean AADF, in the same way as absolute changes in AADF are shown in Figure 3.3.

We then calculated the mean of these values for each Local Authority and year, and normalised these around a baseline year of 2011.
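The three steps described above can be sketched in base R as follows. This is an illustrative sketch on hypothetical data, not the project's own code: the column names are assumptions, "relative divergence" is interpreted here as the ratio of each count to that count point's all-year mean, the Local Authority mean is unweighted, and normalisation is done by dividing by the 2011 value.

```r
# Hypothetical AADF records: count point, Local Authority, year, AADF
counts <- data.frame(
  count_point = c(1, 1, 1, 2, 2, 3, 3, 3),
  la   = c("A", "A", "A", "A", "A", "B", "B", "B"),
  year = c(2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012),
  aadf = c(80, 100, 120, 40, 60, 200, 200, 260)
)

# Step 1: each count relative to the mean AADF across all years at that point
counts$point_mean <- ave(counts$aadf, counts$count_point, FUN = mean)
counts$rel_change <- counts$aadf / counts$point_mean

# Step 2: mean relative change for each Local Authority and year
la_year <- aggregate(rel_change ~ la + year, data = counts, FUN = mean)

# Step 3: normalise so that the 2011 baseline equals 1 in each Local Authority
baseline <- la_year[la_year$year == 2011, c("la", "rel_change")]
names(baseline)[2] <- "baseline"
la_year <- merge(la_year, baseline, by = "la")
la_year$index_2011 <- la_year$rel_change / la_year$baseline
la_year <- la_year[order(la_year$la, la_year$year), ]
print(la_year)
```

Because each count is first expressed relative to its own location's mean, a year in which unusually busy roads happen to be surveyed does not by itself inflate the Local Authority value.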

The results presented in Figure 3.7 can also be represented as an animated map, as shown in Figure 3.8. This provides an approximation of the relative change in cycling levels compared with the 2011 baseline, for which we have good data from the 2011 Census. Figure 3.8 shows that, according to the DfT’s seasonally adjusted annual average daily flow (AADF) dataset, cycling has tended to grow in densely populated, cosmopolitan areas such as London (especially central north London boroughs), Bristol, Manchester and Leeds.

## 3.4 Generalized Additive Models of AADF

Generalized Additive Models were chosen for their flexibility to accommodate non-linear trends in spatial and temporal variables, and the ability to include random effects within the model. They also allow the production of easily interpretable partial effects for each model parameter. We used the function bam() in the mgcv R package (Wood et al. 2015), which is specifically designed for use with large datasets as it is less computationally intensive than other methods.

The models currently use the absolute count data as the response variable, and follow a negative binomial error distribution with a log link function.

### 3.4.1 London model

Initially, GAM models were developed for London only. Using the raw hourly count data from all cycle count points in London over the years 2010-2019, we produced a GAM model with smooth terms for year, day of the year, hour, space, road category, and an interaction term for year and space.

Preliminary results, based on a Generalized Additive Model (GAM), are shown in Figure 3.9.

### 3.4.2 National model

Building on results for London, we scaled-up the approach to provide nationwide spatio-temporal estimates of cycling. For the national GAM model we simplified the inputs, basing it on the AADF data rather than the raw hourly count data.

The model has smooth terms for the variables year, space and road category, each of which was found to be significant at p<0.05. This model uses all counts during the years 2010-2019, not just the counts from locations that were sampled every year. This allows us to make full use of the available data, with all road types represented.

The smooth term for year uses a cubic regression spline with 5 knots. The cubic spline utilises a low number of knots spread at even intervals through the range of parameter values. This helps to prevent overfitting.

British National Grid eastings and northings are represented in a Duchon spline. This is a generalization of a thin plate spline (Duchon 1977; Wood 2003), which allows for two-dimensional smoothing. We specified 100 knots for this spline, enabling the representation of complex spatial patterns in cycling levels across Great Britain.

Road category is included as a random effect in the model. Its inclusion is vital given that cycling levels vary greatly according to the type of road surveyed, and the number of counts from roads of each category varies from year to year.
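A model with the structure described above could be specified in mgcv roughly as follows. This is a sketch under stated assumptions rather than the project's actual model code: the data are simulated, all column names are hypothetical, and coordinates are expressed in kilometres purely for numerical convenience. The calls used (`bam()`, `s()` with `bs = "cr"`, `"ds"` and `"re"`, and the `nb()` family) are standard mgcv functions.

```r
library(mgcv)

set.seed(1)
n <- 1500

# Simulated stand-in for the AADF data; every column name is an assumption
aadf <- data.frame(
  year          = sample(2010:2019, n, replace = TRUE),
  easting_km    = runif(n, 100, 650),
  northing_km   = runif(n, 10, 1000),
  road_category = factor(sample(c("A", "B", "C", "U"), n, replace = TRUE))
)
mu <- exp(3 + 0.03 * (aadf$year - 2010) + 0.001 * aadf$easting_km)
aadf$count <- rnbinom(n, mu = mu, size = 2)

fit <- bam(
  count ~
    s(year, bs = "cr", k = 5) +                       # cubic regression spline, 5 knots
    s(easting_km, northing_km, bs = "ds", k = 100) +  # Duchon spline, basis dimension 100
    s(road_category, bs = "re"),                      # road category as a random effect
  family = nb(link = "log"),                          # negative binomial, log link
  data = aadf
)
```

Calling `summary(fit)` reports the approximate significance of each smooth term, and `plot(fit, select = 1)` draws the partial effect of year.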

The partial effects of year are shown in Figure 3.10.

The partial effects of space are shown in Figure 3.11.

## 3.5 Reliability of count data

The analysis above shows that, although the count data is ‘noisy’, it has the potential to provide an indicator of relative change in cycling levels at regional to local authority levels. A next step, outlined in the final section of this report, is to assess how large the confidence intervals associated with this dataset are, by comparing the relatively sparse DfT count data with larger external datasets, including TfL’s open cycle counter network data.

# 4 Geographic data analysis

Building on methods we initially developed for the CyIPT project, we developed new techniques for allocating crashes to features on the road network, such as junctions and road sections. Preliminary results comparing CyIPT results and results from this project are shown in Figure 4.1.

The objective of this part of the project is to measure the level of collision risk at a very high level of spatial detail. Ideally, we seek to use historical data to identify specific roads and junctions that are especially dangerous and thus need to be fixed.

Junctions are similarly simplified in a three-stage process:

1. All junction points are extracted from the input data (OSM in the case study example).
2. Junction points are clustered together to reflect how we naturally think about junctions. For example, a roundabout is a single large junction, not 4-10 small junctions arranged in a circle. Here a balance has to be struck between making the clusters large enough that big complex junctions are grouped together, while not being so large that densely packed urban streets become a single super-junction. We found a 15 metre buffer diameter to work well for coalescing junction clusters when focussing on relatively small case study areas (e.g. a single small local authority) and a 30 m diameter for a larger study region (e.g. all of London), as shown in Figure 4.4.
3. Associate the point crash data with the appropriate road or junction. Here we had to balance accuracy against performance. Measuring the distance between every crash and every road would take an inordinate amount of time. We developed a computationally efficient solution, documented in the script osm_cleaning.R.

A reproducible example showing how the junction identification method works is shown in the code chunk below, which starts by installing the trafficalmr package and uses a number of other functions developed for this project:

remotes::install_github("saferactive/trafficalmr")
library(trafficalmr)
nrow(tc_data_osm)
#> [1] 499
#> [1] 125
plot(road_data$geometry, col = "grey")
plot(road_data_clusters, add = TRUE)
#> Warning in plot.sf(road_data_clusters, add = TRUE): ignoring all but the first
#> attribute
plot(road_data_junctions, add = TRUE)
plot(road_data_major$geometry, col = "black", add = TRUE)

Once matching is completed, it is possible to map the number of crashes or casualties for each road and junction over the last 10 years.

The definition of a ‘junction’ depends on a number of factors, including the way that junctions are represented in the input data and, when aggregating nearby junction entities e.g. on a roundabout, the threshold distance beyond which two junctions are treated separately. To do this in a reproducible way, the osm_get_junctions() function was developed, details of which can be found on the trafficalmr website at saferactive.github.io/trafficalmr. The cluster_junctions() function was developed to cluster junctions. Different geographic approaches for defining junctions using OSM data as an input are shown in Figure 4.4.
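The effect of the threshold distance can be illustrated with a small base R sketch. It is not the implementation of cluster_junctions() in trafficalmr; it swaps in single-linkage hierarchical clustering, which merges points in the same way as dissolving overlapping buffers would: two buffers of a given diameter overlap when their centres are no further apart than that diameter. The coordinates are hypothetical and assumed to be in a projected system measured in metres.

```r
# Hypothetical junction points in projected coordinates (metres)
junctions <- data.frame(
  x = c(0, 6, 12, 100, 104, 300),
  y = c(0, 0, 0, 50, 50, 80)
)

# Chains of points whose neighbours are within `diameter` of each other
# end up in the same cluster (single linkage, tree cut at `diameter`)
cluster_points <- function(xy, diameter = 15) {
  hc <- hclust(dist(xy), method = "single")
  unname(cutree(hc, h = diameter))
}

junctions$cluster <- cluster_points(junctions[, c("x", "y")], diameter = 15)

# One representative point (the centroid) per clustered junction
centroids <- aggregate(cbind(x, y) ~ cluster, data = junctions, FUN = mean)
print(centroids)
```

With a 15 m threshold the six points collapse into three junctions. Raising the threshold merges more points, which illustrates the trade-off between grouping large complex junctions and creating a single super-junction in dense street networks.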

# 5 Scenario development

We have considered a range of scenarios that we plan to implement in the coming months. These are at a conceptual level, but each could be implemented based on the data we have. Scenarios could be implemented as ‘global’ scenarios, as in the Propensity to Cycle Tool (Lovelace et al. 2017), as area-wide interventions or (more challenging) as specific interventions on specific roads (e.g. Chapeltown Road in Leeds becomes one-way for motor traffic, freeing up space for a two-way cycleway and increased space for walking).

• Traffic calming: the creation of traffic calming interventions of the type seen in the CID
• 20s Plenty: The roll-out of 20 mph zones in areas and local authorities
• Cycleways: The construction of protected cycleways in an area
• Low traffic: Reduction in traffic, e.g. due to congestion charge
• One-way streets: the conversion of roads from two-way to one-way streets, e.g. as done in Torrington Place
• LTNs: If every residential zone was a Low Traffic Neighbourhood

# 6 Next steps

We have a series of improvements and further investigations, in progress or planned, that will be needed to reach the next stage of the project.

• Adapt the GAM models to use relative change in cycle counts, rather than the raw counts. This should improve the accuracy of the partial effects curves, enabling improved estimates of changes in cycle uptake through time and space.
• Use our annual Local Authority level estimates of exposure together with similar estimates of collision rates, to assess temporal and spatial changes in collision risk across the country. This is primarily analysed as KSI/bkm.
• Validate the cycle count-based exposure estimates, by comparing DfT counters with TfL counters for the years 2015-2019, for which TfL data are available.
• Develop case studies of the impact of various traffic calming or road safety measures, and use these to illuminate the scenarios described in the previous section.
• Launch our scalable web application.
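The KSI/bkm risk measure mentioned in the list above is a simple rate: the number of killed or seriously injured (KSI) casualties divided by the distance cycled in billion km. A minimal sketch of the arithmetic, using entirely hypothetical figures and column names:

```r
# Hypothetical Local Authority estimates for a single year
la_risk <- data.frame(
  la        = c("A", "B"),
  year      = c(2019, 2019),
  ksi       = c(30, 12),      # killed or seriously injured cyclists
  km_cycled = c(60e6, 15e6)   # estimated km cycled per year
)

# Convert exposure to billion km (bkm), then compute the rate
la_risk$bkm         <- la_risk$km_cycled / 1e9
la_risk$ksi_per_bkm <- la_risk$ksi / la_risk$bkm
print(la_risk)
```

In this made-up example, area B has fewer casualties than area A but a higher rate per bkm, which is why exposure estimates are needed before comparing areas or years.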

# 7 References

Duchon, Jean. ‘Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev Spaces’. In Constructive Theory of Functions of Several Variables, edited by Walter Schempp and Karl Zeller, 85–100. Lecture Notes in Mathematics. Berlin, Heidelberg: Springer, 1977. https://doi.org/10.1007/BFb0086566.

Wood, Simon N. ‘Thin Plate Regression Splines’. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65, no. 1 (2003): 95–114. https://doi.org/10.1111/1467-9868.00374.

Wood, Simon N., Yannig Goude, and Simon Shaw. ‘Generalized Additive Models for Large Data Sets’. Journal of the Royal Statistical Society: Series C (Applied Statistics) 64, no. 1 (2015): 139–55. https://doi.org/10.1111/rssc.12068.

Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2017. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use 10 (1). https://doi.org/10.5198/jtlu.2016.862.