A Code
All the R code used in this thesis, as well as the nested SQL code, is bundled in the dockless package. Its source code, including documentation for each function, can be found on GitHub, through the following link: https://github.com/luukvdmeer/dockless. An R version of at least 2.10 is required. The package is optimized for the case study in San Francisco, but can easily be adapted to other systems.
The JUMP Bikes database is not openly accessible. Therefore, to query the data, and pre-process them on the database server, database credentials are needed. Please contact the author for more information. However, to enable reproducibility, all necessary pre-processed datasets have been included in the package. These are the following:
distancedata_centroids: an object of class dockless_dfc containing the distance data for all 249 grid cell centroids, during the training period.
distancedata_modelpoints: an object of class dockless_dfc containing the distance data for all 4 model points, during the training period.
distancedata_testpoints: an object of class dockless_dfc containing the distance data for all 500 test points, during the test period, and the two weeks before.
usagedata_train: an object of class sf with POINT geometry, containing all calculated pick-ups during the training period.
usagedata_test: an object of class sf with POINT geometry, containing all calculated pick-ups during the test period.
testpoints: an object of class sf with POINT geometry, containing all location-timestamp combinations of the 500 test points.
systemarea: an object of class sf with POLYGON geometry, containing the geographical outline of the JUMP Bikes system area in San Francisco.
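Once the package is installed, the bundled datasets can be inspected before running the analysis. The following is a minimal sketch, assuming that the datasets are shipped as regular package data that the base R function data() can find; the environment name bundled is only illustrative. If the package is not installed, the sketch prints a message and does nothing else.

```r
# Names of the bundled datasets, as listed above
dataset_names <- c(
  "distancedata_centroids", "distancedata_modelpoints",
  "distancedata_testpoints", "usagedata_train", "usagedata_test",
  "testpoints", "systemarea"
)

if (requireNamespace("dockless", quietly = TRUE)) {
  # Load the datasets into a separate environment (assumption: they are
  # available through data()), to keep the global workspace clean
  bundled <- new.env()
  data(list = dataset_names, package = "dockless", envir = bundled)
  # Report the class of every dataset that was found
  for (name in ls(bundled)) {
    cat(name, ":", paste(class(bundled[[name]]), collapse = ", "), "\n")
  }
} else {
  message("The dockless package is not installed; nothing to inspect.")
}
```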
The dockless package can be downloaded from GitHub with the following code. Please make sure that the devtools package is installed in advance.
devtools::install_github('luukvdmeer/dockless')
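For convenience, the two prerequisites mentioned above (the minimum R version and the devtools package) can be checked before installing. This is only a sketch: the helper name install_dockless is hypothetical and not part of the package, and the call itself is left commented out because it requires internet access.

```r
# Stop early if the running R version is older than the stated minimum
stopifnot(getRversion() >= "2.10.0")

# Hypothetical helper: installs devtools if it is missing, then dockless
install_dockless <- function() {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("luukvdmeer/dockless")
}

# install_dockless()  # uncomment to run; requires internet access
```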
Then, the complete analysis can be reproduced as follows. Furthermore, reproducible scripts for all tables and figures in chapter 5 can be found through the following link: https://github.com/luukvdmeer/dockless/tree/master/scripts
require(dockless)
require(sf)
require(magrittr) # provides the %>% pipe, in case it is not already attached
## ----------------------- CLUSTER LOOP --------------------------
# Create grid
gridcells = dockless::create_grid(
  area = systemarea,
  cellsize = c(500, 500)
)
# Calculate grid cell centroids
gridcentroids = gridcells %>%
  dockless::project_sf() %>%
  sf::st_centroid() %>%
  sf::st_transform(crs = 4326)
# Usage intensity per grid cell
gridcells$intensity = dockless::usage_intensity(
  usage = usagedata_train,
  grid = gridcells
)
# Add intensity information to grid cell centroids
gridcentroids$intensity = gridcells$intensity
# Cluster
clusters = dockless::spatial_cluster(
  data = distancedata_centroids,
  grid = gridcells,
  area = systemarea,
  K = c(3:10),
  omega = seq(0, 1, 0.1)
)
# Add cluster information to grid cells and grid cell centroids
gridcells$cluster = clusters$indices
gridcentroids$cluster = clusters$indices
# Create model points
modelpoints = dockless::create_modelpoints(
  centroids = gridcentroids
)
## ------------------------ MODEL LOOP ---------------------------
# Build models
models = dockless::build_models(
  data = distancedata_modelpoints,
  auto_seasonality = TRUE,
  seasons = list(NULL, 96, 672, c(96, 672))
)
## ---------------------- FORECAST LOOP --------------------------
# Forecast test points with DBAFS and NFS
forecasts_dbafs = dockless::forecast_multiple(
  data = distancedata_testpoints,
  method = 'DBAFS',
  points = testpoints,
  models = models
)
forecasts_nfs = dockless::forecast_multiple(
  data = distancedata_testpoints,
  method = 'NFS',
  points = testpoints
)
# Calculate RMSEs
errors_dbafs = dockless::evaluate(
  forecasts_dbafs,
  type = 'RMSE',
  clusters = testpoints$cluster
)
errors_nfs = dockless::evaluate(
  forecasts_nfs,
  type = 'RMSE',
  clusters = testpoints$cluster
)
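To make the last step more concrete, the sketch below shows how a root mean squared error (RMSE) can be computed in base R, both over all test points and per cluster. It illustrates the metric only and does not necessarily reflect how dockless::evaluate is implemented internally. All values, as well as the names observed, forecasted, cluster and rmse, are hypothetical.

```r
# Hypothetical observed and forecasted values for six test points
observed   <- c(100, 250, 80, 400, 150, 300)
forecasted <- c(120, 230, 95, 380, 190, 310)

# Hypothetical cluster membership of each test point
cluster <- c(1, 1, 2, 2, 3, 3)

# Root mean squared error of a set of forecasts
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# RMSE over all test points
rmse_total <- rmse(observed, forecasted)

# RMSE per cluster: split the point indices by cluster, then apply rmse
rmse_by_cluster <- sapply(
  split(seq_along(observed), cluster),
  function(i) rmse(observed[i], forecasted[i])
)

print(rmse_total)
print(rmse_by_cluster)
```

Reporting the error per cluster, next to the overall error, shows whether forecast quality differs between parts of the system area.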