Evaluation and prediction of patterns and behaviours of micromobility in the city of Lisbon



To support new planning and management approaches, together with new tools for evaluating impacts and predicting behaviours.


Research question

  • What is the estimated dock occupation ratio at each bike station, in 3-hour periods over a week?

Analytical Service Specifications


Analytical Model Code


Analytical Service Dashboard


Challenge Brainstorming Session


Business understanding

To define use case #1 Micromobility, several meetings were held with EMEL (Empresa de Mobilidade e Estacionamento de Lisboa), the municipal mobility and parking company that supports the municipality of Lisbon in parking and mobility management. EMEL is responsible for managing a docked bike sharing service called GIRA.

The main objectives for the implementation of this service are:

  1. be an alternative to public transportation;
  2. contribute to a more accessible city;
  3. contribute to a less polluted city; and
  4. contribute to a less noisy city.

Service users can pick up classic or electric bikes via an app that handles all interaction with the service provider. Users can subscribe to the service on a yearly, monthly, or daily basis, and the first 45 minutes of use are included in the subscription. In the meetings, EMEL expressed the need for a service that predicts the dock occupation ratio over short time periods.

The main goal of this service is to identify the bike stations that need rebalancing, either because a lack of bikes prevents pick-ups or because an excess of bikes may prevent drop-offs, forcing the user to drop off the bike at a nearby station. With this in mind, the following use case was defined: prediction of the dock occupation ratio in 3-hour periods, by bike station, over a week.


Data understanding

To develop the #1 Micromobility use case, GIRA dock occupation data for 2020 was collected for 84 bike stations from the Lisbon Open Data Portal (http://lisboaaberta.cm-lisboa.pt/index.php/pt/). This data provides the location of each bike station, the number of available bikes, the number of empty docks, and the occupation ratio.

Several authors have studied the factors that influence bike sharing demand, to provide information that can support the expansion of bike station networks. Examples include the number of, and distance to, public transport options (B. Wang & Kim, 2018; K. Wang & Chen, 2020); the proximity to, and percentage of, built environment and land use (Kabak, Erbaş, Çetinkaya, & Özceylan, 2018); census data (Hyland, Hong, Pinto, & Chen, 2018); and weather factors such as temperature and precipitation (Martinez, 2017; K. Wang & Chen, 2020; Zhu, Zhang, Kondor, Santi, & Ratti, 2020), with weather factors shown to be key determinants for cycling (Soheil, Paleti, Balan, & Cetin, 2020). According to Eren & Uz (2020), bike sharing demand is positively correlated with temperature between 0 ºC and 30 ºC, most evidently between 20 ºC and 30 ºC, while several studies conclude that the relation becomes negative above approximately 30 ºC. For example, Kim (2018) concluded that temperatures above 30 ºC have a negative impact on bike sharing demand, while Miranda-Moreno and Nosal (2011) placed this threshold at 28 ºC. Precipitation is considered an adverse factor, with a negative relation to bike sharing demand (El-Assi, Salah Mahmoud, & Nurul Habib, 2017; Shen, Zhang, & Zhao, 2018; Sun, Chen, & Jiao, 2018). Lu et al. (2018) stated that it takes about three hours after heavy rainfall for demand to return to its mean level.

For this reason, 2020 weather data from three weather stations in Lisbon (Tapada da Ajuda, Geofísico and Gago Coutinho), provided by the Portuguese Institute for Sea and Atmosphere (IPMA), was also collected. Table 1 presents the information collected on dock occupation and on weather in Lisbon. Figure 1 shows the spatial location of the bike stations and weather stations.

Table 1. Docks and weather information collected for the development of use case #1 – Micromobility
Figure 1. GIRA service bike stations and weather stations spatial distribution
Figure 2. Daily mean occupation ratio for 84 bike stations of the GIRA service in 2020. Daily mean temperature for 2020. The grey area represents the period of the first lockdown in Portugal due to the COVID-19 pandemic (between March 19 and May 2)

The mean occupation ratio during 2020 for the 84 bike stations is presented in Figure 2, along with the mean temperature.

The occupation ratio decreased over the course of 2020. The highest occupation ratio values were recorded at the beginning of the year, when mean temperatures were lowest.

Between July and November, despite the decrease in mean temperature, the occupation ratio also decreased, indicating higher demand for the GIRA service.


Data preparation

In the data preparation phase, the datasets were aggregated and the features for modelling were selected. As dock occupation information is collected every 20 minutes, it was first aggregated into one-hour periods to match the weather data, which is collected hourly.

The number of empty and occupied docks, along with the occupation ratio, were aggregated hourly using the mean as the aggregation function.
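The hourly aggregation described above can be sketched as follows. This is a minimal illustration, not the project's code: the record layout `(station_id, timestamp, occupation_ratio)` and the sample values are assumptions, not the portal's actual schema.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_hourly(records):
    """Average 20-minute readings into one value per station and hour.

    `records` is an iterable of (station_id, timestamp, occupation_ratio)
    tuples; this layout is hypothetical.
    """
    buckets = defaultdict(list)
    for station_id, ts, ratio in records:
        # Floor the timestamp to the start of its hour.
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(station_id, hour_start)].append(ratio)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up readings for one station.
sample = [
    (421, datetime(2020, 9, 1, 8, 0), 0.50),
    (421, datetime(2020, 9, 1, 8, 20), 0.40),
    (421, datetime(2020, 9, 1, 8, 40), 0.30),
    (421, datetime(2020, 9, 1, 9, 0), 0.20),
]
hourly = aggregate_hourly(sample)
```

The same function could be applied to the counts of empty and occupied docks by passing those values in place of the ratio.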

Temperature and precipitation data from the three weather stations in Lisbon (Tapada da Ajuda, Geofísico and Gago Coutinho) (Figure 1) was assigned to each bike station from its nearest weather station, determined using the Euclidean distance. This made it possible to link the dock occupation data with the weather data of the closest weather station through the timestamp and the unique identifier of that weather station.
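A minimal sketch of the nearest-station assignment is shown below. The coordinates are invented placeholders in a projected (metric) system, not the real station locations, and the function name is hypothetical.

```python
import math

# Hypothetical projected coordinates in metres (NOT the real locations).
WEATHER_STATIONS = {
    "Tapada da Ajuda": (0.0, 0.0),
    "Geofísico": (3000.0, 1500.0),
    "Gago Coutinho": (6000.0, 6000.0),
}


def nearest_weather_station(bike_xy, weather_stations=WEATHER_STATIONS):
    """Return the name of the weather station closest to `bike_xy`,
    using the straight-line (Euclidean) distance."""
    return min(
        weather_stations,
        key=lambda name: math.dist(bike_xy, weather_stations[name]),
    )


# Example: a bike station located close to the second weather station.
assigned = nearest_weather_station((2800.0, 1400.0))
```

Euclidean distance is only meaningful on projected coordinates; with raw latitude and longitude, a projection or a great-circle distance would be needed first.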

The number of empty and occupied docks, occupation ratio, temperature and precipitation were then aggregated into 3-hour periods for each bike station, using the mean as the aggregation function ([0 – 3[; [3 – 6[; [6 – 9[; [9 – 12[; [12 – 15[; [15 – 18[; [18 – 21[; [21 – 00[), so as to capture the periods of highest bike demand (i.e., the morning and late afternoon). As dock occupation patterns differ by day of the week, a dummy variable was created to distinguish business days from weekends.
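The 3-hour binning and the business-day dummy can be sketched as below. The input layout (a dictionary keyed by station and hour) and the sample values are assumptions, and public holidays are ignored, which the source does not specify.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def period_start(ts):
    """Floor a timestamp to the start of its 3-hour period (00, 03, ..., 21)."""
    return ts.replace(hour=(ts.hour // 3) * 3, minute=0, second=0, microsecond=0)


def is_business_day(ts):
    """1 for Monday to Friday, 0 for weekends (public holidays are ignored)."""
    return 1 if ts.weekday() < 5 else 0


def aggregate_3h(hourly):
    """Average hourly values into 3-hour periods and attach the dummy."""
    buckets = defaultdict(list)
    for (station_id, ts), ratio in hourly.items():
        buckets[(station_id, period_start(ts))].append(ratio)
    return {
        key: {
            "occupation_ratio": mean(values),
            "business_day": is_business_day(key[1]),
        }
        for key, values in buckets.items()
    }


# Made-up hourly values for one station.
hourly_sample = {
    (421, datetime(2020, 9, 5, 6, 0)): 0.6,  # Saturday
    (421, datetime(2020, 9, 5, 7, 0)): 0.5,
    (421, datetime(2020, 9, 5, 8, 0)): 0.4,
    (421, datetime(2020, 9, 7, 9, 0)): 0.2,  # Monday
}
features = aggregate_3h(hourly_sample)
```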

Table 2 presents the features used for modelling use case #1 Micromobility.

Table 2. Features selected for modelling the use case #1 Micromobility


Forecasting bike sharing demand, particularly in a docked bike sharing service, is essential for its management, namely for optimizing bike rebalancing operations and planning the expansion of the station network. According to Soheil et al. (2020), studies on bike sharing demand prediction try to correlate a series of external factors (i.e., weather, infrastructural, and socio-demographic) with bike demand, through approaches that differ in time horizon and spatial resolution, modelling methodology, and explanatory variables considered.

Prediction models have different time horizons and spatial resolutions depending on the objectives of each study, ranging from predictions at station level with sub-hour granularity (Z. Yang et al., 2016), to station level at three different time intervals (1, 5 and 10 minutes) (B. Wang & Kim, 2018), to clusters of stations on an hourly basis (Chen et al., 2016).

Several methods have been developed to predict bike sharing demand. For example, Giot & Cherrier (2014) tested several regression methods to predict bike sharing demand hourly over a 24-hour period, finding that the regression models are sensitive to overfitting but perform better than the baseline models, and Sathishkumar et al. (2020) found that, among several regression models, the Gradient Boosting Machine provided the best results.

Deep learning approaches such as convolutional neural networks, using weather data in the modelling process, have shown better results than a baseline neural network and an ARIMA model for station-level predictions (H. Yang, Xie, Ozbay, Ma, & Wang, 2018). Stochastic models have also been applied, such as the Markov chain model developed by Zhou et al. (2018), which obtained good forecasting results at station level. Other authors combined several machine learning models to predict bike sharing demand, namely Xu et al. (2020), who combined a self-organizing mapping network with a regression tree (RT), and Gao & Lee (2019), who combined fuzzy C-means and a genetic algorithm with a back-propagation network to estimate bike sharing demand at station level.

Time series models that consider the temporal dynamics of bike sharing demand have also been implemented, particularly to predict short-term demand. One of the most widely used time series approaches is based on auto-regressive integrated moving average (ARIMA) models. ARIMA models have been used to analyse time series for load forecasting because of their accuracy and mathematical soundness (Contreras, Espínola, Nogales, & Conejo, 2003), and have been used for prediction and forecasting in several scientific domains, namely economics (Siami Namin & Siami Namin, 2018), environment (Aasim, Singh, & Mohapatra, 2019), health (Benvenuto, Giovanetti, Vassallo, Angeletti, & Ciccozzi, 2020) and demand (Fattah, Ezzine, Aman, El Moussami, & Lachhab, 2018). In the context of bike sharing demand, some authors have compared the performance of ARIMA models with that of machine learning models (C. Xu, Ji, & Liu, 2018; Y. Yang, Heppenstall, Turner, & Comber, 2020), and found the ARIMA models less accurate. However, as stated by Y. Yang et al. (2020), some ARIMA variants can incorporate seasonality, which could lead to improved results, namely Seasonal Auto Regressive Integrated Moving Average (SARIMA) models and Seasonal Auto Regressive Integrated Moving Average with Exogenous factors (SARIMAX) models. SARIMAX models integrate seasonality, along with exogenous factors that are added to the model as additional regressors and serve as a correction for autocorrelated errors, thereby incorporating into the forecast of the dependent variable seasonal effects that could be relevant for the phenomenon under study. Despite the potential advantages of SARIMAX models over ARIMA models, there is a general absence of studies in the literature that develop and explore the potential of SARIMAX models for forecasting bike sharing demand.

For the micromobility use case, the main goal is to estimate the dock occupation ratio in 3-hour periods for each bike station. This use case was modelled by training and testing Autoregressive Integrated Moving Average (ARIMA) models (Box, Jenkins, & Reinsel, 2008), namely:

  1. without seasonal or exogenous components (ARIMA);
  2. with an exogenous component, Autoregressive Integrated Moving Average with Exogenous Factors (ARIMAX);
  3. with a seasonal component, Seasonal Autoregressive Integrated Moving Average (SARIMA); and
  4. with seasonal and exogenous components, Seasonal Autoregressive Integrated Moving Average with Exogenous Factors (SARIMAX).
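The four variants listed above can be read as special cases of a single equation. The form below is the common textbook formulation (a regression with seasonal ARIMA errors) and is not taken from the source: $B$ is the backshift operator, $s$ the seasonal period, $y_t$ the occupation ratio, $x_t$ the vector of exogenous regressors, $\beta$ its coefficients and $\varepsilon_t$ white noise.

```latex
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,\bigl(y_t - \beta^\top x_t\bigr)
  = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
```

Dropping the regressors ($\beta = 0$) gives SARIMA, dropping the seasonal terms ($P = D = Q = 0$) gives ARIMAX, and dropping both gives ARIMA. With 3-hour periods, a daily cycle would correspond to $s = 8$ and a weekly cycle to $s = 56$; the source does not state which seasonal period was used.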

For the ARIMAX and SARIMAX models, temperature, precipitation, and a flag identifying business days were used as exogenous variables.

The models were trained with data from 1/09/2020 to 23/11/2020 and tested with data from 24/11/2020 to 30/11/2020. Figure 3 presents an example of the results for bike station 421 – Alameda D. Afonso Henriques, using ARIMA, ARIMAX, SARIMA and SARIMAX models with parameters set at station level.
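The chronological split can be sketched as follows, assuming a gap-free series of 3-hour observations for one station. The constant names and the placeholder values are hypothetical.

```python
from datetime import datetime, timedelta

TRAIN_START = datetime(2020, 9, 1)
TEST_START = datetime(2020, 11, 24)
TEST_END = datetime(2020, 12, 1)  # exclusive upper bound (test ends 30/11/2020)


def split_train_test(series):
    """Split (timestamp, value) pairs by date, never shuffling the order."""
    train = [(ts, y) for ts, y in series if TRAIN_START <= ts < TEST_START]
    test = [(ts, y) for ts, y in series if TEST_START <= ts < TEST_END]
    return train, test


# Placeholder series: one constant value every 3 hours.
series = []
ts = TRAIN_START
while ts < TEST_END:
    series.append((ts, 0.5))
    ts += timedelta(hours=3)

train, test = split_train_test(series)
```

Under this assumption the training window covers 84 days (672 periods) and the test window 7 days (56 periods).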

Figure 3. Observed and predicted values for training (top figure) and test datasets (bottom figure) using ARIMA, ARIMAX, SARIMA and SARIMAX models for bike station 421 – Alameda D. Afonso Henriques, with the time-series parameters at station level
Figure 4. Observed and predicted values of SARIMAX model for the train (top figure) and test (bottom figure) sample for bike station 421 - Alameda D. Afonso Henriques

Figure 4 presents the observed and predicted dock occupation values for the train and test data, using a SARIMAX model.

The performance of the ARIMA, ARIMAX, SARIMA and SARIMAX models was assessed through the mean absolute error (MAE), the root mean squared error (RMSE), and the mean absolute percentage error (MAPE) (de Myttenaere, Golden, Le Grand, & Rossi, 2016).
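The three error measures can be computed as in the sketch below, which uses their standard definitions. The sample values are made up, and skipping zero observations in the MAPE is an assumption, since the source does not say how empty stations (ratio of 0) were handled.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Observations equal to zero are skipped because the percentage error
    is undefined there (an assumption, not the source's stated rule).
    """
    terms = [abs((a - p) / a) for a, p in zip(y_true, y_pred) if a != 0]
    return 100 * sum(terms) / len(terms)


# Made-up observed and predicted occupation ratios.
observed = [0.5, 0.4, 0.2]
predicted = [0.4, 0.5, 0.3]
scores = (mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

In this example every absolute error is 0.1, yet the MAPE is roughly 31.7% because the same error weighs more heavily on low occupation ratios.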

Table 3 presents the mean values of MAE, RMSE and MAPE for the 84 bike stations analysed. Based on the MAPE value and the forecast quality scale developed by Lewis (1982) (i.e., highly accurate forecast (MAPE <= 10%); good forecast (MAPE = 11% - 20%); reasonable forecast (MAPE = 21% - 50%); inaccurate forecast (MAPE > 50%)), all models presented, on average, a reasonable forecast.
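The scale can be applied mechanically, as in this small sketch. The function name is hypothetical, and treating the bands as contiguous (so that, for example, 10.5% counts as "good") is an assumption about how values between the listed ranges are handled.

```python
def lewis_category(mape_percent):
    """Map a MAPE value (in percent) to a forecast quality label
    following the Lewis (1982) bands quoted in the text."""
    if mape_percent <= 10:
        return "highly accurate forecast"
    if mape_percent <= 20:
        return "good forecast"
    if mape_percent <= 50:
        return "reasonable forecast"
    return "inaccurate forecast"


# Example with an arbitrary MAPE value.
label = lewis_category(35.0)
```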

Table 3. Mean MAE, mean RMSE and mean MAPE for the 84 bike stations analysed


To validate the results with the Lisbon City Council departments, namely EMEL and the Mobility Department, a dashboard was built with several reports based on a star-schema dimensional model.



Aasim, Singh, S. N., & Mohapatra, A. (2019). Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renewable Energy, 136, 758–768. https://doi.org/10.1016/j.renene.2019.01.031

Benvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S., & Ciccozzi, M. (2020). Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in Brief, 29, 105340. https://doi.org/10.1016/j.dib.2020.105340

Box, G., Jenkins, G., & Reinsel, G. (2008). Time Series Analysis: Forecasting and Control (4th ed.). Hoboken, NJ: Wiley.

Chen, L., Zhang, D., Wang, L., Yang, D., Ma, X., Li, S., … Jakubowicz, J. (2016). Dynamic cluster-based over-demand prediction in bike sharing systems. UbiComp 2016 - Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 841–852. https://doi.org/10.1145/2971648.2971652

Contreras, J., Espínola, R., Nogales, F. J., & Conejo, A. J. (2003). ARIMA models to predict next-day electricity prices. IEEE Transactions on Power Systems, 18(3), 1014–1020. https://doi.org/10.1109/TPWRS.2002.804943

de Myttenaere, A., Golden, B., Le Grand, B., & Rossi, F. (2016). Mean Absolute Percentage Error for regression models. Neurocomputing, 192, 38–48. https://doi.org/10.1016/j.neucom.2015.12.114

El-Assi, W., Salah Mahmoud, M., & Nurul Habib, K. (2017). Effects of built environment and weather on bike sharing demand: a station level analysis of commercial bike sharing in Toronto. Transportation, 44(3), 589–613. https://doi.org/10.1007/s11116-015-9669-z

Eren, E., & Uz, V. E. (2020). A review on bike-sharing: The factors affecting bike-sharing demand. Sustainable Cities and Society, 54, 101882. https://doi.org/10.1016/j.scs.2019.101882

Fattah, J., Ezzine, L., Aman, Z., El Moussami, H., & Lachhab, A. (2018). Forecasting of demand using ARIMA model. International Journal of Engineering Business Management, 10, 1–9. https://doi.org/10.1177/1847979018808673

Gao, X., & Lee, G. M. (2019). Moment-based rental prediction for bicycle-sharing transportation systems using a hybrid genetic algorithm and machine learning. Computers and Industrial Engineering, 128, 60–69. https://doi.org/10.1016/j.cie.2018.12.023

Giot, R., & Cherrier, R. (2014). Predicting bikeshare system usage up to one day ahead. 2014 IEEE Symposium on Computational Intelligence in Vehicles and Transportation Systems (CIVTS), 22–29. https://doi.org/10.1109/CIVTS.2014.7009473

Hyland, M., Hong, Z., Pinto, H. K. R. de F., & Chen, Y. (2018). Hybrid cluster-regression approach to model bikeshare station usage. Transportation Research Part A: Policy and Practice, 115, 71–89. https://doi.org/10.1016/j.tra.2017.11.009

Kabak, M., Erbaş, M., Çetinkaya, C., & Özceylan, E. (2018). A GIS-based MCDM approach for the evaluation of bike-share stations. Journal of Cleaner Production, 201, 49–60. https://doi.org/10.1016/j.jclepro.2018.08.033

Lewis, C. D. (1982). Industrial and business forecasting methods. London: Butterworths.

Lu, M., Hsu, S. C., Chen, P. C., & Lee, W. Y. (2018). Improving the sustainability of integrated transportation system with bike-sharing: A spatial agent-based approach. Sustainable Cities and Society, 41, 44–51. https://doi.org/10.1016/j.scs.2018.05.023

Martinez, M. (2017). The Impact Weather Has on NYC Citi Bike Share Company Activity. 4(1).

Sathishkumar, V. E., Park, J., & Cho, Y. (2020). Using data mining techniques for bike sharing demand prediction in metropolitan city. Computer Communications, 153, 353–366. https://doi.org/10.1016/j.comcom.2020.02.007

Shen, Y., Zhang, X., & Zhao, J. (2018). Understanding the usage of dockless bike sharing in Singapore. International Journal of Sustainable Transportation, 12(9), 686–700. https://doi.org/10.1080/15568318.2018.1429696


Soheil, S., Paleti, R., Balan, L., & Cetin, M. (2020). Real-time prediction of public bike sharing system demand using generalized extreme value count model. Transportation Research Part A: Policy and Practice, 133, 325–336. https://doi.org/10.1016/j.tra.2020.02.001

Sun, F., Chen, P., & Jiao, J. (2018). Promoting public bike-sharing: A lesson from the unsuccessful Pronto system. Transportation Research Part D: Transport and Environment, 63, 533–547. https://doi.org/10.1016/j.trd.2018.06.021

Wang, B., & Kim, I. (2018). Short-term prediction for bike-sharing service using machine learning. Transportation Research Procedia, 34, 171–178. https://doi.org/10.1016/j.trpro.2018.11.029

Wang, K., & Chen, Y. J. (2020). Joint analysis of the impacts of built environment on bikeshare station capacity and trip attractions. Journal of Transport Geography, 82. https://doi.org/10.1016/j.jtrangeo.2019.102603

Xu, C., Ji, J., & Liu, P. (2018). The station-free sharing bike demand forecasting with a deep learning approach and large-scale datasets. Transportation Research Part C: Emerging Technologies, 95, 47–60. https://doi.org/10.1016/j.trc.2018.07.013

Xu, T., Han, G., Qi, X., Du, J., Lin, C., & Shu, L. (2020). A Hybrid Machine Learning Model for Demand Prediction of Edge-Computing-Based Bike-Sharing System Using Internet of Things. IEEE Internet of Things Journal, 7(8), 7345–7356. https://doi.org/10.1109/JIOT.2020.2983089

Yang, H., Xie, K., Ozbay, K., Ma, Y., & Wang, Z. (2018). Use of Deep Learning to Predict Daily Usage of Bike Sharing Systems. Transportation Research Record, 2672(36), 92–102. https://doi.org/10.1177/0361198118801354

Yang, Y., Heppenstall, A., Turner, A., & Comber, A. (2020). Using graph structural information about flows to enhance short-term demand prediction in bike-sharing systems. Computers, Environment and Urban Systems, 83, 101521. https://doi.org/10.1016/j.compenvurbsys.2020.101521

Yang, Z., Hu, J., Shu, Y., Cheng, P., Chen, J., & Moscibroda, T. (2016). Mobility modeling and prediction in bike-sharing systems. MobiSys 2016 - Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, 165–178. https://doi.org/10.1145/2906388.2906408

Zhou, Y., Wang, L., Zhong, R., & Tan, Y. (2018). A Markov Chain Based Demand Prediction Model for Stations in Bike Sharing Systems. https://doi.org/10.1155/2018/8028714

Zhu, R., Zhang, X., Kondor, D., Santi, P., & Ratti, C. (2020). Understanding spatio-temporal heterogeneity of bike-sharing and scooter-sharing mobility. Computers, Environment and Urban Systems, 81. https://doi.org/10.1016/j.compenvurbsys.2020.101483