Identification of patterns and prediction of illegal parking in the city of Lisbon to improve surveillance efficiency



To create new models that either predict illegal parking in the city or generate viable alternatives to it.

Research question

What is the risk of illegal parking for a specific road segment and period of the day?

Analytical Service Specifications


Analytical Model Code


Analytical Service Dashboard


 Challenge Brainstorming Session


Business understanding

To understand the parking-related needs of the city of Lisbon, meetings were held with the Municipal Police to define the service created in the project. In these meetings, the Municipal Police expressed interest in a service able to assess the risk of parking illegalities on a specific road during a given period of the day.

The goal of this service was to make the dispatch of police officers patrolling parking illegalities more efficient since, until now, dispatch has been based on the officers' accumulated knowledge rather than on empirical evidence.

Data understanding

For the development of the #3 Parking use case, the Municipal Police provided data on 86 152 parking illegality occurrences covering the period from 02/01/2017 to 31/12/2020. The details of the data provided are presented in Table 19.

Table 19. Description of the provided parking illegalities data

The spatial distribution of parking illegality counts was estimated through Kernel Density Estimation (KDE) (Silverman, 1998) and is presented in Figure 27.

Figure 27. KDE of estimated parking illegality counts on a square grid of 50 × 50 m cells, using parking illegality data from 2017 to 2020
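The KDE step can be sketched as below. This is a minimal sketch, assuming the occurrences are available as projected coordinates in metres and using a Gaussian kernel with Silverman's rule-of-thumb bandwidth; the kernel, the bandwidth rule and the function name `kde_counts_on_grid` are assumptions, not the settings actually used for Figure 27. Multiplying the estimated density by the number of occurrences and by the cell area (50 m × 50 m) turns it into an estimated count per cell.

```python
import numpy as np


def kde_counts_on_grid(x, y, cell_size=50.0, bandwidth=None, pad_factor=4.0):
    """Estimate parking illegality counts per grid cell with a 2-D Gaussian KDE.

    Assumes x, y are projected coordinates in metres (hypothetical input).
    Returns cell-centre coordinates (xc, yc) and an array counts[len(yc), len(xc)].
    """
    pts = np.column_stack([x, y]).astype(float)
    n = len(pts)
    if bandwidth is None:
        # Silverman's rule of thumb in 2-D: h = sigma * n**(-1/6) per axis
        bandwidth = pts.std(axis=0, ddof=1) * n ** (-1.0 / 6.0)
    h = np.broadcast_to(np.asarray(bandwidth, dtype=float), (2,))

    # Regular grid of square cells, padded so that the kernel tails are covered
    lo = pts.min(axis=0) - pad_factor * h
    hi = pts.max(axis=0) + pad_factor * h
    n_cells = np.ceil((hi - lo) / cell_size).astype(int)
    xc = lo[0] + cell_size * (np.arange(n_cells[0]) + 0.5)
    yc = lo[1] + cell_size * (np.arange(n_cells[1]) + 0.5)
    gx, gy = np.meshgrid(xc, yc)  # both shaped (len(yc), len(xc))

    # Sum of Gaussian product kernels evaluated at every cell centre
    zx = (gx[..., None] - pts[:, 0]) / h[0]
    zy = (gy[..., None] - pts[:, 1]) / h[1]
    kernel = np.exp(-0.5 * (zx ** 2 + zy ** 2)) / (2 * np.pi * h[0] * h[1])
    density_times_n = kernel.sum(axis=-1)  # equals n * f(x, y)

    # Expected count in a cell is approximately n * f(centre) * cell area
    counts = density_times_n * cell_size ** 2
    return xc, yc, counts


# Synthetic example: a dense hotspot and a smaller, more diffuse one
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(1000, 80, 300), rng.normal(2500, 120, 100)])
y = np.concatenate([rng.normal(500, 80, 300), rng.normal(1500, 120, 100)])
xc, yc, counts = kde_counts_on_grid(x, y, cell_size=50.0)
print(counts.shape, round(float(counts.sum()), 1))
```

Because the grid is padded beyond the data extent, the per-cell counts sum to roughly the total number of occurrences, which is a convenient sanity check when the grid is later overlaid on the city map.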

Data preparation

The datasets presented in Table 20 were selected as relevant for the development of the #3 Parking use case and were included in the modelling phase.

Table 20. Datasets necessary for the development of the analytical model for #3 Parking use case

As the descriptions of parking illegalities were not systematized, a text classification model was developed to assign categories based on the descriptions entered by police officers when registering a parking illegality.

The text classification model was implemented to classify the description of each parking illegality into one of the following classes: on crosswalk, on sidewalk, conditions access, reserved for the disabled, reserved places, others (when the description does not fit any of the other classes) and unknown (when there is no description).

The text classification model was based on a multi-class logistic regression that takes a DistilBERT (Sanh, Debut, Chaumond, & Wolf, 2019) vector representation of each description as input and outputs the probability of that description belonging to each class; the class with the highest probability is the one assigned.
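The classification step can be illustrated with a small multinomial logistic regression trained by gradient descent on precomputed embedding vectors. This is a sketch under stated assumptions: the DistilBERT vectors are replaced by synthetic 16-dimensional stand-ins (real DistilBERT hidden states are 768-dimensional), the `embed` callable and `fake_embed` lookup are hypothetical, and routing empty descriptions directly to "unknown" is an assumption based on the class definition above.

```python
import numpy as np

# Classes listed in the text; "unknown" is handled separately (no description)
CLASSES = ["on crosswalk", "on sidewalk", "conditions access",
           "reserved for the disabled", "reserved places", "others"]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_multinomial_logreg(X, y, n_classes, lr=0.5, epochs=300, l2=1e-3):
    """Multinomial logistic regression trained by full-batch gradient descent
    on the L2-regularised mean cross-entropy loss."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        G = (softmax(X @ W + b) - Y) / n  # gradient of the loss w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def classify(descriptions, embed, W, b):
    """Assign each description the class with the highest predicted probability."""
    labels = []
    for text in descriptions:
        if not text or not text.strip():
            labels.append("unknown")  # assumption: empty descriptions bypass the model
            continue
        probs = softmax(embed(text)[None, :] @ W + b)[0]
        labels.append(CLASSES[int(np.argmax(probs))])
    return labels


# Demo with synthetic stand-ins for DistilBERT vectors
rng = np.random.default_rng(0)
centers = 3.0 * np.eye(len(CLASSES), 16)
y = np.repeat(np.arange(len(CLASSES)), 50)
X = centers[y] + rng.normal(0.0, 0.5, size=(len(y), 16))
W, b = fit_multinomial_logreg(X, y, len(CLASSES))
train_acc = float((softmax(X @ W + b).argmax(axis=1) == y).mean())

fake_embed = {"car parked on the sidewalk": centers[1]}.get  # hypothetical embedder
print(classify(["car parked on the sidewalk", ""], fake_embed, W, b))
```

In a real pipeline, `embed` would wrap a DistilBERT encoder and a library implementation of logistic regression would usually replace the hand-written training loop; the decision rule (argmax over class probabilities) is the part that mirrors the text.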

The rationale for this approach was twofold: on the one hand, these parking illegality categories could improve the predictive capability of the model; on the other hand, they allow the predictive model to estimate the risk of parking illegality separately for each of the above-mentioned categories.

All the data presented in Table 20 were aggregated to the nearest road segment, as required by the modelling strategy presented in the next section.
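Assigning each record to its nearest road segment amounts to a point-to-segment distance search. The sketch below is a minimal illustration, assuming roads are already split into straight segments in a metric projection; the function names and the two toy roads are hypothetical. For a city-scale network, a spatially indexed join (for example GeoPandas' `sjoin_nearest`) would be the more practical choice.

```python
import numpy as np
from collections import Counter


def point_segment_distances(p, a, b):
    """Euclidean distance from point p (shape (2,)) to every segment a[i]-b[i] (shape (m, 2))."""
    ab = b - a
    ap = p - a
    seg_len2 = np.einsum("ij,ij->i", ab, ab)
    # Projection parameter clamped to [0, 1] so the closest point stays on the segment;
    # zero-length segments fall back to their start point.
    t = np.einsum("ij,ij->i", ap, ab) / np.where(seg_len2 == 0, 1.0, seg_len2)
    t = np.clip(t, 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(p - closest, axis=1)


def assign_to_nearest_road(points, seg_start, seg_end, seg_road_id):
    """Return, for each point, the road_id of its nearest segment."""
    return [seg_road_id[int(np.argmin(point_segment_distances(p, seg_start, seg_end)))]
            for p in np.asarray(points, dtype=float)]


# Two hypothetical straight roads in projected coordinates (metres)
seg_start = np.array([[0.0, 0.0], [0.0, 50.0]])
seg_end = np.array([[100.0, 0.0], [100.0, 50.0]])
seg_road_id = ["A", "B"]
points = [[10.0, 5.0], [50.0, 40.0], [200.0, 0.0]]
assigned = assign_to_nearest_road(points, seg_start, seg_end, seg_road_id)
print(assigned)           # ['A', 'B', 'A']
print(Counter(assigned))  # Counter({'A': 2, 'B': 1})
```

Counting the assigned road identifiers gives the per-segment totals of the kind mapped in Figure 28.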

Figure 28 presents the count of parking illegalities aggregated to the road segments of the city of Lisbon.

Figure 28. Number of parking illegalities assigned to the nearest road segment from 2017 to 2020

Modelling

Several approaches have been proposed for predicting and detecting illegal parking, namely using machine learning techniques (J. Gao & Ozbay, 2017; S. Gao et al., 2019; Jiang, Chen, & Hsieh, 2020) or bike-sharing trajectories (He et al., 2018).

The modelling strategy developed for the #3 Parking use case was divided into two stages. In the first stage, the probability of occurrence of illegal parking was computed.

Table 21 presents the variables required for the first stage of the modelling strategy.

Table 21. Variables used for the computation of parking illegalities probability

The probability of parking illegalities was computed by dividing [sum_illegalities] by [count].
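As an illustration, the first stage can be written as a group-by over feature combinations. This is a sketch, assuming each observation is a record holding the feature values plus a hypothetical 0/1 field `illegal` that flags whether an illegality was recorded; the actual content of Table 21 may define [count] and [sum_illegalities] differently.

```python
from collections import defaultdict

# Feature combination used as the aggregation key (names as listed in the text)
FEATURES = ("road_id", "temperature", "precipitation", "period", "off_day", "class_parking")


def illegality_probability(observations, features=FEATURES):
    """Group observations by feature combination and compute
    probability = sum_illegalities / count for each combination.

    Assumption: each observation is a dict with the feature values and a
    hypothetical 0/1 field "illegal".
    """
    count = defaultdict(int)
    sum_illegalities = defaultdict(int)
    for obs in observations:
        key = tuple(obs[f] for f in features)
        count[key] += 1
        sum_illegalities[key] += obs["illegal"]
    return {
        key: {"count": count[key],
              "sum_illegalities": sum_illegalities[key],
              "probability": sum_illegalities[key] / count[key]}
        for key in count
    }


# Toy example restricted to two features for brevity
obs = [
    {"road_id": 7, "period": "morning", "illegal": 1},
    {"road_id": 7, "period": "morning", "illegal": 0},
    {"road_id": 7, "period": "morning", "illegal": 0},
    {"road_id": 7, "period": "morning", "illegal": 1},
    {"road_id": 9, "period": "night", "illegal": 0},
]
stats = illegality_probability(obs, features=("road_id", "period"))
print(stats[(7, "morning")]["probability"])  # 0.5
```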

In the second stage of the modelling strategy, the empirical probabilities of all combinations of the features [road_id], [temperature], [precipitation], [period], [off_day] and [class_parking] with a [count] lower than 100 were discarded, as they were not considered statistically significant.
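The threshold rule can be sketched as a simple split between combinations whose empirical probability is kept and combinations handed over to the machine learning models. The function names and toy keys are hypothetical. The helper `max_standard_error` is only an illustration added here, not the authors' stated rationale: it shows that, with at least 100 observations, the standard error of an estimated proportion is at most sqrt(0.25 / 100) = 0.05.

```python
import math

MIN_COUNT = 100  # threshold stated in the text


def split_by_significance(stats, min_count=MIN_COUNT):
    """Keep empirical probabilities for well-supported combinations and return
    the remaining keys, whose probability must be estimated by a model."""
    empirical = {k: v["probability"] for k, v in stats.items() if v["count"] >= min_count}
    to_model = sorted(k for k, v in stats.items() if v["count"] < min_count)
    return empirical, to_model


def max_standard_error(n):
    """Worst-case (p = 0.5) standard error of a proportion estimated from n observations."""
    return math.sqrt(0.25 / n)


# Hypothetical aggregated statistics keyed by (road_id, period)
stats = {
    ("road_1", "morning"): {"count": 250, "probability": 0.12},
    ("road_1", "night"): {"count": 40, "probability": 0.30},
    ("road_2", "morning"): {"count": 100, "probability": 0.05},
}
empirical, to_model = split_by_significance(stats)
print(empirical)  # {('road_1', 'morning'): 0.12, ('road_2', 'morning'): 0.05}
print(to_model)   # [('road_1', 'night')]
print(max_standard_error(100))
```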

To estimate a probability for the cases where statistical significance was not met, two machine learning algorithms were trained and tested: LightGBM (LGBM) (Lv, Lou, Feng, Chen, & Lv, 2021), a gradient boosting framework that uses tree-based learning algorithms, and a deep learning model built with the Keras framework (Ketkar, 2017).

Both frameworks were applied in two steps:

  1. a classification step, used to identify, among the observations whose feature combination had a count < 100, those for which the probability was non-null; and
  2. a regression step, in which LGBM and Keras were used as regressors to assign a probability of illegal parking occurrence to each observation identified in the previous step.
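The two steps above form a classic two-stage (hurdle-style) estimator. The sketch below shows the control flow only, assuming models that expose scikit-learn-style `fit(X, y)` / `predict(X)` methods (LightGBM's `LGBMClassifier` and `LGBMRegressor` follow that convention; a Keras model would need a thin wrapper). To keep the example self-contained, two deliberately simple NumPy stand-ins replace LightGBM and Keras; all class names are hypothetical.

```python
import numpy as np


class TwoStageProbabilityModel:
    """Step 1: classify whether an observation's probability is non-null.
    Step 2: regress the probability value for observations flagged as non-null."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        nonzero = y > 0
        self.classifier.fit(X, nonzero.astype(int))  # step 1: null vs non-null
        self.regressor.fit(X[nonzero], y[nonzero])   # step 2: trained on non-null rows only
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        out = np.zeros(len(X))
        mask = np.asarray(self.classifier.predict(X)).astype(bool)
        if mask.any():
            # Regressed probabilities are clipped to the valid [0, 1] range
            out[mask] = np.clip(self.regressor.predict(X[mask]), 0.0, 1.0)
        return out


class NearestCentroidClassifier:
    """Minimal stand-in classifier (not LightGBM/Keras), for illustration only."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=-1)
        return self.classes_[d2.argmin(axis=1)]


class LeastSquaresRegressor:
    """Minimal stand-in regressor: ordinary least squares with an intercept."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


# Synthetic data: probability is null for x <= 0 and 0.1 + 0.2 * x otherwise
x = np.linspace(-2.0, 2.0, 401)[:, None]
p = np.where(x[:, 0] > 0, 0.1 + 0.2 * x[:, 0], 0.0)
model = TwoStageProbabilityModel(NearestCentroidClassifier(), LeastSquaresRegressor()).fit(x, p)
print(model.predict([[-1.5], [1.0]]))  # approximately [0.  0.3]
```

Separating the two steps lets the regressor focus on the magnitude of the probability without being dominated by the many observations where it is null.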

Table 22 presents the input data used for the classification and regression algorithms.

Table 22. Input data for the prediction of the probability of illegal parking occurrence in the observations where the combination of the features [road_id], [temperature], [precipitation], [period], [off_day] and [class_parking] is < 100

Model quality was assessed with the AUC (Huang & Ling, 2005) for the classification models and the MAPE (de Myttenaere et al., 2016) for the regression models. The results of both frameworks for the classification and regression steps are presented in Table 23.
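For reference, both metrics can be computed in a few lines. This sketch computes the AUC through its rank (Mann-Whitney) formulation, the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half, and the MAPE as a fraction (multiply by 100 for a percentage). The function names are illustrative. Since MAPE is undefined when a true value is zero, it is consistent with evaluating the regression step only on observations whose probability is non-null.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error as a fraction; undefined if any true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when a true value is zero")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(mape([0.2, 0.5], [0.1, 0.5]))                  # 0.25
```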

According to the results presented in Table 23, the neural network implemented with the Keras framework outperforms the gradient boosting model. The illegal parking simulator presented in the next section is therefore based on the neural network.

Table 23. Quality metrics of the models developed for the #3 Parking use case


To validate the results, a dashboard with several reports was developed on top of a galaxy-schema dimensional model, presented in Figure 29.
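A galaxy (fact constellation) schema is a dimensional model in which several fact tables share the same dimension tables. The sketch below uses Python's built-in `sqlite3` to show the idea with a purely hypothetical schema (an observed-illegalities fact table and a predicted-probability fact table sharing road, period and class dimensions); the actual tables are those in Figure 29. The query performs a drill-across: each fact table is aggregated separately and the results are joined on the shared dimensions.

```python
import sqlite3

# Hypothetical galaxy schema: two fact tables share the same dimension tables
SCHEMA = """
CREATE TABLE dim_road   (road_id   INTEGER PRIMARY KEY, road_name TEXT);
CREATE TABLE dim_period (period_id INTEGER PRIMARY KEY, period TEXT);
CREATE TABLE dim_class  (class_id  INTEGER PRIMARY KEY, class_parking TEXT);
CREATE TABLE fact_illegality (
    road_id   INTEGER REFERENCES dim_road(road_id),
    period_id INTEGER REFERENCES dim_period(period_id),
    class_id  INTEGER REFERENCES dim_class(class_id),
    n_illegalities INTEGER);
CREATE TABLE fact_prediction (
    road_id   INTEGER REFERENCES dim_road(road_id),
    period_id INTEGER REFERENCES dim_period(period_id),
    class_id  INTEGER REFERENCES dim_class(class_id),
    probability REAL);
"""

# Drill-across: aggregate each fact table separately, then join on shared dimensions
DRILL_ACROSS = """
SELECT r.road_name, d.period, obs.total, pred.avg_prob
FROM (SELECT road_id, period_id, SUM(n_illegalities) AS total
      FROM fact_illegality GROUP BY road_id, period_id) AS obs
JOIN (SELECT road_id, period_id, AVG(probability) AS avg_prob
      FROM fact_prediction GROUP BY road_id, period_id) AS pred
  ON obs.road_id = pred.road_id AND obs.period_id = pred.period_id
JOIN dim_road AS r ON r.road_id = obs.road_id
JOIN dim_period AS d ON d.period_id = obs.period_id
ORDER BY r.road_name, d.period
"""


def build_demo():
    """Create an in-memory database filled with hypothetical example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_road VALUES (?, ?)", [(1, "Rua A"), (2, "Rua B")])
    conn.executemany("INSERT INTO dim_period VALUES (?, ?)", [(1, "morning"), (2, "evening")])
    conn.executemany("INSERT INTO dim_class VALUES (?, ?)",
                     [(1, "on sidewalk"), (2, "on crosswalk")])
    conn.executemany("INSERT INTO fact_illegality VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 5), (1, 1, 2, 3), (2, 2, 1, 4)])
    conn.executemany("INSERT INTO fact_prediction VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 0.4), (1, 1, 2, 0.2), (2, 2, 1, 0.3)])
    return conn


rows = build_demo().execute(DRILL_ACROSS).fetchall()
for row in rows:
    print(row)
```

Sharing conformed dimensions is what allows a single dashboard report to place observed counts and model outputs side by side for the same road and period.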

Figure 29. Galaxy schema model defined for the #3 Parking use case

References

de Myttenaere, A., Golden, B., Le Grand, B., & Rossi, F. (2016). Mean Absolute Percentage Error for regression models. Neurocomputing, 192, 38–48. https://doi.org/10.1016/j.neucom.2015.12.114

Gao, J., & Ozbay, K. (2017). A Data-Driven Approach to Estimate Double Parking Events Using Machine Learning Techniques. Transportation Research Board 96th Annual Meeting, (646).

Gao, S., Li, M., Liang, Y., Marks, J., Kang, Y., & Li, M. (2019). Predicting the spatiotemporal legality of on-street parking using open data and machine learning. Annals of GIS, 25(4), 299–312. https://doi.org/10.1080/19475683.2019.1679882

He, T., Bao, J., Li, R., Ruan, S., Li, Y., Tian, C., & Zheng, Y. (2018). Detecting vehicle illegal parking events using sharing bikes’ trajectories. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/3219819.3219887

Huang, J., & Ling, C. X. (2005). Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Transactions on Knowledge and Data Engineering, 17(3), 299–310.

Jiang, J., Chen, Y. C., & Hsieh, H. P. (2020). Detection of Illegal Parking Events Using Spatial-Temporal Features. GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems. https://doi.org/10.1145/3397536.3428350

Ketkar, N. (2017). Introduction to Keras. In Deep Learning with Python: A Hands-on Introduction (pp. 97–111). Berkeley, CA: Apress. https://doi.org/10.1007/978-1-4842-2766-4_7

Lv, Z., Lou, R., Feng, H., Chen, D., & Lv, H. (2021). Novel Machine Learning for Big Data Analytics in Intelligent Support Information Management Systems. ACM Trans. Manage. Inf. Syst., 13(1). https://doi.org/10.1145/3469890

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. Retrieved from http://arxiv.org/abs/1910.01108

Silverman, B. W. (1998). Density Estimation for Statistics and Data Analysis (1st ed.). https://doi.org/10.1201/9781315140919