WASTE MANAGEMENT

Identification of patterns/profiles and solid waste production prediction in the city of Lisbon. 

 

Objective
To identify patterns that support the prediction of urban waste production from a variety of contextual information (e.g. events, weather conditions).


Research questions

What is the social and economic profile of the citizens who produce more undifferentiated waste?

What quantity of undifferentiated waste is predicted to need collection by the municipality each week?

 

Challenge

 

Challenge Brainstorming Session

 

Data understanding


 

Undifferentiated waste collection by freight in 114 circuits for 2018 and 2019.


Data preparation

From the locations of the collection points of each circuit, a map of the zones covered by each circuit was produced using Thiessen polygons.
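A Thiessen (Voronoi) partition assigns every location to its nearest collection point, so the zone of a circuit is the union of the cells of its points. The sketch below illustrates that nearest-point rule on a coarse grid rather than with exact polygon geometry. The coordinates, circuit names and helper functions are hypothetical and are not taken from the project.

```python
import math

# Hypothetical collection points: (x, y, circuit id). Not the project's data.
collection_points = [
    (1.0, 1.0, "circuit_A"),
    (2.0, 4.0, "circuit_A"),
    (6.0, 2.0, "circuit_B"),
    (7.0, 6.0, "circuit_C"),
]

def nearest_circuit(x, y, points):
    """Return the circuit of the collection point closest to (x, y).

    This nearest-point rule is what defines a Thiessen (Voronoi) cell.
    """
    closest = min(points, key=lambda p: math.hypot(p[0] - x, p[1] - y))
    return closest[2]

def rasterize_zones(points, width, height):
    """Label each unit grid cell with the circuit whose zone contains its centre."""
    return [
        [nearest_circuit(col + 0.5, row + 0.5, points) for col in range(width)]
        for row in range(height)
    ]

zones = rasterize_zones(collection_points, width=8, height=8)
```

In practice a GIS tool would produce the exact polygons; the grid version only shows which circuit each location falls under.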


Undifferentiated waste collection by freight in each circuit for 2018 and 2019.


Waste collection data was aggregated to the areas that define each circuit, along with the contextual data.
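As an illustration of this aggregation step, the sketch below sums collected weight per circuit and ISO week. The record layout, circuit names and weights are hypothetical; the source does not describe the actual data schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical collection records: (circuit id, collection date, weight in kg).
records = [
    ("circuit_A", date(2018, 1, 1), 5200.0),
    ("circuit_A", date(2018, 1, 3), 4800.0),
    ("circuit_A", date(2018, 1, 8), 5100.0),
    ("circuit_B", date(2018, 1, 2), 7300.0),
]

def weekly_totals(rows):
    """Sum collected weight per (circuit, ISO year, ISO week)."""
    totals = defaultdict(float)
    for circuit, day, weight_kg in rows:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(circuit, iso_year, iso_week)] += weight_kg
    return dict(totals)

totals = weekly_totals(records)
```

Contextual variables attached to each circuit zone could then be joined on the circuit id.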


Undifferentiated waste production profile

Correlation analysis identified the profile of the main producers of undifferentiated waste in Lisbon:

  • Families with dependents, from young children to young adults
  • Residents looking for a job
  • Residents over 10 years old who cannot read or write

According to this profile, residents with less education produce more undifferentiated waste. Could this be because they recycle less?
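The source does not say which correlation coefficient was used. Assuming Pearson's coefficient, the analysis can be sketched as correlating each socio-economic indicator of a circuit with its waste production and ranking the indicators by strength of association. The indicator names and values below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-circuit values, for illustration only.
waste_kg_per_capita = [210.0, 250.0, 190.0, 300.0, 270.0]
indicators = {
    "share_job_seekers": [0.05, 0.08, 0.04, 0.11, 0.09],
    "share_higher_education": [0.40, 0.30, 0.45, 0.20, 0.22],
}

correlations = {
    name: pearson(values, waste_kg_per_capita)
    for name, values in indicators.items()
}
# Rank indicators by strength of association with waste production.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
```

A correlation of this kind describes association across circuits only; it does not by itself explain why a group produces more waste.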

 

Modeling

To estimate the weekly production of undifferentiated waste for each circuit, several regression algorithms were trained.
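As one example of such a regressor, a k-nearest-neighbours (KNN) model predicts the weekly weight of a circuit by averaging the weights of the most similar circuits. This is a minimal sketch with hypothetical features and values, not the project's training code; in practice the features would also need to be scaled to comparable ranges.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical features per circuit: [residents (thousands), commercial buildings (tens)].
train_x = [[2.0, 1.0], [2.5, 1.5], [3.0, 1.0], [8.0, 6.0], [9.0, 7.0]]
# Hypothetical weekly undifferentiated waste per circuit, in kg.
train_y = [9000.0, 10000.0, 11000.0, 30000.0, 33000.0]

prediction_kg = knn_predict(train_x, train_y, query=[2.4, 1.2], k=3)
```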


 

Evaluation: Machine Learning

LinearRegression:

  • RMSE: 278370 kg
  • MAE: 213160 kg
  • MAE per capita (annual): 237.14 kg = 0.63 kg per day
  • MAPE: 0.15
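For reference, the three error metrics can be computed as sketched below. The sample values are hypothetical, and MAPE is returned as a fraction, on the assumption that the reported 0.15 means 15%.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the units of the target (kg here)."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mae(actual, predicted):
    """Mean absolute error, in the units of the target (kg here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical observed and predicted weights in kg, for illustration only.
actual_kg = [100.0, 200.0, 400.0]
predicted_kg = [110.0, 180.0, 400.0]

errors = {
    "rmse": rmse(actual_kg, predicted_kg),
    "mae": mae(actual_kg, predicted_kg),
    "mape": mape(actual_kg, predicted_kg),
}
```

RMSE penalises large errors more heavily than MAE, which is why it is the larger of the two figures whenever the errors are uneven.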

Evaluation: Time series

Results (ARIMA):

  • Best parameters (p, d, q) = (2, 0, 4)
  • AIC = 1302.77
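The Akaike information criterion (AIC) trades goodness of fit against model complexity, AIC = 2k - 2 ln(L), and the order with the lowest value is preferred. The sketch below shows the selection on hypothetical log-likelihoods. The parameter count used for k is an assumption, and the source does not describe how its search was run.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximised log-likelihoods per (p, d, q) order, for illustration only.
log_likelihoods = {
    (1, 0, 0): -130.0,
    (1, 0, 1): -120.0,
    (2, 0, 1): -119.5,
}

def n_params(order):
    # Assumption: p AR terms + q MA terms + the noise variance, no constant.
    p, _, q = order
    return p + q + 1

scores = {order: aic(ll, n_params(order)) for order, ll in log_likelihoods.items()}
best_order = min(scores, key=scores.get)
```

In this toy example the largest model fits slightly better but loses to a smaller one once the complexity penalty is applied.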

 


Evaluation

  • To assess if the residuals have spatial autocorrelation, Global Moran’s I was computed;
  • The residuals for the OLS model present a clustered pattern;
  • The residuals for the random forest present a random pattern.
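Global Moran's I compares the residual of each zone with those of its neighbours. Values well above the expectation under no autocorrelation, -1/(n-1), indicate clustering, and values near it indicate a random pattern. The sketch below uses a hypothetical chain of four neighbouring zones with binary adjacency weights; the source does not state which spatial weights were used.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on n zones.

    weights[i][j] is the spatial weight between zones i and j (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Hypothetical adjacency: four zones in a row, each touching its direct neighbours.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 1.0, -1.0, -1.0], adjacency)    # similar residuals side by side
alternating = morans_i([1.0, -1.0, 1.0, -1.0], adjacency)  # opposite residuals side by side
```

A clustered residual pattern, as reported for OLS, suggests that the model is missing a spatially structured explanatory variable.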

Figures: Ordinary Least Squares; random forest regressor.

The residuals of the KNN and random forest models follow a Gaussian distribution.

  • To test another approach, the circuits were clustered by maximizing the distance between the cumulative distribution functions (CDFs) of the waste weight of each circuit, measured with the Kolmogorov-Smirnov test;
  • Three clusters (distributions) were identified.
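The distance used by the two-sample Kolmogorov-Smirnov test is the largest vertical gap between two empirical CDFs. The sketch below computes it for every pair of circuits, giving a distance matrix that a clustering step could work from. The weights are hypothetical, and the clustering procedure itself is not described in the source, so it is not reproduced here.

```python
from bisect import bisect_right
from itertools import combinations

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The gap between two step functions is largest at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Hypothetical weekly waste weights per circuit, in tonnes.
weekly_weight = {
    "circuit_A": [5.0, 5.5, 6.0, 6.5],
    "circuit_B": [5.2, 5.6, 6.1, 6.4],
    "circuit_C": [9.0, 9.5, 10.0, 11.0],
}

distances = {
    (first, second): ks_statistic(weekly_weight[first], weekly_weight[second])
    for first, second in combinations(sorted(weekly_weight), 2)
}
```

Circuits whose weight distributions nearly coincide get a distance close to 0, while circuits with non-overlapping distributions get a distance of 1.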

Figures: without clustering; with clustering.

The clusters were used as explanatory variables, giving the following set of variables: number of families with two or more unemployed members; commercial buildings; cluster 0; cluster 1.
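One way to read this variable list is that the three clusters enter the regression as two indicator columns, with the third cluster as the baseline. That reading is an assumption, and the sketch below, with hypothetical circuit values, only shows how such a design matrix could be assembled.

```python
def design_row(families_two_plus_unemployed, commercial_buildings, cluster_label):
    """Build one row of explanatory variables for a circuit.

    Assumption: with three clusters, two indicator columns are enough;
    cluster 2 is the baseline with both indicators at zero.
    """
    if cluster_label not in (0, 1, 2):
        raise ValueError("cluster_label must be 0, 1 or 2")
    return [
        families_two_plus_unemployed,
        commercial_buildings,
        1 if cluster_label == 0 else 0,
        1 if cluster_label == 1 else 0,
    ]

# Hypothetical circuits: (families with 2+ unemployed, commercial buildings, cluster).
circuits = [(120, 35, 0), (80, 60, 1), (45, 12, 2)]
design_matrix = [design_row(*circuit) for circuit in circuits]
```

Leaving one cluster out as the baseline avoids perfect collinearity with the intercept in an OLS model.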


There was a substantial increase in performance for OLS (from 0.37 to 0.6).