SELECTED TOPICS in SYSTEM SCIENCE and SIMULATION in ENGINEERING
Study and Development of Ad Hoc Algorithms for Designing Waste Collection Routes: Test of Capabilities
CLAUDIA CABALLINI, PIETRO GIRIBONE, ROBERTO REVETRIA, ALESSANDRO TESTA
DIPTEM – Department of Industrial Production, Technology, Engineering and Modelling Via Opera Pia 15, Genoa, ITALY
[email protected];
[email protected];
[email protected];
[email protected]
Abstract – This paper presents the analysis, development and testing of a Decision Support System (DSS) for planning, managing and optimizing waste collection operations in an urban context. The Authors implemented a proprietary simulator in Java™ composed of four functional modules (Global Positioning System, Data Mining system, waste collection point placement optimizer, and planner for routing and resource exploitation). The simulator was then validated on a specific case study, in cooperation with a large town (about 50,000 inhabitants) in Central Italy that made itself available for testing the simulation model.
Key Words – Decision Support System (DSS), Simulation, Global Positioning System (GPS), Data Mining system, optimizer, planner.
ISSN: 1792-507X
ISBN: 978-960-474-230-1
1. Introduction
Waste collection is an increasingly critical problem in modern society, where people are used to having all the comforts they can afford, and where packaging and even “durable” goods are often discarded rather than recycled or repaired. Governments and many multinational companies have tried to make people more sensitive to the waste problem, but the overall situation is contradictory on this subject: everything we buy seems destined to be discarded very soon, with no particular attention. In some large towns, especially in central and southern Italy, the capacity for disposing of and recycling waste has long been a critical issue. The need for separate waste collection led to an increase in collection resources, with trucks and other specialist vehicles to be bought, and made an increase in the number of collection points mandatory. These changes need to be matched with traffic conditions, with physical constraints in terms of routing and accessibility, and with the need for a high service level in any case, since it is not possible to leave waste unattended for days in front of houses or public
places. Hence the need for a simulation tool able to support decision making on this topic. This is not the first time that DIPTEM – University of Genoa has tackled the problem with a simulator, but this new-generation system adds many functionalities, more closely linked to modern needs, with respect to the first models developed in the mid-1990s.
2. Structure of the Model
The simulator consists of four macro features:
1. Geo-referencing roads and attributes through the SOAP protocol. Integration with geo-referenced systems through SOAP (Simple Object Access Protocol) makes it possible to build GIS (Geographic Information System) support while skipping the data entry stage (or at least drastically reducing it), thus reducing implementation costs. Significant points can still be added (geo-referenced) to the digital map with the aid of GPS, ensuring the completeness of the information.
2. Data Mining System on waste production. A data mining system based on Artificial Neural Networks and second-order polynomial models. From collection reports and census data (population, non-domestic users, etc.), the module extracts the relationship between the socio-economic characteristics of an area and its production of MSW (Municipal Solid Waste). The results are subjected to a thorough statistical analysis (F-test, R2 test, Lack of Fit test) and feed the artificial neural networks used to determine the response surface (RSM). The forecasting system is thus claimed to be more reliable than those currently available in the literature and in specialized software.
3. Optimizing placement of collection points. Development of a coherent optimization model for the "placement" of collection points, based on integer mathematical programming models developed ad hoc and tested by the Authors in several case studies in Northern and Central Italy. The placement model is applicable to roadside collection bins, to separate collection bins and to door-to-door collection.
4. Planning routes and fleet for collection. Development of integrated fleet planning for collecting MSW, based on algorithms and heuristics for the CVRP (Capacitated Vehicle Routing Problem) derived from MIT (Massachusetts Institute of Technology).
The development language is Java (J2SE) 1.4; the interface with the SOAP Web Service is provided by the Apache Axis framework. The interfaces enabling the system integration capabilities described above were developed using JBuilder.
3. The Validation Phase
1. Validation of Georeferencing Module
The application geo-references roads: by connecting to the GIS Web server, it obtains the longitude and latitude of each address entered. The method, called georeferencing, has two components. The first, given an address, returns the longitude and latitude of the point, handling ambiguous situations (e.g. several addresses matching the selection). In that case, the user must choose the correct one from a list of addresses matching the selection criteria. If nothing is found for the selection criteria, the user can enter the parameters manually. If no street number is given, the system calculates the center of the street entered. The second component, given a properly formatted file of addresses, returns an array of addresses. Each address object in the array follows the logic described above.
To validate the Geo-referencing module it was necessary to test the correctness of the assignment of latitude and longitude to addresses: a set of data was taken at random, together with some logical points, and compared with the results given by a popular worldwide website providing information on maps and place positioning. The results were very satisfactory (see Figure 1, where the points calculated by the simulator and by the most widely used web application are both shown as markers on the maps).
[Figure 1 panels: Via Maestri del Lavoro 3 (12.91401, 42.42176); Via Antonazzi E 10 (12.89279, 42.40738); Via Terminillo 3 (12.87915, 42.40648); Via Ternana 78 (12.85596, 42.42923)]
Figure 1: Results of testing simulator against world wide web GPS tools.
2. Validation of Data Mining System
The second application tool uses a data mining system based on artificial neural networks and second-order polynomial models. From collection reports and census data (population, non-domestic users, etc.), the module extracts the relationship between the socio-economic characteristics of an area and its production of MSW. The results are subjected to a thorough statistical analysis (F-test, R2 test, Lack of Fit test). If the regression analysis cannot find a suitable relationship, it is possible to use neural networks; this typically involves an approximation error greater than that of the Analyzer (regression) method. The method has two components: the first uses regression analysis, the second a neural network. Before running a regression, it is necessary to configure the analyzer through the corresponding method.
There are many methodologies for formulating laws that interpret the mathematical relationships, more or less complex, between the variables (factors, responses) that determine the behavior of a system. In other words, it is possible to construct a model that connects the response (waste) to the factors (non-domestic and domestic users) involved in the system. In general there is a dependent variable Y (response: MSW produced daily per census box) which depends on k independent variables X1, X2, ..., Xk (factors: Hotels, Shopping Centers, etc.). The relationship between these variables is characterized by a mathematical model called the "regression equation", expressible symbolically as:
$$Y = \varphi(X_1, X_2, \ldots, X_k)$$
The function φ is unknown, and it is necessary to choose an appropriate function (here, polynomial models) to approximate it. Operationally, it is necessary to identify the type of relationship (linear, nonlinear) that best approximates the system by testing its significance and goodness of fit. Since the model is a polynomial expression whose order is assumed, the correctness of that hypothesis (the order of the polynomial adopted) must be verified. The tests are performed according to the ANOVA (Analysis of Variance) methodology. Without going into the formalism of the technique, ANOVA analyzes the variance of the samples and their average values in order to obtain a single test of significance, which supports decisions with a desired degree of risk.
It remains to observe that the model is built on available data: the selection of the variables influencing the system and the characterization of the study areas (census boxes) were imposed by data availability and were not determined by an a priori design. This made it necessary to test regression scenarios iteratively in order to find the correct mix of factors able to represent the situation (in such cases the risk of detecting relationships between variables that are not significant in reality is very high).
The regression model is represented by the following equation:
$$Y = b_1 \cdot \mathrm{Inhabitants} + b_2 \cdot \mathrm{Hotels} + b_3 \cdot \mathrm{Industries} + b_4 \cdot \mathrm{Other} + b_5 \cdot \mathrm{Hotels} \cdot \mathrm{Other} + b_6 \cdot \mathrm{Industries} \cdot \mathrm{Other}$$
The coefficients b1, b2, ..., b6 are the correlation coefficients of the model, whose values are presented in Table 1. To estimate these parameters it was assumed that the errors are uncorrelated random variables with normal distribution. For each coefficient it is thus possible to identify a confidence interval, which is a function of its average and standard error.

Correlation Coefficients of the Model
Coefficient   AVG        Std Error   -95%       95%        t Stat    P value        VIF
b1            0,889      0,07910     0,728      1,051      11,24     2,78372E-12    1,420
b2            272,99     92,39       84,30      461,68     2,955     0,00604        9,120
b3            770,30     273,91      210,89     1329,7     2,812     0,00859        930,42
b4            24,84      10,02       4,377      45,30      2,479     0,01901        3,030
b5            -8,169     3,239       -14,78     -1,554     -2,522    0,01721        11,19
b6            -134,33    54,86       -246,36    -22,29     -2,449    0,02040        934,07
Table 1: Correlation Coefficients
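As an illustration of how the columns of Table 1 relate to each other, the following minimal Python sketch evaluates the regression equation and recomputes the t statistics and the 95% bounds from the AVG and Std Error columns. It is not the Authors' Java implementation. The function names, the example census box and the critical value 2.042 (two-sided 95% Student t with 30 degrees of freedom) are assumptions introduced here; the critical value was chosen because it approximately reproduces the tabulated bounds.

```python
# Coefficients (AVG) and standard errors from Table 1, written with decimal points.
COEF = {"b1": 0.889, "b2": 272.99, "b3": 770.30,
        "b4": 24.84, "b5": -8.169, "b6": -134.33}
STD_ERR = {"b1": 0.07910, "b2": 92.39, "b3": 273.91,
           "b4": 10.02, "b5": 3.239, "b6": 54.86}

# ASSUMPTION: two-sided 95% Student t critical value for 30 degrees of freedom.
T_CRIT = 2.042


def t_stat(name):
    """t statistic of a coefficient: estimate divided by its standard error."""
    return COEF[name] / STD_ERR[name]


def confidence_interval(name, t_crit=T_CRIT):
    """Symmetric interval: estimate minus/plus t_crit times the standard error."""
    half_width = t_crit * STD_ERR[name]
    return COEF[name] - half_width, COEF[name] + half_width


def predict_msw(inhabitants, hotels, industries, other):
    """Evaluate the regression equation for one census box."""
    c = COEF
    return (c["b1"] * inhabitants + c["b2"] * hotels
            + c["b3"] * industries + c["b4"] * other
            + c["b5"] * hotels * other
            + c["b6"] * industries * other)


if __name__ == "__main__":
    for name in COEF:
        low, high = confidence_interval(name)
        print(f"{name}: t = {t_stat(name):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
    # HYPOTHETICAL census box: 500 inhabitants, 1 hotel, 0 industries, 10 other users.
    print(predict_msw(500, 1, 0, 10))
```

Keeping the coefficients in a dictionary keyed by the names used in Table 1 makes it easy to check each row of the table against the formula t = AVG / Std Error.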
As shown in Table 2, the regression passes the significance test, supporting the correctness of the regressive approach.

ANOVA
Source       df   SS             SS%   MS            F       F Signif
Regression   6    15405729,71    82    2567621,618   27,81   8,87987E-11
Residual          3323476,85     18    110783
Total             18729206,56    100
Table 2: Significance of regression
Figure 2: production of domestic and non domestic waste
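The SS% and MS columns of Table 2 follow directly from the sums of squares. The short Python sketch below recomputes them as a worked check; the variable names are introduced here for illustration only, and the values are those of Table 2 written with decimal points.

```python
# Sums of squares and regression degrees of freedom from Table 2.
SS_REGRESSION = 15405729.71
SS_RESIDUAL = 3323476.85
DF_REGRESSION = 6

# Total variability is the sum of the explained and residual parts.
ss_total = SS_REGRESSION + SS_RESIDUAL

# SS%: share of the total variability explained by the regression and left in the residual.
share_regression = SS_REGRESSION / ss_total
share_residual = SS_RESIDUAL / ss_total

# MS: mean square of the regression, i.e. its sum of squares divided by its degrees of freedom.
ms_regression = SS_REGRESSION / DF_REGRESSION

if __name__ == "__main__":
    print(f"SS total      = {ss_total:.2f}")
    print(f"SS% regression = {100 * share_regression:.0f}")
    print(f"SS% residual   = {100 * share_residual:.0f}")
    print(f"MS regression = {ms_regression:.3f}")
```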
The correctness of the model lies in its ability to represent the system itself, as shown in Figure 2. Regarding the quantities of waste produced daily, the difference between simulated and real data is about 7% (considering the average correlation coefficients). As a first approximation, this percentage can be considered the error of the regression model for this test.
Figure 1: real data vs. simulation
Building the regressive model also gives a preliminary identification of the specific production of MSW for domestic and for non-domestic users. These two contributions are evident in the equation found above, so the total waste produced daily can be split by type of user: households account for about 67% of the daily production of waste and non-domestic users for 33%.
The regression model thus estimates household production at 67% of the total waste generated. To verify the reliability of the model identified, however, some considerations are necessary. First, the quantity and quality of the waste produced vary with the geographic area (north and south of Italy). In addition, the socio-economic characteristics of the territory influence the type of production. With these considerations in mind, the case study should be compared with data from similar contexts; however, while there are abundant studies on the product breakdown of waste, research on distribution channels is very scarce. Only some regions in Italy have published studies in this regard, and they are the most advanced from the socio-economic point of view.
The model produced by the neural network is to be used especially when the analytical regression model is unable to solve the problem successfully. It cannot be explained in human symbolic language: the results must be accepted "as is", hence the definition of Neural Networks as "black boxes". In other words, unlike an algorithmic
system, where the step-by-step path from input to output can be examined, a neural network is able to generate a valid result, or a result with a high probability of being acceptable, but it is not possible to explain how and why that result is generated. The graph in Figure 3 shows the actual daily production of MSW against that simulated with the neural network. The error of the neural network model is higher than that of the regression model by about 12%.
Figure 3: Production of real and simulated daily MSW (Neural Network)
The comparison between the MSW production obtained with the regression method and that obtained with the neural network shows qualitatively the same trend (Figure 4), apart from small offsets in quantity.
Figure 4: daily MSW production (regression and neural network)
3. Validation of Optimizer for placement of collection points.
The optimal location of bins, in terms of both number and type, was determined using the Branch & Bound methodology. This methodology, developed by Land and Doig [6], starts from the set of all feasible solutions, which is divided into two subsets with empty intersection whose union is the initial set (branching). For each subset, a bound is then calculated that is not higher than the lowest "cost" (the objective function to be minimized or maximized) of any of its elements (bounding). Proceeding in this way, and progressively branching the subsets that may contain the best solution, it is possible to reach a set with a single element, which is the optimum. This methodology appears to be one of the most appropriate in the literature for solving resource allocation problems.
The boundary conditions for defining the optimum are summarized as follows:
• Type of box. The volume of the containers to be placed on the ground. For the different types of garbage collection there are standard sizes.
• Maximum filling %. The load factor of the box.
• Frequency of collection. The number of collections made weekly. The frequency gives the number of days for which the bins have to buffer the production of MSW. For example, a frequency of 3 out of 7 (three times a week) means that the containers need a buffer capacity of 3 days.
• Maximum allowable distance between user and bin. In general, municipal regulations set a maximum allowable distance between a house and the nearest box.
In addition to the collection points (characterized by longitude and latitude, number and type of containers present, and utilities served), the output of the method must also provide a list of any non-compliant users (typically with respect to distance), to be managed manually downstream of the optimization.
The placement tool was validated on a sample area comprising two roads with 284 residents. For the test area, proximity collection of the wet fraction was assumed. In particular, the variables considered were:
1. Volume of containers. Volumes ranging from 80 liters to 120 liters.
2. Frequency of collection. Two possible collection frequencies: six times a week or three times a week.
3. Maximum allowable distance from the box to houses. Varying between 50 meters and 100 meters.
Tables 3 and 4 show the results.
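The interplay between collection frequency, maximum filling and container volume described above can be illustrated with a small Python sketch. It is only a back-of-the-envelope capacity check, not the Branch & Bound optimizer used by the Authors. The function names and the daily volume per resident (1 liter) are hypothetical; the rounding of the buffer to whole days is an assumption consistent with the example of three collections a week giving a 3-day buffer, and the 80% maximum filling is the input value reported in the paper.

```python
import math


def buffer_days(collections_per_week):
    """Days of production the bins must hold between two collections (rounded up)."""
    return math.ceil(7 / collections_per_week)


def bins_required(daily_volume_l, collections_per_week, bin_volume_l, max_fill=0.8):
    """Number of bins needed so that the buffered volume respects the maximum filling %."""
    usable_volume = bin_volume_l * max_fill
    buffered_volume = daily_volume_l * buffer_days(collections_per_week)
    return math.ceil(buffered_volume / usable_volume)


if __name__ == "__main__":
    # HYPOTHETICAL production: 284 residents producing 1 liter of wet fraction per day each.
    daily_volume = 284 * 1.0
    for frequency in (3, 6):
        for volume in (80, 120):
            n = bins_required(daily_volume, frequency, volume)
            print(f"{frequency} collections/week, {volume} l bins: {n} bins")
```

A full placement model would also have to respect the maximum user-to-bin distance, which is what turns the sizing problem into an integer optimization.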
Table 3: results for zone 1
Table 4: results for zone 2
The results show some interesting information:
a. The days of waste accumulation depend only on the collection frequency (in the same area), not on the type or number of bins installed in the territory.
b. Varying the collection frequency and the type of bin installed changes the volume of waste managed per year.
c. The filling rate of the bins, which the literature indicates as an index of the efficiency of the system, is below the 80% defined as input. This parameter is influenced not only by the frequency of service but also by the maximum allowable distance between house and garbage box.
4. Validation of route and fleet planner
In the optimization process and the establishment of delivery sequences, the cost of a complete tour of deliveries (in this case, collections) is usually proportional to the distance traveled, so the aim is to find the path with minimum total distance that visits each location once. The savings matrix method is quite simple to implement and can be used to assign deliveries/collections to vehicles even when there is a time constraint. The method comprises the following steps:
• Creation of the distance matrix
• Creation of the savings matrix
• Allocation of deliveries/collections to vehicles
Once these steps have been performed, each mission can be optimized individually. Since the cost of each solution is typically proportional to the distance traveled, the distance matrix is calculated geometrically from the coordinates (X, Y) of the points with the formula:
$$\mathrm{Dist}(A,B) = \sqrt{(X_A - X_B)^2 + (Y_A - Y_B)^2}$$
The savings matrix expresses the convenience, in terms of savings, of grouping two deliveries on the same path; a higher value corresponds to greater convenience. The saving is calculated using the formula, where DC denotes the base:
$$S(A,B) = \mathrm{Dist}(DC,A) + \mathrm{Dist}(DC,B) - \mathrm{Dist}(A,B)$$
A path is composed of at least one delivery and covers a journey from the base and back (e.g. DC -> Delivery -> DC); the same holds if multiple deliveries are grouped together (e.g. DC -> Delivery 1 -> Delivery 2 -> DC). The procedure groups on the same route (vehicle) the deliveries/collections with the highest savings, while respecting the maximum capacity of the vehicle (in this case, the maximum daily load and time). The analysis examines all pairs of deliveries one by one, starting from the highest saving and then moving on to the next. Deliveries are thus gradually aggregated onto the available vehicles, whose routes, once filled, are optimized to minimize travel time and thus guarantee a tangible saving of resources for the company.
To test this last module, the path was computed as a Travelling Salesman Problem, simply emptying the bins placed with the 3rd module (Table 8). For this purpose it is necessary to fix the place of departure and arrival of the collection vehicles (in this case, collection of the organic fraction using a Porter-type tank vehicle) and the type of vehicle used (Table 7).
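The three steps of the savings matrix method (distance matrix, savings matrix, allocation to vehicles) can be sketched in Python as follows. This is a simplified savings heuristic written for illustration, not the Authors' Java planner: it considers only a load capacity (no time constraint), merges routes only at their end points, and does not re-optimize each route afterwards. All function names, coordinates and demands are hypothetical.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def savings_routes(depot, points, demand, capacity):
    """Group collection points into routes with a savings heuristic.

    points: dict name -> (x, y); demand: dict name -> load; capacity: vehicle limit.
    Returns a list of routes, each a list of names (the depot is implied at both ends).
    """
    # Start with one out-and-back route per point: DC -> point -> DC.
    routes = {name: [name] for name in points}   # route id -> ordered stops
    route_of = {name: name for name in points}   # stop -> route id
    load = dict(demand)                          # route id -> total load

    # Savings matrix: S(A, B) = Dist(DC, A) + Dist(DC, B) - Dist(A, B).
    savings = sorted(
        ((dist(depot, points[a]) + dist(depot, points[b])
          - dist(points[a], points[b]), a, b)
         for a, b in combinations(points, 2)),
        reverse=True,
    )

    # Examine the pairs from the highest saving downwards.
    for s, a, b in savings:
        ra, rb = route_of[a], route_of[b]
        if s <= 0 or ra == rb or load[ra] + load[rb] > capacity:
            continue
        first, second = routes[ra], routes[rb]
        # Orient both routes so that a ends the first and b starts the second.
        if first[-1] != a:
            first = first[::-1]
        if second[0] != b:
            second = second[::-1]
        if first[-1] != a or second[0] != b:
            continue  # a or b is an interior stop: the routes cannot be joined here
        merged = first + second
        routes[ra] = merged
        load[ra] += load.pop(rb)
        del routes[rb]
        for stop in merged:
            route_of[stop] = ra
    return list(routes.values())


if __name__ == "__main__":
    # HYPOTHETICAL data: four collection points, unit loads, depot at the origin.
    pts = {"A": (0, 10), "B": (1, 10), "C": (10, 0), "D": (10, 1)}
    loads = {name: 1 for name in pts}
    print(savings_routes((0, 0), pts, loads, capacity=2))
```

Sorting the savings once and scanning them in decreasing order mirrors the description above, where pairs of deliveries are examined one by one starting from the highest saving.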
Table 7: Points collection object routing
Table 8: Points collection object routing
The results of the module were compared with those calculated mathematically according to the Traveling Salesman Problem, and the two overlapped with a satisfactory approximation.
4. Conclusion
The simulation model developed by the Authors gave good results in the validation phase, based on a real case study. The four modules were tested one by one; since they have independent functionalities, an integration test is implicitly passed. In any case, before applying the model in the town used as the pilot case study, which has shown interest in using the simulator permanently as a DSS, the Authors will perform a further integration test to measure the adaptability of the simulation model to different boundary conditions. In the coming months, the simulation model presented here will in fact be operating as a DSS in at least four towns in Italy.
References
[1] Applegate, D. L.; Bixby, R. M.; Chvátal, V.; Cook, W. J. (2006), “The Traveling Salesman Problem”, ISBN 0691129932.
[2] Briano E., Caballini C., Revetria R., Schenone M., Testa A. (2010); “Use of System Dynamics for modelling customers flows from residential areas to selling centers”, Proceedings of WSEAS ACMOS’10, Catania, Italy, May 29-31.
[3] Cantarella, G. E., and A. Sforza, Traffic assignment, in Concise Encyclopedia of Traffic and Transportation Systems (ed. Papageorgiou, M.), Pergamon Press, Oxford, 1991, pp. 513-520.
[4] Christofides, N. (1976), “Worst-case analysis of a new heuristic for the travelling salesman problem”, Technical Report 388, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh.
[5] Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. (2001), "The traveling-salesman problem", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 1027–1033, ISBN 0-262-03293-7.
[6] Land A.H., Doig A.G. (2010), “An Automatic Method for Solving Discrete Programming Problems”; in “50 Years of Integer Programming 1958-2008”, Springer Berlin Heidelberg, pp. 105-132; ISBN 978-3-540-68274-5.
[7] Potts, R.B., and R.M. Oliver, Flows in Transportation Networks, 1972, Academic Press, New York.
[8] Sheffi, Y., Urban Transportation Networks, 1985, Prentice Hall, Englewood Cliffs, New Jersey.
[9] Zhang G., Patuwo B.E., Hu M.Y. (1998); “Forecasting with artificial neural networks: The state of the art”; International Journal of Forecasting, Volume 14, Issue 1, 1 March 1998, pp. 35-62.