1
Improved genetic algorithm optimization of water distribution
2
system design by incorporating domain knowledge
3
Bi W., Dandy G. C. and Maier H. R.
4
Bi W.: Corresponding author, PhD student, School of Civil, Environmental and Mining
5
Engineering, University of Adelaide, Adelaide, South Australia, 5005, Australia.
6
[email protected]. Tel: +61-8-83136139.
7 8
Postal address: N223, Engineering North, School of Civil, Environmental and Mining
9
Engineering, North Terrace Campus, University of Adelaide, Adelaide, South Australia,
10
5005, Australia.
11 12
Dandy G. C.: Professor, School of Civil, Environmental and Mining Engineering,
13
University
14
[email protected].
of
Adelaide,
Adelaide,
South
Australia,
5005,
Australia.
15 16
Maier H. R.: Professor, School of Civil, Environmental and Mining Engineering,
17
University
18
[email protected].
of
Adelaide,
Adelaide,
South
Australia,
5005,
Australia.
19 20 21 22 23
Citation: Bi W., Dandy G.C. and Maier H.R. (2015) Improved genetic algorithm optimization of water distribution system design by incorporating domain knowledge, Environmental Modelling and Software, 69, 370-381, DOI: 10.1016/j.envsoft.2014.09.010
1
24
Abstract: Over the last two decades, evolutionary algorithms (EAs) have become a popular
25
approach for solving water resources optimization problems. However, the issue of low
26
computational efficiency limits their application to large, realistic problems. This paper
27
uses the optimal design of water distribution systems (WDSs) as an example to illustrate
28
how the efficiency of genetic algorithms (GAs) can be improved by using heuristic domain
29
knowledge in the sampling of the initial population. A new heuristic procedure called the
30
Prescreened Heuristic Sampling Method (PHSM) is proposed and tested on seven WDS
31
cases studies of varying size. The EPANet input files for these case studies are provided as
32
supplementary material. The performance of the PHSM is compared with that of another
33
heuristic sampling method and two non-heuristic sampling methods. The results show that
34
PHSM clearly performs best overall, both in terms of computational efficiency and the
35
ability to find near-optimal solutions. In addition, the relative advantage of using the PHSM
36
increases with network size.
37
Keywords: Optimization; Genetic algorithms, Water distribution systems; domain
38
knowledge; heuristics; computational efficiency.
39
2
40
1. Introduction
41
Evolutionary algorithms (EAs) have been used successfully and extensively for solving
42
water resources optimization problems in a number of areas, such as engineering design,
43
the development of management strategies and model calibration (Nicklow et al., 2010;
44
Zecchin et al., 2012). However, a potential shortcoming of EAs is that they are
45
computationally inefficient, especially when applied to problems of realistic size.
46
Consequently, there is a need to improve the computational efficiency of EAs to make them
47
easier to use for the optimization of realistic water resources problems (Maier et al., 2014).
48
One application area where this is the case of is the design of water distribution systems
49
(WDSs) (Marchi et al., 2014; Stokes et al., 2014). Over the past two decades, a variety of
50
EAs have been applied to this problem, as detailed in Zheng et al. (2013a). Among these,
51
genetic algorithms (GAs) have been used most extensively (Simpson et al., 1994; Dandy et
52
al., 1996, Gupta et al., 1999; Vairavamoorthy et al., 2005; Krapivka and Ostfeld, 2009;
53
Kang and Lansey, 2012; Zheng et al., 2013b). However, GAs have been primarily applied
54
to relatively simple benchmark problems, such as the 14-pipe problem (Simpson et al.,
55
1994), the New York Tunnels problem with 21 tunnels (Dandy et al., 1996), and the Hanoi
56
problem with 34 pipes (Zheng et al., 2011a). In recent years, there has been a move towards
57
increasing the complexity and realism of the case studies to which GAs are applied,
58
including the Balerma network with 454 pipes (Reca and Martínez, 2006), the Rural
59
network with 476 pipes (Marchi et al., 2012), the BWN-II network with 433 pipes (Zheng
60
et al., 2013c), and the network used by Kang and Lansey (2012), which has 1274 pipes and
61
will be referred to as the “KL” network for the remainder of this paper.
62
Increased network size and complexity result in significant challenges in terms of
63
achieving good quality near-optimal solutions given the computational budgets that are
64
typically available in practice (di Pierro et al., 2009; Fu et al., 2012). This is because (i) the
65
time for hydraulic simulation increases appreciably for large WDSs; and (ii) the complexity
66
and size of the search space associated with a large WDS are increased significantly. As a
67
result, computational efficiency has been identified as a key concern for the widespread
68
uptake of GAs for the optimization of large, real-world WDSs (di Pierro et al., 2009).
3
69
In order to address this issue, two main approaches have been adopted in the literature.
70
As part of the first approach, it is argued that for large, real problems, the focus should be
71
on finding the best possible solution within a realistic computational budget, rather than on
72
attempting to find the global optimal solution (e.g. Tolson and Shoemaker, 2007; Gibbs et
73
al., 2008; Tolson et al., 2009; Gibbs et al., 2010, 2011). This is because for such large
74
problems, the global optimal solution is unlikely to be found within a reasonable
75
computational timeframe.
76
As part of the second approach, efforts have been made to increase the computational
77
efficiency of the optimization process. This has been done in a number of ways, including
78
the use of increased computational power, such as parallel and distributed computing (Wu
79
and Zhu, 2009; Roshani and Filion, 2012; Wu and Behandish, 2012), the use of surrogate-
80
and meta-modeling to speed up the simulation process (e.g. Broad et al,. 2005; di Pierro et
81
al., 2009; Broad et al., 2010; Razavi et al., 2012), and the seeding of the initial population
82
of EAs with good solutions obtained using a variety of analytical techniques (e.g. Keedwell
83
and Khu, 2006; Zheng et al., 2011a; Fu et al., 2012; Zheng et al. 2014(a,b,c)). It should be
84
noted that similar concepts have recently also been used in conjunction with other
85
optimization techniques (e.g. Zhang et al., 2013; Housh et al., 2013) and non-optimization
86
based WDS design approaches (e.g. Sitzenfrei et al, 2013).
87
Although some of the methods mentioned above utilize engineering knowledge in their
88
development (e.g. Keedwell and Khu, 2006; Zheng et al., 2011a), there have been limited
89
attempts to incorporate engineering knowledge and experience directly. Only Kang and
90
Lansey (2012) have combined engineering experience with GAs in order to increase the
91
computational efficiency of the optimization process. This was achieved by seeding half of
92
the initial GA population with solutions that result in flow velocities below a threshold
93
selected from within a pre-defined velocity range. However, the approach has only been
94
applied to a single case study thus far and its relative performance has not yet been assessed
95
in a rigorous and comprehensive manner. In addition, the approach has a number of
96
potential shortcomings. Firstly, selection of an appropriate range for the velocity threshold
97
is subjective, which might make the method difficult to apply and could result in
98
inconsistent results from repeated, independent implementation of the method. Secondly,
99
pipe sizes that result in appropriate velocities are determined using a structured trial-and4
100
error process. However, in practice, pipe sizes generally reduce with distance from the
101
source (Walski, 2001). Consequently, there exists an opportunity to incorporate this domain
102
knowledge into the initial pipe sizing process. Finally, there is limited control over
103
population diversity, as this is achieved by seeding the initial population with 50% of
104
randomly generated solutions and 50% of the solutions obtained based on engineering
105
experience.
106
In order to address these shortcomings, the objectives of this paper are (i) to introduce a
107
new heuristic sampling method for determining the initial population of GAs for the least-
108
cost design of WDSs that is based on engineering experience / domain knowledge and that
109
overcomes the potential shortcomings of the method of Kang and Lansey (2012); and (ii) to
110
provide a rigorous assessment of the performance of this method compared with that of
111
Kang and Lansey’s sampling method (KLSM) and two sampling methods that do not
112
consider any domain knowledge (i.e. random sampling (RS) and Latin hypercube sampling
113
(LHS)) on seven WDS design case studies of varying size and complexity.
114
The remainder of this paper is organized as follows. The proposed heuristic, domain
115
knowledge based sampling method for determining the initial population of GAs for the
116
least-cost design of WDSs is introduced in next section, followed by the methodology for
117
assessing the performance of this method against that of the KLSM and the two non-
118
heuristic sampling methods. Next, the results are presented and discussed, followed by a
119
summary and conclusions.
120
2. Proposed prescreened heuristic sampling method for WDS design
121
The proposed heuristic sampling method for initializing the population of GAs for the
122
least-cost design of WDSs based on domain knowledge is called the Prescreened Heuristic
123
Sampling Method (PHSM). It uses a three-step procedure that (i) selects pipe sizes based
124
on knowledge that pipe diameters generally get smaller the further they are from the source;
125
(ii) dynamically adjusts the velocity threshold to account for the fact that appropriate
126
velocity thresholds are likely to be network dependent; and (iii) enables the diversity of the
127
initial population to be controlled by sampling from distributions centred on the solutions
128
determined using the heuristic procedures in (i) and (ii). The PHSM has some similarities to
129
the KLSM in that it aims to find initial pipe sizes that restrict flow velocities to lie within 5
130
certain ranges. However, it overcomes the potential limitations of the KLSM outlined in the
131
Introduction. Details of the three steps of the PHSM are given below.
132
Step 1: Assign pipe diameters based on distances between demand nodes and supply
133
sources
134
As mentioned above, the first step of the PHSM is motivated by the knowledge that, in
135
real WDSs, the diameters of upstream pipes are generally larger than those further
136
downstream (Walski, 2001). However, for WDSs, each demand node usually has a number
137
of different paths that connect it to the supply source (reservoir). This indicates that the
138
spatial distance between each demand node and the reservoir may vary according to the
139
paths selected to deliver the required demands. In the proposed method, the shortest
140
delivery path to each demand node is selected and used to represent the spatial distance
141
between that node and the source node. The rationale behind this is that it has been
142
demonstrated that the majority of the demand at a node is supplied by the path with the
143
shortest distance for an optimal design of WDSs (Zheng et al., 2011a). The detailed process
144
of step 1 of the PHSM is as follows:
145
i: Find the shortest distance to a reservoir in the water network, li for each node i
146
(i=1,2…..,n, where n is the total number of demand nodes in the network) using
147
the Dijkstra algorithm (Deo 1974). When dealing with a water network with
148
multiple reservoirs, an augmented source node is created to connect all the
149
reservoirs to enable the determination of li following Deuerlein (2008) and
150
Zheng et al. (2011a).
151
ii: Obtain the largest value of the shortest distance L by L=max(li).
152
iii: Divide the network into P specific areas with the shortest distance to the source
153
node interval of L/P, where P is the number of available pipe diameters for the
154
design.
155
iv: Assign pipes in each area a different diameter, with the largest diameter
156
assigned to the pipes in the area nearest to the source and the smallest diameter
157
to the pipes in the area furthest from the source (reservoir). All pipes in a single
158
area are assigned the same diameter.
6
159
For example, for the WDS introduced by Zheng et al. (2011a), which has 164 pipes
160
(Figure 1), the largest shortest distance of all nodes (L) is obtained after steps i and ii. If
161
there are five diameter options for this network (i.e. P=5), the network will be divided into
162
five areas in step iii. In order to do this (i) all nodes that have a shortest-distance that is not
163
greater than L/P (i.e. 0