Water Resour Manage (2011) 25:1537–1544 DOI 10.1007/s11269-010-9759-9
Genetic Programming for Predicting Longitudinal Dispersion Coefficients in Streams
Hazi Mohammad Azamathulla · Aminuddin Ab. Ghani
Received: 20 May 2010 / Accepted: 5 December 2010 / Published online: 23 December 2010 © Springer Science+Business Media B.V. 2010
Abstract This paper presents a genetic programming (GP) approach to predicting longitudinal dispersion coefficients in natural streams. Published dispersion-coefficient data covering a wide range of flow conditions were compiled from the literature and used to develop and test the proposed method. The proposed GP approach produced excellent results (R² = 0.98 and RMSE = 0.085) compared with existing predictors of the dispersion coefficient (Rajeev and Dutta, Hydrol Res 40(6):544–552, 2009: R² = 0.345 and RMSE = 1778.6).

Keywords Streams · Rivers · Dispersion · Pollutants · GP

Notations
B, W  Width (m)
H  Depth (m)
U  Velocity (m/s)
U∗  Shear velocity (m/s)
Kx  Longitudinal dispersion coefficient (m²/s)
H. Md. Azamathulla (corresponding author) · A. Ab. Ghani
River Engineering and Urban Drainage Research Centre (REDAC), Universiti Sains Malaysia, Engineering Campus, Seri Ampangan, 14300 Nibong Tebal, Pulau Pinang, Malaysia

1 Introduction
The longitudinal dispersion of pollutants in rivers is of crucial concern to hydraulic and environmental engineers who design outfalls or water intakes and evaluate the risks of accidental releases of hazardous contaminants (Deng et al. 2001). Many researchers have studied the mechanisms of longitudinal dispersion in rivers, beginning with the simplest case, the dispersion of dissolved contaminants in pipe flow (Ahsan 2008). The concept of dispersion was later extended to mixing in constructed channels and then to natural streams, and many theoretical and empirical formulations have been proposed to determine the longitudinal dispersion coefficient (Kx). In the present study, GP is proposed as an alternative approach for estimating Kx in natural streams. The fitness of the models was tested against observed dispersion coefficients reported in the literature for various natural streams. These published results show that Kx varies within a wide range (1.9–2,883.5 m²/s) (Azamathulla and Wu 2010).

Accurate estimation of Kx is important in many applied hydraulic problems, such as river engineering, environmental engineering, intake design, estuary problems, and the risk assessment of hazardous pollutants and contaminants injected into river flows (Sedighnezhad et al. 2007; Cheong et al. 2007; Seo and Bake 2004). Investigating water-quality conditions in natural rivers with a one-dimensional (1D) mathematical model requires a reliable estimate of Kx (Fischer et al. 1979). When measurements of mixing processes in a river are available, Kx can be determined directly. In rivers where mixing and dispersion data are not available, however, alternative methods must be used to estimate Kx (Kashefipur and Falconer 2002). Owing to the complexity of mixing in natural rivers, Kx cannot be estimated exactly in such cases, so several linear regression equations have been proposed for this purpose (e.g., Deng et al. 2001). The empirical equations for estimating Kx in natural rivers (Seo and Cheong 1998) are listed in Table 1; applying them requires hydraulic and geometric data for the river (Azamathulla and Wu 2010).
These equations are valid only within the ranges of flow and geometry conditions for which they were calibrated; outside these ranges, their results carry large uncertainties. The main objective of this study is to develop a soft-computing genetic programming (GP) model to estimate dispersion coefficients and to assess its accuracy against natural-river data.

Table 1 Empirical equations for estimation of longitudinal dispersion coefficient (Riahi-Madvar et al. 2009)
Reference | Equation | Author | R² | RMSE
Tayfour and Singh (2005) | $K_x = 5.93\,HU_*$ | Elder (1959) | 0.38 | 1,450.34
Deng et al. (2001) | $K_x = 0.58\,(H/U)^2\,UB$ | McQuivey and Keefer (1974) | 0.47 | 882.45
Fischer et al. (1979) | $K_x = 0.011\,U^2B^2/(HU_*)$ | Fisher (1967) | 0.43 | 960.25
Seo and Bake (2004) | $K_x = 0.55\,BU_*/H^2$ | Li et al. (1998) | 0.44 | 920.56
Seo and Bake (2004) | $K_x = 0.18\,(U/U_*)^{0.5}(B/H)^2\,HU_*$ | Liu (1977) | 0.37 | 1,989.45
Tavakollizadeh and Kashefipur (2007) | $K_x = 2.0\,(B/H)^{1.5}\,HU_*$ | Iwasa and Aya (1991) | 0.38 | 1,479.67
Seo and Cheong (1998) | $K_x = 5.92\,(U/U_*)^{1.43}(B/H)^{0.62}\,HU_*$ | Seo and Cheong (1998) | 0.41 | 1,045.43
Sedighnezhad et al. (2007) | $K_x = 0.6\,(B/H)^2\,HU_*$ | Koussis and Rodriguez-Mirasol (1998) | 0.39 | 1,567.89
FaghforMaghrebi and Givehchi (2007) | $K_x = 0.2\,(B/H)^{1.3}(U/U_*)^{1.2}\,HU_*$ | Li et al. (1998) | 0.35 | 1,792.45
Rajeev and Dutta (2009) | $K_x/(HU_*) = 2\,(W/H)^{0.96}(U/U_*)^{1.25}$ | Rajeev and Dutta (2009) | 0.45 | 895.56
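To make the Table 1 predictors concrete, the Python sketch below transcribes eight of the tabulated equations and evaluates them for one set of inputs. It is illustrative only: the function names are hypothetical, B and W are both taken to mean channel width, and the Table 2 averages are used as input even though they do not describe any single real stream. The two equations with coefficients 0.58 and 0.55 are omitted because, as printed, they are not dimensionally homogeneous (they yield m²·s and 1/s rather than m²/s), and guessing their intended form would be speculation.

```python
"""Evaluate several empirical Kx predictors transcribed from Table 1 (illustrative sketch).

B is the channel width (m), H the flow depth (m), U the cross-sectional mean
velocity (m/s) and U_star the shear velocity (m/s); every function returns Kx in m^2/s.
Function names are hypothetical labels, not an existing library API.
"""


def elder(B, H, U, U_star):
    return 5.93 * H * U_star


def fisher(B, H, U, U_star):
    return 0.011 * U**2 * B**2 / (H * U_star)


def liu(B, H, U, U_star):
    return 0.18 * (U / U_star) ** 0.5 * (B / H) ** 2 * H * U_star


def iwasa_aya(B, H, U, U_star):
    return 2.0 * (B / H) ** 1.5 * H * U_star


def seo_cheong(B, H, U, U_star):
    return 5.92 * (U / U_star) ** 1.43 * (B / H) ** 0.62 * H * U_star


def koussis_rodriguez_mirasol(B, H, U, U_star):
    return 0.6 * (B / H) ** 2 * H * U_star


def li_et_al(B, H, U, U_star):
    return 0.2 * (B / H) ** 1.3 * (U / U_star) ** 1.2 * H * U_star


def rajeev_dutta(B, H, U, U_star):
    # Table 1 states this predictor in dimensionless form, Kx/(H U*); rescale it to Kx.
    return 2.0 * (B / H) ** 0.96 * (U / U_star) ** 1.25 * H * U_star


PREDICTORS = {
    "Elder (1959)": elder,
    "Fisher (1967)": fisher,
    "Liu (1977)": liu,
    "Iwasa and Aya (1991)": iwasa_aya,
    "Seo and Cheong (1998)": seo_cheong,
    "Koussis and Rodriguez-Mirasol (1998)": koussis_rodriguez_mirasol,
    "Li et al. (1998)": li_et_al,
    "Rajeev and Dutta (2009)": rajeev_dutta,
}

if __name__ == "__main__":
    # Assumed input: the Table 2 averages (width, depth, velocity, shear velocity).
    B, H, U, U_star = 59.86, 3.69, 0.71, 0.095
    for name, predictor in PREDICTORS.items():
        print(f"{name:38s} Kx = {predictor(B, H, U, U_star):10.2f} m^2/s")
```

Running it shows how widely the empirical formulas disagree for identical inputs, which is consistent with the low R² values reported in Table 1.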
2 Overview of Genetic Programming
Genetic programming (GP), a branch of the genetic algorithm (GA) (Holland 1975), is a method for learning the fittest computer programs by means of artificial evolution (Johari et al. 2006). GP initializes a population of random members known as chromosomes (individuals), and the fitness of each chromosome is evaluated with respect to a target value. The principle of Darwinian natural selection is then used to select and reproduce fitter programs. GP evolves computer programs of equal or unequal length, composed of variables (the terminal set) and mathematical operators (the function set), as the solution. The function set can contain arithmetic operations (+, −, /, and *) and function calls (such as e^x, x, sin, cos, tan, lg, sqrt, ln, and power). In linear GP, each instruction implicitly includes an assignment to a variable, which facilitates the use of multiple program outputs, whereas in tree-based GP such side effects must be incorporated explicitly (Brameier and Banzhaf 2001). The present GP uses two-point crossover on variable-length strings: a segment of random position and random length is selected in both parents and exchanged between them. If one of the resulting children would exceed the maximum length, the crossover is abandoned and restarted by exchanging segments of equal length (Brameier and Banzhaf 2001). Mutation changes an operand or an operator of an instruction into another symbol from the same set. The fitness of a GP individual may be computed using the equation:
$$f = \sum_{j=1}^{N} \left| X_j - Y_j \right| \qquad (1)$$
where $X_j$ is the value returned by a chromosome for fitness case $j$, and $Y_j$ is the expected value for fitness case $j$. In GP, the maximum program size is usually restricted to prevent programs from growing without bound (Brameier and Banzhaf 2001); the size limit adopted here (Table 3) was tested for the proposed GP model and found sufficient. The best individual (program) of a trained GP can be converted into a functional representation by successive replacement of variables, starting with the last effective instruction (Oltean and Groşan 2003).

Only a few studies in the literature apply GP to water resources engineering, and its application to hydraulic processes in natural channels has been limited. Savic et al. (1999) used GP for rainfall–runoff modeling. Davidson et al. (1999) and Babovic and Keijzer (2000) derived empirical relationships for friction in turbulent pipe flow and for the additional flow resistance induced by flexible vegetation, respectively. Keijzer and Babovic (2002) derived empirical equations from real-world hydraulic data, Giustolisi (2004) determined the Chézy resistance coefficient in corrugated channels, and Kizhisseri et al. (2005) sought a better correlation between the temporal pattern of the flow field and sediment transport using numerical model results and field data. More recently, Azamathulla et al. (2010) applied GP to predict bridge-pier scour.
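The GP operators described in this section can be illustrated with a minimal linear GP in Python. This is a sketch under stated assumptions, not the software used in the study: the register-machine layout, the four-operator function set, steady-state tournament selection, and all names (`run`, `fitness`, `crossover`, `mutate`, `evolve`) are illustrative choices. It implements the fitness of Eq. (1), two-point crossover of variable-length programs with the equal-length fallback, and single-symbol mutation. The crossover and mutation frequencies default to the Table 3 values, while the population is kept smaller than in Table 3 so that the toy example runs quickly.

```python
import math
import random

OPS = ("+", "-", "*", "/")
N_REG = 4        # hypothetical register count; r0 doubles as the output register
MAX_LEN = 256    # maximum program length (Table 3)


def random_instruction(rng):
    # An instruction is (destination, operator, operand 1, operand 2); operands are registers.
    return (rng.randrange(N_REG), rng.choice(OPS), rng.randrange(N_REG), rng.randrange(N_REG))


def run(program, inputs):
    """Execute a linear program; every instruction assigns its result to a register."""
    r = [float(inputs[i % len(inputs)]) for i in range(N_REG)]  # inputs copied cyclically
    for dest, op, a, b in program:
        x, y = r[a], r[b]
        if op == "+":
            r[dest] = x + y
        elif op == "-":
            r[dest] = x - y
        elif op == "*":
            r[dest] = x * y
        else:
            r[dest] = x / y if abs(y) > 1e-9 else 1.0  # protected division
    return r[0]


def fitness(program, cases):
    """Eq. (1): sum of |X_j - Y_j| over the fitness cases (lower is fitter)."""
    total = 0.0
    for inputs, target in cases:
        out = run(program, inputs)
        if not math.isfinite(out):
            return math.inf
        total += abs(out - target)
    return total


def crossover(p1, p2, rng, max_len=MAX_LEN):
    """Two-point crossover: exchange a random-position, random-length segment."""
    s1, s2 = rng.randrange(len(p1)), rng.randrange(len(p2))
    l1, l2 = rng.randint(1, len(p1) - s1), rng.randint(1, len(p2) - s2)
    if len(p1) - l1 + l2 > max_len or len(p2) - l2 + l1 > max_len:
        l1 = l2 = min(l1, l2)  # a child would be too long: swap equal-length segments
    c1 = p1[:s1] + p2[s2:s2 + l2] + p1[s1 + l1:]
    c2 = p2[:s2] + p1[s1:s1 + l1] + p2[s2 + l2:]
    return c1, c2


def mutate(program, rng):
    """Replace one field of one instruction with another symbol from the same set."""
    child = list(program)
    i = rng.randrange(len(child))
    fields = list(child[i])
    k = rng.randrange(4)
    fields[k] = rng.choice(OPS) if k == 1 else rng.randrange(N_REG)
    child[i] = tuple(fields)
    return child


def evolve(cases, pop_size=50, generations=1000, p_cx=0.5, p_mut=0.96, init_len=8, seed=0):
    rng = random.Random(seed)
    pop = [[random_instruction(rng) for _ in range(rng.randint(1, init_len))]
           for _ in range(pop_size)]
    fit = [fitness(p, cases) for p in pop]
    for _ in range(generations):
        # Assumed steady-state tournament of four: the two fittest breed, the two worst are replaced.
        win1, win2, lose1, lose2 = sorted(rng.sample(range(pop_size), 4), key=fit.__getitem__)
        c1, c2 = pop[win1], pop[win2]
        if rng.random() < p_cx:
            c1, c2 = crossover(c1, c2, rng)
        if rng.random() < p_mut:
            c1 = mutate(c1, rng)
        if rng.random() < p_mut:
            c2 = mutate(c2, rng)
        pop[lose1], fit[lose1] = c1, fitness(c1, cases)
        pop[lose2], fit[lose2] = c2, fitness(c2, cases)
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


if __name__ == "__main__":
    # Toy target y = x0 * x1 + x0 on a small grid (not the dispersion data).
    cases = [((x0, x1), x0 * x1 + x0) for x0 in (1.0, 2.0, 3.0) for x1 in (0.5, 1.5, 2.5)]
    program, f = evolve(cases)
    print(f"best fitness {f:.4f} with {len(program)} instructions")
```

Because only the two weakest members of each tournament are replaced, the best fitness in the population can never get worse from one generation to the next.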
Table 2 Range of collected data (Toprak and Cigizoglu 2008)
Parameter | Max value | Min value | Avg. value
Flow width, W (m) | 711.20 | 11.89 | 59.86
Flow depth, H (m) | 25.1 | 0.22 | 3.69
Average flow velocity, U (m/s) | 2.23 | 0.034 | 0.71
Shear velocity, U∗ (m/s) | 0.553 | 0.0024 | 0.095
Kx (m²/s) | 2,883.5 | 1.9 | 223.1
2.1 Genetic Programming to Predict Dispersion Coefficient in Natural Channels
The GP model was built with the dimensionless inputs W/H and U/U∗ and the dimensionless output Kx/HU∗. Table 2 shows the range of the collected data and of the parameters measured in 2 rivers in the USA. The data were collected from Deng et al. (2001) and Toprak and Cigizoglu (2008). Of the collected data sets, about 70% (63 data sets) were used for training (chosen randomly until the best training performance was obtained), while the remaining 30% (33 data sets) were used for testing, or validating, the GP model for estimating dispersion coefficients. In this study, the four basic arithmetic operators (+, −, *, and /) and some basic mathematical functions (√, x², power, sin, and cos) were used. A large number of generations (5,000) was tested. The maximum size of each program was set to 256 instructions, with 64 instructions in the initial program. The function set and operational parameters used for GP modeling in this study are listed in Table 3. The simplified analytic form of the proposed GP model may be expressed as:
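d21 ecos d1+ d0+3.956 Kx d0∗ 10.76 Sin (d1d0) ∗ (d1d0) d1 − + + = e HU ∗ eSind0 1.037 d1 − 11.38     (2)

where d0 = W/H and d1 = U/U∗.

The performance of the models was evaluated with the coefficient of determination (R²) and the root mean square error (RMSE):

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (o_i - t_i)^2}{\sum_{i=1}^{N} (o_i - \bar{o})^2} \qquad (3)$$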
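The data preparation described in Section 2.1 can be sketched in Python as follows. The `Measurement` record, the helper names, and the single seeded shuffle are illustrative assumptions; the study's procedure of re-drawing the random split until the best training performance was obtained is not reproduced here. The helpers form the inputs d0 = W/H and d1 = U/U∗, the target Kx/(HU∗), and convert a dimensionless prediction back to Kx.

```python
import random
from dataclasses import dataclass


@dataclass
class Measurement:
    """One stream record (hypothetical container, units as in Table 2)."""
    W: float       # flow width (m)
    H: float       # flow depth (m)
    U: float       # cross-sectional mean velocity (m/s)
    U_star: float  # shear velocity (m/s)
    Kx: float      # measured longitudinal dispersion coefficient (m^2/s)


def to_dimensionless(m):
    """Return the GP inputs (d0, d1) = (W/H, U/U*) and the target Kx/(H U*)."""
    return (m.W / m.H, m.U / m.U_star), m.Kx / (m.H * m.U_star)


def to_kx(y, H, U_star):
    """Convert a dimensionless prediction Kx/(H U*) back to Kx in m^2/s."""
    return y * H * U_star


def random_split(records, n_train=63, seed=0):
    """Shuffle once and split into training and testing sets (63/33 for 96 records)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    avg = Measurement(W=59.86, H=3.69, U=0.71, U_star=0.095, Kx=223.1)  # Table 2 averages
    (d0, d1), y = to_dimensionless(avg)
    print(f"d0 = {d0:.2f}, d1 = {d1:.2f}, Kx/(H U*) = {y:.1f}")
```

Working in dimensionless groups lets a single expression cover streams whose widths and depths differ by orders of magnitude, as Table 2 shows.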
Table 3 Parameters of the optimized GP model
Parameter | Description of parameter | Setting of parameter
p1 | Function set | +, −, *, /, √, power
p2 | Population size | 250
p3 | Mutation frequency (%) | 96
p4 | Crossover frequency (%) | 50
p5 | Number of replications | 10
p6 | Block mutation rate (%) | 30
p7 | Instruction mutation rate (%) | 30
p8 | Instruction data mutation rate (%) | 40
p9 | Homologous crossover (%) | 95
p10 | Program size | Initial 64, maximum 256
Table 4 Performance of GP
GP | R² | RMSE
Training | 0.99 | 0.0046
Testing | 0.986 | 1.3333
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (o_i - t_i)^2}{N}} \qquad (4)$$
where $t_i$ denotes the target values of Kx/HU∗, $o_i$ and $\bar{o}$ denote the observed values and the mean of the observed values of Kx/HU∗, respectively, and $N$ is the number of data points.
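Equations (3) and (4) can be computed with a few lines of plain Python. The sketch below follows the printed definitions literally, with o the observed series, t the target series, and ō the mean of o; the function names are illustrative, and the explicit checks for empty, mismatched, or constant inputs are an added assumption to avoid silent division by zero.

```python
import math


def r_squared(observed, target):
    """Eq. (3): 1 - sum((o_i - t_i)^2) / sum((o_i - o_mean)^2)."""
    n = len(observed)
    if n == 0 or n != len(target):
        raise ValueError("need two equally long, non-empty sequences")
    o_mean = sum(observed) / n
    sse = sum((o - t) ** 2 for o, t in zip(observed, target))
    sst = sum((o - o_mean) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - sse / sst


def rmse(observed, target):
    """Eq. (4): sqrt(sum((o_i - t_i)^2) / N)."""
    n = len(observed)
    if n == 0 or n != len(target):
        raise ValueError("need two equally long, non-empty sequences")
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(observed, target)) / n)


if __name__ == "__main__":
    o = [1.0, 2.0, 3.0, 4.0]
    t = [2.0, 2.0, 3.0, 4.0]
    print(r_squared(o, t), rmse(o, t))
```

In this small example a single error of one unit gives a residual sum of squares of 1 against a total sum of squares of 5, so R² = 0.8 and RMSE = 0.5.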
3 Results and Discussion
The empirical equations listed in Table 1 were applied to the full compiled data set, and their predictions of Kx were compared with the measured values. None of these empirical equations performs well; all show considerable errors relative to the measured data (Table 1). The statistical indexes reveal the poor performance (large uncertainties) of four empirical equations for predicting longitudinal dispersion coefficients (Rajeev and Dutta (2009), R² = 0.45; Fischer et al. (1979), R² = 0.43; Seo and Cheong (1998), R² = 0.41; and Sedighnezhad et al. (2007), R² = 0.39). Fig. 1 shows substantial scatter between observed and predicted longitudinal dispersion coefficients for these four empirical equations; none of the existing predictors yields accurate dispersion coefficients.

The training and testing results of the GP model are presented in Figs. 2 and 3, respectively, and its statistical performance is summarized in Table 4. The GP model predicted the longitudinal dispersion coefficient in natural rivers very accurately (R² = 0.998 and RMSE = 0.0456) when compared with previous researchers' results of R² = 0.98 and RMSE = 0.085 for the testing data. With advances in computer hardware and software, applying soft-computing tools should not pose problems even in complex applications.

Fig. 1 Observed versus predicted longitudinal dispersion coefficient using linear regression analyses by different researchers

Fig. 2 Comparison of observed versus predicted Kx/HU∗ for training data using GP

Fig. 3 Comparison of observed versus predicted Kx/HU∗ for testing data using GP
4 Conclusions
A genetic programming approach was used to derive a new expression for predicting the longitudinal dispersion coefficient (Kx) in natural rivers. The expression uses selected geometric parameters (river width and flow depth) and hydraulic parameters (cross-sectional mean velocity and shear velocity). The new GP expression was evaluated by comparing its predictions with those of other reported expressions, using previously published data (R² = 0.99 and RMSE = 0.046). The comparison shows that the new expression has the lowest RMSE and the highest coefficient of determination. The expression is found to be especially suited to wide rivers, where its predictions are very close to the measured dispersion coefficients. These results indicate that practicing engineers can substantially improve their designs and evaluations by using modern data-driven approaches such as GP, in place of traditional statistical methods, to predict longitudinal dispersion coefficients in natural rivers. Computing resources have also expanded dramatically over the past 20 years, and their continued growth is expected to make robust methods such as GP increasingly efficient to apply.
References
Ahsan N (2008) Estimating the coefficient of dispersion for a natural stream. World Acad Sci Eng Technol 44:131–135
American Society of Civil Engineers (ASCE) Task Committee (2000) The ASCE Task Committee on application of artificial neural networks in hydrology. J Hydrol Eng 5(2):115–137
Azamathulla HMd, Ghani AA, Zakaria NA, Aytac G (2010) Genetic programming to predict bridge pier scour. ASCE J Hydraul Eng 136(3):165–169
Azamathulla HM, Wu FC (2010) Support vector machine approach for longitudinal dispersion coefficients in natural streams. Appl Soft Comput (in press)
Babovic V, Keijzer M (2000) Genetic programming as a model induction engine. J Hydroinform 2(1):35–60
Brameier M, Banzhaf W (2001) A comparison of linear genetic programming and neural networks in medical data mining. IEEE Trans Evol Comput 5:17–26
Chau KW (2000) Transverse mixing coefficient measurements in an open rectangular channel. Adv Environ Res 4:287–294
Cheong TS, Younis BA, Seo IW (2007) Estimation of key parameters in model for solute transport in rivers and streams. Water Resour Manage 27(7):1165–1186
Davidson JW, Savic DA, Walters GA (1999) Method for identification of explicit polynomial formulae for the friction in turbulent pipe flow. J Hydroinform 1(2):115–126
Deng ZQ, Singh VP, Bengtsson L (2001) Longitudinal dispersion coefficient in single channel streams. J Hydraul Eng 128(10):901–916
Elder JW (1959) The dispersion of marked fluid in turbulent shear flow. J Fluid Mech 5:544–560
FaghforMaghrebi M, Givehchi M (2007) Using non-dimensional velocity curves for estimation of longitudinal dispersion coefficient. In: Proceedings of the seventh international symposium river engineering, 16–18 October, Ahwaz, Iran, pp 87–96
Fisher BH (1967) The mechanics of dispersion in natural streams. J Hydraul Div ASCE 93(6):187–216
Fischer HB, List EJ, Koh RCY, Imberger J, Brooks NH (1979) Mixing in inland and coastal waters. Academic Press Inc, San Diego, pp 104–138
Giustolisi O (2004) Using genetic programming to determine Chèzy resistance coefficient in corrugated channels. J Hydroinform 6(3):157–173
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Iwasa Y, Aya S (1991) Predicting longitudinal dispersion coefficient in open channel flows. In: Proceedings of international symposium on environmental hydraulics, Hong Kong, pp 505–510
Johari A, Habibagahi G, Ghahramani A (2006) Prediction of soil-water characteristic curve using genetic programming. J Geotech Geoenviron Eng 32(5):661–665
Kashefipur SM, Falconer A (2002) Longitudinal dispersion coefficients in natural channels. Water Res 36(6):1596–1608
Keijzer M, Babovic V (2002) Declarative and preferential bias in GP-based scientific discovery. Genet Program Evolvable Machines 1(3):41–79
Kizhisseri AS, Simmonds D, Rafiq Y, Borthwick M (2005) An evolutionary computation approach to sediment transport modeling. In: Fifth international conference on coastal dynamics, Barcelona, Spain
Koussis AD, Rodriguez-Mirasol J (1998) Hydraulic estimation of dispersion coefficient for streams. J Hydraul Eng ASCE 124:317–320
Li ZH, Huang J, Li J (1998) Preliminary study on longitudinal dispersion coefficient for the gorges reservoir. In: Proceedings of the seventh international symposium environmental hydraulics, 16–18 December, Hong Kong, China
Liu H (1977) Predicting dispersion coefficient of stream. J Environ Eng Div ASCE 103(1):56–69
McQuivey RS, Keefer TN (1974) Simple method for predicting dispersion in streams. J Environ Eng Div ASCE 100(4):997–1011
Oltean M, Groşan C (2003) A comparison of several linear genetic programming techniques. Complex Syst 14(1):1–29
Rajeev RS, Dutta S (2009) Prediction of longitudinal dispersion coefficients in natural rivers using genetic algorithm. Hydrol Res 40(6):544–552
Riahi-Madvar H, Ayyoubzadeh SA, Khadangi E, Ebadzadeh MM (2009) An expert system for predicting longitudinal dispersion coefficient in natural streams by using ANFIS. Expert Syst 36(4):8589–8596
Savic AD, Walters AG, Davidson JW (1999) A genetic programming approach to rainfall-runoff modeling. Water Resour Manage 13:219–231
Sayre WW, Chang FM (1968) A laboratory investigation of the open channel dispersion process of dissolved, suspended and floating dispersants. US Geological Survey Professional Paper 433-E, p 71
Sedighnezhad H, Salehi H, Mohein D (2007) Comparison of different transport and dispersion of sediments in mard intake by FASTER model. In: Proceedings of the seventh international symposium on river engineering, Ahwaz, Iran, pp 45–54
Seo IW, Bake KO (2004) Estimation of the longitudinal dispersion coefficient using the velocity profile in natural streams. J Hydraul Eng 130(3):227–236
Seo IW, Cheong TS (1998) Predicting longitudinal dispersion coefficient in natural streams. J Hydraul Eng 124(1):25–32
Sullivan PJ (1968) Dispersion in a turbulent shear flow. PhD thesis, University of Cambridge, Cambridge, England
Tavakollizadeh A, Kashefipur SM (2007) Effects of dispersion coefficient on quality modeling of surface waters. In: Proceedings of the sixth international symposium river engineering, 16–18 October, Ahwaz, Iran, pp 67–78
Tayfour G, Singh VP (2005) Predicting longitudinal dispersion coefficient in natural streams by artificial neural network. J Hydraul Eng 131(11):991–1000
Toprak ZF, Cigizoglu HK (2008) Predicting longitudinal dispersion coefficient in natural streams by artificial intelligence methods. Hydrol Process 22:4106–4129