Information-based system identification for predicting ... - Springer Link

Information-based system identification for predicting the groundwater-level fluctuations of hillslopes

Yao-Ming Hong & Shiuan Wan

Abstract The analysis of pre-existing landslides and landslide-prone hillslopes requires an estimation of maximum groundwater levels. A rapid increase in groundwater level may be a dominant factor in evaluating the occurrence of landslides. System identification (the use of mathematical tools and algorithms for building dynamic models from measured data) is adopted in this study. The fluid mass-balance equation is used to model groundwater-level fluctuations, and the model is solved numerically using the finite-difference method. Entropy-based classification (EBC) is used as a data-mining technique to identify the appropriate ranges of influencing variables. The landslide area at Wushe Reservoir, Nantou County, Taiwan, is chosen as a field test site for verification. The study generated 65,535 sets of numbers for the groundwater-level variables of the governing equation, each of which is judged by its root mean square error. By applying cross-validation methods and EBC, limited numbers of validation samples are used to find the range of each parameter. For these ranges, a heuristic method is employed to find the best results of each parameter for the prediction model of groundwater level. The ranges for governing factors are evaluated and the resulting performance is examined.

Keywords Entropy-based classification . Groundwater-level fluctuation . Rainfall . Taiwan . Numerical modeling

Received: 13 July 2010 / Accepted: 9 June 2011 / Published online: 6 July 2011
© Springer-Verlag 2011
Hydrogeology Journal (2011) 19: 1135–1149
DOI 10.1007/s10040-011-0754-x

Y.-M. Hong
Department of Design for Sustainable Environment, Ming Dao University, 369 Wen-Hwa Rd, Chang hua, 52345 Taiwan, Republic of China
e-mail: [email protected]

S. Wan
Department of Information Management, Ling Tung University, 1 Ling Tung Rd, Taichung, 40852 Taiwan, Republic of China

Introduction

Heavy rainfall can induce landslides, mudslides, debris flows or flooding, which may result in enormous loss of life and damage to property. In recent years, there has been growing concern over the relationship between landslides, groundwater level and rainfall intensity. Groundwater level can be defined as the level below which the ground is wholly saturated with water. Generally speaking, rising groundwater level is one of the main factors contributing to the occurrence of landslides. For instance, Van Asch et al. (1999) considered that landslides initiated at 5–20 m depth are mostly triggered by rising groundwater level, which is induced by positive pore pressures on the slip plane. Schmidt and Dikau (2004) developed a process-based spatio-temporal model for groundwater variations and slope stability. Neaupane and Achet (2004) used antecedent rainfall, rainfall intensity, infiltration parameter, shear strength, groundwater level and steepness as input parameters to build a backpropagation neural network for landslide monitoring. Hence, precise measurement of groundwater level improves the accuracy of forecasting slope stability.

The literature contains several approaches to measuring parameters related to groundwater level. Among them, the water-table fluctuation method is the most widely used technique for estimating groundwater recharge (Healy and Cook 2002). Sophocleous (1991) combined a storm-based soil-water balance (lasting for several days) with the resulting rise in water table to obtain an effective storativity value of the region near the water table. Wu et al. (1996) found that the response of recharge to rainfall was very fast for a soil profile with a shallow water table, and the two were closely related. Malet et al. (2005) reported that rainfall together with rapid snowmelt induced a significant rise in groundwater level, which in turn led to excess pore-water pressure, thus causing instability. Park and Parker (2008) used a mass-balance equation to develop a simple physical model for quantifying groundwater fluctuations in response to rainfall time series. Lautz (2008) applied diurnal water-table fluctuations to monitor the riparian zone of Red Canyon Creek, Wyoming, USA. Gasca and Ross (2009) used the water-balance approach to model wetland water levels for the Pulborough Brooks site in West Sussex, UK.

As seen above, various methods have been developed for estimating the groundwater level, or for characterizing statistically the dynamics with a limited set of parameters. More recently, methods have also been formulated for identifying the properties that determine the dynamics of a groundwater system. These methods deal with the problem using system identification (which involves mathematical tools and algorithms for building dynamic models from measured data) or time-series analysis (Box and Jenkins 1970; Ljung 1987; Hipel and McLeod 1994), or less complicated multiple-regression methods. The groundwater system is seen as a black box that transforms series of observations of input variables into the output variable, the groundwater level. If groundwater systems are not disturbed by groundwater abstraction or other influences, the climatological conditions can be considered simply as a series of dynamic inputs. Accordingly, if a governing equation of fluid mass balance with an effective technique of system identification can be developed for measuring parameter values, it may possibly be the best solution for prediction of groundwater level. Unfortunately, the measurement of those parameters is tedious and sometimes even impossible.

A novel approach, data mining, offers a new perspective of searching the possible ranges of variables/factors to attain the groundwater level. In recent years, data mining (Wan et al. 2008; Lei et al. 2008) has emerged as an approach to analyzing landslides and natural resources. Data mining is one of the fastest growing fields in knowledge discovery. Specifically, it is a classification approach to grouping data with multiple attributes into relevant categories. It has become the most valuable process for acquiring implicit knowledge among datasets. Well-developed techniques of classification include decision trees, rule-based and nearest-neighbor classifiers, and support vector machines (Wan and Lei 2009). The entropy theory developed by Shannon (1948) has been widely applied in many fields including data mining.
For example, Cloude and Pottier (1997) studied synthetic aperture radar (SAR) data through the entropy-based classification (EBC) method; their study found that the scattering entropy is a key parameter in determining the randomness of the developed model and can be seen as a fundamental parameter in assessing the importance of polarimetry in remote-sensing data on earth observations. Maruyama et al. (2005) assessed the potential water-resources availability in an area in terms of disorder in intensity and over-a-year apportionment of monthly rainfall. Thiergärtner (2006) studied attribute patterns, object pattern, spatial distribution of object classes, and temporal development of objects using multi-dimensional heuristic models of pattern recognition; the study also showed that meaningful associations, in both space and time, can be derived from irregular variations in concentration of hazardous substances in only a few groundwater observation wells. Huang et al. (2008) used EBC to attain proper ranges of design variables for lead-rubber bearings; with the new ranges thus obtained, they successfully reduced the precise machinery deformation subjected to strong ground motions in earthquakes. Wan et al. (2008) studied a hybrid model of a decision tree (with an entropy-based classifier) and a support vector machine (SVM) to analyze and resolve the classification of the occurrence of landslide-debris flow. Wan (2009) used EBC to construct a spatial decision support system for monitoring the potential landslide zone of Shei Pa National Park, Miao Li, Taiwan. The environmental data were obtained by remote-sensing and digital-elevation modeling through EBC to generate knowledge rules for constructing the landslide-susceptibility map.

A well-developed governing equation with a data-mining approach could be one of the effective approaches for predicting groundwater-level fluctuations in hillslopes. The governing equation can be developed using system identification. In this study, a model for forecasting groundwater-level fluctuations after torrential rainfall was first developed using the fluid-mass-balance equation. In the model-development stage, several groundwater parameters were introduced, and the most suitable parameters were defined as those that attain the minimum root mean square error (RMSE) between predicted and actual groundwater level. A finite-difference scheme was employed to build a numerical program for obtaining the most suitable groundwater parameters by a trial-and-error method. EBC was then utilized to determine the ranges of variables in the governing equation. The ranges thus obtained were then categorized into different levels of RMSE. Finally, the rainfall and groundwater-level records from two storms in a landslide-prone area in Taiwan were employed to illustrate the application of the procedure.

Groundwater-level prediction

The fluid-mass-balance equation

The fluid-mass-balance equation describes the movement of water in the “control volume” (i.e. the unit of water associated with determination of groundwater level) as follows (Fig. 1):

$$\frac{dS}{dt} = I - O \tag{1}$$

where S is the storage of the control volume; I is the inflow rate or recharge rate, including infiltration rate and groundwater inflow rate from upstream; and O is the groundwater outflow rate. Figure 1 illustrates the inflow/outflow of the control volume in an unconfined aquifer. S represents the groundwater storage with an aquifer region of unit area L (Rasmussen and Andreasen 1959) and can be expressed as

$$S = \phi h L \tag{2}$$

where fillable porosity (φ) is the amount of water that an unconfined aquifer can store per unit rise in water table and per unit area (Sophocleous 1991); h is the average groundwater depth of the control volume (Maréchal et al. 2006; Park and Parker 2008). The derivative of S with respect to time is

$$\frac{dS}{dt} = \phi L \frac{dh}{dt} \tag{3}$$

I is the water inflow rate into the control volume. According to Fig. 1, the water inflow rate of the control volume can be expressed as

$$I = I_p + I_g \tag{4}$$

where Ig is the groundwater inflow rate from upstream infiltration recharge; Ip is the infiltration rate of the control volume, which is the product of the rainfall (depth measurement) and the travel-time distribution of infiltration recharge. The recharge rate and outflow rate are described in detail in the following sections.

Fig. 1 Conceptual diagram of a “control volume” associated with groundwater level

Infiltration rate

Transfer-function methods have been developed for relating infiltration patterns to groundwater-recharge patterns (Besbes and de Marsily 1984; Morel-Seytoux 1984; Wu et al. 1997; Jukić and Denić-Jukić 2004). For example, Wu et al. (1997) used the gamma-probability-density function to represent the travel-time distribution of infiltrating water by rainfall events. Inspired by Nash’s theory (1957) in hydrology, this simple approach using linear reservoirs to describe groundwater drainage and a transfer function to describe rainfall-recharge response has been frequently applied according to the literature. However, the gamma-probability-density function has been widely accepted as an instantaneous unit hydrograph for a watershed that is represented by a series of identical linear reservoirs (Jeng and Coon 2003). In other words, the time lag and infiltration rate between unit rainfall and infiltration recharge may be represented by a series of n identical linear reservoirs, each having the same storage constant β. Therefore, for a groundwater system, a downward flux of unit rainfall ΔIp,n(t) is defined as follows:

$$\Delta I_{p,n}(t) = \frac{\alpha L P(t - T_d)}{\beta\,(n-1)!} \left(\frac{t}{\beta}\right)^{n-1} e^{-t/\beta} = \alpha L P(t - T_d)\, H(t) \tag{5a}$$

where Td is the initial time lag of infiltration recharge, i.e. the time from the onset of infiltration to the infiltrating water reaching the water table (Wu et al. 1997); α represents the infiltration ratio of rainfall; and $H(t) = \frac{1}{\beta\,(n-1)!} \left(\frac{t}{\beta}\right)^{n-1} e^{-t/\beta}$ is a unit pulse of infiltration recharge rate (Wu et al. 1997). In addition, the peak pulse of infiltration recharge rate Hp and the peak time tp can be obtained by differentiating H(t) as follows

$$t_p = \beta (n-1) \tag{5b}$$

$$H_p = \frac{1}{\beta}\,\frac{(n-1)^{n-1} e^{-(n-1)}}{(n-1)!} \tag{5c}$$

where a large tp represents a long time lag between unit rainfall and maximum infiltration rate; a small Hp denotes a small peak pulse of infiltration recharge rate. According to Eqs. (5b) and (5c), a large β and a large n will result in a long time lag and a small peak pulse of infiltration recharge rate. Let the time domain be separated into discrete intervals of duration Δt. Assuming that Pm is the rainfall during the time interval between (m–1)Δt and mΔt, the infiltration recharge rate in the Nth time interval (t=NΔt) is

$$I_{p,N}(t + T_d) = \alpha L \left( P_1 H(N\Delta t) + P_2 H[(N-1)\Delta t] + \dots + P_m H[(N-m+1)\Delta t] + \dots + P_N H[\Delta t] \right) = \alpha L \sum_{m=1}^{N} P_m H[(N-m+1)\Delta t] \tag{6}$$

where N is the influence range of infiltration pulse recharge rate, which varies with parameters n and β. Generally speaking, a large n and β will induce a large tp and a small Hp, meaning that present rainfall has a long-term effect on groundwater level. Therefore, a large N is necessary for a large n and β.
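The relations in Eqs. (5a) to (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the study: the function names are hypothetical, the list `rain` is assumed to hold P_1 to P_N in chronological order, and α and L default to 1.

```python
import math

def unit_pulse(t, n, beta):
    """Unit pulse H(t) of Eq. (5a) for n identical linear reservoirs."""
    if t <= 0:
        return 0.0
    return (t / beta) ** (n - 1) * math.exp(-t / beta) / (beta * math.factorial(n - 1))

def peak_time(n, beta):
    """Peak time t_p of Eq. (5b)."""
    return beta * (n - 1)

def peak_pulse(n, beta):
    """Peak value H_p of Eq. (5c)."""
    return (n - 1) ** (n - 1) * math.exp(-(n - 1)) / (beta * math.factorial(n - 1))

def infiltration_recharge(rain, n, beta, alpha=1.0, length=1.0, dt=1.0):
    """Discrete sum of Eq. (6); rain[m-1] plays the role of P_m."""
    N = len(rain)
    total = sum(p * unit_pulse((N - m + 1) * dt, n, beta)
                for m, p in enumerate(rain, start=1))
    return alpha * length * total
```

With these definitions, `unit_pulse(peak_time(n, beta), n, beta)` should agree with `peak_pulse(n, beta)`, which is a quick consistency check on Eqs. (5b) and (5c).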

1138

Upstream inflow and downstream outflow

Figure 1 is the conceptual diagram of a control volume. h2 and h1 are the groundwater depths for inflow and outflow, respectively. Assuming that the gradient is small and linear within the control volume, then $h = (h_1 + h_2)/2$ is the average groundwater depth of the control volume. According to Darcy’s law, the horizontal groundwater inflow/outflow rate for the control volume shown in Fig. 1 is

$$I_g - O_g = k\,\frac{h_1 - h_2}{L}\,h \tag{7}$$

where k is the hydraulic conductivity. Assuming that the local groundwater gradient in the control volume is the same as the regional gradient, i.e. $\Delta h/\Delta x = (h_1 - h_2)/L$, then

$$I_g - O_g = k\,\frac{\Delta h}{\Delta x}\,h \tag{8}$$

On the other hand, it can be assumed that Ig–Og is proportional to groundwater storage S such that

$$I_g - O_g = K S = K \phi L h \tag{9}$$

where K is a transport parameter. Physically, a large value of K will lead to a slow inflow Ig and a fast outflow Og, thus decreasing the groundwater level. Comparing Eqs. (8) and (9), one gets $K = \frac{k}{\phi L}\frac{\Delta h}{\Delta x}$, which is similar to the parameter obtained by Park and Parker (2008).

Equation for predicting groundwater level

Substituting Eqs. (3), (6) and (9) into Eq. (1) yields

$$\frac{\Delta h}{\Delta t} = K h + \eta \sum_{m=1}^{N} P_m H[(N-m+1)\Delta t] \tag{10a}$$

where Δh is the predicted change in groundwater level during the time interval Δt; Kh is the height loss by groundwater flux; $\eta = \alpha/\phi$ is a composite parameter of infiltration recharge rate (a large η will cause a rapid rise in groundwater level); and $\eta \sum_{m=1}^{N} P_m H[(N-m+1)\Delta t]$ is the groundwater recharge from infiltration of rainfall. Defining $W_t = \eta \sum_{m=1}^{N} P_m H[(N-m+1)\Delta t]$, a finite difference method (FDM) is formulated as $\frac{h_{t+\Delta t} - h_t}{\Delta t} = K h_{t+\Delta t} + W_t$, so that

$$h_{t+\Delta t} = \frac{h_t + W_t \Delta t}{1 - K \Delta t} \tag{10b}$$

A computer program was developed in this study using Visual C++ 6.0 to calculate $h_{t+\Delta t}$.

Determining the range of governing factors
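Because every candidate parameter set is evaluated by running Eq. (10b) forward through a storm, a minimal Python sketch of that update may help fix ideas. It is not the authors' Visual C++ program: the function names `step_level` and `simulate_levels` are illustrative assumptions, and the recharge term W_t is taken as already computed for each step.

```python
def step_level(h_t, w_t, K, dt=1.0):
    """One implicit step of Eq. (10b): h_{t+dt} = (h_t + W_t*dt) / (1 - K*dt)."""
    return (h_t + w_t * dt) / (1.0 - K * dt)

def simulate_levels(h0, recharge, K, dt=1.0):
    """Apply Eq. (10b) repeatedly; recharge[i] is W_t for step i."""
    levels = [h0]
    for w_t in recharge:
        levels.append(step_level(levels[-1], w_t, K, dt))
    return levels
```

For example, `simulate_levels(10.0, [0.0, 0.0, 0.0], K=-0.00625)` describes a recession with no recharge: with a negative K the level falls a little at each hourly step.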

From Eq. (10b), after obtaining K, η, n, N, Td and β, the groundwater level at the next time step can be determined from the known ht and a series of rainfall. Therefore, the K, η, n, N, Td and β selected will determine the prediction accuracy for groundwater level. The ranges of governing factors for groundwater-level prediction are described in the following.

Travel-time distribution of infiltration recharge rate (n, β, N, Td)

Previous studies have found that travel-time distribution dominates the infiltration recharge rate. For instance, Van Asch et al. (1999) chose 1 week as the duration of observation for correlating rainfall and frequency of landslides. Some of the field-monitoring data obtained by Lee et al. (2006) showed that the time lag between peak rainfall and peak groundwater level ranges from half a day to 20 days. The majority of rapid responses are observed during the winter/spring recharge period, during which the unsaturated zone is the thinnest while the unsaturated-zone moisture content is the highest. In other words, time lag is a function of thickness of unsaturated zone, seasons and rainfall intensity. Park and Parker (2008) used rainfall of a single day to calculate the infiltration of the subsequent day. According to Eq. (5a), n and β control the travel-time distribution of the infiltration recharge rate. In this study, the developed approach is applied to groundwater-level prediction of hillslopes after torrential rainfall; therefore, a short time lag, with 1≤β≤4 and 2≤n≤5, is selected. If Δt=1 hour, according to Eq. (5b), the shortest time lag tp is 1 hour and the longest is 16 hours. In addition, this study uses N=40. On the other hand, when the groundwater level of a hillslope rises instantly after torrential rainfall, the initial time lag of the infiltration recharge (Td) may decrease. Wu et al. (1997) carried out 21 simulations to obtain Td for various conditions of rainfall, groundwater depths, and soil hydraulic properties.
The Td obtained ranged from 0.35 to 22.5 days. The study described here focuses on groundwater fluctuations triggered by rainfall; therefore, Td should be short enough for a rapid rise in groundwater level. Thus, the Td adopted in this study ranges from 1 to 12 hours.

Fillable porosity (φ)

The φ adopted by the Department of Transportation in the State of Florida (2004) is in the range of 0.1–0.3. For more conservative prediction, the φ used in this study ranges from 0.02 to 0.5.

Groundwater-recharge ratio parameters (α, η)

There are many methods for estimating the ratio of infiltration to rainfall. For example, Bhark and Small (2003) obtained an infiltration ratio of 60–85% by observing rainfall and infiltration for four rainfall events. Li et al. (2006) conducted a full-scale field experiment in an instrumented saprolite slope, and found an infiltration ratio of 70% of the total rainfall. In accordance with the aforementioned studies, an infiltration ratio (α) in the range of 0.5–1 was adopted, thus obtaining η in the range of 1–50 by $\eta = \alpha/\phi$.

Transport parameter of groundwater outflow (K)

Park and Parker (2008) adopted K=−0.15/day in the Hongcheon area of South Korea. If the unit time is 1 hour, then K=−0.00625/hour. This study assumes K to be in the range of −0.02225 to −0.0005/hour.

In the foregoing analyses, the unknown parameters used in the governing equation include n, β, η, Td and K. With suitable ranges of governing factors selected, the prediction accuracy of groundwater level may be greatly improved.

Validation method and error estimation

Entropy-based classification (EBC)

The classification approach involving Shannon entropy is developed to search the relations between attributes and decisions. The basic concepts of classification are (1) quantifying the disordered degree (Shannon entropy) between the attribute (groundwater variables) and the decision (errors among output ranges), and (2) obtaining the appropriate ranges of attributes to achieve the lower category of decision according to the lower value of entropy. The following example illustrates the EBC procedures. Table 1 shows a dataset of 10 data, x1–x10 in column (i), with an attribute, column (ii), and a corresponding decision, column (iii), taken from a related study. Take, for instance, defining m as the index of the cutting point (see Table 1): while *m=2, **FCP(2)=5.0. The cutting point is a point that divides two different decisions from each attribute. FCP(2) is the mean value of 3 and 7. That is, 5 is the cutting point for dividing xi into two parts, namely xi<5 and xi>5. First, the data attributes are sorted in ascending order. Then, a fictitious cutting point, FCP(m), is defined as the mean value of two different attribute values from column (v) of Table 1, and m represents the identification number of the fictitious cutting point. The parameter FCP(m) divides the attributes into two different classes (attribute-classes): attribute-class 1 means the attribute value is smaller than FCP(m), and attribute-class 2 indicates that it is larger than FCP(m). In a real database, it is possible that two data are of the same attribute-class but with different decision-classes, and one of them is then taken as noise. Hence, in this case, there may be different entropy values with regard to the data of the two attribute-classes. The entropy values can be calculated by the following equations:

$$\mathrm{entropy}(x > x_m) = -\sum_{j=1}^{2} p(x > x_m \mid j)\,\log_2 p(x > x_m \mid j) \tag{11a}$$

and

$$\mathrm{entropy}(x < x_m) = -\sum_{j=1}^{2} p(x < x_m \mid j)\,\log_2 p(x < x_m \mid j) \tag{11b}$$

where j is the decision-class (“1” or “2” in this example); $p(x < x_m \mid j)$ is the probability for data of decision-class j for x<xm; and $p(x > x_m \mid j)$ is the probability for data of decision-class j for x>xm. When $p(x < x_m \mid j) = 0$ or $p(x > x_m \mid j) = 0$, the corresponding term is assigned $p(x < x_m \mid j)\log_2 p(x < x_m \mid j) = 0$ or $p(x > x_m \mid j)\log_2 p(x > x_m \mid j) = 0$. Entropy is an index of disorderliness between decisions and attributes. The entropy values here range between 1.0 and 0.0. When entropy(x) is 1.0, it means that there is only a 50% chance of making the right decision. If entropy(x) is 0.0, it indicates that there is a 100% chance of making the right decision. Hence, each attribute class is expected to have a value smaller than entropy(xm). A concept of information gain (IG), opposite to entropy, is then used. IG can be computed by the following equations:

$$IG(x > x_m) = 1.0 - \mathrm{entropy}(x > x_m) \tag{12a}$$

and

$$IG(x < x_m) = 1.0 - \mathrm{entropy}(x < x_m) \tag{12b}$$

According to Eqs. (12a) and (12b), the values of IG(x) are between 0.0 and 1.0. Note that the values of IG(x) are contrary to entropy but in accordance with the tendency of the attribute-class proportion. In the next step, the IG of a given FCP(m) is obtained by weighting the IG(x) of each attribute-class by the proportion of data in that class, which yields IG(xm) as follows:

$$IG(x_m) = p_p(x > x_m)\,[1 - \mathrm{entropy}(x > x_m)] + p_p(x < x_m)\,[1 - \mathrm{entropy}(x < x_m)] \tag{13}$$

where pp(x<xm) and pp(x>xm) are the proportions of data with x<xm and x>xm, respectively, to all of the data. Therefore, the index of IG(xm) used in this study can be considered as a measurable value of the assessment for a given FCP(m). The larger the IG(xm), the better the choice of FCP(m). That is, in computing a series of IG(xm), the largest IG(xm) was selected to display the best relations between attributes and decisions.

Table 1 An example of entropy-based classification

Data (i)   Attribute xi (ii)   Decision (iii)   m (iv)   Attribute of FCP(m) (v)   IG(xm) (vi)
x1         1                   1                1        2.0                       0.108
x2         3                   1                *2       **5.0                     ***0.236
x3         7                   2                3        8.0                       0.035
x4         9                   1                4        10.0                      0.125
x5         11                  1                5        13.0                      0.278
x6         15                  2                6        19.0                      0.125
x7         23                  1                7        26.5                      0.396
x8         30                  2                8        33.0                      0.236
x9         36                  2                9        38.0                      0.108
x10        40                  2                -        -                         -

*, **, ***: see text and Table 2. FCP(m) lies between the attribute values of xm and xm+1.

For example, computing entropy(x2 = 5) (m = 2 and FCP(2) = 5.0) in columns (iv) and (v) of Table 1 involves the following steps, as shown in Table 2.

1. In column (ii) of Table 1, there are two (x1 and x2) and eight (x3–x10) data that fall into attribute-class 1 (the data for attribute values smaller than FCP(2) with x<x2) and attribute-class 2 (those larger than FCP(2) with x>x2), respectively. That is, pp(x<x2)=0.2 and pp(x>x2)=0.8, as shown in column (iii) of Table 2.
2. A(j, x<xm) and A(j, x>xm), from column (iv), are the numbers of data of decision-class j in each attribute-class. The aforementioned data (x1 and x2) for attribute-class 1 (x<xm) both belong to decision-class 1, so A(j=1, x<xm)=2 and A(j=2, x<xm)=0. For attribute-class 2, A(j=1, x>xm)=3 represents the data of x4, x5, and x7 while A(j=2, x>xm)=5 represents the data of x3, x6, x8, x9, and x10.
3. Dividing these counts by the number of data in each attribute-class gives column (v): p(x<x2|j=1)=1.0, p(x<x2|j=2)=0, p(x>x2|j=1)=0.375 and p(x>x2|j=2)=0.625.
4. Columns (vi), (vii) and (viii) in Table 2 are derived from Eqs. (11a)–(13).

Steps 1–4 are repeated to obtain the other results of IG(xm) with respect to different conditions of FCP(m). These results are shown in column (vi) of Table 1. Among them, IG(x7) is the largest; hence, FCP(7)=26.5 denotes the optimal cutting point. Table 1 shows that the majority of decisions fall into class 1 if the attribute value ranges between 1 and 26.5. Additionally, the decision-class is close to class 2 when the attribute value exceeds 26.5. Accordingly, knowledge rules for the appropriate range of the attribute to achieve the expected decision are obtained.
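The worked example of Tables 1 and 2 can be reproduced with a short Python sketch. The function names are hypothetical, and the code follows Table 1 in placing a fictitious cutting point midway between every pair of adjacent sorted attribute values; it is an illustration of Eqs. (11a) to (13), not the program used in the study.

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of the decisions in one attribute-class."""
    total = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * math.log2(p)  # absent classes contribute 0, as in the text
    return result

def information_gain(attributes, decisions, cut):
    """IG(x_m) of Eq. (13) for one fictitious cutting point."""
    below = [d for a, d in zip(attributes, decisions) if a < cut]
    above = [d for a, d in zip(attributes, decisions) if a > cut]
    n = len(attributes)
    return (len(above) / n) * (1.0 - entropy(above)) \
         + (len(below) / n) * (1.0 - entropy(below))

def best_cut(attributes, decisions):
    """Return the cutting point with the largest IG(x_m)."""
    pairs = sorted(zip(attributes, decisions))
    xs = [a for a, _ in pairs]
    ds = [d for _, d in pairs]
    cuts = [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)]
    return max(cuts, key=lambda c: information_gain(xs, ds, c))

# Data of Table 1: attribute (ii) and decision (iii)
attributes = [1, 3, 7, 9, 11, 15, 23, 30, 36, 40]
decisions = [1, 1, 2, 1, 1, 2, 1, 2, 2, 2]
```

Calling `information_gain(attributes, decisions, 5.0)` gives a value that rounds to 0.236, and `best_cut(attributes, decisions)` returns 26.5, in agreement with column (vi) of Table 1.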

Table 2 Entropy-based classification of FCP(2) of Table 1 (m=*2, attribute=**5.0) for the corresponding IG(xm) cutting point value (***)

m (i)a   x (ii)   P(x) (iii)   A(j, x) (iv)   p(x | j) (v)    entropy(xm) (vi)   IG(x) (vii)   IG(xm) (viii)b
2        x<xm     0.2          2, 0           1.0, 0          0                  1.0           0.236
         x>xm     0.8          3, 5           0.375, 0.625    0.954              0.046

Columns (iv) and (v) list the values for j=1 and j=2 in that order
a The same as column (iv) in Table 1
b The same as column (vi) in Table 1

Iterative cross-validation

As mentioned previously, all data concerning the geographical information, rainfall, and groundwater-level fluctuations are incorporated into a database. The next step is to select the most representative training samples using iterative cross-validation. With the selected representative training samples, the upper and lower limits for each governing factor can be found. Figure 2 shows the flowchart of steps in using EBC to find the proper variable ranges.

Fig. 2 Flowchart of steps in using EBC to find the proper variable ranges

Cross-validation (Golub et al. 1979; Kohavi 1995) is a statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis. In this study, iterative cross-validation is employed to evaluate the original database. The original samples are partitioned into w subsamples. Of the w subsamples, a single subsample is retained as the training data for developing the model, and the remaining w-1 subsamples are used as testing data (or verification data). Cross-validation is then repeated w times (the folds), with each of the w subsamples used exactly once as the training data. The w-fold results can help assess the accuracy of the model. If the accuracy is lower than the criterion predefined by the user, the program will iteratively reselect the training samples to obtain the most representative data that satisfy the required criterion. The next section describes how the accuracy is defined.
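The partitioning described above can be sketched as follows. This is a minimal illustration only: the text does not say how samples are assigned to subsamples, so the interleaved split used here is an assumption, as is the function name `iterative_folds`. The reselection loop driven by the accuracy criterion is not shown.

```python
def iterative_folds(samples, w):
    """Yield (training, testing) pairs; each of the w subsamples trains once."""
    # Assumed partition: sample i goes to subsample i mod w
    subsamples = [samples[i::w] for i in range(w)]
    for i, training in enumerate(subsamples):
        # The remaining w-1 subsamples form the testing (verification) data
        testing = [s for j, sub in enumerate(subsamples) if j != i for s in sub]
        yield training, testing
```

Each fold trains on one subsample and tests on the other w-1, following the description above rather than the more common arrangement in which the larger part is used for training.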

Estimation of accuracy

The RMSE between measured and predicted groundwater levels, Ep, is employed to express the average difference between simulated and measured groundwater levels:

$$E_p = \sqrt{\frac{\sum_{i=1}^{n_r} \left(h^p_{i+1} - h^m_{i+1}\right)^2}{n_r}} \tag{14}$$

where Ep is the square root of the average of squared differences between measured and predicted groundwater levels; nr is the number of records made in a storm; $h^p_{i+1}$ is the predicted groundwater level at time i+1; and $h^m_{i+1}$ is the measured groundwater level at time i+1. A fundamental question arises: What Ep values constitute a reasonable error range of prediction accuracy? If there is no available model for predicting the groundwater level at the next time step, the worst case is that the current groundwater level is employed to “guess” the groundwater level at the next time step. Mathematically,

$$E_o = \sqrt{\frac{\sum_{i=1}^{n_r} \left(h^m_i - h^m_{i+1}\right)^2}{n_r}} \tag{15}$$

where Eo denotes the differences in time steps of groundwater levels; $h^m_i$ is the measured groundwater level at time i, which is employed to predict the value at time i+1. Then, “predicted efficiency” Ei is defined as follows:

$$E_i = \frac{E_p}{E_o} \tag{16}$$
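Eqs. (14) to (16) translate directly into code. The sketch below is an illustration with hypothetical function names; it assumes `measured_levels` holds the nr+1 measured values and `predicted` holds the nr predictions for the second record onward.

```python
import math

def rmse(predicted, measured):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))

def predicted_efficiency(predicted, measured_levels):
    """E_i = E_p / E_o of Eq. (16)."""
    targets = measured_levels[1:]              # h^m_{i+1}
    e_p = rmse(predicted, targets)             # Eq. (14)
    e_o = rmse(measured_levels[:-1], targets)  # Eq. (15): current level used as the guess
    return e_p / e_o
```

If the predictions are simply the previous measured levels, the function returns exactly 1; it is undefined when the measured level never changes, because Eo is then zero.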

Ei<1 indicates that the prediction improves on this guess, whereas Ei>1 indicates that the prediction method is ineffective in enhancing prediction accuracy. In this study, Ei
