Decision Support Systems 84 (2016) 89–103

Contents lists available at ScienceDirect

Decision Support Systems journal homepage: www.elsevier.com/locate/dss

A framework based on hidden Markov model with adaptive weighting for microcystin forecasting and early-warning

P. Jiang a, X. Liu a,b,⁎, J. Zhang c, X. Yuan d

a Department of Industrial Engineering & Management, Shanghai Jiao Tong University, Shanghai 200240, PR China
b Department of Industrial & System Engineering, National University of Singapore, Singapore 119260, Singapore
c Department of Civil and Environmental Engineering, National University of Singapore, Singapore 117576, Singapore
d Department of Cell Biology, University of Alberta, AB T6G 2H7, Canada

Article info

Article history: Received 9 July 2015; received in revised form 9 February 2016; accepted 9 February 2016; available online 16 February 2016

Keywords: Decision support systems; Framework; Hidden Markov model; Adaptive exponential weighting; Microcystin forecasting; Early warning of risk

Abstract

Harmful algal blooms during the eutrophication process produce toxins, such as microcystins (MCs), which endanger ecosystems and human health. Accurate forecasting and early warning of MCs can provide theoretical guidance for quick identification of risk in water management systems. The variation of MC concentration is affected not only by the status quo of numerous manifest biotic and abiotic factors, but also by a hidden variable that represents the uncertainty of variations of these factors. Traditional approaches focus on fitting data precisely but give little consideration to such a hidden variable, and therefore encounter formidable barriers when the time-series data fluctuate. In this study, to address the forecasting problem with a hidden state variable and the problem of early warning of risk, we build a novel integrated framework which consists of three parts: 1) a forecasting model based on Principal Component Analysis (PCA) and an improved Continuous Hidden Markov Model (CHMM) with adaptive exponential weighting (AEW), where the AEW-CHMM is proposed to forecast both the single-step-ahead concentration for general points and fluctuating points, and the three-step-ahead concentration immediately after the fluctuating point; 2) Bayesian hierarchical modeling for a ratio estimation; and 3) revised guidelines for the risk-level grading. The case study tests the proposed approaches and other supervised machine learning methods on a real dataset from one shallow lake. Computational results demonstrate that the proposed approaches are effective in offering an intelligent decision support tool for MC forecasting and early warning of risk by risk-level grading. © 2016 Elsevier B.V. All rights reserved.

⁎ Corresponding author at: Department of Industrial Engineering & Management, Shanghai Jiao Tong University, 800 Dongchuan Road, Min-Hang District, Shanghai 200240, PR China. Tel.: +65 6601 4089; fax: +65 6601 4089. E-mail address: [email protected] (X. Liu).

http://dx.doi.org/10.1016/j.dss.2016.02.003
0167-9236/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Eutrophication, which is an aging process of water bodies caused by nutrient enrichment, constitutes one of the most serious environmental issues regarding the degradation of water quality in rivers, reservoirs, lakes, and oceans. Harmful algal blooms (e.g., microcystis blooms) formed during the eutrophication process can release toxins, which belong to the compounds known as emerging organic contaminants (EOCs). The most common of these are the microcystis toxins (microcystins or MCs) that include over 80 different congeners [46], of which microcystin-LR (MC-LR) is the most toxic and most frequently occurring [2]. MCs threaten safe drinking water and the recreational use of beaches and lakes, which can lead to illness or death for animals and humans, as well as many environmental and recreation-related problems [40,56]. In recent years, as emergency cases of acute liver failure and gastrointestinal syndromes have been increasing, growing attention has been paid globally to water bodies contaminated by MCs [46,49]. Unfortunately, residents cannot directly observe the degree of MC contamination. For this reason, they cannot make informed choices regarding which aquatic recreational activities are safe to participate in and which are not. They also have difficulty judging whether or not tap water is drinkable at a certain time, because MCs cannot be removed by conventional processes in a drinking water treatment plant. The serious health issues caused by these “invisible killers” indicate that it is critical to identify effective early-warning-of-risk systems to detect dangerous levels of microcystis blooms in reservoirs, lakes, and oceans. In fact, the variation of MC concentration is affected by: 1) the status quo of numerous manifest biotic and abiotic factors, such as predators, nutrients (e.g., nitrogen and phosphorus), other phytoplankton species, dissolved oxygen, dissolved organics, salinity, pH, transparency, sunlight, temperature, and wind speed [20], and 2) the uncertainty of variations of these factors. The variation uncertainty is a hidden variable, which can be synthetically reflected by the variation of the water body eutrophication level. Traditional approaches focus on fitting or training data precisely and implementing forecasting according to the


P. Jiang et al. / Decision Support Systems 84 (2016) 89–103

fit or training principle but give little consideration to such a hidden variable, which presents enormous difficulties when fluctuations occur in time-series data. In most cases, historical data regarding MC concentrations are not available, as chemical extraction and determination of MCs are much more difficult than those of biomass or chlorophyll-a (Chl-a). Without the original data, accurate forecasting of MCs is quite difficult. However, a large amount of field data has shown that a highly correlated linear relation exists between MCs and Chl-a [19,21,47]. Thus, the Chl-a concentration could serve as a helpful indicator of the MC concentration [39,47]. Nevertheless, how to estimate the relation between them without ignoring the sampling uncertainties of water samples poses another challenge. To the best of our knowledge, there is currently no integrated framework offering an intelligent decision support tool for MC forecasting and early warning of risk. The absence of such a framework presents great inconvenience for reservoir, lake, or ocean management systems. In fact, few researchers and organizations have implemented early warning of risk for MCs, because original data regarding MCs are absent and the World Health Organization (WHO) guidelines for relative risk of exposure to MCs are not detailed. Thus, residents often have no direct impression of the potential risk.
Motivated by the aforementioned three challenges, our solutions and contributions in this study are as follows: 1) In order to address the Chl-a or MC forecasting problem with a hidden state variable in a more robust manner, we propose an improved CHMM with AEW schemes as an extension of forecasting modeling via the CHMM [24,26], by discovering similar patterns and assigning exponential weights to them adaptively; 2) To cope with the absence of historical data regarding MCs, and to perform uncertainty analysis, a Bayesian hierarchical model is proposed to estimate the ratio of MCs/Chl-a and transform MC forecasting into Chl-a forecasting, which helps to reduce work complexity; and 3) To effectively implement early warning of risk for MCs, a novel integrated framework is built, which consists of the PCA-based AEW-CHMM forecasting model, Bayesian hierarchical modeling for the estimation of the above ratio, and revised guidelines for the risk-level grading. The remainder of this paper is structured as follows: Section 2 reviews the related work. Section 3 introduces forecast targets and the problem characteristics. The framework is presented in Section 4. Section 5 presents a case study to test the effectiveness of the proposed approaches. Finally, in Section 6, we close this paper with some conclusions and future work.

2. Related work

This section reviews the major forecasting approaches, analyzes their advantages and weaknesses, and interprets the deficiencies of existing early-warning-of-risk systems.

2.1. Forecasting approaches

In the literature, approaches that have been recently established to forecast Chl-a, MCs, or algal blooms can be divided into six streams: 1) traditional regression-based models; 2) supervised machine learning; 3) genetic-based models; 4) system simulation models; 5) Markov chain models; and 6) Bayesian-based models. We summarize their advantages and weaknesses in Table 1. According to Table 1, it is difficult to determine which approach is the best if no concrete application scenario exists. Nonlinear forecasting, limitation of data volume, and algorithm speed do not constitute the main challenges for MC forecasting because, respectively, most approaches possess their own nonlinear processing abilities, awareness of data collection is rising in the big data era, and there is sufficient time and computing power to perform MC forecasting, as the concentration does not tend to change radically in real time. However, certain substantial deficiencies do exist in applications of current approaches, including: 1) lack of coping capacity for fluctuation and uncertainty (e.g., MLR and ARIMA); 2) “black-box” system properties (e.g., ANN and SVM); 3) dependency on expert experience (e.g., FL and BN); and 4) forecasting of the median only (e.g., MC and BN). To the best of our knowledge, no study has yet forecast MCs via a CHMM, which has been successfully applied to the fields of genetic analysis, customer relationship forecasting, stock market forecasting, failure prognosis, etc. In this study, to cope with the above deficiencies, the AEW-CHMM is constructed to elucidate the complex relationship between observations and the hidden parameter of the water body eutrophication level, and to forecast possible changes of MCs driven by related factors.

2.2. Early-warning-of-risk systems

Some literature has presented early warning approaches or early-warning-of-risk systems for algal blooms [11,29,41]. However, they did not directly measure adverse health effects, which prevents residents from gaining a direct impression of the risk or of the permissible range of aquatic activities. To measure adverse health effects, major institutions, such as the WHO and various national government agencies, have released guidelines with varying degrees of elaboration. For example, in 1999 and 2003, the WHO successively promulgated standards on MC-LR in which the upper limit for drinking water is 1 μg/L and for recreational water is 20 μg/L [12,53]. In Australian government guidelines, recreational water is divided into water of whole-body contact (primary contact), water of incidental contact (secondary contact), and water of no contact (esthetic uses) [37]. The Great Lakes Environmental Research Laboratory (GLERL) indicated that the Australian guideline for recreational water of whole-body contact (e.g., swimming, surfing, and bathing) is 20 μg/L, and for water of incidental contact (e.g., fishing, boating, and walking on the beach) is approximately 100 μg/L [18]. In this study, revised guidelines based on the WHO and Australian guidelines are used to rank the risk level of exposure to MCs, which are forecast by the proposed approaches.

3. Forecast targets and problem characteristics

This section first defines the forecast targets and then introduces the characteristics of the MC concentration forecast problem.

3.1. Forecast targets

To validate the proposed approaches in a detailed manner, the targets to be forecast are divided into two types: single-step-ahead forecasting (including general points and fluctuating points) and three-step-ahead forecasting immediately after the fluctuating point. Generally, there exist two kinds of points in conventional time series, i.e., fluctuating points and general points. They are defined as follows:

Definition 1. Let the scatter plot in Fig. 1 be a scatter function y = f(x), x ∈ {1, 2, ..., t, ...}. If f(t − 1) ≤ f(t) ≥ f(t + 1) or f(t − 1) ≥ f(t) ≤ f(t + 1), t is an extreme value point in the time series.

Definition 2. Let the first point following the extreme value point be the fluctuating point. All points except for the fluctuating points are called general points in this paper.

For example, in Fig. 1, E1, E2, E3, and E4 are extreme value points; f1, f2, f3, and f4 are fluctuating points; and Ti1, Ti2, and Ti3 are the three consecutive daily general points immediately after the fluctuating point fi (i = 1, 2, 3, 4).
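Definitions 1 and 2 can be applied mechanically to a series. The following is a minimal sketch, assuming 0-based indices and a plain Python list as input; the function name `classify_points` is hypothetical and not taken from the paper.

```python
def classify_points(series):
    """Split the indices of a series into extreme, fluctuating and general points.

    Follows Definitions 1 and 2: t is an extreme value point if
    f(t-1) <= f(t) >= f(t+1) or f(t-1) >= f(t) <= f(t+1); the point right
    after an extreme value point is a fluctuating point; every other point
    is a general point. Indices are 0-based (an assumption of this sketch).
    """
    n = len(series)
    extreme = [
        t for t in range(1, n - 1)
        if (series[t - 1] <= series[t] >= series[t + 1])
        or (series[t - 1] >= series[t] <= series[t + 1])
    ]
    # The first point following each extreme value point.
    fluctuating = sorted({t + 1 for t in extreme})
    fluctuating_set = set(fluctuating)
    general = [t for t in range(n) if t not in fluctuating_set]
    return extreme, fluctuating, general


extreme, fluctuating, general = classify_points([1, 3, 2, 2.5, 4, 5, 3])
```

Note that, read literally, the definitions allow a point to be both an extreme value point and a fluctuating point, which the sketch preserves.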


Table 1. Literature review regarding Chl-a, MCs, or algal blooms forecasting approaches.

| Stream | Approach | Literature | Advantages | Weaknesses |
| --- | --- | --- | --- | --- |
| Traditional regression-based model | MLR | [7,13] | The simplest method with the best model interpretability; can handle both small and large samples | Hard to obtain high forecast accuracy for nonlinear problems; often troubled by multicollinearity |
| Traditional regression-based model | ARMA; ARIMA | [43]; [38,52] | Both are autoregressive models and can take other related variables into account; can forecast the average values effectively | Would be hugely affected by fluctuation values |
| Supervised machine learning | FL or FL-based approach | [28,42] | Able to solve problems with uncertainty; can effectively create relationships among inputs and outputs of data sets | Based on expert experience; model performance is sensitive to the type of datasets and model structures |
| Supervised machine learning | ANN or ANN-based approach | [9,35,41,52] | Can solve problems with uncertainty; self-organization, self-adaptability and error tolerance | A black-box system for time series forecasting considering data of attributes, which makes it hard to offer insight into the nature of the dataset |
| Supervised machine learning | SVM | [32,41] | Can handle both small and large samples; a nonlinear intelligence model | Also a black-box system for multiple-variable time series forecasting, whose mechanisms are hard to interpret |
| Supervised machine learning | ELM | [31] | Can handle both small and large samples; can solve nonlinear problems with high speed; can solve issues like local minima and over-fitting | Many more hidden nodes and a huge model structure; adjustment capacity of the hidden layer is weak due to random parameter selection |
| Genetic-based model | GA | [6] | Global search and robust optimization procedures | Search speed is slow; needs tricks for the design of crossover and mutation operators |
| Genetic-based model | GP | [36,50] | GP is a self-evolution method; it is particularly useful for “data rich, theory poor” cases | Demands specific skills of users; needs a considerable amount of data to train the model and more data to validate it |
| System simulation model | SD | [1] | Can simulate complex feedback systems | Hard to achieve model validation due to the multifarious interactions between the selected parameters |
| Markov chain model | MC | [14] | Able to estimate the probability of forecast concentrations/states | Can only be used to forecast the median value |
| Bayesian-based model | BMA | [22,45] | Able to fully account for uncertainty of both parameter and model structure | Hard to determine the weighting schemes |
| Bayesian-based model | BN | [8,54] | Can handle problems with uncertainty and mine causality among variables | Generally needs expert experience; the quality of results depends on the amount of evidence; it is a method for classification, only providing forecast values of discrete states |

Note: MLR = multiple linear regression; ARMA = autoregressive moving average; ARIMA = autoregressive integrated moving average; FL = fuzzy logic; ANN = artificial neural networks; SVM = support vector machine; ELM = extreme learning machine; GA = genetic algorithms; GP = genetic programming; SD = system dynamics; MC = Markov chain; BMA = Bayesian model averaging; BN = Bayesian network.

3.2. Characteristics of the forecast problem

As introduced in Section 1, the variation of MC concentration is related to numerous biotic and abiotic factors which interact with each other and whose correlations with MCs are not linear. The variation is also highly affected by the uncertainty of variations of these factors, which is a hidden and comprehensive parameter in water systems. Hence, MC generation possesses the characteristics of uncertainty, nonlinearity, fluctuation, and strong interference. These multitudinous related factors increase the difficulty of MC forecasting, especially when the mechanisms driven by these external forces are poorly understood.

4. The integrated framework

To provide a complete set of solutions for the problem of MC forecasting and early warning of risk, we propose a novel framework whose architecture is shown in Fig. 2. This integrated framework comprises four major components: 1) data-pattern setup; 2) tentative exploration of data-pattern similarity; 3) AEW-CHMM-based forecasting; and 4) early warning of risk, which are introduced in the following subsections, respectively.

4.1. Data-pattern setup

This subsection presents the data-pattern setup (part I in Fig. 2), which includes feature extraction of related factors and data-pattern setting.

4.1.1. Feature extraction

MC concentration forecasting is a complex process that is related to numerous biotic and abiotic factors. However, too many input dimensions would reduce the training and recognition speed of machine learning methods, and interacting factors make it difficult to interpret a complex system using qualitative and quantitative knowledge. Feature extraction of related factors can simplify the problem.

Fig. 1. A simple example of a conventional Chl-a time-series data signal. (Plot of values against t from 0 to 30, marking the extreme value points E1 to E4, the fluctuating points f1 to f4, and the general points Ti1, Ti2, and Ti3 after each fluctuating point.)


Fig. 2. Architecture of the integrated framework for early warning of risk.

Among methods for feature extraction and dimensionality reduction, the PCA can reduce the dimensionality of input variables while the information in the original data is retained by a few principal components with certain abstract physical meanings. In this regard, the PCA outperforms other methods, such as the linear discriminant analysis and the locally linear embedding. After orthogonal transformation with the PCA, a set of correlated variables can be transformed into a set of linearly uncorrelated variables. This can be specified by:

\[ F_i = \mu_{i1} X_1 + \mu_{i2} X_2 + \cdots + \mu_{ip} X_p, \quad 1 \le i \le p \tag{1} \]

where X_i is the input variable; μ_i is the related eigenvector; F_i is the principal component. The PCA mainly includes the following six major steps. After reducing the dimensionality of the original data (D), the new data matrix (F) can be used to set up data patterns (Ω).

The pseudo code of the PCA

Inputs: D, ε // D = [x_1, x_2, ..., x_i, ..., x_n]^T ∈ R^(n×p) and ε = 0.9, where x_i = (x_i1, x_i2, ..., x_ip).
Step 1. Z = standardization(D) // standardization for input variables X_1, X_2, ..., X_p.
Step 2. Σ = (Z^T Z)/n // compute the correlation matrix.
Step 3. (λ_1, λ_2, ..., λ_p) = eigenvalues(Σ) // compute eigenvalues.
Step 4. (μ_1, μ_2, ..., μ_p) = eigenvectors(Σ) // compute eigenvectors.
Step 5. Select the smallest m with f(m) = (∑_{i=1}^{m} λ_i) / (∑_{i=1}^{p} λ_i) ≥ ε // choose dimensionality via the criterion of cumulative variance contribution.
Step 6. F = (F_1, F_2, ..., F_m) = {a_i | a_i = (μ_1, μ_2, ..., μ_m)^T x_i, for i = 1, ..., n} // generate the reduced dimensionality data.
Outputs: new data matrix F = (F_1, F_2, ..., F_m) // F ∈ R^(n×m).

4.1.2. Data-pattern setting

Data patterns that can reflect the features of related factors refer to specific data vectors which follow a defined rule. In order to implement forecasting via the AEW-CHMM, datasets of observations are usually arranged in a specific order so that every data vector forms a pattern, which constitutes the basis for data-pattern recognition in the AEW-CHMM. Chl-a concentration and the principal components derived from the PCA can be regarded as a data pattern Ω_t = (Chla_t, F_1t, F_2t, ..., F_mt)^T, which can interpret the condition or behavior of a eutrophic water body at time point t. Historical data patterns can provide reference for future forecasting. When a similar situation occurs again, this historical pattern can be utilized to forecast the following trend.

4.2. Tentative exploration of data-pattern similarity

We show the coarse similarity of daily data patterns in this subsection (part II in Fig. 2). Five main related factors, namely total nitrogen concentration, total phosphorus concentration, temperature, intensity of sunlight, and wind speed, were selected to extract the features of related factors. Chl-a concentration and three features extracted by the PCA on each day constitute a data pattern. To tentatively explore the similarity of daily data patterns, the cosine of the angle between two vectors in multi-dimensional space (cosine similarity,¹ Eq. (2)) is used to measure the coarse similarity of two data patterns (S_ΩiΩj, for i ≠ j). In general, two data patterns can be regarded as similar if the cosine value exceeds a threshold of 0.9.

¹ The data-pattern similarity discovered by the cosine similarity differs slightly from that discovered by the hidden Markov model: the former only calculates the vector angle to discern similar related factors (the coarse similarity), whereas the latter focuses more on the influence of a hidden variable to discover similar situations under the same state of the hidden variable given an observation sequence (the fine similarity). Because the cosine similarity can be easily implemented, it is utilized to tentatively explore similarity of daily data patterns.
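The coarse similarity check of Eq. (2) can be sketched in a few lines. This is a minimal illustration, assuming each data pattern is a plain list of four numbers (Chl-a followed by three principal components); the function names `cosine_similarity` and `similar_patterns` are hypothetical, and only the 0.9 threshold comes from the text.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two data patterns of equal length."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similar_patterns(current, history, threshold=0.9):
    """Indices of historical patterns whose similarity to `current` exceeds the threshold."""
    return [
        i for i, pattern in enumerate(history)
        if cosine_similarity(current, pattern) > threshold
    ]


# Illustrative values only, not data from the case study.
history = [[2.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
matches = similar_patterns([1.0, 0.0, 0.0, 0.0], history)
```

Because the cosine ignores vector length, two patterns with very different Chl-a magnitudes can still match, which is one reason the text treats this measure as coarse.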

\[ S_{\Omega_i \Omega_j} = \cos(\Omega_i, \Omega_j) = \frac{Chla_i\,Chla_j + F_{1i}F_{1j} + F_{2i}F_{2j} + F_{3i}F_{3j}}{\sqrt{Chla_i^2 + F_{1i}^2 + F_{2i}^2 + F_{3i}^2}\,\sqrt{Chla_j^2 + F_{1j}^2 + F_{2j}^2 + F_{3j}^2}} \tag{2} \]

It is common knowledge that a historical situation which can be expressed by a data/context/event pattern will be repeated more or less in most areas. This naturally leads to the following conjecture.

Conjecture 1. The data pattern defined above in a massive historical dataset of a eutrophic water body satisfies the characteristics of repetition.

To test our conjecture via field data, we randomly chose a series of historical data which include both general points and fluctuating points to perform numerical analysis. Graphical results are not all presented here due to space limitations. Fig. 3a, b, c, d, e, and f shows that we can detect a data pattern similar to that of the current time point among historical data patterns, which indicates that Conjecture 1 is reasonable. Now that the referential point for the current point can be discovered in historical data, it is quite natural to make use of the referential point to forecast Chl-a of the following day.

Conjecture 2. The following day (T + 1) of the current point (T) exhibits a similar data pattern to the next day (t⁎ + 1) of the referential point (t⁎).

Fig. 3a′, b′, c′, d′, e′, and f′ reveals that the data patterns of the T + 1 day and the t⁎ + 1 day are indeed similar, with only a little variation on the basis of T and t⁎, which can be interpreted by the coherence of meteorological, hydrological, and water quality conditions. So, similar data patterns in historical data can provide referential meaning for future forecasting. In addition to discovering the pattern similarity described in Conjecture 1 and Conjecture 2, we also find that the current point and the referential point possess a similar growth rate, which is measured by the growth rate difference (τ) of Chl-a:

\[ \tau = \gamma_{T+1} - \gamma_{t^{*}+1} = \frac{\Omega^{(1)}_{T+1} - \Omega^{(1)}_{T}}{\Omega^{(1)}_{T}} - \frac{\Omega^{(1)}_{t^{*}+1} - \Omega^{(1)}_{t^{*}}}{\Omega^{(1)}_{t^{*}}} \tag{3} \]

where Ω^(1) is the first element in the data vector, which denotes the Chl-a concentration; γ_(T+1) and γ_(t⁎+1) are the growth rates of Chl-a at the T + 1 point and the t⁎ + 1 point, respectively. In theory, τ can take any real value, and the two growth rates are more similar the closer τ is to 0. Consequently, based on pattern similarity and growth rate similarity, the core assumptions for performing forecasting (in Section 4.3.2) are: 1) the similar referential point in the historical dataset can be used to perform forecasting; and 2) the growth rate that follows the similar referential point is the same as that of the point that we are attempting to forecast.

4.3. AEW-CHMM-based forecasting procedure

This subsection presents the AEW-CHMM-based forecasting procedure (part III in Fig. 2), which includes three parts: 1) CHMM modeling; 2) performing forecasting; and 3) the performance evaluation, which correspond to the following three subsections, respectively.

4.3.1. Continuous hidden Markov model modeling

The hidden Markov model (HMM) [3] is a doubly embedded stochastic process which includes a Markov process and a general stochastic process. The simple CHMM component structure includes a hidden state sequence and a directly visible observation sequence. In this study, the eutrophication level of a water body is taken as a hidden variable. The hidden state sequence is assumed to follow a discrete-time, finite-state, first-order Markov chain, which implies that the current state depends only on the previous state. Chl-a concentration and the principal components derived from the PCA are regarded as an observation vector, which is a continuous random vector.

Fig. 3. Current data pattern matched with historical similar pattern (a, b, c, d, e, f); data-pattern comparison between the following day of current point and the referential point (a′, b′, c′, d′, e′, f′). For all horizontal axes, the number 1, 2, 3, and 4 denote the Chl-a, and three extracted principal components F1, F2, and F3, respectively. All vertical axes are dimensionless value.

Fig. 4. The relationship of hidden states and observations in an AEW-CHMM. (Diagram: hidden states s_1, ..., s_N linked by transition probabilities a_ij; each state s_j emits the observation vector O with emission probability b_j(O).)

The main difference between continuous HMMs and discrete HMMs lies in the statistical property of observations. In this study, the observation of each state is a vector of continuous random variables. Fig. 4 shows the relationship between hidden states and observations in a CHMM. For each hidden state, a probability exists for the current state to transition into any possible state, while each observation vector (i.e., data pattern) can be observed with a probability in its corresponding hidden state. The primary notations used in the rest of this paper are as follows:

N: The number of the hidden states of this model, i, j = 1, 2, ..., N;
T: Length of the state sequence or observation sequence, t = 1, 2, ..., T;
S: A set of states, S = {s_1, s_2, ..., s_i, ..., s_N};
Q: State sequence {q_1, q_2, ..., q_T}, where q_t = s_i represents the hidden state at time t;
O: Observation vector sequence, O = {O_1, O_2, ..., O_t, ..., O_T};
π: The prior probability of initial state {π_i};
A: Transition matrix A = {a_ij}, where a_ij represents the state transition probability;
B: A family of probability density functions B = {b_j(O)};
b_j(O): The occurrence probability of the observation at state j;
λ: The overall parameters λ = (π, A, B);

where ∑_i π_i = 1, ∑_j a_ij = 1, ∫ b_j(x) dx = 1 (integrated from −∞ to +∞), and b_j(O) can be illustrated by a widely used multi-dimensional Gaussian distribution:

\[ b_j(O) = \sum_{k=1}^{K} P[h = k \mid s_t = j]\, b_{jk}(O) = \sum_{k=1}^{K} c_{jk}\, \Phi[O, \mu_{jk}, \Sigma_{jk}], \quad 1 \le j \le N;\ 1 \le k \le K \tag{4} \]

where O is the vector of observations being modeled; Φ[O, μ_jk, Σ_jk] denotes the Gaussian density; c_jk, μ_jk, and Σ_jk represent the mixture coefficient, mean vector, and covariance matrix for the kth mixture component at state j, respectively; and ∑_k c_jk = 1.

A priori knowledge concerning the values of the parameters λ = (π, A, B) is extremely useful for parameter initialization, but it is often not available. In the step of parameter initialization, only the basic constraints above are considered to initialize the parameters (λ) randomly. In the training step, once we obtain the observation sequence O = {O_1, O_2, ..., O_t, ..., O_T}, the parameters (λ) of the CHMM can be trained by the Expectation–Maximization (EM) algorithm [44], which iterates until it converges to the maximum likelihood. After updating all parameters, the new parameter set is $\bar{\lambda}_{ML} = (\bar{\pi}, \bar{A}, \bar{B}) = \arg\max_{\lambda} P(O \mid \lambda)$, where $\bar{\pi} = \{\bar{\pi}_i\}$, $\bar{A} = \{\bar{a}_{ij}\}$, and $\bar{B} = \{\bar{b}_j(O)\}$. The updated initial state probability ($\bar{\pi}_i$), transition probability ($\bar{a}_{ij}$), mixture coefficients ($\bar{c}_{jk}$), mean vector ($\bar{\mu}_{jk}$), and covariance matrix ($\bar{\Sigma}_{jk}$) are expressed as follows:

\[ \bar{\pi}_i = \xi_1(i) \tag{5} \]

\[ \bar{a}_{ij} = \sum_{t=1}^{T-1} \xi_t(i, j) \Big/ \sum_{t=1}^{T-1} \xi_t(i) \tag{6} \]

\[ \bar{c}_{jk} = \sum_{t=1}^{T} \xi_t(j, k) \Big/ \sum_{t=1}^{T} \sum_{k=1}^{K} \xi_t(j, k) \tag{7} \]

\[ \bar{\mu}_{jk} = \sum_{t=1}^{T} \xi_t(j, k) \cdot O_t \Big/ \sum_{t=1}^{T} \xi_t(j, k) \tag{8} \]

\[ \bar{\Sigma}_{jk} = \sum_{t=1}^{T} \xi_t(j, k) \cdot (O_t - \mu_{jk})(O_t - \mu_{jk})^{T} \Big/ \sum_{t=1}^{T} \xi_t(j, k) \tag{9} \]

where ξ_t(i) and ξ_t(j) respectively represent the probability of being in s_i and s_j at time t; ξ_t(i, j) represents the probability of being in s_i at time t and in s_j at time t + 1; ξ_t(j, k) represents the probability of being in s_j at time t with the kth mixture component; and O_t is the observation at time t in the observation sequence. The forward–backward variables α and β can be used to express these probabilities:

\[ \xi_t(i) = \alpha_t(i)\beta_t(i) \Big/ \sum_{i=1}^{N} \alpha_t(i)\beta_t(i) \tag{10} \]

\[ \xi_t(j) = \alpha_t(j)\beta_t(j) \Big/ \sum_{j=1}^{N} \alpha_t(j)\beta_t(j) \tag{11} \]

\[ \xi_t(i, j) = \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \Big/ \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \tag{12} \]

\[ \xi_t(j, k) = \left[ \frac{\alpha_t(j)\beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\beta_t(j)} \right] \left[ \frac{c_{jk}\, \Phi[O_t, \mu_{jk}, \Sigma_{jk}]}{\sum_{k=1}^{K} c_{jk}\, \Phi[O_t, \mu_{jk}, \Sigma_{jk}]} \right] \tag{13} \]
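The state posteriors of Eqs. (10) and (11) can be sketched with a scaled forward–backward pass. This is a minimal illustration under stated assumptions: the emission likelihoods b_j(O_t) are taken as already evaluated (for instance from the mixture density in Eq. (4)) and passed in as a table `B[t][j]`; the function name `state_posteriors` and the per-step normalisation are choices of this sketch, not details given in the paper.

```python
def state_posteriors(pi, A, B):
    """Return xi[t][i], the probability of being in state i at time t.

    pi[i]   : initial state probabilities
    A[i][j] : state transition probabilities
    B[t][j] : emission likelihood b_j(O_t), assumed precomputed and not all zero

    alpha and beta are renormalised at every step to avoid underflow; the
    constants cancel because Eq. (10) normalises over states at each t.
    """
    T, N = len(B), len(pi)

    # Forward pass.
    alpha = []
    for t in range(T):
        if t == 0:
            raw = [pi[j] * B[0][j] for j in range(N)]
        else:
            raw = [
                sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[t][j]
                for j in range(N)
            ]
        total = sum(raw)
        alpha.append([v / total for v in raw])

    # Backward pass.
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        raw = [
            sum(A[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
        total = sum(raw)
        beta[t] = [v / total for v in raw]

    # Eq. (10): normalise alpha_t(i) * beta_t(i) over the states.
    xi = []
    for t in range(T):
        prod = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(prod)
        xi.append([v / total for v in prod])
    return xi
```

Each row of the result sums to one, so it can be read directly as a distribution over the hidden eutrophication levels at that time step.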

Since the EM algorithm converges to a local optimum or saddle point most of the time, the log-likelihood estimator of the EM algorithm may differ from the maximum log-likelihood estimator. To improve the quality of parameter estimation, in this study the model is trained in multiple runs with varied initial parameters, and the updated parameters with the largest log-likelihood are chosen. It is arduous but crucial to determine the combination of the number of hidden states in a CHMM and the number of mixture components of the Gaussian Mixture Model (GMM). A tradeoff invariably exists between a more accurate result and a smaller scale of calculation for this problem. In this paper, the Bayesian Information Criterion (BIC) [48], which is a useful model selection criterion for large-sample datasets, is employed to determine these numbers, as the method attempts to optimize the tradeoff between complexity and goodness of fit. We consider the combination with the smallest BIC as the best for modeling.

\[ BIC = -2\ln(L) + p\ln(N') \tag{14} \]

\[ p = N + N(N-1) + N\left(K + K(K+1)/2\right) \tag{15} \]

where L is the likelihood function; p is the number of parameters to be estimated, which include state, transition, and emission parameters; N′ is the sample size; N is the number of hidden states; and K is the number of mixture components of the GMM.

In the decoding step, the new parameter set $\bar{\lambda}$ and the observation sequence O are available. The most probable sequence of states Q⁎ = {q_1⁎, q_2⁎, ..., q_T⁎} can be calculated by the Viterbi decoding [44].
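For the model-selection step, Eqs. (14) and (15) can be evaluated directly once a log-likelihood is available for each candidate configuration. The sketch below is a minimal illustration; the function names and the dictionary layout mapping (N, K) to a log-likelihood are assumptions of this example, and the log-likelihood values are made up.

```python
import math


def num_parameters(n_states, n_mix):
    """Parameter count p of Eq. (15) for N hidden states and K mixture components."""
    return (
        n_states
        + n_states * (n_states - 1)
        + n_states * (n_mix + n_mix * (n_mix + 1) // 2)
    )


def bic(log_likelihood, n_states, n_mix, n_samples):
    """BIC of Eq. (14); a smaller value indicates a better tradeoff."""
    return -2.0 * log_likelihood + num_parameters(n_states, n_mix) * math.log(n_samples)


def select_model(log_likelihoods, n_samples):
    """Return the (N, K) pair with the smallest BIC.

    log_likelihoods maps (N, K) to the best log-likelihood found for that pair.
    """
    return min(
        log_likelihoods,
        key=lambda nk: bic(log_likelihoods[nk], nk[0], nk[1], n_samples),
    )


# Illustrative values only.
best = select_model({(2, 1): -100.0, (3, 2): -99.0}, n_samples=50)
```

In this toy case the larger model gains only one unit of log-likelihood while adding sixteen parameters, so the penalty term dominates and the smaller model is selected.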

P. Jiang et al. / Decision Support Systems 84 (2016) 89–103

Q⁎ can be expressed as:

$$Q^{*} = \arg\max_{Q} P(Q \mid O, \lambda) \tag{16}$$

Let δt(i) be the maximal probability, over all state sequences {q1, q2, ..., qt−1, qt = si} that end in state si at time t, of producing the observations {O1, O2, ..., Ot} for the given model:

$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P\left(q_1, q_2, \ldots, q_{t-1}, q_t = s_i, O_1, O_2, \ldots, O_t \mid \lambda\right) \tag{17}$$
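The recursion behind Eqs. (16) and (17) can be sketched as follows. This is a generic log-domain Viterbi routine rather than the authors' implementation; the function name, the two-state model, and the emission likelihoods are hypothetical illustration values.

```python
import math


def viterbi(log_pi, log_a, log_b):
    """Return the most probable state path; log_b[t][i] = log b_i(O_t)."""
    n_states = len(log_pi)
    # delta[t][i] is the log of delta_t(i) in Eq. (17).
    delta = [[log_pi[i] + log_b[0][i] for i in range(n_states)]]
    back = []
    for t in range(1, len(log_b)):
        row, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[-1][i] + log_a[i][j])
            row.append(delta[-1][best_i] + log_a[best_i][j] + log_b[t][j])
            ptr.append(best_i)
        delta.append(row)
        back.append(ptr)
    # Backtrack from the best final state to recover Q* of Eq. (16).
    state = max(range(n_states), key=lambda i: delta[-1][i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def logs(values):
    return [math.log(v) for v in values]


# Hypothetical two-state model and per-step emission likelihoods.
pi = logs([0.6, 0.4])
a = [logs([0.7, 0.3]), logs([0.4, 0.6])]
b = [logs([0.9, 0.2]), logs([0.8, 0.3]), logs([0.1, 0.7])]

path = viterbi(pi, a, b)   # -> [0, 0, 1]
```

Working with logarithms avoids numerical underflow on long sequences such as the daily series considered in this paper.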

Viterbi decoding is a dynamic programming method: Q⁎ = {q1⁎, q2⁎, ..., qT⁎} is obtained from the recursively computed values max[δt(i)].

4.3.2. Performing forecasting

This subsection introduces how to forecast using either the most similar pattern found via the CHMM or several similar patterns weighted by AEW schemes.

(1) Forecasting via the most similar pattern

The method of searching for the most similar pattern [26] can be adopted to forecast the next value via the CHMM. Once the CHMM is trained, the log-likelihood estimator of each data pattern can be calculated by the forward algorithm [44]. Then, we use Eq. (18) to find the time point in the historical data whose pattern is most similar to the current pattern (i.e., the pattern at time T).

$$t^{*} = \arg\min_{1 \le t \le T-1} \left\{ \left| LLE_T - LLE_t \right| \right\} \tag{18}$$

where LLET and LLEt are the log-likelihood estimators at the current time (T) and a historical time (t), respectively. In theory, t⁎ is the historical time point whose log-likelihood estimator is nearest to the current one, so the MC generation behavior at time point t⁎ is the most similar to that at the current point T. According to the assumption in Section 4.2, the ith forecast value in the data pattern at T + 1 is:

$$\Omega^{(i)}_{T+1} = O^{(i)}_{T} \times \left[ \frac{O^{(i)}_{t^{*}+1} - O^{(i)}_{t^{*}}}{O^{(i)}_{t^{*}}} + 1 \right], \quad 1 \le i \le m+1 \tag{19}$$

where m is the number of principal components.

(2) Forecasting via similar patterns weighted by AEW

Discovering the most similar pattern is crucial for the forecasting method above. However, since data quality cannot be guaranteed in reality, the forecast results will not be robust if they depend on only one similar pattern. Owing to the local-optimum characteristic of the EM algorithm, the CHMM would find several patterns in the historical data that are similar to the current one. However, three issues arise if several similar patterns are all considered: 1) timeliness; 2) similarity; and 3) the tradeoff between timeliness and similarity. On the one hand, a temporally closer local optimum may be a better predictor of behavior than a temporally distant global optimum, as similar patterns that are closer in time should have more impact on one another. On the other hand, a more similar pattern should, in principle, carry more weight as a predictor than less similar patterns. In addition, we have to determine which of the two aspects, similarity or timeliness, is more important for the target problem. We summarize this as an AEW scheme optimization problem, in which the m previous neighbor patterns before the current time point T are used as validation patterns to adaptively generate the most suitable weighting schemes for the similar patterns of the current point.

LL = (llij)m×n is a matrix storing log-likelihood values, which reflect the degree of similarity of the similar patterns; each row of LL is ranked in descending order. TL = (tij)m×n is a matrix storing time point values, each element of which corresponds to the element at the same position in LL. The similarity matrix (LL) and the timeliness matrix (TL) thus reflect the “distance” of similarity and the “distance” of timeliness. P = (pij)m×n is a matrix storing forecast values, each element of which corresponds to the element at the same position in TL; that is, each element is forecast by Eq. (19), with the t⁎ value in Eq. (19) taken from TL. Here, m is the number of previous neighbor patterns before the current time point T, and n is the number of similar patterns discovered in the historical data. Hassan et al. [23] pointed out that more weight should be assigned to recent days than to days further in the past, and that this relation should be non-linear. In this study, we selected an exponential function yi = exp(λix) to express it. The exponential weighting scheme matrices can be expressed as:

$$W^{1} = \left(\omega^{1}_{ij}\right)_{m \times n} = \left( \frac{\exp\left(\lambda_1 ll_{ij}\right)}{\sum_{j} \exp\left(\lambda_1 ll_{ij}\right)} \right)_{m \times n}, \quad \forall i, \forall j \tag{20}$$

$$W^{2} = \left(\omega^{2}_{ij}\right)_{m \times n} = \left( \frac{\exp\left(\lambda_2 t_{ij}\right)}{\sum_{j} \exp\left(\lambda_2 t_{ij}\right)} \right)_{m \times n}, \quad \forall i, \forall j \tag{21}$$

where W1 and W2 represent the exponential weighting scheme matrices of similarity and timeliness, respectively, and exp(λ1llij) and exp(λ2tij) are exponential values with parameters λ1 and λ2, respectively. Note that λ1 and λ2 follow the customary notation for the exponential function and are different from the λ introduced in the CHMM modeling. The mean absolute percentage error (MAPE) was selected as the criterion for evaluating the testing performance, so the optimization problem of the AEW schemes can be stated as:

$$\min \ \mathrm{MAPE} = \frac{100}{m} \sum_{i=1}^{m} \frac{\left| A_{T-m+i} - \left[ r \sum_{j=1}^{n} W^{1}(i,j)\,P(i,j) + (1-r) \sum_{j=1}^{n} W^{2}(i,j)\,P(i,j) \right] \right|}{A_{T-m+i}} \tag{22}$$

$$\text{s.t.} \quad \bar{\lambda}_1 \ge \lambda_1 > 0, \quad \bar{\lambda}_2 \ge \lambda_2 > 0, \quad 1 > r \ge r_0, \quad r_0 > 0 \tag{23}$$

where AT−m+i is the actual value at time T − m + i, and r is the weight (ratio) of similarity relative to timeliness, which determines which of the two aspects is more important. Our proposed approaches are based on similar patterns, so similarity should naturally be given at least as much weight as timeliness; the lower bound r0 is therefore set to 0.5. λ̄1 and λ̄2, the upper bounds of parameters λ1 and λ2, are determined by the biggest time span in concrete cases. Convex analysis shows that the objective is neither a convex nor a concave function, but rather a general nonlinear function; thus, traditional optimization algorithms, such as the interior point method and sequential quadratic programming, may converge only to a local optimum of this nonlinear problem. Particle Swarm Optimization (PSO), by contrast, is a population-based stochastic search algorithm for global optimization [27], which among evolutionary algorithms offers the advantages of intelligence, simplicity, fast speed, and real-number coding. In particular, the PSO is as effective as the genetic algorithm in searching for the global optimum, but significantly outperforms it in computational efficiency. Furthermore, the PSO is widely used to optimize the parameters of nonlinear models, including the ANN and the SVM. The velocity update strategy and the population update strategy of the PSO in this study can be expressed as:

$$\upsilon_{id}^{(k+1)} = \omega\,\upsilon_{id}^{(k)} + c_1\,U(0,1)\left(p_{id} - x_{id}^{(k)}\right) + c_2\,U(0,1)\left(p_{gd} - x_{id}^{(k)}\right), \quad \forall i, \forall d \tag{24}$$

$$x_{id}^{(k+1)} = x_{id}^{(k)} + \upsilon_{id}^{(k+1)}, \quad \forall i, \forall d \tag{25}$$


where k is the index of generations; i is the index of the target particle; d is the index of the dimension; υi is the velocity; xi is the position of the particle; pi is the best position found by particle i; pg is the global best position; ω is a constant representing the inertia weight; c1 and c2 are acceleration constants; and U(0, 1) is a uniform random number generator. Two computation examples, shown in the first and second rows of Fig. 5, are used to analyze the characteristics of the target function. The graphs of the target function values reveal that the number of local optima is relatively small and the downtrend is quite smooth, which empirically suggests that the search can be carried out easily and quickly with the PSO algorithm.

We briefly summarize the proposed AEW-CHMM. To forecast the value of the next pattern, we use a CHMM to discover n similar historical patterns for each of the m previous neighbor patterns before the current one. The AEW scheme optimization then adaptively searches for the weighting schemes that best fit the m corresponding actual values. The weighting schemes derived from the optimized parameters are then applied to the similar patterns of the current time point (T). Over a rolling horizon, the weighting schemes are thus updated adaptively in a data-driven manner. The forecast value of the first element in the data pattern at time T + 1 is:

$$\Omega^{(1)}_{T+1} = r^{*} \sum_{j=1}^{n} \left( \frac{\exp\left(\lambda_1^{*} ll_j\right)}{\sum_{j} \exp\left(\lambda_1^{*} ll_j\right)} \right)_{1 \times n} P_j + \left(1 - r^{*}\right) \sum_{j=1}^{n} \left( \frac{\exp\left(\lambda_2^{*} t_j\right)}{\sum_{j} \exp\left(\lambda_2^{*} t_j\right)} \right)_{1 \times n} P_j \tag{26}$$

where r⁎, λ1⁎, and λ2⁎ are derived from the exponential weighting scheme optimization; llj and tj are the jth values in the similarity and timeliness vectors, respectively, which are newly obtained for the current point; and Pj is the forecast value corresponding to the jth similar data pattern.

To demonstrate the adaptability of the proposed approaches when encountering concentration fluctuations, we take both the original training data and the forecast data as a new training dataset to forecast the next concentration dynamically. A three-step-ahead forecast (n = 3 in Fig. 6) immediately after the fluctuating point was implemented on the basis of single-step-ahead forecasting.
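The weighting and forecasting steps of Eqs. (19)–(22) and (26) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the log-likelihoods, time points, and forecasts are made-up numbers, and the parameters λ1, λ2, and r are simply fixed here rather than optimized by the PSO.

```python
import math


def pattern_forecast(current, similar, similar_next):
    """Eq. (19): apply the relative change that followed a similar pattern."""
    return current * ((similar_next - similar) / similar + 1.0)


def exp_weights(values, lam):
    """One row of Eq. (20) or (21): normalised exponential weights."""
    scaled = [lam * v for v in values]
    top = max(scaled)                       # shift to avoid overflow
    e = [math.exp(s - top) for s in scaled]
    total = sum(e)
    return [x / total for x in e]


def aew_forecast(ll, tl, p, lam1, lam2, r):
    """Eq. (26): blend similarity-weighted and timeliness-weighted forecasts."""
    by_similarity = sum(w * x for w, x in zip(exp_weights(ll, lam1), p))
    by_timeliness = sum(w * x for w, x in zip(exp_weights(tl, lam2), p))
    return r * by_similarity + (1.0 - r) * by_timeliness


def validation_mape(LL, TL, P, actual, lam1, lam2, r):
    """Eq. (22): MAPE over the m validation (neighbor) patterns."""
    errors = [abs(a - aew_forecast(ll, tl, p, lam1, lam2, r)) / a
              for ll, tl, p, a in zip(LL, TL, P, actual)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical values for n = 4 similar patterns of the current point.
ll = [-12.1, -12.4, -13.0, -13.8]     # log-likelihoods, in descending order
tl = [540, 480, 515, 300]             # time points of the similar patterns
p = [0.330, 0.318, 0.342, 0.305]      # forecasts produced via Eq. (19)

forecast = aew_forecast(ll, tl, p, lam1=0.05, lam2=0.05, r=0.7)
```

In the full procedure, `validation_mape` would serve as the objective that the PSO minimizes over (λ1, λ2, r) subject to Eq. (23), and the optimized parameters would then be passed to `aew_forecast`.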

4.3.3. Performance evaluation

The root mean square error (RMSE), the MAPE, and the adjusted coefficient of determination (Adjusted R2) were applied to assess the forecasting accuracy. These three criteria are given by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(A_t - F_t\right)^2} \tag{27}$$

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \tag{28}$$

$$\text{Adjusted } R^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{\sum_{t=1}^{n} \left(A_t - F_t\right)^2}{\sum_{t=1}^{n} \left(A_t - \bar{A}\right)^2} \tag{29}$$

where At is the actual value; Ft is the forecast value; Ā is the mean of the actual values; n is the testing sample size; and k is the number of explanatory variables. The ARIMA, the ELM, the radial basis function (RBF) network, and the SVM are all effective time-series forecasting models that have been widely applied to perform forecasting or to make comparisons with other machine-learning models [4,15,16,33,34,51]. In the case study of Section 5, the computational results of the proposed approaches were compared with those of the ARIMA, the ELM, the PSO-RBF network, and the PSO-SVM. In addition, the CHMM approach that uses only the most similar pattern (MSP-CHMM) is employed as a further comparison method to demonstrate the robustness of the results forecast by the AEW-CHMM.

4.4. Early warning of risk

We show the early-warning-of-risk process in this subsection (part IV in Fig. 2). How to obtain the transformation ratio of MCs/Chl-a and how to perform early warning of risk are introduced in turn.

Fig. 5. Graphs of the target function values (one parameter is fixed in each graph).

[Figure: at each step, the AEW-CHMM model takes the observation vectors together with the forecasts already produced and outputs the next forecast, from step T + 1 up to step T + n.]

Notes: O1, O2, ..., OT: observation vectors of the time series; ΩT+1, ΩT+2, ..., ΩT+n: forecast data vectors.

Fig. 6. Architecture of multi-step-ahead forecasting based on single-step-ahead forecasting.

4.4.1. Transformation ratio of MCs/Chl-a

If the sample average is simply taken as the surrogate for the ratio of MCs/Chl-a, the forecasting uncertainty of MCs increases markedly, since the mean value discards a substantial amount of useful information. Because Bayesian hierarchical modeling provides robust probabilistic measures of uncertainty and error by explicitly accommodating parameter uncertainty and measurement error, we propose a Bayesian hierarchical model (Fig. 7) to estimate this ratio, which is then used to obtain forecast values of MCs after Chl-a forecasting has been performed. Here, we take the Marina Reservoir of Singapore as an example to explain the hierarchical modeling. By spatial segmentation, this reservoir is divided into seven subdomains (S1, S2, ..., S7), of which S1, S2, S4, and S6 are inflow zones, and S3, S5, and S7 constitute the main body of the reservoir. Let θij be the ratio of MCs/Chl-a at the jth water sample collection site of subdomain Si. Under the assumption that the ratios obtained from different water samples of a given subdomain follow a normal distribution, the sub-system parameter of each subdomain can be expressed as the sub-layer prior θi ~ N(μi, σi2) after water sample data have been collected on a specific day of each month. The hyperprior θ ~ N(μ, σ2) is a global parameter. We can estimate the global ratio (θ) of this reservoir via the multi-layer prior method. Furthermore, all θi can be estimated via this Bayesian hierarchical modeling. Briefly, we collect water samples once each month, and then use the estimated global ratio or the estimated ratio interval, instead of the sample average, as the ratio of MCs/Chl-a, or use the estimated θi if each subdomain of the reservoir needs to be considered. This method reduces the workload compared with collecting daily MC data directly and, compared with the sample average, takes uncertainty into account and ensures transformation accuracy. Bayesian hierarchical modeling is thus a suitable method for estimating the MC/Chl-a ratio in practice.

Unfortunately, as water sample data from previous years may not be available, we also provide a simple alternative for the case study, under the premise of cyanobacteria dominance. The ratio of MCs/Chl-a was approximately 0.5 (μg/μg) for most samples if the cyanobacteria were dominant [17,30]. The WHO [53] recommended that this ratio be 0.4 under normal circumstances if the cyanobacteria dominate. Consequently, we forecast Chl-a first, and then calculate the forecast value intervals of MCs using a ratio in the range of 0.4 to 0.5.
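To make the two routes described in this subsection concrete, the sketch below shows (a) a simplified partial-pooling estimate of subdomain ratios and (b) the interval transformation from a Chl-a forecast to an MC range. Part (a) is a method-of-moments simplification used only for illustration, not a full Bayesian fit and not the authors' procedure; the function names and the sample ratios are hypothetical.

```python
from statistics import mean, variance


def shrink_subdomain_ratios(samples_by_subdomain):
    """Pull each subdomain mean towards the global mean (partial pooling)."""
    means = [mean(s) for s in samples_by_subdomain]
    # Sampling variance of each subdomain mean (needs >= 2 samples each).
    se2 = [max(variance(s), 1e-12) / len(s) for s in samples_by_subdomain]
    mu = mean(means)                        # crude estimate of the global ratio
    # Between-subdomain variance, floored so that it stays positive.
    tau2 = max(variance(means) - mean(se2), 1e-9)
    shrunk = [(m / v + mu / tau2) / (1.0 / v + 1.0 / tau2)
              for m, v in zip(means, se2)]
    return mu, means, shrunk


def mc_interval(chla_forecast, low_ratio=0.4, high_ratio=0.5):
    """Forecast interval of MCs from a Chl-a forecast and the 0.4-0.5 ratio range."""
    return low_ratio * chla_forecast, high_ratio * chla_forecast


# Hypothetical MC/Chl-a ratios from three subdomains (illustration only).
samples = [[0.42, 0.47, 0.45], [0.38, 0.41, 0.44], [0.50, 0.55, 0.52]]
global_ratio, raw_means, pooled = shrink_subdomain_ratios(samples)

low, high = mc_interval(0.40)   # Chl-a forecast of 0.40 mg/L, hypothetical
```

Each pooled estimate lies between the raw subdomain mean and the global mean, which is the qualitative behavior one expects from the hierarchical prior structure θi ~ N(μi, σi2) under θ ~ N(μ, σ2).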


4.4.2. Performing early warning of risk

After forecasting MCs, the risk of exposure to MCs can be measured via proper guidelines for the relative risk of adverse health effects. To facilitate understanding of this risk, the WHO provided a series of coarse-grained guidelines (Table 2) for the relative probability of experiencing adverse health effects during recreational exposure to cyanobacteria and MC-LR [53]. It is not convenient in our study to measure the risk of exposure to total MCs directly via the WHO guidelines, which are stated for MC-LR. The WHO [53] indirectly showed that the average proportion of MC-LR among the total MCs was probably 25%–50% if the cyanobacteria dominate. Bláhová et al. [5] suggested that this average proportion was 42% in their samples. Zhang et al. [55] indicated that this proportion was 28.1% from August to October, with a maximum of 51.6% in July. For a real problem in the future, we can collect field data to obtain this average proportion

[Figure: hierarchy with the hyperprior θ ~ N(μ, σ2) at the top and the sub-layer priors θi ~ N(μi, σi2), i = 1, ..., 7, beneath it, shown alongside a map of the reservoir.]

Fig. 7. Architecture of the Bayesian hierarchical model. The map of the Marina Reservoir is extracted from Singapore's National Water Agency website.


Table 2
WHO guidelines for the relative probability of adverse health effects.

| Relative probability of adverse health effects | Cyanobacteria (cells/mL) | MC-LR (μg/L) |
|---|---|---|
| Low | <20,000 | <10 |
| Moderate | 20,000–100,000 | 10–20 |
| High | 100,000–10,000,000 | 20–2000 |
| Extremely high | >10,000,000 | >2000 |

Table 4
Total variance explained for principal component analysis.

| Component | Initial eigenvalues: Total | Initial eigenvalues: % of variance | Initial eigenvalues: Cumulative % | Extraction sums of squared loadings: Total | Extraction sums of squared loadings: % of variance | Extraction sums of squared loadings: Cumulative % |
|---|---|---|---|---|---|---|
| 1 | 2.666 | 53.325 | 53.325 | 2.666 | 53.325 | 53.325 |
| 2 | 1.067 | 21.346 | 74.671 | 1.067 | 21.346 | 74.671 |
| 3 | 0.789 | 15.770 | 90.441 | 0.789 | 15.770 | **90.441** |
| 4 | 0.321 | 6.412 | 96.853 | | | |
| 5 | 0.157 | 3.147 | 100.000 | | | |

of each month or season by using the Bayesian hierarchical modeling proposed above. Table 3 presents the final revised guidelines for the relative risk of adverse health effects. In addition to converting the MC-LR guidelines into guidelines for the total MCs, we also add two risk levels according to the MC-LR standards promulgated by the WHO and the Australian government [12,18,53]. Different colors are used to make the warning levels easier to visualize. These guidelines help us to quickly identify, and give early warning of, the risk caused by Microcystis blooms.

5. Case study

To validate the proposed approaches, the case study tests a real dataset of one shallow lake. This section includes five subsections: 1) case description; 2) forecasting process; 3) parameter setups of comparison approaches; 4) forecast results and discussion; and 5) early-warning-of-risk results.

5.1. Case description

Lake A, the test object in this study, is a large shallow lake. The high total nitrogen and total phosphorus concentrations in the wastewater discharged from sewage disposal plants have caused serious eutrophication in this lake. The provided dataset has a total of 731 daily observations over a two-year period. The dataset is of good quality in terms of data measurement and sampling frequency, which makes it a strong candidate for examining the proposed approaches.

5.2. Forecasting process

MC generation is related to numerous biotic and abiotic factors, such as meteorological and biological factors. In this study, we concentrated on the five most closely related factors: total nitrogen, total phosphorus, intensity of sunlight, temperature, and wind speed. These five factors were transformed into three standardized principal components via the PCA, whose cumulative variance contribution is more than 90% (see the bold value in Table 4). Then, Chl-a and three

principal components at time point t constitute an observable data pattern Ωt = (Chlat, F1t, F2t, F3t)T. For the single-step-ahead forecasting, all seven fluctuating points and ten arbitrary general points after the 550th day were selected to test the forecast accuracy. Using the BIC, we set the number of mixture components of the GMM to four and the number of hidden states to five (see bold values in Table 5). When the most similar pattern was used to forecast directly, 100 simulation runs from different initial parameters (λ) were carried out for each forecasting point, and the forecast value was taken from the run with the largest log-likelihood estimator, in order to reduce the impact of local optima and improve forecasting accuracy.

When the similar patterns weighted by the AEW were used to forecast, for the single-step-ahead forecasting, three previous neighbor patterns were taken to optimize the exponential weighting schemes, and the four most similar historical patterns were used to forecast the next value. For the three-step-ahead forecasting immediately after the fluctuating point, the number of similar historical patterns was also set to four. However, owing to the interference of the fluctuating point, one, two, and three previous neighbor patterns were selected to optimize the exponential weighting schemes for the first, second, and third points, respectively. The biggest time span in this case is from day 550 to day 731, so the upper value of the variable parameters of the exponential function (Eq. (23)) was empirically set to 0.05. The parameters of the PSO were as follows: the inertia weight was 0.7298 and the two acceleration constants were both 1.4961, which is consistent with [27]; the population size was 100; the maximum number of generations was 500; and the mutation rate was 0.02.
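A basic PSO loop following Eqs. (24) and (25), with the inertia weight, acceleration constants, population size, and generation count listed above, might look like the sketch below. Several parts are assumptions made for illustration: the function name, the clipping of positions to the search bounds, the small offsets used to approximate the open bounds of Eq. (23), the velocity limits passed as `v_max`, and the toy quadratic objective standing in for the MAPE of Eq. (22). The mutation operator mentioned in the text is not implemented here.

```python
import random


def pso_minimize(objective, bounds, v_max, n_particles=100, n_generations=500,
                 omega=0.7298, c1=1.4961, c2=1.4961, seed=0):
    """Minimise `objective` over box `bounds` with velocity limits `v_max`."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[rng.uniform(-vm, vm) for vm in v_max] for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]
    p_cost = [objective(xi) for xi in x]
    g = min(range(n_particles), key=p_cost.__getitem__)
    g_best, g_cost = p_best[g][:], p_cost[g]
    for _ in range(n_generations):
        for i in range(n_particles):
            for d in range(dim):
                # Eq. (24): velocity update, then clamp to the velocity range.
                v[i][d] = (omega * v[i][d]
                           + c1 * rng.random() * (p_best[i][d] - x[i][d])
                           + c2 * rng.random() * (g_best[d] - x[i][d]))
                v[i][d] = max(-v_max[d], min(v_max[d], v[i][d]))
                # Eq. (25): position update, clipped to the search bounds.
                lo, hi = bounds[d]
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            cost = objective(x[i])
            if cost < p_cost[i]:
                p_best[i], p_cost[i] = x[i][:], cost
                if cost < g_cost:
                    g_best, g_cost = x[i][:], cost
    return g_best, g_cost


# Search box for (lambda1, lambda2, r); the offsets 1e-6 and 0.999 are
# hypothetical stand-ins for the open bounds in Eq. (23).
BOUNDS = [(1e-6, 0.05), (1e-6, 0.05), (0.5, 0.999)]
V_MAX = [0.05, 0.05, 0.15]


def toy_objective(params):
    """Stand-in for the MAPE objective, with a known minimum."""
    lam1, lam2, r = params
    return (lam1 - 0.02) ** 2 + (lam2 - 0.03) ** 2 + (r - 0.7) ** 2


best, best_cost = pso_minimize(toy_objective, BOUNDS, V_MAX)
```

Replacing `toy_objective` with a function that evaluates Eq. (22) on the validation patterns would turn this into the AEW scheme optimization described in Section 4.3.2.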
To avoid blind searches of the particle swarm, the maximum velocity ranges of λ1, λ2, and r were chosen to be [−0.05, 0.05], [−0.05, 0.05], and [−0.15, 0.15], respectively.

A chi-square hypothesis test was used to verify the Markov property of the hidden state sequence. The null hypothesis is that the random process of state transitions follows a first-order Markov chain, i.e., that the Markov property is significant; the alternative hypothesis is the rejection of the null hypothesis. The calculated chi-square statistics are χ2 = 120.8 (sample

Table 3
Revised guidelines for the relative risk of adverse health effects.

| Relative risk of health effects | MC-LR (μg/L) | Total MCs (μg/L) | Warning color | Permissible range of aquatic activities |
|---|---|---|---|---|
| Nearly no | <1 | <1/ωt | Green | Nearly no risk for drinking water |
| Low | 1–10 | 1/ωt–10/ωt | Cyan | Low risk for drinking water and other recreations |
| Moderate− | 10–20 | 10/ωt–20/ωt | Blue | Controllable risk for recreational water of whole-body contact (such as swimming, surfing and bathing); not for drinking |
| Moderate+ | 20–100 | 20/ωt–100/ωt | Yellow | Controllable risk for recreational water of incidental contact (such as fishing or boating); not for drinking or whole-body contact |
| High | 100–2000 | 100/ωt–2000/ωt | Orange | Activities had better be restricted because potential risk for acute poisoning exists |
| Extremely high | >2000 | >2000/ωt | Red | Nearly no aquatic activities should be permitted because high potential risk for acute poisoning and strong skin irritations exists |

Note: ωt represents the average proportion of the MC-LR among the total MCs in each month or each season.
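The grading in Table 3 amounts to a threshold lookup on the MC-LR equivalent of a total-MC concentration. The sketch below is one possible encoding; the function name is hypothetical, the treatment of values that fall exactly on a band boundary is an assumption (the table does not specify it), and the example concentration and ωt are illustrative values rather than case-study results.

```python
# Upper MC-LR limits (ug/L) of the first four bands in Table 3.
LOWER_BANDS = [
    (1.0, "Nearly no", "Green"),
    (10.0, "Low", "Cyan"),
    (20.0, "Moderate-", "Blue"),
    (100.0, "Moderate+", "Yellow"),
]


def risk_level(total_mc_ug_per_l, omega_t):
    """Grade a total-MC concentration, given the MC-LR share omega_t."""
    mc_lr = total_mc_ug_per_l * omega_t      # MC-LR equivalent in ug/L
    for upper, level, color in LOWER_BANDS:
        if mc_lr < upper:                    # boundary goes to the higher band
            return level, color
    if mc_lr <= 2000.0:                      # "Extremely high" is strictly > 2000
        return "High", "Orange"
    return "Extremely high", "Red"


# Hypothetical example: 180 ug/L of total MCs with omega_t = 0.4.
level, color = risk_level(180.0, 0.4)        # -> ("Moderate+", "Yellow")
```

Comparing total MCs against thresholds divided by ωt, as the table is written, is equivalent to multiplying the total by ωt and comparing against the MC-LR thresholds, which is what the function does.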


Table 5
BIC scores of different continuous hidden Markov models.

| NHS | NGMC | MLE | BIC |
|---|---|---|---|
| 3 | 1 | −4810.3 | 9715.8 |
| 3 | 2 | −3681.7 | 7515.6 |
| 3 | 3 | −2839.9 | 5908.3 |
| 3 | 4 | −2086.4 | 4496.4 |
| 4 | 1 | −4204.6 | 8561.5 |
| 4 | 2 | −3064.0 | 6356.4 |
| 4 | 3 | −2018.2 | 4366.4 |
| 4 | 4 | −1228.8 | 2914.5 |
| 5 | 1 | −3582.2 | 7386.5 |
| 5 | 2 | −2351.3 | 5019.9 |
| 5 | 3 | −1400.0 | 3244.2 |
| **5** | **4** | **−763.5** | **2129.7** |
| 6 | 1 | −3370.9 | 7046.4 |
| 6 | 2 | −2142.4 | 4703.6 |
| 6 | 3 | −1254.7 | 3080.5 |
| 6 | 4 | −692.5 | 2146.5 |

Note: NHS = the number of hidden states; NGMC = the number of GMM mixture components; MLE = the maximum log-likelihood estimator.
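The model selection summarized in Table 5 follows Eqs. (14) and (15). A minimal sketch is given below; the function names are hypothetical, and because the sample size N′ used by the authors is not stated in this section, `n_samples` is left as an argument. The dictionary simply copies the BIC column of Table 5.

```python
import math


def n_parameters(n_states, n_mix):
    """Eq. (15): p = N + N(N - 1) + N(K + K(K + 1)/2)."""
    return (n_states + n_states * (n_states - 1)
            + n_states * (n_mix + n_mix * (n_mix + 1) // 2))


def bic(log_likelihood, n_states, n_mix, n_samples):
    """Eq. (14): BIC = -2 ln(L) + p ln(N')."""
    return -2.0 * log_likelihood + n_parameters(n_states, n_mix) * math.log(n_samples)


# BIC column of Table 5, keyed by (hidden states, mixture components).
TABLE5_BIC = {
    (3, 1): 9715.8, (3, 2): 7515.6, (3, 3): 5908.3, (3, 4): 4496.4,
    (4, 1): 8561.5, (4, 2): 6356.4, (4, 3): 4366.4, (4, 4): 2914.5,
    (5, 1): 7386.5, (5, 2): 5019.9, (5, 3): 3244.2, (5, 4): 2129.7,
    (6, 1): 7046.4, (6, 2): 4703.6, (6, 3): 3080.5, (6, 4): 2146.5,
}

best_combination = min(TABLE5_BIC, key=TABLE5_BIC.get)   # -> (5, 4)
```

For the selected combination of five hidden states and four mixture components, Eq. (15) gives p = 95 parameters.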

size is 50), χ2 = 289.8 (sample size is 100), and χ2 = 613.3 (sample size is 200). In each case, χ2 > χ2α((q − 1)2) = 32 (q = 5 and α = 0.01), where α is the significance level and (q − 1)2 is the number of degrees of freedom. The test results show that the null hypothesis cannot be rejected, and the hidden state sequence {q1, q2, ..., qT} satisfies the Markov property.

5.3. Parameter setups of comparison approaches

For the ARIMA, the three principal components are regarded as independent variables. In the ELM, the PSO-RBF network, and the PSO-SVM models, for i-step-ahead forecasting (i = 1, 2, ...), the Chl-a concentration on day t + 1 (Chlat+1) is a function of the Chl-a concentration on day t + 1 − i (Chlat+1−i) and the three principal component values on day t + 1 − i (F1,t+1−i, F2,t+1−i, and F3,t+1−i). The best-fitted ARIMA model, ARIMA(1, 1, 2), was selected to implement forecasting according to the autocorrelation function plot and the normalized BIC. The ELM was designed by referencing [25]. In the ELM method, the Hardlim function was selected as the activation function over sigmoidal, sine, triangular basis, and radial basis functions according to the RMSE during the training and validation stages, and the number of hidden nodes was chosen to be 25 through trial-and-error experiments on the MSE over the range 5 to 30. In the PSO-RBF network, the number of hidden neurons was decided through similar experiments. The centers of the Gaussian functions were obtained by k-means clustering, and their widths were set according to the variances of the points in the corresponding clusters. The synaptic weights between the hidden neurons and the output neuron were optimized by the PSO, with search ranges set to ±0.5 times the initially optimized weight vector. The SVM was designed with the LIBSVM software [10]. In the SVM, the RBF kernel was selected over linear, polynomial, and sigmoid kernel functions according to the RMSE and R2 during the training and validation stages. In the process of parameter selection, five-fold cross-validation was employed to avoid over-fitting. The regularization constant C, epsilon, and gamma were optimized by the PSO, with ranges chosen to be [10^−1, 10^1], [10^−2, 10^−1], and [10^−1, 10^0], respectively. To avoid blind searches, the maximum velocity ranges of these parameters were restricted to 0.6 times the above ranges. The basic PSO parameters for both the RBF network and the SVM were as follows: the inertia weight was 1.0; the two acceleration constants were both 1.4961; the mutation rate was 0.02; the maximum number of generations was 300; and the population size was 50.

5.4. Forecast results and discussion

The forecast results of the AEW-CHMM, the MSP-CHMM, the ARIMA, the ELM, the PSO-RBF network, and the PSO-SVM for fluctuating points, general points, and three-step-ahead forecasting immediately after the fluctuating point are compared in Figs. 8, 9, and 10, respectively.

[Figure: Chl-a concentration (mg/L) versus day for days #577, #592, #607, #622, #637, #682, and #712; series: actual value, AEW-CHMM, MSP-CHMM, ARIMA, ELM, PSO-RBF, and PSO-SVM.]

Fig. 8. Forecast values and actual values of the testing dataset of fluctuating points.

[Figure: Chl-a concentration (mg/L) versus day for days #650, #659, #666, #677, #683, #691, #701, #714, #724, and #731; series: actual value, AEW-CHMM, MSP-CHMM, ARIMA, ELM, PSO-RBF, and PSO-SVM.]

Fig. 9. Forecast values and actual values of the testing dataset of general points.

[Figure: Chl-a concentration (mg/L) versus day for days #608–#610, #623–#625, #638–#640, and #683–#685; series: actual value, AEW-CHMM, MSP-CHMM, ARIMA, ELM, PSO-RBF, and PSO-SVM.]

Fig. 10. Forecast values and actual values of the testing dataset for three-step-ahead forecasting.

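Table 6
Forecasting accuracy of different approaches.

| Approach | General points: MAPE (%) | General points: RMSE (×10³) | General points: Adjusted R2 | Fluctuating points: MAPE (%) | Fluctuating points: RMSE (×10³) | Fluctuating points: Adjusted R2 | Three-step-ahead forecast: MAPE (%) | Three-step-ahead forecast: RMSE (×10³) | Three-step-ahead forecast: Adjusted R2 |
|---|---|---|---|---|---|---|---|---|---|
| ARIMA | 1.08 | 4.26 | 0.991 | 3.46 | 12.80 | 0.802 | 3.21 | 21.4 | 0.617 |
| ELM | 1.63 | 7.03 | 0.975 | 3.01 | 13.90 | 0.772 | 1.66 | 9.40 | 0.925 |
| PSO-RBF | 1.63 | 5.98 | 0.982 | 2.66 | 9.27 | 0.897 | 2.26 | 10.42 | 0.909 |
| PSO-SVM | 1.87 | 6.37 | 0.979 | 2.49 | 9.42 | 0.895 | 1.59 | 7.83 | 0.952 |
| MSP-CHMM | 0.47 | 2.50 | 0.997 | 2.28 | 8.60 | 0.910 | 0.46 | 2.53 | 0.994 |
| AEW-CHMM | 0.55 | 2.35 | 0.997 | 2.36 | 9.20 | 0.898 | 0.57 | 2.68 | 0.994 |

Note: all RMSE values above have been multiplied by 1000.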
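The three criteria reported in Table 6 are defined in Eqs. (27)–(29) and can be computed as in the sketch below. The function names and the sample values are hypothetical and are not taken from the case study.

```python
import math


def rmse(actual, forecast):
    """Eq. (27): root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Eq. (28): mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


def adjusted_r2(actual, forecast, k):
    """Eq. (29): adjusted R^2 with k explanatory variables."""
    n = len(actual)
    mean_actual = sum(actual) / n
    sse = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - (n - 1) / (n - k - 1) * sse / sst


# Hypothetical Chl-a values in mg/L (illustration only).
actual = [0.30, 0.35, 0.40, 0.45]
forecast = [0.31, 0.34, 0.41, 0.44]

scores = (rmse(actual, forecast), mape(actual, forecast),
          adjusted_r2(actual, forecast, k=1))
```

Note that Table 6 reports the RMSE multiplied by 1000, so the value returned by `rmse` would need the same scaling to be comparable.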

Table 6 summarizes the statistical errors of these forecast values, which indicate that the MSP-CHMM and the AEW-CHMM outperform the ARIMA, the ELM, the PSO-RBF network, and the PSO-SVM. The proposed approaches also adapt better to concentration fluctuations, as the MSP-CHMM and the AEW-CHMM perform well in the three-step-ahead forecasting immediately after the fluctuating point. To reveal the state transitions of the hidden variable, the path of hidden states for forecasting the 731st day is shown in Fig. 11; it reflects the changes in the eutrophication level of the water body, where states 1, 2, 3, 4, and 5 represent five increasing levels. As introduced in Section 4.2, two truly similar patterns are similar in both their hidden states and their observations. The proposed approaches, which reveal the corresponding hidden states, are therefore valuable for exploring problems in which future trends are forecast from similar historical patterns.

To validate the proposed approaches in more detail, we do not simply split the dataset into one training set and one testing set, but rather consider different kinds of data points, which is closer to dynamic daily forecasting. In this setting, some parameters of the ELM and the RBF network need to be re-tuned manually and blindly each time, which imposes a heavy workload and may even produce imprecise results. The proposed approaches overcome this problem, as their model parameters are automatically updated from the corresponding observations by the EM algorithm, and their weighting schemes are generated adaptively. The AEW-CHMM offers an obvious advantage over the ARIMA when encountering concentration fluctuations, since the proposed approaches focus on discovering similar data patterns rather than on fitting the time series, whereas the ARIMA depends heavily on recent values in the time series. Traditional regression-based approaches would perform worse still, considering that the Chl-a concentration fluctuates more strongly after artificial control measures are applied to reservoirs and lakes, in addition to the inherent fluctuation of algal blooms. The proposed approaches, by contrast, are much less affected by fluctuations or trend disruptions in the time-series signal, which demonstrates their robustness for forecasting.

The MSP-CHMM slightly outperforms the AEW-CHMM in this case, as the dataset used is of good measurement quality and has been preprocessed reasonably. To show the robustness of the AEW-CHMM in coping with data noise, a total of six points in the testing dataset, comprising two arbitrary fluctuating points (points 637 and 712) and four general points (points 650, 666, 683, and 724), were selected, and different levels of random noise were added to the next patterns of the most similar patterns in the original data corresponding to these six points. The comparison results in Table 7 show that the AEW-CHMM performs much better than the MSP-CHMM under noise. Combined with the results in Table 6, we further conclude that the MSP-CHMM would constitute a good choice for performing forecasts if data quality can be guaranteed; however,

[Figure: hidden state (axis from 0 to 5) versus day (axis from 50 to 750).]

Fig. 11. Transition diagram of hidden states for Chl-a forecasting of the 731st day.

Table 7
Statistical errors of the two proposed approaches by adding different random data noise.

| Random noise level | MAPE of MSP-CHMM (%) | MAPE of AEW-CHMM (%) | RMSE of MSP-CHMM (×10³) | RMSE of AEW-CHMM (×10³) |
|---|---|---|---|---|
| 5% | 3.54 | 0.85 | 16.7 | 4.2 |
| 10% | 5.56 | 1.89 | 22.8 | 9.1 |
| 15% | 8.12 | 2.75 | 32.2 | 15.3 |
| 20% | 11.56 | 3.12 | 49.9 | 17.0 |

Note: all RMSE values above have been multiplied by 1000.


[Figure: forecast value intervals (lower and upper bounds) of MC concentration, in mg/L, for days #577, #592, #607, #622, #637, #650, #659, #666, #677, #682, #683, #691, #701, #712, #714, #724, and #731.]

Fig. 12. Forecast value intervals of MC concentration of testing data.

the AEW-CHMM would be a more robust approach, even in some worse situations.

5.5. Early-warning-of-risk results

The forecast value intervals of MC concentration, obtained from the Chl-a forecasts and an MC/Chl-a ratio in the range of 0.4 to 0.5, are presented in Fig. 12. According to the early-warning-of-risk method introduced in Section 4.4, the forecast value intervals of MC concentration were visually arranged into different risk levels (Fig. 13). Based on the results, the permissible range of activities related to recreational water can be determined by referring to Table 3. Let us take the 637th day as an example. It is located in the “moderate+ risk” level (yellow belt), which indicates that the risk for aquatic activities involving incidental contact with water (e.g., fishing, boating, and walking on the beach) is controllable, and that aquatic activities involving whole-body contact (e.g., swimming, surfing, and bathing) should be prohibited.

6. Conclusions and future work

In this paper, we have built a novel framework to integrate MC forecasting and the early warning of risk. The AEW-CHMM has been proposed to perform forecasting by drawing on the robustness of the PCA and an improved CHMM with AEW schemes. Unlike other forecasting approaches, the AEW-CHMM considers both manifest variables and the hidden variable of the water body eutrophication level, which functions as an invisible hand helping to coordinate the doubly embedded stochastic process. The computational results indicate that the AEW-CHMM outperforms the ARIMA, the ELM, the PSO-RBF network, and the PSO-SVM for both single-step-ahead forecasting and three-step-ahead forecasting. The results of a robustness test show that the AEW-CHMM performs much better than the MSP-CHMM. Subsequently, a Bayesian hierarchical model has been proposed to estimate the ratio of MCs/Chl-a and thus transform MC forecasting into Chl-a forecasting. Although this model was not employed in the case study owing to the unavailability of historical sample data, it still offers useful guidance for future practice. Finally, the risk of exposure to MCs has been measured and visually presented by referring to the revised guidelines for the relative risk of adverse health effects. Overall, the computational results demonstrate that the proposed approaches offer an effective, intelligent decision support tool for daily MC forecasting and early warning of risk.

The proposed approaches have the potential to be applied to rivers, reservoirs, lakes, and oceans in countries with varied geographies. For instance, the Qingcaosha Reservoir of Shanghai, China, located north of the Tropic of Cancer, has four distinct seasons, whereas the Marina Reservoir of Singapore, situated near the equator, exhibits no obvious seasonal climate variation. Although meteorological conditions influence MC generation, the AEW-CHMM focuses on discovering similar data patterns composed of features extracted from historical datasets, and pays less attention to the particular variations of meteorological conditions. Future work is needed to improve the EM algorithm so as to reduce the high computational effort caused by its convergence to local optima. To forecast MCs in future work, it would be quite useful to obtain quarterly or monthly field data for: 1) the ratio of MCs/Chl-a; and 2) the average proportion of MC-LR among MCs, which are both utilized in the Bayesian hierarchical model to improve the accuracy of the early warning of risk.
Although the effectiveness of the proposed approaches

Fig. 13. Visual diagram of the risk of exposure to MC forecast by the proposed approaches. [Vertical axis: MC concentration (mg/L), with ticks at 0.001, 0.010, 0.100, 1.000, 10.000, and +∞. Risk bands: Nearly no risk, Low risk, Moderate risk, Moderate+ risk, High risk, Extremely high risk. Horizontal axis: Day, with forecast intervals shown for days #57, #65, #192, #443, #450, #515, #577, #592, #637, #714, and #731.]
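The grading shown in Fig. 13 can be illustrated with a minimal sketch. It assumes that an MC interval is obtained by multiplying a point Chl-a forecast by the two ends of the MC/Chl-a ratio range (0.4 to 0.5, as stated in Section 5.5), and that each end of the interval is then assigned to a risk band. The band names are taken from Fig. 13, but the numeric boundaries in `HYPOTHETICAL_BOUNDS` and the function names are illustrative assumptions only; they are not the values of the revised guidelines in Table 3.

```python
from bisect import bisect_right

# Risk-level names as they appear in Fig. 13, from lowest to highest.
RISK_LEVELS = [
    "Nearly no risk",
    "Low risk",
    "Moderate risk",
    "Moderate+ risk",
    "High risk",
    "Extremely high risk",
]

# ASSUMPTION: placeholder band boundaries in the same concentration unit as
# the MC values. These are NOT the paper's guideline values (see Table 3).
HYPOTHETICAL_BOUNDS = [0.01, 0.1, 1.0, 10.0, 100.0]


def mc_interval(chla_forecast, ratio_low=0.4, ratio_high=0.5):
    """Turn a point Chl-a forecast into an MC interval via the ratio range."""
    if chla_forecast < 0 or ratio_low > ratio_high:
        raise ValueError("expected chla_forecast >= 0 and ratio_low <= ratio_high")
    return chla_forecast * ratio_low, chla_forecast * ratio_high


def risk_level(mc, bounds=HYPOTHETICAL_BOUNDS):
    """Return the band containing mc; a boundary value belongs to the upper band."""
    return RISK_LEVELS[bisect_right(bounds, mc)]


def grade_forecast(chla_forecast):
    """Grade both ends of the MC interval derived from a Chl-a forecast."""
    low, high = mc_interval(chla_forecast)
    return risk_level(low), risk_level(high)
```

When the two ends of an interval fall into different bands, a cautious reading would report the higher band; that choice is a design assumption of this sketch rather than a rule stated in the paper.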


has been demonstrated by a case study, their capability and validity in solving more complicated and practical problems, especially the evolutions of complex chaotic systems, require further verification.

Acknowledgments

This work is jointly supported by the National Research Foundation Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE), and Research Foundation of China (No. 14ZDB152, No. 20130073110040). The authors are grateful to anonymous reviewers and the editor for their constructive comments.

References

[1] A. Afshar, M. Saadatpour, M.A. Marino, Development of a complex system dynamic eutrophication model: application to Karkheh reservoir, Environmental Engineering Science 29 (6) (2012) 373–385.
[2] J. Andersen, C. Han, K. O'Shea, D.D. Dionysiou, Revealing the degradation intermediates and pathways of visible light-induced NF-TiO2 photocatalysis of microcystin-LR, Applied Catalysis B: Environmental 154 (2014) 259–266.
[3] L.E. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, The Annals of Mathematical Statistics 37 (6) (1966) 1554–1563.
[4] T. Bellotti, R. Matousek, C. Stewart, A note comparing support vector machines and ordered choice models' predictions of international banks' ratings, Decision Support Systems 51 (3) (2011) 682–687.
[5] L. Bláhová, P. Babica, E. Maršálková, B. Maršálek, L. Bláha, Concentrations and seasonal trends of extracellular microcystins in freshwaters of the Czech Republic—results of the national monitoring program, CLEAN-Soil, Air, Water 35 (4) (2007) 348–354.
[6] J. Bobbin, F. Recknagel, Inducing explanatory rules for the prediction of algal blooms by genetic algorithms, Environment International 27 (2) (2001) 237–242.
[7] H. Çamdeviren, N. Demir, A. Kanik, S. Keskin, Use of principal component scores in multiple linear regression models for prediction of Chlorophyll-a in reservoirs, Ecological Modelling 181 (4) (2005) 581–589.
[8] Y. Cha, C.A. Stow, A Bayesian network incorporating observation error to predict phosphorus and chlorophyll a in Saginaw Bay, Environmental Modelling & Software 57 (2014) 90–100.
[9] W.S. Chan, F. Recknagel, H. Cao, H.D. Park, Elucidation and short-term forecasting of microcystin concentrations in Lake Suwa (Japan) by means of artificial neural networks and evolutionary algorithms, Water Research 41 (10) (2007) 2247–2255.
[10] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3) (2011) 27, http://dx.doi.org/10.1145/1961189.1961199.
[11] N.B. Chang, B. Vannah, Y. Jeffrey Yang, Comparative sensor fusion between hyperspectral and multispectral satellite sensors for monitoring microcystin distribution in Lake Erie, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6) (2014) 2426–2442.
[12] I. Chorus, J. Bartram, in: I. Chorus, J. Bartram (Eds.), Toxic Cyanobacteria in Water: A Guide to Their Public Health Consequences, Monitoring and Management, E&FN Spon Press, London 1999, p. 416.
[13] P.H. Dimberg, A.C. Bryhn, J.K. Hytteborn, Probabilities of monthly median chlorophyll-a concentrations in subarctic, temperate and subtropical lakes, Environmental Modelling & Software 41 (2013) 199–209.
[14] P.H. Dimberg, J.K. Hytteborn, A.C. Bryhn, Predicting median monthly chlorophyll-a concentrations, Limnologica-Ecology and Management of Inland Waters 43 (3) (2013) 169–176.
[15] M. Doumpos, C. Zopounidis, A multicriteria outranking modeling approach for credit rating, Decision Sciences 42 (3) (2011) 721–742.
[16] M.A.H. Farquad, I. Bose, Preprocessing unbalanced data using support vector machine, Decision Support Systems 53 (1) (2012) 226–233.
[17] J. Fastner, U. Neumann, B. Wirsing, J. Weckesser, C. Wiedner, B. Nixdorf, I. Chorus, Microcystins (hepatotoxic heptapeptides) in German fresh water bodies, Environmental Toxicology 14 (1) (1999) 13–22.
[18] GLERL, Microcystin and Other Algal Toxin Guidelines [http://www.glerl.noaa.gov/res/waterQuality/microcystinGuidelines.html], Great Lakes Environmental Research Laboratory, National Oceanic and Atmospheric Administration, 2014 [last accessed December 12, 2015].
[19] J.L. Graham, J.R. Jones, Microcystin in Missouri reservoirs, Lake and Reservoir Management 25 (3) (2009) 253–263.
[20] Z. Gurkan, J. Zhang, S.E. Jørgensen, Development of a structurally dynamic model for forecasting the effects of restoration of Lake Fure, Denmark, Ecological Modelling 197 (1) (2006) 89–102.
[21] J.H. Ha, T. Hidaka, H. Tsuno, Quantification of toxic microcystis and evaluation of its dominance ratio in blooms using real-time PCR, Environmental Science & Technology 43 (3) (2009) 812–818.
[22] G. Hamilton, A.N.D. Ross Mcvinish, K. Mengersen, Bayesian model averaging for harmful algal bloom prediction, Ecological Applications 19 (7) (2009) 1805–1814.
[23] M.R. Hassan, B. Nath, M. Kirley, A fusion model of HMM, ANN and GA for stock market forecasting, Expert Systems with Applications 33 (1) (2007) 171–180.
[24] M.R. Hassan, A combination of hidden Markov model and fuzzy model for stock market forecasting, Neurocomputing 72 (16) (2009) 3439–3446.
[25] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1) (2006) 489–501.
[26] P. Jiang, X. Liu, Hidden Markov model for municipal waste generation forecasting under uncertainties, European Journal of Operational Research 250 (2) (2016) 639–651.
[27] J. Kennedy, Particle swarm optimization, in: C. Sammut, G.I. Webb (Eds.), Encyclopedia of Machine Learning, Springer Science & Business Media 2010, pp. 760–766, http://dx.doi.org/10.1007/978-0-387-30164-8_630.
[28] Y. Kim, H.S. Shin, J.D. Plummer, A wavelet-based autoregressive fuzzy model for forecasting algal blooms, Environmental Modelling & Software 62 (2014) 1–10.
[29] J.H.W. Lee, I.J. Hodgkiss, K.T.M. Wong, I.H.Y. Lam, Real time observations of coastal algal blooms by an early warning system, Estuarine, Coastal and Shelf Science 65 (1) (2005) 172–190.
[30] S.J. Lee, M.H. Jang, H.S. Kim, B.D. Yoon, H.M. Oh, Variation of microcystin content of Microcystis aeruginosa relative to medium N:P ratio and growth stage, Journal of Applied Microbiology 89 (2) (2000) 323–329.
[31] I. Lou, Z. Xie, W.K. Ung, K.M. Mok, Freshwater algal bloom prediction by extreme learning machine in Macau storage reservoirs, Neural Computing and Applications (2014) 1–8, http://dx.doi.org/10.1007/s00521-013-1538-0.
[32] I. Lou, Z. Xie, W.K. Ung, K.M. Mok, Integrating support vector regression with particle swarm optimization for numerical modeling for algal blooms of freshwater, Applied Mathematical Modelling 39 (2015) 5907–5916.
[33] C.J. Lu, T.S. Lee, C.M. Lian, Sales forecasting for computer wholesalers: a comparison of multivariate adaptive regression splines and artificial neural networks, Decision Support Systems 54 (1) (2012) 584–596.
[34] S. Moro, P. Cortez, P. Rita, A data-driven approach to predict the success of bank telemarketing, Decision Support Systems 62 (2014) 22–31.
[35] I.E. Mulia, H. Tay, K. Roopsekhar, P. Tkalich, Hybrid ANN-GA model for predicting turbidity and chlorophyll-a concentrations, Journal of Hydro-Environment Research 7 (4) (2013) 279–299.
[36] N. Muttil, J.H.W. Lee, Genetic programming for analysis and real-time prediction of coastal algal blooms, Ecological Modelling 189 (3) (2005) 363–376.
[37] NH&MRC (Ed.), Guidelines for Managing Risks in Recreational Water, National Health and Medical Research Council Canberra, Australia 2008, p. 16.
[38] K.H. Nicholls, Phosphorus and chlorophyll in the Bay of Quinte: a time-series/intervention analysis of 1972–2008 data, Aquatic Ecosystem Health & Management 15 (4) (2012) 421–429.
[39] H.M. Oh, S.J. Lee, J.H. Kim, H.S. Kim, B.D. Yoon, Seasonal variation and indirect monitoring of microcystin concentrations in Daechung reservoir, Korea, Applied and Environmental Microbiology 67 (4) (2001) 1484–1489.
[40] H.W. Paerl, T.G. Otten, Blooms bite the hand that feeds them, Science 342 (6157) (2013) 433–434.
[41] Y. Park, K.H. Cho, J. Park, S.M. Cha, J.H. Kim, Development of early-warning protocol for predicting chlorophyll-a concentration using machine learning models in freshwater and estuarine reservoirs, Korea, Science of the Total Environment 502 (2015) 31–41.
[42] G.C. Pereira, A. Evsukoff, N.F.F. Ebecken, Fuzzy modelling of chlorophyll production in a Brazilian upwelling system, Ecological Modelling 220 (12) (2009) 1506–1512.
[43] B. Qu, The correlation analysis and predictions for chlorophyll a, aerosol optical depth, and photosynthetically active radiation, The Impact of Melting Ice on the Ecosystems in Greenland Sea, Springer, Berlin Heidelberg 2015, pp. 65–80, http://dx.doi.org/10.1007/978-3-642-54498-9_5.
[44] L.R. Rabiner, Tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
[45] M. Ramin, T. Labencki, D. Boyd, D. Trolle, G.B. Arhonditsis, A Bayesian synthesis of predictions from different models for setting water quality criteria, Ecological Modelling 242 (2012) 127–145.
[46] A.F. Roegner, M.P. Schirmer, B. Puschner, B. Brena, G. Gonzalez-Sapienza, Rapid quantitative analysis of microcystins in raw surface waters with MALDI MS utilizing easily synthesized internal standards, Toxicon 78 (2014) 94–102.
[47] M.K. Rogalus, M.C. Watzin, Evaluation of sampling and screening techniques for tiered monitoring of toxic cyanobacteria in lakes, Harmful Algae 7 (4) (2008) 504–514.
[48] G. Schwarz, Estimating the dimension of a model, The Annals of Statistics 6 (2) (1978) 461–464.
[49] D. Sedan, L. Giannuzzi, L. Rosso, C.A. Marra, D. Andrinolo, Biomarkers of prolonged exposure to microcystin-LR in mice, Toxicon 68 (2013) 9–17.
[50] C. Sivapragasam, N. Muttil, S. Muthukumar, V.M. Arun, Prediction of algal blooms using genetic programming, Marine Pollution Bulletin 60 (10) (2010) 1849–1855.
[51] Z.L. Sun, T.M. Choi, K.F. Au, Y. Yu, Sales forecasting using extreme learning machine with applications in fashion retailing, Decision Support Systems 46 (1) (2008) 411–419.
[52] F. Wang, X. Wang, B. Chen, Y. Zhao, Z. Yang, Chlorophyll a simulation in a lake ecosystem using a model with wavelet analysis and artificial neural network, Environmental Management 51 (5) (2013) 1044–1054.
[53] WHO, Algae and cyanobacteria in freshwater, in: Guidelines for Safe Recreational Water Environments, World Health Organization, Geneva, Switzerland 2003, pp. 136–158.
[54] R. Williamson, J.G. Field, F.A. Shillington, A. Jarre, A. Potgieter, A Bayesian approach for estimating vertical chlorophyll profiles from satellite remote sensing: proof-of-concept, ICES Journal of Marine Science 68 (4) (2011) 792–799.
[55] D. Zhang, P. Xie, Y. Liu, T. Qiu, Transfer, distribution and bioaccumulation of microcystins in the aquatic food web in Lake Taihu, China, with potential risks to human health, Science of the Total Environment 407 (7) (2009) 2191–2199.
[56] J. Zhang, D.L. Mauzerall, T. Zhu, S. Liang, M. Ezzati, J.V. Remais, Environmental health in China: progress towards clean air and safe water, The Lancet 375 (9720) (2010) 1110–1119.

Peng Jiang is currently a Ph.D. student in Industrial Engineering at Shanghai Jiao Tong University, Shanghai, China. He received his Bachelor's degree from the School of Management at Xi'an Jiao Tong University, China. His research interests include Probabilistic Graph Models for time series forecasting and risk assessment, and Deep Learning Models for early warning of real-time risk. His previous research has been published in the European Journal of Operational Research.

Xiao Liu received the M.Sc. degree in system engineering from the Northeastern University, China, in 1999, and the Ph.D. degree in industry system from the Université de technologie de Troyes (UTT), Troyes, France, in 2004. Currently, she is Professor in Industrial Engineering at Shanghai Jiao Tong University. Her research focuses on systems modeling and analysis, environmental and energy infrastructures resilient design method and optimization, risk assessment, prediction, and robust optimization. She has authored or co-authored more than one hundred technical papers, and currently serves on more than ten international research journals.

Jingjie Zhang is a senior research fellow at National University of Singapore and a senior advisor at NUSDeltares (a collaborative institute between NUS and Institute). He holds a PhD in environmental and ecological modeling. Dr. Zhang has more than 15 years of experience working on various national and international as well as UNEP projects. His main research focuses on developing and applying mathematical models for better understanding of ecosystem properties, describing general ecological patterns in aquatic ecosystems, and examining different environmental impacts driven by forcing functions such as changes of climate and landscape, agricultural activities, pollutants, fragmentation, and human disturbance. His research interests have also expanded to include ecosystem health and environmental impact assessment. He has been involved in developing Structurally Dynamic Models and PAMOLARE (Planning And Management Of Lakes And Reservoirs focusing on Eutrophication) for UNEP and the International Lake Environment Committee Foundation (ILEC), as well as Comprehensive Aquatic System Models and an integrative modeling framework for lakes, reservoirs, wetlands, and coastal marine systems in Singapore, China, Denmark, and the US.

Xiaoyang Yuan is an undergraduate student in the Department of Cell Biology, University of Alberta, Canada. Her current research interests include environmental and ecological modeling and risk assessment.
