Expert Systems with Applications 38 (2011) 10229–10239
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
Mining Markov chain transition matrix from wind speed time series data
Zhe Song a,⇑, Xiulin Geng a, Andrew Kusiak b, Chang Xu c
a School of Business, Nanjing University, Nanjing, Jiangsu 210093, China
b Department of Mechanical and Industrial Engineering, The University of Iowa, Iowa City, IA 52242-1527, United States
c Hohai University, Nanjing, Jiangsu 210098, China
Article info
Keywords: Data mining; Wind speed time series; Wind power; Evolutionary algorithms; Markov chain; Optimization
Abstract
Extracting important statistical patterns from wind speed time series at different time scales is of interest to the wind energy industry for wind turbine optimal control, wind energy dispatch and scheduling, wind energy project design and assessment, and so on. In this paper, a systematic two-step method is presented to estimate the first-order (one-step) Markov chain transition matrix from wind speed time series. First, the wind speed time series data are used to generate basic estimators of transition matrices (i.e. first order, second order, third order, etc.) by counting techniques. Then an evolutionary algorithm (EA), specifically a double-objective evolutionary strategy (ES) algorithm, is proposed to search for the first-order Markov chain transition matrix that best matches these basic estimators once it is transformed into its higher-order counterparts. The evolutionary search is guided by a predefined cost function that measures the difference between the basic estimators and the first-order transition matrix together with its higher-order transformations. To deal with the potentially high-dimensional optimization problem (i.e. large transition matrices), an enhanced offspring generation procedure is proposed to help the ES algorithm converge efficiently and find better Pareto frontiers over the generations. The proposed method is illustrated with wind speed time series data collected from individual 1.5 MW wind turbines at different time scales.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
The US and Chinese wind power markets have expanded rapidly in recent years, and even more aggressive wind energy installations are envisioned in various economic stimulus packages (AWEA, 2008; Manwell, McGowan, & Rogers, 2002; McElroy, Lu, Nielsen, & Wang, 2009; Wiser & Bolinger, 2006). The rapid development of wind energy provides a rich environment for wind energy research and also presents new research challenges. Among these challenges, understanding wind speed and applying that knowledge in the wind energy industry is one of the most important (Manwell et al., 2002). As wind energy becomes a large part of the world's future energy portfolio, knowing how wind behaves at specific wind farms is extremely important for the operation and scheduling of today's power systems. Research related to wind speed is extensive, including wind speed forecasting (Kusiak, Zheng, & Song, 2009; Pourmousavi Kani, & Riahy, 2008), wind speed simulation (Ettoumi, Sauvageot, & Adane, 2003; Kantz, Holstein, Ragwitz, & Vitanov, 2004; Negra, Holmstrøm, Bak-Jensen, & Sørensen, 2007; Nfaoui, Essiarab, & Sayigh, 2004; Sahin & Sen, 2001) and so on.
⇑ Corresponding author. Tel.: +86 152 9551 4941. E-mail address: [email protected] (Z. Song).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.02.063
Wind speed forecasting and simulation can be achieved through physical models, statistical models, or combinations of the two. Among statistical approaches, the Markov chain is a popular tool to model, forecast and simulate wind speed in a discrete, statistical way. Ettoumi et al. (2003) used first-order Markov transition matrices (three states for wind speed and nine states for wind direction) to model three-hour average wind speed and direction data in Algeria. Kantz et al. (2004) used a continuous Markov chain to model turbulent wind speed based on past measurements at a single site; the extracted transition matrix could be used to predict future short-time turbulent gusts. Negra et al. (2007) developed a synthetic wind speed generator based on Markov chain concepts. Nfaoui et al. (2004) used hourly wind speed time series data from Tangiers, Morocco to calculate a 12-state transition matrix, which could then be used to generate synthetic wind time series data. Sahin and Sen (2001) attempted to extract a first-order Markov chain transition matrix from wind time series data in the northwestern region of Turkey. Neural networks and a Markov transition matrix have been combined to predict short-term wind speeds, with the Markov transition matrix used to adjust the neural networks' output in a statistical sense (Pourmousavi et al., 2008). Anahua, Barth, and Peinke (2008) applied Markov chain concepts to build wind turbine power curves that capture the stochastic part of the real curves.
In these studies, the estimation of the Markov transition matrix is not discussed in depth. The estimates are usually based on the simple, well-established maximum likelihood estimator applied to the wind time series data, which does not fully use the information contained in the data. Statistical research, on the other hand, has mainly focused on developing new methods for estimating transition probabilities from proportions data (Jones, 2005; Lee, Judge, & Zellner, 1968; McGuire, 1969). To address this issue, this paper presents a systematic way to mine a first-order transition matrix from wind speed time series data. The proposed method makes full use of the time series data through simple transformations. The mining process is essentially a nonlinear constrained optimization problem solved by multi-objective evolutionary algorithms (Deb, 2001), in which the constraints and objective functions can take flexible forms. As the transition probability matrix grows, traditional evolutionary algorithms may fail to converge. This paper develops a new strategy to address this issue and illustrates its effectiveness through a case study with real wind speed time series data.
2. Wind speed time series and Markov chains
Wind speed time series data are of paramount importance for the wind energy industry because they are closely related to wind energy output, wind turbine control, wind power integration and so on. Wind time series data can be obtained from different sources, for example, weather stations, met towers or wind turbines equipped with anemometers. A wind speed time series can be denoted as v_t, t = 0, 1, 2, …, T, where v_t is the (discrete) wind speed at time stamp t; t starts at 0 and ends at T, so the series contains T + 1 discrete wind speeds. Wind speed measurements are usually continuous values, but to fit a Markov chain application, v_t is discretized and can take only a finite number of possible states. Suppose continuous wind speeds are discretized into r states S_i, i = 1, …, r. For example, state S_1 could stand for wind speeds between 0 and 1 m/s. How to discretize continuous wind speeds into r discrete states depends on the target application as well as on other factors, such as wind speed variations and wind turbine specifications. In this paper, the cut-in (v_cut-in) and rated (v_rated) wind speeds are used as two distinctive thresholds to discretize the wind speed in the later computational study. The reason for this categorization is that wind speeds between 0 and the cut-in speed generate zero power output. When the wind speed is greater than the rated speed, a wind turbine usually generates rated (relatively constant) power unless the wind speed exceeds the cut-out (v_cut-out) wind speed, in which case the turbine is shut down to protect itself. Based on the cut-in and rated wind speeds, S_1 can be defined as the state containing wind speeds from 0 to v_cut-in, i.e. 0 ≤ v < v_cut-in, and S_r as the state containing wind speeds greater than or equal to v_rated.
Then S_2 to S_{r−1} can be defined by different wind speed classification schemes. A simple one is the equal-interval (Δv) approach. If the interval Δv is 1 m/s, S_2 is the state containing wind speeds v_cut-in ≤ v < v_cut-in + 1, and S_3 to S_{r−1} are defined similarly. In this case, the number of states is determined by Δv, v_cut-in and v_rated. When a Markov chain is applied to wind speed time series data, the one-step transition matrix P contains the probabilities that the wind speed, when in some state S_i, will move into state S_j at the next sampling time. The one-step (i.e. first-order) transition matrix P can be represented as
$$P = \begin{pmatrix} p_{11} & \cdots & p_{1r} \\ \vdots & \ddots & \vdots \\ p_{r1} & \cdots & p_{rr} \end{pmatrix} \qquad (1)$$
Each entry of this matrix is a probability, and the entries satisfy p_ij ≥ 0 and Σ_{j=1}^{r} p_ij = 1, i, j = 1, …, r (Ross, 2007). Note that P is a one-step transition matrix and its entries are one-step transition probabilities. The 2-step transition probabilities are also of interest, because knowing what future wind speeds look like is very important for some applications in the wind energy industry. The 2-step transition probability is denoted p^2_ij, where the superscript 2 indicates a 2-step transition: p^2_ij is the probability that a wind speed in state S_i will be in state S_j after two additional transitions. In other words, given that the wind speed is in state S_i at the current time t, p^2_ij is the probability that it will be in state S_j at sampling time t + 2. An m-step transition matrix can be defined similarly in theory. In practice, however, predicting wind speed far into the future with reasonable accuracy is very hard due to the stochastic nature of wind. According to the Chapman–Kolmogorov equations (Ross, 2007), the 2-step transition matrix P^2 can be calculated from P as the product P·P, and the 3-step transition matrix P^3 as P·P·P. Given the wind speed time series data v_t (t = 0, 1, 2, …, T), estimating the one-step transition matrix is relatively straightforward. Let n_ij denote the number of times the wind speed was in state S_i in period t (i.e. v_t = S_i) and in state S_j in period t + 1 (i.e. v_{t+1} = S_j), for t from 0 to T − 1. The transition probability can then be estimated as
$$p_{ij} = n_{ij} \Big/ \sum_{j=1}^{r} n_{ij} \qquad (2)$$
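The discretization scheme described above and the counting estimator of Eq. (2) can be sketched together as follows. This is a minimal pure-Python illustration under stated assumptions: states are 0-based (index 0 stands for S_1), the intermediate states use the equal-interval rule with width dv, and a row whose state never occurs before the last sample (so that Eq. (2) is undefined) is left as all zeros. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def discretize(speeds: Sequence[float], v_cut_in: float, v_rated: float,
               dv: float = 1.0) -> Tuple[List[int], int]:
    """Map wind speeds (m/s) to 0-based states and return (states, r).

    State 0           : 0 <= v < v_cut_in       (S_1)
    States 1 .. n_mid : equal-width bins of dv  (S_2 .. S_{r-1})
    State n_mid + 1   : v >= v_rated            (S_r)
    """
    n_mid = math.ceil((v_rated - v_cut_in) / dv)  # number of intermediate states
    states = []
    for v in speeds:
        if v < v_cut_in:
            states.append(0)
        elif v >= v_rated:
            states.append(n_mid + 1)
        else:
            # min() guards against floating-point rounding just below v_rated
            states.append(min(n_mid, 1 + int((v - v_cut_in) // dv)))
    return states, n_mid + 2


def estimate_one_step(states: Sequence[int], r: int) -> List[List[float]]:
    """Eq. (2): p'_ij = n_ij / sum_j n_ij, counting v_t -> v_{t+1} for t = 0 .. T-1."""
    counts = [[0] * r for _ in range(r)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    P = []
    for row in counts:
        total = sum(row)
        # States with no observed outgoing transition keep an all-zero row (assumption).
        P.append([c / total if total else 0.0 for c in row])
    return P
```

For example, with v_cut-in = 3.5 m/s, v_rated = 13 m/s and Δv = 1 m/s, this scheme yields r = 12 states.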
It is widely known that Eq. (2) is a biased maximum-likelihood estimator whose bias tends to zero as the sample size increases. As mentioned before, most research on transition matrix estimation focuses on proportions data (Jones, 2005; Lee et al., 1968; McGuire, 1969), that is, on settings where no suitable time series data exist and observing individual transitions over time is difficult. In the wind energy industry, however, wind time series data are usually recorded continuously at different time scales, e.g. every 10 min or every 10 s. This paper therefore searches the time series data for the transition matrix that best captures the wind speed transition probabilities, not only for one step but also for several steps. The whole process is data-driven and relies heavily on data quality. Before applying the method discussed in this paper, practitioners should be aware that preparing good, reliable wind time series data is the key to a successful implementation. If the data do not provide accurate and sufficient information about wind speed transitions, the transition matrix mined from them will fail to capture the underlying patterns.
3. Optimization model for mining transition matrix
In this paper, finding the best Markov chain transition matrix is formulated as a constrained optimization model. Given the wind speed time series data v_t (t = 0, 1, 2, …, T), the objective is to find the one-step transition matrix P that minimizes a cost function. A general optimization model for finding the best transition matrix is shown in model (3), where the cost function J(·) is a function of P. The constraints can include any conditions that must be satisfied during the optimization. For example, general constraints are that each row of the transition matrix sums to 1 and that each entry of P lies between 0 and 1. Specific constraints can be added for particular applications or requirements.
$$\arg\min_{P} J(\cdot) \quad \text{s.t. Constraints} \qquad (3)$$
3.1. Cost function considering multi-step transitions
As suggested by Eq. (2), the simplest way to obtain the one-step transition matrix is to go through the time series data and count the numbers n_ij. To distinguish it in the later discussion, let p′_ij be the transition probability estimated from n_ij / Σ_{j=1}^{r} n_ij, and similarly let P′ be the one-step transition matrix estimated by Eq. (2). Suppose the cost function J(·) is defined as J = ‖P − P′‖_2, where ‖·‖_2 is an entry-wise norm, i.e. for two r × r transition matrices,
$$\lVert P - P' \rVert_2 = \Big( \sum_{i=1}^{r} \sum_{j=1}^{r} \lvert p_{ij} - p'_{ij} \rvert^{2} \Big)^{1/2} \qquad (4)$$
With a slight abuse of notation, the superscripts in this equation denote the square and the square root, respectively. The optimal solution of model (3) is then obviously P = P′. However, this approach considers only one-step transitions, whereas in real applications 2-step transitions are also of interest and have practical justification. For example, in an hourly power dispatch problem, knowing the wind speed 2 hours ahead helps system operators manage grid risks. With the Chapman–Kolmogorov equations, the 2-step transition matrix P^2 can be calculated from P. Recall that if J = ‖P − P′‖_2, the estimated one-step transition matrix P equals P′. To what extent, then, does the calculated 2-step transition matrix P^2 agree with the time series data? Viewed another way, the information in the time series data v_t (t = 0, 1, 2, …, T) is not fully exploited. Recall from Eq. (2) that the time series data can be used to estimate the one-step transition probabilities. With a little more manipulation of the data, the 2-step transition probabilities can be estimated just as easily. Let n^2_ij denote the number of times the wind speed was in state S_i in period t (i.e. v_t = S_i) and in state S_j in period t + 2 (i.e. v_{t+2} = S_j), for t from 0 to T − 2. The 2-step transition probability can then be estimated as
$$p^{2}_{ij} = n^{2}_{ij} \Big/ \sum_{j=1}^{r} n^{2}_{ij} \qquad (5)$$
The 2-step transition matrix estimated through Eq. (5) is denoted P^2′ and is supposed to reflect the 2-step transitions from the data's point of view. Similarly, the 3-step transition matrix P^3′ can be estimated from the time series data if needed. Although further multi-step transition matrices, e.g. P^4′, P^5′ and so on, could also be estimated from the data, without loss of generality this paper stops at the 3-step transition matrix for easier illustration; the proposed method extends easily to include these matrices in optimization model (3). Redefine the cost function as
$$J = w_1 \lVert P - P' \rVert_2 + w_2 \lVert P^2 - P^{2\prime} \rVert_2 + w_3 \lVert P^3 - P^{3\prime} \rVert_2 \qquad (6)$$
where w_1, w_2 and w_3 are weights between 0 and 1 satisfying w_1 + w_2 + w_3 = 1, and P^2 and P^3 are calculated from P by the Chapman–Kolmogorov equations. The cost function in Eq. (6) selects the optimal P so that its calculated multi-step transition matrices are similar to their counterparts estimated from the time series data. It uses a simple weighting strategy to differentiate among its three components. In some applications the one-step transition matrix may be more important than the 2- or 3-step transition matrices, and assigning a higher value to w_1 reflects this; in others the opposite may hold. As mentioned before, the cost function can easily be extended to more multi-step transition matrices. To explain cost function (6), consider four special weight combinations. First, if w_1 = 1 and w_2 = w_3 = 0, the optimal solution of model (3) is P = P′. If w_1 = 0, w_2 = 1 and w_3 = 0, the optimal solution may not look like P′, but after the Chapman–Kolmogorov transformation its square reflects the 2-step transitions supported by the time series data. If w_1 = w_2 = 0 and w_3 = 1, the optimal solution may not look like P′ and its square may not look like P^2′, but its cube will resemble P^3′. If w_1 = w_2 = w_3 = 1/3, the optimal solution may not look like P′, its square may not look like P^2′, and its cube may not equal P^3′, but it balances these multi-step transitions and is meaningful when multi-step planning is involved.
3.2. Extended optimization model
Based on the previous discussion, optimization model (3) for mining the best one-step transition matrix can be generalized into optimization model (7), where m is the maximum number of steps considered in the cost function. m should be a relatively small integer, much smaller than T, because estimating P^m′ from the time series data v_t (t = 0, 1, 2, …, T) requires counting the number of times the wind speed was in state S_i in period t (i.e. v_t = S_i) and in state S_j in period t + m (i.e. v_{t+m} = S_j), for t from 0 to T − m. Note that P and P′ are simplified notations for P^1 and P^1′.
$$\arg\min_{P} \sum_{k=1}^{m} w_k \lVert P^k - P^{k\prime} \rVert_2 \quad \text{s.t.} \quad p_{ij} \ge 0, \;\; \sum_{j=1}^{r} p_{ij} = 1 \;\; \text{for } i, j = 1, \dots, r; \;\; \text{Other constraints} \qquad (7)$$
The "Other constraints" at the bottom of model (7) can include domain knowledge about wind speed transitions or other application-dependent considerations.
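The k-step count estimator of Eq. (5) (generalized to t + k as in model (7)), the Chapman–Kolmogorov powers P^k and the weighted objective of Eqs. (6)–(7) can be sketched as follows. It is a minimal sketch that assumes 0-based state sequences, plain nested lists for matrices, and the entry-wise norm of Eq. (4); all names are hypothetical, and states never observed are again given all-zero rows.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def estimate_k_step(states: Sequence[int], r: int, k: int) -> Matrix:
    """P^k' by counting transitions v_t -> v_{t+k} for t = 0 .. T-k (Eqs. (2) and (5))."""
    counts = [[0] * r for _ in range(r)]
    for t in range(len(states) - k):
        counts[states[t]][states[t + k]] += 1
    est = []
    for row in counts:
        total = sum(row)
        est.append([c / total if total else 0.0 for c in row])  # unseen state: zero row
    return est


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product, used for the Chapman-Kolmogorov powers P^k."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def entrywise_norm(A: Matrix, B: Matrix) -> float:
    """Eq. (4): (sum_ij |a_ij - b_ij|^2)^(1/2)."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)) ** 0.5


def cost(P: Matrix, estimates: Sequence[Matrix], weights: Sequence[float]) -> float:
    """Objective of model (7): sum_k w_k ||P^k - P^k'||_2 for k = 1 .. m."""
    total, Pk = 0.0, P
    for k, (est, w) in enumerate(zip(estimates, weights), start=1):
        if k > 1:
            Pk = matmul(Pk, P)  # P^k = P^(k-1) P
        total += w * entrywise_norm(Pk, est)
    return total
```

With m = 3 as in Eq. (6), the estimates would be `[estimate_k_step(states, r, k) for k in (1, 2, 3)]` paired with weights such as `[0.5, 0.3, 0.2]`; these weight values are only an example.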
4. Solve the optimization with evolutionary algorithms
The solutions of model (7) are r × r matrices; viewed as vectors, they have r^2 dimensions. Model (7) might seem solvable by traditional gradient-based optimization methods, such as Newton or quasi-Newton methods, so why use evolutionary algorithms? One reason is that a good initial guess of the solution is hard to obtain: the complexity of the problem grows quickly as the number of states r increases, and these traditional methods tend to be trapped in local optima. An EA can generate a large number of individuals that have the potential to cover the search space relatively well. The second reason is that an EA is easy to implement and is not limited by the form of the optimization model or its constraints; it can handle models and constraints with or without specific mathematical forms. Moreover, EAs have proved to be powerful search heuristics and have solved many hard optimization problems in both theory and practice (Deb, 2001).
Among the many types of evolutionary algorithms, the evolutionary strategy algorithm (Eiben & Smith, 2003) is selected as the basis for solving model (7). In this paper, the general ES is modified into a multi-objective ES to solve the constrained optimization problem. The solution of model (7) can be encoded as a matrix used by the evolutionary strategy algorithm. An individual in the evolutionary strategy algorithm has the general form (P, σ), where P and σ are two r × r square matrices, i.e.
$$P = \begin{pmatrix} p_{11} & \cdots & p_{1r} \\ \vdots & \ddots & \vdots \\ p_{r1} & \cdots & p_{rr} \end{pmatrix}, \qquad \sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1r} \\ \vdots & \ddots & \vdots \\ \sigma_{r1} & \cdots & \sigma_{rr} \end{pmatrix}.$$
σ is a mutation matrix used to mutate the solution P: each element of σ serves as the standard deviation of a zero-mean normal distribution. The initial population of σ is generated by uniform sampling from the range [σ_low, σ_up], where σ_low and σ_up are the lower and upper bounds for the standard deviation matrix. To solve a constrained optimization problem with a double-objective ES algorithm, all the constraints are transformed into a second objective function (Deb, 2001). To standardize the optimization problem for illustration purposes, model (7) is transformed into the following minimization problem with two objectives, without considering the "Other constraints":
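$$\arg\min_{P} \{\mathrm{Obj}_1, \mathrm{Obj}_2\} \qquad (8)$$
where
$$\mathrm{Obj}_1 = \sum_{k=1}^{m} w_k \lVert P^k - P^{k\prime} \rVert_2, \qquad \mathrm{Obj}_2 = \sum_{i=1}^{r} \Big\lvert \sum_{j=1}^{r} p_{ij} - 1 \Big\rvert,$$
and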
p_ij ≥ 0 (for i, j = 1, …, r) is handled during the initial population and mutation processes: any negative probability is replaced with 0. Minimizing Obj_1 is equivalent to minimizing the cost function. The minimum of Obj_2 is zero, which means that every row sums to 1. To solve the double-objective optimization problem, a general evolutionary strategy algorithm is modified according to SPEA (Zitzler, Laumanns, & Thiele, 2001; Zitzler & Thiele, 1999). Other multi-objective evolutionary heuristics (Deb, 2001) could certainly be used to solve this optimization problem as well, and comparisons among different heuristics are also desirable. Further tactics could also be investigated to improve the optimization's convergence speed and its ability to find the global optimum. This paper, however, focuses on formulating the optimization model for finding the best first-order transition matrix and paves the way for future research and analysis.
The SPEA Algorithm (Zitzler & Thiele, 1999; Zitzler et al., 2001)
1: Initialize three empty sets Parent, Offspring and Elite. Randomly generate μ_Child individuals (solutions) to form the initial children population and place them in Offspring.
2: Repeat until the stopping criterion is satisfied.
2.1: Find the non-dominated solutions in Offspring and copy them into Elite. Remove dominated solutions from Elite. Reduce the size of Elite by clustering, if necessary.
2.2: Fitness assignment: assign fitness to the individuals in Offspring and Elite.
2.3: Selection: use tournament selection to select μ_Parent individuals from Offspring ∪ Elite and store them in Parent.
2.4: Recombination: generate a new population Offspring by repeatedly selecting two parents from Parent and recombining them.
2.5: Mutation: mutate the individuals in Offspring.
2.6: Assign fitness to the individuals in Offspring.
The stopping criterion in this paper is the number of generations.
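The second objective of model (8), the replacement of negative probabilities with 0, and the Pareto-dominance test behind step 2.1 can be sketched as below. Obj_1 is the weighted cost of model (7) and is not repeated here, and the optional clustering of Elite is omitted. This is an illustrative sketch; the names are hypothetical, and points are tuples of objective values (Obj_1, Obj_2), both minimized.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, ...]  # (Obj1, Obj2), both minimized


def row_sum_violation(P: Matrix) -> float:
    """Obj2 of model (8): sum_i |sum_j p_ij - 1|, zero iff every row sums to one."""
    return sum(abs(sum(row) - 1.0) for row in P)


def clip_negative(P: Matrix) -> Matrix:
    """Constraint handling for p_ij >= 0: negative entries are replaced with 0."""
    return [[max(0.0, p) for p in row] for row in P]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Point]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [i for i, a in enumerate(points)
            if not any(dominates(b, a) for j, b in enumerate(points) if j != i)]


def update_elite(elite: List[Point], offspring: List[Point]) -> List[Point]:
    """Step 2.1 without clustering: keep the non-dominated points of Elite ∪ Offspring."""
    pool = elite + offspring
    return [pool[i] for i in non_dominated(pool)]
```

Keeping the non-dominated members of the union gives the same set as copying the non-dominated offspring into Elite and then removing dominated elites, which is why a single filter suffices here.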
4.1. Mutation
An individual (P, σ) is mutated following Eqs. (9) and (10), with σ mutated first and P next (Eiben & Smith, 2003):
$$\sigma = \sigma \circ \begin{pmatrix} e^{N(0,\tau') + N_{11}(0,\tau)} & \cdots & e^{N(0,\tau') + N_{1r}(0,\tau)} \\ \vdots & \ddots & \vdots \\ e^{N(0,\tau') + N_{r1}(0,\tau)} & \cdots & e^{N(0,\tau') + N_{rr}(0,\tau)} \end{pmatrix} \qquad (9)$$
where N(0, τ′) is a random number drawn from a normal distribution with mean 0 and standard deviation τ′, and N_ij(0, τ) (for i, j = 1, …, r) are random numbers drawn from a normal distribution with mean 0 and standard deviation τ. Each N_ij(0, τ) is generated specifically for σ_ij, while a single N(0, τ′) is shared by all entries of the matrix; "∘" denotes the Hadamard (entry-wise) matrix product (Roger & Charles, 1994). The new solution is generated from Eq. (10):
$$P = P + N(0, \sigma) \qquad (10)$$
where N(0, σ) is a matrix of the same size as P, each element of which is drawn from a normal distribution with mean 0 and the corresponding standard deviation in σ.
4.2. Selection and recombination of parents
To generate μ_Child children, two parents are selected from the parent population and recombined, μ_Child times. Each time, two parents are selected randomly to produce one child using Eq. (11):
$$\left( \frac{\sum_{j \in \mathrm{SelectedParents}} P_j}{2}, \;\; \frac{\sum_{j \in \mathrm{SelectedParents}} \sigma_j}{2} \right) \qquad (11)$$
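Eqs. (9)–(11) can be sketched as follows: a self-adaptive mutation in which one shared N(0, τ′) draw and one N_ij(0, τ) draw per entry rescale σ before σ perturbs P, and an intermediate recombination that averages the (P, σ) pairs of two randomly chosen parents. Clipping negative entries follows the constraint handling described for model (8); the parameter names tau_prime and tau and the function names are assumptions of this sketch.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Individual = Tuple[Matrix, Matrix]  # (P, sigma)


def mutate(ind: Individual, tau_prime: float, tau: float,
           rng: random.Random) -> Individual:
    """Eq. (9) rescales sigma entry-wise; Eq. (10) then perturbs P with the new sigma."""
    P, sigma = ind
    shared = rng.gauss(0.0, tau_prime)  # N(0, tau'): one draw for the whole matrix
    new_sigma = [[s * math.exp(shared + rng.gauss(0.0, tau))  # N_ij(0, tau) per entry
                  for s in row] for row in sigma]
    # P = P + N(0, sigma); negative probabilities are replaced with 0.
    new_P = [[max(0.0, p + rng.gauss(0.0, s)) for p, s in zip(p_row, s_row)]
             for p_row, s_row in zip(P, new_sigma)]
    return new_P, new_sigma


def _average(A: Matrix, B: Matrix) -> Matrix:
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def recombine(parents: List[Individual], rng: random.Random) -> Individual:
    """Eq. (11): one child whose P and sigma are the means of two random parents."""
    (P1, s1), (P2, s2) = rng.sample(parents, 2)
    return _average(P1, P2), _average(s1, s2)
```

Because σ is multiplied by a positive exponential, every standard deviation stays positive after mutation, which is the usual motivation for this log-normal update.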
SelectedParents is the set of the two indices of the randomly selected parents, and the subscript j indicates the jth individual in the parent pool.
4.3. Tournament selection
Tournament selection with replacement (Eiben & Smith, 2003) is used in this paper to select promising individuals for the next generation based on their fitness values. The tournament size is a predefined parameter that controls the selection pressure; in this paper it is specified as a percentage.
4.4. Tailored optimization algorithm for wind speed transition matrix
Using the double-objective ES algorithm above to solve the optimization model is straightforward. However, if two structural properties of the transition matrix are ignored, the computational cost increases as the problem size grows. Exploiting these structural properties can guide the evolutionary search and lead to optimal solutions more efficiently. Based on domain knowledge and patterns obtained from wind time series data analysis, two strong structural properties exist in a typical wind speed transition matrix: dominant diagonal probabilities and low probabilities in the lower-left and upper-right areas. In a wind speed transition matrix, the diagonal entries usually have larger probabilities than the other entries. More precisely, the right and left neighbors of the diagonal entries also have non-negligible probabilities, but these tend to be smaller than the diagonal ones. In practical terms, a wind speed at current time t will stay in its original state or at most move into a neighboring state at the next sampling time t + 1; sudden or dramatic changes in wind speed are low-probability events in the long run. A wind speed in one state cannot jump to a state that is too far away, so the wind speed transition matrix is usually sparse, with many zeros in the lower-left and upper-right areas. With these two strong patterns, the evolutionary search algorithm can be redesigned to be more targeted and should converge faster to the optima. To redesign the evolutionary strategy algorithm, mathematical representations of these patterns are defined. First, a diagonal entry p_ii is usually greater than p_ij, for i = 1, …, r, j = 1, …, r and i ≠ j. Second, the inequalities p_ii ≥ p_{i,i+1} ≥ … ≥ p_ir and p_ii ≥ p_{i,i−1} ≥ … ≥ p_i1, for i = 1, …, r, hold for most wind speed transition matrices; note that they do not have to be satisfied all the time. Third, to define sparsity, the bandwidth concept is borrowed: there is a constant b ≥ 0 such that p_ij = 0 for j > i + b or j < i − b, where b determines the bandwidth. Two obvious places to apply these clues in the evolutionary search are the initialization and selection processes. In the following, two procedures are designed that can be plugged into the double-objective evolutionary strategy algorithm to help the targeted search.
4.4.1. Targeted initialization procedure
This procedure generates only one individual; repeated calls are needed to generate a pool of initial offspring. Since the solution is an r × r matrix, the procedure populates the matrix row by row.
1: For the ith row and jth column of the transition matrix, generate a random number by uniform sampling from the neighborhood p′_ij ± λp′_ij, and make sure the number does not exceed the [0, 1] bound.
2: Apply the bandwidth check: let p_ij = 0 for all j > i + b or j < i − b.
The parameter λ is a small percentage that determines the degree to which the neighborhood of P′ is explored.
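The two steps of the targeted initialization procedure can be sketched as follows, assuming 0-based indices, an estimate such as P′ given as nested lists, and the allocation of roughly w_k·μ_Child individuals per estimated matrix that is described for this procedure. Sampled rows are deliberately not renormalized; their row sums are left to Obj_2. The names lam, b, mu_child and the function names are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def apply_bandwidth(P: Matrix, b: int) -> Matrix:
    """Step 2: set p_ij = 0 whenever j > i + b or j < i - b."""
    return [[p if abs(i - j) <= b else 0.0 for j, p in enumerate(row)]
            for i, row in enumerate(P)]


def targeted_individual(P_est: Matrix, lam: float, b: int,
                        rng: random.Random) -> Matrix:
    """Step 1: sample each p_ij uniformly from p'_ij +/- lam * p'_ij, kept within [0, 1]."""
    P = [[min(1.0, max(0.0, rng.uniform(p - lam * p, p + lam * p))) for p in row]
         for row in P_est]
    return apply_bandwidth(P, b)


def targeted_population(estimates: Sequence[Matrix], weights: Sequence[float],
                        mu_child: int, lam: float, b: int,
                        rng: random.Random) -> List[Matrix]:
    """About w_k * mu_child individuals around each estimate P^k'; P' is always included."""
    pop = [[row[:] for row in estimates[0]]]  # unmodified copy of P'
    for est, w in zip(estimates, weights):
        pop += [targeted_individual(est, lam, b, rng) for _ in range(round(w * mu_child))]
    return pop[:mu_child]
```

Because the neighborhood is relative (±λp′_ij), entries that are zero in the estimate stay zero, so the sparsity pattern of the data is preserved from the start.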
In this initialization procedure, the offspring pool is generated around the estimated one-step, two-step and three-step transition matrices. Based on the weights w_1, w_2 and w_3, the numbers of initial children generated from these three matrices are roughly w_1 μ_Child, w_2 μ_Child and w_3 μ_Child, respectively. Although the estimated transition matrices may not be the best solutions to model (7), they still provide meaningful initial guesses; otherwise, many initial individuals would be generated blindly. Note that the estimated one-step transition matrix is always put into the initial offspring, so that it can be seen whether better solutions exist.
4.4.2. Targeted fitness calculation
In the evolutionary strategy algorithm, each individual is assigned a fitness value. To select individuals that tend to agree with the patterns of a wind speed transition matrix, a suitable mechanism for calculating the fitness value needs to be defined. In SPEA (Zitzler & Thiele, 1999), the fitness value is calculated separately for individuals in the elite set and in the offspring set. For an individual in the elite set, its fitness value (called the strength value in SPEA) is calculated through
$$\mathrm{Strength}_i = \frac{\mathrm{Dominate}_i}{\lvert \mathrm{Offspring} \rvert + 1} \qquad (12)$$
where Strength_i is the strength value of the ith individual in the elite set, Dominate_i is the number of individuals in the offspring set that are dominated by the ith elite individual, and |Offspring| is the size of the offspring set.
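The strength rule of Eq. (12), the offspring fitness rule given next as Eq. (13), and the tournament selection of Section 4.3 can be sketched as follows. As in SPEA, lower values are better: elite strengths lie in [0, 1) and offspring fitness values are at least 1. The sketch assumes the selection pool lists elite strengths followed by offspring fitness values, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # objective values (Obj1, Obj2), both minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def spea_strengths(elite: List[Point], offspring: List[Point]) -> List[float]:
    """Eq. (12): Strength_i = Dominate_i / (|Offspring| + 1)."""
    n = len(offspring)
    return [sum(dominates(e, o) for o in offspring) / (n + 1) for e in elite]


def spea_fitness(elite: List[Point], offspring: List[Point],
                 strengths: List[float]) -> List[float]:
    """Eq. (13): Fitness_j = 1 + sum of strengths of the elites dominating offspring j."""
    return [1.0 + sum(s for e, s in zip(elite, strengths) if dominates(e, o))
            for o in offspring]


def tournament(fitness: List[float], n_select: int, size_fraction: float,
               rng: random.Random) -> List[int]:
    """Tournament selection with replacement; the lowest value in each tournament wins.

    The tournament size is a percentage (size_fraction) of the pool size.
    """
    k = max(1, round(size_fraction * len(fitness)))
    winners = []
    for _ in range(n_select):
        contestants = rng.sample(range(len(fitness)), k)
        winners.append(min(contestants, key=lambda i: fitness[i]))
    return winners
```

A larger size_fraction raises the selection pressure; with size_fraction = 1.0 every tournament returns the single best individual in the pool.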
For an individual in the offspring set, its fitness value is calculated as
$$\mathrm{Fitness}_j = 1 + \sum_{i,\, i \succ j} \mathrm{Strength}_i \qquad (13)$$
where Fitness_j is the fitness value of the jth individual in the offspring set, and Σ_{i, i≻j} Strength_i is the sum of the strength values of all elite individuals that dominate the jth individual of the offspring set; i ≻ j means that individual i of the elite set dominates individual j of the offspring set. In this paper, two simple definitions are used to quantify violations of the two patterns. The first measures violation of the inequalities p_ii ≥ p_{i,i+1} ≥ … ≥ p_ir and p_ii ≥ p_{i,i−1} ≥ … ≥ p_i1 through definition (14); the larger the magnitude of (14), the more the inequalities are violated.
$$\mathrm{Inequality} = \sum_{i=1}^{r} \left\{ \sum_{j=i}^{r-1} \min\{0,\; p_{ij} - p_{i,j+1}\} + \sum_{j=2}^{i} \min\{0,\; p_{ij} - p_{i,j-1}\} \right\} \qquad (14)$$
To quantify the bandwidth violation, all probabilities outside the bandwidth are summed by definition (15):
$$\mathrm{Bandwidth} = \sum_{i=1}^{r} \; \sum_{\forall j,\; j < i-b \text{ or } j > i+b} p_{ij} \qquad (15)$$
Here "∀j, j < i − b or j > i + b" ranges over all column indices j of row i that lie outside the bandwidth. Similarly, a larger value of Σ_{i=1}^{r} Σ_{∀j, j<i−b or j>i+b} p_ij indicates more violation of the bandwidth pattern. One simple way to implement the targeted selection is to combine the quantities defined in (13)–(15) through weights. Only two weights are used in (16); other weighting strategies are certainly possible but are beyond the focus of this paper.
new Fitnessj ¼ wfitness Fitnessj þ wv iolation ðInequalityj þ Bandwidthj Þ ð16Þ For the jth individual in the offspring pool, its new fitness value will be calculated through (16). Strength calculations of elites will remain the same. In (16), wfitness + wviolation = 1, 0 6 wfitness, wviolation 6 1. Inequalityj and Bandwidthj are the two violation quantities calculated based on the jth offspring. Let wviolation = 0 and remove the targeted initialization process will disable the tailored SPEA algorithm and return to the original one. 4.5. Enhanced offspring generation from the elite set In the SPEA algorithm, the steps 2.3–2.5 could be replaced with an enhanced offspring generation procedure which is developed to deal with the original SPEA algorithm’s possible convergence issues. Recall that the next generation offspring is populated through two steps: a tournament selection to generate a parent pool from the offspring and the elite pools, and followed by a recombination process of two randomly selected parents to generate a new offspring pool. When the optimization problem is of high dimensions, and there are a lot of local optimums, the original SPEA algorithm will have certain probabilities to fail to converge in terms of the offspring pool. At the same time, the Pareto frontier will be trapped in the local optima, i.e. it is hard to push the Pareto frontier to the lower left corner for a minimization problem. The reason of this phenomenon could be explained with this fabricated scenario. Suppose the number of individuals in the elite set is much smaller than the number of individuals in the offspring pool. Then for each tournament, there will be a very small probability to select some elites. Thus the parent pool will be composed of individuals largely
selected from the offspring pool. As mentioned before, the search space is very large (i.e., high dimensional), so randomly picking two parents to generate a child and then mutating it is unlikely to produce a solution better than those in the elite set; the child may even be worse than its parents. As a result, the offspring may not converge at all, and the elite set (i.e., the Pareto frontier) hardly moves toward the lower left corner even as the generation number increases. To solve this problem, this paper adds an enhanced offspring generation procedure to the original SPEA algorithm. The new procedure is called periodically (e.g., every five generations) during the optimization process and generates the next-generation offspring directly from the neighborhoods of the individuals in the elite set.

SPEA Algorithm with Enhanced Offspring Generation
1: Initialize three empty sets Parent, Offspring and Elite. Randomly generate lChild individuals (solutions) to form the initial children population and place them in Offspring.
2: Repeat until the maximum generation number is reached.
2.1: Find the non-dominated solutions in Offspring and copy them into Elite. Remove dominated solutions from Elite. Reduce the size of Elite by clustering, if necessary.
2.2: Fitness assignment: assign fitness to the individuals in Offspring and Elite.
2.3: If the generation number is a multiple of a small number j (e.g., j = 5), go to step 2.4; otherwise go to step 2.5.
2.4: Based on the offspring size and the number of individuals in the elite set, determine roughly how many children are to be generated from each elite. For each elite in the elite set, generate that number of children by uniformly sampling from its neighborhood, defined with the same parameter k used in the targeted initialization; apply the bandwidth check to each new child; go to step 2.1.
2.5: Selection: use tournament selection to select lParent individuals from Offspring ∪ Elite and store them in Parent.
2.6: Recombination: generate a new population Offspring by recombining pairs of parents selected from Parent.
2.7: Mutation: mutate the individuals in Offspring and apply the bandwidth check.
2.8: Assign fitness to the individuals in Offspring; go to step 2.1.

Using the above algorithm, the individuals in the elite set are guaranteed the opportunity to produce descendants, so the offspring pool is refreshed with at least some good candidate solutions.

5. Computational study
In this computational study, two cases are presented. The first case is a simple test problem, fabricated to examine the algorithm's basic performance and the effects of different parameter combinations. For this fabricated example, the targeted initialization, the targeted fitness calculation and the enhanced offspring generation procedures are turned off because of the simple nature of the problem. The tailored optimization is illustrated with the second case, which has a larger transition matrix and therefore requires more computational power. The second case is based on real wind speeds measured at a specific 1.5 MW wind turbine within a wind farm. The wind speed is classified into 13 states based on a fixed interval length, and the corresponding transition matrices are estimated from the data with counting techniques.

5.1. Small fabricated example
Let the estimated wind speed transition matrices $P'$, $P'_2$ and $P'_3$ be
$$P' = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix},\quad P'_2 = \begin{pmatrix} 0.1 & 0.9 \\ 0.8 & 0.2 \end{pmatrix},\quad P'_3 = \begin{pmatrix} 0.05 & 0.95 \\ 0.9 & 0.1 \end{pmatrix}.$$
Letting $w_1 = w_2 = w_3 = 1/3$, model (7) is instantiated and simplified into model (17):
$$\arg\min_{P}\ \|P - P'\|_2 + \|P^2 - P'_2\|_2 + \|P^3 - P'_3\|_2$$
$$\text{s.t.}\quad p_{11}, p_{12}, p_{21}, p_{22} \ge 0,\qquad p_{11} + p_{12} = 1,\quad p_{21} + p_{22} = 1 \qquad (17)$$
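To make model (17) concrete, the following sketch evaluates its objective for a candidate matrix, obtaining $P^2$ and $P^3$ by the Chapman–Kolmogorov equation (matrix powers). It assumes that $\|\cdot\|_2$ is the entrywise (Frobenius) 2-norm and keeps $w_1 = w_2 = w_3 = 1/3$ as weights. `row_sum_violation` is a hypothetical stand-in for the constraint objective of model (8), whose exact form is not reproduced here, and the candidate matrix is one of the final solutions reported for this example.

```python
import numpy as np

# Estimated one-, two- and three-step matrices of the small example (P', P'_2, P'_3).
P1_HAT = np.array([[0.8, 0.2], [0.3, 0.7]])
P2_HAT = np.array([[0.1, 0.9], [0.8, 0.2]])
P3_HAT = np.array([[0.05, 0.95], [0.9, 0.1]])


def cost(P, estimates=(P1_HAT, P2_HAT, P3_HAT), weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of ||P^k - P'_k|| for k = 1, 2, 3 (Frobenius norm assumed)."""
    total = 0.0
    for k, (P_hat, w) in enumerate(zip(estimates, weights), start=1):
        P_k = np.linalg.matrix_power(P, k)  # Chapman-Kolmogorov: k-step matrix is P^k
        total += w * np.linalg.norm(P_k - P_hat)  # default norm of a 2-D array is Frobenius
    return total


def row_sum_violation(P):
    """Hypothetical constraint measure: total |row sum - 1| plus total negative mass."""
    return float(np.abs(P.sum(axis=1) - 1.0).sum() + np.clip(-P, 0.0, None).sum())


# P'^2 = [[0.70, 0.30], [0.45, 0.55]], which differs from the estimated P'_2.
print(np.linalg.matrix_power(P1_HAT, 2))

candidate = np.array([[0.67, 0.33], [0.36, 0.64]])  # one of the reported final solutions
print(round(cost(P1_HAT), 3), round(cost(candidate), 3))  # 0.661 0.632
print(row_sum_violation(candidate))                        # 0.0
```

Under these assumptions the candidate satisfies the constraints exactly and has a lower cost than the estimated one-step matrix $P'$, which is the trade-off model (17) is meant to capture.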
The objective of model (17) could be expanded explicitly in terms of $p_{11}, p_{12}, p_{21}, p_{22}$, but the resulting expression is far less concise than its matrix representation. In this simplified example, the traditional approach would use $P' = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}$ as the wind speed transition matrix. However, by the Chapman–Kolmogorov equation,
$$P'^2 = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}^2 = \begin{pmatrix} 0.7 & 0.3 \\ 0.45 & 0.55 \end{pmatrix},$$
which differs from the estimated two-step transition matrix $P'_2$. Similarly,
$$P'^3 = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}^3 = \begin{pmatrix} 0.65 & 0.35 \\ 0.525 & 0.475 \end{pmatrix}$$
differs from $P'_3$. As a result, using the estimated one-step transition matrix as the wind speed transition matrix may not agree with the data when transitions over two or more steps are of interest. Solving model (17) provides a balanced solution for the wind speed transition matrix.

Because the solution of model (17) is a 2 × 2 matrix, the tailored evolutionary strategy algorithm is not needed. Following model (8), model (17) is transformed into a double-objective optimization model and solved with the SPEA algorithm. For an evolutionary algorithm, several parameters are important for convergence and efficiency, such as the offspring size, the ratio between the offspring pool size and the parent pool size, and the tournament size. Before running experiments on these parameters, the other parameters of the algorithm are fixed. For example, the lower and upper bounds of the standard deviation matrices are set to 0.001 and 0.5, respectively, which allows small perturbations of a solution. The learning rates $\tau'$ and $\tau$ are calculated with a heuristic from Eiben and Smith (2003), $\tau' = 1/(\sqrt{2}\,r)$ and $\tau = 1/\sqrt{2r}$, where $r$ is the number of states ($r = 2$ in this small example). Two parents are selected to recombine a child. Another small technical modification is to put the estimated one-step transition matrix into the initial offspring, to see whether it survives or whether better solutions can be found; this guarantees that the algorithm does not return a worse solution than the estimated first-order transition matrix.

The first experiment examines convergence with respect to the generation number. The offspring and parent sizes are set to 700 and 100, respectively, and the tournament size is set to 16%, i.e., about 0.16 × 700 = 112 individuals of the offspring pool join each tournament. Fig. 1 shows that the algorithm converges as the generation number increases from 10 to 100.
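The step-size bounds and the $\tau'$, $\tau$ heuristic above can be illustrated with uncorrelated self-adaptive Gaussian mutation, assuming one step size per matrix entry, i.e. $n = r^2$ step sizes, for which Eiben and Smith's $1/\sqrt{2n}$ and $1/\sqrt{2\sqrt{n}}$ reduce to the $1/(\sqrt{2}\,r)$ and $1/\sqrt{2r}$ used in the text. This is a sketch of our reading, not the authors' code: the function name, clamping $\sigma$ to [0.001, 0.5] after the update, and leaving rows unnormalized (so that any row-sum deviation is left to the constraint objective) are assumptions, and the bandwidth check of step 2.7 is omitted.

```python
import numpy as np


def self_adaptive_mutation(P, sigma, rng, sigma_min=0.001, sigma_max=0.5):
    """Uncorrelated mutation with one step size per entry (n = r * r step sizes)."""
    r = P.shape[0]
    tau_prime = 1.0 / (np.sqrt(2.0) * r)  # global learning rate, 1 / (sqrt(2) r)
    tau = 1.0 / np.sqrt(2.0 * r)          # per-entry learning rate, 1 / sqrt(2 r)
    global_draw = rng.standard_normal()   # one draw shared by all entries of this individual
    new_sigma = sigma * np.exp(tau_prime * global_draw + tau * rng.standard_normal(sigma.shape))
    new_sigma = np.clip(new_sigma, sigma_min, sigma_max)  # keep step sizes within the bounds
    # Perturb each entry with its own, already updated, step size.
    new_P = P + new_sigma * rng.standard_normal(P.shape)
    return new_P, new_sigma


rng = np.random.default_rng(0)
P = np.array([[0.8, 0.2], [0.3, 0.7]])
sigma = np.full_like(P, 0.05)
child, child_sigma = self_adaptive_mutation(P, sigma, rng)
print(child.round(3))
print(child_sigma.round(4))
```

Updating $\sigma$ before perturbing the entries lets a step size be judged by the quality of the child it produced, which is the usual motivation for this ordering in self-adaptive evolution strategies.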
The individual solution lying on the vertical axis is the estimated one-step transition matrix. It always remains on the frontier because it satisfies the constraints perfectly. As the generation number increases, the frontier (i.e., the elite set) consists of only two solutions: one is the estimated transition matrix, and the other is the transition matrix found by the algorithm, which has a lower cost function value at the price of a slight constraint violation. Fig. 2 shows that the offspring tends to converge to similar solutions as the generation number increases from 0 to 100. Many other experiments were performed but are not reported in this paper; 100 generations seem to be enough for the algorithm to converge to acceptable solutions. For example, the solution in the lower left corner of Fig. 1 can be adjusted manually to satisfy the constraints perfectly, and it then becomes a better solution than the estimated one-step transition matrix in terms of the cost function.

Tournament size is another important parameter affecting the algorithm's performance. To test its effects, the other parameters are fixed (Offspring/Parent = 700/100, 100 generations), the tournament size is varied from 1% to 32%, and the corresponding Pareto frontiers are plotted. Fig. 3 shows that the tournament size does influence the algorithm's performance. In this small example, 16% seems to be a relatively good choice, as it provides the Pareto frontier with the smallest constraint violations.

Based on Fig. 4, the ratios 300/100, 700/100 and 900/100 seem to achieve similar performance. However, after analyzing the solutions of all five experiments, it is found that after manual modification (i.e., rounding off to satisfy the constraints perfectly) they are close enough to be regarded as one solution, except for the 100/100 case, which differs slightly. For the different Offspring/Parent ratios, the final solutions are:
$$100/100:\ \begin{pmatrix} 0.66 & 0.34 \\ 0.36 & 0.64 \end{pmatrix},\quad 300/100:\ \begin{pmatrix} 0.67 & 0.33 \\ 0.36 & 0.64 \end{pmatrix},\quad 500/100:\ \begin{pmatrix} 0.67 & 0.33 \\ 0.36 & 0.64 \end{pmatrix},$$
$$700/100:\ \begin{pmatrix} 0.67 & 0.33 \\ 0.36 & 0.64 \end{pmatrix},\quad 900/100:\ \begin{pmatrix} 0.67 & 0.33 \\ 0.35 & 0.65 \end{pmatrix}.$$
After examining the effects of different Offspring/Parent ratios, it is interesting to see how the algorithm behaves with different offspring sizes. Fig. 5 shows that blindly increasing the offspring size does not always improve performance in this small example, while a larger population increases the computational cost. The setting 700/100 seems to be an acceptable choice, with relatively good performance and low computational cost.

[Fig. 1. Pareto frontiers at different generations (10, 50, 90 and 100) for the small test example with Offspring/Parent = 700/100 and a tournament size of 16%. Axes: Obj1 (cost function) vs. Obj2 (constraint).]
[Fig. 2. Offspring pool at generation 0 and generation 100 for the small test example with Offspring/Parent = 700/100 and a tournament size of 16%. Axes: Obj1 (cost function) vs. Obj2 (constraint).]
[Fig. 3. Pareto frontiers at the 100th generation for the small test example with Offspring/Parent = 700/100 and tournament sizes of 1%, 8%, 16% and 32%.]
[Fig. 4. Pareto frontiers at the 100th generation for the small test example with Offspring/Parent ratios of 100/100, 300/100, 500/100, 700/100 and 900/100 and a tournament size of 16%.]
[Fig. 5. Pareto frontiers at the 100th generation for the small test example with different offspring sizes (70/10, 350/50, 700/100 and 1400/200) at a fixed Offspring/Parent ratio of 7/1 and a tournament size of 16%.]
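Step 2.4 of the algorithm in Section 4.5, which is switched on for the real wind speed case below, can be sketched as follows. This is a hypothetical implementation, not the authors' code: it assumes that the neighborhood of an elite is the box in which every entry $p_{ij}$ may deviate by at most a relative amount $k$ (e.g., $k = 5\%$), that children are allotted to elites as evenly as possible, and that the "bandwidth check" sets entries with $|i - j| > b$ to zero. These three choices and all function names are assumptions.

```python
import numpy as np


def bandwidth_check(P, b):
    """Assumed form of the bandwidth check: zero every entry with |i - j| > b."""
    i, j = np.indices(P.shape)
    return np.where(np.abs(i - j) > b, 0.0, P)


def enhanced_offspring(elites, offspring_size, k, b, rng):
    """Refill the offspring pool by uniform sampling around each elite (cf. step 2.4)."""
    base, extra = divmod(offspring_size, len(elites))  # roughly equal share per elite
    children = []
    for idx, elite in enumerate(elites):
        n_children = base + (1 if idx < extra else 0)
        # Assumed neighborhood: each entry may move by at most a relative amount k.
        a, c = elite * (1.0 - k), elite * (1.0 + k)
        low, high = np.minimum(a, c), np.maximum(a, c)  # also safe for negative entries
        for _ in range(n_children):
            child = rng.uniform(low, high)
            children.append(bandwidth_check(child, b))
    return children


rng = np.random.default_rng(42)
elite_a = np.array([[0.8, 0.2, 0.0], [0.3, 0.6, 0.1], [0.0, 0.4, 0.6]])
elite_b = np.array([[0.7, 0.3, 0.0], [0.2, 0.7, 0.1], [0.0, 0.3, 0.7]])
pool = enhanced_offspring([elite_a, elite_b], offspring_size=7, k=0.05, b=1, rng=rng)
print(len(pool))  # 7: four children from elite_a and three from elite_b
```

Because every child stays within a small box around an elite, this step exploits the current Pareto frontier rather than exploring, which matches the stated purpose of refreshing the offspring pool with good candidates.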
Table 1
Estimated one-step transition matrix (rows: current state; columns: next state).
      S1   S2   S3   S4   S5   S6   S7   S8   S9   S10  S11  S12  S13
S1    0.96 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S2    0.02 0.80 0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S3    0.00 0.05 0.84 0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S4    0.00 0.00 0.12 0.76 0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S5    0.00 0.00 0.00 0.12 0.76 0.13 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S6    0.00 0.00 0.00 0.00 0.22 0.71 0.07 0.01 0.00 0.00 0.00 0.00 0.00
S7    0.00 0.00 0.00 0.00 0.01 0.24 0.59 0.15 0.02 0.00 0.00 0.00 0.00
S8    0.00 0.00 0.00 0.00 0.00 0.01 0.20 0.52 0.24 0.02 0.01 0.00 0.00
S9    0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.28 0.38 0.25 0.08 0.00 0.00
S10   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.19 0.50 0.26 0.05 0.00
S11   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.24 0.58 0.15 0.01
S12   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.31 0.53 0.13
S13   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.12 0.46 0.42

Table 2
Estimated two-step transition matrix.
      S1   S2   S3   S4   S5   S6   S7   S8   S9   S10  S11  S12  S13
S1    0.94 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S2    0.04 0.69 0.26 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S3    0.00 0.08 0.76 0.15 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S4    0.00 0.01 0.17 0.66 0.15 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S5    0.00 0.00 0.00 0.16 0.67 0.15 0.01 0.00 0.00 0.00 0.00 0.00 0.00
S6    0.00 0.00 0.00 0.01 0.28 0.64 0.05 0.01 0.01 0.00 0.00 0.00 0.00
S7    0.00 0.00 0.00 0.00 0.02 0.25 0.51 0.17 0.04 0.01 0.00 0.00 0.00
S8    0.00 0.00 0.00 0.00 0.00 0.03 0.25 0.46 0.16 0.05 0.04 0.02 0.00
S9    0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.18 0.42 0.18 0.13 0.03 0.00
S10   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07 0.15 0.38 0.31 0.07 0.03
S11   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.04 0.31 0.45 0.19 0.01
S12   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.09 0.36 0.39 0.16
S13   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.33 0.44 0.19

Table 3
Estimated three-step transition matrix.
      S1   S2   S3   S4   S5   S6   S7   S8   S9   S10  S11  S12  S13
S1    0.91 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S2    0.05 0.63 0.29 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S3    0.00 0.09 0.74 0.16 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S4    0.00 0.01 0.19 0.62 0.16 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S5    0.00 0.00 0.00 0.18 0.62 0.18 0.01 0.00 0.00 0.00 0.00 0.00 0.00
S6    0.00 0.00 0.00 0.01 0.33 0.59 0.05 0.02 0.01 0.00 0.00 0.00 0.00
S7    0.00 0.00 0.00 0.00 0.04 0.23 0.49 0.18 0.04 0.01 0.02 0.00 0.00
S8    0.00 0.00 0.00 0.00 0.00 0.06 0.24 0.32 0.23 0.07 0.05 0.03 0.01
S9    0.00 0.00 0.00 0.00 0.00 0.00 0.09 0.24 0.27 0.23 0.11 0.04 0.02
S10   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.16 0.31 0.31 0.10 0.03
S11   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.06 0.30 0.42 0.18 0.03
S12   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.15 0.36 0.36 0.12
S13   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.44 0.44 0.07

Table 4
2-Step transition matrix calculated from the estimated one-step transition matrix in Table 1 by the Chapman–Kolmogorov equation.
      S1   S2   S3   S4   S5   S6   S7   S8   S9   S10  S11  S12  S13
S1    0.92 0.07 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S2    0.04 0.65 0.28 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S3    0.00 0.08 0.73 0.18 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S4    0.00 0.01 0.19 0.60 0.17 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S5    0.00 0.00 0.01 0.18 0.62 0.19 0.01 0.00 0.00 0.00 0.00 0.00 0.00
S6    0.00 0.00 0.00 0.03 0.32 0.55 0.09 0.02 0.00 0.00 0.00 0.00 0.00
S7    0.00 0.00 0.00 0.00 0.07 0.31 0.40 0.17 0.06 0.01 0.00 0.00 0.00
S8    0.00 0.00 0.00 0.00 0.00 0.06 0.23 0.37 0.22 0.08 0.04 0.00 0.00
S9    0.00 0.00 0.00 0.00 0.00 0.01 0.07 0.26 0.26 0.24 0.14 0.02 0.00
S10   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.06 0.17 0.36 0.31 0.09 0.01
S11   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.06 0.27 0.45 0.18 0.03
S12   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.10 0.36 0.39 0.13
S13   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.26 0.46 0.24

5.2. Real wind speed data
The wind speeds used in this example are what the wind energy industry calls 10-s average data. The continuous wind speeds are divided into 13 states between 0 m/s and 13 m/s by letting Δv = 1 m/s, i.e., $S_1 = [0, 1), S_2 = [1, 2), \ldots, S_{13} = [12, 13)$. In this particular example, wind speeds above 13 m/s are rare and are thus omitted from further analysis without compromising the generality of the proposed method.
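The counting step that produces basic estimators such as Tables 1–3 can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that speeds are binned into the 1 m/s states $S_1$–$S_{13}$ described above, that any transition involving a speed of 13 m/s or more is skipped, and that the $k$-step matrix is estimated by counting pairs $(x_t, x_{t+k})$ and normalizing each row. The function name and the toy series are hypothetical.

```python
import numpy as np


def estimate_k_step_matrix(speeds, k, n_states=13, dv=1.0):
    """Row-normalized counts of transitions S_i -> S_j over k time steps."""
    # State index 0 corresponds to S1 = [0, dv), index 12 to S13 = [12, 13).
    states = np.floor(np.asarray(speeds, dtype=float) / dv).astype(int)
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-k], states[k:]):
        if 0 <= a < n_states and 0 <= b < n_states:  # skip pairs involving omitted speeds
            counts[a, b] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows of states that were never visited stay all-zero instead of dividing by zero.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


# Hypothetical 10-s wind speeds in m/s (not the paper's data).
speeds = [3.2, 3.8, 4.1, 4.6, 3.9, 3.1, 2.7, 3.4, 4.2, 4.9]
P1 = estimate_k_step_matrix(speeds, k=1)  # one-step estimator
P2 = estimate_k_step_matrix(speeds, k=2)  # two-step estimator from the same series
print(P1[3, 2:5])  # transitions from S4 = [3, 4) to S3, S4, S5: 0.2, 0.4, 0.4
```

Comparing `np.linalg.matrix_power(P1, 2)` with `P2` on real data is exactly the kind of check that reveals the discrepancies between Tables 2–3 and Tables 4–5.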
Table 5
3-Step transition matrix calculated from the estimated one-step transition matrix in Table 1 by the Chapman–Kolmogorov equation.
      S1   S2   S3   S4   S5   S6   S7   S8   S9   S10  S11  S12  S13
S1    0.89 0.09 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S2    0.05 0.53 0.35 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S3    0.00 0.10 0.65 0.22 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S4    0.00 0.01 0.23 0.50 0.20 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
S5    0.00 0.00 0.03 0.21 0.53 0.22 0.02 0.00 0.00 0.00 0.00 0.00 0.00
S6    0.00 0.00 0.00 0.06 0.37 0.45 0.10 0.03 0.01 0.00 0.00 0.00 0.00
S7    0.00 0.00 0.00 0.01 0.12 0.33 0.29 0.19 0.07 0.02 0.01 0.00 0.00
S8    0.00 0.00 0.00 0.00 0.02 0.10 0.21 0.29 0.19 0.11 0.06 0.01 0.00
S9    0.00 0.00 0.00 0.00 0.00 0.02 0.09 0.22 0.21 0.23 0.18 0.05 0.00
S10   0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.09 0.16 0.30 0.32 0.12 0.02
S11   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.09 0.26 0.39 0.19 0.04
S12   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.15 0.37 0.32 0.11
S13   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.09 0.33 0.39 0.16

[Fig. 6. Pareto frontiers at the 100th generation for the real wind speed case with different offspring sizes (70/10, 700/100, 1400/200 and 7000/1000) at a fixed Offspring/Parent ratio of 7/1 and a tournament size of 16%, without the enhanced offspring generation. Axes: Obj1 (cost function) vs. Obj2 (constraint).]
[Fig. 7. Offspring distributions at the initial and the 100th generation, Offspring/Parent = 1400/200, tournament size 16%, without the enhanced offspring generation. Axes: Obj1 (cost function) vs. Obj2 (constraint).]
[Fig. 8. Offspring distributions at the initial and the 100th generation, Offspring/Parent = 700/100, tournament size 16%, with the enhanced offspring generation. Axes: Obj1 (cost function) vs. Obj2 (constraint).]

Tables 1–3 are transition matrices estimated from the 10-s time series data with a total of 3590 data points, which covers about one day. Because of round-off errors in the displayed values, the rows of these tables may not sum exactly to one. Learning the 10-s transition matrix is of interest for controlling and optimizing a wind turbine's energy conversion process: an optimal control strategy could be derived from the transition matrix if it meets the practical application's requirements in terms of accuracy and reliability. Tables 4 and 5 show the transition matrices calculated with the Chapman–Kolmogorov equation; knowing the one-step transition matrix, the corresponding two-step and three-step transition matrices can be calculated easily. However, comparing the matrices in Tables 4 and 5 with those in Tables 2 and 3 reveals obvious differences between the estimated and the calculated transition matrices, especially between Tables 3 and 5. The optimization model can therefore find better solutions that account for these discrepancies. As illustrated with the small example, parameters such as the Offspring/Parent ratio are important for the convergence of the evolutionary algorithm. This example is more complex, however, because the estimated one-step transition matrix may be close to the final optimal solution, or at least is one of the local optima. As a result, searching for a solution that dominates the estimated one-step transition matrix is a challenging problem. First, too much randomness in the evolutionary algorithm generates many useless solutions, so the probability of generating a better solution than the estimated one-step transition matrix diminishes. Second, larger mutation steps further undermine the algorithm's convergence, as some potentially better solutions may be skipped.

Numerous experiments showed that it is almost impossible to find better solutions if the estimated one-step transition matrix is put into the initial population. By using the targeted initialization and selection processes (with k = 5% and b = 3) and limiting the standard deviation matrix to between 0.001 and 0.05, exploration of the neighborhood of the estimated one-step matrix is guaranteed, and some Pareto-optimal solutions begin to appear in the elite set. As before, $\tau'$ and $\tau$ are calculated with the heuristic of Eiben and Smith (2003), $\tau' = 1/(\sqrt{2}\,r)$ and $\tau = 1/\sqrt{2r}$, where $r = 13$ in this example. Two parents are selected to recombine a child. Although numerous computational experiments were performed for this example, this section reports only the most important and typical ones. The first question concerns the problem size. As the solution is a 13 × 13 matrix, the search space is obviously much
larger than in the previous small example, and as a rule of thumb a larger population size is needed. The weights in the cost function are set to $w_1 = 0.1$, $w_2 = 0.45$ and $w_3 = 0.45$ to guide the evolutionary algorithm toward solutions different from the estimated one-step transition matrix. Fig. 6 shows that a larger offspring size does help push the Pareto frontier toward the lower left corner: as the problem size increases, a larger population helps the algorithm cover more of the search space and find better solutions.

The second point concerns the convergence issue of the original SPEA algorithm. As mentioned before, the original algorithm may not converge in terms of the offspring; in other words, over the generations the offspring could drift to the upper right part of the objective space. Fig. 7 illustrates a scenario in which the offspring at the 100th generation is much worse than the initial offspring. To solve this problem, the enhanced offspring generation procedure is added to the original SPEA algorithm. Fig. 8 shows that the offspring converges over the generations even with a smaller population size than in Fig. 7. Fig. 9 further illustrates the effectiveness of the enhanced offspring generation procedure: without it, the Pareto frontier can hardly be pushed toward the lower left corner even when the offspring size is increased 100 times (i.e., from 70 to 7000), whereas with it the Pareto frontier moves significantly toward the lower left corner even with a small population size. Fig. 10 further shows that the SPEA with enhanced offspring generation converges in terms of the Pareto frontier. The Pareto frontier at the 1000th generation is much better than the one at the 100th generation. Similarly, the Pareto frontier at the 5000th generation is better than the one at the 1000th generation, and the constraint violation values of some solutions are reduced below 0.1. From 5000 to 10,000 generations, the Pareto frontier continues to move toward the lower left corner, and more solutions have cost function values below 0.2. Based on Fig. 10, there is a significant improvement from 100 to 1000 generations; as the generation number increases further, the margin for improvement seems to shrink and the algorithm converges.

[Fig. 9. Pareto frontiers at the 100th generation with different offspring sizes (70/10, 700/100, 1400/200 and 7000/1000 without the enhanced offspring generation; 700/100 with the enhanced offspring generation), tournament size 16%. Axes: Obj1 (cost function) vs. Obj2 (constraint).]
[Fig. 10. Pareto frontiers at 100, 1000, 5000 and 10,000 generations with Offspring/Parent = 700/100, tournament size 16%. Axes: Obj1 (cost function) vs. Obj2 (constraint).]

6. Conclusion
Mining the Markov chain transition matrix from wind speed time series is of interest to the wind industry, as the transition matrix can be applied to various tasks, such as wind speed simulation, wind assessment, wind farm design and wind turbine optimal control. This paper formulates the mining process as an optimization model with constraints and develops multi-objective evolutionary strategy algorithms to solve it. In the optimization model, the optimal transition matrix is defined as the one that matches not only the estimated first-order transition matrix but also the estimated higher-order transition matrices after Chapman–Kolmogorov transformations. Because multi-objective evolutionary strategy algorithms are used to solve the optimization problem, the model can be flexibly constructed and extended in terms of constraints and objective functions; in other words, the objective functions and constraints can take free forms, e.g., nonlinear, non-concave, or without gradient information. Some characteristics of the wind speed transition matrix are utilized in designing the evolutionary strategy algorithm to avoid blind searches.
To deal with the potential convergence issue caused by high dimensions and many local optima, this paper develops a new procedure called "enhanced offspring generation" for the original double-objective SPEA algorithm, which proved successful in finding better Pareto frontiers and controlling the population's convergence. The proposed method and model are tested with two computational examples. The first, small example is fabricated with a 2-state transition matrix to validate the basic concepts of the method and model. The second example uses a real wind speed time series to find an optimal first-order transition matrix with 13 states. This high-dimensional example illustrates the effectiveness of the proposed enhanced offspring generation procedure as well as the method's overall applicability to real industry-size problems.
Acknowledgments
This paper is partially supported by National Science Foundation of China, project number 71001050; Junior Faculty Research Fellowship of Business School, Nanjing University, project number 2009-10; High Technology Project ("863") of Chinese Science and Technology Department, project number 2007AA05Z445; National Basic Research Program (973 Program), project number 2010CB227102-1.

References
Anahua, E., Barth, S., & Peinke, J. (2008). Markovian power curves for wind turbines. Wind Energy, 11, 219–232.
AWEA (2008). Available from:
Deb, K. (2001). Multi-objective optimization using evolutionary algorithms. New York: John Wiley & Sons. p. 515.
Eiben, A. E., & Smith, J. E. (2003). Introduction to evolutionary computation. New York: Springer-Verlag. pp. 299.
Ettoumi, F. Y., Sauvageot, H., & Adane, A. E. H. (2003). Statistical bivariate modeling of wind using first-order Markov chain and Weibull distribution. Renewable Energy, 28, 1787–1802.
Jones, M. T. (2005). Estimating Markov transition matrices using proportions data: An application to credit risk. International Monetary Fund, Washington, DC, Tech. Rep. 05/219.
Kantz, H., Holstein, D., Ragwitz, M., & Vitanov, N. K. (2004). Markov chain model for turbulent wind speed data. Physica A: Statistical Mechanics and its Applications, 342, 315–321.
Kusiak, A., Zheng, H., & Song, Z. (2009). Short-term prediction of wind farm power: A data mining approach. IEEE Transactions on Energy Conversion, 24, 125–136.
Lee, T. C., Judge, G. G., & Zellner, A. (1968). Maximum likelihood and bayesian estimation of transition probabilities. Journal of the American Statistical Association, 63, 1162–1179.
Manwell, J. F., McGowan, J. G., & Rogers, A. L. (2002). Wind energy explained: Theory, design and application (1st ed.). London, UK: John Wiley & Sons. pp. 577.
McElroy, M. B., Lu, X., Nielsen, C. P., & Wang, Y. (2009). Potential for wind-generated electricity in China. Science, 325, 1378–1380.
McGuire, T. M. (1969). More on least squares estimation of the transition matrix in a stationary first-order Markov process from sample proportions data. Psychometrika, 34, 335–345.
Negra, N. B., Holmstrøm, O., Bak-Jensen, B., & Sørensen, P. (2007). Model of a synthetic wind speed time series generator. Wind Energy, 11, 193–209.
Nfaoui, H., Essiarab, H., & Sayigh, A. A. M. (2004). A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco. Renewable Energy, 29, 1407–1418.
Pourmousavi Kani, S. A., & Riahy, G. H. (2008). A new ANN-based methodology for very short-term wind speed prediction using Markov chain approach. In IEEE electrical power & energy conference.
Roger, H., & Charles, J. (1994). Topics in matrix analysis. Cambridge: Cambridge University Press.
Ross, S. M. (2007). Introduction to probability models (1st ed.). New York: Elsevier.
Sahin, A. D., & Sen, Z. (2001). First-order Markov chain approach to wind speed modeling. Journal of Wind Engineering and Industrial Aerodynamics, 89, 263–269.
Wiser, R., & Bolinger, M. (2006). Annual report on US wind power installation, cost, and performance trends. NREL, US Department of Energy, Golden, CO, USA. Available from:
Zitzler, E., Laumanns, M., & Thiele, L. (2001). SPEA2: Improving the strength Pareto evolutionary algorithm. Computer Engineering and Networks Laboratory (TIK), Department of Electrical Engineering, Swiss Federal Institute of Technology, Zurich, Switzerland, Tech. Rep. TIK-Report 103.
Zitzler, E., & Thiele, L. (1999). Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3, 257–271.