Wind Power Forecast Using Wavelet Neural Network Trained by Improved Clonal Selection Algorithm

Hamed Chitsaz (a,*), Nima Amjady (b), Hamidreza Zareipour (a)
(a) Schulich School of Engineering, University of Calgary, Alberta, Canada
(b) Semnan University, Semnan, Iran

Abstract: With the integration of wind farms into electric power grids, accurate wind power prediction is becoming increasingly important for the operation of these power plants. In this paper, a new forecasting engine for wind power prediction is proposed. The proposed engine has the structure of a Wavelet Neural Network (WNN), with the activation functions of the hidden neurons constructed from multi-dimensional Morlet wavelets. This forecast engine is trained by a new improved Clonal selection algorithm, which optimizes the free parameters of the WNN for wind power prediction. Furthermore, the Maximum Correntropy Criterion (MCC) is utilized instead of the mean squared error as the error measure in the training phase of the forecasting model. The proposed wind power forecaster is tested with real-world hourly data of system-level wind power generation in Alberta, Canada. In order to demonstrate the efficiency of the proposed method, it is compared with several other wind power forecast techniques. The obtained results confirm the validity of the developed approach.

* Corresponding Author: Hamed Chitsaz; Email: [email protected]; Phone: +1-587-8996-8388; Fax: +1-403-282-6855

Preprint submitted to Elsevier, June 13, 2014

Keywords: Wind Power Forecasting, Wavelet Neural Network, Clonal Optimization
1. Introduction

In recent years, wind power has been the fastest growing renewable electricity generation technology in the world. The worldwide wind capacity reached approximately 300 GW by the end of June 2013, out of which 13.9 GW were added in the first six months of 2013 [1]. In particular, Canada installed 377 MW during the first half of 2013, which is 50% more than in the same period of 2012 [1]. In the Province of Alberta, Canada, the installed capacity reached 1087 MW in late 2012, and is expected to grow to 2388 MW by 2016 [2]. Despite the environmental benefits of wind power, it has an intermittent nature, which could affect power system security [3] and reliability [4].

One approach to deal with wind power intermittency on the operation time scale is to forecast it over an extended period of time. Accurate wind power forecasting can improve the economic and technical integration of large capacities of wind energy into the existing electricity grid. Wind forecasts are important for system operators to control balancing, operation and safety of the grid [5]. On the other hand, wind power forecast errors might sometimes require system operators to re-dispatch the system in real time. The costs of re-dispatch affect electricity prices and system performance [6]. Moreover, reserve requirements are connected to wind forecast uncertainty [7]. Hence, reducing the costs of re-dispatch and the contribution of spinning reserves through more accurate wind power prediction can effectively increase system operation efficiency. For instance, the economic benefits of accurate wind forecasting were assessed by GE Energy for the New York State Energy Research and Development Authority (NYSERDA) and the NYISO. In that study, it was estimated that $125 million, or 36%, of the cost reduction is associated with state-of-the-art wind power forecasting; this is about 80% of the estimated cost reduction that could be achieved with a perfect wind power production forecast [8].

Hence, various approaches have been proposed in the literature to improve wind power forecasting accuracy. In one group of models, wind speed and other climate variables are predicted using Numerical Weather Prediction (NWP) models, and those forecasts are used to predict the wind power output of a wind turbine or a wind farm [9, 10] using turbine or farm production curves. In another group of models, the NWP forecasts, or self-generated climate variable forecasts, are fed into secondary time series models to predict wind power output for a turbine, a farm or system-level wind power production [11, 12, 13, 14, 15, 16, 17, 18]. The time series models may be built based on statistical approaches [11, 12] or artificial intelligence techniques [13, 14, 15, 16, 17, 18]. In a third group of models, only past power production values are used in univariate models to predict future wind power values [19, 20, 21, 22, 23, 24, 25]. Despite improvements in wind power forecasting methods, wind power forecasts still suffer from relatively high errors, ranging from 8% to 22% (in terms of normalized mean squared error) depending on several factors, such as forecasting horizon, type of forecasting model, size of wind farm and geographic location [26].

The contribution of the present paper is to propose a new forecasting technique for short-term wind power forecasting. In particular, we develop a wind power forecasting engine based on a Wavelet Neural Network (WNN) with multi-dimensional Morlet wavelets as the activation functions of the hidden neurons and the maximum correntropy criterion as the error measure of the training phase. We propose a stochastic search technique, which is an improved version of the Clonal search algorithm, for training the forecasting engine. The significance of the proposed forecasting technique is that the combination of the WNN and the proposed training strategy is capable of capturing highly non-linear patterns in the data, resulting in improved forecast accuracy. In particular, the high exploitation capability of the proposed training strategy enables it to find more optimal solutions for the optimization problem of WNN training.

The remaining sections of the paper are organized as follows. In Section 2, after a brief literature review on wind power forecasting models, the architecture of the wind forecasting model is introduced in subsection 2.1. The proposed training strategy is then presented in subsection 2.2. The results of the proposed wind forecasting method, obtained for the real-world test cases, are compared with the results of several other prediction approaches in Section 3. Section 4 concludes the paper.
2. Methodology

In this section, a literature review of the existing wind power forecasting models, with a focus on artificial intelligence methods, is first provided. We then provide the details of the proposed forecasting engine and its training strategy.

Artificial intelligence techniques, especially artificial neural networks, have been used in several papers to predict wind power generation. Recurrent neural networks [13], Radial Basis Function (RBF) neural networks [15], and Multi-Layer Perceptron (MLP) neural networks [20] have been proposed for wind power forecasting. Although neural networks can model nonlinear input/output mapping functions, a single neural network with traditional training mechanisms has limited learning capability and may not be able to correctly learn the complex behavior of the wind signal. To remedy this problem, combinations of neural networks with each other and with fuzzy inference systems, such as the Adaptive Neuro-Fuzzy Inference System (ANFIS) [17, 21] and the Hybrid Iterative Forecast Method (combining MLP neural networks) [16], have also been suggested for wind speed and power prediction. Another approach to tackle the complex behavior of wind power time series is using the wavelet transform. In [27], it has been discussed that wavelets can effectively be used for both stationary and non-stationary time series analysis, and that is one of the reasons for the wide and diverse applications of wavelets. In [22, 24, 25, 23], wind speed and power prediction approaches based on the wavelet transform, as a preprocessor to decompose wind speed/power time series, and ANFIS [22], Support Vector Regression (SVR) [24], ARMA [25] and Artificial Neural Network (ANN) [23], as forecast engines, have been presented. In [28], the wavelet has been used in the form of a wavelet neural network (WNN) for wind speed prediction, and the WNN is trained by an Extended Kalman Filter (EKF). In [18], a wind power prediction strategy including a Modified Hybrid Neural Network (MHNN) and Enhanced Particle Swarm Optimization (EPSO) has been proposed.
Figure 1: Mexican hat and Morlet wavelet functions
Briefly, the proposed forecasting technique is composed of a WNN structure with Morlet wavelet functions as activation functions in the hidden layer and a new training strategy. These components are described next.

2.1. The Developed Wavelet Neural Network (WNN)

The wavelet transform has been used in some recent research works for wind forecasting, as a preprocessor to decompose wind speed/power time series into a set of sub-series [22, 24, 25, 23]. The future values of the sub-series are predicted by ANFIS [22], SVR [24], ARMA [25] and ANN [23], and then combined by the inverse WT to form the forecast value of wind power/speed. Another approach to utilize wavelets in a forecast process is through constructing a wavelet neural network, in which a wavelet function is used as the activation function of the hidden neurons of an ANN. For instance, WNNs with Mexican hat and Morlet wavelets, shown in Fig. 1, as the activation function of the hidden neurons have been applied to another application, i.e. price forecasting of electricity markets, in [29] and [36], respectively. Due to the local properties of wavelets and the ability to adapt the wavelet shape according to the training data set, instead of adapting the parameters of a fixed-shape activation function, WNNs offer higher generalization capability compared to classical feed-forward ANNs [29]. Recently, a WNN using the Mexican hat mother wavelet function was proposed for wind speed forecasting [28]. However, the Morlet wavelet has vanishing-mean oscillatory behavior with more diverse oscillations than the Mexican hat wavelet, as can be seen from Fig. 1, and so it can better localize high-frequency components in the frequency domain and various changes in the time domain of severely non-smooth time series, e.g. wind power. In [30], it is mentioned that Morlet leads to a better electricity price forecast compared with Mexican hat. In this paper, we propose a WNN with multi-dimensional Morlet wavelets as the activation function of the hidden neurons for wind power forecasting. In the proposed method, we implement a new training algorithm which can efficiently search for the global optimum solution, while a simple gradient method is used as the training algorithm of the WNN in [30]. In addition, since the performance of these two activation functions has not been illustrated in [30], we compare these functions in the numerical results in order to demonstrate the effectiveness of the Morlet wavelet over the Mexican hat in wind power forecasting.

The architecture of the WNN, a three-layer feed-forward structure, is shown in Fig. 2. In Fig. 2, X = [x_1, x_2, ..., x_m] is the input vector of the forecast process and y is the target variable. The inputs x_1, x_2, ..., x_m of the forecasting engine can be from the past values of the target variable and the past and forecast values of related exogenous variables. For instance, past values of wind power
Figure 2: Architecture of the proposed wind forecasting engine (WNN with Morlet wavelet)
along with the past and forecast values of wind speed, wind direction, temperature and humidity can be considered for wind power prediction, provided that their data is available [14]. A feature selection technique can be used to refine these candidate features and select the most effective inputs for the forecast process. In this research work, we use the feature selection method of [18]. This method is based on the information-theoretic criterion of mutual information and selects the most informative inputs for the forecast process by filtering out the irrelevant and redundant candidate features. However, since this method is not the focus of this paper, it is not further discussed here; the interested reader can refer to [18] for details of this feature selection method. Here, the target variable is the wind power of the next time interval, for which the forecasting engine presents a prediction. Multi-period forecast, e.g. prediction of wind power for the next forecast steps, is reached via recursion, i.e. by feeding input variables with the forecaster's outputs. For instance, the forecasted wind power for the first hour is used as y(t − 1) for wind power prediction of the second hour, provided that y(t − 1) is among the selected candidate inputs of the feature selection technique.
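The recursive multi-period scheme described above can be sketched as follows (a minimal illustration, not the authors' code; `model`, the lag set and the toy values are hypothetical stand-ins for the trained WNN and the features chosen by feature selection):

```python
def recursive_forecast(model, history, lags, steps):
    """Multi-step forecast by recursion: each new prediction is appended to
    the history and fed back as a lagged input for the next step.

    model: callable mapping a lagged-feature vector to one-step-ahead power.
    history: past wind power values, most recent last.
    lags: lag orders used as inputs, e.g. [1, 2] for y(t-1), y(t-2).
    """
    series = list(history)
    out = []
    for _ in range(steps):
        x = [series[-lag] for lag in lags]   # build the input vector
        y_hat = model(x)
        out.append(y_hat)
        series.append(y_hat)                 # feed the forecast back
    return out

# Toy model: average of the selected lags (stand-in for the WNN)
model = lambda x: sum(x) / len(x)
print(recursive_forecast(model, [100.0, 110.0, 120.0], [1, 2], 3))
# → [115.0, 117.5, 116.25]
```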
The forecasting engine should construct the input/output mapping function X ⇒ y. The activation function of the input layer nodes is the identity function, i.e. I(x) = x. In other words, the input layer only propagates the inputs of the WNN to the next layers. The activation function of the hidden layer nodes of the WNN, i.e. the multi-dimensional Morlet wavelet, is constructed as follows:
\[ F_i(x_1, x_2, \ldots, x_m) = \prod_{j=1}^{m} \psi_{a_i,b_i}(x_j), \quad \forall i = 1, 2, \ldots, n \tag{1} \]

\[ \psi_{a_i,b_i}(x_j) = \psi\!\left(\frac{x_j - b_i}{a_i}\right) \tag{2} \]
where n indicates the number of hidden neurons of the WNN, and the one-dimensional Morlet wavelet ψ(.) is defined as follows:

\[ \psi(x) = e^{-0.5x^2} \cos(5x) \tag{3} \]
In (1) and (2), ψ_{a_i,b_i} is the scaled and shifted version of ψ(.), with a_i and b_i as the scale and shift parameters, respectively. Each activation function F_i(.) has its own a_i and b_i. Based on (1), F_i(.) is an m-dimensional wavelet function of x_1, x_2, ..., x_m, constructed by the tensor product of one-dimensional Morlet wavelets ψ_{a_i,b_i}. Finally, the output of the WNN, denoted by y in Fig. 2, is computed as follows:

\[ y = \sum_{i=1}^{n} w_i F_i(x_1, x_2, \ldots, x_m) + \sum_{j=1}^{m} v_j x_j \tag{4} \]
where w_i is the weight between the ith hidden neuron and the output node, and v_j is the direct weight between the jth input and the output node. In other words, the output of the WNN is obtained by a combination of the multi-dimensional wavelet functions F_i(x_1, x_2, ..., x_m), as well as a combination of the inputs x_j. Thus, the proposed WNN not only benefits from the capabilities of wavelet functions, such as their ability to capture cyclical behaviors, but can also capture trends of the signal. Based on the above formulation, the vector of free parameters of the WNN, denoted by Z, is as follows:

\[ Z = [v_1, \ldots, v_m, w_1, \ldots, w_n, a_1, \ldots, a_n, b_1, \ldots, b_n] \tag{5} \]
Therefore, the WNN has NP = m + 3n free parameters, which should be determined by the proposed training strategy.
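Equations (1)-(4) amount to the following forward pass (a minimal Python sketch under assumed toy dimensions; the parameter values below are arbitrary, not from the paper):

```python
import math

def morlet(x):
    # One-dimensional Morlet wavelet, Eq. (3): psi(x) = exp(-0.5 x^2) cos(5x)
    return math.exp(-0.5 * x * x) * math.cos(5.0 * x)

def wnn_output(x, v, w, a, b):
    """Forward pass of the WNN, Eqs. (1)-(4).

    x: m inputs; v: m direct input weights; w, a, b: n hidden-neuron
    output weights, wavelet scales and shifts, respectively.
    """
    m, n = len(x), len(w)
    y = sum(vj * xj for vj, xj in zip(v, x))      # direct linear (trend) part
    for i in range(n):
        # Multi-dimensional Morlet wavelet, Eq. (1): product over all inputs
        Fi = 1.0
        for j in range(m):
            Fi *= morlet((x[j] - b[i]) / a[i])    # scaled/shifted psi, Eq. (2)
        y += w[i] * Fi
    return y

# Toy case: m = 3 inputs, n = 2 hidden neurons, so NP = m + 3n = 9 parameters
x = [0.2, -0.1, 0.4]
v = [0.5, -0.3, 0.1]
w = [1.0, -0.5]
a = [1.0, 1.5]
b = [0.0, 0.5]
print(wnn_output(x, v, w, a, b))
```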
2.2. The Proposed Training Strategy

We propose a new training strategy to train the developed WNN. This strategy is based on an improved Clonal selection algorithm. In the following, the Clonal Selection Algorithm (CSA) is first briefly introduced. Then, the proposed Improved CSA (ICSA) is presented and adapted as the training strategy to optimize the free parameters of the WNN.

When an antigen (e.g., a virus) invades the human body, the biological immune system selects the antibodies that can effectively recognize and destroy the antigen. The selection mechanism of the immune system operates based on the affinity of the antibodies with relation to the invading antigen. CSA is an efficient optimization method, inspired by the selection mechanism of the biological immune system, proposed by De Castro and Von Zuben [31]. This method has successfully been applied to optimization and pattern recognition domains [31, 32] and also to unit commitment of power systems [33] in recent years. The performance of CSA can be summarized as the following step-by-step procedure [31]:

Step 1: Randomly produce the initial population of CSA within the allowable ranges. Each individual of the population, called an antibody in CSA, is a candidate solution for the optimization problem including its decision variables, called genes in CSA [31]. Here, each antibody of the population of CSA includes the NP free parameters of the WNN, shown in (5). The number of antibodies in the population is denoted by N. The generation number g is set to zero (g = 0). In general, antibody k in generation g is as follows:

\[ Z_k^g = [Z_{k,1}^g, Z_{k,2}^g, \ldots, Z_{k,NP}^g], \quad \forall k = 1, 2, \ldots, N \tag{6} \]

where the lth gene or decision variable Z_{k,l}^g (l = 1, ..., NP) can be from v_1, ..., v_m or w_1, ..., w_n or a_1, ..., a_n or b_1, ..., b_n, as shown in (5). The four sets v_j, w_i, a_i and b_i of each individual are randomly initialized with uniform distribution in the intervals [-1, 1], [-1, 1], [0.5, 2] and [-3, 3], respectively.

Step 2: Determine the
affinity of the antibodies with respect to the antigen. In optimization problems, there is usually no explicit antigen population to be recognized, but an objective function to be optimized (maximized or minimized). Thus, in optimization tasks, an antibody's affinity corresponds to the evaluation of the objective function for the given antibody [33]. Here, the proposed WNN is trained, i.e. its free parameters are optimized, by CSA. Thus, the training error of the WNN is considered as the objective function of CSA, which should be minimized. Since the training error of ANN-based forecasting engines is widely measured in terms of Mean Squared Error (MSE), it is considered as the measure of WNN training error for the present.

Step 3: Sort the antibodies based on their training error values in terms of MSE (i.e., the objective function values), such that the best antibody, with the lowest MSE, ranks first.

Step 4: Copy the selected antibodies based on their position in the sorted population:

\[ nc_k = \mathrm{Round}\!\left(\frac{\beta N}{k}\right), \quad \forall k = 1, 2, \ldots, N \tag{7} \]

where nc_k is the number of antibodies copied from the kth antibody; the Round(.) function rounds its real argument up/down to the nearest integer value; β is a constant coefficient which indicates the rate of copy. Thus, an antibody with lower MSE and higher rank (lower k) will be copied more often than those with higher MSE. At the end of this step, the total number of copied antibodies is NC:

\[ NC = \sum_{k=1}^{N} \mathrm{Round}\!\left(\frac{\beta N}{k}\right) \tag{8} \]

The performance of this step is graphically shown in Fig. 3. As seen, the first antibody, with the highest rank, is copied nc_1 times, the second one nc_2 times, and so on.
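As an illustration of (7) and (8), with assumed settings N = 10 and β = 1 (the paper does not fix these values; Round(.) is implemented here as round-half-up, one possible reading of "rounds up/down to the nearest integer"):

```python
import math

def clone_counts(N, beta):
    # Eq. (7): nc_k = Round(beta * N / k); higher-ranked antibodies (low k)
    # receive more clones. Round-half-up is an assumption here.
    return [math.floor(beta * N / k + 0.5) for k in range(1, N + 1)]

N, beta = 10, 1.0          # assumed settings for illustration
nc = clone_counts(N, beta)
NC = sum(nc)               # Eq. (8): total number of copied antibodies
print(nc, NC)              # → [10, 5, 3, 3, 2, 2, 1, 1, 1, 1] 29
```

Note how the copy operator concentrates the search around the best-ranked antibodies while still cloning every member at least once.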
Step 5: Mutate the NC antibodies, produced in step 4, using the hypermutation operator [32].

Step 6: Determine the MSE value for each mutated antibody. Among the NC mutated and N original antibodies, select the NS antibodies (NS < N) with the lowest MSE values. These NS antibodies enter the next generation directly.

Step 7: Randomly generate N − NS new antibodies for the next generation. These randomly generated antibodies enhance the search diversity of CSA, and consequently the algorithm takes the chance to escape from local optima.

Step 8: Increment the generation number (g = g + 1). If the termination criterion, such as a maximum number of generations, is satisfied, the algorithm is terminated and the best antibody of the last generation, owning the lowest MSE, is determined as the final solution of CSA; otherwise, go back to step 2 and repeat this cycle. The termination criterion used for the training of the WNN will be described later.

In the above algorithm, N, NS, and β are user-defined settings of CSA. In this algorithm, antibodies are evolved through mutation, which is the key operator of CSA. The proposed Improved CSA (ICSA) includes two enhancements for this operator, as follows. De Castro and Von Zuben [31] discuss that the mutation rate
Figure 3: Representation of step 4 (copy operator) of CSA
should be inversely proportional to the antigenic affinity: the higher the affinity, the smaller the mutation rate. Thus, in an optimization problem, more optimal candidate solutions should be mutated less. This general idea is modeled in the proposed ICSA based on the following relation:

\[ N_k^{Mut} = \mathrm{Round}\!\left[ e^{-\rho \frac{MSE_{min}}{MSE_k}} \times NP \right], \quad \forall k = 1, \ldots, N \tag{9} \]

where MSE_k is the MSE of the kth antibody and MSE_{min} is the lowest MSE among the antibodies of the current population; N_k^{Mut} represents the number of genes that should be mutated in the antibodies copied from the kth antibody in step 4 (N_k^{Mut} ≤ NP); the coefficient ρ controls the mutation rate, such that a higher ρ leads to lower values of N_k^{Mut}. Thus, fewer decision variables are mutated in antibodies with lower MSE (more optimal candidate solutions) and more decision variables are mutated in antibodies with higher MSE (less optimal candidate solutions); consequently, the individuals of the ICSA population are mutated in a coordinated manner. As a result, ICSA not only allows antibodies with lower MSE to be copied more than those with higher MSE, but also controls the mutation process of the copied antibodies in accordance with their MSE using (9). Note that although the mutation operation is applied to the NC copied antibodies as shown in step 5, only N values of N_k^{Mut} need to be computed, since the antibodies copied from one individual have the same N_k^{Mut}.
After determining N_k^{Mut} for the antibodies of ICSA, the set of genes that should be mutated is randomly selected among the decision variables of each antibody. The hypermutation operator of step 5 adds a normal random variable with zero mean and constant variance to each decision variable of the mutating antibody [39]. Here, a more effective mutation operation, inspired by the Differential Evolution (DE) algorithm [34], is proposed for ICSA as follows:

\[ Z_{k,l}^{g+1} = \left[ 1 - e^{-\rho \frac{MSE_{min}}{MSE_k}} \right] Z_{k,l}^{g} + e^{-\rho \frac{MSE_{min}}{MSE_k}} \left( Z_{k_1,l}^{g} - Z_{k_2,l}^{g} \right), \quad 1 \le k \ne k_1 \ne k_2 \le NC, \; 1 \le l \le NP \tag{10} \]

where Z_{k,l}^{g} and Z_{k,l}^{g+1} represent gene l of antibody k in two successive generations g and g + 1. As seen, to mutate the kth antibody, this mutation operator uses the values of the decision variables in two other antibodies (i.e. Z_{k_1,l}^{g} and Z_{k_2,l}^{g}), as in the DE mutation operator.
In [34], it has been discussed that the DE mutation operator, by computing the difference between two randomly chosen individuals from the population (here, antibodies k_1 and k_2), determines a function gradient in a given area (not at a single point), and therefore prevents the solution from being trapped in a local optimum of the objective function. Moreover, to enhance the search diversity of the proposed mutation operation, it is applied separately to each gene of the mutating antibodies. Additionally, the exp(.) term of (9) and its complement 1 − exp(.) are used in (10). If the kth antibody is a good candidate solution, the exp(.) term and its complement become close to 0 and 1, respectively. Thus, the next-generation decision variables mainly take their values from the current-generation decision variables, and small gradient terms are added to them. Consequently, ICSA can search promising areas of the solution space with small steps, or high resolution, which is known as exploitation capability. Based on this capability, a stochastic search technique can extract optimal solutions of the solution space from its promising areas. The high exploitation capability of the proposed ICSA allows it to find more optimal solutions for the optimization problem of WNN training. In other words, the WNN using ICSA can better learn the complex input/output mapping function of the wind power forecast process and so predict its future values with higher accuracy. However, if the kth antibody is a poor candidate solution, the exp(.) term and its complement become close to 1 and 0, respectively, and so the next-generation decision variables mainly obtain their values from large gradient terms. Hence, ICSA can move out of non-promising areas of the solution space with large steps. Finally, note that due to the effect of the mutation operation of (10), even the antibodies copied from the same antibody lead to different individuals after the mutation operation. Thus, at the end of step 5 of the proposed ICSA, NC diverse candidate solutions are added to the N original antibodies (Fig. 3), which further enhances the search ability of ICSA.
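The coordinated mutation of (9) and (10) can be sketched as follows (an illustrative reading, not the authors' implementation; ρ, NP and the MSE values are arbitrary assumptions):

```python
import math

def n_mut(mse_k, mse_min, rho, NP):
    # Eq. (9): number of genes to mutate in the clones of antibody k.
    # Antibodies near the population best (MSE_k close to MSE_min) get a
    # small exp(.) factor, hence few mutated genes; poor antibodies
    # (MSE_k >> MSE_min) get a factor near 1 and mutate almost all genes.
    return round(math.exp(-rho * mse_min / mse_k) * NP)

def mutate_gene(z_k, z_k1, z_k2, mse_k, mse_min, rho):
    # Eq. (10): DE-inspired mutation of one gene, blending the current value
    # with a difference ("gradient") term from two other antibodies k1, k2.
    e = math.exp(-rho * mse_min / mse_k)
    return (1.0 - e) * z_k + e * (z_k1 - z_k2)

rho, NP = 5.0, 9           # assumed values for illustration
# Near-optimal antibody mutates few genes; a poor one mutates almost all:
print(n_mut(1.0, 1.0, rho, NP), n_mut(100.0, 1.0, rho, NP))  # → 0 9
```

For a good antibody the blend keeps the gene essentially unchanged (small step), while for a poor one it is dominated by the difference term (large step), matching the exploitation/escape behavior described above.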
The termination criterion is an important aspect of the proposed training strategy, as it can affect the performance of the proposed forecasting engine. A low number of ICSA generations may lead to insufficient training of the forecasting engine, so that the WNN cannot correctly learn the input/output mapping function of the forecast process, i.e. X ⇒ y. On the other hand, a large number of ICSA generations increases the computation burden and, more importantly, may lead to over-fitting of the WNN. When over-fitting occurs in a neural-network-based forecast engine, it memorizes the training samples instead of learning them; thus, while the neural network obtains a very low training error, its generalization capability (i.e. its ability to respond to unseen forecast samples) degrades. To avoid these problems, a termination criterion based on the cross-validation technique is used for the training phase of the proposed forecasting engine. In this technique, the whole gathered historical data is divided into training and validation samples. The WNN is trained by the proposed ICSA, trying to minimize the MSE of the training samples in each generation. However, at the end of each generation (step 8), the WNN is tested on the unseen validation samples. For instance, the validation samples can be some samples at the end of the historical data interval, i.e. the closest historical data to the forecasting horizon. When the MSE of the unseen validation samples, as a measure of the WNN's forecast error, begins to rise, the generalization capability of the WNN begins to degrade, and therefore the training phase should be terminated. Then, the best individual of the WNN in the generation leading to the minimum validation error, which is expected to yield the maximum generalization capability of the WNN, is selected as the final solution of the training phase, indicating the free parameters of the WNN.
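The cross-validation-based termination can be sketched as an early-stopping loop (a generic illustration; the patience counter and the toy error curves are assumptions, since the paper only states that training stops once the validation MSE begins to rise):

```python
def train_with_early_stopping(generations, train_error, valid_error, patience=5):
    """Cross-validation-based termination sketch (not the authors' code).

    train_error/valid_error: callables returning the error of the best
    antibody at a given generation (hypothetical stand-ins for running one
    ICSA generation and evaluating on the held-out validation samples).
    Stops once the validation error has not improved for `patience`
    consecutive generations; returns the generation with the minimum
    validation error and that error.
    """
    best_g, best_v, stall = 0, float("inf"), 0
    for g in range(generations):
        train_error(g)                # one ICSA generation on training data
        v = valid_error(g)            # MSE on unseen validation samples
        if v < best_v - 1e-12:
            best_g, best_v, stall = g, v, 0
        else:
            stall += 1
            if stall >= patience:     # validation error keeps rising: stop
                break
    return best_g, best_v

# Toy curves: training error always falls; validation error rises after g = 20
tr = lambda g: 1.0 / (g + 1)
va = lambda g: (g - 20) ** 2 / 400.0 + 0.1
print(train_with_early_stopping(200, tr, va))  # → (20, 0.1)
```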
Although MSE has been widely used in forecasting models as a training error measure, its applicability to train a neural network is optimal only if the probability distribution of the prediction errors is Gaussian. However, the wind power forecast error presents a non-Gaussian shape [35]. Minimizing the squared error is equivalent to minimizing the variance of the error distribution. Accordingly, the higher moments (e.g., skewness, kurtosis, etc.) are not captured, but they contain information that should be passed to the free parameters of the neural network instead of remaining in the error distribution. Ricardo Bessa et al. in [35] proposed new training error criteria based on minimizing the information content of the error distribution (instead of minimizing its variance, as in MSE). The Maximum Correntropy Criterion is defined as follows:

\[ MCC = \max\left\{ \frac{1}{N_e} \sum_{i=1}^{N_e} G(\epsilon_i, \sigma^2) \right\} \tag{11} \]

where G is the Gaussian kernel, N_e is the number of training samples, ε_i is the error of the ith training sample, and σ^2 is the variance. Therefore, MCC approximates a non-Gaussian shape by a summation of Gaussian functions corresponding to the errors of the training samples. For more detailed information relating to MCC, refer to [35]. It should be noted that MCC is a maximization problem, while the
proposed training algorithm of section 2.2, i.e. ICSA, was described as a minimization problem, since it was based on MSE. Hence, to adapt this criterion to the proposed training algorithm, minimization of 1/MCC or −MCC can be considered instead of maximization of MCC. It is noted that the value of correntropy is always positive [36].

3. Numerical results
In order to generate forecasts, two parameters must be decided first, namely, the forecast interval and the forecast horizon. The forecast interval determines the length of each time step into the future (e.g., 10 minutes or one hour), and the forecast horizon determines how many time steps into the future are of interest. Both factors depend on the application of the forecasts. For instance, if the forecasts are used for very short-term adjustments in operation schedules, the forecast horizon could be as short as a few hours. Furthermore, the forecast interval may also vary depending on the application. For example, while a unit commitment algorithm may consider hourly intervals, an economic dispatch algorithm may look into shorter intervals (e.g., 5 minutes). Note that in generating the forecasts, selecting the forecast interval is sometimes limited by the availability of the data. For example, in Alberta, the meteorological towers that measure weather factors at wind farm sites are set to collect data every 10 minutes. Thus, the forecast interval cannot be less than 10 minutes if the meteorological data are of interest in the models.
In this paper, we have selected to generate hourly wind power production forecasts for Alberta's power system for up to 6 hours into the future as our test case. Alberta's electricity market is a real-time market with an hourly settlement interval. Although the system marginal price is determined every minute, the supply and demand offers must be submitted to the system operator for hourly intervals. The bids and offers may be changed up to two hours before the operation hour, which is referred to as the T-2 window. The majority of slower generators do not strategically adjust their prices and bid at $0/MWh. Faster units, on the other hand, actively watch the supply-demand balance in the market and adjust their strategies. These units normally act based on developments in the market in the short term, usually the next few hours. In particular, for any given operation hour, forecasts of wind power generated before the start of the T-2 window, when the market participants can still change their bids and offers, are of particular value. In addition, given the real-time nature of the market, and considering the fact that only short lead-time units behave strategically, the system operator is mainly concerned with the supply-demand balance for the next hour or two. Thus, while forecasts over longer horizons are important, those for short forecast horizons are particularly valuable for the system operator and market participants in Alberta.
The forecasts of meteorological variables, e.g., wind speed, through NWP models are known to be useful in wind power forecasts for longer forecast horizons [15]. Thus, we choose not to include such data in our model, since we are focusing on short-term prediction, and generate forecasts solely based on past power production values. More specifically, 100 hourly lagged values of wind power are considered as the candidate inputs, which are processed by the feature selection technique to select a minimal subset of the most informative features for the proposed forecasting engine. Furthermore, the 60 days prior to each forecast day are considered as the historical data, divided into 59 days as the training set and the one day before the forecast day as the validation set. The second weeks of March, June, September, and December of 2012 are considered as test weeks. For each test week, the forecasts are updated in two different ways in this paper. In the first part of the numerical results, we update the forecasts every 6 hours, and thus 28 sets of forecasts are generated for each of the representative weeks. In this part, the error measures are evaluated based on the average of the errors for the individual forecasts over the entire week. In the second part of the results, we test the model by updating the forecasts every hour, i.e., for every test hour there are six versions of forecasts produced at previous hours. For these forecasts, we evaluate the error measures for each forecast horizon, as further discussed later in this section.
385
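As a rough sketch of the input construction described above, the following fragment builds the 100 lagged candidate inputs and the rolling training/validation split; the function names, array layout, and use of NumPy are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def build_lagged_inputs(wp, n_lags=100):
    """Candidate-input matrix of hourly lagged wind power values.

    wp     : 1-D array of hourly wind power observations
    n_lags : number of lagged candidates, WP(t-1) ... WP(t-n_lags)
    Returns X (samples x n_lags) and the aligned targets y, y[i] = WP(t).
    """
    X = np.column_stack(
        [wp[n_lags - k : len(wp) - k] for k in range(1, n_lags + 1)]
    )
    y = wp[n_lags:]
    return X, y

def rolling_split(X, y, train_days=59, val_days=1):
    """Split the most recent history into a training block and a
    validation block (hourly data, so days * 24 samples each)."""
    h_tr, h_va = train_days * 24, val_days * 24
    train = (X[-(h_tr + h_va):-h_va], y[-(h_tr + h_va):-h_va])
    val = (X[-h_va:], y[-h_va:])
    return train, val
```

A feature selection step would then pick the informative columns of `X` (e.g., lags 1-8, 10-11, 18-26, 28 in the example below) before training.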
To show an example of input selection using the feature selection technique, the selected features for the third forecasting window of the September test week, i.e., the third 6-hour window of September 8, 2012, are as follows:

X = [x_1, x_2, ..., x_m] = [WP(t-1), WP(t-2), WP(t-3), WP(t-4), WP(t-5), WP(t-6), WP(t-7), WP(t-8), WP(t-10), WP(t-11), WP(t-18), WP(t-19), WP(t-20), WP(t-21), WP(t-22), WP(t-23), WP(t-24), WP(t-25), WP(t-26), WP(t-28)]

These 20 features are selected from the 100 candidate inputs WP(t-1), WP(t-2), ..., WP(t-100). Two error criteria are used in this paper to evaluate forecast errors: normalized Root Mean Square Error (nRMSE) and normalized Mean Absolute Error (nMAE), defined as follows:

\[ \mathrm{nRMSE} = \sqrt{\frac{1}{N_H}\sum_{t=1}^{N_H}\left(\frac{WP_{ACT}(t)-WP_{FOR}(t)}{WP_{Cap}}\right)^{2}}\times 100 \quad (12) \]

\[ \mathrm{nMAE} = \frac{1}{N_H}\sum_{t=1}^{N_H}\left|\frac{WP_{ACT}(t)-WP_{FOR}(t)}{WP_{Cap}}\right|\times 100 \quad (13) \]
where WP_ACT(t) and WP_FOR(t) indicate the actual and forecast values of wind power for hour t. Also, N_H indicates the number of hours, which is 168 for the test weeks, and WP_Cap is the total capacity of the aggregated wind farms, which is 861, 941, 941 and 1087 MW in March, June, September and December of 2012, respectively, due to the growth of wind power capacity in Alberta.
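Equations (12) and (13) translate directly into code; a minimal sketch (the function names are ours):

```python
import numpy as np

def nrmse(actual, forecast, capacity):
    """Normalized root mean square error in percent, per Eq. (12)."""
    e = (np.asarray(actual) - np.asarray(forecast)) / capacity
    return float(np.sqrt(np.mean(e ** 2)) * 100)

def nmae(actual, forecast, capacity):
    """Normalized mean absolute error in percent, per Eq. (13)."""
    e = (np.asarray(actual) - np.asarray(forecast)) / capacity
    return float(np.mean(np.abs(e)) * 100)
```

Both measures normalize by the installed capacity WP_Cap rather than by the actual generation, so weeks with different installed capacity remain comparable.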
3.1. Numerical Results with 6-hour Updates
For these forecasts, at the end of each 6-hour window, when the wind power values of the 6 hours become available, the historical data is updated to perform the wind power prediction for the next 6 hours. Thus, each forecast horizon includes 6 forecast steps. However, one week, or 168 hours, is considered as the evaluation period for the nRMSE and nMAE error criteria to better evaluate the performance of the method over a longer period.
The results obtained from the proposed forecasting method with the two training error criteria, i.e., MSE and MCC, are shown in Table 1. We have also generated forecasts with several other popular models, i.e., the persistence method and RBF and MLP neural networks. For the sake of a fair comparison, all of these methods use the same historical data, except the persistence method, which does not require training samples. Observe from Table 1 that the WNN with MSE as the training measure outperforms the three other forecast methods. For instance, the average nRMSE and average nMAE of the WNN with MSE are (14.95-14.17)/14.95 = 5.2% and (10.71-10.26)/10.71 = 4.2% lower than those of the persistence method.
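For reference, the persistence benchmark in Table 1 simply repeats the last observed value across the whole forecast horizon; a minimal sketch of this standard baseline (our own formulation):

```python
import numpy as np

def persistence_forecast(history, horizon=6):
    """Persistence benchmark: every step of the horizon equals the
    last observed value, WP_hat(t + k) = WP(t) for k = 1..horizon."""
    return np.full(horizon, history[-1], dtype=float)
```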
Moreover, as seen from the last two columns of Table 1, considering MCC as the training error measure significantly improves the accuracy of the wind power predictions. For instance, the average nRMSE and average nMAE of the WNN with MCC are respectively 6.6% and 5.5% lower than those of the WNN with MSE, and 11.4% and 9.4% lower than those of the persistence method, which clearly shows the advantage of using the MCC error measure in the training phase of a wind power forecasting model. Note that while persistence forecasts are a useful benchmark, especially for one-step-ahead forecasts, they do not provide any information on variations and ramps in a multi-step-ahead forecasting practice. Thus, although their average errors are close to those of the proposed method, they contain no ramping information and are therefore less useful.
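The MCC training measure maximizes the empirical correntropy of the forecast errors rather than minimizing their squared magnitude; the following sketch uses a Gaussian kernel, with the kernel width sigma as an assumed hyperparameter (the paper's exact kernel settings are not restated here):

```python
import numpy as np

def correntropy(errors, sigma=1.0):
    """Empirical correntropy of forecast errors with a Gaussian kernel.

    MCC training maximizes this quantity; unlike MSE, the kernel
    saturates for large (outlier) errors, which makes training more
    robust to them. sigma is an assumed kernel width."""
    e = np.asarray(errors, dtype=float)
    return float(np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2))))

def mcc_loss(errors, sigma=1.0):
    """Equivalent minimization form: negative correntropy."""
    return -correntropy(errors, sigma)
```

In this form a perfect forecast (all errors zero) attains the maximum correntropy of 1, while a single very large error barely moves the objective, unlike with MSE.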
Table 1: nRMSE (%) and nMAE (%) results for the four test weeks of year 2012

Model         | Error | Mar.  | Jun.  | Sep.  | Dec.  | Average
Persistence   | nRMSE | 13.71 | 15.14 | 18.44 | 12.49 | 14.95
              | nMAE  | 10.08 | 10.79 | 13.11 |  8.84 | 10.71
RBF           | nRMSE | 18.32 | 14.57 | 18.62 | 14.11 | 16.40
              | nMAE  | 13.32 | 10.45 | 13.77 | 10.24 | 11.95
MLP           | nRMSE | 15.36 | 15.62 | 19.80 | 12.32 | 15.78
              | nMAE  | 12.42 | 11.56 | 14.54 |  9.02 | 11.89
WNN with MSE  | nRMSE | 12.38 | 14.99 | 17.66 | 11.65 | 14.17
              | nMAE  |  9.36 | 10.64 | 12.49 |  8.53 | 10.26
WNN with MCC  | nRMSE | 12.23 | 12.48 | 16.68 | 11.58 | 13.24
              | nMAE  |  9.22 |  9.64 | 11.73 |  8.22 |  9.70

Table 2: Comparison of the proposed WNN with Morlet wavelet and a WNN with Mexican hat wavelet

Model                 | Error | Mar.  | Jun.  | Sep.  | Dec.  | Average
WNN with Mexican hat  | nRMSE | 13.77 | 13.23 | 17.12 | 12.22 | 14.08
                      | nMAE  | 10.13 |  9.80 | 12.54 |  8.71 | 10.30
WNN with Morlet       | nRMSE | 12.23 | 12.48 | 16.68 | 11.58 | 13.24
                      | nMAE  |  9.22 |  9.64 | 11.73 |  8.22 |  9.70

To illustrate the effectiveness of the Morlet wavelet function as the activation function of the WNN, it is compared in Table 2 with a WNN based on the Mexican hat wavelet function, similar to that used in [28] for wind forecasting. The improved performance of the proposed WNN can be seen from Table 2: the wind power forecast accuracy of the proposed WNN is better than that of the WNN with the Mexican hat wavelet in all test weeks.
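For context, a common real-valued Morlet mother wavelet is psi(x) = cos(5x) exp(-x^2/2), and a multi-dimensional WNN activation can be formed as a product of dilated and translated 1-D wavelets. The sketch below assumes that standard construction; the paper's exact parameterization may differ.

```python
import numpy as np

def morlet(x):
    """Real-valued Morlet mother wavelet, psi(x) = cos(5x) * exp(-x^2 / 2)."""
    x = np.asarray(x, dtype=float)
    return np.cos(5.0 * x) * np.exp(-x ** 2 / 2.0)

def morlet_neuron(x, translation, dilation):
    """Multi-dimensional Morlet activation: product of 1-D wavelets of
    the dilated and translated inputs (a common WNN construction)."""
    z = (np.asarray(x, dtype=float) - np.asarray(translation)) / np.asarray(dilation)
    return float(np.prod(morlet(z)))
```

The translations and dilations are among the free parameters that the training algorithm (ICSA here) optimizes for each hidden neuron.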
In the next numerical experiment, the effectiveness of the proposed training strategy is evaluated. For this purpose, the proposed ICSA is replaced with several other well-known stochastic search techniques, including Simulated Annealing (SA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and CSA, while the other parts of the suggested wind power forecasting engine, i.e., the WNN with Morlet wavelet function and the MCC training criterion, are kept unchanged.

Table 3: Comparison of the proposed training strategy, i.e. ICSA, with SA, PSO, DE and CSA

Algorithm | Error | Mar.  | Jun.  | Sep.  | Dec.  | Average
SA        | nRMSE | 19.95 | 16.23 | 16.96 | 13.70 | 16.71
          | nMAE  | 14.73 | 11.57 | 13.53 | 10.18 | 12.50
PSO       | nRMSE | 18.33 | 17.69 | 20.70 | 14.01 | 17.68
          | nMAE  | 14.18 | 13.38 | 14.37 |  9.88 | 12.95
DE        | nRMSE | 17.51 | 17.43 | 17.94 | 13.97 | 16.72
          | nMAE  | 13.39 | 13.34 | 13.08 | 10.25 | 12.52
CSA       | nRMSE | 13.79 | 13.47 | 18.29 | 13.22 | 14.69
          | nMAE  | 10.49 | 10.29 | 12.80 |  9.21 | 10.70
ICSA      | nRMSE | 12.23 | 12.48 | 16.68 | 11.58 | 13.24
          | nMAE  |  9.22 |  9.64 | 11.73 |  8.22 |  9.70

As seen from Table 3, the proposed ICSA leads to the lowest wind forecasting errors in all test weeks among the stochastic search techniques, improving on SA, PSO, DE and CSA by 20.7%, 25.1%, 20.8% and 9.8% in average nRMSE, and by 22.4%, 25.0%, 22.5% and 9.4% in average nMAE, respectively, demonstrating the effectiveness of the proposed training strategy.
3.2. Numerical Results with Hourly Updates
Prediction errors can also be calculated for each look-ahead forecast distinctly (i.e., each forecast hour). In some of the mainly academic literature, the average of the prediction errors from 1 hour ahead up to the last hour of the forecast horizon is calculated. For instance, in 6-hour-ahead prediction, each forecasting window includes 1-hour- to 6-hour-ahead forecasts, and the error is the average of the prediction errors of these 6 forecast values, as done in subsection 3.1. However, in most commercial forecasting tools, the forecasts are updated after each forecast interval, and forecast accuracy is evaluated for each look-ahead interval individually. In other words, the values of wind power for the first 6 hours, i.e., WP(t+1), ..., WP(t+6), are predicted at time t. When the observed value of wind power for hour t+1 becomes available, the data is updated and forecasts for the next 6 hours, i.e., WP(t+2), ..., WP(t+7), are produced at time t+1. In other words, the data is updated every hour, while the forecast horizon remains 6 hours ahead. In the following numerical experiment, the proposed method is applied to the aggregated wind power forecast of Alberta over 10 test weeks, namely the second weeks of March to December 2012, and the prediction accuracy of the different look-ahead forecasts within the 6-hour window is evaluated.
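The hourly-update evaluation described above can be sketched as a rolling loop that regroups the errors by look-ahead step; `forecast_fn` below is a hypothetical stand-in for the trained forecasting engine, not part of the paper:

```python
import numpy as np

def horizon_errors(wp, forecast_fn, horizon=6, start=100):
    """Roll through the series hour by hour: at each hour t, forecast
    WP(t+1)..WP(t+horizon) from the history up to t, then group the
    errors by look-ahead step k for per-horizon error evaluation.

    forecast_fn(history, horizon) -> array of `horizon` forecast values
    Returns {k: [errors of the k-hour-ahead forecasts]}.
    """
    errs = {k: [] for k in range(1, horizon + 1)}
    for t in range(start, len(wp) - horizon):
        f = forecast_fn(wp[:t], horizon)
        for k in range(1, horizon + 1):
            errs[k].append(wp[t + k - 1] - f[k - 1])
    return errs
```

The per-horizon nRMSE curves of Fig. 4 would then be obtained by applying Eq. (12) to each `errs[k]` separately.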
Fig. 4 demonstrates the results of this numerical experiment, including the curves of average nRMSE for the six look-ahead forecasts obtained from the proposed approach. We also present the same measures of accuracy for the third-party forecasts currently employed by the Alberta Electric System Operator [2]. Observe from this figure that the forecast accuracy of both models decreases as the look-ahead horizon increases, due to higher cumulative errors. For instance, the average forecast error of the proposed method for the 1-hour-ahead forecast, i.e., forecast values generated 1 hour before real time, is 5.04%, while for the 6-hour-ahead forecast, i.e., forecast values generated 6 hours before real time, it is 15.94%. Compared to the third-party forecasts, the proposed method results in better or comparable forecast accuracy up to 3 hours ahead. For instance, the proposed model improves the accuracy of the 1-hour-ahead forecast by (7.23-5.04)/7.23 = 30.29%. Note that at any settlement interval, the next three hours are particularly important to the system operator in Alberta. This is because, given the real-time nature of the market,
Figure 4: Average nRMSE (%) of the two forecasting models (third-party forecasts and the proposed method) at different look-ahead forecast hours (1 to 6)
the next hour or two need to be monitored properly to ensure system security and the supply-demand balance. Strategic market participants also need to watch the next three hours because that period is just outside the T-2 window. For the three longer look-ahead windows, the third-party forecasts outperform the ones generated by our model, mainly because the third-party forecasts include NWP data. Thus, a combination of the forecasts generated by our model for the first three hours and the third-party forecasts for the last three hours would provide a more accurate picture of future wind generation.
4. Conclusion
In this paper, a new wind power prediction strategy is proposed. A WNN with multi-dimensional Morlet wavelets as the activation functions of the hidden neurons and MCC as the training criterion is applied as the forecasting engine to implement the input/output mapping of the wind prediction process. A new stochastic search technique, named ICSA, an improved version of the Clonal selection algorithm, is proposed and adapted as the training procedure to optimize the free parameters of the forecasting engine. The effectiveness of the whole proposed wind power forecasting strategy, as well as that of its main components, including the suggested WNN and training procedure, is extensively evaluated on real-world wind power data. Regarding the training algorithm, the proposed ICSA outperforms the other stochastic search algorithms, including SA, PSO, DE and CSA, in terms of both nRMSE and nMAE, illustrating the effectiveness of the proposed training strategy. Moreover, the suggested Morlet wavelet function results in more accurate wind power forecasts than the Mexican hat wavelet function, and the MCC training criterion leads to lower wind power prediction errors than the traditional training error measure of MSE.
References

[1] www.wwindea.org.

[2] www.aeso.ca.

[3] Y. Xu, Z.-Y. Dong, Z. Xu, K. Meng, K. P. Wong, An intelligent dynamic security assessment framework for power systems with wind power, IEEE Transactions on Industrial Informatics 8 (2012) 995-1003.

[4] P. Hu, R. Karki, R. Billinton, Reliability evaluation of generating systems containing wind power and energy storage, IET Generation, Transmission & Distribution 3 (2009) 783-791.

[5] K. Rohrig, B. Lange, Improvement of the power system reliability by prediction of wind power generation, IEEE Power Engineering Society General Meeting (2007) 1-8.

[6] J. Cardell, L. Anderson, C. Y. Tee, The effect of wind and demand uncertainty on electricity prices and system performance, IEEE PES Transmission and Distribution Conference and Exposition (2010) 1-4.

[7] M. Black, G. Strbac, Value of bulk energy storage for managing wind power fluctuations, IEEE Transactions on Energy Conversion 22 (2007) 197-205.

[8] www.nyiso.com.

[9] N. Chen, Z. Qian, I. Nabney, X. Meng, Wind power forecasts using Gaussian processes and numerical weather prediction, IEEE Transactions on Power Systems (2013) 1-10.

[10] M. Khalid, A. Savkin, A method for short-term wind power prediction with multiple observation points, IEEE Transactions on Power Systems 27 (2012) 579-586.

[11] I. J. Ramirez-Rosado, L. A. Fernandez-Jimenez, C. Monteiro, J. Sousa, R. Bessa, Comparison of two new short-term wind-power forecasting systems, Renewable Energy 34 (2009) 1848-1854.

[12] R. G. Kavasseri, K. Seetharaman, Day-ahead wind speed forecasting using f-ARIMA models, Renewable Energy 34 (2009) 1388-1393.

[13] T. Barbounis, J. Theocharis, Locally recurrent neural networks for long-term wind speed and power prediction, Neurocomputing 69 (2006) 466-496.

[14] S. Fan, J. Liao, R. Yokoyama, L. Chen, W.-J. Lee, Forecasting the wind generation using a two-stage network based on meteorological information, IEEE Transactions on Energy Conversion 24 (2009) 474-482.

[15] G. Sideratos, N. Hatziargyriou, Using radial basis neural networks to estimate wind power production, IEEE Power Engineering Society General Meeting (2007) 1-7.

[16] N. Amjady, F. Keynia, H. Zareipour, A new hybrid iterative method for short-term wind speed forecasting, European Transactions on Electrical Power 21 (2011) 581-595.

[17] C. Potter, M. Negnevitsky, Very short-term wind forecasting for Tasmanian power generation, IEEE Transactions on Power Systems 21 (2006) 965-972.

[18] N. Amjady, F. Keynia, H. Zareipour, Wind power prediction by a new forecast engine composed of modified hybrid neural network and enhanced particle swarm optimization, IEEE Transactions on Sustainable Energy 2 (2011) 265-276.

[19] M. Milligan, M. Schwartz, Y. Wan, Statistical wind power forecasting models: results for U.S. wind farms, 2003. URL: http://www.nrel.gov/docs/fy03osti/33956.pdf.

[20] K. Methaprayoon, C. Yingvivatanapong, W.-J. Lee, J. Liao, An integration of ANN wind power estimation into unit commitment considering the forecasting uncertainty, IEEE Transactions on Industry Applications 43 (2007) 1441-1448.

[21] P. Johnson, M. Negnevitsky, K. Muttaqi, Short term wind power forecasting using adaptive neuro-fuzzy inference systems, Australasian Universities Power Engineering Conference, AUPEC 2007 (2007) 1-6.

[22] J. Catalão, H. M. I. Pousinho, V. Mendes, Hybrid intelligent approach for short-term wind power forecasting in Portugal, IET Renewable Power Generation 5 (2011) 251-257.

[23] R. R. B. de Aquino, M. M. S. Lira, J. de Oliveira, M. Carvalho, O. Neto, G. de Almeida, Application of wavelet and neural network models for wind speed and power generation forecasting in a Brazilian experimental wind park, International Joint Conference on Neural Networks, IJCNN 2009 (2009) 172-178.

[24] P. Chen, H. Chen, R. Ye, Chaotic wind speed series forecasting based on wavelet packet decomposition and support vector regression, IPEC 2010 Conference Proceedings (2010) 256-261.

[25] D. Faria, R. Castro, C. Philippart, A. Gusmao, Wavelets pre-filtering in wind speed prediction, International Conference on Power Engineering, Energy and Electrical Drives, POWERENG '09 (2009) 168-173.

[26] C. Monteiro, R. Bessa, V. Miranda, A. Botterud, J. Wang, G. Conzelmann, Wind power forecasting: State-of-the-art 2009. URL: http://www.dis.anl.gov/pubs/65613.pdf.

[27] G. Nason, Wavelet Methods in Statistics with R, Springer, 2008.

[28] L. J. Ricalde, G. Catzin, A. Y. Alanis, E. N. Sanchez, Higher order wavelet neural networks with Kalman learning for wind speed forecasting, IEEE Symposium on Computational Intelligence Applications in Smart Grid, CIASG 2011 (2011) 1-6.

[29] N. Pindoriya, S. Singh, S. Singh, An adaptive wavelet neural network-based energy price forecasting in electricity markets, IEEE Transactions on Power Systems 23 (2008) 1423-1432.

[30] L. Wu, M. Shahidehpour, A hybrid model for day-ahead price forecasting, IEEE Transactions on Power Systems 25 (2010) 1519-1530.

[31] L. de Castro, F. Von Zuben, Learning and optimization using the clonal selection principle, IEEE Transactions on Evolutionary Computation 6 (2002) 239-251.

[32] Q. Wang, C. Wang, X. Gao, A hybrid optimization algorithm based on clonal selection principle and particle swarm intelligence, Sixth International Conference on Intelligent Systems Design and Applications, ISDA '06, 2 (2006) 975-979.

[33] G. C. Liao, Application of an immune algorithm to the short-term unit commitment problem in power system operation, IEE Proceedings - Generation, Transmission and Distribution 153 (2006) 309-320.

[34] A. Slowik, M. Bialko, Training of artificial neural networks using differential evolution algorithm, 2008 Conference on Human System Interactions (2008) 60-65.

[35] R. Bessa, V. Miranda, J. Gama, Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting, IEEE Transactions on Power Systems 24 (2009) 1657-1666.

[36] W. Liu, P. P. Pokharel, J. C. Principe, Correntropy: Properties and applications in non-Gaussian signal processing, IEEE Transactions on Signal Processing 55 (2007) 5286-5298.