Wind Power Forecast Using Wavelet Neural Network Trained by Improved Clonal Selection Algorithm

Hamed Chitsaz (a,∗), Nima Amjady (b), Hamidreza Zareipour (a)

a) Schulich School of Engineering, University of Calgary, Alberta, Canada
b) Semnan University, Semnan, Iran

Abstract

With the integration of wind farms into electric power grids, accurate wind power prediction is becoming increasingly important for the operation of these power plants. In this paper, a new forecasting engine for wind power prediction is proposed. The proposed engine has the structure of a Wavelet Neural Network (WNN), with the activation functions of the hidden neurons constructed from multi-dimensional Morlet wavelets. This forecast engine is trained by a new improved Clonal selection algorithm, which optimizes the free parameters of the WNN for wind power prediction. Furthermore, the Maximum Correntropy Criterion (MCC) is utilized instead of the mean squared error as the error measure in the training phase of the forecasting model. The proposed wind power forecaster is tested with real-world hourly data of system-level wind power generation in Alberta, Canada. In order to demonstrate the efficiency of the proposed method, it is compared with several other wind power forecast techniques. The obtained results confirm the validity of the developed approach.

∗ Corresponding Author: Hamed Chitsaz; Email: [email protected]; Phone: +1-587-8996-8388; Fax: +1-403-282-6855

Preprint submitted to Elsevier, June 13, 2014

Keywords: Wind Power Forecasting, Wavelet Neural Network, Clonal Optimization

1. Introduction

In recent years, wind power has been the fastest growing renewable electricity generation technology in the world. The worldwide wind capacity reached approximately 300 GW by the end of June 2013, of which 13.9 GW were added in the first six months of 2013 [1]. In particular, Canada installed 377 MW during the first half of 2013, 50% more than in the same period of 2012 [1]. In the Province of Alberta, Canada, the installed capacity reached 1087 MW in late 2012 and is expected to grow to 2388 MW by 2016 [2]. Despite the environmental benefits of wind power, it has an intermittent nature, which can affect power system security [3] and reliability [4].

One approach to dealing with wind power intermittency on the operational time scale is to forecast it over an extended period of time. Accurate wind power forecasting can improve the economical and technical integration of large capacities of wind energy into the existing electricity grid. Wind forecasts are important for system operators to control the balancing, operation and safety of the grid [5]. On the other hand, wind power forecast errors might sometimes require system operators to re-dispatch the system in real time. The costs of re-dispatch affect electricity prices and system performance [6]. Moreover, reserve requirements are connected to wind forecast uncertainty [7]. Hence, reducing the costs of re-dispatch and the contribution of spinning reserves through more accurate wind power prediction can effectively increase system operation efficiency. For instance, the economic benefits of accurate wind forecasting were assessed by GE Energy for the New York State Energy Research and Development Authority (NYSERDA) and the NYISO. In that study, it was estimated that $125 million, or 36%, of the cost reduction is associated with state-of-the-art wind power forecasting, which is about 80% of the estimated cost reduction that could be achieved with a perfect wind power production forecast [8].

Accordingly, various approaches have been proposed in the literature to improve wind power forecasting accuracy. In one group of models, wind speed and other climate variables are predicted using Numerical Weather Prediction (NWP) models, and those forecasts are used to predict the wind power output of a wind turbine or a wind farm through turbine or farm production curves [9, 10]. In another group of models, the NWP forecasts, or self-generated climate variable forecasts, are fed into secondary time series models to predict wind power output at the turbine, farm or system level [11, 12, 13, 14, 15, 16, 17, 18]. These time series models may be built on statistical approaches [11, 12] or artificial intelligence techniques [13, 14, 15, 16, 17, 18]. In a third group of models, only past power production values are used in univariate models to predict future wind power values [19, 20, 21, 22, 23, 24, 25]. Despite improvements in wind power forecasting methods, wind power forecasts still suffer from relatively high errors, ranging from 8% to 22% (in terms of normalized mean squared error) depending on several factors, such as the forecasting horizon, the type of forecasting model, the size of the wind farm and its geographic location [26]. The contribution of the present paper is to propose a new technique for short-term wind power forecasting. In particular, we develop a wind power forecasting engine based on a Wavelet Neural Network (WNN), with multi-dimensional Morlet wavelets as the activation functions of the hidden neurons and the maximum correntropy criterion as the error measure of the training phase. We propose a stochastic search technique, an improved version of the Clonal selection algorithm, for training the forecasting engine. The significance of the proposed forecasting technique is that the combination of the WNN and the proposed training strategy is capable of capturing highly non-linear patterns in the data, resulting in improved forecast accuracy. In particular, the high exploitation capability of the proposed training strategy enables it to find more optimal solutions to the optimization problem of WNN training.

The remaining sections of the paper are organized as follows. In Section 2, after a brief literature review of wind power forecasting models, the architecture of the wind forecasting model is introduced in subsection 2.1. The proposed training strategy is then presented in subsection 2.2. The results of the proposed wind forecasting method, obtained for real-world test cases, are compared with the results of several other prediction approaches in Section 3. Section 4 concludes the paper.

2. Methodology

In this section, a literature review of existing wind power forecasting models, with a focus on artificial intelligence methods, is first provided. We then present the details of the proposed forecasting engine and its training strategy.

Artificial intelligence techniques, especially artificial neural networks, have been used in several papers to predict wind power generation. Recurrent neural networks [13], Radial Basis Function (RBF) neural networks [15], and Multi-Layer Perceptron (MLP) neural networks [20] have been proposed for wind power forecasting. Although neural networks can model nonlinear input/output mapping functions, a single neural network with traditional training mechanisms has limited learning capability and may not be able to correctly learn the complex behavior of the wind signal. To remedy this problem, combinations of neural networks with each other and with fuzzy inference systems, such as the Adaptive Neuro-Fuzzy Inference System (ANFIS) [17, 21] and the Hybrid Iterative Forecast Method (combining MLP neural networks) [16], have also been suggested for wind speed and power prediction. Another approach to tackling the complex behavior of wind power time series is the wavelet transform. In [27], it is discussed that wavelets can be used effectively for both stationary and non-stationary time series analysis, which is one of the reasons for their wide and diverse applications. In [22, 24, 25, 23], wind speed and power prediction approaches have been presented that use the wavelet transform as a preprocessor to decompose the wind speed/power time series, with ANFIS [22], Support Vector Regression (SVR) [24], ARMA [25] and Artificial Neural Network (ANN) [23] models as forecast engines. In [28], wavelets have been used in the form of a wavelet neural network (WNN) for wind speed prediction, with the WNN trained by an Extended Kalman Filter (EKF). In [18], a wind power prediction strategy combining a Modified Hybrid Neural Network (MHNN) and Enhanced Particle Swarm Optimization (EPSO) has been proposed.

[Figure: two panels plotting ψ(X) versus X over [-5, 5], one for the Mexican hat function and one for the Morlet function]

Figure 1: Mexican hat and Morlet wavelet functions

Briefly, the proposed forecasting technique is composed of a WNN structure, with Morlet wavelet functions as the activation functions in the hidden layer, and a new training strategy. These components are described next.

2.1. The Developed Wavelet Neural Network (WNN)

The wavelet transform has been used in some recent research works for wind forecasting as a preprocessor that decomposes the wind speed/power time series into a set of sub-series [22, 24, 25, 23]. The future values of the sub-series are predicted by ANFIS [22], SVR [24], ARMA [25] or ANN [23] models and then combined by the inverse WT to form the forecast value of wind power/speed. Another approach to utilizing wavelets in a forecast process is to construct a wavelet neural network, in which a wavelet function is used as the activation function of the hidden neurons of an ANN. For instance, WNNs with Mexican hat and Morlet wavelets, shown in Fig. 1, as the activation function of the hidden neurons have been applied to another application, i.e. price forecasting in electricity markets, in [29] and [36], respectively. Due to the local properties of wavelets and the ability to adapt the wavelet shape according to the training data set, instead of adapting the parameters of a fixed-shape activation function, WNNs offer higher generalization capability than classical feed-forward ANNs [29]. Recently, a WNN using the Mexican hat mother wavelet has been proposed for wind speed forecasting [28]. However, the Morlet wavelet has vanishing-mean oscillatory behavior with more diverse oscillations than the Mexican hat wavelet, as can be seen from Fig. 1, and so it can better localize the high-frequency components in the frequency domain, and the various changes in the time domain, of severely non-smooth time series such as wind power. In [30], it is mentioned that Morlet leads to a better electricity price forecast than Mexican hat. In this paper, we propose a WNN with multi-dimensional Morlet wavelets as the activation functions of the hidden neurons for wind power forecasting. In the proposed method, we implement a new training algorithm that can efficiently search for the globally optimal solution, whereas a simple gradient method is used as the training algorithm of the WNN in [30]. In addition, since the performance of these two activation functions was not compared in [30], we compare them in the numerical results in order to demonstrate the effectiveness of the Morlet wavelet over the Mexican hat in wind power forecasting.
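The contrast between the two mother wavelets is easy to verify numerically. The sketch below is an illustration, not part of the paper; the Morlet expression follows the definition used later in this section, while the Mexican hat formula is the standard unnormalized second derivative of a Gaussian. It counts sign changes of each wavelet over the interval plotted in Fig. 1:

```python
import numpy as np

def morlet(x):
    # Morlet mother wavelet used by the proposed WNN: psi(x) = exp(-0.5 x^2) cos(5x)
    return np.exp(-0.5 * x**2) * np.cos(5.0 * x)

def mexican_hat(x):
    # Standard (unnormalized) Mexican hat wavelet, the second derivative of a Gaussian
    return (1.0 - x**2) * np.exp(-0.5 * x**2)

x = np.linspace(-5.0, 5.0, 1001)
# Within the same support, Morlet crosses zero far more often than the Mexican hat,
# which is the "more diverse oscillations" property discussed above.
morlet_sign_changes = int(np.sum(np.diff(np.sign(morlet(x))) != 0))
hat_sign_changes = int(np.sum(np.diff(np.sign(mexican_hat(x))) != 0))
print(morlet_sign_changes, hat_sign_changes)
```

The many additional zero crossings of the Morlet wavelet are what let it localize high-frequency components better than the Mexican hat.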

Figure 2: Architecture of the proposed wind forecasting engine (WNN with Morlet wavelet)

The architecture of the WNN, shown in Fig. 2, is a three-layer feed-forward structure. In Fig. 2, X = [x_1, x_2, ..., x_m] is the input vector of the forecast process and y is the target variable. The inputs x_1, x_2, ..., x_m of the forecasting engine can be drawn from the past values of the target variable and from the past and forecast values of related exogenous variables. For instance, past values of wind power

along with the past and forecast values of wind speed, wind direction, temperature and humidity can be considered for wind power prediction, provided that their data is available [14]. A feature selection technique can be used to refine these candidate features and select the most effective inputs for the forecast process. In this research work, we use the feature selection method of [18]. This method is based on the information-theoretic criterion of mutual information and selects the most informative inputs for the forecast process by filtering out the irrelevant and redundant candidate features. However, since this method is not the focus of this paper, it is not further discussed here; the interested reader can refer to [18] for its details. Here, the target variable is the wind power of the next time interval, for which the forecasting engine presents a prediction. Multi-period forecasting, e.g. prediction of wind power for the subsequent forecast steps, is achieved via recursion, i.e. by feeding input variables with the forecaster's outputs. For instance, the forecasted wind power for the first hour is used as y(t − 1) for the wind power prediction of the second hour, provided that y(t − 1) is among the selected candidate inputs of the feature selection technique.
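The recursive multi-period scheme can be sketched as follows; `one_step_model` is a hypothetical stand-in for the trained forecaster, shown here as a naive trend rule for illustration only:

```python
def recursive_forecast(one_step_model, history, steps=6):
    """Multi-period forecast via recursion: each one-step-ahead prediction is
    appended to the history and reused as y(t-1) for the next step."""
    history = list(history)            # past wind power values, most recent last
    forecasts = []
    for _ in range(steps):
        y_hat = one_step_model(history)
        forecasts.append(y_hat)
        history.append(y_hat)          # feed the forecaster's output back in
    return forecasts

# Hypothetical one-step model: persistence plus the last observed trend.
toy_model = lambda h: h[-1] + (h[-1] - h[-2])
print(recursive_forecast(toy_model, [100.0, 110.0], steps=3))  # -> [120.0, 130.0, 140.0]
```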

The forecasting engine should construct the input/output mapping function X ⇒ y. The activation function of the input layer nodes is the identity function, i.e. I(x) = x. In other words, the input layer only propagates the inputs of the WNN to the next layers. The activation function of the hidden layer nodes of the WNN, i.e. the multi-dimensional Morlet wavelet, is constructed as follows:

F_i(x_1, x_2, ..., x_m) = \prod_{j=1}^{m} \psi_{a_i,b_i}(x_j), \quad \forall i = 1, 2, ..., n    (1)

\psi_{a_i,b_i}(x_j) = \psi\left(\frac{x_j - b_i}{a_i}\right)    (2)

where n indicates the number of hidden neurons of the WNN, and the one-dimensional Morlet wavelet ψ(.) is defined as follows:

\psi(x) = e^{-0.5x^2} \cos(5x)    (3)

In (1) and (2), ψ_{a_i,b_i} is the scaled and shifted version of ψ(.), with a_i and b_i as the scale and shift parameters, respectively. Each activation function F_i(.) has its own a_i and b_i. Based on (1), F_i(.) is an m-dimensional wavelet function of x_1, x_2, ..., x_m, constructed as the tensor product of the one-dimensional Morlet wavelets ψ_{a_i,b_i}. Finally, the output of the WNN, denoted by y in Fig. 2, is computed as follows:

y = \sum_{i=1}^{n} w_i F_i(x_1, x_2, ..., x_m) + \sum_{j=1}^{m} v_j x_j    (4)

where w_i is the weight between the ith hidden neuron and the output node, and v_j is the direct weight between the jth input and the output node. In other words, the output of the WNN is obtained from a combination of multi-dimensional wavelet functions, i.e. F_i(x_1, x_2, ..., x_m), as well as a combination of the inputs, i.e. x_j. Thus, the proposed WNN not only benefits from the capabilities of wavelet functions, such as their ability to capture cyclical behaviors, but can also capture trends of the signal. Based on the above formulation, the vector of the free parameters of the WNN, denoted by Z, is as follows:

Z = [v_1, ..., v_m, w_1, ..., w_n, a_1, ..., a_n, b_1, ..., b_n]    (5)

Therefore, the WNN has NP = m + 3n free parameters, which should be determined by the proposed training strategy.
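Under the above formulation, a forward pass of the WNN can be sketched as follows. This is a minimal NumPy illustration with arbitrary small m and n; in the paper the free parameters of Eq. (5) are found by the ICSA training strategy described next, not drawn at random:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6                      # number of inputs and hidden neurons (illustrative)

# Free parameters of Eq. (5): NP = m + 3n values in total
v = rng.uniform(-1.0, 1.0, m)    # direct input-to-output weights
w = rng.uniform(-1.0, 1.0, n)    # hidden-to-output weights
a = rng.uniform(0.5, 2.0, n)     # wavelet scale parameters
b = rng.uniform(-3.0, 3.0, n)    # wavelet shift parameters

def morlet(x):
    # One-dimensional Morlet wavelet, Eq. (3)
    return np.exp(-0.5 * x**2) * np.cos(5.0 * x)

def wnn_output(x):
    # Eqs. (1)-(2): hidden neuron i applies the tensor product of scaled and
    # shifted one-dimensional Morlet wavelets to the whole input vector.
    F = np.array([np.prod(morlet((x - b[i]) / a[i])) for i in range(n)])
    # Eq. (4): wavelet combination plus a direct linear term that captures trends
    return w @ F + v @ x

x = rng.uniform(0.0, 1.0, m)     # one hypothetical (normalized) input pattern
print(wnn_output(x))
```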

2.2. The Proposed Training Strategy

We propose a new training strategy to train the developed WNN. This strategy is based on an improved Clonal selection algorithm. In the following, the Clonal Selection Algorithm (CSA) is first briefly introduced. Then, the proposed Improved CSA (ICSA) is presented and adapted as the training strategy to optimize the free parameters of the WNN.

When an antigen (e.g., a virus) invades the human body, the biological immune system selects the antibodies that can effectively recognize and destroy the antigen. The selection mechanism of the immune system operates based on the affinity of the antibodies to the invading antigen. CSA is an efficient optimization method, inspired by this selection mechanism, proposed by De Castro and Von Zuben [31]. In recent years, this method has been successfully applied to optimization and pattern recognition problems [31, 32], and also to the unit commitment problem of power systems [33]. The operation of CSA can be summarized as the following step-by-step procedure [31]:

Step 1: Randomly produce the initial population of CSA within the allowable ranges. Each individual of the population, called an antibody in CSA, is a candidate solution for the optimization problem, comprising its decision variables, called genes in CSA [31]. Here, each antibody of the CSA population includes the NP free parameters of the WNN shown in (5). The number of antibodies in the population is denoted by N. The generation number g is set to zero (g = 0). In general, antibody k in generation g is as follows:

Z_k^g = [Z_{k,1}^g, Z_{k,2}^g, ..., Z_{k,NP}^g], \quad \forall k = 1, 2, ..., N    (6)

where the lth gene or decision variable Z_{k,l}^g (l = 1, ..., NP) can be one of v_1, ..., v_m or w_1, ..., w_n or a_1, ..., a_n or b_1, ..., b_n, as shown in (5). The four sets v_j, w_i, a_i and b_i of each individual are randomly initialized with uniform distributions over the intervals [-1, 1], [-1, 1], [0.5, 2] and [-3, 3], respectively.

Step 2: Determine the affinity of the antibodies with respect to the antigen. In optimization problems, there is usually no explicit antigen population to be recognized, but rather an objective function to be optimized (maximized or minimized). Thus, in optimization tasks, an antibody's affinity corresponds to the evaluation of the objective function for the given antibody [33]. Here, the proposed WNN is trained, i.e. its free parameters are optimized, by CSA. Thus, the training error of the WNN is considered as the objective function of CSA, which should be minimized. Since the training error of ANN-based forecasting engines is widely measured in terms of Mean Squared Error (MSE), it is considered as the measure of WNN training error for the present.

Step 3: Sort the antibodies based on their training error values in terms of MSE (i.e., the objective function values), such that the best antibody, with the lowest MSE, ranks first.

Step 4: Copy the selected antibodies based on their position in the sorted population:

nc_k = \mathrm{Round}\left(\frac{\beta N}{k}\right), \quad \forall k = 1, 2, ..., N    (7)

where nc_k is the number of antibodies copied from the kth antibody; the Round(.) function rounds its real argument up/down to the nearest integer value; and β is a constant coefficient that indicates the rate of copying. Thus, an antibody with lower MSE and higher rank (lower k) will be copied more times than those with higher MSE. At the end of this step, the total number of copied antibodies is NC:

NC = \sum_{k=1}^{N} \mathrm{Round}\left(\frac{\beta N}{k}\right)    (8)
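The copy operator of steps 3 and 4 can be sketched as follows, with illustrative values for N and β. Note that Python's built-in round() resolves .5 ties to the nearest even integer, whereas the paper only specifies rounding to the nearest integer:

```python
# Step 4 of CSA, Eqs. (7)-(8): the antibody ranked k-th (k = 1 is best) is
# copied nc_k = Round(beta * N / k) times, so better antibodies get more clones.
N = 10        # population size (illustrative)
beta = 1.0    # copy-rate coefficient (illustrative)

nc = [round(beta * N / k) for k in range(1, N + 1)]
NC = sum(nc)  # total number of copied antibodies, Eq. (8)
print(nc, NC)
```

The clone counts decrease with rank, which is exactly the selection pressure the algorithm relies on.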

The operation of this step is shown graphically in Fig. 3. As seen, the first antibody, with the highest rank, is copied nc_1 times, the second one nc_2 times, and so on.

Step 5: Mutate the NC antibodies produced in step 4 using the hypermutation operator [32].

Step 6: Determine the MSE value of each mutated antibody. Among the NC mutated and N original antibodies, select the NS antibodies (NS < N) with the lowest MSE values. These NS antibodies enter the next generation directly.

Step 7: Randomly generate N − NS new antibodies for the next generation. These randomly generated antibodies enhance the search diversity of CSA, and consequently give the algorithm a chance to escape from local optima.

Step 8: Increment the generation number (g = g + 1). If the termination criterion, such as a maximum number of generations, is satisfied, the algorithm terminates and the best antibody of the last generation, owning the lowest MSE, is returned as the final solution of CSA; otherwise, go back to step 2 and repeat the cycle. The termination criterion used for training the WNN will be described later.

In the above algorithm, N, NS, and β are user-defined settings of CSA. In this algorithm, antibodies evolve through mutation, which is the key operator of CSA. The proposed Improved CSA (ICSA) includes two enhancements for this operator, as follows. De Castro and Von Zuben [31] discuss that the mutation rate

Figure 3: Representation of step 4 (copy operator) of CSA

should be inversely proportional to the antigenic affinity: the higher the affinity, the smaller the mutation rate. Thus, in an optimization problem, more optimal candidate solutions should be mutated less. This general idea is modeled in the proposed ICSA by the following relation:

N_k^{Mut} = \mathrm{Round}\left[e^{-\rho \frac{MSE_{min}}{MSE_k}} \times NP\right], \quad \forall k = 1, ..., N    (9)

where MSE_k is the MSE of the kth antibody and MSE_min is the lowest MSE among the antibodies of the current population; N_k^{Mut} represents the number of genes of the antibodies copied from the kth antibody in step 4 that should be mutated (N_k^{Mut} ≤ NP); and the coefficient ρ controls the mutation rate, such that a higher ρ leads to lower values of N_k^{Mut}. Thus, fewer decision variables are mutated in antibodies with lower MSE (more optimal candidate solutions) and more decision variables are mutated in antibodies with higher MSE (less optimal candidate solutions); consequently, the individuals of the ICSA population are mutated in a coordinated manner. As a result, ICSA not only allows antibodies with lower MSE to be copied more than those with higher MSE, but also controls the mutation process of the copied antibodies in accordance with their MSE using (9). Note that although the mutation operation is applied to the NC copied antibodies, as shown in step 5, only N values of N_k^{Mut} need to be computed, since the antibodies copied from the same individual have the same N_k^{Mut}.

After determining N_k^{Mut} for the antibodies of ICSA, the set of genes to be mutated is randomly selected among the decision variables of each antibody. The hypermutation operator of step 5 adds a normal random variable with zero mean and constant variance to each decision variable of the mutating antibody [39]. Here, a more effective mutation operation, inspired by the Differential Evolution (DE) algorithm [34], is proposed for ICSA:

Z_{k,l}^{g+1} = \left[1 - e^{-\rho \frac{MSE_{min}}{MSE_k}}\right] Z_{k,l}^{g} + e^{-\rho \frac{MSE_{min}}{MSE_k}} (Z_{k1,l}^{g} - Z_{k2,l}^{g}),    (10)
1 ≤ k ≠ k1 ≠ k2 ≤ NC, 1 ≤ l ≤ NP

where Z_{k,l}^{g+1} and Z_{k,l}^{g} represent gene l of antibody k in two successive generations g and g + 1. As seen, to mutate the kth antibody, this mutation operator uses the values of the decision variables of two other antibodies (i.e., Z_{k1,l}^{g} and Z_{k2,l}^{g}), as in the DE mutation operator.
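Equations (9) and (10) together can be sketched as follows. This is a minimal NumPy illustration with arbitrary population values; gene and partner selection are assumed uniform, which the paper does not specify beyond "randomly selected":

```python
import numpy as np

rng = np.random.default_rng(1)
NC, NP, rho = 8, 10, 2.0              # copied antibodies, genes, mutation-rate coefficient

Z = rng.uniform(-1.0, 1.0, (NC, NP))  # copied population of generation g
mse = rng.uniform(0.1, 1.0, NC)       # training MSE associated with each antibody
mse_min = mse.min()

def mutate(k):
    """DE-inspired ICSA mutation of antibody k, Eqs. (9)-(10)."""
    e = np.exp(-rho * mse_min / mse[k])          # small for good (low-MSE) antibodies
    n_mut = round(e * NP)                        # Eq. (9): how many genes to mutate
    genes = rng.choice(NP, size=n_mut, replace=False)
    k1, k2 = rng.choice([i for i in range(NC) if i != k], size=2, replace=False)
    child = Z[k].copy()
    # Eq. (10): blend the current gene with a DE difference term; good antibodies
    # keep most of their genes (small steps), poor ones take large gradient steps.
    child[genes] = (1.0 - e) * Z[k, genes] + e * (Z[k1, genes] - Z[k2, genes])
    return child

child = mutate(0)
```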

In [34], it has been discussed that the DE mutation operator, by computing the difference between two randomly chosen individuals from the population (here, antibodies k1 and k2), determines a function gradient over a given area (not at a single point), and therefore prevents the solution from being trapped in a local optimum of the objective function. Moreover, to enhance the search diversity of the proposed mutation operation, it is applied separately to each gene of the mutating antibodies. Additionally, the exp(.) term of (9) and its complement 1 − exp(.) are used in (10). If the kth antibody Z_{k,l}^{g} is a good candidate solution, the exp(.) term and its complement become close to 0 and 1, respectively. Thus, the next-generation decision variables mainly take their values from the current-generation decision variables, with only small gradient terms added to them. Consequently, ICSA can search promising areas of the solution space with small steps, i.e. at high resolution, which is known as exploitation capability. Based on this capability, a stochastic search technique can extract optimal solutions from the promising areas of the solution space. The high exploitation capability of the proposed ICSA allows it to find more optimal solutions for the optimization problem of WNN training. In other words, the WNN trained by ICSA can better learn the complex input/output mapping function of the wind power forecast process and so predict its future values with higher accuracy. However, if the kth antibody is a poor candidate solution, the exp(.) term and its complement become close to 1 and 0, respectively, and so the next-generation decision variables mainly obtain their values from large gradient terms. Hence, ICSA can move out of non-promising areas of the solution space with large steps. Finally, note that due to the effect of the mutation operation of (10), even antibodies copied from the same antibody lead to different individuals after mutation. Thus, at the end of step 5 of the proposed ICSA, NC diverse candidate solutions are added to the N original antibodies (Fig. 3), which further enhances the search ability of ICSA.

The termination criterion is an important aspect of the proposed training strategy, as it can affect the performance of the proposed forecasting engine. A low number of ICSA generations may lead to insufficient training of the forecasting engine, so that the WNN cannot correctly learn the input/output mapping function of the forecast process, i.e. X ⇒ y. On the other hand, a large number of ICSA generations increases the computational burden and, more importantly, may lead to over-fitting of the WNN. When over-fitting occurs in a neural-network-based forecast engine, it memorizes the training samples instead of learning them; thus, while the neural network obtains a very low training error, its generalization capability (i.e. its ability to respond to unseen forecast samples) degrades. To avoid these problems, a termination criterion based on the cross-validation technique is used for the training phase of the proposed forecasting engine. In this technique, the whole gathered historical data is divided into training and validation samples. The WNN is trained by the proposed ICSA, which tries to minimize the MSE of the training samples in each generation. However, at the end of each generation (step 8), the WNN is tested on the unseen validation samples. For instance, the validation samples can be taken from the end of the historical data interval, i.e. the historical data closest to the forecasting horizon. When the MSE of the unseen validation samples, as a measure of the WNN's forecast error, begins to rise, the generalization capability of the WNN begins to degrade, and therefore the training phase should be terminated. Then, the best individual of the WNN in the generation leading to the minimum validation error, which is expected to yield the maximum generalization capability of the WNN, is selected as the final solution of the training phase, indicating the free parameters of the WNN.
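The stopping rule described above can be sketched as follows; `evolve` and `validation_mse` are hypothetical stand-ins for one ICSA generation and the WNN's validation-set MSE, and the `patience` parameter (how many non-improving generations to tolerate) is an added assumption beyond the paper's "begins to rise" criterion:

```python
def train_with_early_stopping(population, evolve, validation_mse,
                              max_generations=500, patience=1):
    """Cross-validation termination: keep evolving while validation MSE improves,
    and return the antibody from the generation with the minimum validation error."""
    best, best_val, stall = None, float("inf"), 0
    for _ in range(max_generations):
        population = evolve(population)              # steps 2-7 of ICSA
        candidate = min(population, key=validation_mse)
        val = validation_mse(candidate)
        if val < best_val:                           # generalization still improving
            best, best_val, stall = candidate, val, 0
        else:
            stall += 1                               # validation error began to rise
            if stall >= patience:
                break                                # stop before over-fitting worsens
    return best, best_val

# Toy usage: scalar "antibodies" drifting past the optimum at 3.
vmse = lambda p: (p - 3.0) ** 2
best, best_val = train_with_early_stopping(
    [0.0], lambda pop: [p + 1.0 for p in pop], vmse, patience=2)
print(best, best_val)  # -> 3.0 0.0
```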

Although MSE has been widely used in forecasting models as a training error measure, it is optimal for training a neural network only if the probability distribution of the prediction errors is Gaussian. However, wind power forecast error presents a non-Gaussian shape [35]. Minimizing the squared error is equivalent to minimizing the variance of the error distribution. Accordingly, the higher moments (e.g., skewness, kurtosis, etc.) are not captured, yet they contain information that should be passed to the free parameters of the neural network instead of remaining in the error distribution. Bessa et al. [35] proposed new training error criteria based on minimizing the information content of the error distribution (instead of minimizing its variance, as in MSE). The Maximum Correntropy Criterion is defined as follows:

MCC = \max\left\{\frac{1}{N_e} \sum_{i=1}^{N_e} G(\varepsilon_i, \sigma^2)\right\}    (11)

where G is the Gaussian kernel, N_e is the number of training samples, ε_i is the error of the ith training sample, and σ² is the variance. Therefore, MCC approximates a non-Gaussian shape by a summation of Gaussian functions corresponding to the errors of the training samples. For more detailed information on MCC, refer to [35]. It should be noted that MCC defines a maximization problem, while the proposed training algorithm of section 2.2, i.e. ICSA, was described as a minimization problem, since it was based on MSE. Hence, to adapt this criterion to the proposed training algorithm, minimization of 1/MCC or −MCC can be considered instead of maximization of MCC. It is noted that the value of correntropy is always positive [36].
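Equation (11) can be sketched as follows; the Gaussian kernel's normalizing constant is omitted here, an assumption of this sketch that does not change which parameter vector maximizes the criterion:

```python
import numpy as np

def correntropy(errors, sigma=1.0):
    # Eq. (11): MCC evaluates the mean Gaussian kernel of the training errors,
    # so one large error is penalized far less than under squared-error loss.
    g = np.exp(-np.asarray(errors) ** 2 / (2.0 * sigma ** 2))
    return g.mean()

errors = np.array([0.0, 0.1, -0.2, 1.5])   # hypothetical training errors
mcc = correntropy(errors)
loss = -mcc   # ICSA minimizes, so -MCC (or 1/MCC) serves as the training objective
print(mcc)
```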

3. Numerical results

In order to generate forecasts, two parameters must be decided first, namely, the forecast interval and the forecast horizon. The forecast interval determines the length of each time step into the future (e.g., 10 minutes or one hour), and the forecast horizon determines how many time steps into the future are of interest. Both factors depend on the application of the forecasts. For instance, if the forecasts are used for very short-term adjustments in operation schedules, the forecast horizon could be as short as a few hours. Furthermore, the forecast interval may also vary depending on the application. For example, while a unit commitment algorithm may consider hourly intervals, an economic dispatch algorithm may look into shorter intervals (e.g., 5 minutes). Note that in generating the forecasts, the choice of forecast interval is sometimes limited by the availability of the data. For example, in Alberta, the meteorological towers that measure weather factors at wind farm sites are set to collect data at 10-minute intervals. Thus, the forecast interval cannot be less than 10 minutes if the meteorological data are of interest in the models.

In this paper, we have selected as our test case the generation of hourly wind power production forecasts for Alberta's power system for up to 6 hours into the future. Alberta's electricity market is a real-time market with an hourly settlement interval. Although the system marginal price is determined every minute, the supply and demand offers must be submitted to the system operator for hourly intervals. The bids and offers may be changed up to two hours before the operation hour, which is referred to as the T-2 window. The majority of slower generators do not strategically adjust their prices and bid at $0/MWh. Faster units, on the other hand, actively watch the supply-demand balance in the market and adjust their strategies. These units normally act based on developments in the market in the short term, usually the next few hours. In particular, for any given operation hour, forecasts of wind power generated before the start of the T-2 window, when the market participants can still change their bids and offers, are of particular value. In addition, given the real-time nature of the market, and considering the fact that only short lead-time units behave strategically, the system operator is mainly concerned with the supply-demand balance for the next hour or two. Thus, while forecasts over longer horizons are important, those for short forecast horizons are particularly valuable for the system operator and market participants in Alberta.

The forecasts of meteorological variables, e.g., wind speed, through NWP models are known to be useful in wind power forecasting for longer forecast horizons [15]. Thus, since we are focusing on short-term prediction, we choose not to include such data in our model and generate forecasts solely based on past power production values. More specifically, 100 hourly lagged values of wind power are considered as the candidate inputs, which are processed by the feature selection technique to select a minimum subset of the most informative features for the proposed forecasting engine. Furthermore, the 60 days prior to each forecast day are considered as the historical data, divided into 59 days as the training set and the one day before the forecast day as the validation set. The second weeks of March, June, September, and December of 2012 are considered as test weeks. For each test week, the forecasts are updated in two different ways in this paper. In the first part of the numerical results, we update the forecasts every 6 hours, and thus 28 sets of forecasts are generated for each of the representative weeks. In this part, the error measures are evaluated based on the average of errors for the individual forecasts over the entire week. In the second part of the results, we test the model by updating the forecasts every hour, i.e., for every test hour, there will be six versions of forecasts produced at previous hours. For these forecasts, we evaluate the error measures for each forecast horizon, as further discussed later in this section.
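The candidate-input construction described above can be sketched as follows; `build_lagged_inputs` is a hypothetical helper name, and the feature selection step that prunes the 100 lags down to a small subset is not shown:

```python
import numpy as np

def build_lagged_inputs(wp, n_lags=100):
    """Build the candidate inputs WP(t-1), ..., WP(t-n_lags) for every
    target value WP(t) of an hourly wind power series."""
    wp = np.asarray(wp, dtype=float)
    # Column k-1 holds WP(t-k) for each target row.
    X = np.stack([wp[n_lags - k: len(wp) - k] for k in range(1, n_lags + 1)], axis=1)
    y = wp[n_lags:]
    return X, y

wp = np.arange(200.0)                      # toy hourly series
X, y = build_lagged_inputs(wp, n_lags=100)
# Row 0 targets t = 100: its first column is WP(t-1) = 99,
# and its last column is WP(t-100) = 0.
```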

As an example of input selection using the feature selection technique, the selected features for the third forecasting window of the September test week, i.e., the third 6-hour window of September 8, 2012, are as follows:

X = [x1, x2, ..., xm] = [WP(t−1), WP(t−2), WP(t−3), WP(t−4), WP(t−5), WP(t−6), WP(t−7), WP(t−8), WP(t−10), WP(t−11), WP(t−18), WP(t−19), WP(t−20), WP(t−21), WP(t−22), WP(t−23), WP(t−24), WP(t−25), WP(t−26), WP(t−28)]

These 20 features are selected from the 100 candidate inputs WP(t−1), WP(t−2), ..., WP(t−100).

Two error criteria are used in this paper to evaluate forecast errors: normalized Root Mean Square Error (nRMSE) and normalized Mean Absolute Error (nMAE), defined as follows:

\[
\text{nRMSE} = \sqrt{\frac{1}{N_H}\sum_{t=1}^{N_H}\left(\frac{WP_{ACT}(t)-WP_{FOR}(t)}{WP_{Cap}}\right)^{2}}\times 100 \tag{12}
\]

\[
\text{nMAE} = \frac{1}{N_H}\sum_{t=1}^{N_H}\left|\frac{WP_{ACT}(t)-WP_{FOR}(t)}{WP_{Cap}}\right|\times 100 \tag{13}
\]

where WP_ACT(t) and WP_FOR(t) indicate the actual and forecast values of wind power for hour t. Also, N_H indicates the number of hours, which is 168 for the test weeks, and WP_Cap is the total wind power capacity of the aggregated wind farms, which is 861, 941, 941 and 1087 MW in March, June, September and December of 2012, respectively, due to the growth of wind power capacity in Alberta.
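Equations (12) and (13) translate directly into code; a minimal sketch (the sample values are illustrative, with the 941 MW capacity taken from the figures above):

```python
import numpy as np

def nrmse(actual, forecast, capacity):
    """Normalized root mean square error in percent, Eq. (12);
    errors are normalized by the installed capacity WP_Cap."""
    e = (np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)) / capacity
    return float(np.sqrt(np.mean(e ** 2)) * 100.0)

def nmae(actual, forecast, capacity):
    """Normalized mean absolute error in percent, Eq. (13)."""
    e = (np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)) / capacity
    return float(np.mean(np.abs(e)) * 100.0)

# Illustrative hourly values (MW) against a 941 MW installed capacity.
act = [500.0, 600.0, 550.0]
fcst = [480.0, 630.0, 540.0]
err_rmse = nrmse(act, fcst, capacity=941.0)
err_mae = nmae(act, fcst, capacity=941.0)
```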

3.1. Numerical Results with 6-hour Updates

For these forecasts, at the end of each 6-hour window, when the wind power values of the 6 hours become available, the historical data is updated to perform the wind power prediction of the next 6 hours. Thus, each forecast horizon includes 6 forecast steps. However, one week, or 168 hours, is considered as the evaluation period for the error criteria nRMSE and nMAE, to better evaluate the performance of the method over a longer period.

The results obtained from the proposed forecasting method with the two training error criteria, i.e., MSE and MCC, are shown in Table 1. We have also generated forecasts with some other popular models, i.e., the persistence method and the RBF and MLP neural networks. For the sake of a fair comparison, all of these methods use the same historical data, except the persistence method, which does not require training samples. Observe from Table 1 that the WNN with MSE as the training measure outperforms the three other forecast methods. For instance, the average nRMSE and average nMAE of the WNN with MSE are (14.95−14.17)/14.95 = 5.2% and (10.71−10.26)/10.71 = 4.2% lower than those of the persistence method.

Moreover, as seen from the last two columns of Table 1, adopting MCC as the training error measure can significantly improve the accuracy of wind power prediction. For instance, the average nRMSE and average nMAE of the WNN with MCC are respectively 6.6% and 5.5% lower than those of the WNN with MSE, and 11.4% and 9.4% lower than those of the persistence method, which clearly shows the advantage of using the MCC error measure in the training phase of a wind power forecasting model. Note that while persistence forecasts are a useful benchmark, especially for one-step-ahead forecasts, they do not provide any information on variations and ramps in a multi-step-ahead forecasting setting. Thus, although their average errors are close to those of the proposed method, they contain no ramping information and are thus less useful.
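For reference, the persistence benchmark used in the comparison can be written in a couple of lines; it simply repeats the last observation across the horizon, which is exactly why it carries no ramp information (the history values below are illustrative):

```python
import numpy as np

def persistence_forecast(history, horizon=6):
    """Persistence benchmark: forecast every step of the horizon with the
    last observed wind power value. It needs no training samples, but a
    flat forecast conveys nothing about upcoming ramps."""
    return np.full(horizon, float(history[-1]))

fcst = persistence_forecast([310.0, 295.0, 288.0], horizon=6)
```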

Table 1: nRMSE (%) and nMAE (%) results for the four test weeks of year 2012

Test week   Persistence      RBF              MLP              WNN with MSE     WNN with MCC
            nRMSE   nMAE     nRMSE   nMAE     nRMSE   nMAE     nRMSE   nMAE     nRMSE   nMAE
Mar.        13.71   10.08    18.32   13.32    15.36   12.42    12.38    9.36    12.23    9.22
Jun.        15.14   10.79    14.57   10.45    15.62   11.56    14.99   10.64    12.48    9.64
Sep.        18.44   13.11    18.62   13.77    19.80   14.54    17.66   12.49    16.68   11.73
Dec.        12.49    8.84    14.11   10.24    12.32    9.02    11.65    8.53    11.58    8.22
Average     14.95   10.71    16.40   11.95    15.78   11.89    14.17   10.26    13.24    9.70

To illustrate the effectiveness of the Morlet wavelet function as the activation function of the WNN, it is compared in Table 2 with a WNN based on the Mexican hat wavelet function, similar to that used in [28] for wind forecasting. The improved performance of the proposed WNN can be seen from Table 2: the wind power forecast accuracy of the proposed WNN is better than that of the WNN with the Mexican hat wavelet in all test weeks.

Table 2: Comparison of the proposed WNN with Morlet wavelet and a WNN with Mexican hat wavelet

Test week   WNN with Mexican hat    WNN with Morlet
            nRMSE    nMAE           nRMSE    nMAE
Mar.        13.77    10.13          12.23     9.22
Jun.        13.23     9.80          12.48     9.64
Sep.        17.12    12.54          16.68    11.73
Dec.        12.22     8.71          11.58     8.22
Average     14.08    10.30          13.24     9.70

In the next numerical experiment, the effectiveness of the proposed training strategy is evaluated. For this purpose, the proposed ICSA is replaced with several other well-known stochastic search techniques, including Simulated Annealing (SA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and CSA, while the other parts of the suggested wind power forecasting engine, i.e., the WNN with Morlet wavelet function and the MCC training criterion, are kept unchanged. As seen from Table 3, the proposed ICSA leads to the lowest wind forecasting errors in all test weeks among all the stochastic search techniques, improving nRMSE over SA, PSO, DE and CSA by 20.7%, 25.1%, 20.8% and 9.8%, respectively, and nMAE by 22.4%, 25.0%, 22.5% and 9.4%, demonstrating the effectiveness of the proposed training strategy.

Table 3: Comparison of the proposed training strategy, i.e., ICSA, with SA, PSO, DE and CSA

Test week   SA              PSO             DE              CSA             ICSA
            nRMSE   nMAE    nRMSE   nMAE    nRMSE   nMAE    nRMSE   nMAE    nRMSE   nMAE
Mar.        19.95   14.73   18.33   14.18   17.51   13.39   13.79   10.49   12.23    9.22
Jun.        16.23   11.57   17.69   13.38   17.43   13.34   13.47   10.29   12.48    9.64
Sep.        16.96   13.53   20.70   14.37   17.94   13.08   18.29   12.80   16.68   11.73
Dec.        13.70   10.18   14.01    9.88   13.97   10.25   13.22    9.21   11.58    8.22
Average     16.71   12.50   17.68   12.95   16.72   12.52   14.69   10.70   13.24    9.70
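The relative improvements quoted above are plain error reductions computed over the Table 3 averages; a quick check (average nRMSE values copied from Table 3, with ICSA averaging 13.24%):

```python
def improvement(baseline, proposed):
    """Relative error reduction in percent: (baseline - proposed) / baseline."""
    return (baseline - proposed) / baseline * 100.0

# Average nRMSE values from Table 3, compared against ICSA's 13.24%.
gains = {name: improvement(avg, 13.24)
         for name, avg in [("SA", 16.71), ("PSO", 17.68), ("DE", 16.72), ("CSA", 14.69)]}
```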

3.2. Numerical Results with Hourly Updates

Prediction errors can also be calculated for each look-ahead forecast distinctly (i.e., for each forecast hour). In much of the academic literature, the average of the prediction errors from 1 hour ahead up to the last hour of the forecast horizon is reported. For instance, in 6-hour-ahead prediction, each forecasting window includes the 1-hour-ahead to 6-hour-ahead forecasts, and the error is the average of the prediction errors of these 6 forecast values, as done in subsection 3.1. However, in most commercial forecasting tools, the forecasts are updated after each forecast interval, and forecast accuracy is evaluated for each look-ahead interval individually. In other words, the values of wind power for the first 6 hours, i.e., WP(t+1), ..., WP(t+6), are predicted at time t. When the observed value of wind power for hour t+1 becomes available, the data is updated and the forecasts for the next 6 hours, i.e., WP(t+2), ..., WP(t+7), are produced at time t+1. That is, the data is updated every hour, while the forecast horizon remains 6 hours ahead. In the following numerical experiment, the proposed method is applied to aggregated wind power forecasting for Alberta over 10 test weeks, comprising the second weeks of March to December 2012, and the prediction accuracy of the different look-ahead forecasts within the 6-hour window is evaluated.
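This hourly-update evaluation can be sketched as a rolling-origin loop that pools errors by look-ahead step; `forecaster` stands for any callable mapping a history to a 6-step forecast, and the persistence closure below is only an illustrative stand-in for the actual model:

```python
import numpy as np

def mae_by_lookahead(series, forecaster, horizon=6):
    """At every origin t, forecast the next `horizon` hours, then average
    the absolute errors separately for each look-ahead step, so that
    1-hour-ahead accuracy is reported apart from 6-hour-ahead accuracy."""
    series = np.asarray(series, dtype=float)
    errs = {h: [] for h in range(1, horizon + 1)}
    for t in range(1, len(series) - horizon + 1):
        preds = forecaster(series[:t], horizon)   # forecasts for hours t, ..., t+horizon-1
        for h in range(1, horizon + 1):
            errs[h].append(abs(series[t + h - 1] - preds[h - 1]))
    return {h: float(np.mean(v)) for h, v in errs.items()}

# Persistence as an illustrative forecaster: errors grow with look-ahead,
# mirroring the behaviour reported for Fig. 4.
persist = lambda hist, H: np.full(H, hist[-1])
mae = mae_by_lookahead(np.sin(np.arange(60) / 3.0), persist, horizon=6)
```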

Figure 4: Average nRMSE errors of the two forecasting models at different look-ahead forecasts (nRMSE (%) versus look-ahead hour, for the third-party forecasts and the proposed method).

Fig. 4 demonstrates the results of this numerical experiment, including the curves of average nRMSE for the six look-ahead forecasts obtained from the proposed approach. We also present the same measures of accuracy for the third-party forecasts currently employed by the Alberta Electric System Operator [2]. Observe from this figure that the forecast accuracy of both models decreases as the look-ahead time increases, due to higher cumulative errors. For instance, the average forecast error of the proposed method for the 1-hour-ahead forecast, i.e., forecast values generated 1 hour before real time, is 5.04%, while for the 6-hour-ahead forecast, i.e., forecast values generated 6 hours before real time, it is 15.94%. Compared to the third-party forecasts, the proposed method results in better or comparable forecast accuracy up to 3 hours ahead. For instance, the proposed model improves the accuracy of the 1-hour-ahead forecast by (7.23−5.04)/7.23 = 30.29%. Note that at any settlement interval, the next three hours are particularly important to the system operator in Alberta. This is because, given the real-time nature of the market, the next hour or two need to be monitored properly to ensure system security and the supply-demand balance. Strategic market participants also need to watch the next three hours, because that window lies just outside the T-2 window. For the three longer look-ahead windows, the third-party forecasts outperform the ones generated by our model, mainly because the third-party forecasts include NWP data. Thus, a combination of the forecasts generated by our model for the first three hours and the third-party forecasts for the last three hours would provide a more accurate picture of future wind generation.

4. Conclusion

In this paper, a new wind power prediction strategy is proposed. A WNN with the multi-dimensional Morlet wavelet as the activation function of the hidden neurons and MCC as the training criterion is applied as the forecasting engine to implement the input/output mapping function of the wind prediction process. A new stochastic search technique, named ICSA, an improved version of the Clonal selection algorithm, is proposed and adopted as the training procedure to optimize the free parameters of the forecasting engine. The effectiveness of the whole proposed wind power forecasting strategy, as well as that of its main components, including the suggested WNN and training procedure, is extensively evaluated on real-world wind power data. As regards the training algorithm, the proposed ICSA outperforms the other stochastic search algorithms, including SA, PSO, DE and CSA, in terms of both nRMSE and nMAE, illustrating the effectiveness of the proposed training strategy. Moreover, the suggested Morlet wavelet function results in more accurate wind power forecasts than the Mexican hat wavelet function, and the MCC training criterion leads to lower wind power prediction errors than the traditional training error measure of MSE.

References

[1] www.wwindea.org.

[2] www.aeso.ca.

[3] Y. Xu, Z.-Y. Dong, Z. Xu, K. Meng, K. P. Wong, An intelligent dynamic security assessment framework for power systems with wind power, IEEE Transactions on Industrial Informatics 8 (2012) 995–1003.

[4] P. Hu, R. Karki, R. Billinton, Reliability evaluation of generating systems containing wind power and energy storage, Generation, Transmission & Distribution, IET 3 (2009) 783–791.

[5] K. Rohrig, B. Lange, Improvement of the power system reliability by prediction of wind power generation, Power Engineering Society General Meeting, 2007. IEEE (2007) 1–8.

[6] J. Cardell, L. Anderson, C. Y. Tee, The effect of wind and demand uncertainty on electricity prices and system performance, Transmission and Distribution Conference and Exposition, 2010 IEEE PES (2010) 1–4.

[7] M. Black, G. Strbac, Value of bulk energy storage for managing wind power fluctuations, IEEE Transactions on Energy Conversion 22 (2007) 197–205.

[8] www.nyiso.com.

[9] N. Chen, Z. Qian, I. Nabney, X. Meng, Wind power forecasts using Gaussian processes and numerical weather prediction, IEEE Transactions on Power Systems (2013) 1–10.

[10] M. Khalid, A. Savkin, A method for short-term wind power prediction with multiple observation points, IEEE Transactions on Power Systems 27 (2012) 579–586.

[11] I. J. Ramirez-Rosado, L. A. Fernandez-Jimenez, C. Monteiro, J. Sousa, R. Bessa, Comparison of two new short-term wind-power forecasting systems, Renewable Energy 34 (2009) 1848–1854.

[12] R. G. Kavasseri, K. Seetharaman, Day-ahead wind speed forecasting using f-ARIMA models, Renewable Energy 34 (2009) 1388–1393.

[13] T. Barbounis, J. Theocharis, Locally recurrent neural networks for long-term wind speed and power prediction, Neurocomputing 69 (2006) 466–496.

[14] S. Fan, J. Liao, R. Yokoyama, L. Chen, W.-J. Lee, Forecasting the wind generation using a two-stage network based on meteorological information, IEEE Transactions on Energy Conversion 24 (2009) 474–482.

[15] G. Sideratos, N. Hatziargyriou, Using radial basis neural networks to estimate wind power production, Power Engineering Society General Meeting, 2007. IEEE (2007) 1–7.

[16] N. Amjady, F. Keynia, H. Zareipour, A new hybrid iterative method for short-term wind speed forecasting, European Transactions on Electrical Power 21 (2011) 581–595.

[17] C. Potter, M. Negnevitsky, Very short-term wind forecasting for Tasmanian power generation, IEEE Transactions on Power Systems 21 (2006) 965–972.

[18] N. Amjady, F. Keynia, H. Zareipour, Wind power prediction by a new forecast engine composed of modified hybrid neural network and enhanced particle swarm optimization, IEEE Transactions on Sustainable Energy 2 (2011) 265–276.

[19] M. Milligan, M. Schwartz, Y. Wan, Statistical wind power forecasting models: results for U.S. wind farms, 2003. URL: http://www.nrel.gov/docs/fy03osti/33956.pdf.

[20] K. Methaprayoon, C. Yingvivatanapong, W.-J. Lee, J. Liao, An integration of ANN wind power estimation into unit commitment considering the forecasting uncertainty, IEEE Transactions on Industry Applications 43 (2007) 1441–1448.

[21] P. Johnson, M. Negnevitsky, K. Muttaqi, Short term wind power forecasting using adaptive neuro-fuzzy inference systems, Australasian Universities Power Engineering Conference, AUPEC 2007 (2007) 1–6.

[22] J. Catalão, H. M. I. Pousinho, V. Mendes, Hybrid intelligent approach for short-term wind power forecasting in Portugal, Renewable Power Generation, IET 5 (2011) 251–257.

[23] R. R. B. de Aquino, M. M. S. Lira, J. de Oliveira, M. Carvalho, O. Neto, G. de Almeida, Application of wavelet and neural network models for wind speed and power generation forecasting in a Brazilian experimental wind park, Neural Networks, 2009. IJCNN 2009. International Joint Conference on (2009) 172–178.

[24] P. Chen, H. Chen, R. Ye, Chaotic wind speed series forecasting based on wavelet packet decomposition and support vector regression, IPEC, 2010 Conference Proceedings (2010) 256–261.

[25] D. Faria, R. Castro, C. Philippart, A. Gusmao, Wavelets pre-filtering in wind speed prediction, Power Engineering, Energy and Electrical Drives, 2009. POWERENG '09. International Conference on (2009) 168–173.

[26] C. Monteiro, R. Bessa, V. Miranda, A. Botterud, J. Wang, G. Conzelmann, Wind power forecasting: State-of-the-art 2009. URL: http://www.dis.anl.gov/pubs/65613.pdf.

[27] G. Nason, Wavelet Methods in Statistics with R, Springer, 2008.

[28] L. J. Ricalde, G. Catzin, A. Y. Alanis, E. N. Sanchez, Higher order wavelet neural networks with Kalman learning for wind speed forecasting, Computational Intelligence Applications in Smart Grid (CIASG), 2011 IEEE Symposium on (2011) 1–6.

[29] N. Pindoriya, S. Singh, S. Singh, An adaptive wavelet neural network-based energy price forecasting in electricity markets, IEEE Transactions on Power Systems 23 (2008) 1423–1432.

[30] L. Wu, M. Shahidehpour, A hybrid model for day-ahead price forecasting, IEEE Transactions on Power Systems 25 (2010) 1519–1530.

[31] L. de Castro, F. Von Zuben, Learning and optimization using the clonal selection principle, IEEE Transactions on Evolutionary Computation 6 (2002) 239–251.

[32] Q. Wang, C. Wang, X. Gao, A hybrid optimization algorithm based on clonal selection principle and particle swarm intelligence, Intelligent Systems Design and Applications, 2006. ISDA '06. Sixth International Conference on 2 (2006) 975–979.

[33] G. C. Liao, Application of an immune algorithm to the short-term unit commitment problem in power system operation, Generation, Transmission and Distribution, IEE Proceedings 153 (2006) 309–320.

[34] A. Slowik, M. Bialko, Training of artificial neural networks using differential evolution algorithm, Human System Interactions, 2008 Conference on (2008) 60–65.

[35] R. Bessa, V. Miranda, J. Gama, Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting, IEEE Transactions on Power Systems 24 (2009) 1657–1666.

[36] W. Liu, P. P. Pokharel, J. C. Principe, Correntropy: Properties and applications in non-Gaussian signal processing, IEEE Transactions on Signal Processing 55 (2007) 5286–5298.