
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON POWER SYSTEMS


Regional Wind Power Forecasting Based on Smoothing Techniques, With Application to the Spanish Peninsular System

Miguel G. Lobo and Ismael Sánchez

Abstract—Efficient regional forecasting is a critical task for system operators and utilities that manage the generation of various wind farms spread over a region. This paper proposes an aggregate prediction method based on the search for similarities between the current wind speed predictions in a set of locations and historical wind speed predictions. The aggregate power prediction is constructed, using smoothing techniques, from the measures of aggregate power generated during moments of a historical dataset in which the wind speed predictions were similar to the current ones. The methodology is applied to the hourly wind power forecast for the Spanish peninsular system, and compared with the predictions obtained with SIPREOLICO, the wind power prediction tool used by the Spanish system operator, and also with the aggregate predictions provided by another forecasting agency. The proposed methodology shows considerably smaller prediction errors than the competitors.

Index Terms—Aggregate wind generation, local models, regional forecasting, smoothing methods, wind power prediction.

I. INTRODUCTION

In order to combat climate change and energy dependence on fossil fuels, new regulatory frameworks have been established which encourage renewable energy use. Among the renewable sources of energy, wind energy has experienced the highest growth because it is a mature and competitive technology. Notwithstanding, a drawback of wind energy is that it cannot be dispatched. Given the stochastic nature of wind, the integration of wind generation into the electrical system causes numerous difficulties, and predictions must be used to minimize them. At the wind farm level, wind power forecasting has become necessary to participate in energy markets or to schedule maintenance work. In addition, there may be special interest in knowing the aggregate production of a set of wind farms: market agents would be interested in aggregate generation in order to adopt market strategies that maximize joint profits [1].

Manuscript received July 20, 2011; revised August 09, 2011, November 30, 2011, and February 21, 2012; accepted February 21, 2012. The work of I. Sanchez was supported in part by the European Commission under the SafeWind project (ENK7-CT2008-213740) and the Anemos.plus project (ENK6-CT2006-038692) as well as MICINN grant SEJ2007-64500. Paper no. TPWRS-00680-2011. M. G. Lobo is with the Department of Electrical Engineering, Universidad Carlos III de Madrid, Madrid, Spain (e-mail: [email protected]). I. Sánchez is with the Department of Statistics, Universidad Carlos III de Madrid, Madrid, Spain (e-mail: [email protected]). Digital Object Identifier 10.1109/TPWRS.2012.2189418

At a higher aggregation level, the transmission system operator (TSO) needs to predict the aggregate energy that will be generated in a region in order to plan the generation of the conventional units and to estimate reserves [2]. The costs of these reserves will depend on the accuracy of the wind power forecasts [3].

During recent years, tremendous efforts have been made to develop tools focused on forecasting wind power generation [4]–[6], most of them at the single wind farm level. However, few references are found related to the prediction of wind power at an aggregate level, also known as regional forecasting [7].

There are several regional forecasting methods proposed in the literature. In all of the cases, the total wind power in the region is calculated by extrapolation or up-scaling from the wind power predictions of some reference wind farms [8]. Up-scaling is especially motivated when online measurements of all wind turbines or wind farms are not available. Some of these methods divide the region into sub-areas for which the prediction is calculated. Then, the total wind power prediction in the region is calculated by adding the predictions of the sub-areas. In [9] the region is divided into sub-areas where some reference wind farms are selected, the wind power prediction is calculated for the reference wind farms, and then the prediction is calculated for the sub-areas by up-scaling. In a similar way, in [10] the region is also divided into sub-areas with reference wind farms, but in this case a parallel branch of the model calculates the wind power prediction for the sub-areas directly using averaged wind speeds in the sub-areas. The method proposed in [11] shares the idea of selecting some reference wind farms, and a fuzzy neural network is used to relate the wind power at the selected wind farms with the total wind power in the region.
In [12] a principal component analysis is used to estimate the total wind power, obtaining results similar to other state-of-the-art prediction tools. One advantage of these aggregate forecasting models is that they do not need a great amount of information, since only some wind farms are used to predict the production of the whole region. This idea is based on the fact that generally there is not enough data available online for all the wind farms, and even when it is available, the computational load to calculate forecasts for all of them can be very high. In addition to this, the variability of the generated power with respect to the installed capacity is much lower at an aggregate level than at the single wind farm level. Since the correlation between prediction errors of distant wind farms is usually quite weak, the relative uncertainty of the regional prediction will be

0885-8950/$31.00 © 2012 IEEE


below the uncertainty level of single wind farms. The possibility of reducing the prediction error using an aggregate model is studied in [13]–[16]. It is also shown that predicting for only a small number of the total number of wind farms in the region is enough to estimate the total wind power in the region with a reduced error. In this regard, [17] studied the influence of grouping wind farms in order to minimize the prediction errors, and concluded that there is a limit on the reduction of errors by wind farm aggregation.

The rest of the paper is structured as follows. Section II introduces the basics of smoothing by local models. Section III explains the classic Kernel smoothing functions and the difficulties that arise when the problem has a high dimension. Section IV proposes alternative smoothing methods that separate the problem into measures of distance, selection, and weighting. Section V includes an evaluation of the proposed alternatives and a comparison with other competing models. Section VI concludes.

II. SMOOTHING BY LOCAL MODELS

Let $p_t$ be the total generation of wind power in the region during the hour $t$, and let $\hat{p}_{t+h|t}$ be the prediction of $p_{t+h}$ for the time horizon $h$, given the information available at hour $t$. Let $\mathbf{v}_{t+h|t}$ be the vector containing the magnitude of the wind speed forecasts for the hour of interest $t+h$ in $n$ geographical coordinates selected over the region. This paper proposes a method to predict $p_{t+h}$ directly from the wind speed prediction vector $\mathbf{v}_{t+h|t}$, without the need to compute the predictions for all the wind farms or to predict by up-scaling from a set of reference wind farms.

The only information required for the proposed method is a historical set of data pairs $(\mathbf{v}_i, p_i)$, $i = 1, \ldots, N$, with $\mathbf{v}_i$ the vector containing the magnitude of the wind speed forecasts for the selected coordinates, and $p_i$ the corresponding normalized per-unit (p.u.) wind power generation. Given a meteorological model, the historical vector $\mathbf{v}_i$ is given by the shortest lead time weather prediction for that moment $i$. This set of historical data gives us a kind of weather map for every moment of the past, and the total wind power registered in the region under each of those weather maps.

The main idea of the proposed method is to find similarities between the predicted weather map, given by $\mathbf{v}_{t+h|t}$, and the weather maps recorded in the historical dataset. With this information, the wind power prediction $\hat{p}_{t+h|t}$ is computed from the historical measures $p_i$, weighting them as a function of the similarity between $\mathbf{v}_{t+h|t}$ and $\mathbf{v}_i$. Once the measured power $p_{t+h}$ is available, the pair $(\mathbf{v}_{t+h}, p_{t+h})$ will become part of the historical dataset. Given the persistent nature of wind, a quick updating of the historical data will improve short-term forecasting. This procedure is, therefore, most efficient if total wind power production is available online, as is the case of the Spanish TSO.

In a general setting where too many coordinates were available, a subset of them could be selected by exploiting their spatial correlation structure. The impact of alternative coordinate selection procedures is an interesting issue that is currently being researched.

Given the complex relationship between the predicted wind speed in the selected coordinates and the total production of wind energy, it becomes very difficult to estimate a global parametric model to make predictions. Therefore, the proposed prediction algorithm is based on nonparametric methods for local modeling with memory-based learning [18], [19]. Every time there is a new target point, which in our case is the vector $\mathbf{v}_{t+h|t}$, these techniques fit a simple local model to the historical data surrounding it. There are alternative methods to build local models. A simple example is the weighted average, also known in the literature as Kernel Regression [20]–[22], where a weighted average of the historical observations in the surroundings of the new target point is fitted. A weighted average for regional wind power forecasting can be written as

$$\hat{p}_{t+h|t} = \frac{\sum_{i=1}^{N} s_i\, p_i}{\sum_{i=1}^{N} s_i} \qquad (1)$$

The smoothing coefficients $s_i$ are the weights given to each of the historical observations of aggregate wind power $p_i$. These coefficients depend on the similarity between the new wind speed forecasting vector $\mathbf{v}_{t+h|t}$ and each of the wind speed vectors $\mathbf{v}_i$ of the historical dataset. In this study we evaluate various alternatives to calculate these coefficients, and determine those that best suit the multivariate problem of regional wind power forecasting.

III. KERNEL SMOOTHING FUNCTIONS

The smoothing coefficients in (1) can be determined using classic univariate Kernel functions. Kernel smoothing starts with a selection window around the target point $\mathbf{v}_{t+h|t}$. The selection window in each dimension is $[v_{j,t+h|t} - b,\; v_{j,t+h|t} + b]$, where $b$ is known as the bandwidth, and $j = 1, \ldots, n$ are the dimensions of the problem, which in our case are the selected geographical coordinates. Given a bandwidth $b$, the relative distance of the historical wind speed data in each dimension is given by

$$u_{ij} = \frac{v_{ij} - v_{j,t+h|t}}{b} \qquad (2)$$

The Kernel function $K(u)$ is chosen so that the largest weights are given to the nearest observations, and it decays as the distance increases. Several alternative Kernel functions can be found in the literature [23]. For example, the triangular Kernel function is

$$K(u) = \begin{cases} 1 - |u|, & |u| \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

When the problem is multivariate, the Kernel function is $K(\mathbf{u}_i)$, with $\mathbf{u}_i = (u_{i1}, \ldots, u_{in})$. A multivariate structure can be, however, very complex. For this reason, a multivariate Kernel is usually built in a multiplicative way from univariate Kernels [23], and the weighting coefficients are calculated as

$$s_i = \prod_{j=1}^{n} K(u_{ij}) \qquad (4)$$
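As a minimal sketch of the weighted average (1) with the multiplicative triangular Kernel of (3) and (4), the following Python code may help. It assumes NumPy arrays, a single bandwidth shared by all coordinates, and a NaN return value when no historical point falls inside the window; the function names and these conventions are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def triangular_kernel(u):
    """Triangular kernel: 1 - |u| inside the window |u| <= 1, zero outside."""
    u = np.abs(u)
    return np.where(u <= 1.0, 1.0 - u, 0.0)

def product_kernel_weights(v_hist, v_new, bandwidth):
    """Multiplicative kernel: product of univariate kernels over the coordinates.

    v_hist has shape (N, n): N historical vectors, n coordinates.
    v_new has shape (n,): the new wind speed forecast vector.
    """
    u = (v_hist - v_new) / bandwidth  # relative distances, shape (N, n)
    return np.prod(triangular_kernel(u), axis=1)

def weighted_average(weights, p_hist):
    """Weighted average of historical powers; NaN if every weight is zero (assumption)."""
    total = weights.sum()
    return float(weights @ p_hist / total) if total > 0 else float("nan")

# Tiny illustrative dataset (made-up numbers): 3 historical vectors, 2 coordinates.
v_hist = np.array([[1.0, 1.0], [2.0, 2.0], [5.0, 5.0]])
p_hist = np.array([0.2, 0.4, 0.9])
s = product_kernel_weights(v_hist, np.array([1.0, 1.0]), bandwidth=2.0)
print(s, weighted_average(s, p_hist))
```

Because the weight is a product over coordinates, a single coordinate falling outside its window sets the whole weight of that historical vector to zero, so the number of usable observations can shrink quickly as the number of coordinates grows.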


This multiplicative structure usually fails when the dimension is very high. This can be the case of regional forecasting with a high number of coordinates. Since the forecasting problem to be solved is highly complex, multivariate, and nonlinear, it is of great interest to propose a different weighting method that adapts better to our problem.

IV. PROPOSED SMOOTHING METHODOLOGY

As an alternative to classic Kernel smoothing, we propose the use of a smoothing method where the three problems to solve in the smoothing are addressed independently: 1) the way in which the distances between wind speed vectors are measured; 2) the bandwidth that defines the selection window of data according to the distances; and 3) the functions that weight the data according to the distances and bandwidth.

A. Measure of Distances

The weighting coefficients in the local model (1) can be interpreted as a way to give more importance to the relevant data. The relevance is measured by the similarity between the new target point and the historical data. In our case it is quantified using a measure of distance $d_i$ between the new vector $\mathbf{v}_{t+h|t}$ and each of the historical vectors $\mathbf{v}_i$. In the following, different ways to determine the similarity between vectors are proposed.

1) Difference in Average Wind Speed: It is defined by

$$d_i = \left| \bar{v}_i - \bar{v}_{t+h|t} \right| \qquad (5)$$

where $\bar{v}_i$ is the average wind speed in the region in the historical instant $i$, computed as the mean of the wind speeds in the $n$ coordinates, and $\bar{v}_{t+h|t}$ is the average of the wind speed predictions in the region for the hour of interest $t+h$. This distance is clearly inefficient, since it ignores local differences. It is included here for the sake of completeness and as a reference.

2) Absolute Distance: This distance takes into account the wind speed difference at each coordinate. It is defined by

$$d_i = \sum_{j=1}^{n} c_j \left| v_{ij} - v_{j,t+h|t} \right| \qquad (6)$$

where $v_{ij}$ is the wind speed forecast in the location $j$ in the historical instant $i$, and $v_{j,t+h|t}$ is the current wind speed forecast for the location $j$ for the hour of interest $t+h$. The coefficient $c_j$ is a scale factor applied to the $j$th location. The scaling factors should be higher in those dimensions where more importance must be given to the differences between data, and should be lower in the dimensions where the variations are considered less important [18]. As scale factors we will use terms that compensate for the different orders of magnitude of wind speeds in each location. By doing so, we keep the distance (6) from being consistently dominated by a reduced set of locations. For example, the inverse of the average wind speed or the inverse of the standard deviation of the wind speed in each coordinate could be used. These scale factors could also be used to weight the coordinates proportionally to the installed capacity in the influence area of each coordinate. This approach would require more information than the approach followed here, and could be an area for future research.

3) Euclidean Distance: This distance penalizes large differences of wind speed prediction at any location. It is given by

$$d_i = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_{j,t+h|t} \right)^2} \qquad (7)$$

4) Cubic Speed Distance: It is known that the flow of energy or wind power that is available in the air moving through a wind turbine depends on the cube of the wind speed. Following this, a new distance based on the differences of the cubed wind speeds, $v_{ij}^{3} - v_{j,t+h|t}^{3}$, over the $n$ coordinates can be proposed (8).

5) Mahalanobis Distance: This distance is an extension of the Euclidean one, using a scale factor that takes into account the covariance of the wind speeds at the different coordinates of the region. By doing so, the contribution of highly correlated locations is reduced, in order to compensate for their redundant information content. The distance is given by

$$d_i = \sqrt{\left( \mathbf{v}_i - \mathbf{v}_{t+h|t} \right)^{\top} \boldsymbol{\Sigma}^{-1} \left( \mathbf{v}_i - \mathbf{v}_{t+h|t} \right)} \qquad (9)$$

where $\boldsymbol{\Sigma}$ is the covariance matrix computed with the historical set of wind speed vectors.

Fig. 1. Examples of distribution of Euclidean distances between wind speed vectors depending on the expected power level.

B. Selection of Nearest Data

When $\mathbf{v}_{t+h|t}$ is received, the distance between this vector and each historical vector $\mathbf{v}_i$, $i = 1, \ldots, N$, is computed. As a result, a vector of distances $\mathbf{d}$ with $N$ elements is obtained. The next step is to apply a criterion to select a number $K$ of the total $N$ historical vectors that are close enough to $\mathbf{v}_{t+h|t}$. Aggregate prediction (1) will be made using only these $K$ selected points. The number $K$ is determined by a bandwidth $b$ that acts as a threshold. Distances above this threshold lead to null coefficients $s_i$. Fig. 1 shows an example of three histograms and scatterplots of the vector of distances $\mathbf{d}$ (x-axis), for the Euclidean distance, versus the corresponding vector of measured regional powers (y-axis) for the dataset used below in Section V. The scatterplots correspond to moments



with different levels of expected power. The second row of Fig. 1 shows the histograms of $\mathbf{d}$. This figure shows that the concentration of data close to zero distance is not constant, and depends on the expected power level. Consequently, the number of historical data with low enough values of distance is not constant. The range of the distances is also variable and dependent on the expected power level. The interest is then to adapt the size of the bandwidth to the distribution of the data, as seen in Fig. 1. In the following, different selection criteria are proposed.

1) Data at a Distance Below a Fixed Threshold Value: This criterion selects the data at distances lower than some fixed bandwidth value $b$. This will result in a variable number of elements selected in each case, depending on the distribution of the data. Fig. 2 shows an example.

2) Fixed Percentage of Nearest Data: This criterion is an extension of the classic K-nearest neighbors (K-NN) for the case of a dataset of increasing length. In our case the dataset grows because the pair $(\mathbf{v}_{t+h}, p_{t+h})$ is incorporated into the dataset as soon as $p_{t+h}$ is available. This criterion selects the data corresponding to the $K$ closest historical wind speed vectors, with $K$ set as a percentage $q$ of the total historical data. In this case the size of the bandwidth is variable and depends on the concentration of nearby data in each case. The parameter to optimize is the percentage $q$.

3) Data Below a Distance Computed from a Percentage of the Range of Distances: This criterion selects the elements that are below a threshold $b$ that takes into account the minimum and the maximum distances obtained at each moment. The higher the range of distances, the larger the threshold $b$. This threshold is defined as

$$b = d_{\min} + \rho \left( d_{\max} - d_{\min} \right) \qquad (10)$$

where $d_{\min}$ and $d_{\max}$ are the minimum and the maximum, respectively, of the distances in $\mathbf{d}$, and $\rho$ is a percentage of the range of distances. This parameter is the one to be optimized using, for instance, some training data.

4) Data Below a Distance Computed from a Percentage of the Range of Distances up to the Median: As an alternative to the previous criterion, the range of distances is chosen from $d_{\min}$ to the median distance $d_{\mathrm{med}}$, instead of to $d_{\max}$; that is,

$$b = d_{\min} + \rho \left( d_{\mathrm{med}} - d_{\min} \right) \qquad (11)$$

This alternative is more robust, since it avoids the problem caused by distributions of distances with a large positive skewness (see the histograms in Fig. 1). In these cases, the range of distances will have a large variability, which could be translated to the predictions. This criterion implicitly takes into account the shape of the distribution of distances and the concentration of data at low distances. Thus, the larger the skewness, the smaller the median distance, and the smaller the bandwidth $b$, consistent with a high concentration of nearby data.

Fig. 2. Example of selection of data at a distance below a fixed threshold.

TABLE I COMPARISON OF KERNEL FUNCTIONS

C. Weighting Functions

The aim of smoothing by local models is to fit a simple model for wind power forecasting in the vicinity of a target point defined by the new wind speed prediction vector, taking into account the past observations of wind power. The goal of the weighting function in the local model (1) is to assign weights or coefficients to the selected historical data according to their relevance, given by their distances. A weighting function must have a maximum value at zero distance, and its value must decay as the distance grows. In addition, the weighting function is only applied to the $K$ selected historical observations, since they are the only data that are taken into account. A common weighting function is obtained by raising the distance to a negative exponent as

$$w_k = d_k^{-\alpha} \qquad (12)$$

where $k$ is the $k$th historical data point out of the $K$ selected, and $d_k$ is the distance between its wind vector $\mathbf{v}_k$ and the new wind vector $\mathbf{v}_{t+h|t}$. This function tends to infinity if any of the data points is too close to the target point to predict, which forces the prediction to take the value of that data point. This exact interpolation is not desirable, especially if the stored data are noisy [18]. Alternatively, a function that limits the weight at zero distance can be used. One example is the negative exponential function, defined by

$$w_k = e^{-\alpha d_k} \qquad (13)$$

The magnitude of the exponent determines how local the weighting is, so that the larger the exponent $\alpha$, the more localized the model is. It is the parameter to be optimized, and can be optimized with the data used to train the prediction model.

D. Weighting Functions With Adaptive Parameter

The shape of the distribution of distances and the concentration of nearby historical data depend on how similar the new vector of predicted wind speed is to the historical vectors. It is possible to take into account this concentration of nearby data, so that the decaying parameter $\alpha$ depends on the concentration. This data-driven parameter can be achieved by dividing, at every moment, the weighting parameter $\alpha$ by a parameter $c$ that is representative of the center of the distribution of distances, and therefore representative of the concentration of nearby data. The new parameter $\alpha / c$ would replace the parameter $\alpha$ in the proposed weighting functions (12) and (13), leading to the following expressions:

$$w_k = d_k^{-\alpha / c} \qquad (14)$$

$$w_k = e^{-(\alpha / c)\, d_k} \qquad (15)$$

The values of the average and the median of the distribution of distances depend on the shape of their distribution. They will take lower values when the concentration of nearby data is very high, and larger values with low concentration of nearby data. Therefore, both are good candidates for $c$ to make the weighting parameter more adaptive. This criterion achieves an adaptive weighting scheme that depends on the distribution of the data at every moment, while using a fixed weighting parameter $\alpha$ that has to be optimized.
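The three separate steps (distance, selection, weighting) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the scaled absolute distance in the spirit of (6) with the inverse of the average wind speed as scale factor, the fixed-percentage selection, and inverse distance weights whose exponent is divided by the median of all distances, in the spirit of (14). The function names, the small `eps` guard against zero distances, the rule of keeping at least one point, and the choice of taking the median over all (not only the selected) distances are assumptions made for this example.

```python
import numpy as np

def absolute_distance(v_hist, v_new, scale):
    """Scaled absolute distance: sum over coordinates of c_j * |v_ij - v_j|."""
    return np.sum(scale * np.abs(v_hist - v_new), axis=1)

def select_fixed_percentage(d, q):
    """Indices of the K nearest historical points, with K = q percent of the history."""
    k = max(1, int(round(len(d) * q / 100.0)))  # keep at least one point (assumption)
    return np.argsort(d)[:k]

def adaptive_inverse_distance_weights(d_sel, center, alpha, eps=1e-9):
    """Weights proportional to d ** (-alpha / center)."""
    exponent = alpha / max(center, eps)
    log_w = -exponent * np.log(np.maximum(d_sel, eps))  # eps guards d = 0 (assumption)
    return np.exp(log_w - log_w.max())  # rescaled so the largest weight equals 1

def predict_regional_power(v_hist, p_hist, v_new, q=10.0, alpha=1.0):
    """Weighted average of the selected historical powers."""
    scale = 1.0 / v_hist.mean(axis=0)  # inverse of the average wind speed per coordinate
    d = absolute_distance(v_hist, v_new, scale)
    idx = select_fixed_percentage(d, q)
    w = adaptive_inverse_distance_weights(d[idx], np.median(d), alpha)
    return float(w @ p_hist[idx] / w.sum())

# Made-up example: 4 historical vectors over 2 coordinates.
v_hist = np.array([[4.0, 6.0], [5.0, 5.0], [10.0, 12.0], [9.0, 11.0]])
p_hist = np.array([0.20, 0.25, 0.80, 0.70])
print(predict_regional_power(v_hist, p_hist, np.array([4.5, 5.5]), q=50.0))
```

The weights are computed in log space and rescaled before use. Since the weighted average is invariant to a common factor in the weights, this leaves the prediction unchanged while avoiding numerical overflow when a distance is very small.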

E. Age Weighting Functions

The relationship between the wind speed at the geographical coordinates of the region and the total wind power varies over time, either because of changes in the number of installed wind farms or their characteristics, or because of seasonal variations in the relationship between wind power and wind speed, e.g., due to changes in air density. Therefore, more importance must be given to recent observations than to old observations. To do this, an age-weighting function is used, which decreases the weight of the observations as they become older. This age-weighting function is especially useful for very short-term predictions, i.e., a few hours ahead, since the recent measures are particularly relevant. The proposed weighting function is

$$a_k = \lambda^{\tau_k} \qquad (16)$$

where $a_k$ is the age-weighting coefficient, $\tau_k$ is the age in hours of the selected historical element $k$, and $\lambda$, with $0 < \lambda \le 1$, is the forgetting factor to be optimized. Finally, the weights to be used in the local model (1), with values different from zero only for the $K$ selected data, will be the product of the distance weights and age weights as

$$s_k = w_k\, a_k \qquad (17)$$

V. EVALUATION AND RESULTS

In the following, we present an evaluation of the different approaches proposed for calculating the smoothing coefficients to be used with the local model (1). The evaluation will check the error of the hourly aggregate wind power forecasts obtained with these alternative approaches for horizons $h$ of up to 48 h. Finally, the wind power prediction error obtained with the proposed regional forecasting model will be compared with the error of SIPREOLICO, the wind power prediction tool used by the Spanish TSO, as well as with another wind power forecasting model that provides aggregate predictions to the Spanish TSO. We will use two commonly used error measures to evaluate the predictions: the NMAE and the NRMSE, defined by

$$\mathrm{NMAE}_h = \frac{1}{T} \sum_{t=1}^{T} \frac{\left| \hat{p}_{t+h|t} - p_{t+h} \right|}{C_{t+h}} \qquad (18)$$

$$\mathrm{NRMSE}_h = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( \frac{\hat{p}_{t+h|t} - p_{t+h}}{C_{t+h}} \right)^{2}} \qquad (19)$$

where $\hat{p}_{t+h|t}$ is the wind power predicted for horizon $h$ at time $t$, $p_{t+h}$ is the corresponding wind power measure, $C_{t+h}$ is the total wind power capacity in the region at time $t+h$, and $T$ is the sample size.

A. Evaluation of Smoothing Methods
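As a minimal sketch of the two error measures (18) and (19), the following Python functions normalize each error by the installed capacity at the corresponding hour before averaging. The function names and the array-based interface are assumptions made for illustration; the numbers in the example are made up.

```python
import numpy as np

def nmae(p_pred, p_meas, capacity):
    """Normalized mean absolute error: mean of |error| / capacity."""
    err = (np.asarray(p_pred, dtype=float) - np.asarray(p_meas, dtype=float))
    return float(np.mean(np.abs(err / np.asarray(capacity, dtype=float))))

def nrmse(p_pred, p_meas, capacity):
    """Normalized root mean squared error: sqrt of mean of (error / capacity)^2."""
    err = (np.asarray(p_pred, dtype=float) - np.asarray(p_meas, dtype=float))
    return float(np.sqrt(np.mean((err / np.asarray(capacity, dtype=float)) ** 2)))

# Made-up example: two hourly forecasts in MW with a 1000 MW installed capacity.
p_pred = [100.0, 200.0]
p_meas = [110.0, 180.0]
capacity = [1000.0, 1000.0]
print(nmae(p_pred, p_meas, capacity), nrmse(p_pred, p_meas, capacity))
```

Passing the capacity as an array rather than a constant allows the normalization to follow a capacity that changes over the evaluation period.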

The evaluation of the smoothing methods has been made with meteorological predictions and hourly wind power production for the total Spanish peninsular system using data from the year 2007. The average wind power production over the period, standardized by the installed capacity, was 0.241 p.u., with a standard deviation of 0.136 p.u. The wind data used are the hourly wind speed forecasts for 83 coordinates across the Spanish peninsula, which were selected near some wind farm sites. These coordinates are the ones used by the Spanish TSO for wind power prediction. They reflect the spatial distribution of the wind power capacity, with a higher concentration in the regions of Galicia, Navarra, Aragón, and Castilla-La Mancha. The average wind speed prediction in the coordinates was 4.07 m/s, with a standard deviation between coordinates of 0.93 m/s. The weather forecasts are updated every 12 h, whereas the power measures are updated every hour. The model runs every time there is new data available. In our case, it runs every hour, with the arrival of a new wind power measure.

As initial historical data, the first half of the year is used. The second half is used to estimate parameters such as bandwidths, forgetting factors, parameters $q$ and $\rho$ of the methods to select the nearest data, and parameter $\alpha$ of the weighting function. These parameters are estimated for each horizon by searching for the values that minimize the NMAE or NRMSE. The alternative smoothing methods are also evaluated in this second half of the year. After each calculation, the measured power and the wind speed forecasts for that hour become part of the historical record of data pairs $(\mathbf{v}_i, p_i)$, so that the historical record of observations increases with time. In a real situation, some care should be taken to avoid an excessively large dataset (several years), which would require too many computer resources.

Tables I–V show a summary of the NMAE and NRMSE obtained with the different smoothing alternatives. These error values are computed as the average for horizons between 8 and 48 h. Shorter prediction horizons are not considered because their error values are very small for any smoothing method, and hence they have a low discriminating ability to select the best smoothing method. More detailed results of this evaluation of smoothing methods applied to regional wind power forecasting can be found in [24].

Regarding Kernel smoothing, the multivariate Kernel (4) has been used in the computation of the weights in (1) using alternative Kernel functions [23]: triangular, Epanechnikov, quartic, cosine, and Gaussian. The Kernel function that provides the best results is the triangular one (Table I).
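The evaluation procedure described above, with an initial history, a memory that grows after every forecast, and parameters chosen by minimizing the error, can be sketched as a walk-forward loop. The following Python code is a deliberately simplified illustration and not the procedure actually used in the paper: it ignores the forecast horizon (each hour is predicted from the pairs of all previous hours), uses the unscaled absolute distance with a fixed percentage of nearest data, and multiplies equal distance weights by an age weight of the form `lam ** age`, which is an assumed form of the forgetting factor. Powers are assumed to be already in p.u., so the mean absolute error is directly a normalized error. All names are hypothetical.

```python
import numpy as np

def local_forecast(v_hist, p_hist, ages, v_new, q, lam=1.0):
    """Age-weighted average of the q percent nearest historical powers."""
    d = np.sum(np.abs(v_hist - v_new), axis=1)
    k = max(1, int(round(len(d) * q / 100.0)))  # keep at least one point (assumption)
    nearest = np.argsort(d)[:k]
    rel_age = ages[nearest] - ages[nearest].min()
    w = np.exp(rel_age * np.log(lam))  # proportional to lam ** age, safe against underflow
    return float(w @ p_hist[nearest] / w.sum())

def walk_forward_nmae(v, p, n_init, q, lam=1.0):
    """Mean absolute error (p.u.) when the history grows by one pair per hour."""
    errors = []
    for t in range(n_init, len(p)):
        ages = t - np.arange(t)  # age in hours of each stored pair at time t
        pred = local_forecast(v[:t], p[:t], ages, v[t], q, lam)
        errors.append(abs(pred - p[t]))
    return float(np.mean(errors))

def grid_search(v, p, n_init, q_grid, lam_grid):
    """Return the (q, lam) pair with the smallest walk-forward error, plus all scores."""
    scores = {(q, lam): walk_forward_nmae(v, p, n_init, q, lam)
              for q in q_grid for lam in lam_grid}
    return min(scores, key=scores.get), scores

# Made-up example: one coordinate, six hours, the first three used as initial history.
v = np.array([[1.0], [2.0], [3.0], [1.0], [2.0], [3.0]])
p = np.array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3])
best, scores = grid_search(v, p, n_init=3, q_grid=[1, 100], lam_grid=[1.0, 0.5])
print(best, scores[best])
```

A fuller version would run one such search per horizon and would only add a pair to the history once its power measure is actually available.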



TABLE II COMPARISON OF MEASURES OF DISTANCE

Note: The selection method used to compare the measures of distance is a fixed percentage, and a weight of 1 is given to all the selected data.

TABLE III USE OF SCALE FACTORS IN THE MEASURES OF DISTANCE

Note: Results are based on the absolute distance, the selection is a fixed percentage, and a weight of 1 is given to all the selected data. The column “Average” uses the inverse of the average wind speed of each coordinate. The column “Standard dev.” uses the inverse of the standard deviation of the wind speed.

TABLE IV COMPARISON OF SELECTION METHODS

Note: The measure of distance used to compare the selection methods is the absolute distance, and a weight of 1 is given to all the selected data.

TABLE V COMPARISON OF WEIGHTING FUNCTIONS

Note: The measure of distance used to compare the weighting functions is the absolute distance with scale factor, and the selection is a fixed percentage of data. Alternatives for the correction of the parameter $\alpha$ with the center of the distributions are shown.

Fig. 3. Comparison of smoothing methods for estimating the weighting coefficients using a triangular Kernel function and the proposed method.

Fig. 4. Improvement of NMAE and NRMSE using age weighting functions.

Regarding the proposed method to calculate the weighting coefficients, which separates the processes of distance measurement, data selection, and weighting function, it has been found that the measure of distance that provides the best results is the absolute distance (6), as shown in Table II. These results improve if scale factors are used in (6), as seen in Table III, both using the inverse of the average wind speed of each coordinate and using the inverse of the standard deviation of the wind speed. Table IV shows that similar errors are obtained with two of the proposed selection methods: the selection of a fixed percentage of data, and the selection using expression (11). Given these similar results, and for simplicity, our model will use the selection of a fixed percentage of nearby data. Finally, Table V shows that the distance weighting function that gives the best results is the inverse distance weighting (14), when the parameter $\alpha$ is corrected with the median of the distance distribution at every moment. (See the footnote of each table for details on the computation.)

The use of classic Kernel smoothing functions for this multivariate problem is compared with the proposed method of splitting the smoothing problem into measures of distance, selection, and weighting. The triangular Kernel function, which provided the best results among the Kernel functions, is compared with our proposed method, which uses the absolute distance with the inverse of the average wind speed in each coordinate as scale factors, the selection of a fixed percentage of nearby data, and the weighting function given by (14) with the median to correct the parameter $\alpha$. Fig. 3 shows the NRMSE for both weighting alternatives, using in each case the values of parameters that minimize the prediction error for each horizon. The prediction error decreases for all horizons using the proposed method.

Regarding the influence of including a weighting by age, the weighting function (16) has been applied to the proposed smoothing method after adjusting the forgetting factor that minimizes the prediction error. Fig. 4 shows the reduction of error compared to the same model without this age weighting. The results show that the error reduction is greater for short-term horizons, leading to a reduction of up to 6%.

B. Comparison With Other Wind Power Prediction Models

As a final evaluation, the proposed regional wind power prediction model is compared with the wind power prediction tool SIPREOLICO [25], [26]. This tool, used by the Spanish TSO, makes predictions for every wind farm in the Spanish peninsula, and then computes the aggregate prediction as the sum of


individual wind farms. The comparison also includes the aggregate wind power predictions that another agency provides to the Spanish system operator. This agency updates the wind power forecasts every 12 h. The proposed model uses the wind speed predictions provided by two independent meteorological agencies. The first agency updates the weather forecasts every 12 h, and the second agency updates the forecasts every 6 h. These two sets of wind speed predictions are the same predictions used by SIPREOLICO as inputs to calculate the wind power predictions for the wind farms. Since there are two wind speed forecasting agencies, and therefore two alternative wind power predictions, the final prediction is computed as a combination of both, following the same combination procedure used by SIPREOLICO, and described in [27]. The parameters of the proposed model have been estimated using data from the year 2007 and searching for the optimal values as mention above: using the first half of the year as initial dataset and the second half to find the values of the parameters that minimize the prediction errors of each horizon. The forecasts used in the comparison of the models have been calculated using data from the year 2008, which is outside the training period used to estimate the parameters of the model. The installed capacity by the beginning of year 2007 was approximately 12 GW, whereas by the end of year 2008 it increased up to 16 GW, which is an increase of more than 30%. This variation of the installed capacity shows the ability of the proposed procedure to adapt to changing conditions. In both periods, the distribution of the wind power production and that of the wind speed in the coordinates were similar, with small differences in some coordinates. This result corroborates the idea that one year of training data can be enough as a starting data set for regional forecasting. Fig. 5 shows the NRMSE of the prediction models. 
The prediction error of the final combination of the proposed model is significantly lower than the error obtained by SIPREOLICO (also based on forecast combination), and is also lower than the error obtained by the other agency that provides aggregate wind power predictions. The other agency updates its wind power forecasts only every 12 h. This low updating frequency can explain its poor results for short prediction horizons, and illustrates the importance of using recent power measures to make efficient forecasts. The error reduction of the proposed model with respect to the predictions provided by SIPREOLICO is between 4% and 8%, depending on the prediction horizon. It is important to note that the proposed method is very competitive, despite the large dimension of the smoothing problem (83 coordinates).

Regarding the computing time, the proposed model, which calculates direct aggregate wind power predictions, needs only 2 to 3 s to compute the predictions for the next 48 h when the training data set covers one year. SIPREOLICO, in contrast, takes nearly 20 min, since it must calculate the predictions for the more than 600 wind farms installed in the Spanish peninsula and then add them up to obtain the aggregate prediction.
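As an illustration of the three-step smoothing evaluated in this section (distance, selection, weighting, plus a weighting by age), the following Python sketch computes one aggregate prediction from a historical data set. It is a minimal sketch under stated assumptions, not the implementation used in the study: the function name `local_prediction` and the parameters `fraction` and `forgetting` are hypothetical, and the exponential distance weight scaled by the median distance and the geometric age weight are stand-ins chosen for illustration, since the weighting functions (14) and (16) are not reproduced in this section.

```python
from math import exp
from statistics import mean, median


def local_prediction(history, current, fraction=0.1, forgetting=1.0):
    """Predict aggregate power from similar historical situations.

    history: list of (wind_speed_forecasts, aggregate_power), oldest first.
    current: wind speed forecasts for the moment to predict.
    """
    n = len(history)
    # Scale factors: inverse of the average forecast wind speed per coordinate
    # (assumes every average is strictly positive).
    scales = [1.0 / mean(x[j] for x, _ in history) for j in range(len(current))]

    # Step 1: scaled absolute distance from each record to the current forecasts.
    dists = [
        sum(s * abs(a - b) for s, a, b in zip(scales, x, current))
        for x, _ in history
    ]

    # Step 2: keep a fixed percentage of the nearest records (at least one).
    k = max(1, round(fraction * n))
    nearest = sorted(range(n), key=dists.__getitem__)[:k]

    # Step 3: weight by distance (ASSUMED exponential form, scaled by the
    # median selected distance) and by age (ASSUMED geometric forgetting).
    med = median(dists[i] for i in nearest)
    weights = []
    for i in nearest:
        by_distance = exp(-dists[i] / med) if med > 0 else 1.0
        by_age = forgetting ** (n - 1 - i)  # newest record has age 0
        weights.append(by_distance * by_age)

    # Weighted average of the aggregate power measured in the selected records.
    total = sum(weights)
    return sum(w * history[i][1] for w, i in zip(weights, nearest)) / total


# Toy data: one coordinate, four historical records (values are made up).
history = [([5.0], 100.0), ([10.0], 400.0), ([5.2], 110.0), ([15.0], 900.0)]
prediction = local_prediction(history, [5.1], fraction=0.5)
```

With `forgetting` below 1, recent records dominate the weighted average, which mirrors the observation that age weighting helps most at short horizons. The prediction is a direct weighted average of aggregate power, so no wind farm level step is needed.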


Fig. 5. Comparison of NRMSE of aggregate wind power predictions during the year 2008 calculated with the proposed model, SIPREOLICO, and another wind power prediction agency.
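The error measure plotted in Fig. 5 and the relative error reductions quoted in the text can be computed along the following lines. This is a sketch only: it assumes that each forecast error is normalized by the installed capacity at the corresponding moment, which is one plausible reading of the normalization and may differ from the exact computation given in the table footnotes. The function names `nrmse` and `error_reduction` are hypothetical.

```python
from math import sqrt


def nrmse(predicted, measured, capacity):
    """Root mean squared error of forecasts normalized by installed capacity.

    predicted, measured, capacity: equal-length sequences, one value per
    moment, in the same power units (ASSUMED per-moment normalization).
    """
    if not predicted or not (len(predicted) == len(measured) == len(capacity)):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [((p - m) / c) ** 2 for p, m, c in zip(predicted, measured, capacity)]
    return sqrt(sum(squared) / len(squared))


def error_reduction(nrmse_reference, nrmse_model):
    """Percentage reduction of NRMSE relative to a reference model."""
    return 100.0 * (nrmse_reference - nrmse_model) / nrmse_reference


# Toy values (made up): two moments with 20 units of installed capacity.
score = nrmse([10.0, 12.0], [8.0, 12.0], [20.0, 20.0])
```

Normalizing each error by the capacity installed at that moment keeps the measure comparable over a period in which the installed capacity grows.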

VI. CONCLUSION

The efficient prediction of wind power at a regional level is particularly important for the TSO, utilities, and other market agents who manage the generation of several wind farms. Furthermore, the reduction of error due to spatial smoothing effects gives predictions at the aggregate level a lower error than those at the wind farm level.

This paper proposes a method to make direct aggregate wind power forecasts in a region, without using up-scaling or intermediate steps at the sub-area or wind farm level. This is possible when the measure of total wind power is available, as is the case for the Spanish TSO. The method is based on nonparametric techniques for local modeling especially tailored for this problem, resulting in better performance than classic Kernel methods. Regional predictions are computed using a historical dataset of wind speed predictions at some geographical coordinates and the corresponding measures of total wind power.

We have studied alternative procedures to obtain the smoothing coefficients for the local model, and have concluded that better results are obtained with a procedure that solves the three steps of the smoothing problem separately: the measures of distance, the selection of nearby data, and the weighting functions that best suit the complex problem of regional forecasting. Although the measure of distance is the key factor, an appropriate selection of nearby data and weighting functions provides a small but significant improvement that should be taken into consideration. We have also found that weighting the observations by their age can improve predictions, especially for short prediction horizons, where the aggregate wind power is highly influenced by recent observations.

As a final evaluation, we have calculated the aggregate wind power predictions for the Spanish peninsular system with the proposed model and compared them with the predictions obtained with the prediction tool SIPREOLICO, as well as with those of another aggregate prediction agency. The results show that the predictions of the proposed model are more accurate than those obtained with the alternative models. Furthermore, the computing time of the proposed model is much lower than the time used by a traditional forecasting tool such as SIPREOLICO, which must compute the prediction for every wind farm in the region and then add them up to obtain the aggregate prediction.

ACKNOWLEDGMENT

The authors would like to thank the Spanish TSO, Red Eléctrica de España, which has contributed to the development of this work by providing the data used in this and other related studies. The authors also would like to thank the referees and the editor for valuable comments.

REFERENCES

[1] A. Fabbri, T. GomezSanRoman, J. RivierAbbad, and V. H. MendezQuezada, “Assessment of the cost associated with wind generation prediction errors in a liberalized electricity market,” IEEE Trans. Power Syst., vol. 20, no. 3, pp. 1440–1446, Aug. 2005.
[2] M. A. Ortega-Vazquez and D. S. Kirschen, “Estimating the spinning reserve requirements in systems with significant wind power generation penetration,” IEEE Trans. Power Syst., vol. 24, no. 1, pp. 114–124, Feb. 2009.
[3] J. M. Morales, A. J. Conejo, and J. Perez-Ruiz, “Economic valuation of reserves in power systems with high penetration of wind power,” IEEE Trans. Power Syst., vol. 24, no. 2, pp. 900–910, May 2009.
[4] G. Giebel, L. Landberg, G. Kariniotakis, and R. Brownsword, “State-of-the-art on methods and software tools for short-term prediction of wind energy production,” in Proc. Eur. Wind Energy Conf., Madrid, Spain, 2003.
[5] A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa, “A review on the young history of the wind power short-term prediction,” Renew. Sustain. Energy Rev., vol. 12, no. 6, pp. 1725–1744, Aug. 2008.
[6] A. M. Foley, P. G. Leahy, A. Marvuglia, and E. J. McKeogh, “Current methods and advances in forecasting of wind power generation,” Renew. Energy, to be published.
[7] N. Siebert, “Development of methods for regional wind power forecasting,” Ph.D. dissertation, Ecole des Mines de Paris, Paris, France, 2008.
[8] N. Siebert and G. Kariniotakis, “Reference wind farm selection for regional wind power prediction models,” in Proc. Eur. Wind Energy Conf., Athens, Greece, 2006.
[9] U. Focken, M. Lange, and H. P. Waldl, “Previento-a wind power prediction system with an innovative upscaling algorithm,” in Proc. Eur. Wind Energy Conf., Copenhagen, Denmark, 2001, pp. 2–6.
[10] T. S. Nielsen, H. Madsen, H. A. Nielsen, L. Landberg, and G. Giebel, “Prediction of regional wind power,” in Proc. Global Windpower Conf., Paris, France, 2002.
[11] P. Pinson, N. Siebert, and G. Kariniotakis, “Forecasting of regional wind generation by a dynamic fuzzy-neural networks based upscaling approach,” in Proc. Eur. Wind Energy Conf., Madrid, Spain, 2003, pp. 16–19.
[12] L. von Bremen, N. Saleck, and D. Heinemann, “Enhanced regional forecasting considering single wind farm distribution for upscaling,” J. Phys.: Conf. Series, vol. 75, 2007.
[13] H. G. Beyer, H. Heinemann, K. Mellinghoff, K. Mönnich, and H.-P. Waldl, “Forecast of regional power output of wind turbines,” in Proc. Eur. Wind Energy Conf., Nice, France, 1999, pp. 1070–1073.
[14] Y. Han and L. Chang, “A study of the reduction of the regional aggregated wind power forecast error by spatial smoothing effects in the Maritimes Canada,” in Proc. IEEE Electric Power and Energy Conf. (EPEC), 2010, pp. 1–6.
[15] U. Focken, M. Lange, and H. P. Waldl, “Reduction of wind power prediction error by spatial smoothing effects,” in Proc. Eur. Wind Energy Conf., Copenhagen, Denmark, 2001.
[16] U. Focken, M. Lange, K. Mönnich, H. Waldl, H. G. Beyer, and A. Luig, “Short-term prediction of the aggregated power output of wind farms—A statistical analysis of the reduction of the prediction error by spatial smoothing effects,” J. Wind Eng. Ind. Aerodynam., vol. 90, no. 3, pp. 231–246, Mar. 2002.
[17] I. Martí, M. Gastón, and L. Frías, “Exploring the limits of wind farm grouping for prediction error compensation,” in Proc. Eur. Wind Energy Conf., Athens, Greece, 2006.
[18] C. G. Atkeson, A. W. Moore, and S. Schaal, “Locally weighted learning,” Artif. Intell. Rev., vol. 11, no. 1, pp. 11–73, Feb. 1997.
[19] W. Cleveland and C. Loader, “Smoothing by local regression: Principles and methods,” in Statistical Theory and Computational Aspects of Smoothing. New York: Springer, 1996, pp. 10–49.
[20] W. Härdle, Applied Nonparametric Regression. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[21] E. A. Nadaraya, “On estimating regression,” Theory Probabil. Appl., vol. 9, no. 1, pp. 141–142, 1964.
[22] G. S. Watson, “Smooth regression analysis,” Sankhya: Indian J. Statist., Series A, vol. 26, no. 4, pp. 359–372, 1964.
[23] W. K. Härdle, M. Müller, S. Sperlich, and A. Werwatz, Nonparametric and Semiparametric Models. New York: Springer, 2004.
[24] M. G. Lobo, “Métodos de predicción de la generación agregada de energía eólica,” Ph.D. dissertation, Universidad Carlos III de Madrid, Madrid, Spain, 2010.
[25] I. Sánchez, J. Usaola, J. O. Ravelo, C. Velasco, J. Domínguez, J. , M. G. Lobo, G. González, and F. Soto, “SIPREOLICO—A wind power prediction system based on a flexible combination of dynamic models. Application to the Spanish power system,” in Proc. World Wind Energy Conf. Exhib., Berlin, Germany, 2002.
[26] I. Sánchez, “Short-term prediction of wind energy production,” Int. J. Forecast., vol. 22, no. 1, pp. 43–56, 2006.
[27] I. Sánchez, “Adaptive combination of forecasts with application to wind energy,” Int. J. Forecast., vol. 24, no. 4, pp. 679–693, Oct. 2008.

Miguel G. Lobo was born in Madrid, Spain, in 1977. He received the degree in industrial engineering and the Ph.D. degree from Universidad Carlos III de Madrid, Madrid, Spain, in 2002 and 2010, respectively. He has been a researcher and Assistant Professor in the Department of Electrical Engineering of Universidad Carlos III de Madrid, and is co-developer of the wind power prediction tool SIPREOLICO. His research interests include integration of renewable energy sources and wind power forecasting.

Ismael Sánchez received the degree in industrial engineering from the Universidad Politécnica de Madrid, Madrid, Spain, and the Ph.D. degree in industrial engineering from Universidad Carlos III de Madrid. He is an Associate Professor in the Department of Statistics of Universidad Carlos III de Madrid, and is co-developer of the wind power prediction tool SIPREOLICO. His main research areas are time series, forecasting, and nonlinear dynamic models.