Solar Power Probabilistic Forecasting by Using Multiple Linear Regression Analysis
Mohamed Abuella, Student Member, IEEE
Badrul Chowdhury, Senior Member, IEEE
Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, Charlotte, USA
Abstract— Variable energy generation, particularly from renewable energy resources such as wind and solar energy plants, has created operational challenges for the electric power grid because of the uncertainty in its short-term output. Energy forecasting can be used to mitigate some of the challenges that arise from the uncertainty in the resource. While wind energy forecasting has already undergone extensive research efforts, solar power forecasting has only recently begun to receive increased attention. This paper proposes a multiple linear regression analysis model to generate probabilistic forecasts of solar energy.
Keywords—Probabilistic forecasting, multiple linear regression, solar energy forecasts.
I. INTRODUCTION
In recent years, the rapid growth of variable energy generation in the power grid, particularly from wind and solar energy resources, has made this generation a noteworthy source of uncertainty, with load behavior still being the main source of variability. Generation and load balance is required in the economic scheduling of the generating units and in electricity market trades. When the penetration level of variable generation is high, the intermittency of these resources impacts the operation of the electric grid. Thus, wherever variable generation resources are used, operating reserve and efficient storage systems are inevitably needed to keep the power balance of the system. The operating reserve, which uses fossil fuel generating units, should be kept as low as possible to get the highest benefit from the deployment of variable generation [1]. Therefore, the forecast of these renewable resources becomes a vital tool in the operation of power systems and electricity markets.

A. Variable Generation Forecasting
The practical implementation of variable generation forecasts started in the late 1990s, when rapidly increasing levels of wind power penetration took place in some national electric grids in Europe, such as in Denmark and Germany [2]. Currently, the U.S. is witnessing a continuous increase of renewable portfolios in some states, such as California and Texas. Thus, renewable energy forecasting is critically needed, since accurate prediction tools lead to more efficient scheduling and dispatching of the conventional power plants and improve the performance of the entire electrical system [3]. Updates in policies, such as FERC Order 764 [4], which requires ISOs/RTOs to offer intra-hourly scheduling, help to grow the penetration level of renewable resources, in addition to introducing more flexibility for the implementation of variable generation forecasts in power system operations and electricity market trades [5].

The timeline and operating mechanism of variable generation forecasts vary between ISOs and electric utilities. Generally, the forecasting time frame varies according to the scheduling and market operating intervals. For instance, NYISO redispatches the entire system every five minutes, which reduces the variability of the wind resources from one operating interval to the next. The day-ahead (DA) forecasting is used for reliability assessment and allows NYISO to make day-ahead unit commitment decisions, while the real-time forecasts are used in NYISO's real-time security-constrained economic dispatch [6].

B. Variable Generation Forecasting Models
The forecasting models are continuously being improved to generate more accurate forecasts of solar and wind power.
1) The physical approach model: The physical approach describes the physical relationships between weather conditions, topography, solar irradiance, and the power outputs of the solar plant. The input data include numerical weather predictions (NWP), local meteorological measurements such as sky imagers for tracking cloud movement, SCADA data for the observed output power, and additional information about the characteristics of the nearby terrain and topography of the site. Satellite and sky imagers are used to track the clouds and forecast the solar irradiance up to 3 hours in advance; beyond that horizon, NWP is usually used to project the irradiance [7].

2) The statistical approach model: The statistical approach describes the connection between the solar irradiance predicted by NWP and the solar power production directly, by statistical analysis of time series from past data, without considering the physics of the system. This connection can then be used to forecast the future plant outcomes.

3) The learning approach model: Artificial intelligence (AI) methods are used to learn the relation between predicted weather conditions and the power output generated, as time series of the past. Unlike statistical approaches, AI methods use algorithms that are able to implicitly describe nonlinear and highly complex relations between the input data (NWP predictions and output power) instead of an explicit statistical analysis. For both the statistical and the AI approach, long and high-quality time series of weather predictions and power outputs from the past are of essential importance.

4) The combined approach: Modern practical renewable power forecasting models are usually a combination of physical and statistical models. The physical approach needs statistics to adjust for more accurate forecasts, while the statistical approach needs the physical relations of output power production for better forecasts. The optimal performance of the combined models is achieved by optimal shifting of weights between the physical approach-based forecasts and the statistical forecasts [8], [9].

II. GLOBAL ENERGY FORECASTING COMPETITION 2014 (GEFCOM 2014)
The Global Energy Forecasting Competition is the second of its kind; the first was held in 2012 and covered load and wind point forecasting. The current competition, GEFCom2014, has four tracks: Probabilistic Electric Load Forecasting, Probabilistic Electricity Price Forecasting, Probabilistic Wind Power Forecasting, and Probabilistic Solar Power Forecasting, to provide state-of-the-art techniques of energy forecasting [10].

A. Probabilistic Solar Power Forecasting Objective
In this track of the competition, the contestants are required to submit probabilistic forecasts of the solar power, in the form of a probabilistic distribution (in quantiles), in hourly steps through a month of forecast horizon. It is the track of the competition where the proposed model has been implemented and where the data come from.

B. Data Description
Each submitted task of probabilistic distribution (in quantiles) of the solar power generation is for 3 solar systems (zones).

1) Zones: The three solar systems are adjacent. These farms are also called zones, and they have relatively low output power capacities. Their installation parameters are:
• Zone1: Altitude (595 m), Panel type (Solarfun SF160-24-1M195), Panel number (8), Nominal power (1560 W), Panel Orientation (38° Clockwise from North), Panel Tilt (36°).
• Zone2: Altitude (602 m), Panel type (Suntech STP190S-24/Ad+), Panel number (26), Nominal power (4940 W), Panel Orientation (327° Clockwise from North), Panel Tilt (35°).
• Zone3: Altitude (951 m), Panel type (Suntech STP200-18/ud), Panel number (20), Nominal power (4000 W), Panel Orientation (31° Clockwise from North), Panel Tilt (21°).

2) Variables: The target variable is the solar power. There are 12 independent variables available from the European Centre for Medium-Range Weather Forecasts (ECMWF) that were used to produce solar forecasts. These are:
• 078.128 Total column liquid water of cloud (tclw) - (kg/m²).
• 079.128 Total column ice water of cloud (tciw) - (kg/m²).
• 134.128 Surface pressure (SP) - (Pa).
• 157.128 Relative humidity at 1000 mbar (r) - (%).
• 164.128 Total cloud cover (TCC) - (0-1).
• 165.128 10 metre U wind component (10U) - (m/s).
• 166.128 10 metre V wind component (10V) - (m/s).
• 167.128 2 metre temperature (2T) - (K).
• 169.128 Surface solar radiation down (SSRD) - (J/m²) - accumulated field.
• 175.128 Surface thermal radiation down (STRD) - (J/m²) - accumulated field.
• 178.128 Top net solar radiation (TSR) - (J/m²) - accumulated field.
• 228.128 Total precipitation (TP) - (m) - accumulated field.

C. Evaluation Criteria
A pinball loss function is used to evaluate the accuracy of the probabilistic forecasts. It is a piecewise linear function which is often used to evaluate the accuracy of quantile forecasts [11].

$$
L_q(\hat{Y}, Y) =
\begin{cases}
q\,(\hat{Y} - Y), & \text{if } Y \le \hat{Y} \\
(1 - q)\,(Y - \hat{Y}), & \text{if } Y > \hat{Y}
\end{cases}
\tag{1}
$$

where $L_q(\hat{Y}, Y)$ is the loss function of the probabilistic forecasts for each hour, $\hat{Y}$ is the forecasted value at a certain quantile q of the probabilistic solar power forecasts, and $Y$ is the observed value of the solar power. The quantile q is also called the percentile, and it has discrete values [0.01 - 0.99]. $\hat{Y}$ and $Y$ are values normalized by the nominal power capacity of each zone. The average of the loss function $L_q(\hat{Y}, Y)$ over all forecasting hours should be minimized to yield more accurate forecasts. Therefore, the average loss function value is adopted as a score to evaluate the model performance.

III. SOLAR FORECAST MODELING
The basic skeleton of the methodology and the theoretical background of building the proposed solar power forecasting model in this paper are adopted from reference [12], where multiple linear regression (MLR) analysis is deployed for short-term load forecasting. SAS Enterprise Guide® is used as the environment in which the forecasting model is built. SAS is statistical analysis software that is suitable for the proposed statistical forecasting model. The flowchart of the solar forecasting model building steps is shown in Fig. 1.

Fig. 1. Flowchart diagram of the solar forecasting modeling

A. Data Preparation
It is always a good idea to analyze the historical data before setting up the forecasting model. The available historical data contain the solar power and all 12 weather variables.
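Before turning to the data, the evaluation score of Eq. (1) can be made concrete with a short Python sketch. It follows the two branches of Eq. (1) as written in this paper; some references attach the weights q and 1 - q to the opposite branches, so the branch assignment should be checked against the scoring rules actually in use. The function names, the nested-list layout of the forecasts, and the choice to average over quantiles as well as hours are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence

# The 99 discrete quantiles 0.01, 0.02, ..., 0.99 mentioned in the text.
QUANTILES = [k / 100 for k in range(1, 100)]


def pinball_loss(y_hat: float, y: float, q: float) -> float:
    """Loss of a single quantile forecast, following Eq. (1) as written.

    y_hat: forecast at quantile q (normalized by nominal capacity)
    y:     observed solar power (normalized the same way)
    """
    if y <= y_hat:
        return q * (y_hat - y)
    return (1.0 - q) * (y - y_hat)


def average_pinball_score(
    forecasts: Sequence[Sequence[float]],
    observations: Sequence[float],
    quantiles: Sequence[float] = QUANTILES,
) -> float:
    """Mean loss over all hours and all quantiles (assumed layout:
    forecasts[h][k] is the forecast for hour h at quantiles[k])."""
    if len(forecasts) != len(observations):
        raise ValueError("one row of quantile forecasts is needed per hour")
    total, count = 0.0, 0
    for row, y in zip(forecasts, observations):
        if len(row) != len(quantiles):
            raise ValueError("each hour needs one forecast per quantile")
        for y_hat, q in zip(row, quantiles):
            total += pinball_loss(y_hat, y, q)
            count += 1
    if count == 0:
        raise ValueError("no forecasts to score")
    return total / count
```

A lower average score indicates forecasts whose quantiles sit closer to the observations, which is why the text adopts the average loss as the model performance score.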
The data preparation is an important step for treating the data so that they are ready for the analysis and modeling steps. The various steps of the data preparation are shown in Fig. 2. The box plot of the distribution of the historical data of solar power is shown in Fig. 3. Note that the order of months that appears in the box plot does not necessarily match the order of months in the calendar year.
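As an illustration of what a monthly box plot summarizes, the following Python sketch computes the five-number summary (minimum, quartiles, maximum) of solar power per month. It is only a minimal sketch: the `(month_label, power)` record layout and the function name are hypothetical, and the month labels are treated as opaque keys because their order in the data need not follow the calendar.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Hashable, Iterable, Tuple

FiveNumbers = Tuple[float, float, float, float, float]


def monthly_box_stats(
    records: Iterable[Tuple[Hashable, float]],
) -> Dict[Hashable, FiveNumbers]:
    """Group (month_label, normalized_power) pairs by month and return
    (min, first quartile, median, third quartile, max) for each month."""
    by_month = defaultdict(list)
    for month, power in records:
        by_month[month].append(power)

    stats = {}
    for month, values in by_month.items():
        # n=4 yields the three quartile cut points; "inclusive" treats the
        # sample minimum and maximum as the 0th and 100th percentiles.
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        stats[month] = (min(values), q1, median, q3, max(values))
    return stats
```

Each month needs at least two observations here; with hourly data that condition is easily met.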
Fig. 2. Flowchart diagram of data preparation

The box plot is for the 2012 data of zone1. The other two zones are not very different, since they are adjacent solar systems. The order of months is as provided in the competition benchmark data.

Fig. 3. Box plot of the distribution of observed solar power

The scatter plot is also useful to get a sense of the relationships between the predictor variables (the weather variables) and the response variable (the solar power). Fig. 4 presents the advantage of plotting the data in scatter plots for the observed power with respect to the solar irradiance, also called surface solar radiation down (SSRD), for zone1. The scatter plot on the left of Fig. 4 is for the SSRD (169.128) as it is provided from the NWP, which is in accumulated values (J/m²), while the plot on the right is for the average values of SSRD (W/m²). The relationship between the variables in the right plot is more obvious, and it can be seen that it is a positive relationship. Here t is the time in hour steps, and Avg and Acc are the average and accumulated values of the data, respectively.

For comparison of the three zones, from Fig. 5, zone3 has the strongest positive linear relationship between the solar irradiance and power. The left scatter plot is for zone1.

Fig. 5. Scatter plot of the solar irradiance and power for all zones (panel labels: Zone1, Pmax = 0.91; Zone2, Pmax = 0.97; Zone3, Pmax = 0.99)

From the scatter plots, the outliers do not change the general data trends. The majority of the extreme points in the observed data occur at the sunrise and sunset periods, which are the unstable duration of the solar panel. By experiment, data cleansing is conducted, which led to a tiny improvement in forecasts. However, this does not mean one should underestimate the data cleansing stage in data preparation before the modeling stage, since sometimes outliers could be generated from data entry issues.

B. The Model Building
The main steps of building the forecasting model are shown in Fig. 6. Analysis of variance (ANOVA) is used for the multiple linear regression analysis; it is available as a statistical tool in SAS, where the multiple linear regression analysis is run.

Fig. 6. Flowchart diagram of the model building

Selection of the predictor variables (the independent or explanatory variables) by plotting is cumbersome, since there are twelve predictor variables to choose from. So the correlation and sensitivity analysis is carried out on the historical data to investigate the most effective variables. The outcome of the correlation analysis is shown in the corresponding figure. It is evident that, in addition to the time (hours), the surface solar irradiance (SSRD) and the top net solar irradiance (VAR178) and their second-order polynomial (quadratic) terms have the highest correlation with solar power. The relative humidity (VAR157) has some impact on solar power, since the water vapor particles work as small mirrors, thereby increasing the reflection level of the