Natural Hazards and Earth System Sciences


Nat. Hazards Earth Syst. Sci., 13, 1161–1168, 2013
www.nat-hazards-earth-syst-sci.net/13/1161/2013/
doi:10.5194/nhess-13-1161-2013
© Author(s) 2013. CC Attribution 3.0 License.
Published by Copernicus Publications on behalf of the European Geosciences Union.


Improving the calibration of the best member method using quantile regression to forecast extreme temperatures

A. Gogonel (1,2), J. Collet (1), and A. Bar-Hen (2)

(1) EDF R&D Division, OSIRIS Department, 1, avenue du Général de Gaulle, 92141 Clamart cedex, France
(2) MAP5, UFR de Mathématiques et Informatique, Université Paris Descartes, 45 rue des Saints-Pères, 75270 Paris cedex 06, France

Correspondence to: J. Collet ([email protected])

Received: 10 July 2012 – Published in Nat. Hazards Earth Syst. Sci. Discuss.: – Revised: 13 March 2013 – Accepted: 9 April 2013 – Published: 3 May 2013

Abstract. Temperature influences both the demand and supply of electricity and is therefore a potential cause of blackouts. Like any electricity provider, Électricité de France (EDF) has strong incentives to model the uncertainty in future temperatures using ensemble prediction systems (EPSs). However, the probabilistic representations of the future temperatures provided by EPSs are not reliable enough for electricity generation management. This lack of reliability becomes crucial for extreme temperatures, as these extreme temperatures can result in blackouts. A proven method to solve this problem is the best member method (BMM). This method improves the representation as a whole, but there is still room for improvement in the tails of the distribution. The idea of the BMM is to model the probability distribution of the difference between the forecast and the realization. We improve the error modeling in the BMM using quantile regression, which is more efficient than the usual two-stage ordinary least squares (OLS) regression. To achieve further improvement, the probability that a given forecast is the best one can be modeled using exogenous variables.

1 Introduction

The uncertainty of future temperatures is a major risk factor for an electric utility company such as Électricité de France (EDF). The demand for heating increases when the temperature is lower than 18 °C, and the demand for cooling increases when the temperature exceeds 18 °C. Moreover, high temperatures also create cooling problems for thermal plants.

To fulfill the risk management needs of the company, the ensemble prediction systems (EPSs) provided by weather forecasting institutes such as the European Centre for Medium-Range Weather Forecasts (ECMWF) (ECMWF, 2002, 2006) provide an indispensable source of information. Ensemble forecasting is a numerical prediction method that is used to generate a representative sample of the possible future states of a dynamical system. Ensemble forecasting is a form of Monte Carlo analysis: multiple numerical predictions are computed using slightly different initial conditions, all of which are plausible given past and current observations. The available observations are combined to obtain estimates of the future temperatures as well as their uncertainties (Whitaker and Loughe, 1998) and predictive densities. However, the probabilistic representations of future temperatures provided by EPSs suffer from a lack of reliability, especially for extreme probabilities (for example, the 1 % quantile forecast is inaccurate), not only because the number of ensemble members is limited by computing resources, but also because EPS forecasts can be biased and typically do not display enough variability, thus leading to an underestimation of the uncertainty (Buizza et al., 2005). As a risk manager, and also because of its size and market power, EDF must use the most reliable information available (Diebold et al., 1998; Gneiting et al., 2007), so this lack of reliability is prohibitive. EDF also faces a regulatory constraint: the French technical system operator imposes that the probability of employing exceptional means (e.g., load shedding) to meet the demand for electricity must be lower than 1 % for each week (RTE, 2004), so EDF has to manage risk carefully at this level.





Fig. 2. Correlation between ensemble prediction system (EPS) interquartile range and forecast error, using the EPS mean as a deterministic forecast.

Fig. 1. Weather stations used to define the global temperature for France: names, locations and circles with surfaces proportional to the weight of each station.

Now, in France, most of the demand variability is a consequence of temperature variability, with opposite effects in winter and summer: in winter, demand increases when the temperature decreases; the reverse is true in summer. Furthermore, the variability in demand largely exceeds the variability in the supply. So, when the supply–demand balance is at the 1 % quantile, the temperature is nearly, but not exactly, at the 1 % quantile in winter, and at the 99 % quantile in summer. Needless to say, the EDF decision-making process takes account of other quantiles and quantities, but the regulatory constraint is on the 1 % quantile.

A more technical limitation of EPSs is a lack of smoothness: the predictive cumulative distribution they provide is a step function, which can lead to problems in generation management tools. Many methods have been developed to obtain a smooth and unbiased representation of the risk arising from temperature variations (Hagedorn, 2010; Wilks, 2011). However, as we are interested in extremes, this excludes methods assuming Gaussianity, such as the non-homogeneous Gaussian regression developed in Gneiting et al. (2005) and Unger et al. (2009). We then have to use one of the ensemble "dressing" methods: Bayesian model averaging (Raftery et al., 2004), the Bayesian processor of output (Krzysztofowicz, 2004) or the best member method (Roulston and Smith, 2002a). The best member method is the simplest amongst these three methods, and Gogonel-Cucu et al. (2011a) show that it gives better results on ECMWF ensemble forecasts than Bayesian model averaging. Furthermore, we suspect that using ranks, as proposed in Fortin et al. (2006), could be very useful to improve the representation of the tails, so it will be the basis of our modeling.


In this paper, the key point is that the fitting of the dispersion of the "dressing" is improved when quantile regression is used in place of a two-stage ordinary least squares (OLS) regression. We first briefly review the use of the best member method (BMM) and then demonstrate the use of quantile regression to improve the error modeling required for the BMM. In the final section, we suggest several further improvements of the method.

2 An example application of the best member method

In this section, we briefly describe a simple application of the BMM to ECMWF forecasts, with a "dressing" depending on the rank of the forecast. We study all possible horizons, with each horizon treated independently of the others.


2.1 The data

The temperatures (in degrees Celsius) that we analyze are spatial and temporal averages. The time extent of the average is the day, and the spatial extent is France: we average the temperatures measured in 26 cities in France; the vector of 26 weights is chosen in order to estimate the electricity load in France (Dordonnat et al., 2008). The map in Fig. 1 shows the names and locations of the cities, as well as circles with surfaces proportional to the weight of each city. Because we use temperatures measured at specific observation sites, a statistical adaptation stage is included in the forecasting process. This statistical adaptation is performed by Météo-France. The forecasts that we use are EPSs provided by ECMWF. As shown in Fig. 2, the ensemble spread is actually a proxy for the forecast error. Furthermore, the skill itself is seasonal, as demonstrated in Fig. 3.

Fig. 3. Mean forecast error, using the ensemble prediction system mean as a deterministic forecast, for all horizons and seasons.

The data consist of two different arrays. The first array is 3-dimensional and contains the forecasts.

– The dates of the forecasts range from 27 March 2007 to 30 April 2011 (for a total of 1473 dates).

– The horizons of the forecasts provided by ECMWF range from 1 to 14 days in the future.

– The member is identified by a number between 0 and 50. For a given date, a member corresponds to a given initial state.


The second array contains the realizations of the temperature variable. This array has 1 dimension corresponding to the date, with approximately the same extent as the forecasts.

The member 0 is different from the others, as its initial conditions are unperturbed. Despite this, we consider the member number to be uninformative, and the members of the EPS are assumed to be exchangeable according to the definition provided in Bernardo (1996) (in contrast to multi-model ensembles; Gneiting et al., 2005).
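To fix ideas, the two arrays can be sketched as follows. This is an illustrative Python layout only, not code from the paper; the array names, the use of NumPy and the placeholder values are our assumptions.

```python
import numpy as np

# Illustrative layout only (names and sizes taken from the description above).
n_dates, n_horizons, n_members = 1473, 14, 51

# First array: ensemble temperature forecasts, one value per (date, horizon, member).
# Member index 0 corresponds to the unperturbed member.
forecasts = np.full((n_dates, n_horizons, n_members), np.nan)

# Second array: realized weighted-average temperature over France, one value per date.
realizations = np.full(n_dates, np.nan)
```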


2.2 The criteria

One can find many verification criteria for forecasts (see, for example, Jolliffe and Stephenson, 2012). As our goal is risk management, we will focus on criteria measuring the quality of the uncertainty representation. That is why we use the probability integral transform (PIT) (Diebold et al., 1998; Hamill and Thomas, 2001). For each forecast horizon h and each date, we have a set of temperature simulations from the BMM, and a realization. The PIT is defined as the proportion of the simulated values that are lower than the realization. If the forecast is reliable, then the PIT is uniformly distributed on [0, 1]. Formally, if we note y_t the sequence of realized temperatures and F_t the sequence of density forecasts (if the F_t are continuous), we have

y_t \sim F_t \Rightarrow F_t(y_t) \sim U[0, 1].

The statistic F_t(y_t) is called the PIT. So, measuring the reliability of a sequence of density forecasts comes down to comparing the distribution of the PIT to the uniform distribution. This condition can be checked in a straightforward manner using the empirical distribution function (EDF).

We also use the continuous ranked probability score (CRPS), because this criterion is well known. The CRPS measures the difference between the forecast and observed cumulative distribution functions (CDFs). In the case of ensemble forecasts, when the size of the sample is sufficiently large, the forecasted CDF is approximated by the EDF of the ensemble. This definition has two consequences: the smaller the CRPS, the better, and the CRPS has the same units as the observations. The CRPS can be decomposed into two components: the reliability component tests whether the forecast system has the correct statistical properties, whereas the potential CRPS measures the residual uncertainty of the forecast.
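As a minimal illustrative sketch (not the authors' code), the PIT of an ensemble or BMM-simulated forecast and its deviation from uniformity can be computed as follows; the function and variable names, and the synthetic stand-in data, are ours.

```python
import numpy as np

def pit(simulations, realization):
    """PIT value: proportion of simulated temperatures lower than the realization."""
    return np.mean(np.asarray(simulations) < realization)

def reliability_deviation(pits, grid=None):
    """Difference between the empirical CDF of the PITs and the uniform CDF F_U(x) = x."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    edf = np.array([np.mean(pits <= x) for x in grid])
    return grid, edf - grid

# Example with synthetic numbers: one PIT per date for a given horizon.
rng = np.random.default_rng(0)
sims = rng.normal(size=(1473, 51))    # stand-in for simulated temperatures (dates x members)
obs = rng.normal(size=1473)           # stand-in for realizations
pits = np.array([pit(s, o) for s, o in zip(sims, obs)])
x, dev = reliability_deviation(pits)  # plotting dev against x gives a Fig. 5-style curve
```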


2.3 The best member method applied to ECMWF data

2.3.1 Principle

The BMM was first proposed by Roulston and Smith (2002a) and subsequently improved by Wang and Bishop (2005), Fortin et al. (2006) and Gogonel-Cucu et al. (2011a, b). The last three references use the rank of a member in the sorted ensemble. As we state later in this section, this discrete variable is too detailed, so it is useful to aggregate some of its values.

The BMM is a statistical post-processing method, so using this method implies choosing a training period. In all cited works, and in our work, the training period is fixed and large (some months). We used cross-validation and stated that there is likely no effect of the training period on the performance of the method.

For each date and each horizon, the "best member" is the member closest to the realization (with the smallest absolute difference). The principle of the method is to model the probability distribution of the difference between the "best member" and the realization. The probabilistic forecast is then a convolution of the error distribution and the discrete distribution of the EPS members. Because all members of the EPS are exchangeable, all of the error models and weights are the same in the first step. An alternative idea, proposed by Fortin et al. (2006), is to use the rank of a member in the sorted ensemble. In this case, we first rank the forecasts for each date, according to the forecast value. The error model and weights are then functions of the rank. In order to write down some equations, we will use notations inspired from Fortin et al. (2006). For each date t ∈ [1, T], we have the following:

– (x_{t,(k)})_{k=1,...,51}, the sorted forecasts (using the usual notation of order statistics), and

– y_t, the realized temperature.

Then the best member is the x^*_t that minimizes |x_{t,(k)} - y_t|:

x^*_t = \arg\min_{x_{t,(k)}} |x_{t,(k)} - y_t|.

Similarly, the rank of the best member is

k^*_t = \arg\min_{k} |x_{t,(k)} - y_t|,

and the error of the best member is

\epsilon^*_t = y_t - x^*_t.

For each rank k, the weight assigned to the rank is

p_k = \frac{1}{T} \sum_{t} 1\{k^*_t = k\}.

Because we prefer to use a parametric framework, the error model consists of the following:

1. a model for the mean \mu_t of the error of the best member,

2. a model for the variance \sigma^2_t of the error of the best member, and

3. a parametric distribution for \epsilon^*_t.

Since the errors are unbounded and approximately symmetric, we will use a normal distribution.

Many variables can be used to model the mean and variance of the error:

– variables summarizing features of the current EPS, such as the mean, the square of the mean (to account for the effect of extreme temperatures) and the spread (here defined as the interquartile range) of the ensemble;

– variables to account for a smooth influence of the date (the date t itself, cos(2πt/365) and sin(2πt/365)); and

– the rank of the best member k^*_t, considered as a discrete variable.

Furthermore, we may assume that the various discrete and continuous variables interact with one another: for example, the mean temperature of the EPS may have a different slope for each rank in the model.

An important point is that it is impossible to estimate all of the parameters simultaneously for all horizons, as the variance of the residuals is approximately 3 times larger for horizon 14 than for horizon 1. All models are built separately for each horizon.
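The quantities defined above translate directly into array operations. Below is a minimal sketch for one horizon, under the same assumptions as the array layout sketched earlier; the function and variable names are illustrative and not the paper's code, and ranks are 0-based here whereas they run from 1 to 51 in the paper.

```python
import numpy as np

def best_member_stats(forecasts_h, realizations):
    """Sorted forecasts x_{t,(k)}, best member x*_t, its rank k*_t, error e*_t and rank
    weights p_k for one horizon. forecasts_h: (dates, members); realizations: (dates,)."""
    x_sorted = np.sort(forecasts_h, axis=1)                      # x_{t,(k)}
    k_star = np.argmin(np.abs(x_sorted - realizations[:, None]), axis=1)
    x_star = x_sorted[np.arange(x_sorted.shape[0]), k_star]      # x*_t
    err = realizations - x_star                                  # e*_t = y_t - x*_t
    p = np.bincount(k_star, minlength=x_sorted.shape[1]) / x_sorted.shape[0]  # p_k
    return x_sorted, x_star, k_star, err, p

# Usage (hypothetical arrays from the layout sketch above), e.g. for horizon index h:
# x_sorted, x_star, k_star, err, p = best_member_stats(forecasts[:, h, :], realizations)
```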

Fig. 4. Best member error (\epsilon^*_t) standard deviation, with respect to best member rank (k^*_t), for 3 horizons: 3, 8 and 14 days.

2.3.2 Rank aggregation

In this type of modeling, the rank of the member is a useful variable but is too detailed, at least in the case of ECMWF ensembles, as shown in Fig. 4.

All of the members with central ranks have nearly the same standard deviation in their errors; however, the errors of the members with extreme ranks behave very differently. We therefore propose to aggregate all of the central ranks. We define a new variable edge that is equal to 0 when the rank is central and equal to the rank otherwise. We must specify the definition of a "central rank", which will be done thereafter.

We apply an automated variable selection method, such as the "stepwise" selection method (SAS, 2006), which provides some information on the significant variables. So, for each possible definition of the "central ranks", it is possible to select the significant variables, estimate their coefficients, and finally compute the Akaike information criterion (AIC) or the adjusted R². Following these criteria, the optimal choice is to consider the ranks {1, 2, 50, 51} to be non-central and the remaining ranks to be central. Therefore, the value set of the variable edge is {0, 1, 2, 50, 51}. Finally, we obtain the following information.

– The rank never appears in the model; the relevant information carried by the rank is provided by the variable edge.

– The mean temperature is significant in modeling the mean of the error.

– The spread is significant in modeling the error variance, as are the mean temperature and its square.

– Nothing is gained by taking account of the date.

The following models were finally selected:

\epsilon^*_t = a(edge_t) + b(edge_t) \cdot m_t + \eta_t

(\epsilon^*_t - \hat{\epsilon}^*_t)^2 = \alpha(edge_t) + \beta(edge_t) \cdot m_t + \gamma(edge_t) \cdot m_t^2 + \delta(edge_t) \cdot IQR_t + \xi_t,


where

– \epsilon^*_t is the error of the best member,

– m_t (respectively IQR_t) is the mean (respectively interquartile range) of the ensemble forecast,

– edge_t is equal to the rank k^*_t if k^*_t ∈ {1, 2, 50, 51}, and to 0 elsewhere, and

– \eta_t and \xi_t are the disturbances.

So, we may assign values to the parameters of the best member error model:

\mu_t = \hat{\epsilon}^*_t, \qquad \sigma^2_t = \widehat{(\epsilon^*_t - \hat{\epsilon}^*_t)^2}.

To account for possible evolution in the data, we used cross-validation, with a partitioning into two equal parts: we first estimated the BMM model on the period from 27 March 2007 to 12 April 2009 and used it on the period from 13 April 2009 to 30 April 2011, and we exchanged the roles in a second stage.

The BMM yields a substantial improvement in the reliability of the temperature density forecast, compared to the raw EPS. In Fig. 5, we plot the difference between the EDF of the PITs and the function F_U(x) = x for horizon 3 for three temperature density forecasts: the raw EPS, the EPS post-processed using the BMM, and the EPS post-processed using the BMM with error modeling using quantile regression (which will be described in Sect. 3). We see that the improvement in the reliability using the BMM is substantial for this horizon. Note that we must plot the difference between the PIT and its theoretical distribution to observe the improvement; otherwise, we would see only two very close lines.

In order to compare the raw EPS and the BMM for all horizons, we need a numerical criterion: we will use the continuous ranked probability score (CRPS) (Hersbach, 2000), and more specifically the reliability component of the CRPS. We only performed statistical tuning, and we observe in Table 1 that the potential CRPS does not vary substantially. On the contrary, the reliability component is substantially decreased, except for horizon 5. Furthermore, this reliability improvement substantially affects some practically relevant quantities. For example, the variance of the temperature 1 day ahead can be used as a rough risk indicator. Using the raw EPS, we obtain (on average) a value of 0.2 for this indicator, while we obtain a value of 0.44 using the BMM. For horizon 14, the values are 2.6 and 2.9. Note that this example does not prove that the BMM is better than the raw EPS; this proof is provided by the comparison of the reliability component of the CRPS. What it proves is the usefulness of the reliability improvement provided by the BMM.

Despite the substantial reliability improvement using the BMM, the tails are still not well represented, as shown in Fig. 5.
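A possible sketch of this usual two-stage OLS fit and of the resulting "dressed" simulation is given below. It is illustrative Python under our own naming conventions, not the authors' code: the regressor construction, the helper names and the numerical guard are assumptions, and the draw step simply implements the convolution of the rank weights with the fitted normal error.

```python
import numpy as np
import statsmodels.api as sm

EDGE_VALUES = (0, 1, 2, 50, 51)   # value set of the edge variable (the paper's choice)

def edge_design(edge, m, iqr=None):
    """Edge-specific intercepts and slopes: a(edge) + b(edge)*m; with iqr also m^2 and IQR terms."""
    d = np.column_stack([(edge == v).astype(float) for v in EDGE_VALUES])
    cols = [d, d * m[:, None]]
    if iqr is not None:
        cols += [d * (m ** 2)[:, None], d * iqr[:, None]]
    return np.column_stack(cols)

def fit_bmm_error_model(err, m, iqr, edge):
    """Two-stage OLS: fit the mean of e*_t, then regress the squared residuals for the variance."""
    X_mean = edge_design(edge, m)
    fit_mu = sm.OLS(err, X_mean).fit()
    X_var = edge_design(edge, m, iqr)
    fit_var = sm.OLS((err - fit_mu.predict(X_mean)) ** 2, X_var).fit()
    return fit_mu, fit_var

def dress_forecast(x_sorted_t, p, mu_by_rank, var_by_rank, n_sim, rng):
    """Convolution step for one date: draw a rank with weight p_k, then add a normal error.
    mu_by_rank / var_by_rank are the fitted error parameters evaluated for each rank of this
    date (their computation from fit_mu, fit_var and the date's m_t, IQR_t is omitted here)."""
    k = rng.choice(len(p), size=n_sim, p=p)
    sd = np.sqrt(np.maximum(var_by_rank[k], 1e-6))   # guard against negative fitted variances
    return x_sorted_t[k] + rng.normal(mu_by_rank[k], sd)
```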

Table 1. The continuous ranked probability score (CRPS) and its decomposition, for the raw ensemble prediction system (columns "Raw") and the usual best member method (columns "BMM"), for all horizons (column "h."). "BMM Score" is computed assuming that a perfect forecast has a reliability component equal to 0. Temperature unit is degrees Celsius.

h.   Potential CRPS      Reliability component        BMM
     Raw     BMM         Raw           BMM            Score
 1   0.24    0.25        2.6 × 10^-2   4.4 × 10^-3     83 %
 2   0.31    0.32        1.6 × 10^-2   4.3 × 10^-3     73 %
 3   0.38    0.39        6.9 × 10^-3   3.9 × 10^-3     43 %
 4   0.49    0.50        4.7 × 10^-3   3.2 × 10^-3     32 %
 5   0.63    0.64        3.5 × 10^-3   4.0 × 10^-3    -16 %
 6   0.78    0.80        3.5 × 10^-3   2.2 × 10^-3     39 %
 7   0.96    0.98        4.6 × 10^-3   2.2 × 10^-3     51 %
 8   1.10    1.12        5.4 × 10^-3   2.4 × 10^-3     56 %
 9   1.24    1.26        4.7 × 10^-3   2.4 × 10^-3     48 %
10   1.35    1.38        4.8 × 10^-3   2.7 × 10^-3     44 %
11   1.43    1.47        4.0 × 10^-3   2.3 × 10^-3     41 %
12   1.49    1.52        4.2 × 10^-3   2.2 × 10^-3     47 %
13   1.54    1.56        4.2 × 10^-3   3.0 × 10^-3     29 %
14   1.56    1.59        3.6 × 10^-3   3.1 × 10^-3     15 %

3 Improving the tail representation using quantile regression

3.1 Criteria to measure tail representation improvement

As we aim to model the tails, it is important to account for the relative errors in the extreme quantiles. Indeed, if we estimate that the blackout risk is 2 % while it is actually 1 %, we would plan much more emergency power supply than necessary. Therefore, we should neither employ the CRPS measure nor consider the PIT graph globally. If accounting for the relative errors were the only issue, then we could employ measures based on the likelihood ratio, e.g., the ignorance score (Roulston and Smith, 2002b) and its decompositions (Weijs et al., 2010; Tödter, 2011), or even statistical tests of goodness-of-fit (Jager and Wellner, 2005). However, in this study, we know the range of probabilities that are of interest: around the 1 % quantile and around the 99 % quantile. Therefore, to measure the dissimilarity between the PIT distribution and the uniform distribution, we use a χ² distance with the classes [0; 0.01], [0.01; 0.02] and [0.02; 0.05] for the lower tail and the symmetric classes for the upper tail. This strategy will be used to assess all of the improvements. More formally, we will compute

D = \sum_i \frac{\left[ \left( \hat{F}_{\mathrm{PIT}}(M_i) - \hat{F}_{\mathrm{PIT}}(m_i) \right) - (M_i - m_i) \right]^2}{M_i - m_i},

where

– m_i = (0; 0.01; 0.02; 0.95; 0.98; 0.99) is the sequence of the classes' lower bounds, and

– M_i = (0.01; 0.02; 0.05; 0.98; 0.99; 1) is the sequence of the classes' upper bounds.
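A direct transcription of D is sketched below (illustrative Python, not the authors' code; the class bounds are hard-coded from the definition above).

```python
import numpy as np

LOWER = np.array([0.00, 0.01, 0.02, 0.95, 0.98, 0.99])   # m_i
UPPER = np.array([0.01, 0.02, 0.05, 0.98, 0.99, 1.00])   # M_i

def tail_chi2_distance(pits):
    """Chi-squared-type distance D between the empirical PIT distribution and the uniform,
    restricted to the tail classes [m_i, M_i] defined above."""
    pits = np.asarray(pits)
    f_hat = lambda x: np.mean(pits <= x)                  # empirical CDF of the PITs
    observed = np.array([f_hat(M) - f_hat(m) for m, M in zip(LOWER, UPPER)])
    expected = UPPER - LOWER
    return np.sum((observed - expected) ** 2 / expected)
```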


Fig. 5. Comparison of probability integral transform deviations from the theoretical distribution for horizon 3. The point labeled "1", with coordinates (0.02, 0.095), shows that, for the raw ensemble prediction system ("raw EPS"), the realized temperature is lower than 2 % of the simulated values on 2 + 9.5 = 11.5 % (rather than 2 %) of the dates. For the best member method, denoted by "Usual BMM" (respectively the best member method with error modeling using quantile regression, denoted by "BMM with QR"), the point labeled "2" (respectively "3") shows that the realized temperature is lower than 2 % of the simulated values on 2.9 % (respectively 2.2 %) of the dates.


3.2 Grounds for using quantile regression

We are interested in the tail representation, so we have to focus on the error of the extreme members. The dispersion of the distribution of these errors is much larger (approximately 5 times) than that of the errors of the central members. So, a first possibility to improve the quality of the tail representation is to improve the estimation of the dispersion.

The error distribution of the extreme members is highly asymmetric, with some very large values. That is why using the mean and standard deviation to locate and scale a normal distribution has no theoretical ground. We therefore propose to model two different quantiles (Q_{1/3} and Q_{2/3}) of the error in a single stage, using quantile regression (Koenker and Bassett, 1978), with the same regressors as in the OLS regression. Then, we choose the mean and variance of the normal distribution such that its quantiles Q_{1/3} and Q_{2/3} are equal to the empirical quantiles. The key point is that this estimation does not rely on an assumed distribution.

3.3 Implementation

We modeled the quantiles Q(1/3) and Q(2/3) of the error of the best member \epsilon^*_t, using the variable edge together with the ensemble mean and spread. For both quantiles, the chosen model is

Q(i/3)_t = \alpha(edge_t) + \beta(edge_t) \cdot m_t + \gamma(edge_t) \cdot m_t^2 + \delta(edge_t) \cdot IQR_t,

where

– i = 1, 2 is the index of the tercile,

– m_t (respectively IQR_t) is the mean (respectively interquartile range) of the ensemble forecast, and

– edge_t is equal to the rank k^*_t if k^*_t ∈ {1, 2, 50, 51}, and to 0 elsewhere.

As we aim to simulate the errors using the normal distribution, the normal parameters are those required to obtain the estimated third and first terciles. We have

\mu_t = \frac{Q(2/3)_t + Q(1/3)_t}{2}, \qquad \sigma_t = \frac{Q(2/3)_t - Q(1/3)_t}{2 \times \phi^{-1}(2/3)},

where \phi is the cumulative normal distribution function and \phi^{-1}(2/3) = 0.431. The rest of the model remains the same as in Sect. 2.3.
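A sketch of this step using quantile regression from statsmodels is shown below. The variable names are ours, not the paper's; X_var stands for the same regressor matrix as in the OLS variance model sketched earlier (edge dummies and their interactions with m_t, m_t² and IQR_t), and the guard against crossed quantile fits is our addition.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def fit_terciles_to_normal(err, X_var):
    """Fit Q(1/3) and Q(2/3) of the best-member error by quantile regression on the same
    regressors as the OLS variance model, then convert the two terciles into the mean and
    standard deviation of the dressing normal distribution."""
    q1 = sm.QuantReg(err, X_var).fit(q=1 / 3).predict(X_var)
    q2 = sm.QuantReg(err, X_var).fit(q=2 / 3).predict(X_var)
    mu = (q2 + q1) / 2.0
    sigma = (q2 - q1) / (2.0 * norm.ppf(2 / 3))   # norm.ppf(2/3) is about 0.431
    return mu, np.maximum(sigma, 1e-6)            # guard against crossed quantile fits
```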


The results are as follows. First, we consider the third PIT plot of Fig. 5: this third plot is a bit closer to the horizontal axis than the second one, which represents the usual BMM. We also need to compare the BMM and the BMM with error modeling using quantile regression for all horizons, so we use the χ² distances, plotted in Fig. 6. The improvement is substantial, and there is a slight degradation of the forecast for only 3 of the horizons (out of 14), implying that the tails are represented much more accurately.

Fig. 6. Plot of χ² distances between theoretical and empirical distributions of the probability integral transform, with respect to horizon, for the best member method (BMM) and the BMM with error modeling using quantile regression (BMM with QR).

4 Conclusions

In this study, we used ECMWF temperature ensemble forecasts in order to improve the tails of temperature density forecasts, which are an important input in electrical blackout risk management. Any improvement enables a more accurate estimate of the required power supplies, resulting in substantial cost reductions.

Because the ECMWF temperature ensemble forecasts are under-dispersive, we modified the BMM improvement proposed by Fortin et al. (2006). All of the central ranks behave similarly, and we can therefore aggregate them, resulting in a far more parsimonious model. This strategy is most likely applicable to other under-dispersive ensemble forecasts as well.

Another improvement introduced in this paper is the use of quantile regression to determine the location and scale of the error distribution. This method is widely applicable whenever the location and scale of a distribution must be determined for simulation purposes.

Moderate biases remain in the tail representations of the temperature density forecasts. The following strategies may provide further improvement in future work.

– The distribution of the errors of the extreme members is highly asymmetric, with some very large values, which suggests using the extreme value distribution to model it.

– In our modeling, the probability that a member of a given rank is the best member does not depend on any exogenous variables. Testing and possibly rejecting this independence assumption may help to improve the tail estimation.

Acknowledgements. The authors are pleased to thank the reviewers for their comments and suggestions, which have been essential for improving the manuscript, its clarity and readability.

Edited by: A. Mugnai
Reviewed by: three anonymous referees

References

Bernardo, J.-M.: The concept of exchangeability and its applications, Far East J. Math. Sci., Special Volume, 111–121, 1996.

Buizza, R., Pellerin, G., Toth, Z., Zhu, Y., and Wei, M.: A Comparison of the ECMWF, MSC, and NCEP Global Ensemble Prediction Systems, Mon. Weather Rev., 133, 1076–1097, doi:10.1175/MWR2905.1, 2005.

Diebold, F. X., Gunther, T. A., and Tay, A. S.: Evaluating Density Forecasts with Applications to Financial Risk Management, Int. Economic Rev., 39, 863–883, 1998.


Dordonnat, V., Koopman, S.-J., Ooms, M., Dessertaine, A., and Collet, J.: An hourly periodic state space model for modelling French national electricity load, Int. J. Forecast., 24, 566–587, doi:10.1016/j.ijforecast.2008.08.010, 2008.

ECMWF: The Ensemble Prediction System, Tech. rep., ECMWF, 2002.

ECMWF: Part V: The Ensemble Prediction Systems, Tech. rep., ECMWF, 2006.

Fortin, V., Favre, A.-C., and Saïd, M.: Probabilistic forecasting from ensemble prediction systems: Improving upon the best-member method by using a different weight and dressing kernel for each member, Q. J. Roy. Meteorol. Soc., 132, 1349–1369, doi:10.1256/qj.05.167, 2006.

Gneiting, T., Raftery, A. E., Westveld III, A. H., and Goldman, T.: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation, Mon. Weather Rev., 133, 1098–1118, doi:10.1175/MWR2904.1, 2005.

Gneiting, T., Balabdaoui, F., and Raftery, A. E.: Probabilistic forecasts, calibration and sharpness, J. Roy. Stat. Soc. Ser. B, 69, 243–268, doi:10.1111/j.1467-9868.2007.00587.x, 2007.

Gogonel-Cucu, A., Collet, J., and Bar-Hen, A.: Implementation of two Statistic Methods of Ensemble Prediction Systems for Electric System Management, Case Studies in Business, Industry and Government Statistics (CSBIGS), accepted, 2011a.

Gogonel-Cucu, A., Collet, J., and Bar-Hen, A.: Implementation of Ensemble Prediction Systems Post-Processing Methods, for Electric System Management, in: 58th Congress of the International Statistical Institute, 2011b.


Hagedorn, R.: Post-Processing of EPS Forecasts, available at: http://www.ecmwf.int/newsevents/training/meteorological presentations/pdf/PR/Calibration.pdf (last access: 26 April 2013), 2010.

Hamill, T. M. and Thomas, M.: Interpretation of Rank Histograms for Verifying Ensemble Forecasts, Mon. Weather Rev., 129, 550–560, doi:10.1175/1520-0493(2001)1292.0.CO;2, 2001.

Hersbach, H.: Decomposition of the Continuous Ranked Probability Score for Ensemble Prediction Systems, Weather Forecast., 15, 559–570, doi:10.1175/1520-0434(2000)0152.0.CO;2, 2000.

Jager, L. and Wellner, J. A.: A new goodness-of-fit test: the reversed Berk-Jones statistic, Tech. Rep. TR443, Department of Statistics, University of Washington, available at: http://www.stat.washington.edu/www/research/reports/2004/tr443.pdf (last access: 26 April 2013), 2005.

Jolliffe, I. T. and Stephenson, D. B.: Forecast Verification: A Practitioner's Guide in Atmospheric Science, Wiley, 2012.

Koenker, R. and Bassett, G.: Regression Quantiles, Econometrica, 46, 33–50, 1978.

Krzysztofowicz, R.: Bayesian Processor of Output: a new technique for probabilistic weather forecasting, in: 17th Conference on Probability and Statistics in the Atmospheric Sciences, available at: https://ams.confex.com/ams/pdfpapers/69608.pdf (last access: 26 April 2013), 2004.

Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M.: Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Mon. Weather Rev., 133, 1155–1174, doi:10.1175/MWR2906.1, 2004.

Roulston, M. and Smith, L.: Combining dynamical and statistical ensembles, Tellus, 55A, 16–30, doi:10.1034/j.1600-0870.2003.201378.x, 2002a.


Roulston, M. and Smith, L.: Evaluating probabilistic forecasts using information theory, Mon. Weather Rev., 130, 1653–1660, doi:10.1175/1520-0493(2002)1302.0.CO;2, 2002b.

RTE: Mémento de la sûreté du système électrique, édition 2004, Tech. rep., Réseau de Transport d'Électricité, available at: http://www.rte-france.com/uploads/media/pdf zip/publications-annuelles/memento surete 2004 complet .pdf (last access: 26 April 2013), 2004.

SAS: The GLMSelect Procedure: Stepwise Selection (STEPWISE), available at: http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug glmselect a0000000241.htm (last access: 26 April 2013), 2006.

Tödter, J.: New Aspects of Information Theory in Probabilistic Forecast Verification, MS thesis, Institute of Theoretical Physics, Goethe-University Frankfurt, 138 pp., 2011.

Unger, D., van den Dool, H., O'Lenic, E., and Collins, D.: Ensemble regression, Mon. Weather Rev., 137, 2365–2379, doi:10.1175/2008MWR2605.1, 2009.

Wang, X. and Bishop, C.: Improvement of ensemble reliability with a new dressing kernel, Q. J. Roy. Meteorol. Soc., 131, 965–986, 2005.

Weijs, S., Van Nooijen, R., and Van De Giesen, N.: Kullback-Leibler divergence as a forecast skill score with classic reliability-resolution-uncertainty decomposition, Mon. Weather Rev., 138, 3387–3399, doi:10.1175/2010MWR3229.1, 2010.

Whitaker, J. and Loughe, A.: The Relationship between Ensemble Spread and Ensemble Mean Skill, Mon. Weather Rev., 126, 3292–3302, doi:10.1175/1520-0493(1998)1262.0.CO;2, 1998.

Wilks, D. S.: Statistical Methods in the Atmospheric Sciences, Academic Press, Elsevier, 2nd Edn., 2011.
