Short Term Electricity Load Forecasting on Varying Levels of ... - arXiv

1

Short Term Electricity Load Forecasting on Varying Levels of Aggregation

arXiv:1404.0058v2 [stat.AP] 7 Apr 2014

Raffi Sevlian, Ram Rajagopal

Abstract—We propose a simple empirical scaling law that describes load forecasting accuracy at different levels of aggregation. The model is justified based on a simple decomposition of individual consumption patterns. We show that for different forecasting methods and horizons, aggregating more customers improves the relative forecasting performance up to specific point. Beyond this point, no more improvement in relative performance can be obtained.

I. I NTRODUCTION To meet the challenges posed by significant increase in renewable energy, novel architectures for organizing the transmission and distribution of power have been proposed. The emergence of scalable and low cost communication as well as new distribution system technologies enable the organization of microgrids at various scales. Typical designs include local generation such as solar and wind as well as various forms of local storage and demand response. Such systems aim to utilize resources internal to the microgrid to minimize external power draws. Optimal operations of these microgrids typically require some form of forecasting of aggregated load demands. The focus of this paper is load forecasting on these varying scales of aggregation. The field of load forecasting is very mature with numerous methodologies having been proposed throughout the years. They have focused primarily on the level of large substations servicing tens of megawatts of load or an entire power system which has a load of tens of gigawatts. Recently, with advances in communication infrastructure for remote measurement and automated metering, there is an abundance of new metering data from homes and commercial buildings. This has led to an increase in forecasting research at these levels of aggregation. Typical home loads are 1 to 2 kWh, while commercial buildings can be 100 times that amount. The relative forecasting errors typically seen at the level of substations and power systems has been quite low (1% − 2%). Also, recent work showing forecasting performance at the individual level show much higher errors (up to 30%). Clearly, there is a discrepancy in forecasting performance at the level of individuals and of the entire power grid. This is the effect of load aggregation on forecasting loads. Our work aims to quantify the effect of aggregation by proposing R. Sevlian is with the Department of Electrical Engineering and the Stanford Sustainable Systems Lab, Department of Civil and Environmental Engineering, Stanford University, CA, 94305. Email: [email protected]. R. Rajagopal is with the Stanford Sustainable Systems Lab, Department of Civil and Environmental Engineering, Stanford University, CA, 94305. R. Rajagopal is supported by the Powell Foundation Fellowship. Email: [email protected].

a set of scaling laws relating kWh load level and forecasting error. Using data from hundreds of thousands of residential customers and commercial buildings, we construct datasets of varying aggregation levels and verify the proposed model. The laws are obtained by assuming a simple underlying consumption model for each individual consisting and then decomposing the aggregate error into bias and variance terms. We observe that the law holds for different methods and can be utilized to compare methods across datasets. The paper is organized as follows. Section II develops a model for aggregation and proposes a set of scaling laws for two common forecasting metrics. The scaling laws are then verified in Sections III and IV including a review of the relevant literature. An analytical model for the effect of aggregation is presented in Section V. Section VI concludes the paper and discusses some future work. II. M ODELING OF L OAD AGGREGATION A. Qualitative Understanding of Aggregation in Electricity Load 1 user

5 users

20 users

40 users

60 users

80 users

Fig. 1. Hourly electricity consumption for various aggregation levels. Consumption pattern of a single customer generally has little structure to be exploited. Aggregating more and more customers “smoothes” the signal so that it can be more predictable. Aggregation level of 20 or more residential customers shows a predictable pattern. Plots are not in the same scale.

Aggregation reduces the inherent variability in electricity consumption resulting in increasingly smooth load shapes. Figure 1 illustrates this effect where it is clear that the higher

2

aggregation levels are easier to predict. Yet, it is less clear how to quantify the improvements in forecasting. The main goal of this paper is to develop an appropriate scaling model for forecasting performance with respect to aggregation size. In particular, we identify the most appropriate way to measure aggregation and then propose a simple model to explain the scaling phenomena. We start by reviewing typical performance metrics used for quantifying forecasting performance. B. Forecast Accuracy Performance Metrics The two performance metrics most commonly used in forecasting literature are Coefficient of Variation (CV) and Mean Absolute Percentage Error (MAPE). Coefficient of variation measures the ratio of the prediction error standard deviation to the signal mean. Consider two time series x(t) and its forecast x ˆ(t) for t = 1, ..., T . The empirical coefficient of variation (CV) measures the difference between these time series and is computed as

CV(x, xˆ) = 100

q

1 T

ˆ(t))2 ΣTt=1 (x(t) − x 1 T

ΣTt=1 x(t)

(%).

(1)

Likewise the mean absolute percentage error (MAPE) is defined as T 100 X x(t) − x ˆ(t) M AP E(x, xˆ) = (2) (%). T i=1 x(t)

(a) MAPE is the average MAPE over all groups with mean consumption W . (b) The choice of using mean consumption as a proxy for group size is deliberate. An alternative choice would be the number of customers in the group. But we observe that in fact customers with higher absolute consumption tend to be more predictable. (c) α0 /W p measures improvement in relative error from aggregation. It captures the reduction in underlying variance due to aggregation. α1 is the irreducible relative error. (e) The exponent p measures the rate of reduction of relative variance due to aggregation. (f) The ideal aggregation condition occurs when α1 = 0 and p = 1. We can extend our proposed empirical model to CV calculation as well. The population average CV scales as a function of WA according to CV(W ) = E[CV(xA , x Â ) |WA = W ] r β0 + β1 (%). (6) = Wp Assuming p = 1, the proposed scaling law segments the forecasting problem into two regimes: 1) Scaling: When α0 /W p ≫ α1 , relative error improves considerably due to aggregation. Eq. (5) can be approximated as r α0 . MAPE(W ) = Wp

CV and MAPE are relative error metrics traditionally reported in the literature although MAPE is more commonly reported. It is assumed they allow comparison of performance in different datasets. This paper investigates whether this fact is true for the case of electricity consumption.

2) Saturation: When α0 /W ≪ α1 , there is no improvement in forecasting from aggregation and MAPE(W ) ≈ √ α1 . The remainder of the paper validates empirically and theoretically the application of these scaling laws to forecasting load consumption utilizing a variety of methods.

C. Forecasting Scaling Laws

III. E XPERIMENT S ETUP A. Description of Data This study utilizes anonymized electricity data for residential and small and medium business (SMB) customers of Pacific Gas & Electric (PG&E). The data spans a one year period from 08/01/2010 to 07/31/2011. The residential data captures hourly electricity consumption for 116,423 customers. The SMB data is comprised by 15-min electricity consumption measurements for 150,000 customers. The data for each customer is temporally aggregated to form hourly interval measurements. Both datasets cover more than 408 zip codes and a variety of climate zones and business types. Figures 2(a) and 2(c) display the week long consumption of an individual household and SMB, respectively. The mean consumption of the data is of importance for this work. Figures 2(b) and 2(c) show the population distributions of mean hourly consumption for households and SMBs. The hourly population mean consumption for residential customers is 1.05 kWh and for SMB customers is 8.94 kWh. Not only SMB consumption is on average nine times higher than residential, but also jt is much less variable throughout the week.

Consider a set of N customers with consumption given by a time-series xn (t). The mean consumption for each customer is T 1X xn (t). Wn = T t=1

We randomly select a subset A ⊆ {1, 2, . . . , N } of customers and form a group that consumes X xA (t) = xn (t), (3) n∈A

P with mean consumption WA = n∈A Wn . We build a predictor for the aggregate time-series xA (t) that outputs the predicted sequence x Â (t) and evaluate MAPE(xA , x Â ). We postulate that the population average MAPE scales as a function of WA according to MAPE(W ) = E[MAPE(xA , x Â ) |WA = W ] r α0 + α1 (%). = Wp Some key observations about this empirical law are:

(4) (5)

1500

Energy Consumption [kWh] 0 1 2 3

3

# Users 500 1000

so q = 0 and Q = 0. The resulting model for the time-series x(t) is x(t) =

k=1

(b)

0

200

# Users 600

1000

Energy Consumption [kWh] 0 6 13 22 31 40 49

(a)

M

T

W TH F ST

Day of Week

5 10 15 20 25 30 Mean User Consumption [kWh]

(c)

(d)

Fig. 2. (a) 7 days of consumption for single residential customer. (b) Empirical Histogram of yearly mean consumption of all residential customers. (c) 7 days of consumption for single SMB customer. (d) Empirical histogram of yearly mean consumption of all SMB customers.

B. Generating Aggregate Consumption Residential aggregate consumption time series were generated by forming groups of randomly selected customers. 56 group sizes were chosen ranging from a single to 100,000 customers. 50 random groups for each size were generated by uniformly selecting customers. The mean hourly consumption of these groups ranged from 1kWh to 100MWh. The largest mean hourly consumption for each size ranged from 3 kWh to 180 MWh. SMB aggregate consumption time series are generated in a similar way. Forty three group sizes were selected ranging from one to 50,000 customers. The hourly group mean consumption ranges between 10kWh to 400MWh. The group with largest consumption had a 670MWh average hourly load. C. Forecasting models The proposed scaling laws are studied using three commonly used methods for short term load forecasting: Seasonal Auto Regressive Moving Average, Support Vector Regression and Feed Forward Neural Networks. The models are learned adaptively, so they are estimated again for every out of sample forecast. 1) Seasonal Auto Regressive Moving Average (SARMA): SARMA [6] predicts the electricity consumption in the next time step as a linear function of prior consumption values and forecast errors. Seasonality is considered by including additional predictors at a fixed prior period. A model SARMA(p, q)×(P, Q)s has autoregressive (AR) order p and moving average (MA) order q. It uses a seasonal component with a cycle of s time steps, with AR order P and MA order Q. This work considers a restricted class with no MA component

(7)

IV. E XPERIMENTAL R ESULTS A. Empirical MAPE with Aggregation Level 100 %

0.5 1.5 2.5 3.5 Mean User Consumption [kWh]

φi x(t − sk) + ǫ(t).

10 %

ST

k=1

MAPE

W TH F

Day of Week

P X

It is usual to assume ǫ(t) ∼ N (0, σ 2 ) is an independent and identically distributed normal variable. Values of p ≥ 4 are not recommended because it can lead to overfitting [12]. The seasonality is set to s = 24 hours and AR order P = 1. The adaptive SARMA model relearns the parameters θ and φ but keeps the parameters p, P and s fixed. 2) Support Vector Regression (SVR): Define yk (t) to be a length k vector holding samples x(t) to x(t−k+1). SVR (e.g. [8]) builds a nonlinear map between x(t) and yk (t − 1) of the form x(t) = wT Φ(yk (t − 1)) + b + ǫ(t). The kernel function Φ is appropriately chosen. The observations at t − T, ..., t are utilized to determine the parameters w and b by computing a fit minimizing the magnitude of error ǫ(t). The prediction is the output of the fitted model to the input yk (t). 3) Feed Forward Neural Network (FFNN): FFNNs (e.g. [2]) provide a popular alternative to define the nonlinear map between x(t) and yk (t−1) used in the SVR model description. They are subset of artificial neural networks where neurons with a chosen activation function connect to each other in layers without feedback. The number of neurons, layers , choice of activation function and network parameters are learnt from the training set.

Error Fit Ideal

1%

T

θk x(t − k) +

0.1 %

0

M

p X

1k

10 k

100 k 1M Mean Load Wh

10 M

100 M

Fig. 3. Mean absolute percentage error (MAPE) for varying aggregates of residential load. SARMA model M1 is applied to data. MAPE against Mean Load for each experiment is shown in green markers. Best fit (solid line) has p = 0.88. Dashed line indicated error scaling with ideal aggregation effect (no irreducible error and p = 1). Critical load W ⋆ = 2250 kWh. Irreducible √ error is α1 = 1.52.

The first analysis investigates the performance of the SARMA(1, 0)×(1, 0)24 (model M1 ) in the one hour ahead

4

prediction task. Aggregate time series xA were generated according to Section III-B. For each time-series, the corresponding mean load WA , forecast x Â and performance MAPE(xA , xÂ ) were computed. The aggregation-error curve is the plot of the pairs (WA , MAPE(xA , x Â )) for all generated groups A as shown in Figure 3. The scaling law in Eq. (5) was fit to this data using least-squares. The fit (solid line) and the ideal aggregation (dashed line) scaling laws are displayed in Figure 3. Fit parameters are shown in Table II. Note p < 1 indicating the scaling rate is reduced from the ideal. The origin for this phenomena is explained in Section V-A.

M1 . The mean MAPE for an individual residential consumer is 29%. MAPE is highly variable at the individual level, with 25% quantile of 23% error and 75% quantile of 56%. This variability is explained by the variability in the average hourly loads for residential customers, from 1kWh to 4kWh. The fit scaling law (Table II) applied to these kWh group sizes predicts MAPE errors of 24% and 45%, respectively. The closeness of the prediction and the quantiles provides an initial validation of the choice of kWh average as the scaling parameter. D. Comparison of Different Models

B. Critical Load W ⋆ The scaling law can be decomposed into scaling and saturation regimes as shown in Sec. II-C. The transition point between regimes is defined as the load aggregation level W ⋆ where the regime approximations are equal. The critical load ⋆ WMAPE is the positive solution W to r √ α0 (8) = α1 . Wp

⋆ WCV is defined in a similar way. The critical load for model M1 is 2.2MWh (Table II). The scaling regime extends from 1kWh to 2.2MWh aggregate loads, and the saturation regime extends from 2.2MWh to 100MWh.

Means Fit

Model

Description

M1 M2 M3 M4 M5

SARMA(1, 0)×(1, 0)24 SARMA(2, 0)×(1, 0)24 SARMA(3, 0)×(1, 0)24 SVR - radial basis function FFNN - logistic activation function

TABLE II S CALING LAW FIT FOR MAPE

Model

p

√ α0

M1 M2 M3 M4 M5

0.88 0.83 0.81 1.02 0.94

45.44 43.66 42.13 56.76 45.33

√ α1 (95% CI) 1.522 0.895 0.818 4.153 2.445

(1.521 (0.894 (0.816 (4.151 (2.444

1.523) 0.896) 0.820) 4.154) 2.446)

⋆ WMAPE

2250 11697 16856 168 498

TABLE III S CALING LAW FIT FOR CV

0

0

20

MAPE 40 60

Group Mean [kWh] 200 400 600 800

80

100

1200

C. Linear Scale Observations

TABLE I M ODELS USED IN ANALYSIS

1

5 20

100 400 800 Group Size

(a)

2000

0

200

400 600 800 1000 Group Size

(b)

Fig. 4. (a) MAPE boxplot of randomly generated groups of N customers. (b) Group mean load vs number of customers.

Linear scale analysis investigates the forecasting performance scaling with respect to number of customers that form a group. Analyses with 2000 customers was presented in [14] and with 200 customers in [15]. However, analysis with such limited group sizes are unable to identify the various regimes present in the scaling law. More importantly, they also fail to identify the nature of the law that is driven by average kWh consumption rather then numbers. This observation is important when considering heterogeneous customers. If a 500kWh average consumer forms a group with ten 1kWh consumers, the MAPE is predicted at level 510kWh. If customers are somewhat homogeneous, linear scale analysis can provide some insight. Figure 4(a) plots empirical MAPE against number of customers in a group for model

Model

p

M1 M2 M3 M4 M5

0.84 0.78 0.82 0.72 0.92

√

β0

53.81 53.50 54.40 52.92 55.87

√ β1 (95% CI) 2.132 1.254 1.172 5.961 3.465

(2.130 (1.249 (1.171 (5.959 (3.463

2.133) 1.259) 1.174) 5.963) 3.467)

⋆ WCV

2179 15122 11615 430 421

In this section we validate that the scaling law holds for a variety of typically utilized models in short term load forecasting. The scaling law parameters also provide a way to compare the performance of these different models. Five models were selected for comparison (Table I) after testing a variety of different choices. The resulting scaling law fits for MAPE and CV are shown in Tables II and III. Figure 5 displays the resulting √ MAPE fits. The variation √ models of the scaling error α0 and β0 between different √ √ is small. The irreducible errors α1 and β1 are quite different between the models. Combining these two important

100 % 10 % 10 k

[7] [17][14] [6]

1%

100 k 1 M 10 M 100 M Mean Load Wh

1k

10 k

MAPE

[6]

10 % 10 k


(c)

[1]

[3] [11]

[13] [5]

1%

[2]

0.1 %

MAPE 1k

0.1 %

Error Fit Ideal

Error Fit 95 % Int.


(b)

1k

1%

MAPE

10 %

10 %

Error Fit Ideal

100 %

(a)

0.1 %

1k

1% 0.1 %

100 %

MAPE

100 %

Error Fit Ideal

100 %

0.1 %

1%

MAPE

10 %

5

Error Fit Ideal

1k

10 k


10 k

100 k

1M 10 M Mean Load Wh

100 M

1G

10 G

Fig. 6. Forecasting performance on varying levels of aggregation using seasonal model M3 , including the 95% confidence interval for the fit. Previous work on short term load forecasting is superimposed. Data suggests that previous work agrees with model suggested here.

(d)

Fig. 5. Approximate model fit to MAPE(W ). (a) Model M2 is a seasonal ARMA model: (1, 0)×(1, 0)24 . (b) Model M3 is a seasonal ARMA model: (2, 0)×(3, 0)24 . (c) M4 (SVR) and (d) M5 (FFNN). Note that for short term forecasting, non-linear methods might not always provide the most accurate methods. They are also more difficult to train.

observations leads to identifying the irreducible errors as a fundamental performance metric for model comparison. Utilizing this metric, the most complex adaptive seasonal linear model M3 is the best preforming. The different nonlinear models tested SVR and FFNN perform poorly, with irreducible MAPE errors of 4% and 2.4%, respectively. SVR in particular has been shown to sometimes outperform linear models [8] and sometimes perform poorly [16]. This might be explained by scaling law effects. Nonlinear methods considered in our study performed worse than adaptive linear models for short term load forecasting. Furthermore, they are in general more difficult to train and can be unstable under adaptive learning. The critical load value W ∗ can be compared between different models. It can be seen to depend almost exclusively on the irreducible error since the reducible errors are close to each other. This observation leads to the conclusion that forecasters with low irreducible error benefit more from aggregation. Model M3 has critical load 16MWh and its flat region in the scaling law is far to the right when compared to Model M5 (Figure 5). E. Short Term Load Forecasting Literature Review The literature on load forecasting is very extensive. We focus on papers related to very short term load forecasting, and do not attempt to provide an extensive review of other tasks. The main goal is to compare the different papers according to the approximate scale of the forecast. Recent work shows a fundamental limitation to the predictability of individual customers. [8] performs one hour

ahead forecasting based on hourly data utilizing machine learning. The methods achieve MAPE of 1.61% to 13.41% for a 700kWh commercial building and between 15% to 30% for three homes with mean consumption close to 1.5kWh. In [17], machine learning methods are compared on data from three homes with mean consumption 1 to 2 kWh achieving MAPE close to 25%. In [16], various methods are utilized to forecast peak demand for individual homes. The authors conclude that seasonal autoregressive models achieve the best performance, with MAPE of 30%. In [9], a Kalman filter based forecaster is applied to single home data with mean consumption of 0.8kWh to achieve MAPE of 30%. Very low errors are reported at high aggregation levels. In [1] the authors use an artificial neural network to forecast a mean load of 2.5 GWh, with MAPE ranging from 1.73% to 3.02%. In [5] the authors apply wavelet multi-scale decomposition based autoregressive approaches. They report MAPE values of 0.7% to 3.5% depending on the method used on a dataset with mean load 9GWh. In [13] artificial neural networks are applied to data with a mean consumption of 800 MWh achieving MAPE errors from 1.11% to 1.63%. Similarly, [2] obtains MAPE in the range 0.81% to 1.21% utilizing neural networks on a 8GWh load data. In [7] a novel ANN architecture is applied to two datasets with peak load 4.4GWh and report MAPE between 0.8% and 1.5%. Finally, [11] applies artificial neural networks to attain an error rate of 1.7% for a load of 7 GWh. The MAPE for different studies can be compared utilizing the MAPE and the approximate aggregate size. Figure 6 displays the results and compares them to the scaling law obtained for model M3 . In the lower aggregation range, the results fall within the 95% confidence interval and at the GWh aggregation level the accuracy is related to that predicted by the model.

6

F. Multiple Hour Ahead Forecasting

loads 100kWh to 1MWh) need to consider the improvements due to aggregation when compared to each other.

100 %

100 %

H. Robustness of the Scaling Law

1 Hour Ahead 2 Hour Ahead 3 Hour Ahead 4 Hour Ahead Ideal

1%

10 %

100 %

MAPE

MAPE

10 %

RES SMB

MIN 10 % 25 % 50 % 75 % 90 % MAX Fit

1% 1k

10 k

100 k

1M

Mean Load Wh

10 M 100 M

(a)

1k

100 k

10 M 400 M

Mean Load Wh

(b)

Fig. 7. (a) MAPE for multiple hours ahead using model M3 (b) Estimated model using M3 for both residential and SMB time series.

MAPE 10 %

0.1 %

Horizon (hours ahead) 1 2 3 4

MAPE √ √ α0 α1

√

β0

√ β1

73.7 90.2 96.8 99.5

57.2 76.1 86.4 92.6

1.06 7.79 19.93 34.26

1.45 7.30 9.89 11.69

CV

This section tests the scaling law with respect to forecasting multiple hours ahead to verify its validity for different horizons. The model M3 is utilized for this purpose. Figure 7(a) and Table IV displays the results. It is clear that the irreducible error increases with forecasting horizon, reducing the benefit from aggregation. It indicates that for more complex tasks such as day ahead forecasting, the forecasters need to be designed carefully to achieve low irreducible error. G. SMB Data Analysis The analysis is extended to the SMB dataset by computing the MAPE scaling law for model M2 (best performing). The √ √ obtained parameters are α0 = 46.67, α1 = 0.92 and p = 0.82. The scaling law can be compared to the residential dataset scaling law for the same model. The parameters for model M3 shown in Table II are very close to the obtained for SMB data. Figure 7(b) compares the obtained laws and the SMB data scaling. Notice the scaling law is quite consistent despite the mean loads of residential and SMB data differing by an order of magnitude. This observation validates the choice of kWh average to drive the scaling rather than the number of customers. An intuitive interpretation of the scaling law is that every building (residential or SMB) consumes electricity as a series of tasks of similar average sizes. Larger buildings can be thought of as an aggregation of smaller buildings, so the number of tasks scale linearly and so does average consumption. This leads to kWh providing the proper scaling for forecasting. Finally, we note that for the SMB data the critical load is close to 10MWh. Forecasting studies in commercial buildings (mean

1%

TABLE IV S CALING LAW FIT FOR MULTIPLE HORIZONS

1k

10 k

100 k 1M Mean Load Wh

10 M

100 M

Fig. 8. Comparing quantiles of forecast errors for each aggregation group. The scaling law is robust to the mechanism utilized to generate groups.

Its important to verify whether the previous results were obtained due to the method of generating aggregates, that is following a random assignment. In this section we compare the aggregation error curve for the aggregates with best performance and worst performance at each group level. The choice is determined by setting a quantile for MAPE error at each group size for residential data. Figure 8 displays the result. The scaling law is observed at the different performance quantiles, thus different aggregation mechanisms will obtain results similar to reported in this paper. V. A NALYTIC M ODEL OF AGGREGATION In this section we propose a theoretical analysis to validate the empirically tested scaling law and observed aggregationerror curves. We present a set of sufficient conditions for the underlying consumption stochastic process that lead to the proposed scaling. The CV error metric is chosen due to its mathematical properties that leads to a simple analysis. The analysis is developed in multiple steps: a general individual behavioral model, a statistical analysis of aggregate forecasting, and specific statistical assumptions on the components of the behavioral model. A. Individual Consumption Behavior Electricity consumption for individual n for a specific time is given by xn (t) and the profile spans a 24 hour period. The consumption is decomposed as xn (t) = pn (t) + en (t). The individual chooses a daily profile for each day pn ∈ RT (T = 24) and deviates from it according to en . Therefore a dataset spanning many days is composed of two different stochastic processes: an individual dependent unique shape

7

generation process and a random deviation stochastic process en . We detail next the forecasting process applied to aggregates and relate the error to pn and en .

the error variance of the following type ! N X VAR en (t) = κN 2 + σ ′2 N, n=1

B. Aggregate Forecasting Error Decomposition aggregation of N consumers is given by xN = PAn N x n=1 n . Assume a forecaster that produces the corresponding forecast xˆN . The forecaster is a function of the previous data available at a particular time. The forecast residual is rN = xN − x ˆN . The following assumptions hold: A1 The minimum support of the consumption signal mean grows P PT linearly with the number of customers: 1 n∈A T t xn (t) ≥ N µ. A2 The error residuals from the forecaster are uncorrelated in time, i.e. the covariance COV (rN (t), rN (t + τ )) = 0 for τ 6= 0.

Denote by PN ∈ RN ×T the matrix of the profile patterns pn for all N customers. The mean squared error (MSE) for the forecaster can be decomposed by conditioning on the (random) profile matrix using the tower property:

where the parameters κ and σ ′ depend on the specific error structure. We identify these parameters for two cases. Consider a finite correlation structure where en correlates to a constant fixed P number K of other individual errors. In N this case, VAR = (σ 2 + Kρ)N = N σ ′2 . In n=1 en (t) particular this captures the case where errors are uncorrelated (K = 0). Assume there is a correlation structure where any two individuals can have a positive covariance with some probability. That is COV(em (t), en (t)) = ρσ 2 with probability γ for any two individuals. Consider the indicator random variable Im,n which is equal to 1 with probability γ or zero otherwise. Using the law of total variance: ! " !# N N X X VAR en (t) = E VAR en (t)|I n=1

E[MSE(xN , x ˆN )] = E[E[MSE(xN , x ˆN )|PN ]]. The conditional MSE on profiles (E[MSE(xN , x ˆN )|PN ]) can be further decomposed according to T 1 X 2 E rN (t)|P T t=1   !2 T N N X 1 X  X = E pn (t) − x ˆN + en (t) |PN  T t=1 n=1 n=1   !2 N T X X 1 pn (t) − x ˆN (t) |PN  + = E T t=1 n=1 ! T N X 1 X 2 + E en (t) T t=1 n=1 !2 T N 1 X 1 X x ˆN (t) 2 =N E |PN pn (t) − T t=1 N n=1 N | {z }

E[MSE(xN , x ˆN , )|PN ] =

Population Bias δ(ˆ xN /N,PN

)2

! T N X 1 X VAR en (t) , + T t=1 n=1 {z } | Additive Error

where VAR denotes the variance function. Specific (general) assumptions on the forecaster and the random error process rn lead to estimates of the scaling of these quantities as explained in the next subsections.

n=1 = E N σ 2 + ρσ 2 #{m,n pairs equal to one} γρσ 2 γρσ 2 2 2 N +N σ − = 2 2 = κN 2 + σ ′2 N.

D. Population Bias and Forecasters The quantity T 1X δ(ˆ xN /N, PN ) = T t=1 2

N 1 X xˆN (t) pn (t) − N n=1 N

!2

represents how well any function can match the mean profile of a the population. Notice that this depends on the set of profiles PN and the estimating function. If we an unbiased estimate of the population generating process. That is xˆNN = p¯, the mean profile of all customers. Under such estimator: 

T 1 X E[N 2 δ(¯ p, P )2 ] = N 2 E  T t

VAR(P ) N = N VAR(P ).

= N2

!2  N X 1 pn (t) − p¯(t)  N n=1

If on the other hand, a forecaster has non-zero expected population bias as: x1 ) . E[N 2 δ(ˆ x, P )2 ] = N 2 δ¯2 (ˆ

C. Additive Error and Properties of en We consider different correlation structures on the random error en that are commonly considered in practice. The main observation is that the various structures generate a scaling of

E. CV Scaling Law We can combine the two terms which results in the following:

8

E[MSE(xN , x ˆN )] = E [E [MSE(xN , x ˆN )|PN ]] 2 ¯ xN ) + κ) N 2 + σ ′2 N = (δ(ˆ | {z } δ2

The coefficient of variation (CV) can be now bounded as a function of MSE. First using Jensen’s inequality: p ˆN )2 ]. E[CV (xN , xˆN )] ≤ E[CV (xN , x Finally combining the results:

E[CV (xN , xˆN )] ≤ E[CV2 (xN , x ˆN )]  

1/2

1/2

ˆ1 )    M SE(xN , x = E  P P 2  T N 1 x (t) n t n T 1/2 E[M SE(xN , xˆ1 )] ≤ µ2 N 2 1/2 2 2 N δ + N σ2 = µ2 N 2 2 1/2 δ σ2 = + µ2 N µ2

Therefore we have the upper bounding envelope function: s σ2 1 δ2 + 2 E[CV(xN , x ˆN )] ≤ 2 µ µ N Using assumption A1, the relationship can be put as a function of mean load yielding: r α1 E[CV(xN , x ˆN )] ≤ α0 + W The result can also be understood according to the following intuitive analysis. Coefficient of variation is approximately √ residual variance/signal mean. If the residual mean grows as κN 2 + σ 2 N , the quadratic component of the growing MSE will counterbalance the growing signal mean. In particular, the gain in aggregation is due to reduction of the effect due to the variance of en , but its limited by bias and variance terms that grow quadratically with aggregation numbers. VI. C ONCLUSION This paper introduces the idea of the effect of aggregation on load forecasting. We show that forecasting accuracy, as measured in relative error (MAPE/CV) improve with larger mean load until a critical load. We verify this model with empirical experiments. We also provide sufficient conditions leading to the observed aggregation-error curves introduced in the paper. Currently, various papers focus on new model formulations to describe individual electricity consumers (e.g. [4], [3], [10]). These models can be utilized to justify a detailed understanding of how aggregate consumption patterns are formed and verified on higher resolution data. Moreover, novel ideas can be investigated for aggregate forecasting based on models induced by aggregating this individual consumption models.

The aggregation phenomena is also likely to be observed in other types of forecasting processes, such as for example day ahead load forecasting, wind forecasting and electric vehicle availability. Determining the scaling parameters for these problems is an important task as it can lead to new concepts on the limitations of forecasting big and small aggregates. R EFERENCES [1] G.A. Adepoju, S.O.A. Ogunjuyigbe, and K.O. Alawode. Application of neural network to load forecasting in nigerian electrical power system. The Pacific Journal of Science and Technology, 8(1):68–72, 2007. [2] A.J. Al-Shareef, E.A. Mohamed, and E. Al-Judaibi. One hour ahead load forecasting using artificial neural network for the western area of saudi arabia. International Journal of Electrical Systems Science and Engineering, 1(1):35–40, 2008. [3] A. Albert and R. Rajagopal. Thermal profiling of residential energy consumption using smart meter data. IEEE Transactions on Power Systems, Submitted for review, 2012. [4] O. Ardakanian, S. Keshav, and C. Rosenberg. Markovian models for home electricity consumption. In Proc. ACM SIGCOMM Green Networking Workshop, pages 31–36, 2011. [5] D. Benaouda, F. Murtagh, J.L. Starck, and O. Renaud. Waveletbased nonlinear multiscale decomposition model for electricity load forecasting. Neurocomputing, 70(1):139–154, 2006. [6] G. Box, G. M. Jenkins, and G. C. Reinsel. Time series analysis: forecasting and control. Wiley Publisher, 2013. [7] I. Drezga and S. Rahman. Short-term load forecasting with local ann predictors. IEEE Transactions on Power Systems, 14(3):844–850, 1999. [8] R. E. Edwards, J. New, and L. E. Parker. Predicting future hourly residential electrical consumption: A machine learning case study. Energy and Buildings, 49:591–603, 2012. [9] M. Ghofrani, M Hassanzadeh, M. Etezadi-Amoli, and MS. Fadali. Smart meter based short-term load forecasting for residential customers. In North American Power Symposium (NAPS), 2011, pages 1–5. IEEE, 2011. [10] Z. J. Kolter and T. Jaakkola. Approximate inference in additive factorial hmms with application to energy disaggregation. In International Conference on Artificial Intelligence and Statistics, pages 1472–1482, 2012. [11] K.Y. Lee, Y.T. Cha, and J.H. Park. Short-term load forecasting using an artificial neural network. IEEE Transactions on Power Systems, 7(1):124–132, 1992. [12] A. Pankratz. Forecasting with univariate Box-Jenkins models: Concepts and cases, volume 224. Wiley Publishing, 2009. [13] T. Senjyu, H. Takara, K. Uezato, and T. Funabashi. One-hour-ahead load forecasting using neural network. IEEE Transactions on Power Systems, 17(1):113–118, 2002. [14] R. Sevlian and R. Rajagopal. Value of aggregation in smart grids. In Smart Grid Communications (SmartGridComm), 2013 IEEE International Conference on, pages 714–719, 2013. [15] P.G. Da Silva, D. Ilic, and S. Karnouskos. The impact of smart grid prosumer grouping on forecasting accuracy and its benefits for local electricity market trading. IEEE Transactions on Smart Grid, 5(1):402– 410, 2014. [16] R. P. Singh, P. X. Gao, and D. J. Lizotte. On hourly home peak load prediction. In IEEE International Conference on Smart Grid Communications, 2013. [17] H. Ziekow, C. Goebel, J. Struker, and H. Jacobsen. The potential of smart home sensors in forecasting household electricity demand. In IEEE International Conference on Smart Grid Communications, 2013.