Applied Energy 87 (2010) 925–933
A data-driven approach for steam load prediction in buildings

Andrew Kusiak *, Mingyang Li, Zijun Zhang

Department of Mechanical and Industrial Engineering, 3131 Seamans Center, The University of Iowa, Iowa City, IA 52242-1527, USA
Article history: Received 18 June 2009; Received in revised form 29 August 2009; Accepted 3 September 2009; Available online 30 September 2009

Keywords: Data mining; Building load estimation; Steam load prediction; Neural network ensemble; Energy forecasting; Monte Carlo simulation; Parameter selection
Abstract

Predicting building energy load is important in energy management. This load is often the result of steam-based heating and cooling of buildings. In this paper, a data-driven approach to the development of a daily steam load model is presented. Data-mining algorithms are used to select significant parameters for model development. A neural network (NN) ensemble with five multi-layer perceptrons (MLPs) performed best among all data-mining algorithms tested and was therefore selected to develop the predictive model. To meet the constraints of existing energy management applications, Monte Carlo simulation is used to investigate how uncertainty in the weather forecast data propagates through the model. Based on the formulated model and weather forecast data, future steam consumption is estimated. The latter allows optimal decisions to be made in fuel purchasing, steam boiler scheduling, and building energy management.

© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Predicting the load of heating, ventilating and air-conditioning (HVAC) systems is important for energy management, especially during peak demand hours [1]. Prior research covers both long- and short-term prediction of heating and cooling loads. The approaches discussed in the literature include exponential smoothing [2], multiple regression [3], Kalman filters [4], and state estimation [5]. The autoregressive integrated moving average (ARIMA) model [6,7] is an example of a statistical approach applied to predictions several hours ahead. Time-series models capture the relationship between energy usage and time given time-series data. However, statistical time-series models may become numerically unstable and inaccurate if highly correlated factors such as weather-related variables are ignored [8]. Traditional models usually reflect stationary linear relationships between the load and weather variables; the nonlinearity and complexity of the weather–load relationship make them impractical, especially for long-term forecasting [9]. In recent years, considerable attention has been given to data-driven methods, e.g., neural networks (NNs) [9–11] and support vector machines (SVMs) [12]. Unlike energy analysis based on analytical models [13–15] and industrial methods [16,17], data-mining algorithms offer powerful tools for the discovery of models from large volumes of data. Another advantage of data-derived models is that they can be easily updated with new data. This is especially important as the relationship
between the weather and the load is nonstationary and nonlinear. Elkateb et al. [18] applied a fuzzy neural network to medium-term load forecasting and enhanced prediction performance by introducing a time-index feature. Hou et al. [19] combined rough set theory and support vector machines to predict the cooling load; the rough set approach was applied to find the parameters relevant to the load. Based on building cooling loads calculated by software, Li et al. [20] compared different supervised learning algorithms for load prediction. Yang et al. [21] evaluated the performance of an adaptive NN in modeling unexpected pattern changes in incoming data and presented a real-time approach to on-line building energy prediction. In this paper, the steam load is used to represent the heating and cooling load of buildings, as most of the steam consumption is absorbed by these two loads. Data-mining algorithms are used to develop a nonlinear mapping between the steam load and the outside weather data. Computational experiments determined that a multi-layer perceptron (MLP) ensemble outperformed the other algorithms, and it was therefore selected for the development of a daily steam load predictive model. The future steam load is estimated by feeding the predictive model with weather forecast data. The steam prediction model is of interest to building energy management, fuel management, and boiler operations and maintenance scheduling.

2. Data description and methodology for steam load prediction

This research is based on data provided by the University of Iowa (UI) Energy Management Department. The UI Power Plant
produces steam consumed by over 100 buildings, including the University Hospital. In the cold season, steam provides heat to the buildings; in the cooling season, it is predominantly used to run chillers. Weather patterns impact the total steam consumption. The weather and steam load data are stored in a data historian (PI system). A predictive model is built from the historical steam consumption and weather data collected from 2004 to 2007. Once weather forecast data becomes available, the steam load can be computed with the predictive model. The diagram in Fig. 1 illustrates the model extraction and steam load estimation process. Though the historical data has been collected at a higher frequency (here 1 min), the weather forecast frequency is a limiting factor, and therefore daily intervals are considered. However, once validated for daily loads, the methodology proposed in this paper can be applied to estimate loads at any frequency supported by the data, e.g., 1 min, 10 min, or 1 h. A basic description of the data used in this research is provided in Table 1. The data in Table 1 has been preprocessed: missing and abnormal (e.g., outside of the physical range) values have been removed.

3. Case study

In this section, the parameters, training data, and algorithm are selected for the development of a steam load model.

3.1. Parameter selection and dimensionality reduction

The historical steam load and outside weather data stored in the PI system at 1-min time intervals are used. The dataset of 1000
[Fig. 1. Modeling process: outside weather patterns and the actual steam load are input to a supervised learning algorithm, producing a steam load predictive model; with fixed HVAC operations, weather forecasting data is fed to the model to obtain the estimated steam load.]
data points contains values of 13 parameters. Six of them are related to the power plant sideline loads. The daily total steam load is calculated by summing up these values. The remaining parameters include outside air temperature, outside air humidity, barometric pressure, wind speed, rain gauge, solar radiation, and wind position. Unfortunately, for some parameters most of the data was missing. Only two parameters, air temperature and humidity, were somewhat complete. To represent the daily weather patterns, four attributes are computed for each parameter: the mean, standard deviation, maximum, and minimum value. After data transformation, nine parameters are derived from the original data set, as illustrated in Table 2. Among the eight input parameters (the last eight rows in Table 2), some may be redundant or even irrelevant to the steam load prediction, and therefore parameter selection is needed. Eliminating parameters that are less important may improve the comprehensibility, scalability, and possibly, accuracy of the resulting models [22]. The correlation data shown in Table 3 demonstrates the strength of linear relationships among different parameters. Assuming each parameter is a random variable, correlation quantifies the strength and direction of the linear relationship among random variables [23]. As shown in Table 3, when the correlation threshold is set at 0.2, four variables, namely Temp_max, Temp_mean, Humidity_std and Humidity_min, correlate to the output Total_day. Among these four variables, Temp_mean and Temp_max have a strong linear relationship. The Humidity_std and Humidity_min are also strongly correlated. Therefore, the input dimension can be reduced to two, one reflecting the outside air temperature and the other humidity. Correlation indicates a degree of linear relationship among variables; however, the total dependence structure cannot be fully conveyed by this statistical measure. 
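The correlation-based screening described above can be sketched as follows. This is an illustrative Python sketch on synthetic data: the variable names and the 0.2 threshold follow Section 3.1, but the data and the 0.9 redundancy cutoff are assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily records standing in for the transformed parameters of
# Table 2 (illustrative only; not the actual UI Power Plant data).
n = 365
temp_mean = rng.normal(50.0, 20.0, n)
humidity_min = rng.uniform(20.0, 90.0, n)
temp_max = temp_mean + rng.normal(10.0, 3.0, n)   # strongly tied to temp_mean
total_day = (6000.0 - 40.0 * temp_mean + 15.0 * humidity_min
             + rng.normal(0.0, 200.0, n))

inputs = {"Temp_mean": temp_mean, "Temp_max": temp_max,
          "Humidity_min": humidity_min}

# Keep inputs whose |correlation| with the output exceeds the 0.2 threshold.
relevant = [name for name, x in inputs.items()
            if abs(np.corrcoef(x, total_day)[0, 1]) > 0.2]

# Among the relevant inputs, drop one variable of each strongly correlated
# pair (the |r| > 0.9 cutoff is assumed), mirroring the reduction of
# Temp_max/Temp_mean and Humidity_std/Humidity_min to two inputs.
selected = []
for name in relevant:
    if all(abs(np.corrcoef(inputs[name], inputs[kept])[0, 1]) <= 0.9
           for kept in selected):
        selected.append(name)

print(selected)
```

On this synthetic data, one temperature variable and one humidity variable survive, matching the two-input reduction of the paper.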
Data mining offers ways to select parameters and overcome the ''curse of dimensionality'' phenomenon [24]. High-dimensional data may be redundant, and some of the data may not be useful, thus increasing the complexity of data mining. Dimensionality reduction (parameter selection) discards unrelated and redundant data. In this paper, a boosting tree algorithm is used, as it shares the advantages of decision tree induction and tends to be robust in the removal of irrelevant parameters [25,26]. In the boosting method, a sequence of binary trees is built. Each tree focuses on learning the instances mispredicted by the previous trees, based on the prediction error. All trees are integrated, with different weights, in a single model. In the boosting tree algorithm, a split at every node of any regression tree is based on certain criteria, e.g., minimization of the total regression error, as used in this paper. In the process of generating successive trees, the statistical importance of each variable at each split of every tree is accumulated and normalized. Predictors with a higher importance rank contribute more to the predicted output parameter. Table 4 lists the predictor importance produced by the boosting tree algorithm. As shown in Table 4, if the threshold value for the rank is set at 40, the five variables above this value are considered important. Based on the correlation analysis and the results produced by the boosting tree algorithm (see Tables 3 and 4), two variables, Temp_mean and Humidity_min, are selected as inputs to the steam load prediction model.

Table 1
Data set description.

Data set | Time period | Description
Training data set | 1/1/2004–12/31/2005 | 722 observations; used for parameter selection, algorithm selection, and data split
Validation data set | 1/1/2006–12/31/2006 | 357 observations; used for parameter selection, validation of the algorithm, and data split
Test data set | 1/1/2007–12/31/2007 | 364 observations; used to test models

Table 2
Descriptions of transformed parameters.

Transformed parameter | Description | Unit
Total_day | Total steam load per day | klbs
Temp_mean | Mean value of the outside air temperature per day | °F
Temp_std | Standard deviation of the outside air temperature per day | °F
Temp_max | Maximum value of the outside air temperature per day | °F
Temp_min | Minimum value of the outside air temperature per day | °F
Humidity_mean | Mean value of the outside air humidity per day | RH
Humidity_std | Standard deviation of the outside air humidity per day | RH
Humidity_max | Maximum value of the outside air humidity per day | RH
Humidity_min | Minimum value of the outside air humidity per day | RH

Table 3
Correlation coefficient values between different parameters.

Parameter pair | Correlation coefficient
Temp_mean, Total_day | 0.3906
Temp_std, Total_day | 0.1789
Temp_max, Total_day | 0.3928
Temp_min, Total_day | 0.1458
Humidity_mean, Total_day | 0.162
Humidity_std, Total_day | 0.2228
Humidity_max, Total_day | 0.0034
Humidity_min, Total_day | 0.2067
Temp_std, Temp_mean | 0.2772
Temp_max, Temp_mean | 0.9188
Temp_min, Temp_mean | 0.4702
Humidity_mean, Temp_mean | 0.1008
Humidity_std, Temp_mean | 0.2821
Humidity_max, Temp_mean | 0.0931
Humidity_min, Temp_mean | 0.2097
Temp_max, Temp_std | 0.5584
Temp_min, Temp_std | 0.5739
Humidity_mean, Temp_std | 0.439
Humidity_std, Temp_std | 0.4232
Humidity_max, Temp_std | 0.222
Humidity_min, Temp_std | 0.4808
Temp_min, Temp_max | 0.2409
Humidity_mean, Temp_max | 0.1988
Humidity_std, Temp_max | 0.3696
Humidity_max, Temp_max | 0.0401
Humidity_min, Temp_max | 0.3149
Humidity_mean, Temp_min | 0.315
Humidity_std, Temp_min | 0.1433
Humidity_max, Temp_min | 0.2977
Humidity_min, Temp_min | 0.2627
Humidity_std, Humidity_mean | 0.5734
Humidity_max, Humidity_mean | 0.7742
Humidity_min, Humidity_mean | 0.9187
Humidity_max, Humidity_std | 0.0072
Humidity_min, Humidity_std | 0.8027
Humidity_min, Humidity_max | 0.554

Table 4
Predictor rank and importance produced by the boosting tree algorithm.

Predictor | Variable rank | Importance
Temp_mean | 100 | 1.000000
Temp_max | 85 | 0.852059
Temp_min | 68 | 0.678650
Humidity_mean | 56 | 0.560133
Humidity_min | 48 | 0.481238
Humidity_max | 39 | 0.386233
Temp_std | 37 | 0.372589
Humidity_std | 35 | 0.347607

3.2. Algorithm selection

After parameter transformation and selection, the steam prediction model is expressed in (1).

y_steam = f(x_Temp_mean, x_Humidity_min)    (1)

To extract the mapping among these variables, several data-mining algorithms, namely CART, CHAID, exhaustive CHAID, boosting tree, MARSplines, random forest, SVM, MLP, MLP ensemble, and k-NN, are used. The implementations included in the data-mining software Statistica have been used. CART (classification and regression tree) is a common method for building statistical models in a tree-building fashion. Developed by Breiman et al. [27], CART constructs binary trees for both classification and
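The importance ranking produced by the boosting tree can be illustrated with scikit-learn's gradient boosting as a stand-in for the Statistica implementation (an assumption; the two implementations differ in detail). The data below is synthetic, and the attribute names are reused from Table 2 purely for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# Synthetic stand-ins for three candidate weather attributes
# (illustrative only; not the actual data set).
n = 700
X = rng.normal(size=(n, 3))
# The output depends strongly on column 0, weakly on column 2,
# and not at all on column 1.
y = -3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)

# Learning rate 0.1, 200 additive trees, and a 0.5 subsample per tree
# match the boosting-tree settings reported in Section 3.2.
model = GradientBoostingRegressor(learning_rate=0.1, n_estimators=200,
                                  subsample=0.5, random_state=1)
model.fit(X, y)

# Split-based importance per predictor, normalized to sum to 1; inputs
# whose rank falls below a chosen threshold would be discarded.
names = ["Temp_mean", "Humidity_max", "Humidity_min"]
for name, imp in sorted(zip(names, model.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
```

The dominant synthetic driver receives almost all of the importance mass, mirroring how Temp_mean tops the ranking in Table 4.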
regression. It predicts continuous as well as categorical variables, with minimization of the squared prediction error as the split criterion. Chi-squared automatic interaction detection (CHAID) [28] and exhaustive CHAID [29] are decision-tree algorithms allowing multiple splits of a node. Exhaustive CHAID is derived from CHAID and involves more thorough merging than standard CHAID. Boosting tree [25,26] is a data-mining algorithm applied to regression and classification. Multivariate adaptive regression splines (MARSplines) [30] is a nonparametric procedure for solving regression-type problems; it predicts continuous values based on a set of predictors. Random forest is a data-mining method for classification and regression introduced by Breiman and Cutler [31]. Unlike a standard classification tree, which uses the best split among all variables at each node, the random-forest algorithm uses the best split among a subset of randomly selected predictors at that node. Support vector machine (SVM) [26] is a supervised learning method based on kernel functions used for classification and function approximation. A kernel function transforms the original parameter space into a high-dimensional space where a separating hyperplane with the maximum margin is constructed. Multi-layer perceptron (MLP) [32,33] is a commonly used feed-forward neural network with numerous units organized into multiple layers. By adaptively adjusting the weights among units under supervised learning, the MLP identifies and learns patterns from input data sets and the corresponding target values. Due to the high variance of individual MLPs, the MLP ensemble method [24] combines multiple models to achieve better prediction accuracy than any individual MLP could on its own.
The k-nearest neighbor (k-NN) algorithm is a simple machine-learning method [24] based on the concept that objects similar to each other are likely to belong to the same category; it is used for the prediction of continuous and categorical variables. As shown in Table 1, the data set of 2004–2005 is used to build a model, while the 2006 data is used to validate it. The following four metrics, defined in (2)–(7), are used to measure the prediction accuracy of the model: the mean absolute error (MAE), standard deviation of the absolute error (Std_AE), mean absolute percentage error (MAPE), and standard deviation of the absolute percentage error (Std_APE) [34]:
AE = |ŷ − y|    (2)

MAE = (1/N) Σ_{i=1}^{N} AE(i)    (3)

APE = |ŷ − y| / y    (4)

MAPE = (1/N) Σ_{i=1}^{N} APE(i)    (5)

Std_AE = sqrt( Σ_{i=1}^{N} (AE(i) − MAE)² / (N − 1) )    (6)

Std_APE = sqrt( Σ_{i=1}^{N} (APE(i) − MAPE)² / (N − 1) )    (7)
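For reference, the four metrics of Eqs. (2)–(7) can be computed as in the sketch below. The loads in the example are hypothetical; note the N − 1 denominator in both standard deviations, and that MAPE is returned as a fraction (multiply by 100 for percent).

```python
import numpy as np

def prediction_errors(y, y_hat):
    """MAE, Std_AE, MAPE, and Std_APE as defined in Eqs. (2)-(7)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ae = np.abs(y_hat - y)                 # Eq. (2)
    ape = ae / np.abs(y)                   # Eq. (4)
    return {"MAE": ae.mean(),              # Eq. (3)
            "Std_AE": ae.std(ddof=1),      # Eq. (6), N - 1 denominator
            "MAPE": ape.mean(),            # Eq. (5), as a fraction
            "Std_APE": ape.std(ddof=1)}    # Eq. (7), N - 1 denominator

# Example with made-up daily loads (klbs):
stats = prediction_errors([4000, 5000, 3000], [4100, 4800, 3150])
print(stats)
```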
where AE in (2) is the absolute error, ŷ is the predicted value obtained from the model, y is the actual measured value, and N is the number of data points used for model training, validation, or testing. Table 5 presents the training and test accuracy of the models built with the different data-mining algorithms.

Table 5
Training and testing accuracy results for models extracted with different data-mining algorithms.

Algorithm | Data set | MAE | Std_AE | MAPE (%) | Std_APE (%)
CART | Training 2004–2005 | 133.2204 | 165.6246 | 3.30 | 4.49
CART | Testing 2006 | 649.1558 | 530.2855 | 18.15 | 16.71
CHAID | Training 2004–2005 | 405.3211 | 352.1727 | 9.90 | 10.24
CHAID | Testing 2006 | 577.5735 | 440.2995 | 16.20 | 13.70
Exhaustive CHAID | Training 2004–2005 | 398.4697 | 347.7473 | 9.71 | 10.12
Exhaustive CHAID | Testing 2006 | 569.5217 | 434.9203 | 15.87 | 13.28
Boosting tree regression | Training 2004–2005 | 365.7726 | 334.6696 | 9.18 | 10.34
Boosting tree regression | Testing 2006 | 540.3095 | 408.6700 | 15.18 | 12.88
Random forest | Training 2004–2005 | 360.6245 | 319.8897 | 9.05 | 10.14
Random forest | Testing 2006 | 561.6843 | 407.1481 | 15.99 | 13.38
MARSplines | Training 2004–2005 | 344.1872 | 345.7752 | 8.75 | 10.77
MARSplines | Testing 2006 | 512.4989 | 413.0148 | 14.47 | 12.98
SVM | Training 2004–2005 | 439.9067 | 353.3947 | 11.35 | 12.18
SVM | Testing 2006 | 648.5277 | 431.8684 | 18.56 | 14.67
MLP | Training 2004–2005 | 340.5511 | 341.5077 | 8.65 | 10.57
MLP | Testing 2006 | 512.6047 | 414.7041 | 14.53 | 13.14
MLP ensemble | Training 2004–2005 | 338.7780 | 340.8402 | 8.60 | 10.52
MLP ensemble | Testing 2006 | 510.1599 | 412.7694 | 14.44 | 13.06
k-NN | Training 2004–2005 | 299.6001 | 311.2201 | 7.57 | 9.36
k-NN | Testing 2006 | 548.8362 | 423.5867 | 15.29 | 13.18

In the CART algorithm, minimization of the misclassification cost is used as the splitting criterion; splitting continues as long as the minimum number of cases at a node is more than five and the maximum number of nodes does not exceed 1000. For the CHAID and exhaustive CHAID algorithms, the p-value is set at 0.05 for splitting and merging. The boosting tree regression uses a learning rate of 0.1, and the maximum number of additive trees equals 200. To avoid overfitting, each consecutive tree is built using a subset of the data, with the subset proportion set to 0.5. In the random-forest algorithm, the maximum number of trees in the forest is set to 200, and the subset proportion is 0.5. For the MARSplines algorithm, the number of basis functions and the corresponding weight coefficients are adjusted to minimize the least-squares error; the maximum number of basis functions is set to 21. The radial basis function (RBF) is used as the kernel function in the SVM algorithm, with the capacity factor set to 10 to avoid overfitting. For the hidden and output neurons of the MLP neural network, five different activation functions are considered, namely the logistic, identity, tanh, exponential, and sine functions. The number of hidden units is set between 5 and 18, and the weight decay for both the hidden and output layers varies from 0.0001 to 0.001. The MLP ensemble involves five MLPs. In the k-NN algorithm, k is set to 5. Based on the training and testing errors, the computational results reported in Table 5 show that the MLP ensemble performs best on the MAE and MAPE metrics. Therefore, it is selected as the algorithm for building the steam load model in the remainder of the paper.

To demonstrate the importance of parameter selection, along with the two variables selected in Section 3.1, three other scenarios choosing different input variables based on the results of the correlation matrix (Table 3) and the boosting tree (Table 4) are shown in Table 6. The training and test MAPEs of the models generated by the MLP ensemble for the four scenarios of Table 6 are compared in Fig. 2.

Table 6
Four scenarios of parameter selection.

Scenario | Selected parameters | Description
1 | Temp_mean, Humidity_min | Based on the selection procedure of Section 3.1
2 | Temp_mean, Temp_std, Temp_max, Temp_min, Humidity_mean, Humidity_std, Humidity_max, Humidity_min | All eight temperature and humidity inputs
3 | Temp_max, Temp_mean, Humidity_std, Humidity_min | Based on the correlation coefficients of Table 3
4 | Temp_mean, Temp_max, Temp_min, Humidity_mean, Humidity_min | Based on the boosting tree algorithm

[Fig. 2. MAPEs for the four parameter selection scenarios of Table 6.]
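As a rough illustration of the selected approach, the sketch below builds an ensemble of five MLPs with scikit-learn and averages their predictions. The synthetic data, the hidden-layer sizes, and the plain-averaging combiner are assumptions for illustration; the actual architectures and activation functions of the four deployed ensembles are listed later in Table 10, and the Statistica configuration may combine members differently.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Synthetic mapping from (Temp_mean, Humidity_min) to daily steam load
# (illustrative only; not the actual UI data).
X = np.column_stack([rng.uniform(0.0, 90.0, 600),
                     rng.uniform(10.0, 95.0, 600)])
y = 7000.0 - 45.0 * X[:, 0] + 4.0 * X[:, 1] + rng.normal(scale=150.0, size=600)

# Five MLPs with differing hidden-layer sizes (5-18 hidden units, as in
# the search range of Section 3.2); sizes themselves are assumptions.
members = [MLPRegressor(hidden_layer_sizes=(h,), max_iter=3000,
                        random_state=h).fit(X, y)
           for h in (5, 8, 11, 14, 18)]

def ensemble_predict(X_new):
    # Combine the members by averaging their individual predictions.
    return np.mean([m.predict(X_new) for m in members], axis=0)

print(ensemble_predict(np.array([[32.2, 56.4]])))
```

Averaging reduces the variance of the individual networks, which is the stated motivation for using the ensemble instead of a single MLP.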
As shown in Fig. 2, though the two-parameter scenario produces a larger training error than any other scenario, its test result has the smallest mean absolute percentage error. It should also be noted that the two-parameter scenario has the smallest gap between the training and testing error. A smaller number of inputs not only leads to stable prediction accuracy but also tends to reduce model variance.
3.3. Data split selection

In this section, different data split scenarios are investigated to improve the model's accuracy. In Section 3.2, the data from 2004 to 2005 was used as the training data set. In predicting the steam load of a certain month, seasonal effects may strongly influence the results; for example, predicting the steam load in winter differs much from predicting it in summer. Since the training data includes all seasons, much of its information is redundant or even irrelevant for predicting the steam load of a specific month. Three scenarios are proposed and described below.

3.3.1. Data split scenario I
Due to the seasonality effect, the data from 2004 to 2005 is divided into the four subsets shown in Table 7, and a separate model is built from each subset. For load estimation at a future horizon, the model matching the month is selected.

3.3.2. Data split scenario II
As shown in Tables 3 and 4, the mean temperature is a significant parameter in estimating the steam load. Two-dimensional plots, with the horizontal axis referring to the mean temperature and the vertical axis representing the steam consumption of 2004 and 2005, are shown in Figs. 3 and 4. As illustrated in Figs. 3 and 4, the relationship between the steam load and the mean temperature changes at a rough split threshold of 55 °F. Therefore, the 2004 and 2005 data is divided into two subsets at this split point, and a model is built for each subset. Reflecting the current practice, the 55 °F split point is used to select the appropriate model.

3.3.3. Data split scenario III
Unlike scenario II, where the data was divided into two parts based on a single split temperature, here the data is grouped into temperature bins with equal intervals of 10 °F. The details are shown in Table 8.

The previously selected MLP ensemble with five MLPs is used to develop a model for each data split scenario. To test model accuracy, the 2 months with the best and the worst test MAPE in 2006 are selected. Fig. 5 shows the test MAPE for each month of 2006 obtained with the best model built in Section 3.2 from the 2004 and 2005 data. The results in Fig. 5 illustrate that May and December of 2006 have the largest and smallest MAPE, respectively. Therefore, these 2 months are used for testing the models developed under the data split scenarios. The MAE, Std of MAE, MAPE, and Std of MAPE for the test data sets are shown in Table 9, which also includes two baselines. Baseline I uses the 2004 load data directly as the estimate of the load on the same day of 2006 (e.g., the steam load on May 1st, 2004 estimates the load on May 1st, 2006). Baseline II uses the 2005 load data in the same manner. As shown in Table 9, data split scenario I has the best overall performance of all the scenarios. Therefore, the seasonal data split is used to develop the steam load model. Note that though data split scenario I uses significantly fewer data points for training the model and predicting the future load, it has much better accuracy in May 2006 and acceptable accuracy in December 2006. Removing redundant data demonstrates that the model accuracy
Table 7
Data split scenario I.

Data set | Description
1 | January–March of 2004 and 2005
2 | April–June of 2004 and 2005
3 | July–September of 2004 and 2005
4 | October–December of 2004 and 2005
[Fig. 3. Relationship between the mean temperature (°F) and steam consumption (klbs) in 2004; daily points are shown by month, January through December.]
[Fig. 4. Relationship between the mean temperature (°F) and steam consumption (klbs) in 2005.]
Table 8
Data split scenario III.

Data set | Mean temperature bin (°F) | No. of observations
1 | Less than 10 | 26
2 | 10–20 | 28
3 | 20–30 | 62
4 | 30–40 | 120
5 | 40–50 | 103
6 | 50–60 | 105
7 | 60–70 | 151
8 | 70–80 | 127

Table 9
Test results with different data split scenarios.

Data split scenario | Test data set | MAE | Std of MAE | MAPE (%) | Std of MAPE (%)
I | Test May 2006 | 279.5387 | 173.6689 | 7.74 | 4.83
I | Test December 2006 | 234.0529 | 223.0083 | 4.90 | 4.46
II | Test May 2006 | 874.2131 | 594.0196 | 29.02 | 20.38
II | Test December 2006 | 193.2253 | 164.5481 | 4.00 | 3.07
III | Test May 2006 | 792.9835 | 519.6384 | 26.63 | 17.93
III | Test December 2006 | 266.8676 | 213.3769 | 5.71 | 4.61
Baseline I | Test May 2006 | 1217.8438 | 820.3141 | 26.28 | 15.67
Baseline I | Test December 2006 | 1276.3474 | 729.7703 | 25.70 | 14.86
Baseline II | Test May 2006 | 647.1931 | 354.5781 | 17.51 | 8.82
Baseline II | Test December 2006 | 1043.6175 | 717.7555 | 18.08 | 11.59
All 2004, 2005 data | Test May 2006 | 808.0371 | 469.6589 | 27.20 | 16.59
All 2004, 2005 data | Test December 2006 | 186.3490 | 172.1987 | 3.81 | 3.22

[Fig. 5. Test MAPE for each month of 2006, January through December.]
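At prediction time, each data split scenario implies a different rule for picking the model trained on the matching subset. A minimal sketch of this dispatch follows, with hypothetical model identifiers standing in for the trained ensembles (the names and the floor-based bin labels are assumptions, not the paper's notation):

```python
# Quarterly seasonal split of scenario I (Table 7): months 1-3 -> model 1,
# 4-6 -> model 2, 7-9 -> model 3, 10-12 -> model 4.
SEASON_MODELS = {m: f"model_{(m - 1) // 3 + 1}" for m in range(1, 13)}

def select_model(month, temp_mean=None, scenario="I"):
    """Pick the model trained on the data subset matching the input."""
    if scenario == "I":          # seasonal split (Table 7)
        return SEASON_MODELS[month]
    if scenario == "II":         # single 55 F split point
        return "cold_model" if temp_mean < 55.0 else "warm_model"
    if scenario == "III":        # 10 F-wide temperature bins (Table 8)
        return f"bin_model_{int(temp_mean // 10)}"
    raise ValueError(f"unknown scenario: {scenario}")

print(select_model(5))                                  # May, scenario I
print(select_model(1, temp_mean=32.2, scenario="II"))
print(select_model(1, temp_mean=32.2, scenario="III"))
```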
could be improved. Figs. 6 and 7 show the test results of scenario I for these 2 months.

4. Test results and sensitivity analysis

An independent data set of 2007 was used as the test set to evaluate the steam load prediction model. Based on the selected parameters, algorithm, and data split, both the training and validation data of Table 1 are used as the training set. The MLP ensemble with five MLPs is the mapping algorithm, and data split scenario I is used to divide the training set into subsets. Four models are built to predict the steam consumed in the different seasons; Table 10 contains their details. Model 1 is used for predicting the steam consumed from January to March 2007, and Model 2 from April to June 2007. Model 3 is used for predicting the steam consumed
from July to September 2007, and Model 4 from October to December 2007.

[Fig. 6. Test results of scenario I for May 2006: observed and predicted steam load (klbs) by day.]

[Fig. 7. Test results of scenario I for December 2006: observed and predicted steam load (klbs) by day.]

Table 10
Descriptions of the four MLP ensemble models.

Model | NN structure | Activation function at hidden layer | Activation function at output layer
1 | MLP 2-8-1; MLP 2-7-1 (3); MLP 2-9-1 | Logistic; hyperbolic tangent; logistic | Sine; sine; sine
2 | MLP 2-7-1 (3); MLP 2-8-1 (2) | Hyperbolic tangent; logistic | Sine; identity
3 | MLP 2-4-1; MLP 2-4-1; MLP 2-6-1; MLP 2-6-1; MLP 2-8-1 | Exponential (all five) | Logistic; identity; sine; identity; logistic
4 | MLP 2-7-1 (3); MLP 2-4-1; MLP 2-8-1 | Hyperbolic tangent; logistic; hyperbolic tangent | Sine; hyperbolic tangent; sine

The prediction statistics for each month of 2007 are shown in Table 11. As shown in Table 11, the accuracy of the load prediction in winter is better than in summer.

Table 11
Test results for each month of 2007.

Month | MAE | Std of MAE | MAPE (%) | Std of MAPE (%)
January | 229.6125 | 127.6796 | 4.07 | 2.08
February | 470.6864 | 218.3952 | 7.39 | 2.61
March | 376.4847 | 202.0408 | 10.71 | 8.11
April | 729.2874 | 547.4329 | 17.10 | 10.23
May | 591.0668 | 497.8888 | 17.64 | 16.10
June | 640.7030 | 417.8491 | 16.19 | 10.48
July | 692.3318 | 104.5319 | 17.82 | 3.21
August | 717.4539 | 144.6213 | 18.61 | 4.55
September | 1030.7757 | 439.0330 | 31.74 | 17.99
October | 297.9699 | 253.0848 | 10.31 | 11.81
November | 1319.8354 | 562.1025 | 52.62 | 26.18
December | 356.0059 | 111.7109 | 6.15 | 1.79

[Fig. 8. Test results of January, February, and March of 2007: observed and predicted steam load (klbs) by day.]

The observed and predicted loads for the four models are shown in Figs. 8–11. As demonstrated in Fig. 8, the predicted and observed loads from January to March match each other well. In Fig. 11, the results are almost the same, except for November. This is due to the fact that during the heating season the heating load is roughly equal to the steam consumed, and a mapping between the weather pattern and the steam load can be clearly established. The prediction error is relatively constant in November; however, the trend remains the same. In Figs. 9 and 10, as the outside temperature increases, the prediction error becomes larger. Note that the observed values are smaller than the predicted ones most of the time. One possible explanation is that during the cooling season only some of the chillers are run by steam; electricity-driven chillers also account for a major proportion of the cooling load. The predicted load is based on the assumption
[Fig. 9. Test results of April, May, and June of 2007: observed and predicted steam load (klbs) by day.]
[Fig. 10. Test results of July, August, and September of 2007: observed and predicted steam load (klbs) by day.]
Table 12
Selected instance for Monte Carlo simulation.

Date | Actual load | Temp_mean | Humidity_min | Predicted load
1/1/2007 | 4802.6609 | 32.1764 | 56.3725 | 4698.9778

[Fig. 11. Test results of October, November, and December of 2007: observed and predicted steam load (klbs) by day.]
that the steam consumption approximately equals the total cooling load. In fact, the observed steam load is less than the total cooling load, as it represents the cooling load only partially, which explains why the observed load is smaller than the predicted load. The load data does not contain the electricity consumed by the chillers during this period; to estimate the steam load during the cooling season, the electric chiller data should be available.

As indicated in Section 2, the model inputs for the predicted load are the weather forecast data. Since forecasted weather data is not highly accurate, each input can be considered a random variable. In that case, the deterministic model is transformed
into a stochastic one. The input variables may affect the probability distribution of the outcome. Therefore, uncertainty propagation needs to be investigated in this model. Due to the complexity of the NN ensemble model, Monte Carlo simulation [35,36] is applied for analyzing uncertainty propagation. Data of 1/1/2007 has been used in the Monte Carlo simulation. Table 12 describes the data. By incorporating Gaussian noise for each input variable, the mean is set as 0 while the standard deviation is set as 1. The total 1000 simulated points for each input variable have been generated, and their distribution plots are described in Figs. 12 and 13. Based on Model 1 described in Table 10, the steam load output can be calculated, and its distribution is presented in Fig. 14. Note that short bar in the middle of Fig. 14 indicates the position of initial predicted load. As shown in Fig. 14, the predicted load distribution can be considered as the approximate normal distribution. The mean of the load distribution is 4702, while the initial predicted load is 4698. It has been demonstrated that if the confidence of the forecasted weather data is relatively high, though noise and uncertainty exist,
Fig. 12. Distribution of the simulated Temp_mean.
Fig. 13. Distribution of the simulated Humidity_min.
Fig. 14. Distribution of the predicted load.
the predicted load can also be trusted, since the mean of the load distribution is close to the initial prediction. Small input uncertainty does not significantly affect the quality of the predicted results.
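The uncertainty-propagation procedure described above can be sketched as follows. Since the NN ensemble (Model 1) is not reproduced here, a simple stand-in surrogate function of Temp_mean and Humidity_min is used in its place; its coefficients are purely illustrative and were chosen only so that the point prediction is near the value reported in Table 12:

```python
import numpy as np

rng = np.random.default_rng(0)

# Forecast inputs for 1/1/2007 (Table 12).
temp_mean = 32.1764      # Temp_mean
humidity_min = 56.3725   # Humidity_min

def surrogate_model(temp, humidity):
    """Stand-in for the NN ensemble (Model 1); coefficients are illustrative."""
    return 6000.0 - 45.0 * temp + 2.6 * humidity

n = 1000  # number of Monte Carlo samples
# Gaussian noise with mean 0 and standard deviation 1 added to each input.
temp_samples = temp_mean + rng.normal(0.0, 1.0, n)
humidity_samples = humidity_min + rng.normal(0.0, 1.0, n)

# Propagate each noisy input pair through the model.
loads = surrogate_model(temp_samples, humidity_samples)

point_prediction = surrogate_model(temp_mean, humidity_min)
print(f"point prediction: {point_prediction:.1f} Klbs")
print(f"simulated mean:   {loads.mean():.1f} Klbs")
print(f"simulated std:    {loads.std():.1f} Klbs")
```

With small input noise the mean of the simulated load distribution stays close to the deterministic point prediction, mirroring the comparison of 4702 versus 4698 discussed for Fig. 14.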
5. Conclusion
In this paper, a data-driven approach to steam load prediction was presented. A correlation-coefficient matrix and the boosting tree algorithm were used for parameter reduction. The performance of 10 different data-mining algorithms was studied, and the ensemble of five MLPs was selected as the best mapping algorithm. Different training-data split scenarios were also investigated. A steam load prediction model was developed using 3 years of data, and test results for the following year were discussed. The current steam prediction model is limited to the heating season; the lower accuracy in the cooling season was due to the lack of data on the electric chillers that supplement the cooling load. Once data on system operations, occupancy, and chiller running schedules become available, prediction accuracy should improve.
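The NN ensemble summarized above averages the outputs of five independently trained MLPs. A minimal sketch of this ensemble structure is given below; the toy data, single-hidden-layer architecture, and training settings are illustrative assumptions and do not reproduce the model developed in the paper:

```python
import numpy as np

# Toy data: a smooth nonlinear load-vs-temperature curve (illustrative only).
x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = 0.5 * x**2 - 0.3 * x + 0.2

def train_mlp(seed, hidden=8, epochs=5000, lr=0.05):
    """Train one single-hidden-layer MLP (tanh) with full-batch gradient descent."""
    r = np.random.default_rng(seed)
    w1 = r.normal(0.0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
    w2 = r.normal(0.0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)         # hidden activations
        err = (h @ w2 + b2) - y          # prediction error
        # Backpropagation for the mean-squared-error loss.
        g2 = h.T @ err / len(x); gb2 = err.mean(0)
        dh = (err @ w2.T) * (1.0 - h**2)
        g1 = x.T @ dh / len(x); gb1 = dh.mean(0)
        w2 -= lr * g2; b2 -= lr * gb2
        w1 -= lr * g1; b1 -= lr * gb1
    return lambda xs: np.tanh(xs @ w1 + b1) @ w2 + b2

# Five members trained from different random initializations.
members = [train_mlp(seed) for seed in range(5)]

def ensemble_predict(xs):
    """The ensemble output is the average of the five member predictions."""
    return np.mean([m(xs) for m in members], axis=0)

mse = float(np.mean((ensemble_predict(x) - y) ** 2))
```

Averaging members trained from different random initializations reduces the variance of any single network's prediction, which is the usual rationale for preferring an ensemble over a single MLP.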
Acknowledgement
This research has been supported by the Iowa Energy Center, Grant No. 08-01.
References
[1] Bida M, Kreider JF. Monthly-averaged cooling load calculations – residential and small commercial buildings. ASME Trans: J Sol Energy Eng 1987;109(4):311–20.
[2] Christiaanse WR. Short-term load forecasting using exponential smoothing. IEEE Trans Power Ap Syst 1971;PAS-90:900–10.
[3] Thompson RP. Weather sensitive demand and energy analysis on a large geographically diverse power system: application to short-term hourly electric demand forecasting. IEEE Trans Power Ap Syst 1976;PAS-95:384–93.
[4] Irisarri GD, Widergren SE, Yehsakul PD. On-line load forecasting for energy control center application. IEEE Trans Power Ap Syst 1982;PAS-101:71–8.
[5] Toyoda J, Chen MS. An application of state estimation to short-term load forecasting, Parts 1 and 2. IEEE Trans Power Ap Syst 1970;PAS-89:1678–88.
[6] Kimbara A, Kurosu S, Endo R. On-line prediction for load profile of an air-conditioning system. ASHRAE Trans 1995;101(2):198–207.
[7] Kawashima M, Dorgan CE, Mitchell JW. Hourly thermal load prediction for the next 24 h by ARIMA, EWMA, LR, and an artificial neural network (Part 1). ASHRAE Trans 1995;101:186–200.
[8] Park DC, El-Sharkawi MA, Marks II RJ. Electric load forecasting using an artificial neural network. IEEE Trans Power Syst 1991;6(2):442–9.
[9] Islam SM, Al-Alawi SM, Ellithy KA. Forecasting monthly electric load and energy for a fast growing utility using an artificial neural network. Electr Power Syst Res 1995;34(1):1–9.
[10] Kawashima M, Dorgan CE, Mitchell JW. Optimizing system control with load prediction by neural networks for an ice-storage system. ASHRAE Trans 1996;102(1):1169–78.
[11] González PA, Zamarreño JM. Prediction of hourly energy consumption in buildings based on a feedback artificial neural network. Energy Build 2005;37(6):585–601.
[12] Li Q, Meng Q, Cai J, Yoshino H, Mochida A. Applying support vector machine to predict hourly cooling load in the building. Appl Energy 2009;86(10):2249–56.
[13] Zhai H, Dai YJ, Wu JY, Wang RZ. Energy and exergy analyses on a novel hybrid solar heating, cooling and power generation system for remote areas. Appl Energy 2009;86(9):1395–404.
[14] Difs K, Danestig M, Trygg L. Increased use of district heating in industrial processes – impacts on heat load duration. Appl Energy 2009;86(11):2327–34.
[15] Yildiz A, Güngör A. Energy and exergy analyses of space heating in buildings. Appl Energy 2009;86(10):1939–48.
[16] Desideri U, Proietti S, Sdringola P. Solar-powered cooling systems: technical and economic analysis on industrial refrigeration and air-conditioning applications. Appl Energy 2009;86(9):1376–86.
[17] Ruan Y, Liu Q, Zhou W, Firestone R, Gao W, Watanabe T. Optimal option of distributed generation technologies for various commercial buildings. Appl Energy 2009;86(9):1641–53.
[18] Elkateb MM, Solaiman K, Al-Turki Y. A comparative study of medium-weather-dependent load forecasting using enhanced artificial/fuzzy neural network and statistical techniques. Neurocomputing 1998;23(1–3):3–13.
[19] Hou Z, Lian Z, Yao Y, Yuan X. Cooling load prediction based on the combination of rough set theory and support vector machine. HVAC&R Res 2006;12(2):337–52.
[20] Li Q, Meng Q, Cai J, Yoshino H, Mochida A. Predicting hourly cooling load in the building: a comparison of support vector machine and different artificial neural networks. Energy Convers Manage 2009;50(1):90–6.
[21] Yang J, Rivard H, Zmeureanu R. On-line building energy prediction using adaptive artificial neural networks. Energy Build 2005;37(12):1250–9.
[22] Wang J. Data mining: opportunities and challenges. Hershey, PA: IGI Global; 2003.
[23] Rodgers JL, Nicewander WA. Thirteen ways to look at the correlation coefficient. Am Stat 1988;42(1):59–66.
[24] Tan PN, Steinbach M, Kumar V. Introduction to data mining. New York: Addison Wesley; 2005.
[25] Friedman JH. Stochastic gradient boosting. Stanford University, Statistics Department; 1999.
[26] Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning. New York: Springer; 2001.
[27] Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Monterey, CA: Wadsworth International; 1984.
[28] Kass GV. An exploratory technique for investigating large quantities of categorical data. Appl Stat 1980;29(2):119–27.
[29] Biggs D, de Ville B, Suen E. A method of choosing multiway partitions for classification and decision trees. J Appl Stat 1991;18(1):49–62.
[30] Friedman JH. Multivariate adaptive regression splines. Ann Stat 1991;19(1):1–67.
[31] Breiman L. Random forests. Mach Learn 2001;45(1):5–32.
[32] Hertz JA, Krogh A, Palmer RG. Introduction to the theory of neural computation. Boulder, CO: Westview Press; 1999.
[33] Haykin S. Neural networks: a comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall; 1998.
[34] Casella G, Berger R. Statistical inference. 2nd ed. Pacific Grove, CA: Duxbury Press; 1990.
[35] Metropolis N, Ulam S. The Monte Carlo method. J Am Stat Assoc 1949;44(247):335–41.
[36] Goodman L. On the exact variance of products. J Am Stat Assoc 1960;55(292):708–13.