Exploiting Generalized Additive Models for Diagnosing Abnormal Energy Use in Buildings
Joern Ploennigs, Bei Chen, Anika Schumann, Niall Brady
Smarter Cities Technology Center, IBM Research
{Joern.Ploennigs, BeiChen2, Anika.Schumann, Niall.Brady}@ie.ibm.com
Abstract
Buildings consume 40 % of the energy in industrialized countries. Detecting and diagnosing anomalies in a building’s energy use is therefore an important problem. Existing approaches either retrieve limited information about the causes of an anomaly or are difficult to adapt to different buildings. This paper presents an easily adaptable diagnosis approach that exploits the building’s hierarchy of submeters, i. e. information on how much energy is used by the different pieces of building equipment. It computes novel diagnosis results consisting of two parts: (i) the extent to which building equipment causes abnormal energy use, and (ii) the extent to which internal and external factors determine the energy use of building equipment. Computing such diagnosis results requires an approach that can predict the energy use for the different submeters and can also determine the factors that influence the energy use. However, existing approaches for buildings do not meet these requirements. As a remedy, we propose a novel approach using the generalized additive model (GAM), which incorporates various exogenous variables affecting building energy use, such as weather conditions and time of day. Our experiments demonstrate that the proposed method can efficiently model the impact of different factors and diagnose the causes of anomalies.
The World Energy Outlook 2012 stated that the highest energy efficiency potential lies in buildings¹ and is mainly untapped [8]. Increased energy efficiency can be realized in many ways, such as weatherization and retrofits, optimization of the building control strategy, or the timely detection and diagnosis of abnormal energy use. In this paper we address the latter problem. In contrast to existing approaches, we seek to compute not only the causes of abnormal energy consumption but also the influencing factors that govern the energy use. This supports the manual assessment of whether equipment with abnormally high energy use is indeed malfunctioning or whether the anomaly was caused by factors governing the operation of that equipment. Our approach thus requires a method that can identify the factors that determine the energy use and provide short-term predictions.
Existing methods for modeling and predicting energy use include domain-specific statistical models [7, 12], Fourier series [4], cluster analysis [13], a combination of both [2, 3], artificial neural networks (ANN) [6, 14], and support vector machines (SVM) [11, 20]. Unfortunately, the energy models computed by these approaches are not easily interpretable by domain experts and thus cannot be used for extracting the factors that influence the energy use. In fact, only a subset of the above-mentioned papers discusses anomaly detection for the building’s energy consumption. Liu et al. [12] and Bellala et al. [2] used their prediction models to identify abnormal days. Hao et al. [7] delegated the anomaly detection to the user in a visual analytics approach. Outlier detection approaches were used in [9, 16], and a manual investigation was performed in [1]. A classification of faults based on the signal spectrum of meter data was done in [17]. None of these approaches used the capabilities of the prediction model to diagnose the anomalies.
On the other hand, existing approaches for diagnosing anomalies require substantial building specific knowledge and are thus difficult to adapt to different buildings [10]. This paper presents a novel diagnosis method based on the hierarchy of the building’s submeters and on generalized additive models (GAM). The latter are not only capable of modeling energy use but also capable of identifying the factors that govern the energy consumption. Furthermore the residuals of the GAMs have the property that they are seri-
Categories and Subject Descriptors I.5.5 [Hardware]: Power and energy—Power estimation and optimization; I.3.4 [Information systems]: Applications—Decision support systems, Data analytics; G.3.7 [Probability and Statistics]: Nonparametric statistics; A.2.6 [Artificial intelligence]: Knowledge representation and reasoning—Causal reasoning and diagnostics
Keywords Prediction, Fault Detection & Diagnosis, GAM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. BuildSys’13, November 13-14, 2013, Roma, Italy. Copyright © 2013 ACM 978-1-4503-2431-1/13/11 ...$10.00
1
Introduction
¹ The other sectors considered were industry, transport, and power generation.
2
Illustrating the Applicability of GAMs
[Figure: plot of total electricity consumption (kW); labels recovered from the plot: 4am, 12pm, 8pm, Weekend.]
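\[
a_d =
\begin{cases}
\text{low} & \text{if } P_d(\text{low}) - P_d(\text{high}) > C_M;\\
\text{high} & \text{if } P_d(\text{high}) - P_d(\text{low}) > C_M;\\
\text{abnormal} & \text{if } P_d(\text{high}) + P_d(\text{low}) > C_N \,\wedge\, P_d(\text{low}) - P_d(\text{high}) \le C_M \,\wedge\, P_d(\text{high}) - P_d(\text{low}) \le C_M,
\end{cases}
\tag{10}
\]
where $P_d(\text{high})$ and $P_d(\text{low})$ are the shares of the day's observations classified as high and low, and $C_M$ and $C_N$ are constants with $C_M \ge C_N$ that are set to 0.5 and 0.3 in our experiments. Hence, a day is classified as abnormal if more than 30 % of the day’s observations are either high or low. An abnormal day is further categorized as high if the absolute majority of observations is high, i. e. if the percentage of high observations is still larger than 50 % after the percentage of low observations has been subtracted. The low category is defined analogously.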
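The day classification of eq. (10) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `classify_day`, the string labels, and the fall-through to "normal" for days that match no case are assumptions, and the low case is taken to mirror the high case.

```python
def classify_day(labels, c_m=0.5, c_n=0.3):
    """Classify one day from per-observation labels ('low', 'high', 'normal').

    c_m and c_n correspond to C_M and C_N in eq. (10).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("a day needs at least one observation")
    p_high = sum(1 for label in labels if label == "high") / n
    p_low = sum(1 for label in labels if label == "low") / n

    if p_low - p_high > c_m:
        return "low"
    if p_high - p_low > c_m:
        return "high"
    # Reaching this point means both differences are <= C_M,
    # so only the C_N condition of the third case remains to be checked.
    if p_high + p_low > c_n:
        return "abnormal"
    # Assumption: days that match no case of eq. (10) are treated as normal.
    return "normal"
```

For a day with 24 hourly observations, 14 high readings give a share of about 58 % and the day is labelled high, whereas 5 high and 5 low readings together exceed 30 % without a clear majority, so the day is labelled abnormal.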
3.3
Diagnosis
The previous subsection gives us the tools to determine which meters are abnormal and whether these abnormalities persist only for individual observations or whether the sensor readings of the current day as a whole can be classified as abnormal. This is already very valuable information for a facility manager. To assist the facility manager further, we now present an approach for automatically computing the causes of such abnormalities. Specifically, we show how to retrieve, for each abnormal meter, all submeters that explain the abnormal observation. These submeters might be abnormal themselves, or their readings might deviate only slightly from their prediction. For example, the main meter might have a high anomaly with a reading that is 10 kW higher than predicted. A submeter whose electricity consumption is 5 kW higher than predicted helps to explain the anomaly of the main meter regardless of whether a 5 kW deviation of that submeter is considered normal, i. e. regardless of whether its reading still lies between its upper and lower bound.
For a facility manager two types of information are relevant: (i) the extent to which submeter readings explain an anomaly of a meter, and (ii) the anomaly status of these submeters. Further action might be required only if a submeter is abnormal. Information of type (ii) can already be retrieved following the procedure of the previous subsection. We now show how we diagnose abnormal meters, i. e. how we compute which submeters explain the anomaly of a meter and to what extent. First we present an approach for diagnosing abnormal observations at time t. Based on that, we show how we explain the anomaly of days.
Our approach makes use of the meter hierarchy, i. e. knowledge about which submeter observations contribute to the observations of a meter m. The meter value $y_t(m)$ is the sum of the values of its submeters $S_m$ and an additional unobserved energy consumption $y_t(u_m)$ that is not captured by any submeter:
\[
y_t(m) = \sum_{s \in S_m} y_t(s) + y_t(u_m). \tag{11}
\]
For the diagnosis, we assume that the deviation of the unobserved submeter $u_m$ is the part of the meter's deviation that the observable submeters do not account for, $\varepsilon_t(u_m) = \varepsilon_t(m) - \sum_{s \in S_m} \varepsilon_t(s)$. This allows us to also diagnose the unobservable submeter.
To determine which submeters explain the anomaly of a meter we make use of our GAM models and the deviations $\varepsilon_t$ between observed and predicted meter values (see eq. (1)). Clearly, a submeter s can only help to explain the anomaly of a meter m if its deviation from the predicted value has the same sign. For example, an abnormally high observation of m can be explained only by submeters whose observations are higher than predicted. The set of submeters $S_d(m)$ that explain the anomaly of m is thus given as
\[
S_d(m) = \{\, s \in S_m \cup \{u_m\} \mid \varepsilon_t(m) \cdot \varepsilon_t(s) > 0 \,\}. \tag{12}
\]
The extent $e_t(s, m)$ to which a submeter s explains the anomaly of m is given by the proportion of its deviation with respect to the deviations of all submeters in $S_d(m)$:
\[
e_t(s, m) =
\begin{cases}
0 & \text{if } s \notin S_d(m);\\
\dfrac{\varepsilon_t(s)}{\sum_{s' \in S_d(m)} \varepsilon_t(s')} & \text{otherwise.}
\end{cases}
\tag{13}
\]
Having shown how to retrieve the extent to which submeters explain the anomaly at time t, we now describe how they contribute to the anomaly status of a day. Let $T = \{t_1, \ldots, t_k\}$ denote the set of times where meter m showed abnormal values, i. e. observations classified as low or high in (10). The extent $e_d(s, m)$ to which a submeter explains the anomaly status of a day is then given as its average contribution to the anomaly during these times, i. e.
\[
e_d(s, m) = \frac{1}{k} \sum_{t \in T} e_t(s, m). \tag{14}
\]
For each submeter that contributes to the day’s anomaly we also extract the influencing factors from its GAM model and return them to the facility manager.
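Equations (11) to (14) can be illustrated with a short Python sketch. It assumes that the deviations between observed and predicted values are already available as plain numbers; the function names and the key `"unobserved"` for the unobserved submeter $u_m$ are hypothetical and chosen only for this example.

```python
def explanation_extents(eps_meter, eps_sub):
    """Extents e_t(s, m) for one time step.

    eps_meter -- deviation of meter m at time t
    eps_sub   -- dict mapping submeter name to its deviation at time t
    """
    eps = dict(eps_sub)
    # Deviation attributed to the unobserved submeter u_m (hypothetical key name).
    eps["unobserved"] = eps_meter - sum(eps_sub.values())
    # eq. (12): keep submeters whose deviation has the same sign as the meter's.
    explaining = {s: e for s, e in eps.items() if eps_meter * e > 0}
    total = sum(explaining.values())
    # eq. (13): share of each explaining submeter, 0 for all others.
    return {s: (explaining[s] / total if s in explaining else 0.0) for s in eps}


def daily_extents(steps):
    """eq. (14): average the extents over the abnormal time steps of a day.

    steps -- list of (eps_meter, eps_sub) pairs, one per abnormal time step
    """
    if not steps:
        return {}
    totals = {}
    for eps_meter, eps_sub in steps:
        for s, extent in explanation_extents(eps_meter, eps_sub).items():
            totals[s] = totals.get(s, 0.0) + extent
    return {s: value / len(steps) for s, value in totals.items()}
```

With the example from the text, a main meter that is 10 kW above its prediction and a submeter that is 5 kW above its prediction, the submeter is part of the explaining set whether or not it is itself abnormal; how large its share is depends on the deviations of the other submeters and of the unobserved part.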
4
Validation
In this section we validate our method on two real-life cases: the IBM living lab in Dublin and an office building, B5. The first case analyses a large number of submeters; the second covers a long time span of several years. Before applying the GAM algorithm, we conduct a sensitivity study to identify a set of exogenous variables that have a significant impact on the model (see subsection 4.1.2). At this stage, we aim to answer two questions: (i) how regular the energy consumption of the building appliances is, and (ii) how well it can be described by a GAM (subsection 4.1.1). After this initial validation of the applicability of GAMs, we demonstrate our anomaly detection approach in subsection 4.1.3 and the subsequent diagnosis in subsection 4.1.4.
4.1
IBM Living Lab Dublin
The first case study we conducted is on the IBM living lab in Dublin. This building is about 15 years old and is actively used by 200 people. It was retrofitted 2 years ago, and one third of it is used by IBM Research as a living laboratory. 2,500 sensors are installed in this 3,300 square meter building to provide real-time data on heating and cooling systems, lighting, water and electricity meters, footfall and motion. The building contains 354 electric meters, down to the level of individual electrical circuits.
For our analysis, we aggregated the original 15 min meter readings to one hour in order to reduce measurement noise. We analyse one year of data. Missing values in the data are ignored and not interpolated, since a GAM can be trained on data with missing values.
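The preprocessing step described above can be sketched as follows. The sketch assumes that readings arrive as `(timestamp, value)` pairs, that a missing reading is represented by `None`, and that hourly values are formed by averaging; the text does not state the aggregation function, so the mean is an assumption, as is the function name `hourly_means`.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(readings):
    """Aggregate 15 min readings to hourly means without interpolating gaps.

    readings -- iterable of (datetime, value) pairs; value is None if missing
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        if value is None:
            continue  # missing readings are skipped, not filled in
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Hours without any valid reading do not appear in the result.
    return {hour: sum(values) / len(values) for hour, values in sorted(buckets.items())}
```

An hour with only some of its four readings present is averaged over the readings that exist, and an hour with no readings simply stays absent from the training data.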
4.1.1
Applicability of GAM
As a first step, we investigate the applicability of our approach to different meter types. Given the availability of data, we examined a set of 179 electricity meters, which contains the main meter M1 and its direct submeters T1 and T2, 20 HVAC appliances (i. e., chiller, air handling units, fan coil units), 76 service meters (including aggregated socket loads) and 80 meters measuring light channels. For each meter in this set, we fit a GAM with the variables $x_t^{\mathrm{DayType}}$, $x_t^{\mathrm{TimeOfDay}}$, $x_t^{\mathrm{TimeOfYear}}$, $x_t^{\mathrm{Temperature}}$ and $x_t^{\mathrm{SolarRadiation}}$. For this test, we use all available data for both training and verification. For each GAM, we evaluate the model quality using the proportion of the deviance explained (DE) [19]. DE ranges from 0 to 100 % and is 100 % if the model fits the data exactly.
Table 1 lists the GAM quality (in terms of DE) for the different meter classes. The highest GAM quality, 81 % on average, is reached for HVAC equipment. These appliances are operated on a demand-driven basis during business hours and therefore depend strongly on time and temperature. The main meters reach a quality of 72 % on average. This is due to their aggregated nature and the superposition of the individual effects in the submeters. The light and service meters still have quite a high model quality, with means of 69 % and 64 %. They depend more on nondeterministic user behaviour.

Table 1. GAM quality (%) for different meter types
meter class  count  min   mean  median  max   sd
Main         3      58.5  72.5  75.3    83.7  12.8
HVAC         20     10.9  81.0  87.0    92.2  18.4
LGHT         80     20.2  68.8  75.7    91.3  16.5
SRVC         76     2.5   64.2  70.0    95.1  21.3

4.1.2
Significance of exogenous variables
In the second step, we investigated the significance of the different temporal and exogenous variables. Here we excluded 31 meters with a GAM quality below 50 % from the initial meter set. To capture temporal dependencies due to, e. g., time-scheduled appliances and business-day usage, we considered the temporal variables $x_t^{\mathrm{DayType}}$, $x_t^{\mathrm{TimeOfDay}}$ and $x_t^{\mathrm{TimeOfYear}}$. Table 2 (upper part) shows the GAM qualities for different combinations of these variables. The impact of $x_t^{\mathrm{DayType}}$ and $x_t^{\mathrm{TimeOfYear}}$ is less significant (7.4 % and 11.7 %) than that of $x_t^{\mathrm{TimeOfDay}}$ (53 %), since the building energy consumption is strongly determined by the daily schedule. As a result, the combinations with $x_t^{\mathrm{TimeOfDay}}$ have a relatively high GAM quality; the best result (71.5 %) is obtained when all three temporal variables are used.
In addition, we consider the exogenous variables $x_t^{\mathrm{Illumination}}$, $x_t^{\mathrm{SolarRadiation}}$, $x_t^{\mathrm{Windspeed}}$, $x_t^{\mathrm{Temperature}}$ and $x_t^{\mathrm{WeatherCondition}}$². As an intuition, $x_t^{\mathrm{Illumination}}$ and $x_t^{\mathrm{SolarRadiation}}$³ have an impact on both the heating and the lighting systems within the building; $x_t^{\mathrm{WeatherCondition}}$ alters the effectiveness of solar radiation; $x_t^{\mathrm{Windspeed}}$ influences the thermal behavior of the building, e. g., high wind speed improves the heat transfer on the building surface; $x_t^{\mathrm{Temperature}}$ and $x_t^{\mathrm{WeatherCondition}}$ impact the heating and cooling load through heat transfer via the facade. Note that building occupancy and user behavior also largely determine the building energy consumption. Due to the lack of relevant measurements, we do not consider them in this study.
² General weather condition, including categories such as “clear”, “rain” and “mostly cloudy”, collected from http://www.wundergound.com
We systematically examine the impact of the exogenous variables by computing the GAM quality when they are added to the base set $x_t^{\mathrm{TimeOfDay}}$, $x_t^{\mathrm{DayType}}$, $x_t^{\mathrm{TimeOfYear}}$. Table 2 (lower part) lists the GAM qualities for the different combinations of exogenous variables. $x_t^{\mathrm{WeatherCondition}}$ and $x_t^{\mathrm{Windspeed}}$ by themselves have negligible effects. Considering $x_t^{\mathrm{Temperature}}$ increases the GAM quality to 73.0 %. Surprisingly, $x_t^{\mathrm{SolarRadiation}}$ leads to a GAM with better quality (74.9 %) than $x_t^{\mathrm{Illumination}}$ (74.2 %), even though it only considers a clear sky. We suppose that this is because the solar radiation correlates more strongly with the outside temperature than the illumination does. The best GAM quality, 76.5 %, is reached by the combination of $x_t^{\mathrm{Temperature}}$, $x_t^{\mathrm{SolarRadiation}}$ and $x_t^{\mathrm{WeatherCondition}}$, but at a high computation time of 104 seconds.

Table 2. GAM quality for different variables (W.Con is the abbreviation of WeatherCondition). Rows name the superscripts of the variables $x_t$; the rows of the lower part are added to the base TimeOfDay, DayType, TimeOfYear.
                                      GAM quality (%)
Variables                             min    mean   median  max    sd     Comput. time (s)
TimeOfDay                             0.7    52.8   56.2    87.8   52.8   1.0
DayType                               0.1    7.4    3.3     36.3   7.4    0.1
TimeOfYear                            0.0    11.7   3.3     93.7   11.7   0.2
TimeOfDay, DayType                    1.4    59.3   65.4    89.4   59.3   1.1
TimeOfDay, TimeOfYear                 35.0   65.2   64.8    94.3   65.2   1.9
TimeOfYear, DayType                   0.1    18.6   10.4    94.4   18.6   0.2
TimeOfDay, DayType, TimeOfYear        41.0   71.5   72.8    95.0   71.5   2.0

W.Con                                 42.3   71.6   73.1    95.0   71.6   2.3
Windspeed                             41.2   71.8   73.4    95.0   71.8   4.7
Temperature                           42.7   73.0   74.3    95.1   73.0   4.8
Illumination                          50.6   74.2   75.4    95.1   74.2   4.7
SolarRadiation                        51.5   74.9   76.2    95.0   74.9   4.7
SolarRadiation, W.Con                 52.7   75.9   77.2    95.1   75.9   83.0
Temperature, Illumination             51.7   74.7   75.8    95.1   74.7   7.2
Temperature, SolarRadiation           51.9   75.4   77.0    95.1   75.4   7.2
Temperature, SolarRadiation, W.Con    54.6   76.5   77.9    95.2   76.5   104.0

4.1.3
Anomaly Detection
To validate the anomaly detection procedure, we use a GAM that takes into account $x_t^{\mathrm{TimeOfDay}}$, $x_t^{\mathrm{DayType}}$, $x_t^{\mathrm{TimeOfYear}}$, $x_t^{\mathrm{Temperature}}$ and $x_t^{\mathrm{SolarRadiation}}$. This model has good overall quality and computational performance. We begin by training the GAM on 4 months of historical data, from which the 95 % prediction bands (PBs) of the energy consumption are constructed for the following day. We then classify the status of the following day (normal/abnormal) using the PBs. If the day is classified as normal, we add it to the training data. We exclude data older than 13 months from the training stage to avoid overfitting and to capture seasonal cycles (see sec. 4.2).
³ The solar radiation is the power of the sun per unit area under a clear sky (which is uncommon in Ireland), computed with the solaR package in R [15].
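The training procedure described above amounts to a rolling loop over days. The following Python sketch shows only the bookkeeping of the training set; fitting the GAM and checking the prediction bands are hidden behind the hypothetical callable `is_abnormal`, and the window lengths of 120 and 395 days are assumed stand-ins for 4 and 13 months.

```python
from datetime import date, timedelta


def rolling_detection(days, is_abnormal, initial_days=120, max_age_days=395):
    """Classify each day after the initial window and maintain the training set.

    days        -- chronologically sorted list of datetime.date objects
    is_abnormal -- hypothetical callable (training_days, day) -> bool that stands
                   in for fitting the GAM and testing the day against the 95 % PBs
    """
    training = list(days[:initial_days])  # initial history, about 4 months
    abnormal = []
    for day in days[initial_days:]:
        # Drop training days that are older than about 13 months.
        cutoff = day - timedelta(days=max_age_days)
        training = [d for d in training if d >= cutoff]
        if is_abnormal(training, day):
            abnormal.append(day)   # abnormal days never enter the training data
        else:
            training.append(day)   # normal days extend the training data
    return abnormal, training
```

Keeping the model behind a callable makes the window logic testable on its own, for example with a dummy rule that flags a fixed day of each month.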
Table 3. Classification results for building B3. (TP - True positive; TN - True negative; FP - False positive; FN - False negative)
Class        Expert  Alg  TP   TN   FP  FN
Normal       309     278  271  63   7   42
Abnormal     74      105  67   271  43  7
- High       65      95   59   283  36  5
- Low        5       10   4    372  6   1
- Undefined  4       0    0    382  0   1
The building operator analysed the energy consumption of the main meter M1 and classified days as normal, high, or low using time series plots. We regard this list as ground truth and compare it with the result of our GAM approach. Table 3 shows the classification results of the anomaly detection. The operator identified 74 abnormal days in the data set of 388 days and classified 65 of them as high, five as low, and four as abnormal but neither high nor low (Undefined in Table 3). The algorithm correctly classifies 67 days (91 % of 74) as abnormal, 59 of them correctly as high (91 % of 65) and four as low (80 % of 5). The approach is more sensitive than the expert and classifies 43 additional days (14 % of the 309 days the expert considered normal) as abnormal. Overall, the results are good. For anomaly detection the good sensitivity of 91 % is important, and the rate of false alarms is acceptable.
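The percentages quoted above follow directly from the counts in the text. The short Python check below recomputes them; the helper name `share` is chosen for this illustration only.

```python
def share(part, whole):
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


# Counts quoted in the text above.
sensitivity_abnormal = share(67, 74)   # abnormal days found by the algorithm
sensitivity_high = share(59, 65)       # high days found
sensitivity_low = share(4, 5)          # low days found
extra_alarms = share(43, 309)          # additional alarms relative to expert-normal days

print(round(sensitivity_abnormal), round(sensitivity_high),
      round(sensitivity_low), round(extra_alarms))
# prints: 91 91 80 14
```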
4.1.4
Diagnosis
To validate the diagnosis approach, we investigate three special occasions that occurred in the data and check whether the diagnosis correctly identifies the cause of the anomaly. The first anomaly occurred on Saturday, August 18th 2012 and is related to the renovation of a building extension. The renovation work resulted in an increase in the main meter M1 from 130 kW to 410 kW, while the submeter T1 increased from 70 kW to 190 kW and T2 from 60 kW to 220 kW. Fig. 4(a) shows the energy consumption of the different meters on that day. We also depict the predictions as well as the upper and lower limits. Abnormal observations are highlighted by circles in the main meter and by a tilde in the submeter. The correct diagnosis result for the main meter M1 should be that both submeters T1 and T2 cause the anomaly. The increases were correctly detected by our anomaly detection approach. The diagnosis correctly identifies T1 and T2 as causes and assigns 48 % to T1, 52 % to T2, and nothing to $u_m$.
The second anomaly we study is a high energy consumption on public holidays. If a public holiday coincides with a business day, the building management system should operate the building as it does on weekends. Hence, the energy consumption should reflect this pattern by being as low as it normally is on weekends (compare Fig. 1). Fig. 4(b) displays the energy consumption of the main meter M1 in comparison to its submeters T1 and T2. At noon, the energy consumption of the main meter is circa 180 kW, which is about 90 kW higher than on a Sunday but lower than the 330 kW of a normal business day in March. This can be explained by two aspects: (i) the socket loads (SRVC) are low, as no people occupy the building, but (ii) the HVAC system still operates as on a normal business day. This should be reflected in the diagnosis by identifying the HVAC-related systems as the cause of the anomaly, and not the SRVC section. Fig. 5 shows the diagnosis result for March 18th 2013, which is St. Patrick's Day. The diagnosis correctly identifies the chiller, the hot water loop (LPHW), and the cold water loop (CHW). However, a large part of the error remains unexplained (T1.Unknown, T2.Unknown), because the new building extension is not yet covered by submeters.
In the last month, the energy consumption in the building increased significantly and oscillated heavily by about 100 kW during business hours, as shown in Fig. 4(c). The source of the oscillation was at first unknown to the operator. Our diagnosis approach identified the chiller and the fan coil units (FCU) as the cause of the anomaly. The operator could then explain the anomaly by interpreting the transfer function in the GAM and identifying outside temperatures of more than 20 °C, which are unusually high for Ireland and cause the cooling systems to be activated several times a day. The anomaly was detected because the GAM did not reflect this new situation. Until this date, the historical data contained only short periods with a high outside temperature, during which the building did not heat up and the cooling system was not activated. Therefore, the GAM did not model the dependence of the cooling system consumption on the outside temperature. The GAM adapts over time to the new situation.
[Figure 4. Energy consumption of M1, T1 (left) and M1, T2 (right) for different diagnosis cases. Panels: (a) Saturday, 18.08.2012; (b) Monday, 18.03.2013, St. Patrick's Day; (c) Monday, 03.06.2013. Axes: Hour, Value. Legend: main value, main anomaly, main PB, sub value, sub anomaly, sub PB, aligned anomaly.]
[Figure 5. Diagnosis for St. Patrick's Day 2013 with submeters.]
4.2
Building B5
As another use case, we evaluated data from a regular office building, B5. The 3000 square meter building is 7 years old and was designed as a low-energy building; it features solar panels, geothermal heat pumps, and heat recovery systems. We have 6.3 years of data for four meters of the building. Due to the large size of the dataset, we have no operator-labelled data to compare against. The limited number of meters also does not allow a validation of the diagnosis. Instead, we use this building to investigate the long-term behavior of the approach, in particular an appropriate window size for the GAM models. In the first approach (Full) we train a GAM on the whole data set of 75 months and then apply our anomaly detection approach to classify abnormal days. In the second approach (Windowed), we train and evaluate the GAM on a sliding window with window sizes of 4 and 13 months, respectively. In the last approach (Autoclean), we use the same window sizes but add a two-pass procedure that automatically removes days classified as abnormal from the training data.

Table 4. Classification results for building B5.
Class        Full 75m  Windowed 4m  Windowed 12m  Autoclean 4m  Autoclean 12m
GAM quality  64.0 %    75.9 %       73.4 %        81.9 %        78.6 %
Days         1953      1868         1868          1868          1868
Normal       70.7 %    59.9 %       63.7 %        63.8 %        66.0 %
Abnormal     29.3 %    40.1 %       36.3 %        36.2 %        34.0 %

Table 4 compares the different approaches, and Fig. 6 shows, as an example, the calendar plot of the anomaly count per day in 2007 and 2011. The GAM quality is better for the windowed approaches. As GAM models fit the transfer functions to the mean behavior, they can adapt better to the data variance in small datasets than in large ones. The autocleaning approach further improves the model quality by automatically removing abnormal days from the training set.
[Figure 6. Calendar plot for building B5 of year 2007 and 2011 (white: missing data; dark blue: normal; sky blue to red: 3 to 24 anomalies). Panels: (a) Full 75 months; (b) Windowed 4 months; (c) Windowed 13 months; (d) Autoclean 4 months; (e) Autoclean 13 months.]
The strong adaptation of the full GAM model to the mean behaviour should result in the classification of more days as abnormal. However, Table 4 actually shows the lowest anomaly rate for this model. This is related to an effect visible in Fig. 6(a): most of the abnormal days are in 2007, while far fewer abnormal days are found in 2011. The reason is the ARMA limits, which are tight in 2007 and then widen over the years to adapt to the high variance of the prediction error. The approach therefore loses its ability to detect anomalies over the years.
The windowed and the auto-clean approaches use the first four months of data as training set. Therefore, Fig. 6(b) to 6(e) start later than Fig. 6(a). Using a short sliding window of four months results in a better GAM quality, as discussed above and shown in Table 4. On the other hand, the high adaptation of the GAM to the input data results in a higher sensitivity and detection rate of anomalies. The comparison of Fig. 6(b) and Fig. 6(c) reveals that the approaches also detect anomalies in different regions. For example, June to August contains anomalies that were classified by an operator as normal afterwards. Fig. 6(b) contains some abnormal days due to overfitting of the four months window. Fig. 6(c) shows fewer anomalies, as the model is less sensitive to overfitting. This is confirmed by comparing the short window size in Fig. 6(d) to the 13 months window in Fig. 6(e).
The auto-cleaning improves the GAM model quality, as abnormal days are automatically removed from the training set. On the one hand, this results in a lower anomaly rate in Table 4. On the other hand, abnormal days are usually characterized by a higher anomaly percentage. This is visible as a darker red colour of days when comparing Fig. 6(e) with Fig. 6(c). It is related to the fact that the auto-cleaning allows the GAM model to fit the underlying regular behavior of the data better, rather than the noise created by anomalous days.
Our recommendation is to use a larger window size of about one year. It provides the benefit of including yearly seasonal effects in the model and prevents the model from overfitting, as seen in the four months window, and from over-generalization, as revealed by the full time approach. The auto-cleaning is advised as it helps the GAM model fit the underlying regular behavior and therefore reduces the false alarm rate.
5
Conclusion
We have presented a novel diagnosis approach that is solely based on submeter readings and thus easily adaptable to different buildings. In contrast to other easily adaptable methods, we not only detect abnormal energy use but also identify its causes. Specifically, we retrieve (i) which submeter behavior, and thus which building equipment behaviour, explains the abnormal energy use, and (ii) which influencing factors govern the behavior of these submeters. This allows a facility manager not only to get a quick understanding of the abnormal energy use but also to track down its root causes, which may reside in external factors. The approach is applicable to main meters and to submeters. To realize such a diagnosis approach we made use of GAM models. We have shown that they can be used to model energy use and also to extract its influencing factors. Combined with ARMA models, they are able to detect and quantify anomalies of meter and submeter readings. Our experimental evaluation showed that the approach is well applicable to HVAC and light appliances and that anomalies can be detected with high precision. Furthermore, our results showed that the facility manager could indeed use the information about the influencing factors of abnormal submeters to track down the root cause. Future work includes assessing the applicability of our approach to other building types. So far we have only validated our approach on commercial buildings, i. e. buildings whose energy use is dominated by large consumers such as air handling units (AHU) and chillers, rather than small consumers such as socket loads. The former units are operated by control schemes based on demand and time schedules. We also plan to extend our approach to automatically retrieve the root causes of abnormal energy use.
6
References
[1] Y. Agarwal, T. Weng, and R. K. Gupta. The energy dashboard: improving the visibility of energy consumption at a campus-wide scale. In 1st ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings, BuildSys ’09, page 55–60, New York, NY, USA, 2009. ACM.
[2] G. Bellala, M. Marwah, M. Arlitt, G. Lyon, and C. Bash. Following the electrons: methods for power management in commercial buildings. In 18th ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, KDD ’12, page 994–1002, New York, NY, USA, 2012. ACM.
[3] G. Bellala, M. Marwah, M. Arlitt, G. Lyon, and C. E. Bash. Towards an understanding of campus-scale power consumption. In 3rd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings, BuildSys ’11, page 73–78, New York, NY, USA, 2011. ACM.
[4] A. Dhar, T. A. Reddy, and D. E. Claridge. Modeling hourly energy use in commercial buildings with Fourier series functional forms. Journal of Solar Energy Engineering, 120(3), Aug. 1998.
[5] J. Duchon. Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In W. Schempp and K. Zeller, editors, Constructive Theory of Functions of Several Variables, volume 571 of Lecture Notes in Mathematics, pages 85–100. Springer, 1977.
[6] P. A. González and J. M. Zamarreño. Prediction of hourly energy consumption in buildings based on a feedback artificial neural network. Energy and Buildings, 37(6):595–601, June 2005.
[7] M. C. Hao, H. Janetzko, S. Mittelstädt, W. Hill, U. Dayal, D. A. Keim, M. Marwah, and R. K. Sharma. A visual analytics approach for peak-preserving prediction of large seasonal time series. Computer Graphics Forum, 30(3):691–700, 2011.
[8] International Energy Agency. World energy outlook 2012, 2012.
[9] V. Jakkula and D. Cook. Outlier detection in smart environment structured power datasets. In IE - 6th Int. Conf. on Intelligent Environments, pages 29–33, 2010.
[10] S. Katipamula and M. Brambley. Methods for fault detection, diagnostics, and prognostics for building systems - a review, part I. HVAC&R Research, 11(1):3–25, 2005.
[11] Q. Li, Q. Meng, J. Cai, H. Yoshino, and A. Mochida. Predicting hourly cooling load in the building: A comparison of support vector machine and different artificial neural networks. Energy Conversion and Management, 50(1):90–96, Jan. 2009.
[12] F. Liu, H. Jiang, Y. M. Lee, J. Snowdon, and M. Bobker. Statistical modeling for anomaly detection, forecasting and root cause analysis of energy consumption for a portfolio of buildings. BS - Building Simulation, 2011.
[13] M. Misiti, Y. Misiti, G. Oppenheim, and J. Poggi. Optimized clusters for disaggregated electricity load forecasting. REVSTAT, 8:105–124, 2010.
[14] A. H. Neto and F. A. S. Fiorelli. Comparison between detailed model simulation and artificial neural network for forecasting building energy consumption. Energy and Buildings, 40(12):2169–2176, 2008.
[15] O. Perpiñán. solaR: Solar radiation and photovoltaic systems with R. Journal of Statistical Software, 50(9):1–32, 2012.
[16] J. E. Seem. Using intelligent data analysis to detect abnormal energy consumption in buildings. Energy and Buildings, 39(1):52–58, Jan. 2007.
[17] S. Shaw, S. Leeb, L. Norford, and R. Cox. Nonintrusive load monitoring and diagnostics in power systems. IEEE Trans. Instrum. Meas., 57(7):1445–1454, 2008.
[18] R. Tibshirani and T. Hastie. Generalized additive models. Chapman & Hall, London, 1990.
[19] S. Wood. Generalized additive models: An Introduction with R. Chapman & Hall, London, 2006.
[20] H. Zhao and F. Magoulès. Parallel support vector machines applied to the prediction of multiple buildings energy consumption. Journal of Algorithms & Computational Technology, 4(2):231–249, June 2010.