GSTF Journal of Engineering Technology (JET) Vol.4 No.3, August 2017
Modeling Human Behavior in Building Performance Simulation: Gaussian Process and Monte Carlo Approach to the Energy Simulation of Residential Buildings
Hwang Yi
Paul L. Cejas School of Architecture, Florida International University, Miami, FL, United States
Abstract—Occupant behavior has a large impact on residential building energy use; nevertheless, behavior models in building energy simulation have not been scrutinized sufficiently, and the use of oversimplified schedules results in a great deal of uncertainty in building energy prediction. The modeling of accurate occupant schedules requires a complex dataset with long-term observations, which is unavailable during the early stages of building projects. This study seeks to estimate behavior-related building operation schedules from a minimum number of observations, so that energy simulation can be used effectively in the early stages of building projects. To this end, Gaussian process (GP) regression is applied to modeling five major occupant activities (space occupancy, activity level, hot water use, appliance use, and lighting control) of a single-family house in the United States. Monte Carlo simulation with sampling from GP-based occupant schedules demonstrates large variability in energy simulation results according to different human behaviors.
Keywords: occupant behavior; building energy simulation; Gaussian process; Monte Carlo simulation
I. INTRODUCTION
The energy consumption of residential buildings accounts for 25% of the total energy used in the United States, and single-family houses account for up to 80% of that energy [1]. Green homes can be achieved by considering high-performance construction materials and technical energy conservation measures (ECMs) for system equipment and operations at the earliest possible stages. Despite the growing awareness of energy-related issues in residential building design, a number of recent studies report a significant discrepancy between predicted (simulated) energy use and actual meter data [2] because of differences in building operation; Parker et al.'s study of ten identical houses (same floor areas in the same location, built in the same year) shows that there can be up to a 65% difference in end-energy use due to different behavior-related operational settings [3]. McKinsey estimates that behavior-based savings can reduce the energy use of residential buildings by 20% [4]. ASHRAE (American Society of Heating, Refrigerating, and Air-Conditioning Engineers) 90.1 underlines the significance of operating and occupancy schedules [5], and IEA EBC (International Energy Agency, Energy in Buildings and Communities) Annex 53 states that occupant behavior is one of the six major drivers of building energy consumption, since buildings basically serve to provide human habitation and indoor comfort. Accordingly, for sustainable building projects, little or biased consideration of occupant-building interactions in the early stages of a project undermines the robustness of energy-efficient building planning.
Building occupant behavior can be defined at various scales, but energy-related behavior generally refers to dynamic inputs of human activities and manual thermodynamic settings of spaces, including HVAC (Heating, Ventilation, and Air Conditioning) system control, light switching, window/blind operation, space occupancy, water use, clothing status, and plugging/unplugging of appliances, all of which end up influencing indoor thermal conditions and energy usage directly and indirectly. To predict building energy use and operation status under behavior-driven changes during design/construction stages, building designers have little choice but to resort to simulation data or predefined standards. ASHRAE 90.1 provides templates of building operation schedules standardized according to building function; nevertheless, it oversimplifies them into static and deterministic models [5] and offers little benchmarking data for single-family houses. Moreover, predicting occupant behavior and incorporating it as a robust parameter into simulation work suffers from a lack of reliable data, and design decision-making can be obscured by inappropriate modeling methods.
DOI: 10.5176/2251-3701_4.3.206
© The Author(s) 2017. This article is published with open access by the GSTF
II. MODELING OF HUMAN BEHAVIOR IN BUILDING SIMULATION
A. Methods of Occupant Behavior Modeling
Several methods have been introduced and adopted for building occupant behavior modeling. Technical methods to predict building occupant behavior fall into two categories: (i) probabilistic modeling and (ii) non-probabilistic modeling; the former can be divided further into parametric and non-parametric modeling, according to whether behavior-related simulation parameters are assumed to fit a predefined probability distribution function (PDF). The Bernoulli process is one of the simplest models, based on the on/off status of operational settings, and the discrete Markov chain model is a non-parametric stochastic process that defines behavioral status based on a probability matrix of event causality and roulette-wheel selection with a random number. Survival analysis is another probability-based option, which estimates the time elapsed before an action is taken [2]. In contrast, non-probabilistic modeling focuses on the interdependency of behavioral events or group activities, rather than assuming an explicit PDF or parameter, highlighting the dynamic causes of human behavior. For example, agent-based modeling (ABM) is employed to program behavioral attributes (e.g., desire for activity, belief in decision-making, intention) into each individual in a space, based on a machine-learning algorithm and rules of collective behavior that are monitored to model dynamic variation. Statistical data or stochastic assumptions can be used in part for preliminary data analysis or for outlining the mechanism of physical building equipment.
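To illustrate the discrete Markov chain approach mentioned above, the following minimal sketch simulates a two-state occupancy schedule with a transition probability matrix and roulette-wheel selection. The state names, the transition probabilities, and the 30-minute step are illustrative assumptions, not values used in this study.

```python
import random

# Hypothetical transition probabilities between two occupancy states for one
# 30-minute step; the numbers are illustrative, not taken from this study.
TRANSITION = {
    "absent": {"absent": 0.9, "present": 0.1},
    "present": {"absent": 0.2, "present": 0.8},
}


def next_state(current, rng):
    """Roulette-wheel selection: draw u in [0, 1) and pick the first state
    whose cumulative transition probability exceeds u."""
    u = rng.random()
    cumulative = 0.0
    for state, prob in TRANSITION[current].items():
        cumulative += prob
        if u < cumulative:
            return state
    return state  # guard against floating-point round-off in the last bin


def simulate_day(start="absent", steps=48, seed=0):
    """Return one simulated occupancy schedule (48 half-hour steps)."""
    rng = random.Random(seed)
    schedule = [start]
    for _ in range(steps - 1):
        schedule.append(next_state(schedule[-1], rng))
    return schedule


day = simulate_day()
occupied_fraction = day.count("present") / len(day)
```

Each step depends only on the previous state, so the chain carries no longer-term memory, and a new random number is needed at every time step.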
While modeling algorithms and techniques continue to advance, each method has pros and cons. The stochastic framework of probability-based modeling is appropriate for revealing the unpredictable nature of occupant behavior, addressing the uncertainty of a deterministic description of occupant behavior, whereas the non-probabilistic approach is better at defining the interactive complexity of occupant behavior. However, the discreteness of the Markov-chain approach may overlook an implicit continuity or a repetitive habitual pattern of human behavior within a space, because occupants are assumed to have no memory of the past beyond the immediately preceding state, which does not allow for interactions between different behaviors and indoor settings. Moreover, should one model an occupancy schedule on a continuous domain (of space or time), a Markov chain must generate random numbers endlessly to obtain a continuous curve of schedule description; on the other hand, a challenging barrier of ABM is that it needs expert-level prior knowledge to devise agents' attributes and complicated action scenarios. Hybrid techniques can also be considered and developed to address the multiplicity of behavioral decision-making and to describe hierarchies of the underlying causes of human actions with enhanced precision. For example, even for a single user action, ABM and a Markov chain can be integrated for different time frames according to differences in the primary cause of the action.
However, in any case, it is still challenging to predict energy use with an accurate model of occupant behavior during the schematic phases of building projects, since a dilemma inherent in behavior modeling and its use for environmental simulation is that it is inextricably tied to post-occupancy phases for specific data collection, model correction, and validation. Therefore, an advanced technique to formulate behavior with limited information is needed to improve the accuracy of early-phase modeling as well as to document the uncertainty of energy simulation results. To this end, this study proposes a Gaussian process (GP)-based modeling procedure for occupant behavior, attempting to integrate GP regression models within the energy simulation of a residential building. Dong and Lam [6] suggested the efficacy of GP in the prediction of building occupancy for the optimal management of the mechanical systems of a residential unit; however, GP has still received little attention in residential occupant behavior modeling. As a pilot study, this study presents GP modeling to test its applicability to an early building simulation phase.
B. Gaussian Process and Bayesian Framework
A Gaussian process (GP) is a supervised machine-learning technique to infer a regression function in a continuous domain such as time or space; it is also a Bayesian approach, since the inference is based on a set of training (observed or experimental) data [7]. GP models can be understood as an extension of the multivariate Gaussian distribution to a functional relationship of random variables; every function value of a regression model is assumed to follow a Gaussian distribution with a mean function $m(x)$ and a covariance (or kernel) function $\kappa(x, x')$, generally written as
$$ f \sim GP(m, \kappa) \tag{1} $$
where $\kappa(x, x') = E[(f(x) - m(x))(f(x') - m(x'))]$, and $\mathbf{f}$ denotes an extremely high-dimensional vector, that is, a set of function values defined within a finite domain of a vector space such that $\mathbf{f} = \{f_1, f_2, \dots, f_i, \dots, f_n;\ n \in \mathbb{N}_1\}$. This expression is quite straightforward, but it is important to note that, unlike general probability distributions, $f_i$ is a random variable of the GP, and $x_i$ in a vector of variables $\mathbf{x} \subset \chi$ works as an index of the regression process. This means that each dimension of $\mathbf{f}$ corresponds to an element of $\mathbf{x}$, so $f_i$ ends up with a regular Gaussian distribution whose mean and variance are associated with the index value $x_i$, such that
$$ (f(x_1), \dots, f(x_i), \dots, f(x_n)) \sim N(\mu_{1 \le i \le n}, \Sigma) \tag{2} $$
where
$$ \Sigma = \begin{bmatrix} \kappa(x_1, x_1) & \kappa(x_1, x_2) & \cdots \\ \kappa(x_2, x_1) & \kappa(x_2, x_2) & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} $$
and $\mu_i = m(x_i)$. The formula of a joint multivariate Gaussian distribution between two random variables, $\mathbf{x}_1$ and $\mathbf{x}_2$, is
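$$ \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} = \begin{bmatrix} (x_{11}, \dots, x_{1p})^\top \\ (x_{21}, \dots, x_{2q})^\top \end{bmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \begin{bmatrix} \mathbf{S}_A & \mathbf{S}_C \\ \mathbf{S}_C^\top & \mathbf{S}_B \end{bmatrix} \right); \quad p, q \in \mathbb{N}_1 \tag{3} $$
where $\mathbf{S}_A$ stands for the covariance matrix of $x_{1i}$ and $x_{1j}$ ($i, j = 1, \dots, p$), $\mathbf{S}_B$ is the set of covariances of $x_{2i}$ and $x_{2j}$ ($i, j = 1, \dots, q$), and $\mathbf{S}_C$ is that of $x_{1i}$ and $x_{2j}$ ($i = 1, \dots, p$; $j = 1, \dots, q$). Then, the conditional distribution of $\mathbf{x}_1$ given $\mathbf{x}_2$ is given by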
$$ \mathbf{x}_1 \mid \mathbf{x}_2 \sim N\left( \boldsymbol{\mu}_1 + \mathbf{S}_C \mathbf{S}_B^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2),\ \mathbf{S}_A - \mathbf{S}_C \mathbf{S}_B^{-1} \mathbf{S}_C^\top \right) \tag{4} $$
Applying Eqs. (3) and (4) to the conditional distribution of an unseen function vector $\mathbf{f}_*$ (test data) given a set of observed values $\mathbf{f}$ (training data) yields the following formula.
$$ \begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu}_* \end{bmatrix}, \begin{bmatrix} \Sigma & \Sigma_* \\ \Sigma_*^\top & \Sigma_{**} \end{bmatrix} \right) \;\Longrightarrow\; \mathbf{f}_* \mid \mathbf{f} \sim N\left( \boldsymbol{\mu}_* + \Sigma_*^\top \Sigma^{-1} (\mathbf{f} - \boldsymbol{\mu}),\ \Sigma_{**} - \Sigma_*^\top \Sigma^{-1} \Sigma_* \right) \tag{5} $$
In Bayesian terms, Eq. (1) gives a prior distribution with an appropriate setting of $m(x)$ and $\kappa(x, x')$, and Eq. (5) now enables us to update the prior with a training dataset; hence, applying Eq. (5) to Eqs. (1) and (2), a posterior GP given a set of observed data $D$ within a domain $\chi$ is expressed as
$$ f_D \sim GP(m_D, \kappa_D) \tag{6} $$
From Eq. (5), the posterior mean function and kernel function are computed by
$$ m_D(x) = m(x) + \Sigma(\chi, x)^\top \Sigma^{-1} (\mathbf{f} - \mathbf{m}) \tag{7} $$
$$ \kappa_D(x, x') = \kappa(x, x') - \Sigma(\chi, x)^\top \Sigma^{-1} \Sigma(\chi, x') \tag{8} $$
Here, it is important to note that the covariances of the test inputs ($x, x' \in X_D \to f_D$) and the observation data ($\chi \to \mathbf{f}$) reduce the uncertainties of the prior parameter functions ($m$ and $\kappa$) of $f_D$ (or $\mathbf{f}_*$). Finally, if the training data are noisy with errors ($\varepsilon \sim N(0, \sigma_n^2)$), Eq. (6) can be rewritten with extra covariance as
$$ y \sim f_D + \varepsilon \sim GP(m_D, \kappa_D + \sigma_n^2) \tag{9} $$
so $y$ is the final vector of estimation, and the GP model ends up with a certain confidence range. Given little information about a data description, assuming a normally distributed association between data points is quite natural. That is, a GP model can be used as a prior distribution of data for Bayesian inference. By the same token, when little prior information about the occupant schedule is available, the GP approach can describe building occupant behavior probabilistically with better predictions than Markov chain models. In most building projects, specifically during the early stages of design or construction, it is not conceivable to simulate energy use with complete knowledge of occupant behavior; GP models are, depending on the definition of the covariance function $\kappa$, able to describe interrelationships between occupant activities that are distant in time, while maintaining a strong causality between activities that are close in time. Therefore, GP can be an efficient tool to predict continuous time-series data of unknown occupant activities.
C. Case study building and baseline simulation
A detached single-family house in Philadelphia was chosen to represent a typical setting of US residential buildings. A general overview of this building is given in Figure 1 and Table I. The data are similar to the averaged characteristics of a typical household (Appendix). A family of three resides in this building, and each person's activities and settings are lumped into unified behavioral descriptions for the purpose of this study. Since a gas furnace for forced warm-air heating and a central air-conditioning unit for cooling serve the entire living area through a duct system, the geometry of the energy simulation input was simplified to one conditioned thermal zone and two non-conditioned thermal zones, with internal mass and glazing area representing actual building data (Figure 2).
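Returning briefly to Section II.B, the conditioning step of Eq. (4), which underlies the GP update in Eq. (5), can be made concrete with a scalar sketch in which every block of Eq. (3) is a single number. The function name and all numeric values are illustrative assumptions (for example, occupancy fractions at two adjacent time steps), not data from the test house.

```python
def conditional_gaussian(mu1, mu2, s_a, s_b, s_c, x2):
    """Scalar form of Eq. (4): mean and variance of x1 given an observed x2."""
    gain = s_c / s_b                # S_C S_B^{-1}
    mean = mu1 + gain * (x2 - mu2)  # mu_1 + S_C S_B^{-1} (x_2 - mu_2)
    var = s_a - gain * s_c          # S_A - S_C S_B^{-1} S_C^T
    return mean, var


# Assumed values: both variables have prior mean 0.5 and variance 0.04,
# their covariance is 0.03, and x2 is observed at 0.9.
mean, var = conditional_gaussian(mu1=0.5, mu2=0.5, s_a=0.04, s_b=0.04,
                                 s_c=0.03, x2=0.9)

# With zero covariance the observation carries no information about x1.
mean_indep, var_indep = conditional_gaussian(0.5, 0.5, 0.04, 0.04, 0.0, 0.9)
```

With these assumed numbers, observing $x_2 = 0.9$ moves the mean of $x_1$ from 0.5 to about 0.8 and reduces its variance from 0.04 to about 0.0175, which is the mechanism by which a few observed schedule points constrain their unobserved neighbors.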
Figure 1. Test building (front view, system equipment, and living room)

TABLE I. DESCRIPTION OF TEST BUILDING
Site area                 1265.99 m²      Total floor area       340.73 m²
Conditioned area          282.38 m²       Conditioned volume     722.01 m³
Exterior surface^a        460.31 m²       Fenestration^b         35.11 m²
Window-to-wall^b          11.30%          Internal mass^c        931.26 m²
Heating/Cooling system    Gas furnace/Central air conditioner
Structure                 Wood frame      Glazing                Double pane
a, c. If both sides of an interior wall are exposed to a thermal zone, its surface area is double counted.
b. Considering conditioned area (heat transfer zone) only.

Figure 2. Baseline building geometry: (a) ground plan; (b) simulation input geometry and thermal zoning

Few unified compliance codes have been offered for the energy simulation of low-rise residential buildings, since it is difficult to standardize the energy usage pattern of small buildings; in this regard, Wilson et al. [8] recently conducted a study to provide a benchmarking protocol of operating parameters and occupancy schedules for single-family houses in the U.S. (the Building America (BA) protocols). For the baseline energy simulation of this study, the operation schedules and parameters available in the protocol (as well as thermostat set points of 21℃ for heating and 24℃ for cooling) were used as inputs, while geometry-related factors are based on as-is conditions. Simulated end-energy use (Aug. 23 – Dec. 31) is presented below (Figure 3 (a) and (b)).

Figure 3. Baseline energy simulation results: (a) baseline simulation (hourly data): interior light, electric equipment, and people heat gain as Energy (MJ), and zone temperature as Temperature (C), plotted against Time (hour); (b) baseline simulation (annual end-energy use breakdown): heating (gas) 56%, water (gas) 17%, fans (electricity) 10%, interior equipment (electricity) 9%, cooling (electricity) 8%
D. Preliminary data and point estimation of schedules
Based on the baseline settings, six major behavior-related schedules of building operation were selected to be modeled: (i) electric appliance use, (ii) lighting intensity, (iii) occupancy of a conditioned zone, (iv) domestic hot water use, (v) individual activity level (bodily sensible heat), and (vi) window blind (interior shading) status (which is controlled manually). To obtain observation data for GP, this study assumed that some data points (actual user activities) were identical to the schedules of the BA protocols, so that GP fills in missing data with continuous regression models; to support the point estimates and the variance of each data point, the preliminary study was followed by a simple questionnaire survey of the occupants' responses based on their lifestyles (i.e., yes/no questions asking whether residents occupied a space at the same rate as the BA protocols suggest).
III. RESULTS
GP regression was carried out for each selected operational schedule. The process was initiated by randomly generating unconstrained (or free-ended) Gaussian distribution functions at 30-minute intervals, and afterwards the functions were modified to fit observed (known) data. The mean value and the standard deviation of each initial Gaussian function were evaluated from the average of each schedule offered by the BA protocols. The experimental results are presented in Figure 4.

Figure 4. Results of behavior prediction of the test house from GP: (a) plug-in loads; (b) interior lighting; (c) zone occupancy; (d) domestic hot water use; (e) occupant activity level (sensible heat); (f) windows interior shading (1: fully opened, 0: closed)

In each figure, the blue solid line denotes the collection of mean function values at each time interval, and the gray area stands for the minimum and maximum bounds (dotted lines) within 95% confidence. The black line connects the discrete data points of the BA protocols' schedules, and red dots represent observed data from the actual survey with deviations. Based on these observations and the mean values of the assumed static schedules, GP resulted in continuous schedule profiles over 24 hours. GP prediction approximates the original schedules (Figure 4 (a)–(e)) by fitting function values to observed data; however, the predictive means of GP regression also show considerable inconsistency with a large amount of uncertainty, especially at time intervals where data are missing. In the plug-in loads (Figure 4 (a)) and zone occupancy (Figure 4 (e)), GP tends to reduce fraction values (by up to 70%), whereas the original profiles undergo a linear transition or show a sharp upturn or downturn. This is because (i) prior probability distributions were generated only with the overall mean values (without prior knowledge about the standard deviation) of the BA protocol data, and (ii) GP evaluates the predictive means of missing points based on the distance between variables. GP shows an efficient approximation, considering that the predictive profiles were obtained with only 6 to 12 data points rather than a full set of 24-hour data. That said, the results also show that this advantage can be undermined by a great deal of uncertainty, and GP may lead to a very rough behavior model ill-suited to actual measurements. Figure 4 (c) demonstrates that more information is required to accurately predict a linear pattern of a behavior schedule, since GP basically finds a regression model with a curve-fitting method.
In order to characterize the uncertainty of the generated GP models, 1000 different schedules for each operational behavior were sampled from the GP models using the predictive means and confidence levels. With these samples, Monte Carlo simulation was run through EnergyPlus to monitor variations in end-energy use. Parameters other than the modeled schedules were identical to those of the baseline simulation. Figure 5 shows the results of the energy simulation, and Figure 6 compares the temperature distributions of the baseline model and the GP-based prediction model.

Figure 5. Results of Monte Carlo simulation with GP-based behavior models: (a) electricity consumption; (b) gas consumption; (c) total energy consumption
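The sampling procedure described above can be sketched as follows: a GP posterior is fitted to a handful of observed schedule points on a 30-minute grid (Eqs. (7) and (8)), 1000 schedules are drawn from it, and each one is passed to an energy calculation. This is a minimal sketch under stated assumptions, not the implementation used in this study: the squared-exponential kernel, its hyperparameters, the noise level, the six observation values, and the function `toy_daily_energy` (a stand-in for an EnergyPlus run) are all hypothetical.

```python
import numpy as np


def sq_exp_kernel(a, b, sigma_f=0.3, length=2.0):
    """Squared-exponential covariance between time points a and b (hours).
    The kernel choice and its hyperparameters are assumptions."""
    d = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(t_obs, y_obs, t_new, prior_mean, noise_sd=0.05):
    """Posterior mean and covariance in the form of Eqs. (7) and (8).
    A small observation noise (assumed) keeps the linear solve stable."""
    k_oo = sq_exp_kernel(t_obs, t_obs) + noise_sd**2 * np.eye(len(t_obs))
    k_on = sq_exp_kernel(t_obs, t_new)   # Sigma(chi, x)
    k_nn = sq_exp_kernel(t_new, t_new)   # kappa(x, x')
    mean = prior_mean + k_on.T @ np.linalg.solve(k_oo, y_obs - prior_mean)
    cov = k_nn - k_on.T @ np.linalg.solve(k_oo, k_on)
    return mean, cov


# Hypothetical observations: occupancy fraction at six hours of the day.
t_obs = np.array([0.0, 6.0, 9.0, 13.0, 18.0, 22.0])
y_obs = np.array([1.0, 0.9, 0.3, 0.3, 0.8, 1.0])
t_new = np.arange(0.0, 24.0, 0.5)        # 30-minute grid, 48 points

mean, cov = gp_posterior(t_obs, y_obs, t_new, prior_mean=y_obs.mean())

# Draw 1000 candidate schedules from the posterior GP.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=1000, check_valid="ignore")
schedules = np.clip(samples, 0.0, 1.0)   # schedule fractions stay in [0, 1]


def toy_daily_energy(schedule, peak_kw=1.5, step_h=0.5):
    """Placeholder for an energy simulation run: kWh of a load that
    scales linearly with the schedule fraction."""
    return float(np.sum(schedule) * peak_kw * step_h)


energy = np.array([toy_daily_energy(s) for s in schedules])
summary = (energy.mean(), energy.std())
```

In an actual workflow each sampled schedule would be written into the simulation input and the resulting end-energy use collected; the spread of `energy` plays the role of the distributions in Figure 5.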
Figure 6. Zone mean air temperature (baseline (above) and GP-predicted (below))

Simulation results were compared by fuel source of end-energy use (gas: 23.22 GJ, electricity: 9.42 GJ, total: 32.64 GJ in the baseline). Electricity use (Figure 5 (a)) from GP model sampling shows a normal distribution whose mean is 9.24 GJ and whose standard deviation is 0.18 GJ. Although a t-test shows that this sampling distribution is not exactly the same as the baseline (red dotted line), the mean values are quite similar (2% difference), considering how few data inputs were given for GP regression. On the other hand, it should be noted that gas consumption is more widely distributed (19 to 26 GJ); Monte Carlo simulation resulted in a bimodal distribution, that is, a mixture of two different normal distributions. The overall mean of the mixture distribution is 21.76 GJ and the standard deviation is 1.96 GJ. The frequency distribution shows that the data cluster with a mean of 20.65 and a standard deviation of 0.42 is more dominant than the second mode of the remaining data, whose mean is 25.09 and standard deviation is 0.29. This multimodality in gas use simulation results from the poor fit of the GP models to the schedules of domestic water use and space occupancy (Figure 4 (d) and (e)). The water use schedule was modeled with the fewest observed data, which resulted in a GP model with two peaks, from 8 a.m. to 7 p.m., rather than one; biased curve-fitting is also apparent in space occupancy. The regression model does not clearly reflect the sharp change of the baseline schedule at 7 a.m. Due to the dominance of space heating in house energy consumption, the simulation results of total energy use also exhibit bimodality; the means of the two modes are 29.93 and 34.16, respectively, and the standard deviations are 0.31 and 1.09. Differences between the overall means of the gas use and total energy data sets and the baseline data are less than 7% (6.3% and 5.0%, respectively). However, the bimodality and deviations of the Monte Carlo simulation results demonstrate the ambiguity of static behavior schedules in energy simulation, as well as how deceptive point estimation can be.
IV. CONCLUSIONS
This study briefly presented a GP-based modeling method for residential building energy simulation, as an alternative that can be introduced during the early phases of a building project or when little building data about occupant behavior is available. GP regression was carried out for five major behavior-related operational schedules. Monte Carlo simulation was run with sampling from the GP models. The results showed the potential effectiveness of GP in early-phase simulation, since continuous behavior schedules can be modeled without a full series of data sets or probabilities of temporal events. It follows that GP can provide rapid, effective modeling, especially if one attempts to estimate occupant behavior given little information, such as when starting a new building project or an energy simulation from scratch, or when correcting an existing behavior model with additional information. Another advantage of GP modeling is that it is presented with a degree of uncertainty, so that simulation results can be monitored with data assurance.
Nevertheless, the findings also showed that too little observed data may lead to a biased outcome, especially when an actual behavior schedule undergoes little variation (i.e., the linear transition of zone occupancy at night) or a sudden change within a short time. To improve the robustness of GP-based predictability, further study is needed, such as studies of occupant behavior in different types of buildings, parametric monitoring of mixed human behavior patterns, or a hybrid methodology of machine-learning processes to quantify social/psychological aspects of occupant interactions, as well as the development of rapid, correct data collection methods.
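As a consistency check on the bimodal gas-use statistics reported in Section III, the short calculation below treats the result as a two-component normal mixture and recovers the share of the dominant mode from the reported means. The mixture weight is inferred here, not reported in the study, and the two-component assumption is taken from the description of the distribution as a mixture of two normal distributions.

```python
import math

# Values reported for the two gas-use modes in Section III (GJ).
mean_1, sd_1 = 20.65, 0.42   # dominant mode
mean_2, sd_2 = 25.09, 0.29   # second mode
overall_mean = 21.76

# Weight of the dominant mode implied by the overall mean:
# overall_mean = w * mean_1 + (1 - w) * mean_2
w = (mean_2 - overall_mean) / (mean_2 - mean_1)

# Variance of a two-component mixture: within-mode part plus between-mode part.
within = w * sd_1**2 + (1 - w) * sd_2**2
between = w * (1 - w) * (mean_2 - mean_1) ** 2
overall_sd = math.sqrt(within + between)
```

Under these assumptions the dominant mode holds about 75% of the samples and the implied overall standard deviation is about 1.96 GJ, which agrees with the reported value; most of the spread comes from the distance between the two modes rather than from the scatter within either one.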
APPENDIX

TABLE II. CHARACTERISTICS OF A TYPICAL SINGLE-FAMILY HOME [9]
Year built                           mid-1970s
Occupants                            3
Stories                              1
Foundation                           Concrete
Number of rooms                      Bed rooms 3; Other rooms 3; Bath 2
Glazing type                         Double-pane
Number of windows                    15
Total area of windows                11.5% of conditioned floor area
Space and water heating equipment    Central warm-air furnace (gas)
Space cooling equipment              Central air-conditioner (electricity)
Appliances                           Refrigerator; TV 3 ea.; Computer 2 ea.; Range/oven (electric type)

REFERENCES
[1] http://aceee.org/sector/residential [Accessed 10/23/2016].
[2] T. Hong, S.C. Taylor-Lange, S. D’Oca, D. Yan, and S.P. Corgnati, “Advances in research and applications of energy-related occupant behavior in buildings”, Energy and Buildings, vol. 116, pp. 694-702, 2016.
[3] D. Parker, E. Mills, L. Rainer, N. Bourassa, G. Homan, “Accuracy of the home energy saver energy calculation methodology”, in: ACEEE Summer Study on Energy Efficiency in Buildings, pp. 12-206-12-222, 2012.
[4] S. Heck, H. Tai, “Sizing the Potential of Behavior Energy-Efficient Initiative in the US Residential Market Report”, McKinsey Company, 2014.
[5] User’s Manual for ANSI/ASHRAE/IESNA Standard 90.1-2004.
[6] B. Dong, K.P. Lam, “A real-time model predictive control for building heating and cooling systems based on the occupancy behavior pattern detection and local weather forecasting”, Building Simulation, vol. 7, pp. 89-106, 2014.
[7] C. E. Rasmussen and C. K. I. Williams, “Gaussian Processes for Machine Learning”, MIT Press, 2006.
[8] E. Wilson, C. Engebrecht Metzger, S. Horowitz, and R. Hendron, “2014 Building America House Simulation Protocols”, edited by U.S. Department of Energy. Oak Ridge, TN: National Renewable Energy Laboratory, 2014.
[9] U.S. DOE, Building energy data book, section 2.2: residential sector characteristics, 2012.

AUTHOR’S PROFILE
Dr. Hwang Yi, Assoc. AIA, LEED AP, is an Assistant Professor in the School of Architecture at Florida International University. He is an architect and professional expert in sustainable building design and digital building performance simulation. His research focuses primarily on interdisciplinary data-driven approaches for intelligent design processes, which incorporate machine-learning algorithms, physical-cyber system computing, and information theory into design automation and optimization. He is a Fulbright scholar and a laureate of various architectural prizes, including the Jean Prouve-Kim Jung-Up Architecture Prize from the French Government (2009) and the T.S. Kim Architecture Prize (2012). He earned a PhD in Architecture from the University of Pennsylvania (2016), and a Bachelor of Engineering and a Master of Engineering in Architecture (2007) from Seoul National University, South Korea.
73
© The Author(s) 2017. This article is published with open access by the GSTF