MANUFACTURING FACILITIES BUILDING AND ELECTRIC VEHICLE CHARGING MACHINE LEARNING FORECASTING SYSTEM

Roger N. Anderson, Albert Boulanger, Viabhav Bhandari, Arthur Kressner, Xiaohu Li, Somnath Sarkar, and Leon Wu

11/1/2012

Abstract

Manufacturing facilities of all kinds have more complex electrical load profiles than office space or homes. The latter are generally sinusoidal, with the highest load during the day and the lowest load at night. Manufacturing facility loads are instead controlled by the work cycle, which often involves significant nighttime loads. All of these buildings are migrating toward an increasingly electrical economy, in which electric vehicles deliver their produce and power the cars of their employees. We have invented a Machine Learning (ML) system that learns the patterns of these complex systems, as well as the simpler ones, and, by adding predictions of upcoming weather and seasonal changes in work patterns, forecasts building loads for both the manufacturing facilities and their EV recharge facilities so that energy usage is optimized. Peak consumption penalties are avoided, workers are happier and safer, and companies contribute less to environmental pollution.

In one enablement, the invention has been reduced to practice in a large package delivery facility that routinely processes 8,000 to 10,000 packages per day. Conversion to Electric Delivery Vehicles (EDVs) has commenced, and control of recharging times and speeds must not interfere with the conveyor belts and air quality equipment that keep workers safe and deliveries on time. Peak loads occur in the morning, afternoon, and around midnight. The duration of these spiky loads depends on package volume, which is held constant as a continuous flow of packages, so that excessive volume results in longer-duration spikes in electricity consumption rather than higher spikes in load. Our newly invented ML Forecasting System uses Support Vector Machine Regression (SVMR) to predict the day-ahead electric load of the facility using past load histories for that day and hour together with the weather prediction. It has a feedback loop that scores the statistical accuracy of its predictions against the actual building load, which is controlled by package volumes coming in and going out. The ML system learns from its errors, which are minimized over time. It can then convert this electrical load prediction into a forecast of package volumes for the next day, week, month, and season.

In another enablement, our Machine Learning Forecasting System is built into a commercial battery recharge optimization system, so that tomorrow's expected package load and weather forecasts are used to optimize the time windows allocated for EDV fleet recharge and the intensities of power delivered to the batteries in each vehicle. Peak load spikes that draw penalties from the utility are avoided.

In another enablement, the ML Forecasting System is used as a simulator: it can compute the scaling to hundreds of theoretical EDVs at this facility, or at other depot facilities of different sizes, to identify how electric load can be predicted and minimized. This requires less new capital equipment from the utility, since added supply may no longer be needed as the depot expands from 10% to 100% EDVs in the near future.

Manufacturing Facility Electric Load Prediction

With the increasing demand for oil and its decreasing supply, the cost of fossil fuels has risen considerably over past decades, and the trend is expected to continue. Furthermore, increasing awareness of the environmental impact of burning fossil fuels has generated great interest in the search for cleaner and more sustainable sources of energy. The transportation sector depends on fossil fuels for over 90% of its energy requirements. Using electric delivery vehicles (EDVs) to replace large fleets of vehicles, for example those used for public transportation and mail delivery in large urban areas, is seen as a straightforward way to reduce dependence on fossil fuels for transportation. Close to a third of the electricity produced in the US is generated from non-fossil-fuel sources, making electricity a relatively cleaner energy source.

Using EDVs on a large scale necessitates managing the charging activity so as to avoid significantly increasing the peak electric demand, which in turn avoids overloading the existing electric distribution infrastructure. In order to manage the charging, one must be able to model the charging load. To predict the charging load at manufacturing facilities that utilize EDVs, a Machine Learning model was invented to forecast the total charging load. This ML model uses support vector machine regression (SVMR), a method that builds a model purely from past charging load observations, requiring no knowledge of the number or type of EDVs, the number and type of batteries used in each EDV, or any other physical properties. This pure dependence on historical data makes the model applicable to different types of manufacturing facilities with few adjustments. The SVMR model was built to predict day-ahead energy usage based on past energy usage together with temperature and dew point temperature forecasts.

Modeling the energy usage of such a manufacturing facility is an important step in improving the efficiency of personnel and equipment, as well as in identifying EDV fleet expansion opportunities that do not increase peak electric demand. Moreover, a good model helps owners better understand their facility's manufacturing volumes and timing as well as its electric load requirements, which leads to new and better ways of optimizing the existing facility's utilization. Our ML Forecasting System can be used to flatten the charging load profile so as to avoid "demand charges" from the utility. One embodiment is visualized above. GE, FedEx Express, Consolidated Edison of New York, and Columbia University are demonstrating the nation's first electric delivery vehicle depot implementing smart charging technology connected to the distribution grid in an intelligent manner. Figure 1 depicts the overall program, participants, and perceived benefits of the smart charging electric fleet depot.

Electric Delivery Vehicle Charging Optimization

The baseline charging infrastructure includes commercially available vehicle charging units that are networked into the facility intranet, along with a local PC running the charging and ML Forecasting Systems. The joint system accomplishes basic EV charging and records event parameters including charge time, vehicle ID, and kWh consumed.

Data Acquisition and Supervisory Control System

This system includes two major components: 1) a commercial data acquisition and historian software database loaded onto a local PC at the facility and/or a remote server, which collects and archives data and provides visualization screens for each project member to view status and historical trends; and 2) a supervisory control system component that tracks the grid state as well as some of the finer details of the vehicle and depot states. This commercial software solution helps analyze the entire system state and provides recommended charge schedules for the vehicles that meet predetermined constraints, such as a fully charged vehicle by the required departure time at the lowest possible electricity fuel cost.

Our Machine Learning Forecasting System

Our newly invented ML system connects to the control system in order to forecast the building load and charging load 24 hours in advance. The Columbia Machine Learning Forecasting System (MLFS) applies machine learning techniques to various feature datasets, including electrical load, weather, holidays, and package volume, to predict the next day's building load, charging load, and building load minus charging load, for electrical load and charging schedule optimization of the facility and EDVs, as envisioned below.
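To make the data flow concrete, the minimal Python sketch below outlines the inputs and outputs of such a day-ahead forecasting interface. All class, field, and function names here are illustrative assumptions, not the MLFS's actual API.

```python
# Minimal sketch (not the production MLFS) of the forecasting interface described above:
# feature datasets in, three day-ahead load series out.
from dataclasses import dataclass
from typing import List

@dataclass
class DayAheadFeatures:
    load_history_kw: List[float]      # recent 15-minute building load samples
    charging_history_kw: List[float]  # recent 15-minute EDV charging load samples
    humidex_forecast: List[float]     # next-day weather (temperature/dew point composite)
    is_holiday: bool                  # holiday/weekend indicator
    package_volume: float             # expected package volume for the next day

@dataclass
class DayAheadForecast:
    building_load_kw: List[float]            # next day's total building load
    charging_load_kw: List[float]            # next day's EDV charging load
    building_minus_charging_kw: List[float]  # building load with charging removed

def forecast_day_ahead(features: DayAheadFeatures) -> DayAheadForecast:
    """Placeholder for the MLFS models (SVMR, BART, etc.) described in this report."""
    raise NotImplementedError
```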



Manufacturing Facility Electric Load Forecasting

Data Inputs

Electric load data for the building has been available for the last 5 months, sampled at a frequency of every 15 minutes.

[Figure: Building load profile]

[Figure: Weekly load profile, 24-Sep through 30-Sep]

Covariate Selection

As a quick measure of the relative importance of each covariate, we computed the correlation coefficient1 of each with the load. The statistical significance of each covariate was then measured, taking into account the issue of multicollinearity.

Covariate                 Correlation with load
Previous week load        0.73
Previous day load         0.68
Previous day average      0.53
Previous week average     0.50
Humidex                   0.48
Holiday                   0.34
Hour of the day           0.17
Day of the week           0.12
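As an illustration of this screening step, the short Python sketch below ranks candidate covariates by their correlation with the load using pandas. The file and column names are assumed for the example and are not the project's actual schema.

```python
# Rank candidate covariates by Pearson correlation with the 15-minute building load,
# as in the table above. Column names are illustrative assumptions.
import pandas as pd

df = pd.read_csv("building_load_features.csv")  # hypothetical file of load + covariates

covariates = [
    "previous_week_load", "previous_day_load", "previous_day_average",
    "previous_week_average", "humidex", "holiday", "hour_of_day", "day_of_week",
]

correlations = df[covariates].corrwith(df["load"]).abs().sort_values(ascending=False)
print(correlations)

# Partial correlations (footnote 1) could be obtained by controlling for the other
# covariates, e.g. with pingouin.partial_corr, if a finer ranking is needed.
```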

In view of some literature on the effect of atmospheric pressure on load forecasting [3], we tried including other weather covariates such as atmospheric pressure, sky clarity, and wind speed. However, the additional covariates did not improve our predictive capability; the graphs below compare the two cases.

1 Partial correlation can be computed to measure the importance of each variable keeping the remaining constant.

[Figure: Actual vs. predicted load, 7/18/2012 through 7/24/2012, with humidex as the only weather covariate (MAPE: 11.15%)]

[Figure: Actual vs. predicted load, 7/18/2012 through 7/24/2012, with four weather covariates (atmospheric pressure, wind speed, sky clarity, and humidex); MAPE: 12.02%]

Support Vector Machine Regression [1]

Our best-performing SVMR model uses 8 covariates. Since there is a cyclical component in the load profile, covariates such as previous day load, previous week load, previous day average, previous week average, hour of the day, and day of the week were incorporated. Additionally, to account for the HVAC load, a heat index called humidex (a composite of temperature and dew point) was included as a covariate. As an initial approach to modeling package volume, we added a covariate with discrete values for different kinds of holidays and weekends. For the choice of kernel, we used the Radial Basis Function in our SVMR model to project the data into an infinite-dimensional feature space, which improved results substantially. A remaining challenge was to capture the unpredictable spikes in the load profile (their time of occurrence, duration, and magnitude) caused by the operation of the large conveyor belts and exhaust fans during the busy package loading and unloading hours, when the electric load rises by more than 100 percent.
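The following sketch shows one way such an RBF-kernel SVMR could be set up with scikit-learn on the eight covariates described above. It is not the project's exact pipeline; the file name, column names, and hyperparameter values are placeholder assumptions (the hyperparameters would be tuned by the grid search discussed later).

```python
# Illustrative day-ahead load regression with an RBF-kernel SVR on the covariate set
# described above. File and column names are assumptions.
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_percentage_error

FEATURES = [
    "previous_day_load", "previous_week_load", "previous_day_average",
    "previous_week_average", "hour_of_day", "day_of_week", "humidex", "holiday",
]

df = pd.read_csv("building_load_features.csv")   # hypothetical training file
train, test = df.iloc[:-96], df.iloc[-96:]       # hold out the last day (96 x 15 min)

model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=10.0, gamma="scale", epsilon=0.1))
model.fit(train[FEATURES], train["load"])

pred = model.predict(test[FEATURES])
print("MAPE: %.2f%%" % (100 * mean_absolute_percentage_error(test["load"], pred)))
```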

Ensemble of Machine Learning Models

We note that seasonal changes exist for all manufacturing facilities and that these also affect EDV charging patterns. We have therefore supplemented SVMR learning with the simultaneous use of several other algorithms that also yield forecasts.

[Figure: SARIMA (3,1,1)×(3,1,1) forecast; MAPE: 7.33%]

[Figure: SVM prediction; MAPE: 5.03%]

Neural Network

Neural networks are largely black boxes and are difficult to analyze. Their opaque nature makes it very hard to determine how a network of neurons is solving a problem. They are difficult to troubleshoot when they do not work, and when they do work, they may suffer from overfitting. We compared Neural Network, BART, and SVM results in the graph below [*].

Bayesian Additive Regression Trees (BART) [2]

BART is a Bayesian ensemble method for learning a regression relationship between a variable of interest y and p potential predictors x1, x2, ..., xp. The idea is to model the conditional distribution of y given x by a sum of random basis elements plus a noise distribution. Based on a basis of random regression trees, BART produces a predictive distribution for y at any x (in or out of sample) that automatically adjusts for the uncertainty at each x. BART can do this for nonlinear relationships, even those hidden within a large number of irrelevant predictors. BART's basis is the regression tree model, which uses a decision tree to map observations about an item to conclusions about its target value. Let T denote the tree structure, including the decision rules, and let M = {µ1, µ2, ..., µb} denote the set of bottom-node µ's. Let g(x; T, M) be the regression tree function that assigns a µ value to x. The BART model is f(x) = g(x; T1, M1) + g(x; T2, M2) + ... + g(x; Tm, Mm) + σ·z, with z ~ N(0, 1). Therefore f(x) is the sum, over all the trees, of the µ assigned to x by each tree's bottom node, plus Gaussian noise.

BART approximates the unknown function f(x1, x2, ..., xp) = E[Y | x1, x2, ..., xp] by a "sum-of-trees" model coupled with a regularization prior that constrains each tree to be a weak learner. Essentially we want to fit the model Yi = f(Xi) + ei. BART requires an iterative simulation procedure, the Metropolis-Hastings (MH) algorithm, a Markov Chain Monte Carlo (MCMC) method that stochastically searches the posterior to generate regression trees. Draws from the posterior f | (x, y) are averaged to infer f. To obtain the draws, we 1) put a prior on f, and 2) specify a Markov chain whose stationary distribution is the posterior of f.

Parameters in BART

Our modeling experiments relied on the BayesTree package in R. We imposed a prior over all the parameters of the sum-of-trees model, namely (T1, M1), ..., (Tm, Mm) and σ. These parameters are in turn based on the following hyperparameters:
1. α (base), β (power): determine the tree depth.
2. k: sets the prior probability that the function to be estimated lies within a certain bound.
3. ν, q: set the error tolerance level. (A smaller tolerance level may lead to over-fitting.)
Other parameters include the number of trees, the number of burn-in iterations, and the number of post-burn-in iterations. In a Markov Chain Monte Carlo process, we want the underlying distribution to converge before we can take iid samples from it; the number of draws until convergence is called the "burn-in". We look at plots of successive draws and discard the initial (burn-in) samples until the samples become stationary. Grid search can be deployed to find the optimum set of parameters, but since there are a large number of parameters, any effort to obtain an optimal set is computationally very expensive. We used the default parameters to evaluate our BART model: α = 0.95; β = 2.0; k = 2; 600 trees; 2,000 burn-in iterations; 5,000 post-burn-in iterations. There is therefore considerable scope to improve the predictions from the BART model.


Relative Unimportance of Humidex

The building in our study is a multi-floor sorting facility with large power-drawing conveyor belts and exhaust fans. As such, its power consumption patterns are very different from those of a normal office building, where HVAC is the dominant load. In the FedEx facility, other exogenous factors in the package processing workflow, such as package volume and late freight, play a crucial role.

[Figure: Weak dependence of electric load on humidex (y-axis: load, x-axis: humidex)]

Data

Electric load data for the building has been available for the last 5 months, sampled at a frequency of every 15 minutes. Additionally, a calendar of holidays and observed weather data (temperature and dew point; source: Central Park NOAA observation data via Weather Underground) were fed into several different Machine Learning (ML) models to predict the day-ahead electric load for the building. Mean Absolute Percentage Error (MAPE) was used to measure the accuracy of our predictions and to select the best-performing ML model.

MAPE as the Measure of Error

MAPE is based on the absolute value; hence under-prediction and over-prediction are assigned the same error if both are equidistant from the actual value. Other measures, such as Mean Squared Error or Mean Absolute Deviation, would be dominated by the relative lack of precision at the peaks, which is not our primary concern: we want to capture the timings of the peaks and the general load profile. In principle, under-prediction is a more serious concern than over-prediction. Since MAPE does not distinguish between the two, we can use SMAPE to penalize under-prediction more heavily. As an example, suppose At denotes the actual value and Ft denotes the forecast value; then we have the following two cases:
• Over-forecasting: At = 100 and Ft = 110 give SMAPE = 4.76%
• Under-forecasting: At = 100 and Ft = 90 give SMAPE = 5.26%
In practice, however, under-prediction was not a major concern in our model, so the use of SMAPE does not change the results significantly.
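For reference, the error measures can be computed as below. The SMAPE variant shown divides by (At + Ft) without the usual factor of one half, which is the form that reproduces the 4.76% and 5.26% figures quoted above.

```python
# Error measures used above. This SMAPE variant divides by (A + F) without the 1/2
# factor, matching the worked example in the text.
import numpy as np

def mape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def smape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs(forecast - actual) / (actual + forecast))

print(smape([100], [110]))  # 4.76  (over-forecast)
print(smape([100], [90]))   # 5.26  (under-forecast)
```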

Cross Validation

Using the same set of optimized parameters for the whole day led to poor predictions. Hence an hourly optimized model was used, in which we formulated 24 different SVR models, one for each hour of the day. Grid search with exponential spacing between grid points was used to find the optimal parameter values for each SVR model. Because the data form a time series, a customized cross-validation algorithm was implemented. We partitioned the training data into two sets: all available data except the latest week to train the model, and the left-out week to validate the prediction. The process was repeated for every week. Minimizing MAPE was used as the objective, and the MAPE for each week's predictions was stored. These MAPE values were then averaged using exponentially decaying weights, with the most recent week receiving the highest weight. The set of parameters corresponding to the minimum average MAPE was selected as the optimal parameter set for that hour, and the whole process was repeated for each hour of the day. These per-hour parameters were then used to build the prediction model; a sketch of this procedure follows below.

Grid search is a computationally expensive way to discover the optimized values. The effect of 'Cost' and 'Gamma' on the prediction is much greater than that of 'epsilon', so a limited set of values for cost and gamma was explored and the default value of epsilon was used. The use of hourly prediction substantially reduces the space complexity of the model, leading to faster results, and the hourly algorithm is easily parallelized. The performance of the hourly model is not affected by the unpredictable spikes in the load.
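A minimal sketch of this per-hour parameter search is given below, assuming a pandas DataFrame of training rows for one hour of the day and precomputed weekly train/validation splits. The grid values and the decay factor are illustrative assumptions.

```python
# Per-hour grid search with weekly hold-out validation and exponentially decayed MAPE
# averaging, as described above. Grid values and DECAY are illustrative.
import numpy as np
from sklearn.svm import SVR

COST_GRID = [2 ** p for p in range(-2, 9, 2)]   # exponentially spaced C values
GAMMA_GRID = [2 ** p for p in range(-9, 0, 2)]  # exponentially spaced gamma values
DECAY = 0.8                                     # weight decay per week into the past

def mape(actual, forecast):
    return np.mean(np.abs((actual - forecast) / actual))

def best_params_for_hour(df_hour, features, weeks):
    """df_hour: rows for one hour of the day; weeks: list of (train_idx, val_idx),
    ordered oldest to most recent."""
    best, best_score = None, np.inf
    for C in COST_GRID:
        for gamma in GAMMA_GRID:
            scores = []
            for train_idx, val_idx in weeks:                  # leave one week out at a time
                model = SVR(kernel="rbf", C=C, gamma=gamma)   # default epsilon
                model.fit(df_hour.loc[train_idx, features], df_hour.loc[train_idx, "load"])
                pred = model.predict(df_hour.loc[val_idx, features])
                scores.append(mape(df_hour.loc[val_idx, "load"].values, pred))
            # Most recent validation week receives the highest weight.
            weights = np.array([DECAY ** (len(scores) - 1 - i) for i in range(len(scores))])
            score = np.average(scores, weights=weights)
            if score < best_score:
                best, best_score = {"C": C, "gamma": gamma}, score
    return best
```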

[Figure: Comparison of forecasted electric load under different models (Actual vs. BART, MAPE: 11.14%), 18-Jul through 24-Jul]

[Figure: Percentage error comparison under different models (BART, SVR, Neural Network), 18-Jul through 24-Jul]

Conclusion

SVM is one of the best models for prediction and has been widely used for both regression and classification. Using the kernel trick, the data are projected into a higher-dimensional space where the algorithm fits a linear decision or regression function. For nonlinear regression, the Gaussian Radial Basis Function is extremely versatile, as its feature space is a Hilbert space of infinite dimension. However, the effectiveness of SVM depends on the selection of the kernel, the kernel's parameters, and the soft margin parameter. The model as it stands has relatively good predictive power, but there is further room for improvement. First, our grid search points are exponentially spaced so as to find near-optimal values quickly; a finer search between grid points could be implemented, but it would be computationally expensive. We have not optimized the error margin ε in our SVR model, as it is unlikely to improve the predictions significantly; as a further study we may check the effects of changing the error margin. Bayesian Additive Regression Trees (BART) is also an extremely versatile method, as it is based on ensemble learning in which each tree is a weak learner. It takes a completely different approach from SVM and performs quite well on noisy data. However, the model has many parameters, and finding the optimal set is computationally very expensive. Moreover, its implementation is only available in R, with no Python libraries.

Online Tool



References

[1] Cortes, C. and Vapnik, V. (1995). "Support-vector networks." Machine Learning, 20, 1–25.
[2] Chipman, H. A., George, E. I., and McCulloch, R. E. (2010). "BART: Bayesian Additive Regression Trees." Annals of Applied Statistics, 4, 266–298.
[3] Soares, A. P. (2000). "Atmospheric Pressure Applied to a Neural Network Based Short Term Load Forecasting." In Proceedings of SBRN.



Electric Delivery Vehicle Charging Load Forecasting

The charging load for EVs depends on a number of factors. The time of day, the day of the week, and the package volume affect the energy demand most dramatically. Most of the charging activity happens on weekdays after the EVs come back in the evening. By including past charging load observations in the prediction, this weekly cycle of usage during late evenings and early mornings is learned by the model and is used to predict the charging load over the next 24 hours. The other important factor in predicting charging load is the weather. On particularly hot days, more energy is required to cool the EV, and more energy goes into heating on very cold days. Similarly, humidity and the presence of precipitation can change the temperature perceived by the EV operator, which affects the amount of energy required to regulate temperature and hence to charge the batteries. Past energy demand, temperature, and dew point temperature were all used in the creation of the computer model.

[Figure: Observed charging load (total_load_KW_3P_Total) plotted with covariates min_humidex, max_humidex, avg_humidex, load_last_week, and holiday_dist, 5/7/12 through 5/29/12]

Methods

Support Vector Machine (SVM) is a machine-learning tool that embeds past data in a multidimensional space in order to output a regression. Each data point of past charging load was associated with three sets of attributes: time-of-day and time-of-week related; weather related; and observed load (a day ago, a week ago, averaged over intervals of various lengths, and a recent trend based on the total daily energy usage for charging over the past few days). The observed weather at charging time does not have a direct effect on the charging load, but the weather experienced by the EV operator during his or her route does. An SVM regression model was used to find patterns in the data.
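As an illustration, the sketch below assembles the three attribute groups from a 15-minute charging load series using pandas. The column names, lags, and window lengths are assumptions for the example, not the exact feature set used in the model.

```python
# Three attribute groups for the 15-minute charging load series: calendar features,
# weather features, and lagged/averaged load features. Window choices are illustrative.
import pandas as pd

def build_features(load: pd.Series, humidex: pd.Series) -> pd.DataFrame:
    """load, humidex: 15-minute series indexed by timestamp (96 samples per day)."""
    df = pd.DataFrame({"load": load, "humidex": humidex})
    # Time-of-day / time-of-week attributes.
    df["hour_of_day"] = df.index.hour
    df["day_of_week"] = df.index.dayofweek
    # Lagged and averaged load attributes.
    df["load_yesterday"] = df["load"].shift(96)        # same time a day ago
    df["load_last_week"] = df["load"].shift(96 * 7)    # same time a week ago
    df["load_avg_4h"] = df["load"].rolling(16).mean()  # short-interval average
    # Recent trend: average of the total daily charging energy over the past few days.
    daily_total = df["load"].resample("D").sum()
    df["recent_trend"] = (daily_total.rolling(3).mean()
                          .reindex(df.index, method="ffill").shift(96))
    return df.dropna()
```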



Support Vector Regression (SVR) and various time series models were tried to find the model with the best prediction. Some representative results from the time series models tried are as follows (a sketch of these models follows the list):

• ARIMA (2, 1, 2) model: 24-hour forecast. The forecasted data are on the right; the orange region is the 80% confidence range and yellow is 95%.

• Holt-Winters model: 24-hour forecast. The forecasted data are on the right; the orange region is the 80% confidence range and yellow is 95%. Note that the confidence range becomes quite wide toward the end of the forecast interval.

• De-seasonalized load SVR model: Multiplicative decomposition was performed on the observed charging load, and the seasonal component so obtained was removed from the observed data. An SVR model was then built using the de-seasonalized load. The charts following the sketch below illustrate this model.
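The sketch below illustrates, under assumed file and column names, how these candidate models could be fit with statsmodels and scikit-learn: an ARIMA(2,1,2) and a Holt-Winters 24-hour forecast with confidence ranges, and an SVR on the multiplicatively de-seasonalized load.

```python
# Candidate models from the list above, applied to a 15-minute charging load series.
# File name, column name, and lag choices are illustrative assumptions.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.svm import SVR

load = pd.read_csv("charging_load.csv", index_col=0, parse_dates=True)["load_kw"]
load = load.asfreq("15min").interpolate()
load = load.clip(lower=0.01)  # multiplicative decomposition needs a strictly positive series
steps = 96                    # 24 hours of 15-minute intervals

# ARIMA (2, 1, 2): point forecast plus 80% and 95% confidence ranges.
arima_fit = ARIMA(load, order=(2, 1, 2)).fit()
arima_fc = arima_fit.get_forecast(steps)
ci80, ci95 = arima_fc.conf_int(alpha=0.20), arima_fc.conf_int(alpha=0.05)

# Holt-Winters with a daily (96-step) seasonal cycle.
hw_fit = ExponentialSmoothing(load, trend="add", seasonal="add",
                              seasonal_periods=96).fit()
hw_fc = hw_fit.forecast(steps)

# De-seasonalized SVR: remove the multiplicative daily seasonal component, then fit an
# SVR on the remainder using lagged values as predictors.
decomp = seasonal_decompose(load, model="multiplicative", period=96)
deseasonalized = load / decomp.seasonal
X = pd.DataFrame({"lag_day": deseasonalized.shift(96),
                  "lag_week": deseasonalized.shift(96 * 7)}).dropna()
svr = SVR(kernel="rbf").fit(X, deseasonalized.loc[X.index])
```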

[Figure: Forecast comparison, 9/13/12 through 9/19/12: observed_load (kW), svr_forecast (kW), and de-seasonalized_svr_forecast (kW)]

[Figure: Sample SVR model errors, 9/13/12 through 9/19/12: svr_mae (kW), de-seasonalized_svr_mae (kW), svr_forecast_rmse, and de-seasonalized_svr_forecast_rmse]

The SVR model produced the best prediction.

[Figure: Sample SVR results, 8/13/2012 through 8/19/2012: actual vs. predicted charging load (kW), with Mean Absolute Error (kW) and Root Mean Squared Error]



• Key challenges included capturing the effect of holidays and of the recent trend on the timing and width of the charging load peak.

• The model was configured to rebuild with a live feed of data every half hour.

• A cross-validation algorithm that optimized parameters by minimizing mean squared error (MSE) significantly improved the accuracy of predictions.

System Architecture

The cross-validator is designed as a separate module that runs once a day and computes the model parameters giving the best prediction on the training data using a k-folds methodology, where each fold is one week's worth of contiguous observed charging load data. The model builder runs once every 30 minutes and builds a new model using the parameters computed by the most recent run of the cross-validator module. A sketch of this scheduling is shown below.
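The following is a minimal scheduling sketch of this architecture using the `schedule` package; the job times and job bodies are placeholders, not the production implementation.

```python
# Daily cross-validation job that refreshes model parameters, plus a model-rebuild job
# every 30 minutes that uses the most recent parameters. Job bodies are placeholders.
import time
import schedule

best_params = {"C": 10.0, "gamma": 0.1}  # illustrative defaults; refreshed daily

def run_cross_validator():
    """Placeholder for the daily k-folds search over week-long contiguous folds."""
    global best_params
    best_params = {"C": 10.0, "gamma": 0.1}  # would be replaced by the real search result

def rebuild_model():
    """Placeholder for retraining the charging-load model on the live data feed."""
    print("rebuilding model with parameters", best_params)

schedule.every().day.at("01:00").do(run_cross_validator)  # once a day
schedule.every(30).minutes.do(rebuild_model)              # every 30 minutes

while True:
    schedule.run_pending()
    time.sleep(60)
```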

Conclusion

The computer-generated models are reasonable estimates of future energy demand. Even more accurate results could likely be achieved using multiple years of data, as the computer model could then learn the seasonal cycles more easily.

References

Burges, C. (1998). "A Tutorial on Support Vector Machines for Pattern Recognition." Data Mining and Knowledge Discovery, 2, 121–167.
Farmer, J. D. and Sidorowich, J. J. (1987). "Predicting Chaotic Time Series." Physical Review Letters, 59.