A Scalable WSN based Data Center Monitoring Solution with Probabilistic Event Prediction
Sunil Kumar Vuppala, Animikh Ghosh, Ketan A Patil, Kumar Padmanabh
Infosys Labs, Infosys Limited, Bangalore, India
Email: {sunil vuppala, animikh ghosh, ketan patil}@infosys.com
Abstract—The two most important objectives of data center operators are to reduce operating cost and to minimize carbon emission. Consolidation of data centers is not always possible, and large enterprises end up having data centers in multiple locations across different cities and countries. In such a diverse deployment, manual monitoring is not a cost-effective solution. ASHRAE [1] suggests considering energy efficiency as a key factor in data center design. Our initial experiments reveal that reducing the data center room temperature by one degree Celsius results in 4% additional electricity consumption. We developed a WSN-based data center monitoring (DCM) solution which includes the hardware system and an enterprise application. We deployed the hardware system in hundreds of locations across 7 different cities and monitored them from a central enterprise application dashboard. In this paper, we describe the system architecture and analyze data captured over nine months. This is one of the largest real-life WSN deployments, and based on the results we argue that the manual monitoring cost of data centers is reduced by 80%. This deployment also helped avoid a significant amount of carbon emission. DCM also provides a mechanism to predict events in real time.
Keywords: data center, monitoring, control, gateways, motes, cost, DCM
I. INTRODUCTION
Data centers (DC/DCs) are the places in an enterprise where multiple servers host applications or store critical data, and they need sophisticated ambience control. Ambient parameters in a DC, such as temperature and humidity, are likely to change with the varying load on the servers and with the outside weather. Sophisticated ambient control systems are available in the market [2] which perform rack- and server-level precision control and can adapt their operating parameters to the ambience. However, some enterprises have a multitude of small DCs spread across different cities or countries, and the available complex solutions may not be a cost-effective choice for deployment in small DCs. Wireless sensor networks are emerging as an enabling technology for many applications which require quantities to be measured at multiple points in the physical world. A typical Wireless Sensor Network (WSN) consists of tiny sensor nodes (termed motes) capable of integrated sensing, data processing and transmission. In the proposed solution we leverage the enterprise network along with the WSN to build a cost-effective and highly scalable solution. Apart from monitoring, it can be used to reduce carbon emission and to predict events in real time. According to
research, energy consumption by DCs within the US has been doubling every 5 years since 2001, and more than 60% of DCs and server rooms experience 1-4 downtimes a year [3] due to changes in operational conditions such as a rise in temperature, UPS failure, humidity, water leakage, smoke and inappropriate air flow. Most of the DCs in large corporate organizations (which have a large number of small DCs) are still monitored manually, and optimal working conditions are rarely achieved. In this paper we describe the DCM solution, which monitors all relevant ambient parameters and is extensible to incorporate future sensors. DCM is automatic and generates different alerts, either to avoid potential downtime or to save energy. The DCM solution has now been operational in our organization for more than nine months. Initially it was a test bed spread across 7 different cities, which eventually evolved into a full-fledged solution. The contributions of this paper are the following:
1) System Architecture: We describe the system architecture, which leverages the internet and ZigBee to aggregate sensor data available both in distant geographies and locally.
2) Gateway-mote communication protocol: There is a hierarchical network of sensors. Wireless sensors (motes) report to a gateway over a wireless link. All such gateways form a network and report to an enterprise server via Ethernet, using a combination of Server Push and Client Pull mechanisms for data collection. This communication protocol defines the entire process of data collection.
3) Probabilistic Event Prediction (PED): The system can predict events such as a sudden steep rise or fall in the temperature reported by any sensor placed at rack level in the DC. A sudden steep rise in temperature indicates the chance of a fire hazard, whereas a steep fall indicates over-cooling, i.e., wastage of energy. This type of real-time event prediction is based on a novel formulation of the Hidden Markov Model (HMM) and enables DC operators to take preventive measures, reducing the chance of accidents (such as fire hazards) or damage to server room equipment due to abnormal heating, over-cooling and other factors.
The DCM hardware facilitates close monitoring of DC ambient parameters and raises alerts in case of a threshold breach
(thresholds are preset by the admin). This allows the DCs to be operated in an energy-efficient way by maintaining the operating levels of these ambient parameters at the edge of the ASHRAE [1] permissible limits. The presence of this alert-based DCM solution reduces the manual operating costs of DCs significantly. The rest of the paper is organized as follows. In section II, we give an overview of existing DCM solutions from the literature. We present the DCM system architecture in section III. The results are analyzed and insights presented in section IV, and finally conclusions are drawn in section V.

II. RELATED WORK
Sensors have been deployed to collect data remotely and eliminate human involvement from the system. Sensor networks have turned out to be one of the key solutions for automated, controlled and structured gathering of data on a continuous basis. Sensor networks are deployed for measuring precision values in agriculture [4], addressing area coverage problems [5], [6], [7], [8], monitoring animal habitats [9], [10], [11], predicting hazardous volcanic eruptions [12], earthquake monitoring [13], forest fire detection [14] and even body area sensing [15] using biomedical or actuating sensors. Worldwide, sensor deployment strategies are undertaken for research or industrial purposes [16] to address the variety of problems described above, which may require outdoor as well as indoor deployment. The key focus of this paper is strategic indoor deployment of sensors. There are various indoor scenarios that may require strategic sensor deployment [17], [18], [19]; one of the challenges at hand is data center monitoring [2]. It involves thermal management [20], temperature-aware workload distribution [21], and secure and efficient data transmission for high-performance data-intensive computing [22]. To address such critical requirements, automated data gathering appears to be necessary, and it may involve deploying field sensors or similar automated monitoring devices. RACNet [23] provides high temporal and spatial fidelity measurements that can be used to improve DC safety and energy efficiency. RACNet deploys sensors to ensure high-fidelity measurements that track wastage of energy, which may be due to unnecessary operational hours of CRAC systems, water chillers and (de)humidifiers. The authors of RACNet state their contributions to safety in terms of tracking heat distribution and predicting thermal runaways. In the June 2005 issue of AFP Magazine [24], fire hazard is pointed out as one of the costliest forms of damage that may affect computers and DCs, so steps to prevent or predict the onset of fire are extremely critical. There are smoke detection [25] and alert systems available, but to the best of our knowledge there is no data center energy monitoring device that comes with inbuilt smoke sensors or sensors to detect water leakage or a wet floor, which are extremely critical to maintaining overall DC safety. SynapSense [2] mainly addresses environmental savings by
developing a DC optimization platform. The SynapSense Data Center Optimization Platform comprises sensor nodes, gateways, routers and server platforms. It interprets temperature, humidity and sub-floor pressure differential data from thousands of sense points. It also enables power measurement and incorporates pre-existing BMS data via BACnet, Modbus and SNMP. It monitors ambient parameters to stay within the specified ASHRAE ranges and provides alerts when boundaries are exceeded. However, standard SynapSense projects target floors above 20,000 square feet with thousands of sensing points, and the data can be used for complex analysis. The InterSeptor Environmental Monitoring System [26] from Jacarta is a full-featured, scalable network environmental monitoring device designed to remotely monitor temperature, humidity and other environmental conditions in DCs, IT rooms and racks. Email and SNMP alerts are available as standard. But the InterSeptor unit does not communicate wirelessly, so it cannot form a peer-to-peer network. The absence of such a group-level network architecture removes the possibility of pushing edge intelligence to the device level for group-level decision making. Also, a unit price of approximately $800 may be considered too high for a DC rack-level sensing component by any enterprise. In summary, these works do not have an extensive application platform, and their hardware cannot incorporate existing sensors in the building because it is not extensible. Moreover, they do not provide alerts that let the operator set operational threshold parameters so as to minimize energy consumption.

III. DCM HARDWARE DESCRIPTION
The goals of the system are twofold. The first is to design a cost-effective WSN-based monitoring application for geographically distributed small DCs. The second is to avoid unfortunate events in the system by using a prediction model. The entire solution can be broadly classified into four categories: the system components; the protocols used for communication; the system architecture, which describes how the different components collaborate; and the methods and functionalities in the system.
A. Components:
The different components of the DCM solution are categorized as hardware, embedded software and application software.
1) Hardware: The DCM hardware comprises a wireless module and a sensor module, as depicted in figures 1(a) and 1(b). The wireless module is also known as a "mote" and is capable of working independently. The wireless module is based on the CC2431 System-on-Chip from Texas Instruments. The CC2431 has two systems fabricated on a single chip, namely an IEEE 802.15.4 compliant CC2420 RF transceiver and an industry-standard enhanced 8051 MCU (Micro Controller Unit). The chip has 128 KB of flash memory and 8 KB of RAM. The wireless module has an embedded DS2438 battery monitoring chip.
Fig. 1. Hardware components for DCM: (a) sensor module; (b) wireless module
The presence of this chip avoids imposing the battery monitoring task on the micro-controller; charging of the wireless module is possible even when the micro-controller is in sleep mode. The module also has an on-board Intersil ISL29023 light sensor. For design simplicity, and to reduce the cost of the module, we use the internal temperature sensor of the DS2438 for temperature monitoring. The wireless module is powered by a 900 mAh lithium-ion battery placed at the bottom of the PCB for space optimization. For an extended communication range we use the CC2591 low noise amplifier (LNA), which acts as a booster and helps with impedance matching. We have tested this wireless module, which is a full-fledged mote, and it achieves a range of 800 meters in a line-of-sight environment. We anticipate substantial interference in DCs, and this power amplifier increases the reliability of message delivery. The sensor module is built around a 16-bit PIC24FJ256GB110 micro-controller from Microchip, which supports up to 16 MIPS operation at 32 MHz and has 16 KB of RAM and 256 KB of ROM. The DCM module has an embedded Sensirion SHT15 humidity sensor and an LM92 temperature sensor from National Semiconductor. The module can be powered through USB as well as a standard power outlet. The module supports serial, USB, Ethernet and ZigBee modes of communication with its peers. It has 4 analog and 4 digital I/O pins; the analog I/O pins support 4 mA-20 mA current loops and 0-5 V analog inputs, which can be used to connect external sensors. Their amplifier is controlled by software, so a variety of sensors can be connected directly. This hardware configuration of the sensor module and wireless module gives the flexibility to reuse the system across different WSN application verticals.
2) Embedded software: MOJO [27] is the middleware used to process and extract information from sensor packets. MOJO abstracts the complexities of the wireless sensor network and presents Application Programmable Interfaces (APIs) to the developer, so developers need not deal with the physical motes directly, as the functionalities are available via APIs. Further
information about MOJO can be found in [27].
3) Application software: The application software is built on top of the processed sensor data and includes the business logic, database and user interfaces.
B. The network communication
Each DC may need several motes and gateways, depending on the size of the DC room and the sensor parameters to be monitored. Each DC has at least one DCM gateway to collect the data from the motes. Motes communicate with the gateway using the ZigBee wireless protocol. The mote and gateway used in the DCM implementation are shown in figure 2.
• Communication between gateway and server: The server communicates with the DCM gateway over Ethernet. Similar to a UPnP device, the DCM gateway acts as a DHCP client and searches for a DHCP server when the device is first connected to the network. The communication mechanism involves both Server Push and Client Pull. The DCM gateway is discovered over Ethernet using a lightweight device discovery protocol: the network is periodically flooded with multicast UDP packets encrypted with an in-house lightweight encryption algorithm suitable for embedded applications. The UDP packets are picked up by the DCM gateway and decrypted to verify the identity of the server. For business reasons, the details of the encryption are outside the scope of this paper. The initial communication setup is completed by the DCM gateway sending its description to the server, which registers the device. The Server Push is initiated at regular intervals to collect data from the DCM gateway. The Client Pull mode of communication is used when a high-priority event must be reported to the server; for example, on low battery status of a mote, the gateway must inform the server immediately rather than wait for the next Server Push. A minimal sketch of this discovery exchange follows this subsection.
• Communication between motes and gateway: The DCM gateway communicates with its associated motes using the low-power ZigBee wireless protocol. The DCM gateway acts as the master coordinator, polling each mote for health status and data at specific periodic intervals. Addition of new motes to the network and deletion of dead motes from it are handled by the ZigBee association/disassociation schemes.
The DCM hardware system has temperature, light, humidity and battery monitor sensors available on board, and provides connectivity for external sensors. The external sensors used in our deployment are a smoke sensor, a water leak detector, and 30 kVA and 160 kVA UPS status detectors.
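To make the discovery and registration exchange concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions only: the multicast group, port, payload formats and function names are ours, and a trivial XOR cipher stands in for the undisclosed in-house encryption.

import socket
import struct

# Hypothetical parameters; the paper does not disclose the real values.
DISCOVERY_GROUP = "239.255.10.1"   # multicast group flooded by the server
DISCOVERY_PORT = 5005
SHARED_KEY = b"demo-key"           # placeholder for the in-house encryption key

def xor_cipher(data: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Stand-in for the undisclosed lightweight encryption (symmetric XOR)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def server_announce() -> None:
    """Server side: periodically flood the network with an encrypted beacon."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    sock.sendto(xor_cipher(b"DCM-SERVER-1"), (DISCOVERY_GROUP, DISCOVERY_PORT))

def gateway_listen() -> None:
    """Gateway side: pick up a beacon, verify the server, reply with a description."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", DISCOVERY_PORT))
    mreq = struct.pack("4sl", socket.inet_aton(DISCOVERY_GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    data, server_addr = sock.recvfrom(1024)
    if xor_cipher(data).startswith(b"DCM-SERVER"):             # identity check
        description = b'{"gateway_id": "GW-07", "motes": 12}'  # illustrative payload
        sock.sendto(xor_cipher(description), server_addr)      # register with server

After this registration, the server schedules the periodic data collection, while the gateway can still send unsolicited packets for high-priority events such as a low mote battery.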
Fig. 2. Hardware for DCM

C. System Architecture:
The architecture of the system is depicted in figure 3.

Fig. 3. DCM System Architecture

The motes transmit sensor data to the DCM gateway, which aggregates the data from all associated motes and transmits the aggregated data to the server. At the server the data is processed by the MOJO middleware. After the sensor readings are extracted, the system verifies whether any alert condition is met. If an alert condition holds, the corresponding alert is raised and assigned to the concerned operator of the DC. All controller functionality and business logic are implemented in the core of the web application, and web interfaces are provided to end users for viewing and analyzing the aggregated data and alerts.
The DCM solution provides the means to set a hierarchy of threshold levels for all types of sensors located in the various DCs. In general, the threshold settings are the same across the organization for different DCs, but the web interface provides the option to change the threshold values as needed. For example, the room temperature is considered normal up to 25°C; a first-level alert is raised above 25°C but below 27°C, a second-level alert between 27°C and 30°C, and a high alert above 30°C. Administrators can assign alerts to the concerned persons based on severity and sensor type using the DCM solution. Based on the threshold configuration, the respective alerts are notified via SMS and email. The application is easy to configure and deploy. A live deployment of the DCM gateway in a server room is shown in figure 4. A minimal sketch of this alert-level mapping follows the table below.

Fig. 4. Live Deployment of DCM Gateway

D. Functionalities and category of users in the system:
The different categories of users in the system are guest user, operator and administrator. The functionalities assigned to each user category are shown in Table I.

TABLE I
FUNCTIONALITY AND USER CATEGORY ACCESSIBILITY

User Type      Monitor  Analysis  Config  User Mgt  Test cases
Guest user     Yes      -         -       -         -
Operator       Yes      Yes       -       -         Yes
Administrator  Yes      Yes       Yes     Yes       Yes
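As referenced above, the alert-level mapping can be sketched as follows. This is a minimal illustration: the function name alert_level and the level labels are ours, and the thresholds are the example values from the text (in DCM they are admin-configurable per sensor and location).

# Illustrative thresholds from the example above (admin-configurable in DCM).
TEMP_THRESHOLDS = [(30.0, "HIGH_ALERT"), (27.0, "LEVEL_2"), (25.0, "LEVEL_1")]

def alert_level(temp_c: float) -> str:
    """Map a room temperature reading to an alert level (hypothetical helper)."""
    for threshold, level in TEMP_THRESHOLDS:
        if temp_c > threshold:
            return level
    return "NORMAL"

assert alert_level(24.0) == "NORMAL"
assert alert_level(26.0) == "LEVEL_1"
assert alert_level(28.5) == "LEVEL_2"
assert alert_level(31.0) == "HIGH_ALERT"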
1) Monitoring: This is the set of features that displays live sensor data on the DCM solution dashboard and allows real-time monitoring of alerts from the dashboard, applying location filters at different levels such as country, city, building and DC.
2) Analysis: This set consists of the log files, reports and analyses, which are accessible to registered users and administrators for offline analysis.
3) Configuration: The configuration and settings are accessible to the administrators, who can set the thresholds for individual motes, configure new motes, assign alerts to users, etc.
4) User Management: This set of features supports user registration and editing of user profiles. Approval is subject to authorization by the administrator.
5) Test Cases: These features allow the user to test the correctness of the system and the sensor data, and are used during deployment and maintenance of the system.
E. APIs used in DCM Solution:
The data transmitted to the DCM application server from the DCM gateways is processed by MOJO, and the live sensor data is made available in data structures termed the device cloud as well as in the database. The server has several methods that operate on sensor data specific to a location, such as setData, listAlertThresholds, checkAlerts and AlertNumberGenerator. The server also has more generic methods such as alertProcess, which processes alerts and notifies users via SMS and email.
1) setData: Updates the sensor data in the device cloud and in the DCM central server database.
2) listAlertThresholds: Lists the threshold values of all the sensor readings for a mote. Each sensor reading is compared against this alert-threshold list to identify alerts in the system.
3) checkAlerts: Each sensor reading is checked against the defined thresholds. Hysteresis is applied to temperature, humidity, light and voltage readings before generating alerts, to avoid false positives. An alert is generated if the specific condition for an alert level is satisfied. Once an alert is generated for a given alert level, mote and sensor type, it is not generated again unless the alert level changes or there is no response within a 30-minute or one-hour (reference values) time frame for that particular alert.
4) AlertNumberGenerator: Generates a unique reference number for the alert, encoding the location, mote and sensor type. The alert's status is tracked using this reference number.
5) AlertProcess: Once an alert is generated, the server processes it and notifies the concerned person via SMS, email or both. If there is no response to the alert within the first 30 minutes, another alert is generated with escalation. The 30-minute and one-hour intervals are reference values that can be set according to the business application requirements. A sketch of this de-duplication and escalation logic follows.
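A rough sketch of the de-duplication and escalation behaviour described for checkAlerts and AlertProcess is given below. This is not the actual DCM code: the function should_raise, the in-memory dictionary and the fixed 30-minute window are illustrative assumptions, and the hysteresis applied to raw readings is omitted for brevity.

import time
from typing import Dict, Optional, Tuple

# Hypothetical sketch: an alert for a given (mote, sensor) is re-raised only if
# the alert level changes or the previous alert has gone unacknowledged past
# the escalation window (30 minutes here, a configurable reference value).
ESCALATION_WINDOW_S = 30 * 60
_last_alert: Dict[Tuple[str, str], Tuple[str, float]] = {}

def should_raise(mote_id: str, sensor: str, level: str,
                 now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    prev = _last_alert.get((mote_id, sensor))
    if prev is not None:
        prev_level, prev_ts = prev
        if prev_level == level and now - prev_ts < ESCALATION_WINDOW_S:
            return False                      # same level, within window: suppress
    _last_alert[(mote_id, sensor)] = (level, now)
    return True                               # new alert, level change, or escalation

# Example: a LEVEL_1 alert fires once, is suppressed on repeat, then escalates.
assert should_raise("mote-3", "temperature", "LEVEL_1", now=0.0)
assert not should_raise("mote-3", "temperature", "LEVEL_1", now=600.0)
assert should_raise("mote-3", "temperature", "LEVEL_1", now=1900.0)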
IV. RESULTS & ANALYSIS
In this section we describe our deployment setup and our method of data collection, and subsequently analyze the gathered data.
A. Deployment setup:
The experimental setup in our organization is one of the largest real-time WSN deployments in a truly distributed environment. The DCM gateways and motes are spread over thousands of kilometers across various cities in India (Bangalore, Hyderabad, Pune, Mysore, Chennai, Chandigarh and Mangalore), and the system has been operational for more than nine months. The system gives operators the flexibility to set various threshold levels for each sensor in the motes. We have collected millions of records of sensor data, which are analyzed to draw interesting insights. The data is available for offline analysis and for generation of reports.
A number of parameters affect DC operation, namely the HVAC system, the number of servers, the dimensions of the room and the external weather conditions. For the HVAC system, the variables affecting the energy consumption level are outlet water temperature, fan speed and damper positions. Temperature, humidity and carbon dioxide levels together determine the comfort level of DC operation. Chiller electrical load can be expressed as a percentage of full load amps (FLA), which is an indication of power consumption. There is a relation between electrical power consumption and the outlet temperature set point. Temperature can be measured at four levels in this type of system: chiller level, HVAC set point, and room and rack level inside a DC. In this paper, our experiments focus on measuring room-level and rack-level temperatures.
B. Manual Monitoring Cost Savings:
With the introduction of this semi-automated DCM system, we can drastically cut the cost of repetitive manual monitoring. In our organization, the operator visits the server room every 2 hours and logs the temperature and humidity readings manually (Case-1 in Table II; figure 6 represents this case with 26°C as the alert condition). With the DCM solution, the operator needs to visit only if an alert is raised (Case-2 in Table II). In response to the alerts generated, the operator needs to fix the corresponding issue, visiting the DC at least once. If the alert is raised for temperature only at one corner or rack, the operator can check the air flow in that particular place to resolve the issue; if the problem persists, he may need to fix it by changing the A.C. temperature. In some cases, the operator may visit the DC twice per alert: a first visit to fix the issue by adjusting the AC temperature, and a second visit to set the temperature level back to the normal operating range after the system stabilizes. A sample observation over three days is presented in Table II.

TABLE II
MANUAL MONITORING COSTS: NUMBER OF VISITS TO DC
Day     Case-1  Case-2  Remarks
Day-1   12      2       Temperature alert
Day-2   12      1       UPS alert
Day-3   12      2       Temperature alert
With roughly 2 visits per day instead of 12, we have observed that a saving of over 80% in the manual monitoring costs of the DCs is possible.
C. Operating levels of DC:
Room level Temperature Experiments: In figure 5, the temperature of the four corners of a DC was recorded from 1:00 PM on 1st June 2011 to 4:00 AM on 2nd June 2011. The readings are within the ASHRAE limits, at the lower bound. We conducted experiments increasing the temperature level by 2-3 degrees and observed the potential energy savings. As per our practical observations, a one degree centigrade (1°C) reduction in HVAC temperature corresponds to a 4% increase in electricity consumption. In effect, close to 10-12% energy savings are observed by increasing the data center AC temperature while still keeping within the ASHRAE levels and generating alerts as and when sensor values cross their thresholds.

Fig. 5. Room level temperature of all corners in a DC

D. The Green Effect:
On average, 743 g of carbon dioxide is emitted for each kWh of electricity [28]. In the typical example of Infosys, 8% of energy consumption is for the server rooms/DCs. The total energy consumed at Infosys is 250 million units per annum, so the server room energy share is at least 20 million units at an 8% share. By using our system, we can save up to 10% of this energy, which directly translates to a saving of 1486 tons of carbon dioxide emission per year (2,000,000 units * 0.743 kg = 1,486,000 kg). This arithmetic is spelled out in the short sketch below.
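The carbon arithmetic above is reproduced in a few lines; the figures are exactly those reported in the text, and the variable names are ours.

# Carbon-savings arithmetic from Section IV-D (figures as reported in the paper).
TOTAL_ENERGY_KWH = 250_000_000     # annual consumption at Infosys ("units" = kWh)
DC_SHARE = 0.08                    # server rooms / DCs consume ~8% of the total
SAVINGS = 0.10                     # ~10% savings achievable with DCM
CO2_PER_KWH_KG = 0.743             # 743 g CO2 per kWh [28]

dc_energy = TOTAL_ENERGY_KWH * DC_SHARE           # 20,000,000 kWh
saved_kwh = dc_energy * SAVINGS                   # 2,000,000 kWh
saved_co2_tons = saved_kwh * CO2_PER_KWH_KG / 1000
print(saved_co2_tons)                             # 1486.0 tons of CO2 per year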
E. Observation of Patterns:
The patterns of temperature and humidity were observed over several days at rack level. Three consecutive days of data are presented in the graphs: temperature in degrees Celsius (y-axis) and humidity percentage (y-axis) are plotted over a period of 3 days (x-axis) in figure 6 and figure 7 respectively.

Fig. 6. Temperature pattern for 3 days in a DC at rack level

Fig. 7. Humidity pattern for 3 days in a DC

The temperature variation over a day shows periodicity. This is because, although the DC has a controlled environment, the outside temperature varies. Three factors affect the temperature of the DC: the temperature of the refrigerant, which is the carrier of heat/cold (air in our case), the fan speed and the area of opening. Though the temperature is set at a fixed value, the temperature of the refrigerant changes with the environment. Based on these results we recommended to our operators that the fan speed of the HVAC be changed according to the time of day; there is a similar recommendation for humidity. Thus, while it is imperative to change the operating condition of the HVAC according to the season, our solution enables the operators to exercise precise control according to the time of day. As a future improvement to the DCM solution, we will add a feedback loop for automatic adjustment of temperature.

F. Probabilistic prediction of abnormal temperature fluctuation by a novel formulation of the Hidden Markov Model (HMM)
Real-time temperature data from seven consecutive days (approximately 20,000 data points at 30-second intervals), collected by sensors (motes) placed on the racks of a DC, was analyzed to develop a model to predict events (PED). We performed the Jarque-Bera test [29] to check whether the null hypothesis [30] that the data points come from a normal distribution holds. The observed p-value [30] was 0.20 with the level of significance (α) [30] set to 5 percent. Since the observed p-value exceeds α (0.05), we do not reject the null hypothesis. A graphical view of the data belonging to a normally distributed cluster is depicted in figure 8.

Fig. 8. Standard normal density function of continuously collected temperature data with mean µ = 0 and standard deviation σ = 0.092
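The normality check can be reproduced along the following lines, assuming the temperature log is available as a NumPy array; here a simulated log stands in for the sensor database, so the resulting p-value is illustrative rather than the paper's 0.20.

import numpy as np
from scipy import stats

# ~20,000 rack-level readings sampled every 30 s over seven days; here a
# simulated stand-in (loc/scale are illustrative), in DCM the real sensor log.
temps = np.random.default_rng(0).normal(loc=24.5, scale=0.56, size=20_000)

statistic, p_value = stats.jarque_bera(temps)
alpha = 0.05                      # level of significance used in the paper
if p_value > alpha:               # the paper observed p = 0.20 > 0.05
    print("normality not rejected: model the temperature data as Gaussian")
else:
    print("normality rejected")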
By developing an HMM-based model [31], it was possible to probabilistically predict a sudden rise or fall in temperature based on the current temperature value. In the DC scenario, such sudden rises or falls in temperature are identified as events. An HMM is a Markov process typically identified by a set of hidden states and observables. Transitions among hidden states are governed by a set of probabilities called transition probabilities. Every hidden state has an observable output which is emitted with a known conditional probability distribution called the emission probability distribution. In DC temperature monitoring, the hidden states are based on temperature ranges. We define the hidden states as FREEZE (m°C to (n-1)°C), NORMAL (n°C to (o-1)°C) and HOT (o°C to p°C). A typical DC should operate in the NORMAL temperature range; a transition to the FREEZE or HOT state indicates over-cooling (wastage of energy) or overheating (potential threat of fire) respectively. We first define the transition probability between hidden states, which is the more important case as it is a direct indicator of the chance of a sudden rise or fall in absolute temperature. Let X be a random variable representing temperature, and let x_i represent the absolute temperature at time instance i, so x_i ∈ X, ∀ i ∈ {1, 2, ..., n}. The current temperature at time t is x_t and the temperature at time instance t-1 is x_{t-1}. Suppose the temperature is currently in the NORMAL state and we are trying to predict the probability of a transition to either the FREEZE or the HOT state. If x_t > x_{t-1} then the temperature may lead to the HOT state, as a temperature rise is observed. We calculate the rate of increase as (x_t - x_{t-1})/x_{t-1} and predict the temperature at time t+1, assuming the same rate of increase continues, as:

x_{t+1} = x_t (1 + (x_t - x_{t-1}) / x_{t-1})    (1)
We compute the z-score corresponding to this predicted value as:

z_{t+1} = (x_{t+1} - µ) / σ    (2)

Let the area under the standard normal curve between +z_{t+1} and -z_{t+1} be P_r. Then the transition probability is P_r: with probability P_r the system will transit from the NORMAL state (the current state) to the HOT state. The same probability can be computed for a transition from the NORMAL state to the FREEZE state when a fall in temperature is observed, i.e., x_t < x_{t-1}. This probabilistic event prediction lets the concerned people take evasive action against such an abnormal rate of temperature overflow or underflow, for example by pushing more or less cold air through the HVAC [32] system, or by taking similar preventive measures. Since the states are hidden we do not observe them directly; what we observe is the absolute temperature change (rise or fall). For each of the possible hidden states there exists a set of emission probabilities, which define the distribution of the observed variable at a time given the hidden state at that time. In our scenario, the question for computing emission probabilities is: how likely is the current temperature value x_t to be in the particular state it should be in under ideal conditions? For example, how likely is 26°C to be in the NORMAL range, given prior knowledge that the NORMAL range is 22°C to 27°C? If x_t = µ, then the z-score is zero, the area under the bell curve between +z and -z is zero, and the probability is 1, i.e., the value is most likely to be in the known state. So at temperature value x_t we calculate the z-score on the normal curve as:

z = (x_t - µ) / σ    (3)
Corresponding to this z-score there will be some area under the curve, say p_e. We claim that with probability (1 - p_e) the value belongs to that state.

G. Practical Case Study
Since the paper targets server room monitoring, we take the NORMAL range to be the ASHRAE recommended level for DCs, which is 22°C to 27°C. We defined the HOT state as a wide range above 27°C up to 80°C, and the FREEZE state as the range below 22°C down to 0°C. We observed the temperature of a rack in our DC to be 25°C on June 3rd, 2011 at 5:00 PM (within the NORMAL range). The temperature at the next observed instance, at 5:10 PM, was 25.8°C (the sensors are sampled at 10-minute intervals to maximize battery life). Since we observe a rise in temperature, we are interested in the chance of migrating to the HOT state. The rate of increase is computed as 0.032. Predicting the temperature at the next instance, if this rate of increase continues, gives 26.63°C (refer to (1)). The mean and standard deviation of the NORMAL range are 24.5°C (since the range is 22-27°C) and 0.56 (computed from real-time data logs) respectively. The z-score is computed as 3.8. The area under the standard normal curve between z-values of +3.8 and -3.8 is 0.998, so there is a 99.8 percent chance that the temperature will violate the NORMAL range and transit to the HOT state at the next instance. Based on this probabilistic prediction, the concerned people can take the necessary action to avoid such a scenario. In our server room monitoring we also observed high chances of transition from the NORMAL to the FREEZE state, which indicates opportunities to save energy by turning off unnecessary cooling. We now compute the emission probability of the data observed on 3rd June 2011 at 5:10 PM. The z-score is (25.8 - 24.5)/0.56 = 2.32. The area under the standard normal curve between +2.32 and -2.32 is 0.9796. So with probability 0.0204 (1 - 0.9796) the temperature is likely to belong to the NORMAL state.
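The case-study numbers can be reproduced with a short script; this is a sketch under the paper's stated parameters (µ = 24.5, σ = 0.56), with function names of our own choosing. The small difference from the reported 0.998 comes from rounding the z-score to 3.8.

from scipy.stats import norm

# Worked example of the probabilistic event prediction (PED) from the case
# study: NORMAL range 22-27 °C, mu = 24.5 °C, sigma = 0.56 (from data logs).
MU, SIGMA = 24.5, 0.56

def transition_probability(x_prev: float, x_curr: float) -> float:
    """Probability of leaving the current state if the observed trend continues."""
    rate = (x_curr - x_prev) / x_prev            # e.g. (25.8 - 25)/25 = 0.032
    x_pred = x_curr * (1 + rate)                 # Eq. (1): predicted next reading
    z = abs(x_pred - MU) / SIGMA                 # Eq. (2): z-score of the prediction
    return 2 * norm.cdf(z) - 1                   # area between -z and +z

def emission_probability(x_curr: float) -> float:
    """How likely the current reading is to belong to the NORMAL state."""
    z = abs(x_curr - MU) / SIGMA                 # Eq. (3)
    return 1 - (2 * norm.cdf(z) - 1)

print(transition_probability(25.0, 25.8))   # ~0.9999 (paper reports 0.998)
print(emission_probability(25.8))           # ~0.020  (paper reports 0.0204)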
H. Scalability Metrics
After the deployment of this system in more than 100 DCs, the number of packets received at the server per day exceeds 3,000,000. The other DCM system metrics are listed in Table III.

TABLE III
SCALABILITY METRICS

S.No  Parameter                              Number
1     Data Centers                           102
2     Motes / DC                             4 to 16
3     Frequency of data sensing              1 sec
4     Frequency of data sending to server    30 sec
5     Packets / min at server                2,000 pkts
6     Packets / day at server                3,000,000 pkts
7     Avg alerts per day                     20
8     Users                                  1000

V. CONCLUSION
The complex solutions for data centers are not cost effective for small enterprises whose business is spread across geographies. Manual monitoring of small DCs may lead to
human error, loss of data and wastage of electricity. Through hundreds of deployments in different cities we have demonstrated that our proposed DCM solution can reduce manual monitoring costs by 80%, and it makes it possible to operate DCs at higher operating levels through online monitoring and prediction of real-time alerts. With this system, potential energy savings of 10% and corresponding reductions in carbon emissions are also observed. The effect of outside weather conditions on the chiller-level and HVAC set point temperatures will be considered in future work. The present version of the system is capable of monitoring and notifying alerts to the concerned people; a feedback loop with actuation capability to set the HVAC based on real-time ambient parameters is planned as future work.

ACKNOWLEDGMENT
The authors would like to thank Jayraj Ugarkar, Lakshya Malhotra and Sougata Sen for their help and valuable inputs during system development and deployment. We would also like to thank Arun Agrahara Somasundara and Chinmoy Mukherjee for their review comments.

REFERENCES
[1] R. Schmidt, M. Iyengar, and R. Chu, “Meeting Data Center Temperature Requirements,” ASHRAE Journal, April 2005.
[2] SynapSense. (2010). [Online]. Available: http://www.synapsense.com
[3] Report from AVTECH: Protect your IT facility. [Online]. Available: http://www.avtech.com
[4] K. Langendoen, A. Baggio, and O. Visser, “Murphy loves potatoes: Experiences from a pilot sensor network deployment in precision agriculture,” in Parallel and Distributed Processing Symposium (IPDPS 2006), 20th International. IEEE, 2006, p. 8.
[5] X. Wang, G. Xing, Y. Zhang, C. Lu, R. Pless, and C. Gill, “Integrated coverage and connectivity configuration in wireless sensor networks,” in Proceedings of the 1st International Conference on Embedded Networked Sensor Systems. ACM, 2003, pp. 28–39.
[6] S. Meguerdichian, F. Koushanfar, M. Potkonjak, and M. Srivastava, “Coverage problems in wireless ad-hoc sensor networks,” in INFOCOM 2001, vol. 3. IEEE, 2002, pp. 1380–1387.
[7] S. Kumar, T. Lai, and J. Balogh, “On k-coverage in a mostly sleeping sensor network,” in Proceedings of the 10th Annual International Conference on Mobile Computing and Networking. ACM, 2004, pp. 144–158.
[8] A. Howard, M. Mataric, and G. Sukhatme, “Mobile sensor network deployment using potential fields: A distributed, scalable solution to the area coverage problem,” Distributed Autonomous Robotic Systems, vol. 5, pp. 299–308, 2002.
[9] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.
[10] T. Wark, C. Crossman, W. Hu, Y. Guo, P. Valencia, P. Sikka, P. Corke, C. Lee, J. Henshall, K. Prayaga et al., “The design and evaluation of a mobile sensor/actuator network for autonomous animal control,” in IPSN 2007. ACM, 2007, pp. 206–215.
[11] R. Szewczyk, A. Mainwaring, J. Polastre, J. Anderson, and D. Culler, “An analysis of a large scale habitat monitoring application,” in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems. ACM, 2004, pp. 214–226.
[12] G. Werner-Allen, J. Johnson, M. Ruiz, J. Lees, and M. Welsh, “Monitoring volcanic eruptions with a wireless sensor network,” in Proceedings of the Second European Workshop on Wireless Sensor Networks, 2005. IEEE, 2005, pp. 108–120.
[13] M. Suzuki, S. Saruwatari, N. Kurata, and H. Morikawa, “A high-density earthquake monitoring system using wireless sensor networks,” in Proceedings of the 5th International Conference on Embedded Networked Sensor Systems. ACM, 2007, p. 374.
[14] B. Arrue, A. Ollero, and J. Martinez de Dios, “An intelligent system for false alarm reduction in infrared forest-fire detection,” Intelligent Systems and their Applications, IEEE, vol. 15, no. 3, pp. 64–73, 2002.
[15] E. Jovanov, A. Milenkovic, C. Otto, and P. De Groen, “A wireless body area network of intelligent motion sensors for computer assisted physical rehabilitation,” Journal of NeuroEngineering and Rehabilitation, vol. 2, no. 1, p. 6, 2005.
[16] Federspiel Controls. (2010). [Online]. Available: http://www.federspielcontrols.com
[17] V. Handziski, A. Kopke, A. Willig, and A. Wolisz, “TWIST: a scalable and reconfigurable testbed for wireless indoor experiments with sensor networks,” in Proceedings of the 2nd International Workshop on Multi-hop Ad Hoc Networks: From Theory to Reality. ACM, 2006, pp. 63–70.
[18] A. Mandal, C. Lopes, T. Givargis, A. Haghighat, R. Jurdak, and P. Baldi, “Beep: 3D indoor positioning using audible sound,” in CCNC 2005, Second IEEE. IEEE, 2005, pp. 348–353.
[19] M. Pan, C. Tsai, and Y. Tseng, “Emergency guiding and monitoring applications in indoor 3D environments by wireless sensor networks,” International Journal of Sensor Networks, vol. 1, no. 1, pp. 2–10, 2006.
[20] R. Sharma, C. Bash, C. Patel, R. Friedrich, and J. Chase, “Balance of power: Dynamic thermal management for internet data centers,” Internet Computing, IEEE, vol. 9, no. 1, pp. 42–49, 2005.
[21] J. Moore, J. Chase, P. Ranganathan, and R. Sharma, “Making scheduling cool: Temperature-aware workload placement in data centers,” in Proceedings of the Annual Conference on USENIX Annual Technical Conference. USENIX Association, 2005, p. 5.
[22] B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke, and I. Foster, “Secure, efficient data transport and replica management for high-performance data-intensive computing,” in MSS 2006. IEEE, 2006, p. 13.
[23] C. Liang, J. Liu, L. Luo, A. Terzis, and F. Zhao, “RACNet: a high-fidelity data center sensing network,” in Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems. ACM, 2009, pp. 15–28.
[24] Understanding Fire Hazards in Computer Rooms and Data Centers. [Online]. Available: http://www.verst.com.au
[25] Afcon control and automation system. [Online]. Available: http://www.jacarta.com
[26] InterSeptor Environmental Monitoring System. [Online]. Available: http://www.jacarta.com
[27] K. Padmanabh and S. K. Vuppala, “MOJO: A Middleware that converts Sensor Nodes into Java Objects,” in IEEE CON-WIRE, Zurich, Switzerland. IEEE, 2010.
[28] Report from the Energy Systems Research Unit, University of Strathclyde: Electricity consumption and carbon dioxide. [Online]. Available: http://www.esru.strath.ac.uk
[29] C. Jarque and A. Bera, “Efficient tests for normality, homoscedasticity and serial independence of regression residuals,” Economics Letters, vol. 6, no. 3, pp. 255–259, 1980.
[30] R. Nickerson, “Null hypothesis significance testing: a review of an old and continuing controversy,” Psychological Methods, vol. 5, no. 2, p. 241, 2000.
[31] B. Juang, “Hidden Markov models,” 1985.
[32] Q. Bi, W. Cai, Q. Wang, C. Hang et al., “Advanced controller auto-tuning and its application in HVAC systems,” Control Engineering Practice, vol. 8, no. 6, pp. 633–644, 2000.