Real-Time Server Overloaded Monitoring Algorithm Using Back Propagation Artificial Neural Network

Jongouk Choi, Chi Shen, and Jens Hannemann, Dept. Computer Science, Kentucky State University, Frankfort, Kentucky
Siddhartha Bhattacharyya, Dept. Computer Science, Florida Institute of Technology, Melbourne, Florida

Abstract— In recent years, downtime and information loss have become critical problems for server computers. Even if a server runs antivirus and CPU-overload checking programs, it may occasionally break down or slow down. Hypothetically, specific external conditions such as high temperature, low fan speed, and extreme main-board vibration can indicate these problems. In order to recognize the correlation between external conditions and overload problems, a monitoring computer collects data from external sensors. By using an accelerometer, an anemometer, and a temperature sensor, the monitoring algorithm is able to predict the target computer's status. A web application has been developed to help server managers remotely see how the server computers are operating. This paper proposes a server overload monitoring algorithm for a target computer based on the fast Fourier transform, multivariate linear regression, and learning algorithms. As a result, this paper suggests that a monitoring algorithm can be implemented with an artificial neural network that warns of possible malfunction cases.

Index Terms— Server Overloaded, Computer Stress, Tri-axis Accelerometer, Anemometer, Temperature Sensor, Fast Fourier Transform, Multivariate Linear Regression, Back Propagation, Machine Learning, Artificial Neural Network, Classification, Prediction Algorithm, Warning Algorithm.

I. Introduction

These days, one of the most discussed issues is the one surrounding Internet-connected devices. For instance, most home appliances are now connected to each other through the Internet, so users can control them from their smartphones. This trend allows people to live more conveniently; however, it can cause a serious problem because all of the smart devices depend on a server. If the server broke down or burned out, it would cause an entire black-out disaster. A server overload monitoring algorithm recognizes the possible problems by using external devices such as an anemometer, an accelerometer, and temperature sensors. This monitoring algorithm uses six different variables: the temperature of a computer, the signal vector magnitude of an accelerometer sensor, the frequency of the vibration, the wind speed, the CPU temperature, and the hard drive temperature. These are all external conditions because the basic concept of this project is not to burden a target computer with any additional software or hardware operation.


The monitoring algorithm classifies the predicted target computer's conditions into three statuses: idle, safe, and unsafe. In order to classify them, the algorithm uses three different methods: the fast Fourier transform, multivariate regression, and a learning algorithm.

First of all, the monitoring algorithm calls a fast Fourier transform (FFT) function to analyze a tri-axis accelerometer sensor's movement. This method is similar to the "counting steps" application of a smartphone. It processes real-time data from the sensor and estimates how fast and how regularly the sensor vibrates. For example, a target computer should not vibrate when its power is off. On the contrary, the harder the target computer works, the more violently it vibrates.

Secondly, the monitoring algorithm predicts a CPU temperature by using a multivariate linear regression method. The algorithm analyzes a dataset and defines a linear regression equation, so that the algorithm can predict the temperature inside of a target computer.

Thirdly, an artificial neural network (ANN) is a self-learning algorithm. From the real-time dataset, the ANN defines the weight values on the input, hidden, and output layers by using a learning algorithm. The algorithm predicts a target computer's condition. If a predicted condition turns out to be wrong, the monitoring algorithm will update it.

In sum, the real-time server overload monitoring algorithm recognizes a dangerous condition of a server computer by using three different methods: FFT, multivariate regression, and an ANN.

II. Data Collection

The server monitoring system has four layers: Arduino, Python interface, MySQL database, and web application (see Fig. 1). Basically, all raw data come from four Arduino sensors: two temperature sensors, an anemometer, and a tri-axis accelerometer. On the Python layer, the monitoring system runs three different algorithms to measure a target computer's status. After classifying the status, the system saves the result into a MySQL database. Finally, on the web application layer, the real-time data are shown on a web page retrieved from the database.

Figure 1. Server Monitoring System Block Diagram
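As an illustration, a minimal sketch of the Python layer could look as follows. It assumes the pyserial and mysql-connector-python packages; the serial port, credentials, sample format, and table layout are placeholders, not the project's actual configuration.

```python
# Minimal sketch of the Arduino -> Python -> MySQL pipeline (illustrative only).
import serial
import mysql.connector

def run_pipeline():
    # Assume the Arduino prints one comma-separated sample per line:
    # "fan_temp,room_temp,wind_raw,ax,ay,az"
    arduino = serial.Serial("/dev/ttyUSB0", 9600, timeout=2)
    db = mysql.connector.connect(user="monitor", password="secret",
                                 host="localhost", database="server_monitor")
    cursor = db.cursor()
    while True:
        line = arduino.readline().decode("ascii", errors="ignore").strip()
        if not line:
            continue
        fields = line.split(",")
        if len(fields) != 6:
            continue  # skip malformed samples
        cursor.execute(
            "INSERT INTO sensor_readings "
            "(fan_temp, room_temp, wind_raw, ax, ay, az) "
            "VALUES (%s, %s, %s, %s, %s, %s)",
            [float(f) for f in fields])
        db.commit()

if __name__ == "__main__":
    run_pipeline()
```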

The wind speed measuring function basically needs four constants: the minimum voltage signal, the minimum wind speed, the maximum voltage signal, and the maximum wind speed. Suppose y is the wind speed and x is the raw analog output from the sensor. Then a linear equation for measuring the wind speed can be described as follows:

$$y = \left(x \times 0.004882814 - V_{min}\right)\times\frac{WS_{max} - WS_{min}}{V_{max} - V_{min}} + WS_{min} \qquad (1)$$

The constant conversion value is defined as 0.004882814; it is the result of 5/1024 (Arduino input voltage: 5 V, analog-to-digital converter: 1024 levels). The temperature sensor reports on the Kelvin scale, so a temperature measuring function converts the output data to Celsius or Fahrenheit. Furthermore, the same constant conversion value that is used for the anemometer sensor is also necessary here.
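As an illustration, equation (1) can be written as a small Python function. The Vmin/Vmax and wind speed constants below are placeholder values for a generic analog anemometer, not the calibration used in this project.

```python
# Hedged sketch of the wind speed conversion in equation (1).
ADC_TO_VOLTS = 5.0 / 1024.0   # 0.004882814: 5 V reference, 10-bit ADC

V_MIN, V_MAX = 0.4, 2.0       # sensor voltage at minimum / maximum wind speed (assumed)
WS_MIN, WS_MAX = 0.0, 32.4    # corresponding wind speeds in m/s (assumed)

def wind_speed(adc_reading: int) -> float:
    """Map a raw Arduino ADC reading to a wind speed using equation (1)."""
    volts = adc_reading * ADC_TO_VOLTS
    if volts <= V_MIN:
        return WS_MIN            # below the sensor's dead band: treat as calm
    return (volts - V_MIN) * (WS_MAX - WS_MIN) / (V_MAX - V_MIN) + WS_MIN

# Example: an ADC value of 512 corresponds to about 2.5 V.
print(wind_speed(512))
```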

! " + $ " + % " = '() (2) III. Recognition Algorithms This paper suggests the best algorithm to precisely predict a target computer’s condition. The first method is to use a FFT algorithm. This algorithm analyzes movements of an acceleration sensor and checks the movements whether the movement is in idle, safe, or unsafe. The second method is to use a multivariate linear regression. Previous studies have reported that a temperature of a fan is related to a temperature inside of a computer. By using this method, the monitoring system predicts a CPU temperature. The third method is to use

Figure 2. SVM graph

By using the FFT algorithm, the sampled SVM signal is transformed from the time domain to the frequency domain. From the transformed signal, the FFT algorithm is able to measure the frequency of the sensor's movement and the number of vibrations.
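As an illustration, the SVM and FFT steps can be sketched with NumPy. Only the 20 Hz sample rate (one sample every fifty milliseconds, as described above) comes from the text; the window length and the synthetic input are illustrative.

```python
# Sketch of the SVM + FFT step; thresholds and buffer size are illustrative.
import numpy as np

SAMPLE_RATE_HZ = 20.0   # one accelerometer sample every 50 ms

def svm(ax, ay, az):
    """Signal vector magnitude, equation (2)."""
    return np.sqrt(ax**2 + ay**2 + az**2)

def dominant_frequency(svm_samples):
    """Return the strongest vibration frequency (Hz) in a window of SVM samples."""
    samples = np.asarray(svm_samples, dtype=float)
    samples = samples - samples.mean()          # remove the DC (gravity) component
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / SAMPLE_RATE_HZ)
    return freqs[np.argmax(spectrum[1:]) + 1]   # skip the zero-frequency bin

# Example: a 10-second window (200 samples) of synthetic SVM readings.
window = [svm(0.1, 0.0, 9.8 + 0.3 * np.sin(2 * np.pi * 3.0 * t / SAMPLE_RATE_HZ))
          for t in range(200)]
print(dominant_frequency(window))               # roughly 3 Hz for this synthetic input
```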

B. Multivariate Linear Regression

The monitoring system has three predictor variables: the temperature of a computer fan, the wind speed, and the SVM value. With these variables, the linear regression model predicts the temperature of the CPUs or HDDs inside of a target computer. Manufacturers warn that their products should not be heated over a certain temperature. Unfortunately, the safe temperature range of each product is different, so the algorithm first needs to know the computer's hardware specifications. Linear regression is one of the most widely used methods in the field of machine learning. A model of the multivariate linear regression is given as follows:

$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n + \varepsilon \qquad (3)$$

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^{T} x \qquad (x_0 = 1)$$

The sum of squared errors should be minimized as follows [3]:

$$S = \sum_{i=1}^{m}\left[\theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i} + \cdots + \theta_n x_{ni} - y_i\right]^2$$

$$\frac{\partial S}{\partial \theta_j} = 2\sum_{i=1}^{m}\left[\theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i} + \cdots + \theta_n x_{ni} - y_i\right]x_{ji} = 0 \qquad (4)$$

And a gradient descent algorithm is [4]:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad (j = 0,\ldots,n,\; x_0^{(i)} = 1) \qquad (5)$$

By using the multivariate linear regression, the correlation between the external factors and the computer hardware can be described (see Fig. 3).

Figure 3. Three Dimension Correlation Graph
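As an illustration, the regression of equations (3)-(5) can be sketched with NumPy batch gradient descent. The training rows below are made-up placeholders rather than measurements from this project; in the real system they would come from the saved dataset.

```python
# Hedged sketch of the multivariate linear regression step (equations (3)-(5)).
import numpy as np

# Columns: fan temperature (C), wind speed (m/s), SVM; target: CPU temperature (C).
X_raw = np.array([[35.0, 2.1, 9.9],
                  [40.0, 1.5, 10.2],
                  [45.0, 1.1, 10.8],
                  [50.0, 0.8, 11.5]])
y = np.array([48.0, 55.0, 63.0, 72.0])

# Standardize the features so gradient descent converges quickly,
# then prepend x0 = 1 so theta_0 acts as the intercept (equation (3)).
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
X = np.hstack([np.ones((len(y), 1)), (X_raw - mu) / sigma])

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent update from equation (5)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        errors = X @ theta - y               # h_theta(x^(i)) - y^(i)
        theta -= alpha * (X.T @ errors) / m  # simultaneous update of every theta_j
    return theta

theta = gradient_descent(X, y)

def predict_cpu_temp(fan_temp, wind_speed, svm_value):
    x = (np.array([fan_temp, wind_speed, svm_value]) - mu) / sigma
    return theta[0] + x @ theta[1:]

print("predicted CPU temperature:", predict_cpu_temp(47.0, 1.0, 11.0))
```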

C. Back Propagation

All of the variables can be considered as neurons on an input layer. These neurons are connected to the neurons on a hidden layer, and the neurons on that layer are connected to the neurons on an output layer. Finally, the neurons on the output layer are able to classify a target computer's status. Each neuron has an activation function in itself (see Fig. 4).

Figure 4. Neuron Design

The activation function can be one of four functions: the unipolar sigmoid, the bipolar sigmoid, the hyperbolic tangent, and the radial basis function. For this project, the monitoring algorithm uses the unipolar sigmoid as its activation function because it is faster than the others on the Python layer:

$$g(x) = \frac{1}{1 + e^{-x}} \qquad (6)$$

The algorithm trains an artificial neural network by updating weights. A value on a connection between two connected neurons is called a weight. For this example, there are three input neurons, four hidden neurons, and three output neurons (see Fig. 5).

Figure 5. Artificial Neural Network Map

A neural network for this example can also be described in matrix form as follows:

'" !" !# → '# '$ !$ '(

$

ℎ* (!)# → ℎ* (!)$ ℎ* (!)(

$ $ $

By using a multiclass classification method, the monitoring algorithm recognizes three different status; idle, safe, and unsafe. They are described in matrix as following: ! ∈ $% 3 '()**+* ,

1 0 0 -. /. 0 = 34(+, 1 = 5)6+, 0 = 78*)6+ 0 0 1
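As an illustration, this one-hot encoding maps directly to a small helper function; the function name and list representation are illustrative choices.

```python
# Tiny helper mapping the three statuses to the one-hot vectors shown above.
STATUSES = ["idle", "safe", "unsafe"]

def one_hot(status: str) -> list:
    return [1 if status == s else 0 for s in STATUSES]

print(one_hot("safe"))   # [0, 1, 0]
```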

The forward pass propagation is the base of the back propagation algorithm. From a given training example (x, y), the forward propagation can be explained as follows [1]:

$$a^{(1)} = x$$

$$z^{(2)} = \Theta^{(1)} a^{(1)}$$

$$a^{(2)} = g(z^{(2)})$$

$$z^{(3)} = \Theta^{(2)} a^{(2)}$$

$$a^{(3)} = h_\Theta(x) = g(z^{(3)})$$

The main objective of the back propagation algorithm is to minimize the error rate. Since the artificial neural network is evaluated in a forward pass, the back propagation algorithm traverses the network backward. Suppose the error rate is $\delta$, where $\delta_j^{(l)}$ denotes the error of node j in layer l. For this neural network, the error terms can be found in this way:

$$\delta_j^{(3)} = a_j^{(3)} - y_j$$

$$\delta^{(2)} = (\Theta^{(2)})^{T}\delta^{(3)} \odot g'(z^{(2)})$$

where $\odot$ denotes the element-wise product.

In order to minimize these error rates, the equations above should be differentiated. Finally, the back propagation algorithm is defined as follows:


∆"#$ = 0 ()*+ ,-- -, /, 0) ∆"#$ ≔ ∆"#$ + (# $ δ" $*+ D"#$ ≔

1 $ ∆"# + +,"#$ -. / ≠ 0 (

D"#$ ≔ ! &

"#

%$1 $ ∆ *+ , = 0 ( "# ' ( = *+,(7)
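As an illustration, the 3-4-3 network and the updates of equations (6) and (7) can be sketched with NumPy. The training rows and hyper-parameters below are synthetic placeholders, not the project's actual training setup; the real system would train on the saved dataset.

```python
# Hedged sketch of the 3-4-3 back propagation network (unipolar sigmoid, eqs. (6)-(7)).
import numpy as np

rng = np.random.default_rng(0)

def g(z):                      # unipolar sigmoid, equation (6)
    return 1.0 / (1.0 + np.exp(-z))

# Weight matrices, including a bias column (x0 = 1 / a0 = 1).
Theta1 = rng.normal(scale=0.5, size=(4, 3 + 1))   # input (3) -> hidden (4)
Theta2 = rng.normal(scale=0.5, size=(3, 4 + 1))   # hidden (4) -> output (3)

def forward(x):
    a1 = np.append(1.0, x)            # a^(1) with bias
    z2 = Theta1 @ a1
    a2 = np.append(1.0, g(z2))        # a^(2) with bias
    a3 = g(Theta2 @ a2)               # a^(3) = h_Theta(x)
    return a1, a2, a3

def train(X, Y, alpha=0.5, lam=0.0, epochs=2000):
    global Theta1, Theta2
    m = len(X)
    for _ in range(epochs):
        Delta1 = np.zeros_like(Theta1)
        Delta2 = np.zeros_like(Theta2)
        for x, y in zip(X, Y):
            a1, a2, a3 = forward(x)
            d3 = a3 - y                                        # delta^(3)
            d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1 - a2[1:])   # delta^(2), bias removed
            Delta2 += np.outer(d3, a2)
            Delta1 += np.outer(d2, a1)
        D1, D2 = Delta1 / m, Delta2 / m
        D1[:, 1:] += lam * Theta1[:, 1:]                       # regularize non-bias weights
        D2[:, 1:] += lam * Theta2[:, 1:]
        Theta1 -= alpha * D1                                   # gradient step
        Theta2 -= alpha * D2

# Three made-up feature rows (e.g. scaled fan temp, wind speed, SVM) and one-hot labels.
X = np.array([[0.1, 0.9, 0.2], [0.5, 0.5, 0.5], [0.9, 0.1, 0.9]])
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])                # idle, safe, unsafe

train(X, Y)
_, _, probs = forward([0.85, 0.15, 0.88])
print("predicted status:", ["idle", "safe", "unsafe"][int(np.argmax(probs))])
```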

Putting a sensor inside of a target computer and running a CPU temperature measuring program on it are not allowed. The linear regression model is limited to the specific server computer models which have been tested by this project. However, the learning algorithm is able to adjust to every case by updating its dataset. When the monitoring system starts to run, the learning algorithm sets up a classification model by analyzing a saved dataset. The dataset is saved in a CSV file which consists of all the essential records, such as broken conditions of the experimented computers, possibly problematic conditions, and safe conditions. These records have been collected since this project started. By using these records, the algorithm predicts a target computer's condition (see Fig. 6).

Figure 6. Monitoring Program Flow Diagram
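As an illustration, the start-up step that builds the classification model from the saved CSV records could be sketched as follows; the file name and column layout are assumptions, not the project's actual schema.

```python
# Sketch of loading the saved dataset at start-up (column names are assumed).
import csv
import numpy as np

def load_dataset(path="monitoring_dataset.csv"):
    """Return (features, one_hot_labels) from the saved CSV records."""
    features, labels = [], []
    statuses = ["idle", "safe", "unsafe"]
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            features.append([float(row["fan_temp"]),
                             float(row["wind_speed"]),
                             float(row["svm"])])
            labels.append([1 if row["status"] == s else 0 for s in statuses])
    return np.array(features), np.array(labels)

# At start-up, the network from the previous sketch would be (re)trained on these records:
# X, Y = load_dataset()
# train(X, Y)
```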

However, the problem is that the algorithm, which is based on the saved dataset, is not applicable to every server computer, because it is limited to the experimented computers. Therefore, there are three possible prediction error cases: wrong idle, wrong safe, and wrong unsafe status. In order to recognize these possible error cases, a threshold checking module checks whether a target computer is still working or in an idle state (see Fig. 7).

Figure 7. Threshold Checking Module

When one of these problems is found, the algorithm updates itself (see Fig. 8).

Figure 8. Updating Dataset Thread Flow Diagram

First of all, the learning algorithm may predict that the target computer is in an idle state even if it is not. This project defines the idle wind speed threshold as 0.4. If the learning algorithm predicted wrongly, it would check the current wind speed, fix the wrongly predicted data, and then save the updated data into the dataset.

Secondly, the algorithm can cause a safe-condition error. The classification model and its results are based on previous experiments. Unfortunately, it is possible that a server computer suddenly breaks down while the algorithm keeps saying the condition is safe, or that the computer cooler suddenly stops working. The algorithm is able to recognize these problems. If one of these problems happens, the learning algorithm retrieves the data for the last ten minutes from the database and fixes them. Then, the classification model is retrained with the new dataset.

Thirdly, the algorithm may wrongly predict that the server is unsafe. Every expected-unsafe condition is temporarily saved into a cache. If the target computer still works well ten minutes after the condition was recorded, the expected-unsafe condition will be changed to a safe condition. Then the updated data are saved into the dataset to retrain the classification model.

In sum, this learning algorithm keeps getting smarter. When it predicts a target computer's condition for the first time, the result might not be very accurate. However, after experiencing the prediction error cases, the algorithm updates itself, and the results become more accurate (see Fig. 9).

Figure 9. Untrained and Trained Learning Algorithm
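As an illustration, the three error-case corrections can be sketched as a small helper. Only the 0.4 idle wind-speed threshold and the ten-minute review window come from the text; the helper name and the simplified relabeling choices are assumptions.

```python
# Hedged sketch of the three error-case updates described above.
IDLE_WIND_SPEED = 0.4
REVIEW_WINDOW_MINUTES = 10

def correct_prediction(predicted, wind_speed, still_working, minutes_since_prediction):
    """Return the corrected status label for a possibly wrong prediction."""
    if predicted == "idle" and wind_speed > IDLE_WIND_SPEED:
        return "safe"      # wrong idle: fans are spinning (relabeling simplified here)
    if predicted == "safe" and not still_working:
        return "unsafe"    # wrong safe: the computer actually went down
    if (predicted == "unsafe" and still_working
            and minutes_since_prediction >= REVIEW_WINDOW_MINUTES):
        return "safe"      # wrong unsafe: it kept working for ten minutes
    return predicted

# Every corrected record would be written back to the dataset and the
# classification model retrained, e.g.:
# X, Y = load_dataset(); train(X, Y)
```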

IV. Experiments

In order to test the monitoring algorithm, this project experimented on three different server computers, each with an Arduino device, in three different conditions, and checked whether the multivariate linear regression algorithm and the learning algorithm were able to recognize those conditions (see Fig. 10). Basically, each computer has a different CPU, and each CPU has a different maximum temperature threshold.

Figure 10. Server Computer Specification Table

A. Idle Prediction

First of all, the "Idle Prediction" experiment was made. Theoretically, when a target computer is in an idle state, the wind speed and the SVM value are relatively low, and the temperature outside of the target computer is close to the temperature in the server room. To test the target computers in the idle state, the computers were rebooted and then left alone. From the Idle Prediction experiment, this project was able to demonstrate the self-learning capability of the learning algorithm. In addition, the linear regression had no error because it had a wind speed threshold. However, the learning algorithm had prediction errors at first. After being updated with real data, the algorithm was able to predict correctly.

Figure 10. #1 Target Computer Idle Experiment Result

Figure 11. #2 Target Computer Idle Experiment Result

Figure 12. #3 Target Computer Idle Experiment Result

During the second experiment, the learning algorithm mostly made wrong predictions. Because those results were based on previous experiments on other target computers, wrong predictions were possible. However, these errors were fixed after the learning algorithm was updated with real data.

B. Safe Prediction

Secondly, the "Safe Prediction" experiment was made. In the safe state, a target computer should work as usual, and there should be no stressful task or extreme temperature variation. In order to test the server computers in the safe state, this project let the computers work normally as servers. Then, this project compared the two prediction models on the three server computers.

Figure 13. #1 Target Computer Safe Experiment Result

Figure 14. #2 Target Computer Safe Experiment Result

Figure 15. #3 Target Computer Safe Experiment Result

In these experiments, the linear regression model needed each CPU temperature threshold to make its decision tree. Even when a user manually input the thresholds into the model, it would sometimes predict wrong results, and those results had to be corrected afterward. From these experiments, the vulnerability of the linear regression prediction model was found: because the linear prediction model was fixed to the saved dataset, it was unable to adjust to cases which the model had not experienced before. On the other hand, the learning algorithm could fix these wrongly predicted results. The updated prediction model was able to recognize the safe condition correctly.

C. Unsafe Prediction

To burden the CPU, RAM, GPU, and hard drive of a target computer, this project runs a stress test program, HeavyLoad [5]. This software has been used since this project started. By using this program, the target computers' components can be overburdened. During this experiment, the HeavyLoad program was run until a target computer was extremely overloaded. As a result of these "Unsafe Prediction" experiments, the first and second server computers worked fine, but the third server computer broke because of a burnt hard drive.

Figure 16. #1 Target Computer Unsafe Experiment Result

Figure 17. #2 Target Computer Unsafe Experiment Result

Figure 18. #3 Target Computer Unsafe Experiment Result

Even though the first and second computers had been overloaded by the stress test program, they still worked fine as usual. These cases should be considered safe conditions of the target computers. Even though the linear regression model predicted that the first and second computers were in an unsafe state, the model should have reconsidered these cases. The learning algorithm is able to fix such error cases, but the linear regression model is not. Consequently, this shows that the learning algorithm even adjusts to prediction error cases.

V. Conclusion

The multivariate linear regression model was not applicable to all computers because every computer has different CPUs, and each CPU has a different temperature threshold. When a target computer's CPU temperature threshold is different from the dataset, the linear regression model can cause a prediction error. In addition, the FFT classification model was also unable to predict a target computer's condition because the range of SVM values was not large enough; a computer's board does not vibrate strongly on a regular basis, even when it is in an unsafe state. On the other hand, the learning algorithm was generally able to recognize all of the server computers' conditions. Even though the algorithm needed some time to set up an accurate classification model, it is still valuable. The strongest point of this algorithm is that the more it experiences, the more accurately it predicts. In conclusion, the prediction method using linear regression is not enough to monitor a server computer because of the various temperature thresholds. Alternatively, this project suggests that using the learning algorithm is a more accurate and powerful method to predict a server computer's condition.

Acknowledgment

First, and most of all, I would like to thank Dr. Shen for her expertise, assistance, guidance, and patience throughout the process of this research. Without her help this paper would not have been possible. I would like to thank Dr. Bhattacharyya and Dr. Hannemann for their support, suggestions, and encouragement.

Last of all, I would like to thank my family and everyone who helped contribute to this project; thanks for keeping me company on long walks.

References

[1] D. Adler, "Simple Back-propagation Neural Network in Python Source Code (Python Recipe)," ActiveState Code, 30 May 2012. http://code.activestate.com/recipes/578148/. Accessed 06 Oct. 2016.
[2] Y. K. Kim, S.-M. Kim, H. S. Lho, and W.-D. Cho, "Real-Time Step-Count Detection and Activity Monitoring Using a Triaxial Accelerometer," Intelligent Automation & Soft Computing, vol. 18, no. 3, pp. 247-261, 2012.
[3] M. Caraciolo, "Machine Learning with Python - Linear Regression," Artificial Intelligence in Motion, 27 Oct. 2011. Accessed 06 Oct. 2016.
[4] A. Ng, "Machine Learning," Coursera, Stanford University. Accessed 08 Oct. 2016.
[5] "HeavyLoad v3.4," HeavyLoad. Accessed 08 Oct. 2016.
