Neural Computing and Applications
https://doi.org/10.1007/s00521-018-3747-z

ORIGINAL ARTICLE

Imbalanced dataset-based echo state networks for anomaly detection

Qing Chen1 · Anguo Zhang2 · Tingwen Huang3 · Qianping He4 · Yongduan Song5

Received: 10 February 2018 / Accepted: 20 September 2018
© The Natural Computing Applications Forum 2018
Abstract
Anomaly detection is an effective way to extract useful information from abundant data. Most existing anomaly detection methods are built on a normal region or on specific algorithms and ignore the fact that many real datasets are severely imbalanced, so they may fail to function properly or effectively in practice, especially in the medical field. On the other hand, an imbalanced dataset is also a frequently encountered problem in neural network learning, because the lack of data in a minority class may lead to uneven classification accuracy. Inspired by these observations, this paper presents a novel anomaly detection approach based on the classical echo state network (ESN), a brain-inspired neural computing model. The entire dataset of the proposed method obeys an extremely imbalanced distribution, that is, anomalies are much rarer than normal data, and the training set contains only normal data. Once the ESN is well trained, its parameters act as a memory of the normal data: when normal data are fed into the trained network, the error between the input and the corresponding output is small compared with the error produced by abnormal input. Anomalous behavior is therefore detected when the error between the input data and the corresponding predicted value exceeds a certain threshold. Instead of arbitrarily setting an invariable threshold for all data, the threshold used in the proposed method is derived from an information-theoretic analysis and adapts to different datasets. Experiments on abnormal heart rate detection demonstrate the effectiveness of the proposed detection algorithm and theory.

Keywords Anomaly detection · Echo state network · Imbalanced dataset · Adaptive threshold
1 Introduction

Data have proliferated with the development of information technology, and finding the changing patterns and abnormalities hidden in large amounts of data has become an important task. In particular,
Yongduan Song
[email protected]

1 School of Automation, Chongqing University, Chongqing 400044, China
2 Research Institute of Ruijie, Ruijie Networks Co., Ltd, Fuzhou 350002, China
3 Texas A&M University at Qatar, 5825 Doha, Qatar
4 National Institute of Health, Bethesda, MD 20840, USA
5 Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, School of Automation, Chongqing University, Chongqing 400044, China
anomaly detection, which is devoted to the automatic discovery of abnormal data embedded in big data, is a typical and efficient way to acquire useful knowledge in practical applications [1–3]. For example, in the field of health care, abnormal heart rate detection can help doctors find cardiovascular diseases in patients; in industrial production, detecting abnormal equipment temperature or humidity can help operators find unsafe system operation. Owing to its wide use in diverse research areas and application domains, anomaly detection methods can be divided into categories such as empirical knowledge-based methods, statistics-based methods [4, 5], neural network-based methods [6–8], nearest neighbor-based methods [9–11] and clustering-based methods [12, 13]. However, detecting anomalies in an imbalanced dataset is still a challenging and tricky problem: the number of abnormal data is very small, and such data are sometimes hard to obtain, while
normal data are much more abundant and far easier to find. Many real-world datasets come from imbalanced distributions, such as medical data, oil spill detection and fraud detection [14]. Related work on this problem falls into two groups, namely data-level methods and algorithm-level methods. At the data level, oversampling and undersampling are two commonly used ways to modify the prior probability and minimize the effect of imbalance [15, 16]. However, oversampling can lead to overfitting [17], while undersampling may result in a loss of information [18]. Instead of balancing the training data distribution through re-sampling strategies, algorithm-level methods make modifications to the internal algorithm [19, 20], for example cost-sensitive learning [21] and ensemble learning [22], which includes bagging, boosting and stacking.

Contrary to these works, which focus on avoiding the imbalanced dataset problem, this paper, inspired by the imbalanced dataset problem encountered in neural networks, proposes a neural network-based method for abnormal data detection. The training set of the proposed method contains only normal data, while the test set contains both normal and abnormal data with an imbalanced distribution; in particular, anomalies are much rarer than normal data. A bias toward the normal data may therefore occur, producing poor accuracy for the abnormal data. When the test data are fed into the well-trained network, an anomaly is detected if the error between the input data and the corresponding predicted value exceeds a certain threshold. Notably, unlike existing neural network-based methods that rely on fixed network parameters or an invariant decision threshold, we propose an adaptive detection threshold derived from an information-theoretic analysis, which makes the method more reasonable and adaptive across different anomaly detection tasks.

Given that recurrent networks can accurately convey and store temporal information, reservoir computing is potentially powerful at performing regression computations. In this paper, a novel anomaly detection method is proposed using the classical echo state network (ESN) [23], one of the two classic reservoir computing models, with a recurrent neural network as its reservoir. Unlike most traditional artificial neural networks, in which all synaptic weights are updated online, the synaptic connections in an ESN reservoir are usually initialized randomly and kept fixed during training, while only the readout layer is trained via a linear regression method. Because of its outstanding advantages of high accuracy, easy training and global convergence [24], the ESN has been applied in diverse engineering applications such as time series prediction [25, 26], natural
language processing [27], load forecasting [28, 29], speech recognition [30, 31] and health care [32, 33].

The rest of this paper is organized as follows. The problem formulation and threshold analysis are presented in Sect. 2. Section 3 gives a brief description of the echo state network. Section 4 describes the proposed method, including data preprocessing, neural network training and testing. Experimental results on abnormal HR detection are presented in Sect. 5. Finally, Sect. 6 concludes the paper.
2 Problem statement and threshold analysis

2.1 Problem statement

Considering that in many domains the data are collected in the form of time series, such as flight safety, health care and fraud detection, the data used in this method are time series, that is, sequences of data points ordered by a uniform time interval [34]. The time series prediction performed by the neural network can be expressed as

\tilde{y}(t) = f_N\big(x(t-k), x(t-k+1), \ldots, x(t-1)\big) = f_N\big(X_{t-k}^{t-1}\big)    (1)

where f_N denotes the well-trained neural network function with a negligible prediction error, X_{t-k}^{t-1} = (x(t-k), x(t-k+1), \ldots, x(t-1)), x(t) is the actual measured value at time t, k is the input dimension of the neural network and \tilde{y}(t) is the network output. In this paper, the invertible transformation f_N is constructed by a well-trained ESN model. Consider the following two situations,

y = f_N(x)    (2)

and

\tilde{y} = f_N(\tilde{x})    (3)

where x and \tilde{x} are sample values of the disturbance-free training dataset (normal-only dataset) X and the disturbance-containing test dataset (normal and abnormal dataset) \tilde{X}, and y and \tilde{y} denote sample values of the corresponding ESN outputs Y and \tilde{Y}. According to the invariance property of mutual information [35], it holds that

I(X; \tilde{X}) = I(Y; \tilde{Y})    (4)

where the mutual information I(X; Y) quantifies the reduction in uncertainty about X obtained by observing Y; I(X; Y) \ge 0, and I(X; Y) = 0 if and only if X and Y are independently distributed. Under the assumption that the fine-tuned ESN performs the time series prediction with a very small error, the network output Y \approx X, and (4) can be approximately rewritten as

I(X; \tilde{X}) \approx I(X; \tilde{Y})    (5)

which indicates that it is reasonable and applicable to observe the mutual information between X and \tilde{Y} in order to measure the similarity of X and \tilde{X}.
First, the relative entropy is used to measure the closeness of two time series functions (TSFs). Specifically, given two TSFs p_x(t) and p_y(t), the Kullback–Leibler information (KLI) between them is computed by

I(p_x \| p_y) = \int_t p_x(t) \ln \frac{p_x(t)}{p_y(t)} \, dt    (6)

Further, the Kullback–Leibler divergence (KLD) between p_x(t) and p_y(t) can be obtained by

D(p_x, p_y) = I(p_x \| p_y) + I(p_y \| p_x)    (7)
Assume that p_x(t) and p_y(t) are two univariate normal distributions, that is, p_x \sim N(\mu_x, \sigma_x) and p_y \sim N(\mu_y, \sigma_y). Then we have

D(p_x, p_y) = \frac{1}{2}\Big(\frac{\sigma_x^2}{\sigma_y^2} + \frac{\sigma_y^2}{\sigma_x^2} - 2\Big) + \frac{1}{2}(\mu_y - \mu_x)^2 \Big(\frac{1}{\sigma_x^2} + \frac{1}{\sigma_y^2}\Big)

According to the above equation, the relative entropy between the test output subsequence \tilde{Y} and the normal data X, compared with that between the test input subsequence \tilde{X} and X, satisfies

D(p_{\tilde{Y}}, p_X) - D(p_{\tilde{X}}, p_X) \approx \frac{1}{2}\Big(\Big(\frac{1}{\sigma_x^2} + \frac{1}{\sigma_{\tilde{Y}}^2}\Big)(\mu_{\tilde{Y}} - \mu_x)^2 - \Big(\frac{1}{\sigma_x^2} + \frac{1}{\sigma_{\tilde{X}}^2}\Big)(\mu_{\tilde{X}} - \mu_x)^2\Big)    (8)

where the variance-ratio terms of the two divergences approximately cancel. It is noted that \mu_{\tilde{Y}} = \tilde{y} and \mu_{\tilde{X}} = \tilde{x} for the single-step-ahead prediction task. Thus, (8) can be developed as follows.

2.2 Threshold analysis

\| D(p_{\tilde{Y}}, p_X) - D(p_{\tilde{X}}, p_X) \|
\approx \frac{1}{2}\Big(\frac{1}{\sigma_{\tilde{Y}}^2} + \frac{1}{\sigma_{\tilde{X}}^2}\Big) \| (\mu_{\tilde{Y}} - \mu_x)^2 - (\mu_{\tilde{X}} - \mu_x)^2 \|
\approx \frac{1}{\sigma_{\tilde{Y}}^2} \| (\mu_{\tilde{Y}} - \mu_x)^2 - (\mu_{\tilde{X}} - \mu_x)^2 \|
= \frac{1}{\sigma_{\tilde{Y}}^2} \| (\mu_{\tilde{Y}} - \mu_{\tilde{X}})(\mu_{\tilde{Y}} + \mu_{\tilde{X}} - 2\mu_x) \|
\approx \frac{2}{\sigma_{\tilde{Y}}^2} \| (\mu_{\tilde{Y}} - \mu_{\tilde{X}})(\mu_{\tilde{Y}} - \mu_x) \|

Based on the assumption that the neural network is well trained, it holds that \sigma_{\tilde{Y}}^2 \approx \sigma_{\tilde{X}}^2 \approx \sigma_x^2, and the above expression reduces to

\| D(p_{\tilde{Y}}, p_X) - D(p_{\tilde{X}}, p_X) \| \approx \frac{2}{\sigma_{\tilde{Y}}^2} \| (\tilde{y} - \tilde{x})(\tilde{y} - \mu_x) \|    (9)

or

\| (\tilde{y} - \tilde{x})(\tilde{y} - \mu_x) \| \approx \frac{\sigma_{\tilde{Y}}^2}{2} \| D(p_{\tilde{Y}}, p_X) - D(p_{\tilde{X}}, p_X) \|    (10)

Setting a divergence threshold D_{thr}, the data can be regarded as an anomaly if

\| (\tilde{y} - \tilde{x})(\tilde{y} - \mu_x) \| \ge \frac{\sigma_{\tilde{Y}}^2}{2} D_{thr} \approx \frac{\sigma_{\tilde{X}}^2}{2} D_{thr}    (11)

It should be noted that the threshold differs for different datasets because of \sigma_{\tilde{X}}^2.
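For concreteness, the following minimal Python/NumPy sketch applies the decision rule of Eq. (11). The function name, the example values and the choice of D_thr are our illustrative assumptions, not values fixed by the analysis above.

```python
import numpy as np

def is_anomalous(y_pred, x_obs, normal_ref, test_window, D_thr=2.0):
    """Anomaly test of Eq. (11) (illustrative sketch; names are ours).

    y_pred      : ESN prediction y~ for the current step
    x_obs       : measured value x~ at the current step
    normal_ref  : normal training data, used to estimate mu_x
    test_window : recent data under detection, used to estimate sigma^2
    D_thr       : user-chosen divergence threshold (task dependent)
    """
    mu_x = np.mean(normal_ref)            # mean of the normal data X
    sigma2 = np.var(test_window)          # variance of the data to be detected
    score = abs((y_pred - x_obs) * (y_pred - mu_x))
    return score > 0.5 * sigma2 * D_thr   # adaptive threshold (sigma^2 / 2) * D_thr

# Example with small heart-rate windows (values in bpm, purely illustrative)
normal = np.array([72.0, 74.0, 71.0, 73.0, 72.5, 75.0])
window = np.array([73.0, 72.0, 74.0, 95.0, 110.0])
print(is_anomalous(73.0, 73.5, normal, window))    # False: close to normal
print(is_anomalous(95.0, 110.0, normal, window))   # True: large deviation
```

Because the threshold scales with the variance of the data under test, the same gain D_thr yields different absolute thresholds for different datasets, as noted above.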
3 Echo state networks

The echo state network (ESN) is a typical reservoir computing framework proposed by Jaeger et al. [23]. It consists of three components: the input, the reservoir (a recurrent neural network) and the readout (as shown in Fig. 1). The reservoir is the core of the ESN, characterized by a
Fig. 1 Network structure of ESN (input layer x(n), reservoir, and readout with output weights W_out)
fixed, large-scale, nontrainable but sparsely inter-connected neuron ensemble. If the reservoir is sufficiently complex to capture all salient features of the inputs, it can act as a state transition structure that maps the input into a higher-dimensional form. Unlike most other recurrent neural networks, which are difficult to train or prone to vanishing/exploding gradients, the ESN is easy to train: only the output weights need to be trained, and even a simple linear regression method is capable of doing the training task. The reservoir state at time step n is updated by

s(n) = f\big(W_{in} x(n) + W s(n-1) + W_{fb} y_d(n-1)\big)    (12)

and the ESN output is calculated by

y(n) = W_{out} s(n)    (13)
where s(n) is the reservoir state, x is the input signal, f is a nonlinear activation function and y_d is the given training sequence. W_{in} \in R^{N \times M}, W \in R^{N \times N} and W_{fb} \in R^{N \times P} represent the connection weights from the input layer to the reservoir, the inter-connections within the reservoir and the feedback from the output layer back to the reservoir, respectively, where M, N and P are the input, reservoir and output dimensions. W_{out} represents the connection weights from the reservoir neurons to the readout neurons and is the only part trained by the ESN training algorithm. Several linear regression methods can be used to calculate W_{out}. In this paper, it is computed via the pseudo-inverse

W_{out} = (S^{+} Y)^{T}    (14)

where S is the matrix concatenating all reservoir states and S^{+} is its pseudo-inverse, or via the ridge regression

W_{out} = Y S^{T} (S S^{T} + \beta I)^{-1}    (15)

with \beta > 0, which gives the ESN better generalization ability when \beta is chosen properly.

Bearing in mind that the activation function f is continuously differentiable, we can linearize the system state dynamics. Expanding (12) as a Taylor series around the reservoir state \hat{s}(n) and the input x(n) and retaining the first-order terms, we get

\delta\hat{s}(n) = f'(v(n)) W \delta\hat{s}(n-1) + f'(v(n)) W_{in} \delta x(n) + f'(v(n)) W_{fb} \delta y(n-1)

where \delta\hat{s}(n) and \delta x(n) are small displacements of the state and the input, respectively, and v(n) = W_{in} x(n) + W s(n-1) + W_{fb} y(n-1). The matrix f'(v(n)) \in R^{q \times q} is the Jacobian of f(v(n)), which takes the diagonal form

f'([v_1(n), \ldots, v_q(n)]^T) = diag[f'(v_1(n)), \ldots, f'(v_q(n))]

Thus, the linearized system can be formulated as

\delta\hat{s}(n) = A \delta\hat{s}(n-1) + B_1 \delta x(n) + B_2 \delta y(n-1)    (16)
\delta y(n) = C \delta\hat{s}(n)    (17)

where A = f'(v(n)) W, B_1 = f'(v(n)) W_{in}, B_2 = f'(v(n)) W_{fb} and C = \hat{W}_{out}^T.

3.1 Reservoir controllability

Iterating the linearized reservoir state dynamics (16) together with the output dynamics (17) yields

\delta\hat{s}(n+1) = A \delta\hat{s}(n) + B_1 \delta x(n+1) + B_2 \delta y(n) = (A + B_2 C) \delta\hat{s}(n) + B_1 \delta x(n+1)
\delta\hat{s}(n+2) = (A + B_2 C)^2 \delta\hat{s}(n) + (A + B_2 C) B_1 \delta x(n+1) + B_1 \delta x(n+2)
\vdots
\delta\hat{s}(n+d) = (A + B_2 C)^d \delta\hat{s}(n) + (A + B_2 C)^{d-1} B_1 \delta x(n+1) + \cdots + (A + B_2 C) B_1 \delta x(n+d-1) + B_1 \delta x(n+d)    (18)

where d is the dimensionality of the state space. If the matrix

M_c = [(A + B_2 C)^{d-1} B_1, \ldots, (A + B_2 C) B_1, B_1]

is of rank d, that is, full row rank, then system (18) has a unique representation of \delta\hat{s}(n+d) in terms of x(n+1), x(n+2), \ldots, x(n+d) for given A, B_1, B_2, C and \delta\hat{s}(n). If the linearized system is controllable, then the ESN is locally controllable around the equilibrium state [35].

3.2 Local observability

By Eq. (17), we get

\delta y(n) = C \delta\hat{s}(n)
\delta y(n+1) = C (A + B_2 C) \delta\hat{s}(n) + C B_1 \delta x(n+1)
\vdots
\delta y(n+d-1) = C (A + B_2 C)^{d-1} \delta\hat{s}(n) + C (A + B_2 C)^{d-2} B_1 \delta x(n+1) + \cdots + C (A + B_2 C) B_1 \delta x(n+d-2) + C B_1 \delta x(n+d-1)    (19)

If the matrix

M_o = [((A + B_2 C)^{d-1})^T C^T, \ldots, (A + B_2 C)^T C^T, C^T]^T    (20)

is of rank d, then, similarly to controllability and by using the inverse function theorem, we can state the local observability criterion: if the linearized system is observable, then the ESN is locally observable around the equilibrium state [35].
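The two rank conditions are straightforward to check numerically. The following sketch builds M_c and M_o for a toy linearized system with randomly chosen matrices; the dimensions and values are placeholders, not parameters from the paper (for rank purposes, the ordering of the blocks is immaterial).

```python
import numpy as np

def ctrl_obsv_ranks(A, B1, B2, C):
    """Rank tests for the linearized ESN (Sects. 3.1 and 3.2).

    Builds Mc = [(A+B2C)^(d-1) B1, ..., (A+B2C) B1, B1] and the
    corresponding observability matrix Mo, and returns their ranks.
    """
    d = A.shape[0]
    Acl = A + B2 @ C                  # closed-loop matrix A + B2*C
    blocks_c, blocks_o = [], []
    P = np.eye(d)
    for _ in range(d):
        blocks_c.append(P @ B1)       # (A+B2C)^j B1 blocks
        blocks_o.append(C @ P)        # C (A+B2C)^j blocks
        P = P @ Acl
    Mc = np.hstack(blocks_c[::-1])    # highest power first, as in the text
    Mo = np.vstack(blocks_o)
    return np.linalg.matrix_rank(Mc), np.linalg.matrix_rank(Mo)

# Toy system: d = 4 reservoir states, 1 input, 1 output (placeholder values)
rng = np.random.default_rng(0)
A, B1 = 0.5 * rng.standard_normal((4, 4)), rng.standard_normal((4, 1))
B2, C = rng.standard_normal((4, 1)), rng.standard_normal((1, 4))
rc, ro = ctrl_obsv_ranks(A, B1, B2, C)
print(rc == 4 and ro == 4)  # full rank -> locally controllable and observable
```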
4 Method
4.1 Data preprocessing

Before training the neural network, data preprocessing is necessary to regulate the sample data so that they are suitable as network input. As shown in the green part of Fig. 2, the preprocessing phase includes training data collection, elimination of obvious abnormalities, Kalman filtering and Min–Max normalization. First, to eliminate obviously abnormal data, we calculate the mean of all the data and multiply it by a certain rate to obtain a threshold; the actual data exceeding this threshold are removed. Then, the Kalman filter [36] is used to remove glitch noise and smooth the sample data. The process is governed by the linear difference equations

x_k = x_{k-1} + w_{k-1}
z_k = x_k + v_k    (21)

Here, x_k is the kth data point, and z_k is the measured value of x_k. The random variables w and v represent the process and measurement noise, respectively. They are assumed to be independent of each other and normally distributed,

p(w) \sim N(0, Q), \quad p(v) \sim N(0, R)    (22)

In practice, the process noise covariance Q and the measurement noise covariance R might change with each time step or measurement. The update equations are

\hat{x}_{k|k-1} = \hat{x}_{k-1}
P_{k|k-1} = P_{k-1} + Q
K_k = \frac{P_{k|k-1}}{P_{k|k-1} + R}    (23)
\hat{x}_k = \hat{x}_{k|k-1} + K_k (z_k - \hat{x}_{k|k-1})
P_k = (1 - K_k) P_{k|k-1}

where \hat{x}_{k|k-1} denotes the a priori estimate of the kth data point based on knowledge of the (k-1)th data point, and \hat{x}_k denotes the a posteriori estimate based on the measurement z_k. P_{k|k-1} is the a priori estimate error covariance, P_k is the a posteriori estimate error covariance, and K_k is the gain that minimizes the a posteriori estimate error covariance. Figure 3 shows an example of one time series before and after filtering.
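The scalar filter of Eq. (23) transcribes directly into a few lines of Python. The constants Q and R below are illustrative choices (the paper notes they may vary per step), and the heart-rate values are made up for the example.

```python
import numpy as np

def kalman_smooth(z, Q=1e-4, R=0.5):
    """Scalar Kalman filter implementing Eq. (23) (illustrative sketch).

    z    : raw measurements z_k
    Q, R : process / measurement noise covariances (assumed constant here)
    """
    x_hat = np.empty_like(z, dtype=float)
    x, P = z[0], 1.0                      # initial state estimate and covariance
    for k, zk in enumerate(z):
        x_prior = x                       # x^_{k|k-1} = x^_{k-1}
        P_prior = P + Q                   # P_{k|k-1} = P_{k-1} + Q
        K = P_prior / (P_prior + R)       # Kalman gain
        x = x_prior + K * (zk - x_prior)  # a posteriori estimate
        P = (1 - K) * P_prior             # a posteriori error covariance
        x_hat[k] = x
    return x_hat

# Example: smooth a noisy heart-rate series before Min-Max normalization
hr = np.array([72, 74, 90, 73, 71, 75, 69, 73], dtype=float)
print(np.round(kalman_smooth(hr), 1))
```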
4.2 Training the network

The steps of training an ESN can be summarized as follows:

Step 1 Initialize the neural network, including the input and reservoir connection weights.
Step 2 Input the series data x(n) to the network, and calculate the corresponding reservoir state s(n) by Eq. (12).
Step 3 Concatenate the state vector s(n) to the state collection matrix S in the form S(n) = [S(n-1), s(n)], and concatenate the target collection matrix Y in the form Y(n) = [Y(n-1), y(n)] at the same time.
Step 4 Repeat Steps 2–3 for all the input data.
Step 5 Calculate the output weights W_out by Eq. (14) or (15).
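A minimal NumPy sketch of Steps 1–5 follows. The reservoir size, spectral-radius scaling, ridge parameter and the omission of the feedback weights W_fb are our simplifying assumptions for illustration.

```python
import numpy as np

def train_esn(x, y, n_res=500, rho=0.9, beta=1e-6, seed=0):
    """Minimal ESN training sketch following Steps 1-5 (feedback term omitted).

    x : input sequence, shape (T,);  y : one-step-ahead targets, shape (T,)
    Returns (W_in, W, W_out) with W_out fitted by ridge regression, Eq. (15).
    """
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, 1))       # Step 1: initialize weights
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / max(abs(np.linalg.eigvals(W)))       # scale spectral radius to rho
    S = np.zeros((n_res, len(x)))
    s = np.zeros(n_res)
    for n in range(len(x)):                         # Steps 2-4: run the reservoir
        s = np.tanh(W_in[:, 0] * x[n] + W @ s)      # Eq. (12), without W_fb
        S[:, n] = s                                 # Step 3: collect states
    Y = y.reshape(1, -1)
    W_out = Y @ S.T @ np.linalg.inv(S @ S.T + beta * np.eye(n_res))  # Eq. (15)
    return W_in, W, W_out

# Usage: train on a sine wave to predict one step ahead
t = np.linspace(0, 8 * np.pi, 400)
x = np.sin(t)
W_in, W, W_out = train_esn(x[:-1], x[1:])
```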
Fig. 2 Flow diagram of the proposed method
Fig. 3 An example of one time series before and after filtering (heart rate in bpm versus time in 5-min intervals; curves: before filter, after filter)
Table 1 Heart rate offset degree (y1) and float degree (y2) of person #1 to person #4 in different days

            (a)                     (b)                     (c)                     (d)
            y1         y2           y1         y2           y1         y2           y1         y2
Person #1   0.0012     4.55e-4      0.0034     0.0014       0.0107     0.037        0.0068     0.0013
Person #2   0.0055     0.0026       0.0037     0.0016       0.0103     0.0056       0.0064     0.0037
Person #3   0.0017     7.9876e-4    0.0014     6.3886e-4    0.0011     6.2542e-4    9.7675e-4  3.9448e-4
Person #4   6.5355e-4  3.419e-4     4.6818e-4  2.386e-4     5.5316e-4  2.6101e-4    4.6921e-4  1.9102e-4
4.3 Testing the network

In the test process (the red part of Fig. 2), according to the threshold analysis in Sect. 2, we set the abnormal threshold to thre_x = (\sigma^2 / 2) \cdot s, where the parameter s is a positive gain to be designed according to the task and \sigma is the standard deviation of the data to be detected. The abnormal threshold thus varies with the data under detection and involves no manual tuning or trial-and-error process, so the proposed method is more reasonable and adaptive than some existing anomaly detection techniques based on an invariable threshold. Since the threshold equals a varying part (\sigma^2 / 2) multiplied by a constant s, the constant s has only an indirect and smaller influence on the threshold, compared with a fixed threshold that is entirely determined by a user-designed constant [37]. This means that our method can reduce wrong predictions due to human factors. Moreover, in the proposed method only the parameter s needs to be designed to form the anomaly detection threshold, whereas some other existing methods require two or more parameters [37, 38], which increases the complexity of implementing those algorithms.

Moreover, considering the impact of the incoming data on the abnormal alarm, the input data in the test phase consist of two parts: the current data point to be detected, d_i, and the p data points preceding it, d_{i-p}, d_{i-p+1}, \ldots, d_{i-1}. The estimate of d_i is then refined as follows:

Step 1 Take the time series d_{i-p}, d_{i-p+1}, \ldots, d_{i-1}, d_i as the network input, and denote the corresponding output by t_i(1).
Step 2 Replace the earliest data point d_{i-p} with t_i(1), take t_i(1), d_{i-p+1}, \ldots, d_{i-1}, d_i as the network input, and denote the corresponding output by t_i(2).
Step 3 After executing this P times, obtain the Pth output t_i(P).
Step 4 Calculate the estimated value of d_i by t_i = \sum_{j=1}^{P} r_j t_i(j), where r_1 + r_2 + \cdots + r_P = 1 and r_1 > r_2 > \cdots > r_P.
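As a concrete illustration of Steps 1–4, the sketch below refines the estimate of d_i by repeatedly feeding the network its own output. The prediction callable, the replacement pattern after the first pass and the exponentially decaying weights r_j are our assumptions, since the text does not fix them.

```python
import numpy as np

def refined_estimate(predict, history, P=3):
    """Iteratively refine the estimate of the current datum d_i (Steps 1-4).

    predict : callable mapping an input window to the network output (assumed)
    history : [d_{i-p}, ..., d_{i-1}, d_i], the window ending at the current datum
    P       : number of refinement passes (illustrative choice)
    """
    window = list(history)
    outputs = []
    for _ in range(P):
        t_j = predict(np.asarray(window))   # t_i(j), the network output
        outputs.append(t_j)
        window[0] = t_j                     # replace the earliest datum, as in Step 2
    # decreasing weights r_1 > r_2 > ... > r_P summing to 1 (one possible choice)
    r = np.array([2.0 ** -(j + 1) for j in range(P)])
    r /= r.sum()
    return float(r @ np.array(outputs))     # t_i = sum_j r_j * t_i(j)
```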
If the error between the input data and the network output exceeds the abnormal threshold, an anomaly is indicated. Moreover, two indexes, the offset degree (y1) and the float degree (y2), are defined to assess the degree of abnormality. They are calculated by

y_1 = \frac{1}{N} \sum_{i=1}^{N} (d(i) - t(i))^2    (24)

and

y_2 = \frac{1}{N} \sum_{i=1}^{N} (\bar{d} - d(i))^2    (25)

Here, N is the number of input data points, d(i) is the ith data point to be detected, t(i) is the ith network output, and \bar{d} is the mean of all the data to be detected.
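Equations (24) and (25) reduce to two mean squared deviations; a minimal sketch (variable names are ours, example values made up):

```python
import numpy as np

def offset_float_degree(d, t):
    """Offset degree y1, Eq. (24), and float degree y2, Eq. (25).

    d : data points to be detected, d(1..N);  t : network outputs t(1..N)
    """
    d, t = np.asarray(d, dtype=float), np.asarray(t, dtype=float)
    y1 = np.mean((d - t) ** 2)          # deviation of the data from predictions
    y2 = np.mean((d.mean() - d) ** 2)   # spread of the data around their mean
    return y1, y2

# Example: a smooth day yields small y1 and y2 (cf. Tables 1 and 3)
d = np.array([72.0, 73.0, 71.5, 72.5])
t = np.array([72.1, 72.8, 71.7, 72.4])
print(offset_float_degree(d, t))
```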
5 Results

In this section, we construct an ESN of size 1-500-1, that is, one input dimension, a 500-neuron reservoir and one output dimension. The experimental dataset consists of the real clinical heart rates of 100 clients of different ages and genders, and each individual's dataset is a collection of 30 consecutive days of heart rate. As expected, the number of healthy days is much higher than that of unhealthy days. In combination with the clinical manifestations collected from the doctor and the related preprocessing, the training dataset consists of 5 to 7 days of normal data, which may differ for every individual. After the detection model (the echo state network) has been well trained for one person, all 30 days of sampled data are fed into the model to quantify this person's heart rate condition.

To the best of our knowledge, many existing abnormal heart rate detection methods rely on a normal range to determine whether the collected data are normal. However, this ignores the fact that different people tend to exhibit different physical conditions. For example, an athlete's heart rate is relatively smooth, the heart rate of a child is much faster, and people with good habits have a smoother heart rate than people with an irregular schedule. Therefore, it is more reasonable to consider a method based on individual characteristics.
Table 2 Abnormal heart rate detection of person #1 to person #4 in different days [one row of subfigures (a)–(d) per person]
To demonstrate the effectiveness of the proposed method, we show the heart rate status of eight people with different health conditions on different days, as reported in Tables 1, 2, 3 and 4. According to the collected data of each person, we build an individual network model and a reference threshold to detect that person's abnormal heart rate. More specifically, on the first day of person #1 [subfigure (a)], the predicted and actual values are very close and the whole curve is smooth, from which we conclude that the heart rate on this day is in a good healthy status. On the second day, subfigure (b), although the predicted and actual values are close, calculating the offset and float degrees (as shown in Table 1) reveals a slight fluctuation of the whole curve compared with subfigure (a); we therefore consider the heart rate on this day to be in a generally healthy status. In (c), the first half of the curve is smooth, but starting from about 200 on the x axis, the heart rate begins to fluctuate dramatically and rises sharply, eventually reaching 110. We conclude that this person is in a bad health status, and an alarm needs to be sent to the medical staff. On the next day, subfigure (d), the values are higher in the first half of the curve but gradually decrease and tend to stabilize in the latter half, so this person is in a bad health status but does not need to be monitored.
Table 3 Heart rate offset degree (y1) and float degree (y2) of person #5 to person #8 in different days

            (a)                     (b)                     (c)                    (d)
            y1         y2           y1         y2           y1        y2           y1         y2
Person #5   0.0019     6.2178e-4    0.0031     0.0012       0.0025    0.0013       0.002      7.6697e-4
Person #6   9.4406e-4  3.8999e-4    9.2619e-4  3.7703e-4    0.001     5.2114e-4    9.9733e-4  3.6993e-4
Person #7   0.0017     3.6774e-4    0.0024     7.394e-4     0.0024    7.3863e-4    0.0021     8.1285e-4
Person #8   5.6658e-4  1.9673e-4    6.4179e-4  2.2188e-4    0.0011    5.0215e-4    0.0013     5.092e-4
Further, it can be seen from subfigures (a) and (b) of person #2 that there is a certain fluctuation in the heart rate curves, which means the heart rate of this person is in a generally healthy status. In subfigures (c) and (d), the fluctuation is more obvious and the number of alarms gradually increases, which indicates that the heart rate condition of this person is serious and needs real-time monitoring.
Table 4 Abnormal heart rate detection of person #5 to person #8 in different days [one row of subfigures (a)–(d) per person]
For person #3 and person #4, it can be concluded from their steady heart rate curves and significantly reduced number of alarms that they are in a healthy heart rate status. The same analysis applies to person #5 through person #8: for example, person #5 is in a bad heart rate status and should receive more attention because of the high-frequency alarms, while person #6 requires little attention owing to the few alarms. In fact, all of the results and analysis have been sent back to the doctors, who consider the proposed method a candidate for detecting heart rate conditions in the future.
6 Conclusion

In this paper, a novel anomaly detection method using a classical echo state network on imbalanced datasets is proposed. An adaptive detection threshold is obtained from an information-theoretic analysis: as the data to be detected vary, the abnormal threshold varies with them, without manual tuning or trial and error, which makes the proposed method more reasonable and adaptive. The significant advantages of the proposed method also lie in the facts that only very few parameters need to be designed for a specific task and that only normal data are required for training the neural network, which makes the computing model much easier and more feasible to build. Both the theoretical analysis and the experimental results demonstrate the effectiveness and flexibility of the proposed method.

Acknowledgements This work was supported in part by the National Natural Science Foundation of China (No. 61773081) and the Technology Transformation Program of Chongqing Higher Education University (KJZH17102).
References

1. Pimentel MAF, Clifton DA, Lei C, Tarassenko L (2014) A review of novelty detection. Signal Process 99(6):215–249
2. Chandola V, Banerjee A, Kumar V (2012) Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng 24(5):823–839
3. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):1–58
4. Markou M, Singh S (2003) Novelty detection: a review, part 1: statistical approaches. Signal Process 83(12):2481–2497
5. Chen Z, Saligrama V (2012) Video anomaly detection based on local statistical aggregates. In: IEEE conference on computer vision and pattern recognition. IEEE Computer Society, pp 2112–2119
6. Markou M, Singh S (2003) Novelty detection: a review, part 2: neural network based approaches. Signal Process 83(12):2499–2521
7. Bontemps L, Cao VL, McDermott J, Le-Khac NA (2016) Collective anomaly detection based on long short-term memory recurrent neural networks. In: International conference on future data and security engineering. Springer, Cham, pp 141–152
8. Ahmad S, Lavin A, Purdy S, Agha Z (2017) Unsupervised real-time anomaly detection for streaming data. Neurocomputing 262:134–147
9. Zhao M, Tian Z, Chow TWS (2018) Fault diagnosis on wireless sensor network using the neighborhood kernel density estimation. Neural Comput Appl 15:1–12
10. Lin WC, Ke SW, Tsai CF (2015) CANN: an intrusion detection system based on combining cluster centers and nearest neighbors. Knowl Based Syst 78(1):13–21
11. Castillo O, Melin P, Ramírez E, Soria J (2012) Hybrid intelligent system for cardiac arrhythmia classification with fuzzy k-nearest neighbors and neural networks combined with a fuzzy system. Expert Syst Appl 39(3):2947–2955
12. Zhao J, Liu K, Wang W, Liu Y (2014) Adaptive fuzzy clustering based anomaly data detection in energy system of steel industry. Inf Sci 259(3):335–345
13. Kiss I, Genge B, Haller P, Sebestyén G (2014) Data clustering-based anomaly detection in industrial control systems. In: IEEE international conference on intelligent computer communication and processing, pp 275–281
14. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
15. Yap BW, Rani KA, Rahman HAA, Fong S, Khairudin Z, Abdullah NN (2014) An application of oversampling, undersampling, bagging and boosting in handling imbalanced datasets. In: Proceedings of the first international conference on advanced data and information engineering (DaEng-2013). Springer, Singapore, pp 13–22
16. Cateni S, Colla V, Vannucci M (2014) A method for resampling imbalanced datasets in binary classification tasks for real-world problems. Neurocomputing 135(8):32–41
17. Cao H, Li XL, Woon YK, Ng SK (2013) Integrated oversampling for imbalanced time series classification. IEEE Trans Knowl Data Eng 25(12):2809–2822
18. García S, Herrera F (2014) Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy. Evol Comput 17(3):275–306
19. Song Y, Morency LP, Davis R (2013) Distribution-sensitive learning for imbalanced datasets. In: IEEE international conference and workshops on automatic face and gesture recognition, pp 1–6
20. Maldonado S, Weber R, Famili F (2014) Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf Sci 286:228–246
21. Wang X, Matwin S, Japkowicz N, Liu X (2013) Cost-sensitive boosting algorithms for imbalanced multi-instance datasets. In: Canadian conference on artificial intelligence. Springer, Berlin, vol 7884, pp 174–186
22. Li Q, Yang B, Li Y, Deng N, Jing L (2013) Constructing support vector machine ensemble with segmentation for imbalanced datasets. Neural Comput Appl 22(1):249–256
23. Jaeger H (2007) Echo state network. Scholarpedia 2(9):1479–1482
24. Jaeger H (2002) Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the echo state network approach. GMD Report 159. German National Research Center for Information Technology, Bremen, Germany
25. Li D, Han M, Wang J (2012) Chaotic time series prediction based on a novel robust echo state network. IEEE Trans Neural Netw Learn Syst 23(5):787
26. Boccato L, Attux R, Zuben FJV (2014) Self-organization and lateral interaction in echo state network reservoirs. Neurocomputing 138(11):297–309
27. Hinaut X, Petit M, Pointeau G, Dominey PF (2014) Exploring the acquisition and production of grammatical constructions through human-robot interaction with echo state networks. Front Neurorobot 8:16
28. Bianchi FM, Santis ED, Rizzi A, Sadeghian A (2015) Short-term electric load forecasting using echo state networks and PCA decomposition. IEEE Access 3:1931–1943
29. Bianchi FM, Scardapane S, Uncini A, Rizzi A, Sadeghian A (2015) Prediction of telephone calls load using echo state network with exogenous variables. Neural Netw 71(C):204–213
30. Salehi MR, Abiri E, Dehyadegari L (2013) An analytical approach to photonic reservoir computing: a network of SOAs for noisy speech recognition. Opt Commun 306(6):135–139
31. Alalshekmubarak A, Smith LS (2014) A noise robust Arabic speech recognition system based on the echo state network. J Acoust Soc Am 135(4):2195
32. Buteneers P, Verstraeten D, Van Nieuwenhuyse B, Stroobandt D, Raedt R, Vonck K (2013) Real-time detection of epileptic seizures in animal models using reservoir computing. Epilepsy Res 103(2–3):124–134
33. Bozhkov L, Koprinkova-Hristova P, Georgieva P (2016) Learning to decode human emotions with echo state networks. Neural Netw 78:112–119
34. Dickey DA (2013) The analysis of time series: an introduction. Technometrics 33(3):363–364
35. Haykin SS (2009) Neural networks and learning machines. China Machine Press, Beijing
36. Welch G, Bishop G (2001) An introduction to the Kalman filter. University of North Carolina at Chapel Hill, Chapel Hill
37. Lukoševičius M, Jaeger H (2009) Reservoir computing approaches to recurrent neural network training. Comput Sci Rev 3(3):127–149
38. Lee S, Kim G, Kim S (2011) Self-adaptive and dynamic clustering for online anomaly detection. Expert Syst Appl 38(12):14891–14898