DATA MINING APPROACH FOR COMMERCIAL DATA CLASSIFICATION AND MIGRATION IN HYBRID STORAGE SYSTEM
By Mais Ahmad Haj Qasem
Supervisor Dr. Maen M. Al Assaf
Co-Supervisor Dr. Ali Rodan
This Thesis was submitted in Partial Fulfillment of the Requirements for the Master’s Degree in Computer Science
Faculty of Graduate Studies The University of Jordan
November 2015
Dedication
I dedicate this thesis to the soul of my father, and to my pillar in life, who believed in me, supported me, brightened every step of my way, and saw the best in me: my older brother Motasem Haj Qasem.
I will be forever thankful to my mother, who held me up, never let me fail, and saw in me the strength to reach the success of this thesis.
Thanks also to my soul twin, Musab Haj Qasem, whose words supported me in continuing my path.
Finally, thanks to my whole family: Rola, Osama, Ismail, Lelas, and Suhaib.
Acknowledgement
First, I praise Allah, who granted me the strength and faith to continue this research.
I acknowledge the time and help given by my two supervisors, Dr. Maen Al Assaf and Dr. Ali Rodan. Without their help, faith, support, and encouragement, I could not have reached this level.
Thanks to my family and friends who stood by me throughout this research for their support and help.
Table of Contents

Committee Decision
Dedication
Acknowledgement
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Abstract
1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 Research Objective
  1.4 Thesis Organization
2 Background and Related Work
  2.1 Data Mining
    2.1.1 Data Mining Tasks
    2.1.2 Related Work
  2.2 Justification
  2.3 Data Mining Techniques
    2.3.1 Recurrent Neural Network
      2.3.1.1 Training Recurrent Neural Network
    2.3.2 Echo State Network
      2.3.2.1 ESN Model
      2.3.2.2 Related Work
    2.3.3 Support Vector Machine
      2.3.3.1 SVM Model
      2.3.3.2 SVM Kernel Types
      2.3.3.3 Related Work
  2.4 Hybrid Storage System
    2.4.1 Parallel Hybrid Storage System
    2.4.2 Prefetching in Hybrid Storage System
    2.4.3 Related Work
3 System Model
  3.1 Introduction
  3.2 Application
  3.3 Model Design
  3.4 Preprocessing Step
    3.4.1 Statistics about the Data
  3.5 Data Mining Step
    3.5.1 ESN
    3.5.2 CRJ
    3.5.3 SVR
  3.6 Prefetching Step
4 Experimental Results
  4.1 Dataset
  4.2 Data Mining Techniques Experimental Results
    4.2.1 ESN
    4.2.2 CRJ
    4.2.3 SVR
    4.2.4 Model-based Comparison
      4.2.4.1 Model-based for Machine01
      4.2.4.2 Mixture-Model for Machine01
      4.2.4.3 Model-based for Machine6
      4.2.4.4 Mixture-Model for Machine6
  4.4 Parallel Hybrid Storage System Experimental Results
    4.4.1 Machine01
    4.4.2 Machine6
5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References
Abstract in Arabic
LIST OF TABLES

3.1 Statistics about the data
4.1 ESN parameter settings and results for Machine01
4.2 ESN parameter settings and results for Machine06
4.3 CRJ parameter settings and results for Machine01
4.4 CRJ parameter settings and results for Machine6
4.5 SVR parameter settings and results for Machine01
4.6 SVR parameter settings and results for Machine6
4.7 Values of PHSS parameters
4.8 Experimental results for Machine01
4.9 Experimental results for Machine6
LIST OF FIGURES

2.1 Recurrent Neural Network (RNN)
2.2 (A) The original RNN, (B) the feedforward network for BPTT
2.3 Classic example of a linear classifier
2.4 SVM complex structure
2.5 Basic idea behind the SVM
2.6 Hybrid Storage System
2.7 Parallel Hybrid Storage System
3.1 The architecture of the proposed approach
3.2 Data log file
3.3 Output data after the pre-processing step
3.4 Data classification process
3.5 Data classification using ESN
3.6 ESN model design
3.7 Data classification using CRJ
3.8 CRJ model design
3.9 Data classification using SVR
3.10 SVR model design
3.11 Parallel Hybrid Storage System design
3.12 Prefetching mechanism
4.1 Accuracy-based comparison between the techniques in Model 60% (Machine01)
4.2 Accuracy-based comparison between the techniques in Model 70% (Machine01)
4.3 Accuracy-based comparison between the techniques in Model 80% (Machine01)
4.4 Testing-set-based comparison between the techniques in all models (Machine01)
4.5 Accuracy-based comparison between the techniques in Model 60% (Machine06)
4.6 Accuracy-based comparison between the techniques in Model 70% (Machine06)
4.7 Accuracy-based comparison between the techniques in Model 80% (Machine06)
4.8 Testing-set-based comparison between the techniques in all models (Machine6)
4.9 Results of using the proposed approach in Model 60% (Machine01)
4.10 Results of using the proposed approach in Model 70% (Machine01)
4.11 Results of using the proposed approach in Model 80% (Machine01)
4.12 Results of using the proposed approach in Model 60% (Machine6)
4.13 Results of using the proposed approach in Model 70% (Machine6)
4.14 Results of using the proposed approach in Model 80% (Machine6)
LIST OF ABBREVIATIONS OR SYMBOLS

SSD: Solid State Drive
HDD: Hard Disk Drive
I/O: Input/Output
ESN: Echo State Network
RNN: Recurrent Neural Network
CRJ: Cycle Reservoir with regular Jumps
SCR: Simple Cycle Reservoir
SVM: Support Vector Machine
SVR: Support Vector Regression
ICDM: IEEE International Conference on Data Mining
KM: Knowledge Management
PPM: Prediction by Partial Match
DG: Dependency Graph
BPTT: Backpropagation Through Time
KF: Kalman Filter
RTRL: Real-Time Recurrent Learning
LSM: Liquid State Machine
NMSE: Normalized Mean Square Error
DESN: Decoupled Echo State Network
MSO: Multiple Superimposed Oscillator
FF-ESN: Feedforward ESN
SESN: Simple Echo State Network
RBF: Radial Basis Function
PLS: Partial Least Squares
Poly-PLS: Polynomial Partial Least Squares
ANNs: Artificial Neural Networks
LS-SVMs: Least-Squares Support Vector Machines
HSS: Hybrid Storage System
PHSS: Parallel Hybrid Storage System
ESP: Echo State Property
MaxBW: Maximum Bandwidth
DATA MINING APPROACH FOR COMMERCIAL DATA CLASSIFICATION AND MIGRATION IN HYBRID STORAGE SYSTEM
By Mais Ahmad Haj Qasem
Supervisor Dr. Maen M. Al Assaf
Co-Supervisor Dr. Ali Rodan
ABSTRACT

A hybrid storage system consists of a hierarchy of storage devices, which include, from top to bottom, Solid State Drive (SSD), Hard Disk Drive (HDD) and Tape, respectively. These storage devices differ from each other in their performance capabilities, especially in their speed, which increases as we move up the hierarchy. Thus, migrating application data among the levels of a hybrid storage system based on its importance reduces the elapsed time of the application. In order to migrate the data, this study uses data mining techniques to classify the data of a commercial marketing website in parallel with the on-demand requests. The result of the classification is then injected into a parallel hybrid storage system in order to migrate the data based on their importance, so that the uppermost levels accommodate important data. For this purpose, Echo State Network (ESN), Cycle Reservoir with regular Jumps (CRJ) and Support Vector Regression (SVR) are used. ESN is a simple, easy, and powerful method that is used to train a recurrent neural network (RNN) to predict future accesses. CRJ is a simple deterministic reservoir model with highly constrained weight values that performs better than the standard ESN. SVR is based on decision planes that define the decision boundaries separating groups of objects with different class memberships. To evaluate our method, a real-world parallel hybrid storage system prototype supported by our approach is used to measure the execution elapsed time of real-world applications. The results show that the proposed approach reduces the elapsed time significantly: the enhancement in elapsed time was found to be between 20% and 36% with our approach.
Chapter One
Introduction

1.1 Overview

A hybrid storage system consists of a hierarchy of storage devices, which include, from top to bottom, Solid State Drive (SSD), Hard Disk Drive (HDD) and Tape, respectively. These storage devices differ from each other in their performance capabilities, especially in their speed, which increases as we move up the hierarchy. Applications continuously issue on-demand data reading requests to the storage system. Therefore, it is better to store the requested data in the uppermost level to reduce the retrieval time of the requested data.
Data mining techniques are able to classify the data based on their importance and future access. This allows us to migrate application data among the levels of a hybrid storage system based on importance, distributing the data among the different levels accordingly (Nijim, et al., 2010). The role of data mining techniques is the classification of the data based on their importance, which improves the performance of the system as a whole. The criteria by which data can be classified into importance groups include, but are not limited to: data that will be accessed in the near future, the most frequently used data, and data that tend to be accessed in a particular event.
Input/output (I/O) intensive applications and their users vary in their access patterns. This makes the process of data classification depend on the application type. Application types include, but are not limited to, database applications, image processing applications, marketing, finance, and voice and text recognition. A given classification or data mining approach may provide better accuracy for one type of application than for the rest (Nijim, et al., 2011).
Examples of data mining techniques include the Echo State Network (ESN), which provides a supervised learning principle and architecture for the Recurrent Neural Network (RNN). The main idea of ESN is to drive a large, random, fixed RNN with the input signal, thereby inducing in each neuron (unit) of this reservoir network a non-linear response signal, and to obtain a desired output signal as a trainable linear combination of all these response signals. Cycle Reservoir with regular Jumps (CRJ) is an extension of the ESN showing that a very simple, cyclic, deterministically generated reservoir can produce better performance than the standard ESN; its predecessor is a model called Simple Cycle Reservoir (SCR) (Rodan and Tino, 2011). Support Vector Regression (SVR) is based on decision planes that define the decision boundaries separating groups of objects that have different class memberships.
In this study, the aim is to propose an approach that manages a hybrid storage system, based on a parallel process and data mining techniques, and that performs the most accurate possible data classification for a specific application type in parallel with the on-demand requests. Three data mining techniques will be used: 1) Echo State Network (ESN) (Jaeger, 2002), 2) Cycle Reservoir with regular Jumps (CRJ) (Rodan and Tino, 2012), and 3) Support Vector Regression (SVR) (Simon and Koller, 2002). With these data mining techniques, the proposed approach will be implemented in a parallel hybrid storage system and will allow a parallel migration of the data based on their importance, such that the uppermost levels accommodate important data. This serves the ultimate goal of the proposed approach, which is reducing the application's execution time by reducing the storage I/O stalls. Migrating the data based on an accurate data classification reduces the application execution time because the data of important requests will be found in the uppermost levels, which have the highest speed performance.
In order to evaluate the performance of the proposed approach, a real-world parallel hybrid storage system will be utilized and supported by the proposed approach to measure real-world applications' execution elapsed time (Al Assaf, et al., 2011). A commercial marketing website will be used in the performance evaluation. We mainly focus on behavioral targeting of customers who access a particular firm's or vendor's website to browse and purchase products. Data mining techniques then help in classifying products based on customer demand. Data related to the important, most demanded products are migrated to the uppermost level of the hybrid storage system.
1.2 Motivation
The motivations behind this work are the ability of data mining approaches to predict the data's importance, the high I/O bandwidth provided by parallel storage systems, and the increased performance of hybrid storage systems. In order to show the significance of our study, we first discuss the key motivations that make our study feasible and important, as follows:
➢ Data mining techniques are able to predict the data's importance and perform classification on them. This enables us to decrease data access time by migrating important data to upper levels of the storage system. This is feasible and applicable for commercial marketing websites.
➢ Based on the literature review, few data mining techniques have addressed data classification and migration in hybrid storage systems. To the best of our knowledge, our study is the first to use data mining techniques (ESN, CRJ, SVR) for a commercial marketing website's applications in parallel hybrid storage systems.

➢ Parallel hybrid storage systems provide high I/O bandwidth, which enables us to perform data migration in parallel with the application's on-demand data requests.

➢ The hybrid storage system provides increasing speed performance as we move up the hierarchy. This makes data migration efficient in reducing application I/O stalls and execution elapsed time.

➢ CPU processing time is significantly lower than the storage system's latency. This makes it possible for data mining algorithms to execute and make decisions concurrently with the data migration process without negative effects on performance.
Based on these motivations, our study takes advantage of the above-mentioned points to implement a module that classifies the data based on their importance and performs data migration in a parallel hybrid storage system for commercial marketing websites. As a result, data becomes distributed among the hybrid storage system's levels, from top to bottom, based on importance. This enables the application to find important data in the uppermost levels and to reduce its I/O stalls and execution time.
1.3 Research Objective
In this study, we focus on data migration in a parallel hybrid storage system using a data mining approach for commercial marketing websites. We take into consideration the points we mentioned in the previous section. Our study objectives are summarized in the following key points:
➢ Modeling a data mining algorithm that performs the most accurate possible data classification for a specific application type in parallel with the application's on-demand requests.

➢ Implementing our approach in a parallel hybrid storage system in order to migrate the data based on their importance, where the uppermost levels accommodate important data.

➢ Building a prototype that evaluates our proposed approach in a parallel hybrid storage system. Accuracy in data classification is our major performance metric.
1.4 Thesis Organization
This thesis is organized in five chapters, as follows: Chapter II gives a background on data mining concepts, the techniques that we use in our experiments, and hybrid storage systems. Chapter III discusses the design of our system model and how we used each of the data mining techniques (ESN, CRJ, and SVR) to obtain the highest accuracy from each of them; it also explains the parallel hybrid storage system that we use to implement our system. Chapter IV shows our experimental results. Chapter V concludes our work.
Chapter Two
Background and Related Work
2.1 Data Mining

Data mining techniques are used to extract hidden information from huge amounts of data. Moreover, they are used to predict future trends and behaviors of users/agents/software in order to facilitate automatic, proactive, and knowledge-driven decisions. Data mining techniques can be implemented easily on existing software and hardware platforms to add value to their data, and can be integrated with newly established systems. The most commonly used techniques in data mining are: artificial neural networks, decision trees, genetic algorithms, the nearest neighbor method, and rule induction.
2.1.1 Data Mining Tasks

❖ Frequent pattern mining: extracts sets of items that frequently occur together in a collection, forming what is called a frequent pattern.

❖ Association rule mining: creates relationships between the occurrence of an item or items and other sets of items in the same dataset.

❖ Classification: a supervised learning task that categorizes unknown data items into their corresponding groups by creating predefined classes or groups of the categorized items based on some features.

❖ Regression: maps a data item to a real prediction value based on supervised learning. Regression differs from classification in that it gives a continuous value as output, while classification gives the name/number of the group with which the input should be associated.

❖ Clustering: an unsupervised learning task that groups similar data items together into clusters using segmentation or partitioning.

❖ Summarization: maps the data into subsets with a specific simple description.
2.1.2 Related Work
To highlight the differences between data mining techniques, Wu, et al., (2007) reviewed the ten most significant data mining algorithms as identified by the IEEE International Conference on Data Mining (ICDM): k-means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, C4.5, and CART. They provided a description of each technique, its impact, and the current and future research related to it (Wu, et al., 2007).
A similar, but field-specific, review was given by Silwattananusarn and Tuamsuk (2012), who reviewed applications that use data mining techniques for knowledge management (KM). The review proceeds as follows: first, data mining and its functionality are briefly described; second, the rationale of knowledge management and the major knowledge management tools integrated in a knowledge management cycle are described; finally, the data mining techniques that are used and integrated with the process of knowledge management are discussed (Silwattananusarn and Tuamsuk, 2012).
Such reviews and comparisons ease the utilization of data mining techniques in several fields, including predictive prefetching. In this context, Nanopoulos, et al., (2003) presented a new algorithm for predictive web prefetching using Markov predictors, called WMo. The proposed algorithm was proven to be a generalization of existing ones that overcomes their limitations. The performance of WMo was compared with Prediction by Partial Match (PPM) and Dependency Graph (DG). The experimental results showed that WMo outperforms the existing algorithms on data with a variety of orders and noise distributions. Mainly, WMo achieved the best prediction with low overhead in network traffic. Overall, WMo was proven to be an effective and efficient predictive web prefetching algorithm (Nanopoulos, et al., 2003).
2.2 Justification
Selecting suitable data mining techniques was based on the form, nature, and properties of the input data and the desired output.
RNN has a hidden internal state, which makes it suitable for many temporal processing tasks, similar to the task implemented in our work. Moreover, RNN can handle tasks that are otherwise hard to learn, such as processing symbolic data, and it can operate in partially observable environments. Thus, choosing ESN and CRJ was essential for the implemented work.
SVM is a very specific class of algorithms, characterized by the usage of kernels, effective in high-dimensional spaces, and memory efficient. SVM makes it possible to specify custom kernel types that are compatible with the data type, and it can be applied to both classification and regression problems, which gives it wider usage.
2.3 Data Mining Techniques

2.3.1 Recurrent Neural Network

Recurrent Neural Network (RNN) is one of the main types of neural network; it operates in sequential time slots. At each time slot, it accepts an input vector and uses it with a non-linear activation function to update its hidden states, which in turn are used to predict the output. RNN constitutes a rich model class because its hidden states can store a high-dimensional distributed representation of information. Also, the non-linear dynamics of RNN can achieve a powerful and rich computation that allows RNN to model and predict data in sequences with a highly complex structure. Figure 2.1 illustrates the structure of RNN.
Figure 2.1: Recurrent Neural Network (RNN)
2.3.1.1 Training Recurrent Neural Network
Over the last decades, several methods for training RNN have been proposed, such as: Backpropagation Through Time (BPTT); extended Kalman filtering based techniques (EKF), which use the famous Kalman filter (KF), presented by Singhal and Wu (1989); and Real-Time Recurrent Learning (RTRL), an online gradient-descent method proposed by Williams and Zipser (1989). In this work, the focus will be on BPTT, because it is the most widely used.
The main building blocks of RNN are the neurons (units), which are connected by weighted links, where the weights refer to the strength of the links. The network is described by a set of inputs {A}, a set of hidden units {M}, and the output layer units {S}. The values of the network units, given by the activation function, are referred to as $a(n)$ for the input, where $n$ is the time slot, $b(n)$ for the internal units, and $c(n)$ for the output layer. The activation operates in discrete time, iterated over discrete time slots $n = \{1, 2, 3, \ldots\}$. The activation of the network is expressed as vectors for the input, hidden, and output layers, as given in Equation 2.1.
$a(n) = (a_1(n), \ldots, a_A(n))^T$, $b(n) = (b_1(n), \ldots, b_M(n))^T$, $c(n) = (c_1(n), \ldots, c_S(n))^T$    (2.1)
where $T$ indicates the matrix transpose. The connection weights between the input units and all units in the internal layer are represented in a matrix of size $M \times A$, $W^{in} = (w_{ij}^{in})$; as mentioned, $M$ is the hidden layer size and $A$ is the input size. The connection weights between internal units are represented in a matrix of size $M \times M$, $W = (w_{ij})$. The connection weights between the internal units and the outputs are represented in a matrix of size $S \times (M + A)$, $W^{out} = (w_{ij}^{out})$. The output units may have optional connections back to the internal units, represented in a matrix of size $M \times S$, $W^{back} = (w_{ij}^{back})$.
The recurrent connections inside the hidden units must be unfolded during the training process. The connections are unfolded by stacking identical copies of the network and updating the connections inside the network through the weight matrices that link the input, internal, and output units. The same process is optionally applied to connect the output units back to the internal units. Copies are made because feedforward backpropagation cannot be applied directly to the recurrent neural network; the error must instead be backpropagated through the unfolded feedforward structure.
Figure 2.2: (A) The original RNN, (B) The feedforward network for BPTT
Then, the activations of the units in each time slot must be updated. The hidden activation is updated as given in Equation 2.2.
$b(n+1) = f(W^{in} a(n+1) + W b(n) + W^{back} c(n))$    (2.2)
where $f$ is the activation function, usually a non-linear sigmoid function. The output activation is updated as given in Equation 2.3.
$c(n+1) = f^{out}(W^{out}(a(n+1), b(n+1), c(n)))$    (2.3)
where $f^{out}$ is a non-linear sigmoid function, usually the hyperbolic tangent (tanh), or the identity if the output units are linear.
Finally, the error between the network output and the desired output is computed as given in Equation 2.4.
$E = \sum_{n=1}^{T} E(n), \quad E(n) = \|d(n) - c(n)\|^2$    (2.4)

where $d(n)$ is the desired output.
Bear in mind that the objective of training is to minimize the squared error, which is ensured by incrementally modifying the weights of the network along the error gradient with respect to the weights, using a small learning rate $\gamma$, as given in Equation 2.5.

$w_{ij}^{new} = w_{ij} - \gamma \dfrac{\partial E}{\partial w_{ij}}$    (2.5)
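To make these equations concrete, the following minimal sketch (in Python with NumPy; the dimensions and random weights are illustrative assumptions, not values from this thesis) runs the forward pass of Equations 2.2 and 2.3 over a toy sequence and accumulates the squared error of Equation 2.4. BPTT training would then propagate this error backwards through the unfolded copies described above.

import numpy as np

rng = np.random.default_rng(0)

A, M, S = 3, 10, 2           # input, hidden, and output sizes (illustrative)
T = 20                       # number of time slots

# Weight matrices as defined in the text: W_in (MxA), W (MxM),
# W_out (Sx(M+A)), and the optional feedback W_back (MxS).
W_in   = rng.uniform(-0.5, 0.5, (M, A))
W      = rng.uniform(-0.5, 0.5, (M, M))
W_out  = rng.uniform(-0.5, 0.5, (S, M + A))
W_back = rng.uniform(-0.5, 0.5, (M, S))

a = rng.uniform(-1, 1, (T + 1, A))   # input sequence a(n)
d = rng.uniform(-1, 1, (T + 1, S))   # desired output d(n)

b = np.zeros(M)   # hidden state b(n)
c = np.zeros(S)   # output c(n)
E = 0.0           # accumulated squared error, Equation 2.4

for n in range(T):
    # Equation 2.2: b(n+1) = f(W_in a(n+1) + W b(n) + W_back c(n))
    b = np.tanh(W_in @ a[n + 1] + W @ b + W_back @ c)
    # Equation 2.3: c(n+1) = f_out(W_out [a(n+1); b(n+1)])
    c = np.tanh(W_out @ np.concatenate([a[n + 1], b]))
    # Equation 2.4: accumulate E(n) = ||d(n) - c(n)||^2
    E += np.sum((d[n + 1] - c) ** 2)

print(f"total squared error E = {E:.4f}")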
2.3.2 Echo State Network (ESN)
Echo State Network (ESN) provides a supervised learning principle and architecture for the recurrent neural network (RNN). Like the backpropagation-decorrelation learning rule for RNN, ESN is subsumed under the name reservoir computing. The aim of ESN is to drive a large, random, fixed RNN with the input signal, thereby inducing in each neuron (unit) of the reservoir network a non-linear response signal; the desired output signal is then obtained as a trainable linear combination of all these response signals. The basic idea of ESN is shared with the liquid state machine (LSM), which was developed independently by Wolfgang Maass (Maass, et al., 2002).
2.3.2.1 ESN Model
ESN deals with a discrete-time neural network that has input units {A}, internal units {M}, and output units {S} over discrete time slots $n = \{1, 2, 3, \ldots\}$. The activation of the ESN is expressed using a vector for each layer, as given in Equation 2.6.
$a(n) = (a_1(n), \ldots, a_A(n))^T$, $b(n) = (b_1(n), \ldots, b_M(n))^T$, $c(n) = (c_1(n), \ldots, c_S(n))^T$    (2.6)
The linking weights between the neurons are gathered in a matrix of size $M \times A$ for the input, referred to as $W^{in} = (w_{ij}^{in})$; a matrix of size $M \times M$ for the internal weights, referred to as $W = (w_{ij})$; a matrix of size $S \times (A + M + S)$ for the output, referred to as $W^{out} = (w_{ij}^{out})$; and a matrix of size $M \times S$ for the connections that project back from the output to the internal units, referred to as $W^{back} = (w_{ij}^{back})$. Unlike in RNN, where all the weights of the inputs, internals, and output are adaptable, in ESN only the output weights are trained. Typically, the reservoir connection weights as well as the input weights are randomly generated, and the reservoir weights are then scaled so that the spectral radius of the internal weight matrix is less than 1, which ensures a sufficient condition for the echo state property (ESP). By doing so, ESN ensures that the reservoir state is an "echo" of the entire input history. The ESN readout is memoryless, so it can be trained either offline (batch) or online by minimizing a loss function. Offline (batch) training keeps the system weights constant while computing the error and building the model; online learning (RLS) updates the weights and error with each new input data item.
The internal units are updated, when moving from time slot $n$ to time slot $n+1$, according to Equation 2.7.
$b(n+1) = f(W^{in} a(n+1) + W b(n) + W^{back} c(n))$    (2.7)
where $f$ is the reservoir activation function, usually the hyperbolic tangent (tanh). A small white-noise term may optionally be added to the state update, which is needed in some cases to alleviate overfitting. The linear output is computed using Equation 2.8.
$c(n+1) = f^{out}(W^{out}(a(n+1), b(n+1), c(n)))$    (2.8)

where $f^{out} = (f^{out}_1, \ldots, f^{out}_S)$ are the output unit functions and $(a(n+1), b(n+1), c(n))$ is a concatenation of the input, internal, and previous output activation vectors.
Evaluation of the model performance is in most cases done via the Normalized Mean Square Error (NMSE), as given in Equation 2.9.

$NMSE = \dfrac{\langle \|\hat{c}(t) - c(t)\|^2 \rangle}{\langle \|c(t) - \langle c(t) \rangle\|^2 \rangle}$    (2.9)

where $\hat{c}(t)$ is the predicted output, $c(t)$ is the desired output, $\|\cdot\|$ is the Euclidean norm, and $\langle \cdot \rangle$ is the empirical mean.
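As a rough illustration of the offline (batch) training described above, the following sketch (Python with NumPy; the sizes, the toy sine task, and the washout length are illustrative assumptions, not the thesis settings) generates a fixed random reservoir scaled to a spectral radius below 1, collects the states of Equation 2.7 (without feedback or noise), fits the linear readout by least squares, and reports the NMSE of Equation 2.9.

import numpy as np

rng = np.random.default_rng(1)

A, M, S = 1, 100, 1       # illustrative sizes
T, washout = 500, 100     # sequence length and initial washout

# Fixed random reservoir, scaled so its spectral radius is below 1 (ESP).
W = rng.uniform(-0.5, 0.5, (M, M))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, (M, A))

# Toy task: predict the next value of a sine wave.
u = np.sin(0.2 * np.arange(T + 1)).reshape(-1, A)
a, d = u[:-1], u[1:]                     # inputs and desired outputs

# Collect reservoir states (Equation 2.7 without feedback or noise).
X = np.zeros((T, M))
b = np.zeros(M)
for n in range(T):
    b = np.tanh(W_in @ a[n] + W @ b)
    X[n] = b

# Batch readout training: least-squares fit of W_out on post-washout states.
W_out, *_ = np.linalg.lstsq(X[washout:], d[washout:], rcond=None)
c_hat = X[washout:] @ W_out              # linear readout (Equation 2.8)

# Normalized Mean Square Error, Equation 2.9.
c = d[washout:]
nmse = np.mean((c_hat - c) ** 2) / np.var(c)
print(f"NMSE = {nmse:.6f}")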
2.3.2.2 Related Work
Xue, et al., (2007) proposed a decoupled echo state network (DESN) with lateral inhibition for solving multiple superimposed oscillator (MSO) problems. The purpose was to reduce the computational complexity of the traditional ESN and to improve the probability of obtaining a satisfactory network under 'random' parameter selections. Two low-complexity schemes were developed: DESN+RP, which is DESN with reservoir prediction, and DESN+MaxInfo, which is DESN with maximum available information. For computer-generated and real-life noisy data, these new schemes provided low complexity with an efficient solution for the MSO problem and better prediction modeling compared with the traditional ESN. Moreover, the performance of the DESN with lateral inhibition proved to be highly robust across various reservoir configurations and randomly generated connection weights (Xue, et al., 2007).
Deng and Zhang (2007) presented a scale-free highly clustered echo state network (SHESN) as an extension of the ESN model. SHESN has a small-world, scale-free state reservoir, grown using incremental rules that yield several natural characteristics: a high clustering coefficient, a scale-free degree distribution, short path lengths, and a hierarchical and distributed architecture. The experimental results showed that SHESN significantly improves on the properties of ESN. SHESN provided an accurate approximation of a highly complex non-linear dynamic system, such as an RNN system with thousands of neurons exhibiting neurobiological features like a scale-free distribution and the small-world property (Deng and Zhang, 2007).
Rodan and Tino (2011) presented Cycle Reservoir with regular Jumps (CRJ), a reservoir topology for ESN that extends a previously proposed simple deterministic reservoir called Simple Cycle Reservoir (SCR). SCR uses the same weight value for all cyclic reservoir connections and the same value for all input connections. CRJ has highly constrained weight values yet achieves excellent performance compared to the standard ESN. The nodes in the CRJ reservoir are connected in a uni-directional cycle, as in SCR, and the bi-directional shortcuts (jumps) are all allocated the same weight. The experimental results showed that adding regular jumps to the reservoir topology gave better performance than both SCR and the traditional ESN (Rodan and Tino, 2011).
Cernansky and Makula (2005) proposed a simple architecture called feedforward ESN (FF-ESN), which is a simple modification of ESN. In FF-ESN there are no cycles, so all units are fed from previous units only; ESN units are connected to previous units to keep a longer history of activities. The weights of the connections in the FF-ESN reservoir were scaled using the spectral radius. The experimental results showed that the prediction output of FF-ESN, using this non-linear combination, achieves very high accuracy compared to the original ESN (Cernansky and Makula, 2005).
Fette and Eggert (2005) presented a new recurrent model, called Simple Echo State Network (SESN), as an alternative to ESN and the liquid state machine (LSM). SESN was designed to provide memorization capacity in the linear operation mode and pattern matching in the non-linear operation mode. The SESN model has only a diagonal weight matrix for the hidden layer, where each node is connected recurrently only to itself. The experimental results showed that SESN reduced complexity and learning cost compared with ESN and LSM (Fette and Eggert, 2005).
2.3.3 Support Vector Machine (SVM)
The concept of Support Vector Machine (SVM) is based on decision planes that define the decision boundaries between different data categories. A decision plane separates groups of objects with different class memberships. A classic example of a linear classifier is illustrated in Figure 2.3, where green and red circles represent objects of two classes. The line between these objects is called the separating line: all objects on its right side belong to the green class, and all objects on its left side belong to the red class. Any new object falling on the right side is classified as green, and any new object falling on the left side is classified as red.
Figure 2.3: Classic example of a linear classifier
Most classification tasks tackled with SVM are not as simple as the one given in the previous example, and a more complex structure is needed to make an optimal separation. Such a case of SVM-based classification is illustrated in Figure 2.4; it requires a curved separation, which is more complex than a straight line.
Figure 2.4: SVM complex structure
The classification of complex cases using SVM is illustrated in Figure 2.5. The original objects on the left side are mapped and rearranged using mathematical functions known as kernels; the process of rearranging the objects is known as mapping or transformation. The mapped objects in this new setting, as shown on the right side, are linearly separable. Subsequently, mapping the objects is used instead of constructing a complex curve.
Figure 2.5: Basic idea behind the SVM (left: input space; right: feature space)
Support vector machine (SVM) is primarily a classifier method that constructs hyperplanes in a multidimensional space to separate cases with different class labels.
2.3.3.1 SVM Model
To build the optimal hyperplane, SVM uses an iterative training algorithm that minimizes an error function. Based on the form of the error function, SVM models are classified into four groups: two for classification and two for regression.
• Classification Models

❖ C-SVM Classification:

In this type, the classifier is trained by minimizing the error function given in Equation 2.10, subject to the constraints given in Equation 2.11.

$\dfrac{1}{2} d^T d + A \sum_{i=1}^{N} \xi_i$    (2.10)

$y_i (d^T \phi(s_i) + c) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N$    (2.11)

where $A$ denotes the capacity constant, $d$ is the coefficient vector, $c$ is a small constant, and $\xi_i$ is a parameter for handling non-separable input data. The index $i$ ranges over the $N$ training cases, $y_i \in \pm 1$ stands for the class label, and $s_i$ stands for the independent variables. The kernel $\phi$ transforms the input data from the independent input space into the feature space. Note that as $A$ gets larger, errors are penalized more heavily, so $A$ should be chosen carefully to avoid overfitting.
❖ nu-SVM Classification:

In this type, the classifier is trained by minimizing the error function given in Equation 2.12, subject to the constraints given in Equation 2.13.

$\dfrac{1}{2} w^T w - \nu \rho + \dfrac{1}{N} \sum_{i=1}^{N} \xi_i$    (2.12)

$y_i (w^T \phi(x_i) + c) \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N, \quad \rho \geq 0$    (2.13)
• Regression
Support Vector Regression (SVR) uses the same principles as SVM, with a few differences because the output is a real number. This makes prediction harder, as the output has infinitely many possibilities. SVR has to estimate the functional dependence between the dependent variable $y$ (output) and the independent variables $x$ (input). In most regression problems, the relationship between the independent and dependent variables is assumed to be given by a deterministic function $f$ plus some additive noise. SVR has two model types: epsilon and nu.
❖ Epsilon-SVM Regression:
In this type, the function is trained by minimizing the error function given in Equation 2.14, subject to the constraints given in Equation 2.15.

$\dfrac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i + C \sum_{i=1}^{N} \xi_i^{*}$    (2.14)

$w^T \phi(x_i) + b - y_i \leq \varepsilon + \xi_i^{*}, \quad y_i - w^T \phi(x_i) - b \leq \varepsilon + \xi_i, \quad \xi_i, \xi_i^{*} \geq 0, \quad i = 1, \ldots, N$    (2.15)
❖ nu-SVM Regression:
In this type, the function is trained by minimizing the error function given in Equation 2.16, subject to the constraints given in Equation 2.17. Here the tube width $\varepsilon$ becomes a variable of the optimization, controlled by the parameter $\nu$.

$\dfrac{1}{2} w^T w + C \left( \nu \varepsilon + \dfrac{1}{N} \sum_{i=1}^{N} (\xi_i + \xi_i^{*}) \right)$    (2.16)

$w^T \phi(x_i) + b - y_i \leq \varepsilon + \xi_i^{*}, \quad y_i - w^T \phi(x_i) - b \leq \varepsilon + \xi_i, \quad \xi_i, \xi_i^{*} \geq 0, \quad i = 1, \ldots, N, \quad \varepsilon \geq 0$    (2.17)
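For reference, both regression formulations are available in scikit-learn as SVR (epsilon-SVR) and NuSVR (nu-SVR). The following minimal usage sketch on toy data assumes scikit-learn is installed; the parameter values are illustrative, not the settings used in this thesis.

import numpy as np
from sklearn.svm import SVR, NuSVR

rng = np.random.default_rng(2)

# Toy regression data: y = sin(x) plus additive noise.
X = rng.uniform(0, 6, (200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

# Epsilon-SVR: the epsilon-tube width is fixed in advance.
eps_svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)

# nu-SVR: nu bounds the fraction of support vectors; epsilon is found automatically.
nu_svr = NuSVR(kernel="rbf", C=1.0, nu=0.5).fit(X, y)

X_test = np.linspace(0, 6, 5).reshape(-1, 1)
print("epsilon-SVR:", eps_svr.predict(X_test))
print("nu-SVR:     ", nu_svr.predict(X_test))

The practical difference is visible in the parameters: epsilon-SVR fixes the width of the insensitive tube in advance, while nu-SVR lets the optimizer determine it, with nu bounding the fraction of support vectors.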
2.3.3.2 SVM Kernel Types
There are different kernel types that can be used in SVM models, including linear, polynomial, radial basis function (RBF), and sigmoid. RBF is the most popular because of its localized and finite response across the entire range of the real x-axis. The kernel functions are given in Equation 2.18.

$K(X_i, X_j) = \begin{cases} X_i \cdot X_j & \text{Linear} \\ (\gamma X_i \cdot X_j + C)^{d} & \text{Polynomial} \\ \exp(-\gamma \|X_i - X_j\|^2) & \text{RBF} \\ \tanh(\gamma X_i \cdot X_j + C) & \text{Sigmoid} \end{cases}$    (2.18)

where $K(X_i, X_j) = \phi(X_i) \cdot \phi(X_j)$ is a kernel function that represents the dot product of the input data points mapped into a higher-dimensional feature space by the transformation $\phi$, and $d$ is the polynomial degree.
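The four kernels of Equation 2.18 are straightforward to compute directly. The following sketch (Python with NumPy) evaluates each of them for a pair of input vectors; the values of gamma, C, and the polynomial degree d are illustrative defaults, not thesis settings.

import numpy as np

def kernel(x_i, x_j, kind="rbf", gamma=0.5, C=1.0, degree=2):
    # Kernel functions of Equation 2.18 for two input vectors.
    if kind == "linear":
        return x_i @ x_j
    if kind == "poly":
        return (gamma * (x_i @ x_j) + C) ** degree
    if kind == "rbf":
        return np.exp(-gamma * np.sum((x_i - x_j) ** 2))
    if kind == "sigmoid":
        return np.tanh(gamma * (x_i @ x_j) + C)
    raise ValueError(f"unknown kernel: {kind}")

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for k in ("linear", "poly", "rbf", "sigmoid"):
    print(k, kernel(x, z, kind=k))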
2.3.3.3 Related Work
Balabin and Lomakina (2011) presented a general comparison, in terms of accuracy and robustness, between Partial Least Squares (PLS) regression (projection to latent structures), polynomial Partial Least Squares (Poly-PLS) regression, Artificial Neural Networks (ANNs), Support Vector Regression (SVR), and Least-Squares Support Vector Machines (LS-SVMs). The comparison was supported by empirical results on fourteen different datasets. The results showed that the accuracy of the SVM-based models (SVR and LS-SVM) is comparable to that of the ANN-based approach, and that the relative accuracies of the ANN and SVM approaches are related. However, for highly non-linear objects, the SVM-based regression models are better than neural networks, and regression based on the SVM ideology is recommended for practical implementation (Balabin and Lomakina, 2011).
Amari and Wu (1999) presented a method for modifying the kernel function of the SVM classifier in order to improve its performance. The kernel function was built based on the structure of Riemannian geometry. The idea of the new kernel is to increase the spatial resolution around the decision boundary surface by conformal mapping, so as to increase the separability between the classes. Simulation results on artificial and real data prove the efficiency of the proposed kernel (Amari and Wu, 1999).
Tong and Koller (2002) presented a new algorithm for active learning with SVM, applied to choosing which instance to request next. The experimental results showed that the proposed algorithm significantly reduces the need for labeled training instances in both the standard inductive and transductive settings (Tong and Koller, 2002).
Collobert and Bengio (2001) presented a new decomposition algorithm called SVMTorch, analogous to SVM-Light proposed by Joachims (1999), for classification and regression problems. The proposed algorithm solves large-scale regression problems with more than 20,000 examples. The experimental comparison between the proposed algorithm and other algorithms showed that it provides a substantial time improvement on large-scale problems (Collobert and Bengio, 2001).
Trafalis and Ince (2000) compared SVM with other techniques, such as backpropagation and Radial Basis Function (RBF) networks, for financial forecasting applications. The results showed that SVM regression is the most powerful technique for function approximation (Trafalis and Ince, 2000).
2.4 Hybrid Storage System (HSS)
A hybrid storage system consists of a hierarchy of different storage devices that vary in their speed performance, energy consumption, and size capacity. A hybrid storage system provides an efficient, low-cost solution for accommodating large amounts of data without hurting the I/O response time. In particular, speed performance increases as we go up in the hierarchy, while size capacity increases as we go down.
A modern hybrid storage system typically includes main memory, solid-state disk (SSD), hard disk drive (HDD), and magnetic tape. The SSD resides at the top of the storage hierarchy, followed by the HDD below it. Thus, the SSD is the better data reader in terms of performance, with lower energy consumption, whereas the HDD is better in capacity but worse in reading performance compared to the SSD (Nijim, et al., 2011).
In this multi-level storage system, when the application requests data, the system first checks the upper-level storage device (SSD). If the data is not found in the SSD, the system checks the next storage level (HDD); if the data is not found at this level either, the system checks the lower level. This process is repeated until the data is located.
Figure 2.6: Hybrid Storage System
2.4.1 Parallel Hybrid Storage System (PHSS)
The Parallel Hybrid Storage System approach contains more than one storage device (SSD, HDD, tape) at each level to store the data, which offers high I/O bandwidth and high performance. Parallel storage systems are usually deployed in supercomputers, and the storage devices (SSD, HDD), each of which is called a disk array, are their most important components. This approach is reliable because it has fault-tolerance features offered by data redundancy.

Parallel storage devices are able to provide I/O parallelism through striping techniques: each data block is split into smaller data blocks that are striped across different disk arrays. The disk arrays are controlled by a disk controller that can respond to and coordinate requests in parallel. Thus, when a data block is requested, the request is directed by the controller to multiple disk arrays.
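The following toy sketch illustrates the striping idea (the stripe-unit size and the number of arrays are illustrative choices, not values from this thesis): a block is split into units that are distributed round-robin across the arrays, so a read of the whole block can be served by all arrays in parallel.

def stripe(block: bytes, n_arrays: int, stripe_unit: int = 4):
    # Split a data block into stripe units and distribute them
    # round-robin across n_arrays disk arrays, as a disk controller would.
    arrays = [[] for _ in range(n_arrays)]
    units = [block[i:i + stripe_unit] for i in range(0, len(block), stripe_unit)]
    for k, unit in enumerate(units):
        arrays[k % n_arrays].append(unit)   # unit k goes to array k mod n
    return arrays

# A 16-byte block striped across 4 arrays in 4-byte units:
for idx, parts in enumerate(stripe(b"ABCDEFGHIJKLMNOP", 4)):
    print(f"array {idx}: {parts}")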
Figure 2.7: Parallel Hybrid Storage System
2.4.2 Prefetching in Hybrid Storage System
Prefetching is a well-known technique for solving the I/O bottleneck problem in data-intensive computing. Prefetching techniques have two main types: the first is predictive prefetching, which predicts the I/O accesses based on the application's access history; the second is informed prefetching, which uses the application's ability to disclose hints about its future I/O accesses so that the data is prefetched before the application requests it.

Prefetching in a hybrid storage system is a promising solution to reduce the latency of data transfer among SSD, HDD, and tape by caching the most accessed and popular data in the SSD. Prefetching techniques in a hybrid storage system place the medium-priority requested data in the HDD and the most-requested, high-priority data in the SSD, while leaving the rarely requested data on tape. This approach reduces the application's elapsed time.
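As a minimal illustration of the predictive type, the following sketch (a toy model, not the thesis's algorithm) keeps a history of which block follows which and, on every request, suggests the most frequent successor as a candidate to prefetch into the SSD level.

from collections import Counter, defaultdict

class PredictivePrefetcher:
    # Toy history-based predictive prefetcher: it counts which block
    # tends to follow each block and prefetches the most frequent successor.

    def __init__(self):
        self.successors = defaultdict(Counter)
        self.last = None

    def on_request(self, block_id):
        if self.last is not None:
            self.successors[self.last][block_id] += 1   # update access history
        self.last = block_id
        counts = self.successors[block_id]
        # Predicted next block to prefetch into the SSD level, if any.
        return counts.most_common(1)[0][0] if counts else None

p = PredictivePrefetcher()
for blk in [1, 2, 3, 1, 2, 3, 1, 2]:
    print(f"after request {blk}: prefetch {p.on_request(blk)}")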
2.4.3 Related Work
Al Assaf, et al., (2013) proposed a parallel energy-aware informed prefetching technique called ECO-Storage, which uses the access patterns that the application discloses as hints to prefetch data blocks. Since the SSD layer is more energy-efficient than the HDD layer, the HDD can be kept in standby to save power, because the most-requested data has been prefetched into the SSD layer (Al Assaf, et al., 2013).
Nijim, et al., (2010) proposed, for the first time, a multi-layer prefetching algorithm called PreHySys that prefetches data from tape to HDD and from HDD to SSD, respectively. The PreHySys algorithm reduces the miss rate of the high-end storage components, which reduces the response time for requested data in a hybrid storage system (Nijim, et al., 2010).
Al Assaf, et al., (2012) proposed an informed prefetching technique, called IPODS, that prefetches hinted blocks, disclosed by the application as its expected access patterns, in a distributed multi-level storage system. They developed a prefetching pipeline in IPODS in which the informed prefetching process is divided into separate prefetching steps among the multiple storage levels of the distributed system. I/O performance improves as data blocks are prefetched from the hard disk to the memory buffer in the remote storage server, and the data blocks buffered in the server are then prefetched through the network to the client's local cache. This pipelined manner improves the I/O performance (Al Assaf, et al., 2012).
Jiang, et al., (2013) proposed a thermal model for hybrid storage systems that include HDDs and SSDs. The thermal profiles generated for the HDD and SSD show that both are deeply affected by the temperature of the storage node. Two types of hybrid storage clusters, called inter-node and intra-node, were built to estimate the cooling cost of a storage cluster equipped with hybrid storage nodes. Their thermal model offers two benefits: first, it makes it possible to reduce the cost of thermal monitoring; second, it enables data center designers to make intelligent thermal-management decisions during the design step (Jiang, et al., 2013).
Saha, et al., (2014) proposed a new data prefetching algorithm called DM-PAS to meet the growing demand for high-speed data fetching in large cloud systems. The purpose of the proposed algorithm is to improve the energy efficiency of data prefetching in a hybrid storage system. The simulation shows that the new algorithm offers better performance, reliability, and energy efficiency in a multi-layer storage system (Saha, et al., 2014).
Chapter Three
System Model

3.1 Introduction

The aim of this chapter is to propose an approach for a fast-response single- or multi-user application that runs on top of a two-level hybrid storage system (SSD and HDD). To facilitate a fast response, a data mining module is utilized to continuously build a history and issue data migration decisions based on data importance criteria.
Data mining techniques are able to classify the data based on their importance. In this study, a data mining based approach is proposed that performs the most accurate possible data classification for a commercial marketing website's applications in parallel with its on-demand requests. The proposed approach can also be applied to other applications, including, but not limited to, database applications, image processing applications, marketing, finance, and voice and text recognition.
3.2 Application
The proposed hybrid storage system is devoted to a commercial marketing website. The focus is on behavioral targeting of customers who access a particular firm's or vendor's website to browse and purchase products.
The proposed approach analyzes and processes the server log of a commercial marketing website belonging to a marketing company. A server log is a log file (or several files) that is automatically created and maintained by the company server that hosts the website, and that contains a list of the activities performed. The main property of the log file is that it contains repeated patterns of requested pages reflecting the customers' behavior, which helps to predict the most-requested data in the storage.
3.3 Model Design
The proposed approach consists of an application that accesses a multi-level hybrid storage system maintained by data mining techniques. The application issues and responds to data requests and runs on top of a two-level hybrid storage system (SSD and HDD). The uppermost levels are used to accommodate important data in order to decrease data access time by migrating important data to the upper levels of the storage system. Figure 3.1 illustrates the architecture of the proposed approach.
Figure 3.1: The architecture of the proposed approach
The proposed approach is executed as follows: when the application begins execution, it continuously issues on-demand data reading requests. Each request is directed to the hybrid storage system in order to be fulfilled from the particular level where the requested data block is currently allocated. Each request is also tracked by our solution module, which invokes the appropriate data mining and classification algorithm. The classification algorithm issues a data migration decision that migrates a particular data block, or a set of data blocks, from one level to another, based on the tracked history.
The data access and reading speed of the storage devices in the hybrid system increases as we go up in the hierarchy. Migrating important data blocks to the uppermost levels decreases the application's on-demand data reading time and hence decreases its execution elapsed time. The accuracy of data migration decisions is the most important metric by which our system is evaluated.
The proposed approach involves classifying data based on the information systems, the applications, the data related to each type, and the data importance measurements. For this purpose, several classification algorithms are tested and evaluated.
Building the proposed system involves three main steps:

• Data preprocessing
• Data mining
• Data prefetching
3.4 Preprocessing Step
The machine log file used in the proposed approach contains multiple system calls made by users. Part of the log file is illustrated in Figure 3.2. Besides many other data, the log file contains open system calls, which are equivalent to requested URLs. These calls occur in repeated patterns that are useful for the implemented approach. However, before such data is used, it should be normalized and pre-processed. Thus, a two-phase preprocessing is proposed.
Figure 3.2: Data log file
❖ Pre-processing Phase 1:
The purpose of this phase is to filter the information in the log file. Algorithm 1 describes the processes in this phase, which is implemented in two steps:

• Filter the open calls by searching the file for the keyword "open".
• Extract the URLs from the found open calls.

Algorithm 1: Preprocessing Phase 1
Input: File file
Output: URL[1..n]
i = 1                              // counter
line = readNextLine(file)          // read file line by line
WHILE (line != null)
    IF (line.contains("open"))
        URL[i] = ExtractURL(line)
        i++
    EndIF
    line = readNextLine(file)      // advance to the next line
EndWHILE
❖ Pre-processing Phase 2:

The purpose of this phase is to convert the URLs extracted in Phase 1 into identification numbers. Algorithm 2 describes the processes in this phase, which is implemented in two steps:

• Loop through the filtered calls and give a sequential number to each URL.
• Assign the same number to identical calls.

Algorithm 2: Preprocessing Phase 2
Input: URL[1..n]
Output: URLNumbers[1..n]
Number = 1
UniqueURL = []                          // distinct URLs seen so far
FOR (i = 1 to n)
    IF (UniqueURL.contains(URL[i]))
        URLNumbers[i] = UniqueURL.indexOf(URL[i]) + 1   // reuse the earlier ID
    ELSE
        URLNumbers[i] = Number
        Number++
        UniqueURL.add(URL[i])
    EndIF
EndFOR
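A runnable Python version of both phases might look as follows; extract_url is a hypothetical stand-in for the ExtractURL routine above, assuming the URL is the second whitespace-separated field of an open-call line, which may differ from the actual log format.

def preprocess(log_path):
    # Both preprocessing phases in one pass: extract the URL of each
    # "open" system call, then map identical URLs to the same ID number.

    def extract_url(line):
        # Assumption: the URL is the second field of the line.
        return line.split()[1]

    ids, url_numbers = {}, []
    with open(log_path) as log:
        for line in log:
            if "open" in line:                 # Phase 1: filter open calls
                url = extract_url(line)
                if url not in ids:             # Phase 2: assign sequential IDs,
                    ids[url] = len(ids) + 1    # reusing the ID of repeated URLs
                url_numbers.append(ids[url])
    return url_numbers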
The output of the preprocessing step is a file of numbers that represent the identities of the system calls, as shown in Figure 3.3.
Figure 3.3: Output data after the pre-processing step
3.4.1 Statistics about the Data

Here we examine the percentage of repeated patterns in the testing datasets of Machine01 and Machine06 that already exist in the training set on which the model was based, as shown in Table 3.1.

In Machine01, we found that the percentage of repeated patterns in the 60% model is higher than in the 70% and 80% models; the 60% model gave the biggest percentage.

In Machine06, the percentage of repeated patterns was likewise highest in the 60% model, followed by the 70% and then the 80% model. This occurs because new repeated patterns appear at the end of the dataset, and the model still has to learn them.
Table 3.1: Statistics about the data

Model                                        60%       70%       80%
Machine01 Testing Average Repeated Pattern   77.99%    76.84%    72.16%
Machine06 Testing Average Repeated Pattern   78.77%    64.61%    47.07%
3.5 Data Mining Step
The purpose of the data mining step is to classify the data, which was normalized in the previous step, based on its importance. In general, the classifier builds a model based on the training data and uses it to predict the classes of the unknown-class data in the testing set. This process is illustrated in Figure 3.4. Several methods for data classification are used, as discussed in the following subsections.
Figure 3.4: Data classification process
The processed data fed into the data mining techniques is related in such a way that one request is followed by another, and so on. Thus, the purpose of the utilized data mining techniques is to train a model that is able to figure out these relations. Subsequently, for a new request, the model will be able to predict the most probable next request(s). In this thesis, the following data mining techniques are used to build a model for this purpose. We call such predicted data the significant data of that request.
3.5.1 ESN
ESN is used to classify the preprocessed data in the proposed approach. ESN is a simple, easy, and powerful method that is used to train an RNN to predict future accesses. However, for the ESN to give satisfactory results in the proposed approach, some parameters need to be initialized. Once the parameters are initialized, the classification proceeds in the same way as any classification process. The ESN-based classification process is illustrated in Figure 3.5.
Figure 3.5: Data classification using ESN
The parameters required by the ESN in the proposed approach are discussed in the following. First, the size of the predicted set is initialized; it refers to the size of the data block that will be predicted after the actual request is made. Second, the size of the training dataset is determined; ESN is a supervised learning method that requires a training dataset. Third, a bias value for the input data is determined; the bias value allows for some mismatch between the actual class and the formed class in the training step, and tolerating some error in the training phase decreases the prediction error later on. Fourth, before generating the ESN model, the reservoir size N is determined; N refers to the number of internal units residing in the ESN model.
In order for the ESN principle to work, the reservoir must have the Echo State Property (ESP), which is related to the algebraic properties of the weight matrix of the internal units. The ESP is violated for zero inputs if the spectral radius of the reservoir weight matrix is larger than unity; therefore, the spectral radius must be less than 1. ESN sparsely connects the internal units; for the ESN to work correctly, the connectivity between the units should not exceed 50%, i.e., the connectivity must be less than or equal to 0.5.
In the proposed approach, the ESN input weights are randomly generated. These weights are scaled by multiplying them by an input scaling factor less than 1, in order to obtain the best weights for the input. The generated ESN model is illustrated in Figure 3.6.

The ESN stops updating the weights and terminates when the weight values go out of the previously determined range of the internal units, or when the number of iterations exceeds 1000.
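The following sketch (Python with NumPy) shows one way to generate reservoir and input weights under the constraints just described: connectivity at most 0.5, spectral radius below 1 for the ESP, and input weights scaled by a factor below 1. The default parameter values are illustrative, not the thesis settings.

import numpy as np

def init_esn_weights(M, A, spectral_radius=0.9, connectivity=0.5,
                     input_scaling=0.5, seed=0):
    # Generate ESN weights under the constraints described above.
    rng = np.random.default_rng(seed)

    W = rng.uniform(-0.5, 0.5, (M, M))
    W[rng.random((M, M)) > connectivity] = 0.0               # sparse connections
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))    # rescale to radius < 1

    W_in = input_scaling * rng.uniform(-1, 1, (M, A))        # scaled input weights
    return W, W_in

W, W_in = init_esn_weights(M=100, A=1)
print("spectral radius:", max(abs(np.linalg.eigvals(W))))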
Figure 3.6: ESN model design
3.5.2 CRJ
CRJ is also used to classify the preprocessed data in the proposed approach. The standard ESN has a series of problems that affect its acceptability: the reservoir is not easy to understand; specifying the reservoir and input connections requires many trials and luck; forcing a constraint on the spectral radius of the reservoir matrix is a weak tool for setting the reservoir parameters; and the reservoir connectivity and weight structure are far from optimal, with no clear view of the reservoir's dynamical organization. Thus, CRJ was proposed by Rodan and Tino (2011) to solve these problems.
CRJ is a simple deterministic reservoir model with highly constrained weight values (Rodan and Tino, 2011). CRJ deterministically generates a reservoir that can produce better performance than the standard ESN, extending the earlier model called Simple Cycle Reservoir (SCR) (Rodan and Tino, 2011). The CRJ-based classification process is illustrated in Figure 3.7.
Figure 3.7: Data classification using CRJ
To implement CRJ in the proposed approach, first the size of the predicted set is determined, and the training set is also determined. Then the reservoir size N is determined, as with the ESN. Finally, the numbers of input and output units, along with the bias value added to the input units, are also determined.
Unlike ESN, CRJ has a simple regular topology with full connectivity between the input and the reservoir, so there is no need to specify a connectivity value. The reservoir nodes are connected via a uni-directional cycle with the same weight value, called rc, for all connections; the value of rc should be greater than 0.
CRJ also has bi-directional shortcuts (jumps) between the reservoir units. These jumps increase the density of the connections between the internal units, which in turn facilitates good training. The jumps all share one weight value, called rj, and the jump step from one internal unit to another is called step1. Unlike ESN, which generates the input weights randomly, CRJ requires setting its input weight value, called v. These parameters have to be determined before beginning the training process.
The only random element in CRJ is the distribution of the input weight signs. Later on, in the experiments, we found that trying to force a regular pattern on the input weight signs, such as a periodic structure of the form "+ - + -" or "+ - - +", degrades the performance.
Consequently, a sufficient way to set the sign pattern is a single deterministically generated sequence. For this purpose, we generated a universal input sign pattern from the decimal expansion of an irrational number, and we decided that this number would be π. The N digits d1, d2, d3, …, dN were taken from the decimal part of π. A threshold value of 4.5 was chosen: if the value of dn is between 0 and 4 (below the threshold), the corresponding input weight sign is negative; otherwise, it is positive.
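Putting these pieces together, the following sketch (Python with NumPy; the values of rc, rj, the jump step, and v are illustrative, not the thesis settings) builds a CRJ weight matrix with a uni-directional cycle, regular bi-directional jumps, and input weight signs taken deterministically from the decimal digits of π against the 4.5 threshold.

import numpy as np

# First decimal digits of pi, used as the deterministic sign-generating sequence.
PI_DIGITS = "14159265358979323846264338327950288419716939937510"

def init_crj_weights(N, A, rc=0.7, rj=0.4, jump_step=5, v=0.5):
    # Build a CRJ reservoir: a uni-directional cycle with weight rc,
    # bi-directional jumps of weight rj every jump_step units, and input
    # weights of magnitude v whose signs come from the digits of pi
    # (digit below the 4.5 threshold -> negative, otherwise positive).
    assert N <= len(PI_DIGITS), "extend PI_DIGITS for larger reservoirs"

    W = np.zeros((N, N))
    for i in range(N):
        W[(i + 1) % N, i] = rc                       # uni-directional cycle
    for i in range(0, N - jump_step + 1, jump_step):
        j = (i + jump_step) % N
        W[i, j] = W[j, i] = rj                       # bi-directional jumps

    signs = np.array([-1.0 if int(d) < 4.5 else 1.0 for d in PI_DIGITS[:N]])
    W_in = v * signs.reshape(N, 1) * np.ones((N, A))
    return W, W_in

W, W_in = init_crj_weights(N=20, A=1)
print("input weight signs:", np.sign(W_in).ravel())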