
IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014), May 09-11, 2014, Jaipur, India

Leveraging Machine Learning for optimize predictive classification and scheduling E-Health Traffic

Madhumita Kathuria
Research Scholar, Department of Computer Science, YMCAUST Faridabad, Haryana, India

Dr. Sapna Gambhir
Associate Professor, Department of Computer Science, YMCA University of Science and Technology, Faridabad, Haryana, India

Abstract- Wireless Body Area Network (WBAN) is a special kind of autonomous sensor network that evolved to provide a wide variety of services. Nowadays WBAN has become an integral component of healthcare management systems, where a patient needs to be monitored both inside and outside the home or hospital. These applications are responsible for gathering and managing heterogeneous data, covering both real-time and non-real-time traffic. Heterogeneous traffic classification plays an important role in various applications of WBAN. Due to the ineffectiveness of traditional port-based and payload-based methods, recent work has proposed machine learning methods that classify flows based on their statistical characteristics. In this paper, we evaluate the effectiveness of integrating two machine learning techniques, a binary decision tree and a genetic algorithm, for rule-based classification of heterogeneous traffic flows. We have also designed an Earliest Deadline based flexible dynamic scheduling algorithm, which has been proven to be an optimal prioritized scheduling scheme for problems such as starvation.

Keywords- E-Health; WBAN; Decision tree; Genetic algorithm; Starvation; Earliest Deadline

I. INTRODUCTION

Demand for tiny wearable wireless devices forming body-centric networks is rising day by day, and this demand has given rise to a new technology called WBAN. WBAN applications focus on monitoring health status and call for pervasive, autonomous monitoring of patients both at home and in hospital. This paper presents a hybrid method for classification of heterogeneous health data that combines decision tree and genetic algorithm schemas. Knowledge derived from the proposed method provides high classification accuracy along with the ability to identify the most significant gene. The decision tree creates rules as a set of IF-THEN statements to maximize interpretability, and it is built and evaluated on training and testing data sets generated from the monitored health or clinical data. The applications of health informatics in health care decision making are categorized into retrieval, alerting, prediction, suggestion and reminders of health-related information. We propose an intelligent health care decision support system that indicates critical situations through the decision tree algorithm and the genetic algorithm. Machine learning techniques have shown significant improvement in E-Health applications in terms of prediction and decision making for various diseases. The WBAN system for E-Health applications shown in Figure 1 is a trustworthy system that ensures reliable and manageable handling of heterogeneous packets. Sensor technology implanted on the patient's body can monitor physiological signals such as blood pressure (BP), ECG, glucose level and temperature. Data transmitted from each sensor is collected at a sink or controller unit and forwarded via a gateway to remote monitoring centers. This paper is organized as follows: Section I gives a brief introduction to the subject matter. Section II discusses the concept of machine learning and classification techniques. Section III provides concise information about E-Health traffic management. Section IV provides the conclusion.

[978-1-4799-4040-0/14/$31.00 ©2014 IEEE]

Fig. 1. Application of WBAN: An E-Health system

II. MACHINE LEARNING FOR E-HEALTH

One of the major objectives of many WBAN applications is to enhance and optimize the performance of the entire network. Machine learning schemes suitable for traffic classification can generally be divided into two categories: supervised classification and unsupervised classification. Supervised learning builds a model (e.g. a decision tree or a set of classification rules) from a training set of pre-labeled instances, and the model is then used to classify unseen instances; unsupervised learning instead groups unlabeled instances into clusters. The main problem in a heterogeneous WBAN is that the traffic generated by each kind of application has unique statistical properties and needs a different quality of service. Machine learning techniques are suitable for extracting such distinct patterns automatically.

A. Binary Decision Tree (BDT)

The decision tree is a very popular machine learning technique in practice. It is a type of classification method that uses a top-down construction process.


It classifies the given data items using the values of their attributes. The decision tree is initially constructed from a set of predefined data. The size of the tree depends on the data set: the larger the data set, the more branches the tree has. Each node of a decision tree specifies an attribute by which the data is to be partitioned. The basic idea of the Binary Decision Tree algorithm is to construct a binary tree by employing a top-down search through the given set of rules or policies, testing an attribute at every node of the tree. In a decision tree, each intermediate node represents a choice among a number of alternatives, and each leaf node represents a result. The search starts at the root node and proceeds towards a leaf node to take an action. Our packet classifier classifies packets based on features of the packet headers, which are divided into training and test data sets. Rules can be induced directly from the training data using a variety of rule-based algorithms. An IF-THEN rule is selected at each node to find the final outcome or class. Figure 2 gives the Binary Decision Tree algorithm. Terms: F is the set of feature vectors, C is the set of classes, I: F → C is an ideal classifier for F, and S is the training set, a subset of F × C. Characteristic: each split is based on one nominal feature and considers its complete domain, i.e. splitting on feature A (from the input feature set) with domain {a1, a2, ..., aj} creates one branch per value.
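The text leaves "the best splitting metric" unspecified. As one concrete possibility, the sketch below assumes the textbook C4.5 gain ratio (information gain divided by split information; this is not the paper's equation (2)) and computes it for a multiway split on a nominal feature over a training set S ⊆ F × C. The feature names `packet_type` and `critical` and the four training pairs are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def partition(samples, feature):
    """Split S into one subset per observed value a1..aj of a nominal feature."""
    parts = {}
    for x, c in samples:
        parts.setdefault(x[feature], []).append((x, c))
    return parts

def gain_ratio(samples, feature):
    """C4.5-style gain ratio of splitting S on `feature` (assumed metric)."""
    n = len(samples)
    parts = partition(samples, feature)
    remainder = sum(len(p) / n * entropy([c for _, c in p]) for p in parts.values())
    info_gain = entropy([c for _, c in samples]) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical training set S: (feature vector, class) pairs.
S = [
    ({"packet_type": "ECG", "critical": "yes"}, "high"),
    ({"packet_type": "BP", "critical": "yes"}, "high"),
    ({"packet_type": "TEMP", "critical": "no"}, "low"),
    ({"packet_type": "TEMP", "critical": "yes"}, "high"),
]
for f in ("packet_type", "critical"):
    print(f, round(gain_ratio(S, f), 3))
# packet_type 0.208
# critical 1.0
```

Here `critical` separates the classes perfectly, so it would be chosen for the split; dividing by split information penalizes features such as `packet_type` that fan out into many small branches.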

B. Genetic Algorithm (GA)

It is an optimization technique that mimics the process of natural selection. The genetic algorithm is part of evolutionary computing and has also been used for learning sets of rules. In this paper, GA is applied to the training sets to generate optimized training data sets. A whole chromosome can represent a complete set of If-Else rules. Initially, the genetic algorithm begins with a primary population of random chromosomes whose genes are sequences of 0s and 1s. In the next step, the algorithm biases individuals toward the optimum solution through repeated application of the crossover, mutation and selection operators. A new population can be produced by two methods: steady-state GA and generational GA. In the first case, only one or two members of the population are replaced, while in the second case the GA replaces all of the individuals at each generation. In this paper, the second method is adopted, with the GA keeping specified qualified individuals from the current generation and copying them into the new generation as part of the solution. The other individuals of the new population are obtained by the crossover and mutation functions. The genetic algorithm is outlined in Figure 3.

Algorithm for Binary Decision Tree (BDT): BDT(S, A, T)
// S: Training Data Set
// A: Input Feature Set
// T: Target Feature Set
1. Create a new tree T with node n as the root node
2. If the stopping condition is true, then
   2.1 If all examples in S are positive, then return the single-node tree T with label "+"
   2.2 If all examples in S are negative, then return the single-node tree T with label "-"
   2.3 Otherwise, mark n as a leaf node and label n with the most common value of the target feature set in S
3. Else
   3.1 Find a discrete function F(A) of the input attribute values such that splitting S according to F(A)'s outcomes {a1, a2, ..., aj} gives the best splitting metric
   3.2 If best splitting metric > Threshold, then
       3.2.1 Label n with F(A)
       3.2.2 For each outcome a of F(A), do
             3.2.2.1 Add a new tree branch below n, corresponding to the test F(A) = a
                     // Let S' be the subset of S that has value a for F(A)
             3.2.2.2 If S' is empty, then add a leaf node labelled with the most common value of the target set in S
                     Else add the subtree BDT(S', A - F(A), T)
4. Return T
Fig. 2. Algorithm for Binary Decision Tree
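A minimal Python sketch of BDT(S, A, T) following the steps of Figure 2, under stated assumptions: the splitting metric is information gain, the "all positive / all negative" test is generalized to "all examples share one class", and when the best metric does not exceed the threshold a majority-class leaf is returned (Figure 2 does not spell out that else-branch). Branches are created only for values observed in S, so the empty-S' case of step 3.2.2.2 is handled at prediction time by falling back to the majority label stored in each node. The training rows and feature names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(S, feature, target):
    """Splitting metric (information gain assumed) for a multiway split on `feature`."""
    n = len(S)
    remainder = 0.0
    for a in {row[feature] for row in S}:
        subset = [row[target] for row in S if row[feature] == a]
        remainder += len(subset) / n * entropy(subset)
    return entropy([row[target] for row in S]) - remainder

def bdt(S, A, T, threshold=0.0):
    """BDT(S, A, T) after Fig. 2: returns a class label (leaf) or a dict node."""
    labels = [row[T] for row in S]
    majority = Counter(labels).most_common(1)[0][0]
    # Step 2: stopping condition -> leaf (one class left, or no features left).
    if len(set(labels)) == 1 or not A:
        return majority
    # Step 3.1: choose the feature whose split gives the best metric.
    best = max(A, key=lambda f: info_gain(S, f, T))
    # Step 3.2: split only above the threshold (else-branch assumed: majority leaf).
    if info_gain(S, best, T) <= threshold:
        return majority
    node = {"feature": best, "branches": {}, "default": majority}
    rest = [f for f in A if f != best]  # A - F(A)
    for a in sorted({row[best] for row in S}):
        subset = [row for row in S if row[best] == a]  # S' for outcome a
        node["branches"][a] = bdt(subset, rest, T, threshold)
    return node

def to_rules(tree, conditions=()):
    """Flatten the tree into IF-THEN rules, one per root-to-leaf path."""
    if not isinstance(tree, dict):
        cond = " AND ".join(f"{f} = {v}" for f, v in conditions) or "TRUE"
        return [f"IF {cond} THEN {tree}"]
    rules = []
    for value, sub in tree["branches"].items():
        rules += to_rules(sub, conditions + ((tree["feature"], value),))
    return rules

def classify(tree, x):
    """Walk from the root to a leaf; unseen values fall back to the node majority."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(x.get(tree["feature"]), tree["default"])
    return tree

S = [  # hypothetical training rows
    {"packet_type": "ECG", "critical": "yes", "priority": "high"},
    {"packet_type": "ECG", "critical": "no", "priority": "high"},
    {"packet_type": "TEMP", "critical": "no", "priority": "low"},
    {"packet_type": "TEMP", "critical": "yes", "priority": "high"},
    {"packet_type": "BP", "critical": "no", "priority": "low"},
]
tree = bdt(S, ["packet_type", "critical"], "priority")
for rule in to_rules(tree):
    print(rule)
# IF packet_type = BP THEN low
# IF packet_type = ECG THEN high
# IF packet_type = TEMP AND critical = no THEN low
# IF packet_type = TEMP AND critical = yes THEN high
```

Because each split here is multiway over the feature's full domain (the "Characteristic" stated above), the resulting tree is binary only when features have two values.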

Algorithm for Genetic Algorithm (GA):
1. Randomly initialize a population of individual solutions
2. Randomly select individuals from the population
3. Compare these individuals with respect to their fitness
4. While (termination criteria = false)
   4.1 Modify these individuals using some or all of the following operations:
       4.1.1 Reproduction: copy an individual without change.
       4.1.2 Crossover: exchange substructures between two individuals.
       4.1.3 Mutation: change a single unit in an individual at a random position.
Fig. 3. Algorithm for Genetic Algorithm
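The loop of Figure 3, combined with the generational replacement that retains the best individuals as described above, can be sketched as follows. Tournament selection, single-point crossover, the mutation probability and the fixed generation count are assumptions, and the toy fitness (number of 1 bits) is only a stand-in for a rule-set fitness, which would decode each chromosome into rules and score them on the training data.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      elite=2, tournament=3, p_crossover=0.9,
                      p_mutation=0.2, seed=0):
    """Generational GA with elitism over bit-string chromosomes (sketch)."""
    rng = random.Random(seed)
    # Step 1: randomly initialize a population of bit-string individuals.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    # Step 4: loop until the termination criterion (fixed generation count here).
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        # 4.1.1 Reproduction: copy the fittest individuals unchanged (elitism).
        new_pop = [ind[:] for ind in ranked[:elite]]
        while len(new_pop) < pop_size:
            # Steps 2-3: pick random individuals and compare fitness (tournament).
            p1 = max(rng.sample(pop, tournament), key=fitness)
            p2 = max(rng.sample(pop, tournament), key=fitness)
            # 4.1.2 Crossover: exchange the tails of two parents at a random cut.
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # 4.1.3 Mutation: change a single unit at a random position.
            if rng.random() < p_mutation:
                i = rng.randrange(n_bits)
                child[i] = 1 - child[i]
            new_pop.append(child)
        # Generational replacement: the new population replaces the old one.
        pop = new_pop
    return max(pop, key=fitness)

# Toy fitness ("OneMax"): the number of 1 bits in the chromosome.
best = genetic_algorithm(fitness=sum, n_bits=16)
print(best, sum(best))
```

Copying the elite individuals unchanged guarantees that the best fitness never decreases from one generation to the next, which is the practical benefit of keeping "qualified individuals" in a generational GA.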

III. PROPOSED TRAFFIC MANAGEMENT SCHEMA

Traffic Flow management module: It is responsible for managing traffic flows at different levels so that both real-time and non-real-time health-related traffic benefit. It manages four kinds of work: packet classification, queuing, scheduling and dropping, as shown in Figure 4.
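As a rough sketch of the queuing, scheduling and dropping tasks, assume (as the flowchart of Figure 4 suggests) one queue per priority level, with a deadline computed for each low-priority packet: high-priority packets are served first in arrival order, and low-priority packets are served in Earliest Deadline First order. The class name `TrafficFlowManager`, the shared `capacity` and the tail-drop rule are hypothetical.

```python
import heapq
from collections import deque
from itertools import count

class TrafficFlowManager:
    """Two-queue sketch: FIFO for high-priority packets, EDF heap for low-priority ones."""

    def __init__(self, capacity=100):
        self.capacity = capacity  # hypothetical shared buffer size
        self.high = deque()       # high-priority queue (FIFO)
        self.low = []             # low-priority queue: heap of (deadline, seq, packet)
        self._seq = count()       # tie-breaker: equal deadlines leave in arrival order

    def enqueue(self, packet, high_priority, deadline=None):
        """Queue a classified packet; returns False if it had to be dropped."""
        if len(self.high) + len(self.low) >= self.capacity:
            return False          # dropping: buffer full (assumed tail-drop policy)
        if high_priority:
            self.high.append(packet)
        else:
            if deadline is None:
                raise ValueError("low-priority packets need a deadline")
            heapq.heappush(self.low, (deadline, next(self._seq), packet))
        return True

    def dequeue(self):
        """Serve high-priority packets first, then the earliest-deadline low-priority one."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return heapq.heappop(self.low)[2]
        return None

m = TrafficFlowManager()
m.enqueue("TEMP-1", high_priority=False, deadline=50)
m.enqueue("BP-1", high_priority=False, deadline=20)
m.enqueue("ECG-1", high_priority=True)
print([m.dequeue() for _ in range(3)])  # ['ECG-1', 'BP-1', 'TEMP-1']
```

Strict precedence for the high-priority queue can still starve low-priority packets under sustained high-priority load; this sketch shows only the queue structure and does not reproduce the paper's own starvation-aware scheduler.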


[Fig. 4 flowchart: incoming packets; test data set; high priority or low priority (priority no. = 0 or 1); store the packet into the high priority queue or the low priority queue; forward higher priority packets for servicing; calculate the deadline of each low priority packet; forward low priority packets for servicing]
Fig. 4. Traffic flow managing module

a) Packet classifier module: A decision-tree-based packet classification system constructs a decision tree from the rules in the rule set, where the leaves of the tree contain rules or subsets of rules. Packet header fields are used to traverse the decision tree to find a matching rule. Packet classification compares each incoming packet against a rule-based optimized training data set. Packets are classified into the appropriate class by examining specific fields in the packet header, such as <Source_address, Destination_address, Source_port, Destination_port, Packet type, Packet size, Critical level>. The packet classification module, whose workflow is given in Figure 5, classifies packets into different flows and assigns them different priorities. This module uses a genetic-programming-based binary decision tree algorithm, which tests the test data set against the training data set and assigns an appropriate priority to each packet.

Algorithm for packet classification module:
For each arriving packet
1. Check packet features from the packet header
   1.1 Make a training data set from these features
   1.2 Apply BDT() on this training data set to generate rules
   1.3 Apply GA() on these rules to generate optimized rules
       a. Generate an initial population (binary decision trees) randomly
       b. Evaluate each binary decision tree
       c. While the termination condition is not satisfied // number of generations = 4
          i. Select two parents (nodes from binary decision trees)
          ii. Perform crossover: exchange their subtrees
          iii. Perform mutation: exchange either two intermediate nodes or two leaf nodes of these subtrees
          iv. Evaluate fitness
          End while
2. Match the result or fitness with the rule sets or policies present in the data set
3. Calculate the match ratio probability
4. If the match ratio probability is up to the mark, then
   a. Assign priority
5. Else
   a. Modify the population
   b. Go to step 3
Fig. 5. Algorithm for packet classifier module
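A minimal sketch of header-based classification by tree traversal, assuming the seven header fields listed for the packet classifier module map to a Python dataclass. The rule tree, the field value labels ("ECG", "normal", and so on) and the two priority classes are hypothetical illustrations; in the proposed module the tree would come from BDT/GA training rather than being written by hand.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class PacketHeader:
    source_address: str
    destination_address: str
    source_port: int
    destination_port: int
    packet_type: str      # hypothetical labels, e.g. "ECG", "BP", "TEMP"
    packet_size: int      # bytes
    critical_level: str   # hypothetical labels, e.g. "high", "normal"

# Hypothetical rule tree: internal node = (field, {value: subtree}, default);
# a leaf is simply the priority class.
RULE_TREE = ("critical_level", {
    "high": "HIGH",
    "normal": ("packet_type", {"ECG": "HIGH", "BP": "HIGH"}, "LOW"),
}, "LOW")

def classify(header: PacketHeader, tree=RULE_TREE) -> str:
    """Traverse the rule tree using header fields until a leaf (priority) is reached."""
    fields = asdict(header)
    while isinstance(tree, tuple):
        field, branches, default = tree
        tree = branches.get(fields[field], default)  # unmatched value -> default branch
    return tree

pkt = PacketHeader("10.0.0.2", "10.0.0.1", 5000, 6000, "ECG", 128, "normal")
print(classify(pkt))  # HIGH
```

Each root-to-leaf path is one IF-THEN rule, e.g. IF critical_level = normal AND packet_type = ECG THEN HIGH, so lookup cost grows with tree depth rather than with the number of rules.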

1. Population: Individuals are selected by randomly choosing binary decision trees created from the features given in the header fields of packets. The module scans samples and randomly selects records from the training data set whose class attributes match. Each chromosome can be represented by a rule of the form If-Else. The initial fitness is set to zero.
2. Evolution: This process computes the gain ratio of each attribute, encodes the data and generates the initial population.
3. Fitness: The fitness function is used to calculate classification accuracy, i.e. the number of cases correctly classified. It measures the ratio of the number of patterns in the training set that are correctly classified to the total number of patterns in the training set. It is directly proportional to Accuracy and Gain Ratio. The accuracy, gain ratio and fitness functions are given in equations (1), (2) and (3).
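The accuracy-based fitness of item 3 and the weighted combination of equation (3) can be sketched as below. The weights `alpha` and `beta` are hypothetical defaults, and the gain ratio is passed in as a number rather than computed, since its definition in equation (2) depends on counts that are not defined in this section.

```python
def accuracy(predictions, labels):
    """Eq. (1): fraction of training patterns that are classified correctly."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def fitness(predictions, labels, gain_ratio, alpha=0.5, beta=0.5):
    """Eq. (3): weighted sum of accuracy and gain ratio (alpha, beta hypothetical)."""
    return alpha * accuracy(predictions, labels) + beta * gain_ratio

preds = ["HIGH", "LOW", "LOW", "HIGH"]
truth = ["HIGH", "LOW", "HIGH", "HIGH"]
print(accuracy(preds, truth))                 # 0.75
print(fitness(preds, truth, gain_ratio=0.5))  # 0.625
```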

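$$\text{Accuracy} = \frac{\text{number of patterns in the training set correctly classified}}{\text{total number of patterns in the training set}} \qquad (1)$$

$$\text{Gain ratio} = \sqrt{\frac{T_T}{T_T + F_J} + \frac{T_J}{T_J + F_T}} \qquad (2)$$

$$\text{Fitness function} = \alpha \cdot \text{Accuracy} + \beta \cdot \text{Gain ratio} \qquad (3)$$

where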

0