Discovery Protocols based on Machine Learning techniques in Wireless Sensor Networks with Mobile Elements
A simulation study

Master Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Information Systems by
Ernesto Quisbert Trujillo
Thesis Director Prof. Paul Cotofrei, University of Neuchâtel Expert Dr. Iulian Ciorascu, University of Neuchâtel
Neuchâtel Switzerland, July 2017
ABSTRACT In recent years, the research community has taken a growing interest in Wireless Sensor Networks and their applications in several fields such as industrial and health-care monitoring, environmental sensing, and smart applications of the Internet of Things. In this context, the introduction of mobility into sensor networks imposes challenges on discovery protocols between static devices and mobile devices with random motion. The aim of this Master Thesis is to establish a ranking of the sensor network metrics used in distance estimation and to train an artificial neural network in order to propose a discovery protocol in a hybrid topology. The prototype simulates a synchronization system for transmissions between static nodes and a mobile data collector with OMNeT++ and the MiXiM framework.
RÉSUMÉ In recent years, the scientific community has taken an interest in wireless sensor networks and their applications in several domains, such as medical and industrial surveillance, environmental monitoring, and intelligent applications of the Internet of Things. In this context, introducing mobility into sensor networks imposes challenges on discovery protocols between static sensors and sensors with random motion. The goal of this academic thesis is to establish a ranking of the network metrics used for distance estimation and to train an artificial neural network in order to propose a discovery protocol in a hybrid topology. The prototype emulates a system for synchronizing transmissions between static sensors and a mobile data collector with OMNeT++ and the MiXiM framework.
KEYWORDS Wireless sensor networks with mobile elements, ranging techniques, discovery protocol, random motion, machine learning, artificial neural network, random forest tree, OMNeT++ simulator, MiXiM framework, mobile data collector, hybrid topology, grid topology, received signal strength indicator, time of arrival, latency, delay, link quality indicator
CONTENT

1 INTRODUCTION
  1.1 Discovery Protocols and Ranging Techniques
  1.2 Motivation
2 WIRELESS SENSOR NETWORKS WITH MOBILE ELEMENTS
  2.1 Access network technology and standards
  2.2 WSN Topologies
  2.3 Discovery routines and Data collection
  2.4 Ranging techniques
    2.4.1 Time of Arrival
    2.4.2 Received Signal Strength Indicator
    2.4.3 Link Quality Indicator
  2.5 Transmission states
3 MACHINE LEARNING TECHNIQUES
  3.1 CART and Random Forest Tree
    3.1.1 Variable importance
  3.2 Artificial Neural Networks
    3.2.1 Perceptron
    3.2.2 Back propagation and gradient descent
    3.2.3 Learning rate
    3.2.4 Model validation: Early Stopping
  3.3 Hierarchical Clustering
    3.3.1 Similarity metrics
    3.3.2 Linkage methods
    3.3.3 Defining the optimal number of clusters
4 METHODOLOGY
  4.1 System deployment
  4.2 Software deployment
  4.3 Data preparation
    4.3.1 Data transformation
  4.4 Variable importance analysis
    4.4.1 Mapper function
    4.4.2 Reducer function
  4.5 Neural Network Model
    4.5.1 Model construction
    4.5.2 Model validation
    4.5.3 Evaluation
  4.6 Discovery protocol simulation
  4.7 Programming approach
    4.7.1 MDC application layer
    4.7.2 Static node application layer
  4.8 Discovery process
  4.9 Simulation experiments
5 RESULTS
  5.1 Variable importance
  5.2 Simulation results
    5.2.1 Mobility behavior of the MDC
    5.2.2 Distance recognition accuracy
    5.2.3 Error prediction analysis
6 DISCUSSION
CONCLUSION
REFERENCES
LIST OF TABLES AND FIGURES

Table 4.1 Network configuration 1
Table 4.2 Network configuration 2
Figure 2.1 Most common grid topologies
Figure 2.2 Discovery routine
Figure 2.3 Usual radio states and their related energy and transition costs
Figure 3.1 The single-layer perceptron
Figure 3.2 Sigmoid logistic activation function
Figure 3.3 An error curve based on the weight values of a single predictor
Figure 3.4 Error curve with local and global minimums
Figure 3.5 Early stopping approach
Figure 3.6 An example of a dendrogram representing the clusters of a simple dataset
Figure 3.7 Cosine distance interpretation
Figure 3.8 Interpretation of the UPGMA linkage method
Figure 4.1 Deployment diagram of the system
Figure 4.2 Component diagram of the subsystem modules
Figure 4.3 Delay text file for a distance experiment D with n network configuration sets
Figure 4.4 Metadata source file of the Distance D and the Run 1
Figure 4.5 ANN model 1 with the feed-forward training matrix operations
Figure 4.6 ANN model 2 with the back propagation and gradient descent matrix operations
Figure 4.7 Validation of ANN model 1
Figure 4.8 Validation of ANN model 2
Figure 4.9 Evaluation results of the ANN model 2
Figure 4.10 Proposed hybrid topology and its implementation in Omnet
Figure 4.11 Inheritance diagram of the simulation NED components
Figure 4.12 Class diagram of the MDC application layer
Figure 4.13 Class diagram of the customized packet
Figure 4.14 Class diagram of the static node application layer
Figure 4.15 Finite state machine for the discovery process
Figure 4.16 Transmission synchronization of the discovery prototype
Figure 5.1 Variable importance results
Figure 5.2 Mobility behavior of a MDC moving at 0.2 mps
Figure 5.3 Mobility behavior of a MDC moving at 0.2 mps (Amplified view)
Figure 5.4 Mobility behavior of a MDC moving at 1.4 mps
Figure 5.5 Mobility behavior of a MDC moving at 18.8 mps
Figure 5.6 Distance recognition accuracy of a MDC moving at 0.2 mps
Figure 5.7 Distance recognition accuracy of a MDC moving at 1.4 mps
Figure 5.8 Distance recognition accuracy of a MDC moving at 18.8 mps
Figure 5.9 Standard deviation of the error prediction at different velocities
ACKNOWLEDGEMENTS
I would like to express my special thanks to my thesis advisor, Professor Paul Cotofrei, for discussions concerning several aspects of my thesis. Every result described in this work was accomplished with his help, and this achievement would not have been possible without his support.
To my sister
GLOSSARY

WSN       Wireless Sensor Network
WSN-ME    Wireless Sensor Network with Mobile Elements
RSSI      Received Signal Strength Indicator
ANN       Artificial Neural Network
MDC       Mobile Data Collector
ToA       Time of Arrival
LQI       Link Quality Indicator
IoT       Internet of Things
CSMA/CA   Carrier Sense Multiple Access with Collision Avoidance
MAC       Medium Access Control
TDoA      Time Difference of Arrival
AoA       Angle of Arrival
CART      Classification And Regression Trees
ID3       Induction of Decision Trees
UPGMA     Unweighted Pair Group Method with Arithmetic Mean
AWS       Amazon Web Services
EC2       Elastic Compute Cloud
RDS       Relational Database Service
S3        Simple Storage Service
VPC       Virtual Private Cloud
1. INTRODUCTION The past decade has seen the rapid expansion of Wireless Sensor Networks (WSN) in several areas such as the medical, military, and industrial fields. These sensor networks are commonly organized in fixed topologies, but sensor technology is not limited to static scenarios. A number of authors have proposed mobile elements as a fundamental component of research methodologies or business models. By way of illustration, Anastasi, Borgia, Conti, and Gregori (2010) used people's motion to collect pollution data from critical contamination points in a city. Another example in the business field is the study conducted by Abdelzaher et al. (2007), which proposed an online service platform based on sensor data collected by users' personal devices.
1.1 Discovery Protocols and Ranging Techniques Regardless of the field of application, Wireless Sensor Networks with Mobile Elements (WSN-ME) involve additional discovery protocols to synchronize data transfer between static and mobile devices. These discovery routines have been exploited in several studies, but the majority of contributions rely on a single WSN metric or only a few of them (Di Francesco, Das, & Anastasi, 2011).
At the same time, numerous node localization or ranging techniques based on WSN metrics have been proposed in recent years. However, the discussion about the reliability of such metrics is often reduced to the Received Signal Strength Indicator (RSSI). The limited literature on the accuracy and consistency of other metrics leads to questioning their usefulness when designing new ranging techniques. Clearly, knowing the relevance of WSN metrics in distance estimation could offer a standpoint for future discussions and support the construction of more efficient techniques.
1.2 Motivation This thesis investigates the relevance of usual distance estimators and network metrics in order to build an Artificial Neural Network (ANN) model with the most relevant predictors. The trained network is applied to a discovery routine prototype which synchronizes transmissions between static nodes and a Mobile Data Collector (MDC) with random motion. The analysis relies on machine learning applications and an extensive dataset containing transmissions of two CC2420 transceivers placed at different distances. An OMNeT++ simulation prototype evaluates the distance prediction accuracy in two sensor network configurations.
An important requirement of the methodology is to build a simple ANN model based on matrix operations in order to have access to the weight matrices all the time. For this reason, the construction of the model is limited to back propagation and gradient descent without bias. Moreover, due to practical constraints, the simulation modeling of radio transition states is omitted, and a simple system based on Omnet signals is applied instead.
The next chapters are organized as follows: Chapter 2 introduces the background on WSN fundamentals, components, discovery protocols, and classical ranging techniques. Chapter 3 covers the machine learning techniques used for the discovery prototype. Chapter 4 describes the methodology and the software implementation, Chapter 5 presents the results, and finally, Chapter 6 provides a discussion of the findings of the study. Conclusions and suggestions for future work are given in the last section of this document.
Note: Throughout this study, all software and hardware components are indicated by a distinct text font.
2 WIRELESS SENSOR NETWORKS WITH MOBILE ELEMENTS A WSN is a self-organized network composed of sensing nodes with a finite source of energy. These devices cooperate to transfer sensing data towards a base station, commonly known as a sink node or data collector. The purpose of a sensor is to monitor a phenomenon or to modify its environment by interacting with other devices in the network. A sensor node is composed of a sensor unit, a CPU, a power unit, a transmission unit, and an analog-to-digital converter. In a WSN-ME, any sensor node can move in a controllable or uncontrollable fashion, introducing advantages in several contexts (Kansal et al., 2004; Anastasi, Conti & Francesco, 2009).
2.1 Access network technology and standards Representative standards for WSN are Bluetooth 4.0 for the medical field, IEEE 802.15.4 for industrial WSN, and WLAN IEEE 802.11 oriented to Internet of Things (IoT) applications (International Electrotechnical Commission, 2014). The most relevant standard is IEEE 802.15.4, which mainly focuses on two aspects of WSN communication: the physical layer and the link sublayer.
Because sensors share a common medium, an access control mechanism or MAC protocol is implemented in the link sublayer. These protocols decide when nodes can transmit and resolve conflicts when collisions occur. In this context, the Carrier Sense Multiple Access with Collision Avoidance or CSMA/CA protocol is widely used in WSN (Dargie & Poellabauer, 2010). This contention protocol is included in the IEEE 802.15.4 standard, which specifies, in its section 6.4, some of the metrics found in the experimental data of this study.
macMaxFrameRetries: maximum number of retransmissions of the same packet.
macAckWaitDuration: maximum period to wait for an acknowledgement frame to arrive following a transmitted data frame.
aMaxMACPayloadSize: maximum size of data information that can be transmitted in the MAC payload field.
Additionally, we define the following network configuration parameters that can be set in a MAC protocol.
Packet inter-arrival time: the period the transmitting sensor waits before sending the next packet.
Maximum queue size: the buffer size. It is a strategic parameter for avoiding overflow or congestion.
2.2 WSN Topologies Conventional WSN topologies are the single-hop star, hierarchical cluster, and mesh or grid topologies (Sharma D., Verma, & Sharma K., 2013). Multi-hop mesh and grid topologies are commonly preferred for vast sensing areas. In these topologies, nodes usually cooperate to transmit their sensing data towards a base station, so specific routing protocols are often implemented.
As noted by Tian, Shen, & Matsuzawa (2005), the most common grid topologies are the hexagon, square, and triangle topologies, as seen in Figure 2.1. Each of them offers advantages based on the number of neighbors around a static node. An experimental study conducted by Zhang, Zhao, Zhu, & Li (2010) concludes that the hexagon topology reaches high packet reception rates with only three sensor neighbors, despite its short transmission range.
Figure 2.1 Most common grid topologies (Tian et al. 2005)
2.3 Discovery routines and Data collection Mobility can be applied to sensing nodes, relays, or data collectors. A Mobile Data Collector (MDC) visits the sensing nodes to gather data and is widely studied in (Wang et al., 2005; Rao et al., 2008). An MDC relies on two important routines called discovery and data transfer. Discovery protocols involve a preliminary contact step in which a fixed node detects the MDC presence in its communication range, as shown in Figure 2.2 (Di Francesco et al., 2011).
Figure 2.2 Discovery routine (Di Francesco et al., 2011)
When the MDC moves in a random manner, the discovery mechanism of the static nodes relies on two strategies. The first one exploits some knowledge about the mobility patterns (Jun et al. 2005). The second strategy depends on an auxiliary short-range radio that regularly listens to the medium and turns on the main long-range radio when the MDC enters its transmission range (Khan, Gansterer & Haring, 2007; Denkovski, Mateska & Gavrilovska, 2010). Once contact is established, data transfer begins within a period called residual contact, which is usually shorter than the duration of the discovery routine.
2.4 Ranging techniques Numerous node localization or ranging techniques are based on the RSSI and Time of Arrival (ToA) metrics. Dargie and Poellabauer (2010) define these ranging techniques as follows:
2.4.1 Time of Arrival This ranging technique is based on two components: the signal propagation time and the signal velocity. For instance, a sound signal takes about 30 milliseconds to travel 10 meters at 20 degrees Celsius; knowing the propagation speed of a given signal therefore makes it possible to determine the distance between two nodes from how long a packet takes to arrive at its destination. Other variants of this method exist, such as Time Difference of Arrival (TDoA) and Angle of Arrival (AoA).
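As a minimal illustration of this principle (a sketch only; the function name and the 343 m/s sound speed at 20 degrees Celsius are illustrative assumptions, not part of the thesis prototype), the distance follows directly from the propagation time and the signal velocity:

def toa_distance(propagation_time_s, signal_speed_mps):
    """Estimate the sender-receiver distance from the measured travel time."""
    return propagation_time_s * signal_speed_mps

# Sound travels at roughly 343 m/s at 20 degrees Celsius,
# so a 30 ms travel time corresponds to about 10 meters.
print(toa_distance(0.030, 343.0))  # ~10.3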
2.4.2 Received Signal Strength Indicator Widely used in the research community, the RSSI metric is considered an essential parameter in many ranging techniques. The RSSI value is typically measured in dBm and can be read directly from wireless interface cards. The principle behind RSSI-based ranging techniques is that signal strength decays as the distance between sender and receiver increases.
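To make the decay principle concrete, a common formulation is the log-distance path-loss model. The following sketch is illustrative only and is not the ranging technique developed later in this thesis; the reference RSSI at one meter and the path-loss exponent are assumed values.

def rssi_to_distance(rssi_dbm, rssi_at_1m_dbm=-45.0, path_loss_exponent=2.0):
    """Invert the log-distance model RSSI(d) = RSSI(1 m) - 10*n*log10(d)."""
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

print(rssi_to_distance(-75.0))  # ~31.6 m under the assumed parameters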
2.4.3 Link Quality Indicator Besides the RSSI and ToA metrics, a number of ranging techniques rely on the Link Quality Indicator (LQI). This metric estimates the incoming modulation of successful packets in the physical layer and is widely used in ZigBee networks. A reason to use it is that the LQI metric may be highly correlated with the distance (Yang, Yang, & Cheng, 2015).
2.5 Transmission states The radio unit of a sensor can be set in different transmission states, and discovery protocols exploit this feature to synchronize communication and save energy. The radio states can be configured based on the WSN transmission activity, but the transitions between states cost extra power consumption and latency (Dargie & Poellabauer, 2010). Figure 2.3 shows a generic schema of the latency and energy costs related to every state transition of a radio unit.
Figure 2.3 Usual radio states and their related energy and transition costs (Dargie & Poellabauer, 2010)
3 MACHINE LEARNING TECHNIQUES This chapter begins by laying out the basic principles of two supervised learning techniques. The first one is the Random Forest Tree algorithm; this technique is useful for estimating attribute importance in extensive datasets and supports the metric relevance analysis of this study. The second supervised learning technique is the Artificial Neural Network (ANN) algorithm based on back propagation and gradient descent. This technique is implemented with the best distance predictors found by the Random Forest Tree analysis. At the end of this chapter, a brief review of the hierarchical clustering algorithm is given. This unsupervised technique is used to reduce the experiment parameters in the simulation step.
3.1 CART and Random Forest Tree The ID3 (Induction of Decision Trees) algorithm (Quinlan, 1986) is an iterative machine learning technique that predicts a target class by splitting data predictors according to a purity criterion. The goal is to produce the smallest trees with the purest nodes. This simple algorithm has inspired several variants such as the Classification and Regression Trees (CART) algorithm (Breiman, 1984). Unlike ID3, CART is suitable for numerical classification and uses binary decision trees.
An interesting approach is to apply collections of CART decision trees to random samples of equal size, drawn with replacement, so that the trees produce their predictions and the results are summarized by a voting or averaging system. This process is known as bagging (Breiman, 1996), but on its own it does not significantly improve prediction accuracy, because choosing the best splitter among all predictors produces highly similar trees. Consequently, the Random Forest Tree technique (Breiman, 2001) introduces a mechanism which iteratively selects random subsets of predictors and computes their best splitters, producing diverse trees and, therefore, a more robust prediction model. To determine the number of predictors considered at each split, the author suggests using the square root of the total number of predictors.
3.1.1 Variable importance Breiman (2001) noted that "A forest of trees is impenetrable as far as simple interpretations of its mechanism go." (p. 18). However, the author explains that every time a node splits the data, the impurity of the child nodes is lower than that of the parent node. Thus, by aggregating the impurity decrease of a particular feature over all trees, one can obtain a consistent estimation of its importance. Louppe, Wehenkel, Sutera, & Geurts (2013) provide a comprehensible interpretation of the variable importance of any attribute Xm: add up the weighted impurity decreases p(t)Δi(st, t) for all nodes t where Xm is used, and average over the N_T trees in the forest. This is expressed in Equation (1) (as cited in Breiman 2001, 2002):
$$\mathrm{Imp}(X_m) = \frac{1}{N_T} \sum_{T} \sum_{t \in T\,:\, v(s_t) = X_m} p(t)\,\Delta i(s_t, t) \qquad (1)$$
where p(t) is the proportion N_t/N of samples reaching node t, v(s_t) is the variable used in split s_t, and i is an impurity measure such as the Gini index.
3.2 Artificial Neural Networks Artificial Neural Networks (McCulloch & Pitts, 1943) predict a target class of new instances by a previous supervised or unsupervised training process. Supervised learning models are commonly composed of an input layer that contains a vector with all the predictors, n hidden layers formed by m neurons or units, and an output layer which includes the target class or dependent variable. Full or partial connections are possible between neighboring layers or distant layers, and every connection has a weight value. The more layers and neurons the network has, the more patterns it is able to recognize.
3.2.1 Perceptron The perceptron (Rosenblatt, 1958) is the most basic form of a neural network which classifies patterns said to be linearly separable. Its goal is to classify a set of externally applied stimuli X1, … Xn into one of two classes C1 or C2 (Haykin, 2009). Figure 3.1 shows the principal components of a single layer perceptron.
Figure 3.1 The single-layer perceptron (inputs X1 … Xn weighted by w1 … wn, the weighted sum Σ, the error computation, and the weight-update loop)
The general learning process of this basic model is the following (a short sketch is given after the list):
Compute the weighted sum of the inputs.
Apply an activation function to the result.
Calculate the error between the predicted and the desired class.
Update each weight by adding to its previous value the product of the error, the corresponding input, and a learning rate.
Repeat until the error tends to zero.
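A minimal sketch of this loop (illustrative only: the step activation, learning rate, and toy data are assumptions, and no bias term is used):

import numpy as np

def train_perceptron(X, y, learning_rate=0.1, epochs=100):
    """X: n x m stimuli, y: n desired classes in {0, 1}. Returns learned weights."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, target in zip(X, y):
            prediction = 1.0 if xi @ w > 0 else 0.0  # weighted sum + step activation
            error = target - prediction              # predicted vs. desired class
            w += learning_rate * error * xi          # weight update
    return w

# Linearly separable toy data.
X = np.array([[0.0, 0.2], [0.1, 0.4], [0.9, 0.8], [1.0, 0.6]])
y = np.array([0, 0, 1, 1])
print(train_perceptron(X, y))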
Obviously, this basic model can be improved by adding more hidden layers or units, and the essential learning process becomes a feed-forward learning process in which the activation function is applied at every hidden layer. This model is commonly known as the multilayer perceptron (Bishop, 1995; Fausett, 1994; Reed & Marks, 1999). Debate continues about the right number of hidden layers and units. A rough estimation given by Hecht-Nielsen (1987) suggests, based on the Kolmogorov theorem (1957), that any continuous function could be represented by one hidden layer with 2n+1 units, where n is the number of predictors in the model. However, a balanced model can be obtained by experimentation and validation.
A popular activation function used in back propagation is the logistic sigmoid function (Rojas, 2013). This function is convenient because its derivative can be expressed as a function of the sigmoid itself (Konomi & Sacha, 2014), and its values vary between 0 and 1, as shown in Figure 3.2.
Figure 3.2 Sigmoid logistic activation function
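Written out, the logistic sigmoid and its self-referential derivative are:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$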
3.2.2 Back propagation and gradient descent Neural networks usually learn in two iterative phases: a feed-forward pass and a weight-adjustment pass. The first phase computes predictions from the input layer to the output layer, passing through all hidden units. The second phase computes the prediction error and propagates it backward through the network, adjusting the weights. This last stage is known as back propagation (Rumelhart, Hinton, & Williams, 1986) and can be executed in two ways: a batch learning process that uses the entire training set at each iteration, or an online training that updates after every instance. In this thesis, the batch training approach is used. Back propagation involves a gradient descent optimization procedure that aims at minimizing the overall error contribution of every weight. In Figure 3.3, the gradient is depicted as the tangent of the error curve at a specific weight value W. The more the error increases, the more vertical the slope is.
Figure 3.3 An error curve based on the weight values of a single predictor J
The goal is to reach the global minimum of the error by changing the weight value W in the direction of the negative gradient, which is obtained through the derivative of the activation function. In practice, a multilayer network with back propagation and gradient descent follows the chain rule to obtain the error contribution at each network layer (Werbos, 1974, 1994). After several iterations or training epochs, the error no longer decreases and the model converges.
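In symbols, each weight moves against the gradient of the error E, with the step size controlled by the learning rate β introduced in the next subsection:

$$w \leftarrow w - \beta\,\frac{\partial E}{\partial w}$$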
3.2.3 Learning rate Figure 3.3 shows a simple situation in which the global minimum of the error is evident. However, a real learning process could present more complex scenarios. Consider the error function depicted in Figure 3.4.
Figure 3.4 Error curve with local and global minimums (Kriesel, 2007)
Here, depending on the initial weight value, there is a high probability that the model converges to a local minimum. A possible solution to this problem is applying a learning rate to the weight adjustment. This learning rate determines how quickly the weight changes direction. Notice, for example in Figure 3.4, that small learning rates produce minor changes in weight direction, and therefore there is a high risk of trapping the model in a local minimum. Conversely, a large learning rate means a significant change in weight direction, which adds the risk that the model never converges.
3.2.4 Model validation: Early Stopping Overfitting and overtraining are common issues in neural networks. Overfitting may occur in complex models, and overtraining may happen if the training process is not stopped at the right epoch. When overtraining happens, the learning curve of the validation set starts to increase even if the learning curve of the training set continues to decrease. At this point, it is considered that the network has learned sufficiently and training must stop. This process is known as the early stopping criterion (McCord-Nelson & Illingworth, 1991) and is depicted in Figure 3.5.
Figure 3.5 Early stopping approach (Haykin, 2009)
The early stopping approach is a visual method that determines the optimal number of epochs in a training stage using a separate validation set. The general steps of early stopping are the following (a short sketch is given after the list):
Divide the available data into training and validation sets.
Use small random initial weight values.
Use a small learning rate.
Compute the validation error rate during training.
Stop training when the validation error rate starts increasing, even if the training error rate continues to decrease.
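A minimal sketch of this criterion as a patience-based loop (illustrative only; train_epoch and validation_error stand for model-specific routines and are assumptions, as is the patience threshold):

def early_stopping_train(train_epoch, validation_error, max_epochs=100000, patience=5):
    """Stop once the validation error has not improved for `patience` epochs."""
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch()               # one pass over the training set
        error = validation_error()  # error on the held-out validation set
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch        # the stopping point
    return max_epochs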
3.3 Hierarchical Clustering Hierarchical clustering is a recursive process in which data is divided or agglomerated into similar groups. During the first iteration of an agglomerative clustering algorithm, each instance is considered a cluster of its own, and subsequent clusters are merged by a similarity criterion (Maimon & Rokach, 2010). The results are represented in a tree schema called a dendrogram, in which horizontal lines represent the combined clusters and vertical lines represent the distance at which they have been merged, as shown in Figure 3.6.
Figure 3.6 An example of a dendrogram representing the clusters of a simple dataset.
3.3.1 Similarity metrics In a bidimensional space, the simple Euclidean distance is applied, but a multidimensional space requires other distance metrics. In this study, the cosine metric is used. This metric determines the distance between two multidimensional instances by the angle formed by their vectors. A vector represents all the features of an instance, and the angle formed with another vector varies between 0 and 180 degrees, regardless of the dimensionality of the space (Rajaraman & Ullman, 2011). Figure 3.7 shows the cosine distance between instances A and B in a three-dimensional space.
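Formally, the cosine distance between two feature vectors A and B is one minus the cosine of the angle between them:

$$d_{\cos}(A, B) = 1 - \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert}$$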
Figure 3.7 Cosine distance interpretation
3.3.2 Linkage methods The distance between instances can be directly computed with the cosine metric to form the first clusters at the first iteration, but a linkage criterion is required to determine the distance between the formed clusters in later iterations. The Unweighted Pair Group Method with Arithmetic Mean (UPGMA) linkage method (Sokal & Michener, 1958) takes the average distance over all pairs in which one member of the pair comes from each of the two clusters (Norusis & SPSS, 2011), as shown in Figure 3.8.
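In formula form, the UPGMA distance between clusters u and v averages all pairwise distances between their members:

$$d(u, v) = \frac{1}{|u|\,|v|} \sum_{i \in u} \sum_{j \in v} d(i, j)$$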
Figure 3.8 Interpretation of the UPGMA linkage method (all pairwise distances between the members of clusters u and v are averaged)
3.3.3 Defining the optimal number of clusters Similar members of a cluster are grouped at small merging distances, but long distances are required to force the fusion of dissimilar clusters. Pham & Afify (2006) claim that the optimal number of clusters can be determined by setting a cut-off line at the point of a dendrogram where merging distances increase. For instance, the dendrogram in Figure 3.6 shows short merging distances forming the green, red, and cyan clusters, but a significant distance is required to merge the red and cyan clusters. Therefore, clusters above the cut-off line are highly dissimilar.
4 METHODOLOGY The research data of this thesis is an extensive WSN experimental dataset of 200 million sensor packets collected over six months under 50'000 configuration parameters.¹ For this reason, a preliminary task is to organize this data in a provider/consumer subsystem on a cloud platform. This subsystem extracts and prepares the datasets required for the machine learning applications.
The next step includes a global analysis of eight WSN metrics: RSSI, delays, LQI, ToA, noise, buffer overflow, the actual number of retransmissions, and queue size. The goal is to obtain a ranking of distance estimators and to train an artificial neural network with the best predictors. A parallelizable version of the Random Forest Tree technique and an ANN model with back propagation and gradient descent were implemented on the cloud platform to accomplish these objectives.
A simulation prototype based on Omnet and Mixim evaluates the distance prediction accuracy of the neural network by modeling thirteen static nodes and a mobile data collector on a grid topology. The idea is to implement the ANN weight matrices in the application layer of the Mobile Data Collector, so that distance can be predicted by interpreting the incoming metrics of the sensing node packets. The simulation comprises six experiments based on three categories of speed and two network configurations obtained by a preliminary hierarchical clustering task. The prototype simulates the CC2420 Texas Instruments transceivers.
4.1 System deployment The cloud provider is Amazon Web Services (AWS). The specific cloud infrastructure comprises Elastic Compute Cloud (EC2), the Relational Database Service (RDS), and the Simple Storage Service (S3). The subsystems are located in a Virtual Private Cloud (VPC) network with a public and a private subnet. Only two instances have static IPs: the EC2 instance, namely Data_Consumer, and the RDS instance, namely Data_Provider. Figure 4.1 shows the deployment diagram of the system.
¹ Data provider: Songwei Fu, Yan Zhang, CRAWDAD dataset due/packet-delivery (v. 2015-04-01), downloaded from http://crawdad.org/due/packet-delivery/20150401, https://doi.org/10.15783/C7NP4Z, Apr 2015. This experimental data considers the distance attribute as a configuration parameter. However, in this thesis the distance attribute is the target feature. For this reason, only 8064 network configuration instances are available.
Figure 4.1 Deployment diagram of the system (the VPC WSN_Packets with a public and a private subnet; the memory-optimized r3.2xlarge EC2 instance hosting ANN_Model; the t2.micro EC2 instance Data_Consumer; the Hadoop cluster Feature_Cluster of t2.micro EC2 instances running FeatureSelection; the MS SQL Server Amazon RDS instance Data_Provider; and an S3 bucket with input/output folders)
The r3.2xlarge EC2 instance is a memory-optimized server which runs the ANN_Model subsystem. Apache Spark is installed on the t2.micro EC2 instance Data_Consumer, and a Hadoop cluster composed of four t2.micro EC2 instances is configured for the FeatureSelection subsystem. Because data storage is ephemeral in EC2 instances, the system uses input and output folders in S3. In the next section, a detailed overview of the subsystems is presented.
4.2 Software deployment The component diagram of the subsystems is depicted in Figure 4.2. The Data_Provider/Data_Consumer subsystem is composed of four Spark SQL applications: DataPrep, WSN_Conf_Set, FeatureSet, and ANN_Set. DataPrep is a primary component which prepares, organizes, cleans, and reduces the raw data contained in the delay_Xm.txt and Meta_Xm_RunN.txt source files, where "Xm" represents a distance experiment executed in N runs. DataPrep outputs a vast structured data frame containing all packet instances, which is saved into the database component Data_Provider. In fact, this is the Microsoft SQL Server RDS instance mentioned in section 4.1.
Figure 4.2 Component diagram of the subsystem modules
The WSN_Conf_Set and FeatureSet Spark applications are implemented in the Data_Consumer instance and consume data from Data_Provider. WSN_Conf_Set extracts and standardizes the network configuration parameters: period, packet length, queue size, retry delay, and transmission level. The results are applied to the local Python application clusterApp. On the other hand, the FeatureSet Spark application selects and provides all the WSN metrics and the distance target for the FeatureSelection subsystem, which runs the RandomForest Python application over the Hadoop cluster Feature_Cluster.
The FeatureSelection subsystem provides a ranking of distance estimators so that the ANN_Set application, which also belongs to the Data_Consumer subsystem, selects and normalizes the most relevant attributes and divides the results into three sets: the training set, the validation set, and the testing set. Finally, the ANN_Model subsystem trains, validates, and evaluates a neural network model and provides the weight matrices, which are implemented in the local subsystem Omnet++Sim.
4.3 Data preparation The raw data contains information about the delivery performance of a line-of-sight WSN topology composed of two CC2420 transceivers developed by Texas Instruments. Several distance experiments were performed over this simple topology in order to assess communication performance under several environmental conditions such as human or technological interference. The experiments separate the sensors by six different distances (10, 15, 20, 25, 30, and 35 meters). Every experiment was performed in n runs of 300 packets.
On top of the WSN metrics explained in Chapter 2, this dataset contains the following additional metrics: Buffer Overflow, a boolean feature which indicates whether a queue overflow occurred; Acknowledgement (ACK), a boolean feature which confirms a successful packet; Arrival_time, an integer feature which indicates the time of arrival in milliseconds; Tx_level, an integer feature which indicates the maximum energy level of the radio transmission; Average_SNR, an integer feature which indicates the average signal-to-noise ratio of the run experiments; Noise, an integer feature which indicates any electrical or magnetic signal interference in dBm; and Delays, defined as the time between packet generation at the sender and packet reception at the receiver.
Raw data is organized in two categories. The first one consists of transmission packet delays and the second one consists of the transmission metrics or metadata. Both are related and share the same network configuration parameters. The first category is composed of six text files corresponding to the six distance experiments with all the runs executed in the test. Figure 4.3 shows the structure of a text file corresponding to this category.
Figure 4.3 Delay text file for a distance experiment D with n network configuration sets organized by runs (each configuration block Conf 1 … Conf n holds a header with the period, packet length, queue size, max tries, retry delay, TxP level, target distance D, and average SNR, followed, for every run Run 1 … Run n, by the delays of packets 1 to 300)
The second category is composed of sixty-two text files which represent the n runs of every distance experiment. Figure 4.4 shows a metadata source file of the distance D and the run 1. Only the metadata of the first packet is depicted, but this structure is repeated 300 times for the 300 packets.
Figure 4.4 Metadata source file of the Distance D and the Run 1 (each configuration block Conf 1 … Conf n holds the network configuration header, the experiment and run numbers, and, per packet, the ACK, RSSI, noise, LQI, arrival time, buffer overflow, and actual queue size values)
The data preparation strategy focuses on organizing the delay and metadata files into a global dataset containing individual packet instances. To reach this goal, eight dataframe operations are executed with Apache Spark SQL. These operations are detailed in Annex 1.
4.3.1 Data transformation The WSN_Conf_Set application extracts and standardizes the clustering dataset with the Z-score method. The ANN_Set application extracts the training, validation, and testing sets and scales the values to a 0-1 range with the min-max linear scaling method. This application also applies a -45 offset to the original RSSI data, as recommended by the official data provider.
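For reference, the two scalings used here are:

$$z = \frac{x - \mu}{\sigma}, \qquad x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$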
4.4 Variable importance analysis The FeatureSelection subsystem initializes a Hadoop cluster composed of four EC2 instances and runs a MapReduce version of the random forest tree algorithm built around the sklearn.ensemble.RandomForestClassifier class of the scikit-learn Python library. The following sections provide more details regarding the mapper and reducer functions.
4.4.1 Mapper function The FeatureSet application provides a dataset composed of the distance and the WSN metrics of all packets. This dataset is separated into two subsets: the X subset, which contains all the WSN metrics, and the y subset, which contains only the target distance.

X = SubSet[:, 0:9]
y = SubSet[:, 9]
The next step is to define the parameter configuration of the sklearn.ensemble.RandomForestClassifier class.

forest = RandomForestClassifier(n_estimators=250, criterion='gini',
                                max_depth=None, min_samples_split=2,
                                min_samples_leaf=1, max_features='sqrt',
                                min_impurity_split=1e-07, random_state=0)
This configuration sets the number of tree estimators for each data subset with n_estimators. The splitting criterion is the Gini index, and splitting is forced until pure nodes are obtained by setting the max_depth parameter to None. The min_samples_split parameter defines the minimum number of instances required to split a node, and min_samples_leaf specifies the minimum number of instances needed to form a leaf. Both parameters must be set to 2 and 1 respectively to match the previous max_depth setting.
The max_features parameter with the value 'sqrt' limits the number of randomly selected attributes at each split to the square root of the total number of predictors. The configuration above also defines a grow-stopping limit with a small impurity threshold, min_impurity_split, and sets the random_state parameter to 0 to obtain reproducible results. The model is finally fitted with the following line:
forest.fit(X, y)
Feature Importance When RandomForestClassifier fits the model, it also calculates the feature importance by the impurity criterion described in Equation (1) of section 3.1.1. RandomForestClassifier initializes a zero array, feature_importances_, with one entry per feature. The algorithm scans the splitting nodes, multiplies the error reduction by the number of samples below the splitting node, and adds the result to the feature importance array. The mapper function converts this array into key-value pairs. The following code shows this approach, which prints the intermediate key-value pairs to stdout.

import numpy as np  # needed for the standard deviation and sorting of importances

importances = forest.feature_importances_
std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)
indices = np.argsort(importances)[::-1]
for f in range(X.shape[1]):
    print indices[f], "\t", importances[indices[f]]
4.4.2 Reducer function The reducer function splits each incoming key-value pair into the feature index and its importance value and aggregates the value when the feature key repeats. The following lines show this process.

feature, importance = line.split('\t', 1)
importance = float(importance)
...
if current_feature == feature:
    current_importance += importance

Finally, the reducer function prints out the key-value pair results as follows:

print '%s\t%s' % (current_feature, current_importance)
The complete mapper and reducer functions are presented in Annex 2. The FeatureSelection subsystem outputs three relevant metrics: RSSI, ToA, and delays. The detailed results are given in Chapter 5.
4.5 Neural Network Model The model construction is based on matrix operations in standard Python. The idea is to have complete access to the weight values, which will later be translated into C++ matrices. The activation function is the logistic sigmoid, applied to the dot product of the weight and layer matrices. The learning approach is back propagation with gradient descent, and the weight matrix adjustment is governed by a learning rate β.
The neural network model relies on an extensive dataset of 130 million packet instances containing the arrival time, RSSI, and delay metrics. This data is separated into three groups: 60% is used to train the network, 20% is reserved for validation, and the remaining data is used for evaluation purposes. The network is trained in batch mode.
4.5.1 Model construction The RSSI, arrival time, and delay inputs are applied to two ANN models. Following Hecht-Nielsen's (1987) estimate of the optimal number of units, the first model is a simple one-hidden-layer architecture with seven units. For reasons of space, Figure 4.5 only shows the matrix operations of the feed-forward training step on the right side of each matrix representation.
Figure 4.5 ANN model 1 with the feed-forward training matrix operations (the input matrix I of size n x 3 holding the delays, RSSI, and arrival time; the weight matrix W0 of size 3 x 7; the hidden layer H1 of size n x 7 obtained by applying the activation function to I · W0; the weight matrix W1 of size 7 x 1; the predictions matrix P of size n x 1; and the error matrix E = O - P against the distance target O)
The second model presents a two-hidden-layer architecture of seven and six units respectively, and it is depicted in Figure 4.6 with the back propagation and gradient descent matrix operations. Notice that the error of each layer is obtained by multiplying the previous layer's error with the derivative of the transfer function. This step also facilitates updating the weight matrices by multiplying a small learning rate β with the dot product of the layer deltas and the training values at each layer.
Figure 4.6 ANN model 2 with the back propagation and gradient descent matrix operations (the input matrix I of size n x 3 holding RSSI, delays, and arrival time; the weight matrices W0 of size 3 x 7, W1 of size 7 x 6, and W2 of size 6 x 1; the hidden layers H1 of size n x 7 and H2 of size n x 6; the predictions matrix P of size n x 1; the deltas DP, DH2, and DH1 obtained by multiplying each layer error with the sigmoid derivative; and the weight updates scaled by the learning rate β)
4.5.2 Model validation The two models were validated with the early stopping approach. The weight matrices were initialized with small random values, and the learning rate β was set to 0.001 for each model. 100'000 epochs were executed in batch mode. The results of the validation step for the first and second model are depicted in Figures 4.7 and 4.8 respectively.
Figure 4.7 Validation of the ANN model 1
Figure 4.8 Validation of the ANN model 2
As observed in Figure 4.7, the validation set shows a slight increase of the error before epoch 20'000. On the other hand, model two, depicted in Figure 4.8, shows that the error increases before iteration 5'000. The stopping point is defined at 3'400 epochs for both models. At this point, model two obtains a training error of 0.2575, whereas model one presents a training error of 0.2639. This comparison suggests that model two is more efficient because it takes only 3'400 iterations to obtain a smaller training error than model one.
4.5.3 Evaluation Figure 4.9 shows the evaluation results of the ANN_Model application.
Figure 4.9 Evaluation results of the ANN model 2
The error on the testing set is 0.2590 at epoch 3'400 (the early stopping point). The weight matrices are extracted at this point and applied to the Customized_AppLayer_MDC.cc simulation class.
4.6 Discovery protocol simulation To evaluate the accuracy of the ANN_Model application, a discovery routine prototype based on Omnet and Mixim is proposed. Omnet is a general-purpose C++ simulator oriented to computer networks, and Mixim is a specific framework for mobility and WSN simulations. This framework is suitable for this study because it contains a predefined module called CSMA and the decider 802154_TI_CC2420_Decider, which emulates a CC2420 radio transceiver.
The simulation model is based on a hexagon topology composed of thirteen static sensing nodes and an MDC. According to the experimental data, the maximum transmission range of static nodes was configured to 35 meters, and an overlapping critical area of 2 meters was defined between senders. Figure 4.10 shows a schema of the proposed topology with its implementation in Omnet.
Figure 4.10 Proposed hybrid topology and its implementation in Omnet++
In Omnet, all the components of a network are organized hierarchically; these components can be single or compound modules which are defined by the Omnet NED (Network Definition) files. These files describe all the parameters of a module and are linked with a C++ class. Six NED files are defined to build the simulation prototype: two Customized_AppLayers which implement the particular discovery operations of the static and MDC nodes; a Customized_MacLayer and a Customized_Nic which establish the link layer and are contained in the Customized_Node; and finally a WSN-ME_Network which implements the necessary components of a wireless network. Each of these modules inherits from Mixim NED modules, as described in the following inheritance diagram:
Figure 4.11 Inheritance diagram of the simulation NED components (Customized_MacLayer extends CSMA; WSN-ME_Network extends BaseNetwork; Customized_Node extends WirelessNode; Customized_Nic extends WirelessNic and WirelessNicBattery; Customized_AppLayer_Node and Customized_AppLayer_MDC implement IBaseApplLayer)
4.7 Programming approach To obtain the RSSI and delay metrics, Omnet's signal approach was used. This instrument is a simple notification mechanism which establishes virtual communication between modules or nodes. It works as an emitter-subscriber system in which a module containing a variable of interest must register and emit it so that another, subscriber, module is able to listen to it. Throughout this thesis, signals are declared with the simSignal_ prefix, whereas subscribers are declared with the Listener_ prefix.
The prototype uses the Nic802154_TI_CC2420_Decider, which implements the Decider802154Narrow.cc and Customized_DeciderResult802154Narrow.cc classes. Customized_DeciderResult802154Narrow registers and emits the SimSignal_rssiPhy signal through its getRSSI() function. The prototype also contains a Customized_PhyLayer.cc class which registers and emits a simSignal_Delay signal by using the getDelay() function of the extended Mixim class MappingUtils.cc. Finally, three more signals are registered and emitted in the Customized_AppLayer_MDC.cc class: simSignal_neighborWakeUp and simSignal_neighborSleep for the discovery mechanism, and the simSignal_distanceStats signal for later analysis. The Customized_PhyLayer.cc class diagram is available in Annex 3.
Two classes are defined at the application layer of the simulated devices: Customized_AppLayer_Node.cc in the static nodes and Customized_AppLayer_MDC.cc in the MDC. Both are presented and explained in the next sections.
4.7.1 MDC application layer Figure 4.12 shows the principal components of the Customized_AppLayer_MDC.cc class; its programming code is presented in Annex 4.
Figure 4.12 Class diagram of the MDC application layer (Customized_AppLayer_MDC extends the Mixim BaseLayer; it declares the simSignal_distanceStats, simSignal_NeighborWakeUp, and simSignal_NeighborSleep signals, the Listener_rssiPhy and Listener_DelayPhy subscribers, the simtime_Arrival_time value, and the ANN_Discovery() function)
This class extends the Mixim class BaseLayer.cc and subscribes to two signals through Listener_RssiPhy and Listener_DelayPhy. Notice that the Phy termination denotes the origin of these signals. The ANN_Discovery() function contains the ANN weight matrices and estimates distances based on the normalized simtime_Arrival_time value of the incoming packet and the normalized values obtained from the Listener_RssiPhy and Listener_DelayPhy subscribers. ANN_Discovery() determines whether the MDC position is in a critical area and emits the simSignal_NeighborWakeUp and simSignal_NeighborSleep signals based on the source address of the last received packet. Once a neighbor wakes up, it continuously emits a customized packet with its address as the source and the MDC address as the destination. Figure 4.13 shows the Customized_Packet.cc components.
[Class diagram: Customized_PacketNode extends the Mixim ApplPkt message (destAddr_var and srcAddr_var with get/set accessors) and adds a PacketID field with getPacketID()/setPacketID(); addresses use the SimpleAddress L3Type.]
Figure 4.13 Class Diagram of the Customized Packet
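To make the role of the weight matrices concrete, the following is a minimal sketch of the kind of feed-forward computation assumed inside ANN_Discovery(); the layer size N_HIDDEN and the sigmoid activation are illustrative assumptions, and the actual trained weights are those given in Annex 4:

    #include <cmath>

    // Hedged sketch of a feed-forward distance estimate from three normalized
    // inputs (ToA, RSSI, delay). Layer size and activation are assumptions.
    static const int N_IN = 3, N_HIDDEN = 4;

    static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    double estimateDistance(const double in[N_IN],
                            const double W1[N_HIDDEN][N_IN], const double b1[N_HIDDEN],
                            const double W2[N_HIDDEN], double b2)
    {
        double hidden[N_HIDDEN];
        for (int h = 0; h < N_HIDDEN; h++) {         // input -> hidden layer
            double s = b1[h];
            for (int i = 0; i < N_IN; i++)
                s += W1[h][i] * in[i];
            hidden[h] = sigmoid(s);
        }
        double out = b2;                             // hidden -> output neuron
        for (int h = 0; h < N_HIDDEN; h++)
            out += W2[h] * hidden[h];
        return out;                                  // normalized distance estimate
    }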
4.7.2 Static node application layer

Finally, the Customized_AppLayer_Node.cc principal components are depicted in Figure 4.14 and its C++ code is provided in Annex 5. This class also extends the BaseLayer.cc Mixim class. It contains the network configuration parameters: PacketInterArrivalTime, which defines the interval at which packets are sent, and PacketPayloadSize, which configures the size of the packet. Listener_NeighborWakeUp and Listener_NeighborSleep are subscribers of the corresponding signals emitted at the MDC application layer.
[Class diagram: Customized_AppLayer_Node extends BaseLayer (gates and the initialize(), finish(), handleSelfMsg() and sendDown() operations) and uses the MacToPhyInterface through setRadioState(). Its members include startSending, packetid, packetInterArrivalTime, the cMessage_timeout_PacketInterArrivalTime self-message, PacketPayloadSize, the LAddress_SourceAddr and LAddress_DestinationAddr addresses, and the Listener_NeighborWakeUp and Listener_NeighborSleep subscribers (cListener subclasses storing the neighbor LAddress); its functions are generatePacket(), forwardPacket(), wakeUp() and Sleep().]
Figure 4.14 Class diagram of the static node application layer
Four primary functions are found in Customized_AppLayer_Node.cc: generatePacket(), which generates a new packet when the self-message cMessage_timeout_PacketInterArrivalTime expires; forwardPacket(), which sends the generated packet down to the lower layers; and the wakeUp() and Sleep() functions, which set the radio state to TX (transmitting) or SLEEP mode through the setRadioState() function of the extended class MacToPhyInterface.cc.
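As a rough sketch of how these pieces fit together (simplified with respect to the code in Annex 5: the other BaseLayer handlers are omitted, as is the radio switching in wakeUp()/Sleep(), since the exact radio-state constants depend on the Mixim version):

    #include "BaseLayer.h"
    #include "ApplPkt_m.h"
    #include "SimpleAddress.h"

    // Hedged sketch of the static node application layer described above.
    class Customized_AppLayer_Node_Sketch : public BaseLayer {
      protected:
        cMessage *cMessage_timeout_PacketInterArrivalTime;
        double packetInterArrivalTime;   // seconds between generated packets
        int PacketPayloadSize;           // configured packet size in bytes
        LAddress::L3Type LAddress_SourceAddr, LAddress_DestinationAddr;

        virtual void handleSelfMsg(cMessage *msg) {
            if (msg == cMessage_timeout_PacketInterArrivalTime) {
                generatePacket();        // timer expired: build a new packet
                scheduleAt(simTime() + packetInterArrivalTime, msg);  // re-arm
            }
        }
        void generatePacket() {
            ApplPkt *pkt = new ApplPkt("data");
            pkt->setByteLength(PacketPayloadSize);
            pkt->setSrcAddr(LAddress_SourceAddr);        // identifies this node
            pkt->setDestAddr(LAddress_DestinationAddr);  // the MDC address
            forwardPacket(pkt);
        }
        void forwardPacket(ApplPkt *pkt) {
            sendDown(pkt);               // BaseLayer gate towards the MAC layer
        }
    };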
4.8 Discovery process

The discovery process is described by the finite state machine in Figure 4.15. The MDC continuously receives packets from the static senders and puts the contiguous neighbors in sleep or transmitting mode according to the predicted distance.
[Finite state machine: the sending node moves between the Sleep and Transmitting (Tx) states while the MDC remains in the Receiving (Rx) state; the transitions are triggered by the simSignal_neighborSleep and simSignal_neighborWakeUp signals emitted upon packet reception.]
Figure 4.15 Finite state machine for the discovery process
Therefore, the discovery routine consists of classifying predicted distances into two categories: a single transmission range and a critical transmission range. Only one node transmits within a single transmission range, whereas two or three nodes may transmit in a critical range. Figure 4.16 shows the general view of the proposed discovery system.
Figure 4.16 Transmission synchronization of the discovery prototype
In Figure 4.16, circles represent the transmission range of 35 meters; the blue point represents the MDC, which establishes communication with the static nodes A, B, and C (the central points of the circles). The MDC identifies each static node by the LAddress_SourceAddr attached to every sent packet. The left image shows an MDC moving in A's single transmission range (dark green area). A critical area is depicted in the middle illustration: the mobile element has already detected a critical distance and emits two signals, a simSignal_neighborWakeUp to B and a simSignal_neighborSleep to A. The right image shows the final step, in which the previous sender A is put in SLEEP mode and the next sender B starts to transmit.
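A compact sketch of this decision logic follows; the CRITICAL_DISTANCE threshold and the nextNeighborOf() lookup are illustrative assumptions, and node addresses are emitted as long values for simplicity (the actual routine is part of ANN_Discovery(), Annex 4):

    #include <omnetpp.h>

    static const double CRITICAL_DISTANCE = 30.0;  // meters, illustrative value

    // Hedged sketch of the critical-area decision in the MDC application layer.
    class MDCDiscovery_Sketch : public cSimpleModule {
      protected:
        simsignal_t simSignal_NeighborWakeUp, simSignal_NeighborSleep;

        long nextNeighborOf(long senderAddr);      // hypothetical grid lookup

        void onDistanceEstimated(long senderAddr, double estimatedDistance) {
            if (estimatedDistance >= CRITICAL_DISTANCE) {
                emit(simSignal_NeighborWakeUp, nextNeighborOf(senderAddr));
                emit(simSignal_NeighborSleep, senderAddr);  // silence current sender
            }
        }
    };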
4.9 Simulation experiments

The experimental data contains 8'064 network configuration instances (excluding distances). These data consist of combinations of the period, packet length, queue size, maximal number of retransmissions, retry delay, and transmission power parameters. Of these, only 5'966 configurations yield successful packet deliveries. These data are grouped into representative clusters by the Clustering subsystem in order to reduce the number of experiments.
The Clustering subsystem runs a local application, ClusterApp, which implements an agglomerative hierarchical clustering algorithm based on the scipy.cluster.hierarchy Python library. Because instances are composed of six attributes, cosine distance is applied. The application obtains two clusters with the UPGMA linkage method; the complete clustering analysis is presented in Annex 6. The network configurations of Cluster 0 and Cluster 1 are depicted in Table 4.1 and Table 4.2 respectively.
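For reference, the cosine distance between two six-attribute configuration vectors u and v, and the UPGMA (average linkage) distance between two clusters A and B, are the standard definitions:

\[ d_{\cos}(u,v) = 1 - \frac{\sum_{i=1}^{6} u_i v_i}{\sqrt{\sum_{i=1}^{6} u_i^2}\,\sqrt{\sum_{i=1}^{6} v_i^2}}, \qquad D(A,B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d_{\cos}(a,b) \]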
Table 4.1 Network configuration 1 (Cluster 0)

Parameters (Dataset)   WSN_prototype parameters   Implemented in                Value   Unit
period                 packetsInterArrivalTime    Customized_AppLayer_Node.cc   28      ms
packet_length          packetPayloadSize          Customized_AppLayer_Node.cc   72      bytes
queue_size             queueLength                CSMA.cc                       12      packets
retry_delay            macACKWaitDuration         CSMA.cc                       44      ms
Max_tries              macMaxFrameRetries         CSMA.cc                       3       packets
TxP_level              TxPower                    CSMA.cc                       19      mW
Acknowledgement        useMacAcks                 CSMA.cc                       true    -
TxP Maximum            MaxTXPower                 Customized_PhyLayer.cc        19      mW
Table 4.2 Network configuration 2 (Cluster 1)

Parameters (Dataset)   WSN_prototype parameters   Implemented in                Value   Unit
period                 packetsInterArrivalTime    Customized_AppLayer_Node.cc   28      ms
packet_length          packetPayloadSize          Customized_AppLayer_Node.cc   55      bytes
queue_size             queueLength                CSMA.cc                       51      packets
retry_delay            macACKWaitDuration         CSMA.cc                       44      ms
Max_tries              macMaxFrameRetries         CSMA.cc                       3       packets
TxP_level              TxPower                    CSMA.cc                       19      mW
Acknowledgement        useMacAcks                 CSMA.cc                       true    -
TxP Maximum            MaxTXPower                 Customized_PhyLayer.cc        19      mW
Notice that the additional parameter useMacAcks is needed to enable acknowledgment frames for packet reception, and that MaxTXPower sets the maximum transmission range of 35 meters.
Each node has an incorporated mobility module which updates its position at an interval defined by the user. The simulation model uses two mobility modules: the stationaryMobility Mixim module for the static nodes, and the MassMobility Mixim module for the MDC. The latter offers 2D mobility with random motion. MassMobility contains the following parameters: changeInterval, which defines the frequency of trajectory changes; changeAngleBy, which determines the direction of the node; speed, which sets the velocity of the node; and updateInterval, which defines how often the node's position is updated. To define the motion behavior of the MDC, the changeInterval value is set to 5'000 seconds, the changeAngleBy value to forty-five degrees, and three speeds are defined: a minimum speed of 0.2 mps, a human preferred walking speed of 1.4 mps (Browning, Baker, Herron, & Kram, 2006), and a standard car speed of 18.8 mps (Mehar, Chandra, & Velmurugan, 2013). All configuration parameters are defined in an Omnet INI file under the specific network WSN-ME_Network.
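A hedged sketch of the corresponding INI entries follows; the module path (mdc) and the updateInterval value are illustrative, and the exact parameter for selecting a mobility module depends on the Mixim version:

    [General]
    network = WSN-ME_Network

    # MDC: 2D random motion (parameter values taken from the text above)
    **.mdc.mobility.changeInterval = 5000s
    **.mdc.mobility.changeAngleBy  = 45deg
    **.mdc.mobility.speed          = 0.2mps   # also run with 1.4mps and 18.8mps
    **.mdc.mobility.updateInterval = 0.1s     # illustrative value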
Every experiment runs in batch mode.
5 RESULTS

5.1 Variable importance

The results of the RandomForest.py application are presented in Figure 5.1.
Figure 5.1 Variable importance results
The most relevant predictors for distance estimation are the ToA, followed by the RSSI and delay metrics. The LQI metric is positioned at fifth place, after noise. All the other metrics are considered irrelevant in this study.
5.2 Simulation results

The true and predicted distances were collected in a time series plot to analyze the mobility behavior of the MDC. To evaluate the accuracy of the discovery prototype, the means of the error predictions and of the true distances are calculated in intervals of 5 meters, and the standard deviation of the error prediction is also computed to define a 95% confidence interval. Finally, a complementary analysis of 5 additional experiments at different speeds is presented at the end of this section in order to determine the influence of speed on distance recognition. Because the experiments of network configurations one and two are similar, only the results of configuration one are presented in this section. The results for configuration two are available in Annex 7.
5.2.1 Mobility behavior of the MDC

Figure 5.2 shows the mobility behavior of the MDC moving at 0.2 mps. The predicted distances are depicted as red points and the blue points represent the true distances.
Figure 5.2 Mobility behavior of a MDC moving at 0.2 mps
The estimated distances decrease whenever the MDC gets close to the static nodes (distance 0) and reach high values when the MDC enters a critical area. Figure 5.2 also shows that transmissions are absent when the MDC enters a critical transmission area; at this point, the MDC executes the discovery routine based on Omnet signals. Figure 5.3 presents an amplified view of the transmissions that occurred between seconds 2'650 and 3'150 of the simulation run.
Figure 5.3 Mobility behavior of a MDC moving at 0.2 mps (Amplified view)
Figure 5.4 and Figure 5.5 show the mobility behavior of a MDC moving at 1.4 mps and 18.8 mps respectively.
Figure 5.4 Mobility behavior of a MDC moving at 1.4 mps
Figure 5.5 Mobility behavior of a MDC moving at 18.8 mps
5.2.2 Distance recognition accuracy

Figure 5.6 shows the distance recognition accuracy of the MDC moving at 0.2 mps.

[Bar chart: mean true distance and mean error prediction over 5-meter interval samples, from (0 - 5) to [30 - 35).]
Figure 5.6 Distance recognition accuracy of a MDC moving at 0.2 mps
The mean of the error prediction tends to zero in all the intervals and the standard deviations vary in a range of 4.4 to 5.22 meters. This implies that the true value of the distance falls, with a confidence of 95%, in the interval [estimated_distance - 10.45 meters, estimated_distance + 10.45 meters]. Figure 5.7 shows the distance recognition accuracy of the MDC moving at 1.4 mps.

[Bar chart: mean true distance and mean error prediction over 5-meter interval samples.]
Figure 5.7 Distance recognition accuracy of a MDC moving at 1.4 mps
Similar to the previous results, the mean of the error prediction tends to zero in all the intervals, but the standard deviations vary in a range of 9.58 to 10.14 meters. This implies that the true value of the distance falls, with a confidence of 95%, in the interval [estimated_distance - 20.29 meters, estimated_distance + 20.29 meters]. Finally, the results of a MDC moving at 18.8 mps are presented in Figure 5.8.

[Bar chart: mean true distance and mean error prediction over 5-meter interval samples.]
Figure 5.8 Distance recognition accuracy of a MDC moving at 18.8 mps
For a MDC moving at 18.8 mps, the mean of the error prediction tends to zero in all the intervals and the standard deviations vary in a range of 12.37 to 13.87 meters. This implies that the true value of the distance falls, with a confidence of 95%, in the interval [estimated_distance - 27.75 meters, estimated_distance + 27.75 meters].
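Across the three experiments, the reported 95% intervals are consistent (up to rounding) with a normal approximation of the prediction error, taking twice the largest standard deviation observed in each experiment:

\[ \hat{d} \pm 2\,\sigma_{\max}: \qquad 2 \times 5.22 \approx 10.45\ \mathrm{m}, \qquad 2 \times 10.14 \approx 20.29\ \mathrm{m}, \qquad 2 \times 13.87 \approx 27.75\ \mathrm{m} \]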
5.2.3 Error prediction analysis

Figure 5.9 shows the standard deviation of the error prediction for every experiment. The analysis includes 5 additional experiments with a MDC moving at different velocities. The shape of the graph suggests that the increase of the standard deviation due to velocity augmentation is bounded above.

[Scatter plot: standard deviation of the error prediction (0 to 16 meters) against speed (0 to 30 mps).]
Figure 5.9 Standard deviation of the error prediction at different velocities
6 DISCUSSION

The variable importance analysis confirms the relevance of RSSI and ToA for distance estimation. The analysis also highlights the importance of latency, which can be explained by its close relation to the ToA metric. Additionally, the results suggest that new ranging techniques should consider noise as a potential metric for distance estimation. On the other hand, the results presented here appear to support the assumption that LQI is an overstated metric, because it ranks after noise, a variable almost ignored by the research community.
Additionally, the results obtained from the simulation experiments suggest that ranging techniques based on ToA, RSSI and delay need more accurate and complex machine learning models. The discovery prototype is inaccurate compared to other techniques based on a single parameter. However, the prototype in this study could improve if additional metrics such as noise floor and LQI were integrated into more sophisticated ANN models.
Furthermore, one can also argue that the speed of mobile data collectors plays an important role in distance estimation. In Figure 5.9, a large variation of the deviation statistics for error estimations occurs between 0.2 and 1.4 mps. This suggests that one of the biggest challenges for discovery protocols in grid topologies is random mobility, since a MDC may enter a transmission range at arbitrary speeds. Finally, the results of network configuration two suggest that network configuration parameters do not play an important role in ranging techniques.
CONCLUSION

Six experiments were performed to assess the distance estimation accuracy of the discovery prototype based on ANN models. The prototype was built with Omnet signals which retrieve three distance predictors: the RSSI, ToA and delay metrics. The relevance of these metrics was estimated with a variable importance study of eight WSN metrics: delay, buffer overflow, arrival time, RSSI, retransmissions, queue size, noise floor, and LQI. This study shows that delay and ToA are related, and that RSSI is an important predictor regardless of the controversy about its importance in the research community. It is observed that uncommon metrics such as noise could support the development of new multivariable ranging techniques; finally, it is not recommended to use the LQI metric as a unique estimator.
Simulation experiments also highlight the importance of speed in discovery routines and ranging techniques in WSN-ME. Future work based on neural networks should consider additional WSN metrics and more sophisticated optimization methods than simple backpropagation and gradient descent.
REFERENCES
Anastasi, G., Borgia, E., Conti, M., & Gregori, E. (2010). A Hybrid Adaptive Protocol for Reliable Data Delivery in WSNs with Multiple Mobile Sinks. The Computer Journal, to appear.
Anastasi, G., Conti, M., & Di Francesco, M. (2009). Reliable and energy-efficient data collection in sparse sensor networks with mobile elements. Performance Evaluation, 66(12), 791–810.
Abdelzaher, T., Anokwa, Y., Boda, P., Burke, J., Estrin, D., Guibas, L., Kansal, A., Madden, S., & Reich, J. (2007). Mobiscopes for human spaces. IEEE Pervasive Computing, 6(2), 20–29.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford: Oxford University Press.
Breiman, L. (1984). Classification and Regression Trees. Wadsworth International Group.
Breiman, L. (1996). Bagging Predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1023/A:1018054314350
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Browning, R. C., Baker, E. A., Herron, J. A., & Kram, R. (2006). Effects of obesity and sex on the energetic cost and preferred speed of walking. Journal of Applied Physiology, 100(2), 390–398. https://doi.org/10.1152/japplphysiol.00767.2005
Dargie, W., & Poellabauer, C. (2010). Fundamentals of Wireless Sensor Networks: Theory and Practice. John Wiley & Sons.
Denkovski, D., Mateska, A., & Gavrilovska, L. (2010). Extension of the WSN Lifetime through Controlled Mobility. In Proceedings of the Seventh IEEE International Conference on Wireless On-Demand Network Systems and Services (WONS), Kranjska Gora, Slovenia, 3–5 February 2010, pp. 151–156.
Di Francesco, M., Das, S. K., & Anastasi, G. (2011). Data collection in wireless sensor networks with mobile elements: A survey. ACM Transactions on Sensor Networks (TOSN), 8(1), 7.
Fausett, L. (1994). Fundamentals of Neural Networks. Englewood Cliffs, NJ: Prentice Hall.
Haykin, S. S. (2009). Neural Networks and Learning Machines. Prentice Hall.
Hecht-Nielsen, R. (1987). Kolmogorov's Mapping Neural Network Existence Theorem. In Proceedings of the 1987 IEEE International Conference on Neural Networks, IEEE Press, New York, III, 11–13.
International Electrotechnical Commission (2014). Internet of Things: Wireless Sensor Networks. White paper.
Jun, H., Zhao, W., Ammar, M., Zegura, E., & Lee, C. (2005). Trading latency for energy in wireless ad hoc networks using message ferrying. In Proceedings of the 1st IEEE Workshop on Pervasive Wireless Networking (PWN 2005), 220–225.
Kansal, A., Somasundara, A., Jea, D., Srivastava, M., & Estrin, D. (2004). Intelligent fluid infrastructure for embedded networks. In Proceedings of the 2nd ACM International Conference on Mobile Systems, Applications, and Services (MobiSys 2004), 111–124.
Khan, M. I., Gansterer, W. N., & Haring, G. (2007). Congestion avoidance and energy efficient routing protocol for wireless sensor networks with a mobile sink. Journal of Networks, 2, 42–49.
Kolmogorov, A. N. (1957). On the Representation of Continuous Functions of Many Variables by Superposition of Continuous Functions of One Variable and Addition. Doklady Akademii Nauk SSSR, 144, 679–681. American Mathematical Society Translation, 28, 55–59 [1963].
Kriesel, D. (2007). A Brief Introduction to Neural Networks. Available at http://www.dkriesel.com
Konomi, M., & Sacha, G. M. (2014). Influence of the learning method in the performance of feedforward neural networks when the activity of neurons is modified. ArXiv preprint arXiv:1404.5144. Retrieved from https://arxiv.org/abs/1404.5144
Louppe, G., Wehenkel, L., Sutera, A., & Geurts, P. (2013). Understanding variable importances in forests of randomized trees. In Advances in Neural Information Processing Systems (pp. 431–439).
Maimon, O., & Rokach, L. (2010). Data Mining and Knowledge Discovery Handbook. Springer Science & Business Media.
McCord-Nelson, M., & Illingworth, W. T. (1991). A Practical Guide to Neural Nets. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.
McCulloch, W. S., & Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5.
Mehar, A., Chandra, S., & Velmurugan, S. (2013). Speed and acceleration characteristics of different types of vehicles on multi-lane highways. Trasporti Europei (Online), (55). Retrieved from http://www.istiee.org/te/papers/n55/et_2013_55_1_mehar.pdf
Norusis, M., & SPSS, I. (2011). IBM SPSS Statistics 19 Advanced Statistical Procedures Companion. Prentice Hall.
Pham, D. T., & Afify, A. A. (2006). Engineering applications of clustering techniques. Intelligent Production Machines and Systems, pp. 326–331.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Rajaraman, A., & Ullman, J. D. (2011). Mining of Massive Datasets. Cambridge University Press.
Rao, J., Wu, T., & Biswas, S. (2008). Network-assisted sink navigation protocols for data harvesting in sensor networks. In Proceedings of the 2008 IEEE Conference on Wireless Communications and Networking (WCNC 2008), 2887–2892.
Reed, R. D., & Marks, R. J., II (1999). Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. Cambridge, MA: The MIT Press. ISBN 0-262-18190-8.
Rojas, R. (2013). Neural Networks: A Systematic Introduction. Springer Science & Business Media.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1 (pp. 318–362). Cambridge, MA, USA: MIT Press. Retrieved from http://dl.acm.org/citation.cfm?id=104279.104293
Sharma, D., Verma, S., & Sharma, K. (2013). Network Topologies in Wireless Sensor Networks: A Review. International Journal of Electronics & Communication Technology (April–June), 93–97.
Sokal, R. R., & Michener, C. D. (1958). A Statistical Method for Evaluating Systematic Relationships. University of Kansas.
Tian, H., Shen, H., & Matsuzawa, T. (2005). Developing energy-efficient topologies and routing for wireless sensor networks. In IFIP International Conference on Network and Parallel Computing (pp. 461–469). Springer. Retrieved from http://link.springer.com/chapter/10.1007/11577188_66
Wang, Z. M., Basagni, S., Melachrinoudis, E., & Petrioli, C. (2005). Exploiting sink mobility for maximizing sensor networks lifetime. In Proceedings of the 38th Hawaii International Conference on System Sciences (HICSS 2005).
Werbos, P. J. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University.
Werbos, P. J. (1994). The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. Wiley Interscience.
Yang, T., Yang, Q., & Cheng, L. (2015). Experimental Study: A LQI-based Ranging Technique in ZigBee Sensor Networks. International Journal of Sensor Networks, 19(2), 130–138. https://doi.org/10.1504/IJSNET.2015.071634
Zhang, Z., Zhao, H., Zhu, J., & Li, D. (2010). Research on Wireless Sensor Networks Topology Models. Journal of Software Engineering and Applications, 03(12), 1167–1171. https://doi.org/10.4236/jsea.2010.312137