ScienceDirect Empirical Model Development for ...

Available online at www.sciencedirect.com

ScienceDirect Procedia Computer Science 36 (2014) 353 – 358

Complex Adaptive Systems, Publication 4 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science and Technology 2014- Philadelphia, PA

Empirical Model Development for Message Delay and Drop in Wireless Sensor Networks Gursel Serpen* and Zhenning Gao Electrical Engineering and Computer Science, University of Toledo, Toledo, Ohio 43606, USA

Abstract Simulation of wireless sensor networks with very large number of motes poses significant challenges with respect to computational complexity. Application level code prototyping with reasonable accuracy and fidelity can however be accomplished through simulation that models only the effects of the wireless and distributed computations which materialize as delay and drop for the messages being exchanged among the motes. This study pursues that idea of empirical modelling of delay and drop and employs such a model to affect the reception times of wirelessly communicated messages. The delay and drop is therefore, modelled as random variables with probability distributions empirically approximated based on the data reported in the literature. The paper concludes with a case study that employs the proposed empirical delay and drop models for multilayer perceptron neural networks distributed across a wireless sensor network for a classification task on the Isolet dataset. © 2014 B.V. 2014 The TheAuthors. Authors.Published PublishedbybyElsevier Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection and peer-review under responsibility of scientific committee of Missouri University of Science and Technology.

Peer-review under responsibility of scientific committee of Missouri University of Science and Technology Keywords: Wireless sensor network application; simulation; message packet drop and delay; artificial neural networks; empirical probability distribution model

1. Introduction Simulation of wireless sensor networks (WSN) may incur very high computation cost depending on the size of the sensor network in terms of the mote count and the detail level of simulation such as hardware emulation, bitlevel simulation, packet-level simulation or application-layer simulation. Determining the appropriate level of simulation is of paramount importance to manage the computational complexity for establishing the feasibility of a simulation. Simulators that are intended for bit or packet level emulation or simulation are unnecessarily detailed for prototyping distributed application-level code. Therefore, it is desirable to develop simulation frameworks or tools for distributed applications to maintain the balance between good accuracy and reasonable computation cost.

* Corresponding author. Tel.: 1-419-530-8158; fax: 1-419-530-8140. E-mail address: gursel dot serpen at utoledo dot edu

1877-0509 © 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Peer-review under responsibility of scientific committee of Missouri University of Science and Technology doi:10.1016/j.procs.2014.09.005

354

Gursel Serpen and Zhenning Gao / Procedia Computer Science 36 (2014) 353 – 358

WSN simulators can be classified into three major categories based on the level of complexity: bit level, packet level and algorithm level. As the complexity goes up such as for a bit-level emulation, the time and memory consumption of the simulation grows. It is desirable to select the level of simulation based on the rigor requirements of the experiment. For instance, a timing-sensitive MAC protocol would probably need a bit level simulation while an algorithm level simulation is sufficient to test the prototype developed for an agriculture management application. Bit-level simulators model the CPU execution at the level of instructions or even cycles, and they are often regarded as emulators. Examples are TOSSIM [1], which is a both bit and package level simulator for detailed simulation of TinyOS-based motes, and Avrora [2] which is another bit-level simulator that is open source and built using the Java programming language. Packet level simulators implement the data link and physical layers in a typical OSI network stack. The most widely used simulator is ns-2 [3], which is an object-oriented discrete event network simulator built in C++. A second example OPNET [4] is a commercial simulator, which provides a simulation environment for a variety of networking environments. Algorithm level simulators focus on the logic, data structure and presentation of algorithms. They do not consider detailed communication models. For instance, Shawn [5] is a simulator implemented in C++, which has its own application development model or framework based on so-called processors. The development of Shawn is predicated on the premise that there is no (or barely any) difference between a complete simulation of the physical environment (or lower-level networking protocols) and the alternative approach of simply using well-chosen random distributions on message delay and drop for algorithm design on a higher level, such as localization algorithms. Shawn simulates the effects caused by a phenomenon instead of the phenomenon itself. For example, instead of simulating a complete MAC layer including the radio propagation model, its effects (packet drop and delay) are modeled by Shawn. Therefore, the simulation time and the computation effort through Shawn is significantly reduced compared to those of other simulators. Although Shawn’s simulation philosophy is appropriate for application-level code development, being a generic simulator to accommodate a comprehensive set of simulation cases for a large selection of applications, it suffers to some degree from overhead due to its generic nature. Much of this overhead can be eliminated if the philosophy of simulating the “effects rather than the phenomenon itself” is implemented directly into the algorithm of the application that needs to be simulated for distributed computation on a wireless sensor network. The proposed approach that trades generic nature of Shawn with more efficient domain-specific implementation would mainly entail the following steps. Any given application algorithm that is to be implemented through distributed computation would first be partitioned into subparts as in functional decomposition, and the communication interface among these subparts would be identified. Consequently, the messaging requirements are established and delay or drop is injected into the communication pathways through which packets carry those messages. Accordingly, for the purposes of verification and validation of the application layer code, modeling of effects of wireless communication as “delay and drop” is promising to deliver the appropriate level of accuracy and computational cost for simulations. 2. Reduced Complexity Simulator Model This section presents the empirical models for message packet delay and drop phenomena in wireless sensor networks (WSN). The probability of packet drop or delay during wireless transmission in WSNs is highly dependent on the specific implementation of the network and its protocol stack. There are many factors at play, such as the topology of a network, routing and MAC protocols, network traffic load, etc. It is not desirable to have the model for the probability distribution for drop or delay limited to a certain scenario (using certain protocols, set for a number of nodes, or set for a topology, etc.) since the results of such a study would not be applicable in general terms. The model to be developed instead should be generalized enough to be applicable for the widest variety of WSN realizations, implementations and applications possible. One readily available option to develop or formulate a model for packet delay and drop in general terms is to leverage the empirical data reported in the literature, which is the venue pursued in our study. We conducted a survey to compile the empirical data of delay and drop reported in the literature [6]. In the process, we studied the simulation scenarios and compiled a record of the simulation settings and results. The simulation settings included routing protocols, MAC protocols, simulator type, number of motes, field size, radio range and other settings (traffic, source count, dead node count etc.). The simulations of interest reported the delivery rate (which is related to the probability of drop) and delay, which were extracted from tables and figures in the surveyed literature.

355


The packet delivery ratio or rate that was recorded in each study [6] is considered as the main variable of interest. Denoting the packet delivery ratio as p delivery , the probability of drop, p drop , can be calculated through p drop =1-p delivery . Specific values for the packet delivery ratio versus node count for a number of WSN topologies and protocol stack implementations, which were used as the data to build the empirical model for the probability of drop, are retrieved from the same studies cited herein. The data points are chosen based on the following specifications: x The node count is one of the primary independent variables, which means the data is collected for different node count values. x The density of node distribution within the WSN topology will stay “approximately” the same although the node count may vary. This means that the area of deployment for the network or the transmission range should change to keep the node distribution density the same. x Other factors such as the changing network traffic load or the static or time-varying percentage of dead nodes could not be considered due to lack of sufficient empirical data, relevant simulations or experiments. Establishing the above specifications is intended to ensure that packet drop probability is fundamentally affected by the number of transmission hops only, which is assumed to approximate the distance between the sender and receiver mote pair under the assumption that the motes are uniformly distributed across the deployment field. The relationship between the probability of drop and mote count for a variety of routing protocols is reported in a number of studies in the literature [6]. The routing protocols included QoS [9], Speed [9], GBR [7], LAR [10], LBAR [10], AODVjr [10], BVR [8], DD [11], and EAR [7]. Denoting the node count in a WSN as n nodes , analysis of empirical data shows that p delivery decreases when the value of n nodes increases. The relationship appears to be linear in general. Since these data are due to specific experiments, in order to generalize, it may not be a good idea to make the model fit the data precisely. Therefore, the linear regression (versus a polynomial) for fitting these data points is a reasonable option. Then the resultant empirical model is given by pdrop 1 pdelivery 1 [( E 0 E1 u nnodes ) / 100] , where coefficients E 0 and E 1 are real numbers for the linear model and n nodes ࣅ>@. The coefficients E 0 and E 1 calculated for each routing protocol case are shown in Table 1. Table 1. Linear regression model coefficient values for p drop Routing protocol

E0 E1

EAR 100.82 -0.0107

GBR 94.50 -0.1130

BVR 94.44 -0.0760

QoS 97.00 -0.0980

Speed 97.40 -0.0840

LBAR 95.79 -0.0198

LAR 92.57 -0.0154

AODVjr 90.57 -0.0154

DD 89.60 -0.0440

The number of hops can be used as the primary factor affecting the probability of drop. For a two-dimensional deployment topology for a WSN, let n hops denote the average hop count between a source and a destination mote pair. Defining the p drop in terms of n hops is of interest. Given that we know the value for n nodes , which is the number of motes in the WSN, a relationship between n nodes and n hops needs to be derived for a given specific deployment topology. For instance, n nodes nodes may be assumed to be uniformly randomly distributed across a square deployment area. Consequently, the number of motes along any edges will be approximately ¥n nodes . Assume the sink mote is located at the center, while the source node is close to one of the corners. Average hop count for any given message, n hops , can be approximated by the length of the diagonal in terms of number of hops divided by two. Then the relationship between n nodes and n hops is given by n nodes =2(n hops )2. In the worst case for a source and sink mote pair where the motes are located at the end points of a given diagonal, the relationship between these two variables becomes n nodes =(n hops )2/2. Consequently, the hop count values are bounded as follows: 1n hops ¥ n nodes ). The square topology assumed for the above analysis is a reasonable approximation for many of the deployment realizations. If necessary, other topologies can also be readily analyzed following a similar approach. In more general terms, the relationship between n nodes and n hops can be represented as n nodes =W×n hops , where W is the coefficient whose value is positive and will vary based on a number of WSN-related parameter settings including the shape of the topology and the density of mote deployment. In the linear regression curves obtained for each routing protocol earlier, the coefficients E 0 and E 1 and the parameter n nodes values are substituted to yield the empirical models shown in Table 1. The calculations of W are done based on a specific topology implemented in the literature. When any of these models is employed in a simulation study, specific E 0 and E 1 values can be generated based on how “similar” the routing protocol to one of the given in Table 1.

356


Delay is an inherent property of WSN operation with respect to wireless communications. As any literature survey will indicate, the length of delay varies for different implementations (such as variations in number of nodes, routing protocol, MAC protocol, packet length, traffic load etc.). Figure 1 shows the histogram for the delay based on the literature survey data. In a real scenario where one or more neurons are embedded into a given mote, the exchange of neuron outputs among motes will be subject to certain delay that is inherent in wireless communications. This delay, which will dictate the duration of a waiting period by a given neuron for its inputs to arrive from other neurons on other motes is not a fixed value but rather a random variable. The delay-induced wait time will be denoted as t wait . Note that t wait is both application dependent and network dependent: its value was found to vary from 10 ms to 3000 ms per the literature survey. For a specific network, the delay between for a pair of motes varies substantially from one pair to another, and even for the same pair of motes the delay variance is significant. Additionally, the maximum delay could be much larger than the mean delay. In simulating a neural network embedded across motes of a wireless sensor network, the t wait is set according to the mean delay value and the specific network topology to make sure that a good number of inputs successfully arrive for any given neuron to be able to calculate its own output. Per the literature, a specific delay distribution is highly dependent on many factors such as the MAC protocol, traffic, queue capacity, channel quality, back-off time setting in MAC protocol, etc. [12,13,14,15, and 16]. It is impossible to get a highly accurate model of delay distribution considering so many factors play a role in affecting its value. A reasonably good but approximate model however can be formulated by using the Gaussian distribution which has been used in the literature to model the delay distribution [13, 14], and has also been shown to be relevant in other literature studies [16, 17 and 18]. Empirical evidence suggests that the delay distribution is truncated and heavytailed [16, 18]. Consequently, a truncated Figure 1. Histogram of Delay for Empirical Data Reported in Literature Gaussian distribution for modeling the delay variance can be employed. The truncated Gaussian distribution is the probability distribution of a normally distributed random variable whose value is bounded. Suppose xaN(P,V2) has a normal distribution and lies within a range of (a, b), then x conditional on a