The automatic evolution of distributed controllers to configure sensor network operation

Andrew Markham and Niki Trigoni
Oxford University Computing Laboratory
Email: [email protected]

Tuning the parameters that control the operation of a wireless sensor network, such as sampling rate, is not a simple task. This is partly due to the distributed nature of the problem, but is also a result of the time-varying dynamics that a network experiences. Inspired by the way in which cells alter their behaviour in response to diffused protein concentrations, an abstract representation, termed a discrete Gene Regulatory Network (dGRN), is introduced. Each node runs an identical dGRN controller which controls node activity and interaction. The controllers are authored automatically using an evolutionary algorithm. The communication that occurs between nodes is neither specified nor designed, but emerges naturally. As a particular example, we illustrate that our approach can generate effective strategies for nodes to cooperatively track a moving target. The obtained strategies vary according to the user's accuracy requirements and the speed of the target, and are similar to those which would be expected from a network engineer. We also present results from our proof-of-concept dGRN implementation on T-Mote Sky nodes. Our approach takes high level user application requirements and from these, automatically generates distributed parameter tuning algorithms. The dGRN framework thus greatly reduces the amount of effort involved in adjusting a sensor network's operation.

Keywords: Configuration; Genetic Algorithm; Target Tracking; Wireless Sensor Network; Distributed Control

Received xxx; accepted xxx

1. INTRODUCTION

The functionality of a sensor network is determined by a number of parameters acting at various layers. The radio duty cycle at the medium access control (MAC) layer, the frequency of neighbour discovery beacons at the network layer, and the sampling rate at the application layer all affect the ability of a network to meet user-specified requirements for accuracy, latency and energy usage. Tuning these parameters is a complex task, especially if one tries to exploit opportunities for cross-layer optimization. Expert knowledge is needed to design code that exhibits the correct amount of communication between nodes such that they can collaboratively configure their parameters. The tuning problem is compounded by the need for this code to be adaptive to changes in network conditions, and to dynamics in the sensed phenomenon. This paper presents a novel method of automatically designing a controller which configures node parameters to satisfy the requisite application objectives. Consider, for example, a target tracking application,

where fixed nodes are equipped with sensors that can register the presence of the target and its speed. The user of the application specifies certain constraints that must be met, for example, that the target must be monitored by at least one sensor for 90% of the time. The objective is to meet the constraints whilst minimizing the average sampling rate, i.e. the frequency at which nodes power their sensors to search for a target. According to the user, each node should tune its sampling rate (actuator variable) by taking into account its remaining energy and the most recent reading about target presence and speed (sensor variables). The question is: how is this best done? A centralized optimization approach, which finds the best configuration parameters for each node, can be utilized. However, centralized solutions present a single point of failure and also incur a high communication overhead. The alternative is for an expert to design a distributed algorithm for tuning parameters based on each node's knowledge of its local state and interactions with neighbours. Designing a distributed algorithm that achieves good global performance is a non-trivial


task that generally requires expert knowledge and takes many design-test-rewrite cycles. However, the distributed method is preferable to the centralized method in that it is scalable and can dynamically adapt to changes in the sensor environment. We take a step back from this and examine what is actually required in Wireless Sensor Network (WSN) decision making. In essence, we require a controller (black box module) that consumes sensor variables (which can also include inputs from peers) and, based on these and its current state, decides how to alter the values of actuator variables. In order to simplify the task of the network engineer, it would be desirable to design the structure of this controller automatically to meet global goals. To accomplish this, we turn to Nature for inspiration. Consider a simple multicellular organism. It is able to perform tasks such as climbing a chemical gradient (chemotaxis) or reacting to external stimuli such as the presence of light. The organism is able to coordinate the actions of multiple cells without requiring a central controller, and it does so by using protein exchange. Proteins can be regarded as chemical messengers which act within the cell and can also diffuse to neighbouring cells. Proteins are produced by genes in the nucleus of the cell, and genes can be viewed as rules that take as input the current protein concentrations and control the further production or suppression of proteins in a time dynamic manner. This mechanism is referred to as a Gene Regulatory Network (GRN), as genes can be viewed as rules that regulate cell behaviour through protein production [1, 2]. By means of protein diffusion, a cell can influence its neighbour's behaviour, as shown in Fig. 1. The precise behaviour of these extremely complex networks has been created over many millennia by Nature's search algorithm, evolution. We take an abstraction of the protein regulatory mechanism and use this to control the actions undertaken by each node in a sensor network. Each node is equivalent to a cell and proteins are diffused between neighbouring nodes to communicate state. However, rather than representing protein concentrations as continuous variables, we restrict them to a set of discrete values. This simplifies the rule structure, the computation, and also reduces the communication overhead. We thus refer to our approach as a discrete Gene Regulatory Network (dGRN) to emphasize the fact that protein concentrations are discrete in value. Every node runs identical dGRN rules that take as input sensor variables (such as target speed) and control the values of actuator variables (such as sampling rate). To design dGRN rules, we again use a nature-inspired approach, namely an evolutionary algorithm. In particular, the contributions of this paper are as follows:

• We propose a novel paradigm for controlling sensor network operation inspired by Gene Regulatory Networks.

FIGURE 1: Highly simplified diagram showing protein exchange within an organism. The proteins are represented by the labelled boxes, and the number of these is indicative of the concentration. Assume that an exogenous protein, X, is applied to cell 3. Based on this input, the cell increases its production of protein B. It also diffuses protein C to its neighbours. These cells can respond differently to incoming protein C, as cell 2 has a lower concentration of protein A compared with cell 1.









• We illustrate how to automate the process of network configuration, starting from random dGRN rules and refining them through evolution.
• We show that communication is an emergent property of the evolved solutions and does not need to be specified by human experts.
• We demonstrate the feasibility of the dGRN methodology in a target tracking application. We show that it is possible to automatically generate parameter tuning code that exhibits properties desirable by human experts.
• We evaluate the resulting code in a simulation environment, and in a real environment with eight T-Mote Sky nodes tracking a light-emitting target.

The paper is structured as follows: Sec. 2 presents our dGRN framework and explains how code is randomly generated and evolved. Sec. 3 shows how dGRNs can be trained to track a moving target through a sensor network. Sec. 4 evaluates the resulting dGRN solution in a simulation environment and a real sensor testbed. Sec. 5 discusses related work and Sec. 6 concludes the paper and proposes directions for future work.

2. DGRN METHODOLOGY

In this section, we describe in detail the process of generating and evolving dGRN rules that control parameters of node operation. dGRNs can be applied to many different tuning problems, ranging from boundary detection to routing control. However, throughout this paper we use the target tracking problem introduced in Sec. 1 as an example application scenario. A pictorial representation of the phases involved in generating


FIGURE 2: The proposed approach to the node configuration problem, evolving dGRN controllers. In Phase 1, the user defines the application task and turns it into a fitness function. In Phase 2, a large number of random candidate solutions are created, which are evaluated by running simulations. In Phase 3, an evolutionary algorithm creates new dGRN controllers to minimize the fitness function score. Lastly, the network uses the best dGRN controller to self-configure and adapt to changing conditions, as shown in Phase 4.

dGRN code is shown in Fig. 2. These phases are outlined below:

Phase 1: In the first phase, high level application requirements are defined and node variables are associated with inputs and outputs of the dGRN controller. A fitness function is also specified which measures how well a particular dGRN controller performs at the specific task. Phases 2 through 4 are then automatically conducted by the dGRN evolutionary process.

Phase 2: A number of random dGRN controllers is generated. Each controller consists of a number of rules which specify how protein concentrations are updated in the next time step, based on current protein concentrations and sensory input.

Phase 3: Each controller is executed, and its overall performance measured against the fitness function specified in Phase 1. An evolutionary algorithm is used to create new dGRN controllers based on controllers which performed well in the last iteration.

Phase 4: Once an acceptable controller has been found, it is then disseminated into the network where it runs on each node, dynamically adapting to changing conditions.

These phases are now discussed in detail.

2.1. Phase 1: High level problem definition

Firstly, the user needs to decide how to interface the dGRN controller to the node itself. There are three different types of proteins in a dGRN controller: 1) sensor proteins which bind to a sensor variable (such as the speed or presence of a target) generated by the node, 2) actuator proteins which control actuator variables

(such as the sampling rate) of the node, and 3) hidden proteins which reflect the dGRN's current state. Note that a dGRN controller can have multiple inputs (sensor proteins) and outputs (actuator proteins). In essence, the user defines the variables that should be controlled, and which variables should affect the parameter tuning. Once suitable node variables have been chosen, they need to be translated or mapped from their real world values to discrete protein concentrations. An example of mappings for the target tracking scenario is provided in Fig. 3. Target presence and speed, which are captured by a single sensor variable, are mapped to protein P4. For instance, if the node detects a target which is moving faster than 1 m/s, it sets P4 to 3. Similarly, protein P1 concentrations are mapped to sampling rate values. For instance, if P1 = 2, the node's sampling rate (i.e. the probability of it taking a sensor reading in a particular time step) is set to 50%. The coarseness of the mapping depends on the base, B, used for the dGRN, which is chosen by the user. For example, a base B = 2 results in two possible protein concentrations (0, 1), whereas a base B = 4 results in four values (0, 1, 2, 3). Other parameters that a user needs to specify are the number of hidden proteins in the dGRN controller, the maximum length of update rules, as well as variables related to the evolutionary process, such as the population size. The role of these parameters is discussed in Phase 3. In the next step the user constructs network scenarios that can be used to examine the performance of various dGRN controllers in simulation. These scenarios must be sufficiently realistic to separate slight differences in dGRN performance, but must be relatively quick to execute as they are the most time-consuming portion of the evolutionary process. In our example problem, a suitable scenario might be to execute the dGRN controller on a wireless network with N = 64 nodes for T = 1000 units of time with a specific target mobility pattern. The specification of the fitness function is the last step in this phase. The fitness function returns a value which indicates the relative performance of a particular dGRN controller at the specified task. In the target tracking example, it is required that a target must be detected by at least one node for 90% or more of the simulation time (the constraint) and to do so using the smallest average sampling rate (the objective). In general, we assume that the constraint(s) must be met and the objective(s) minimized. Thus it makes sense in this example to use a piecewise defined fitness function to account for the two conditions. If a particular dGRN's accuracy, Q_a, meets or exceeds the constraint of 90%, we use the mean sampling rate


FIGURE 3: Example of sensor and actuator variable mappings. The sensor values are mapped to protein concentrations of protein P4. The dGRN controls the concentration of protein P1 which is mapped to the node’s sampling rate. The base used in this example is B = 4.

as our metric, where $S_i$ is the average sampling rate of node $i$. Conversely, if the tracking accuracy is not met, a positive offset (corresponding to all nodes having a 100% sampling rate) is added to the difference between the actual accuracy and the target accuracy. This helps to differentiate between solutions which achieve, for example, an 89% tracking accuracy and a 0% tracking accuracy. Thus, we can write the piecewise fitness function as:

$$F = \begin{cases} \frac{1}{N}\sum_{i=1}^{N} S_i & : Q_a \geq 90\% \\ 100\% + (90\% - Q_a) & : Q_a < 90\% \end{cases} \qquad (1)$$

(Note that for simplicity we do not include the energetic cost of communication in the fitness function, as we assume that the energy used in sampling is large, e.g. the sensor is a camera. However, it is a simple matter to modify the fitness function to reflect the total energy used by each node.)

In summary, in Phase 1 the user specifies the sensor and actuator variables and their mappings, the network test scenarios and, lastly, the fitness function which measures the performance of dGRN controllers. Note that the user does not need to specify how nodes communicate with each other. The subsequent phases are then undertaken automatically by the dGRN evolutionary process.
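As a concrete illustration of Eq. (1), the following is a minimal Python sketch of the fitness computation; the function name and the representation of the per-node sampling rates as a plain list are illustrative assumptions rather than part of the evolutionary framework itself.

def fitness(sampling_rates, accuracy, target_accuracy=0.90):
    """Piecewise fitness of Eq. (1); lower scores indicate better controllers.

    sampling_rates -- list of average sampling rates S_i, one per node, in [0, 1]
    accuracy       -- achieved tracking accuracy Q_a, in [0, 1]
    """
    if accuracy >= target_accuracy:
        # Constraint met: the score is the mean sampling rate over all N nodes.
        return sum(sampling_rates) / len(sampling_rates)
    # Constraint missed: a 100% offset plus the accuracy shortfall.
    return 1.0 + (target_accuracy - accuracy)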

2.2. Phase 2: Generation of random dGRN controllers

The dGRN controller updates each of its protein concentrations using rules that define how the new value of a protein (at time step t + 1) is computed based on the values of proteins at the prior step t. Note that the timebase used for the dGRN depends on the application requirements and can vary between seconds and hours. In this section we explain how the protein concentrations are updated, how concentrations from neighbouring nodes are incorporated, and how initial dGRN controllers are randomly generated. An example of a protein update rule is P1* = P1 o1 P2, which denotes that the new value of protein P1 is derived by combining the current values of P1 and P2 with a binary operator o1. (Note that we refer to a binary operator as a function which takes two values as operands and returns a single value as a result; it is not the same in this context as a Boolean operator such as AND, OR or NOT.) Each protein (with


the exception of sensor proteins, as their values are set directly by the node) has an associated update rule, and all protein concentrations are updated simultaneously. Similarly to biological systems, where proteins from one cell can be diffused to a neighbouring cell, we also allow a protein update rule to take as input the concentration of proteins from neighbouring nodes. We use the notation P̄i to refer to the concentration of Pi at a neighbour node. Given that nodes may have multiple (or no) neighbours, a method is required to aggregate or merge the multiple foreign protein concentrations together. In general, we define the expansion ⟨mj, P̄i⟩ = P̄i(N1) mj P̄i(N2) mj … P̄i(Nℓ), where mj is a binary operator used to aggregate or merge the values of Pi at neighbour nodes N1, …, Nℓ. For example, a rule for updating protein P1 in the local node combines the local value of P2 with aggregated values of P3 from neighbour nodes: P1* = P2 o7 ⟨m4, P̄3⟩. Thus, in general, a rule has the form:

$$P_i^* = \langle m, P \rangle \; o \; \langle m, P \rangle \; o \; \ldots \; \langle m, P \rangle, \qquad (2)$$

where P is a protein species drawn from the protein alphabet. Note that the merge operator has no effect on local proteins, i.e. ⟨m, Pi⟩ = Pi. An example protein alphabet, corresponding to the mapping discussed in Phase 1, is shown in Fig. 4. In this alphabet there is one sensor protein, P4 (target presence), one actuator protein, P1 (sampling rate), and two hidden proteins, P2 and P3. We introduce two control proteins which allow for the creation of varying length rules from a fixed length rule. The JUMP protein, J, instructs the dGRN controller to skip to the next protein in the rule, and the TERMINATE protein, T, instructs the dGRN controller to end the evaluation of the current rule and move to the next. We now turn our attention to discussing the operators that are used to combine the protein concentrations. An operator is a function which acts on two protein species (operands) to produce a resultant concentration. Typical operators for a base-4 (B = 4) system are shown in Fig. 4. Note that each operator is a 4 × 4 matrix that essentially acts as a lookup table. In general each operator will have dimensions of B × B, and in total there are $B^{B^2}$ unique operators. The concentration of the first protein operand is the row index of the matrix, and the concentration of the second protein operand is the column index. The resulting concentration is the number which is found at the intersection of the row and column indices. The flexibility of the dGRN approach comes from the fact that we automatically create both the rules and the operators. However, to prevent combinatorial explosion of the search space for large bases (e.g. B > 4), we evolve parameters of a basis function, instead of the individual elements of each operator directly. An example basis function, f(Pi, Pj), is a linear combination of the two


FIGURE 5: Example showing how numeric strings are parsed to produce protein update rules.

FIGURE 4: Lookup tables for protein and operator decoding. Note that the tables do not need to be of the same length. For each table, the index is a numeric value which decodes to a particular symbolic representation. Observe that, in the operator table, each operator is fully specified as a lookup table. In this case, the operators are base B = 4.

proteins, Pi and Pj:

$$f(P_i, P_j) = \begin{cases} B - 1 & : (aP_i + bP_j + c) > B - 1 \\ 0 & : (aP_i + bP_j + c) < 0 \\ \lfloor aP_i + bP_j + c \rfloor & : \text{otherwise} \end{cases} \qquad (3)$$

where parameters a, b, c control the mapping between the operand proteins and the resultant concentration, and floor() rounds the resulting value down to the nearest integer. Note that only three parameters need to be evolved for the linear basis function, as opposed to $B^2$ if the operator matrix is evolved directly. Using a basis function allows for generalization and also makes the search space independent of the base used for the dGRN controller. The drawback of using a linear basis function is that arbitrarily shaped operators (such as max() or sin()) cannot be constructed directly. More general basis functions (such as quadratic, sigmoidal or fuzzy) can be used, but this increases the size of the search space as these functions typically have more controlling parameters. In this paper, the simple linear basis function will be used. Lastly in this section we examine how dGRN controllers can be randomly generated, and later evolved. Notice that in Fig. 4, each item in the alphabet has an associated numeric index. Also, recall from Eq. 2 that a rule has a fixed structure consisting of merge, protein and operator triplets. Thus, a random string of numbers can be translated, with the aid of the lookup tables, into a rule. Numbers are either decoded into protein symbols or operator symbols, based on their position in the rule. Note that merge operators, m, and normal operators, o, are drawn from the same operator alphabet, i.e. mi ≡ oi. An example of the

rule-decoding process is shown in Fig. 5. Starting from strings of integers on the left, each number is mapped to a symbol depending on its position within the rule. Merge operators are removed from local proteins and special control symbols are parsed to yield the final rule shown on the right of the figure. This example demonstrates how varying length rules can be created from a fixed length string of numbers.

In summary, a dGRN controller consists of a number of rules acting on proteins to generate new protein concentrations. Operators are used to combine two protein concentrations together, and they are designed as part of the dGRN, rather than being specified by the user. The following section discusses how randomly generated dGRN controllers can be refined through a process of evolution.
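To illustrate how an evolved rule is executed on a node, the following Python sketch evaluates one update rule of the form of Eq. (2) using the linear basis operator of Eq. (3). The data structures (a rule as a list of (merge, protein, operator) triplets, operators as (a, b, c) coefficient tuples, and a "~" prefix marking foreign proteins) are simplifications chosen for exposition; the actual implementation additionally handles the JUMP and TERMINATE control proteins and the rule-decoding step of Fig. 5.

import math

B = 4  # base: protein concentrations take integer values 0 .. B-1

def apply_operator(coeffs, p, q):
    """Linear basis operator of Eq. (3), clamped to the range 0 .. B-1."""
    a, b, c = coeffs
    return max(0, min(B - 1, math.floor(a * p + b * q + c)))

def merge_neighbours(coeffs, values):
    """Fold one protein's concentrations at all neighbours with a merge operator."""
    if not values:          # no neighbours: assume a zero concentration
        return 0
    result = values[0]
    for v in values[1:]:
        result = apply_operator(coeffs, result, v)
    return result

def evaluate_rule(rule, local, neighbours, operators):
    """Evaluate one rule, given as a list of (merge_idx, protein, op_idx) triplets.

    local      -- dict of protein name -> local concentration
    neighbours -- dict of protein name -> list of concentrations at neighbour nodes
    operators  -- list of (a, b, c) coefficient tuples defining the operator pool
    """
    acc = None
    for merge_idx, protein, op_idx in rule:
        if protein.startswith("~"):      # '~P3' denotes the foreign protein P3-bar
            value = merge_neighbours(operators[merge_idx], neighbours[protein[1:]])
        else:
            value = local[protein]       # merge operators have no effect on local proteins
        # The first operand starts the accumulation; later ones are combined with it.
        acc = value if acc is None else apply_operator(operators[op_idx], acc, value)
    return acc

# Example: the rule P1* = P2 o7 <m4, P3-bar> would be encoded as
# [(0, "P2", 0), (4, "~P3", 7)], and its result becomes the new value of P1.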

2.3. Phase 3: Evolving dGRN controllers

The random dGRN controllers generated in Phase 2 would, in general, be expected to perform quite poorly at the test scenarios specified in Phase 1, resulting in solutions with low fitness (corresponding to high values of the fitness function). However, some controllers would perform slightly better than others. It is these differences between controller performance that are used by the evolutionary algorithm, which is a search method inspired by the process of natural selection. In our work, we use Differential Evolution [3], which is a variant of the standard genetic algorithm. It uses continuous values for the genes and a crossover mechanism that incorporates differences between individuals in the population. This helps to reduce the sensitivity to tuning parameters for crossover, and also has been shown to be superior to the standard genetic algorithm for high dimensional problems [3]. Due to the discontinuous nature of the search space, gradient based methods are unable to generate good dGRN controllers as they become trapped in local minima. The steps involved in this process are as follows:

1) Initially, a number (e.g. 100) of candidate dGRN controllers, comprising the population, are randomly generated. Their performance at the task is evaluated


using the fitness function specified by the domain experts in Phase 1.

2) A new child solution is generated from each individual (the parent) in the population and three other individuals (one of which acts as a pivot and the remaining two as genetic 'donors') chosen at random. For each gene in the chromosome, the parent's gene is replaced with a probability of crossover DE_CR = 0.8. To replace the gene, a weighting factor (DE_F = 0.2) is used to scale the difference between the genes of the two donors. This weighted difference is then added to the gene of the pivot. This particular approach is referred to as the DE/rand/1/bin update equation (refer to [3] for the Differential Evolution algorithm and a discussion of how to choose the parameters; a short Python sketch of this update is given at the end of this subsection).

3) The performance of these new individuals is assessed using the fitness function. If a child is better than its parent, the parent is replaced by the child.

4) Steps 2 and 3, which together comprise a single generation, are iteratively executed until either a target fitness is reached or the number of generations exceeds a limit (e.g. 300).

Note that a dGRN controller has two parts, namely a set of rules (initially randomly generated) and a table of operators (also initially randomly generated). We evolve dGRN controllers in an iterative, two step process. In the first step, the rules of the dGRN controllers are evolved, keeping the operators fixed. The fitness of these controllers is evaluated and the best controller (the one with the lowest fitness score) is selected. We then fix the rules of this controller and evolve its operators. This iterative process of alternately evolving rules and operators is continued until either a good solution is found or the number of generations exceeds a preset limit. Once the best dGRN controller has been identified, it needs to be disseminated to the nodes and executed locally.
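The following Python fragment is a minimal sketch of the DE/rand/1/bin update described in step 2, producing one child from one parent; it assumes the population is stored as a list of equal-length, real-valued gene vectors and it omits the surrounding evaluation and replacement logic.

import random

DE_CR = 0.8  # crossover probability
DE_F = 0.2   # differential weighting factor

def de_rand_1_bin(parent, population):
    """Create one child vector from `parent` using the DE/rand/1/bin update."""
    # Choose a pivot and two donors at random, all distinct from the parent.
    pivot, donor1, donor2 = random.sample(
        [ind for ind in population if ind is not parent], 3)
    child = list(parent)
    forced = random.randrange(len(parent))  # at least one gene is always replaced
    for j in range(len(parent)):
        if j == forced or random.random() < DE_CR:
            # Weighted donor difference added to the pivot's gene.
            child[j] = pivot[j] + DE_F * (donor1[j] - donor2[j])
    return child

The child replaces its parent only if it scores a lower (better) fitness, as in step 3.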

2.4. Phase 4: Disseminating and executing dGRN code

In this section we discuss, with particular reference to communication overhead, the dissemination and execution of dGRN controllers. In order to assess the cost of sending a dGRN controller to nodes in the sensor network, we must now examine the size of a dGRN controller in bytes. As discussed in Phase 2, a ruleset can be represented by a string of integers, with each number being mapped to a symbol from either the operator alphabet or the protein alphabet. For simplicity, assume that each alphabet has 16 or fewer symbols in it. We can thus represent each symbol as a 4 bit number. Recall that a rule consists of a sequence of symbols in the form (merge m, protein P, operator o). Thus, these three symbols can be written using 3 × 4 bits = 12 bits. Assume that in our example 3 proteins have update rules, with a maximum rule length of 3. The total number of bits required to write the ruleset

is 12 × 3 × 3 = 108 bits (approximately 14 bytes). Additional to the ruleset is the operator lookup table. The number of bits required for each operator depends on the base used (see the note below). Let B = 4 and assume 16 operators are present in the lookup table. Each operator consists of B² numbers, and each of these can be written as a 2 bit digit. Thus an operator is 4² × 2 = 32 bits = 4 bytes in size. As there are 16 operators in total, the total size used by the operator pool will be 16 × 4 = 64 bytes. Other parameters which need to be specified include the sensor and actuator mappings, the base, the number of proteins and the update rate. In this example these add approximately 20 bytes of overhead. Hence, in total, this example dGRN controller can be expressed in 14 + 64 + 20 = 98 bytes. This is less than the maximum packet length of 128 bytes in 802.15.4 networks. This example has assumed that no optimizations have been performed on the rule and operator alphabets – preparsing the rule and reducing the size of the operator set by eliminating operators which are not used in the dGRN controller can result in an even smaller dGRN representation. Thus, it can be seen that disseminating new dGRN controllers into a wireless network does not incur a large amount of communication overhead. Once the optimal dGRN controller has been disseminated into the network, the following steps are taken by each node:

1) It maps sensor variable(s) to sensor protein(s).
2) It executes the dGRN rules, updating the protein values.
3) It maps actuator protein(s) to actuator variable(s).
4) It broadcasts updated protein values to neighbours and receives updated protein values from its peers.
5) It waits for t units of time.

Note that steps 1–3 and 5 do not involve any communication and are executed locally on each node. Consider the communication cost of step 4. Assume that we have 4 proteins in our dGRN controller. If we use B = 4, each protein concentration can be written as a 2 bit number. Hence, in total, a protein update requires 4 × 2 = 8 bits to be sent to neighbours. Note that proteins only need to be transferred when they have an effect on neighbours' update rules (i.e. if P̄1 is not present in the controller, protein P1 does not need to be sent to peers). Further to this, if we assume that a node can buffer its neighbours' protein values, they only need to be sent when they change. Depending on the timebase t, it may be possible to piggyback the protein update on top of regular messages such as neighbour advertisement beacons. Thus it can be seen that updating protein concentrations does not have a large communication overhead.

(Note that in some cases, especially for large bases, the byte size of the transmission will be reduced if the coefficients for the basis function are sent, rather than the expanded B × B matrix.)
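A minimal sketch of the per-node execution loop (steps 1–5 above) is shown below in Python; the controller object, with its initial_proteins and update methods, and the node_io wrapper around read_sensor, apply_actuators, broadcast and receive_neighbour_proteins stand in for the platform-specific node API and are assumptions of this sketch rather than part of the dGRN specification.

import time

def run_dgrn_node(controller, node_io, sensor_map, actuator_map, timebase_s):
    """Endless per-node loop: sense, update proteins, actuate, exchange, wait."""
    proteins = controller.initial_proteins()   # all concentrations start at zero
    neighbour_proteins = {}                    # latest values buffered from peers
    while True:
        # 1) Map sensor variable(s) to sensor protein(s).
        proteins.update(sensor_map(node_io.read_sensor()))
        # 2) Execute the dGRN rules, updating the protein values.
        proteins = controller.update(proteins, neighbour_proteins)
        # 3) Map actuator protein(s) to actuator variable(s), e.g. the sampling rate.
        node_io.apply_actuators(actuator_map(proteins))
        # 4) Broadcast updated protein values and receive those of the peers.
        node_io.broadcast(proteins)
        neighbour_proteins = node_io.receive_neighbour_proteins()
        # 5) Wait for t units of time before the next update cycle.
        time.sleep(timebase_s)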


3. EVALUATION

In this section, we investigate the benefits of the dGRN approach with a simple scenario, namely collaborative target tracking in a wireless sensor network. Our goal is to minimize the average sampling rate required to track the target, subject to the target tracking accuracy being met. Primarily, we want to investigate whether dGRN controllers can communicate effectively with each other to achieve the scenario goals. It must be stressed that any communication which occurs between controllers is emergent and has not been specified by the designer. Secondly, we ask whether the solutions obtained are comparable to those that would be expected from a human designer. Intuitively, we would expect that the faster the target moves through the sensor field, the more aggressively nodes communicate in order to activate nodes in the neighbourhood to maintain continual tracking. Similarly, the higher the desired tracking accuracy, the larger the region of activation needs to be. Indeed, as will be shown, this is precisely the behaviour that occurs, without a programmer specifying this behaviour in detail. Thus, the tracking strategy generated by the dGRN controllers varies according to the particular application scenario to meet user requirements. We also investigate the sensitivity of the dGRN approach to the various model parameters, such as the base and the mapping. In addition, we investigate whether the dGRN controllers can be engineered to be robust to communication failures and whether a dGRN controller can adapt to changes in the environment, such as variations in target speed. Lastly, we conduct an experiment to determine if dGRN controllers can be evolved for a real world mobility trace.

3.1. Simulation Parameters

As potential application users, we need to set a number of parameters that were discussed in Phase 1. The sensor protein is defined as target presence and speed, and the actuator protein as the sampling rate. Note that the sampling rate controls the probability of a node attempting a target detection in a particular iteration. Additionally, two hidden proteins are included and the maximum rule length is set to be 4. For the dGRN controller, a base B = 16 is used, unless otherwise specified. For the network test case, the nodes are arranged in a regular, square lattice and each node can communicate with its direct neighbours. A total of 64 nodes are arranged in an 8 × 8 cellular array. The total simulation time is set to be 1000 s, with a 1 Hz communication rate (a typical communication rate for a low-power, duty-cycled sensor network). In a single update cycle, nodes transmit and receive broadcast packets with their adjacent neighbours, that is, those immediately to the


top, bottom, left or right of the current cell. Hence, communication with a neighbour on the diagonal requires two hops, and thus takes two seconds to undertake. For simplicity, we assume that each node's sensing area is square with a side of 10 m. The target executes a random walk across the sensor field. The target randomly selects its destination to be either the current cell or one of the eight surrounding cells, and moves to the destination at a constant speed. Once it has reached the centre of the destination cell, it then randomly selects a new destination. In order to assess dGRN performance, the fitness function defined in Eq. 1 is used. The dGRN evolutionary process was written in a mixture of C and Python. Typical execution time for a single run, consisting of 300 generations, is approximately nine minutes on a 2.70 GHz dual core Pentium with 4 GB of RAM. As a comparison, a 300 generation run for a 900 node network takes approximately 90 minutes. For each operating point (i.e. target speed and tracking accuracy), 30 runs were executed and the best performing dGRN controller was selected. Thus, evolving a dGRN controller for the 64 node scenario takes approximately five hours.
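For reference, the random-walk mobility model described above can be sketched in a few lines of Python; the grid size, cell width and time step mirror the values given in this section, while the function and variable names are our own.

import random

CELL = 10.0   # cell side length in metres
GRID = 8      # 8 x 8 lattice of sensing cells

def random_walk(speed, duration, dt=1.0):
    """Yield the target's (x, y) position once per time step for `duration` seconds."""
    cx, cy = GRID // 2, GRID // 2                    # start in the middle of the field
    x, y = (cx + 0.5) * CELL, (cy + 0.5) * CELL
    dest = (x, y)
    for _ in range(int(duration / dt)):
        if (x, y) == dest:
            # Destination is the current cell or one of the eight surrounding cells.
            cx = min(GRID - 1, max(0, cx + random.randint(-1, 1)))
            cy = min(GRID - 1, max(0, cy + random.randint(-1, 1)))
            dest = ((cx + 0.5) * CELL, (cy + 0.5) * CELL)
        dx, dy = dest[0] - x, dest[1] - y
        dist = (dx * dx + dy * dy) ** 0.5
        if dist <= speed * dt:
            x, y = dest                              # arrived at the cell centre
        else:
            x, y = x + dx / dist * speed * dt, y + dy / dist * speed * dt
        yield x, y

# Example: positions of a 1 m/s target over the 1000 s test scenario.
trace = list(random_walk(speed=1.0, duration=1000))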

3.2. Competing Algorithms

In this section, we describe the algorithms that we are using as a basis for comparison: two simple engineered strategies, wake-on-exit and wake-on-entry, a reinforcement learning technique and an Oracle baseline. (A short sketch of the two engineered strategies is given after the wake-on-entry description below.)

Wake-on-exit
1. Initially all nodes sample.
2. Let the node which detects the target be St. This node remains on. All other nodes power down.
3. When St detects that the target has moved, it initiates a process to wake up the nodes in the 8 surrounding cells. Due to the limited communication range, this occurs in two communication steps (i.e. it takes 2 s in total). In the first step, St wakes up the four neighbours at the {N; E; S; W} positions. In the second step, the latter wake up the diagonal neighbours at the {NE; NW; SE; SW} positions.
4. When a node receives a wakeup notification it powers up its sensor in search for the target. If it detects the target it goes to step (2), otherwise it remains in sleep mode.

Wake-on-entry
1. Initially all nodes sample.
2. Let the node which detects the target be St. This node remains on. All other nodes power down.

3. St initiates a process to keep nodes in the 8 surrounding cells awake whilst the target is present. Due to the limited communication range, this occurs in two communication steps (i.e. it takes 2 s in total). In the first step, St wakes up the four neighbours at the {N; E; S; W} positions. In the second step, the latter wake up the diagonal neighbours at the {NE; NW; SE; SW} positions. This process is repeated in each iteration, for as long as the target is present at St.
4. As long as a node receives a wakeup notification, it keeps its sensor powered up to search for the target. If it detects the target, it goes to step (2). If it does not receive a wakeup notification, it enters sleep mode.
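The sketch below condenses the two engineered strategies into a single per-timestep wakeup routine in Python; the grid representation and function names are ours, and the two-step (one hop per second) propagation delay of the real network is not modelled here.

def neighbours8(cell, grid=8):
    """Return the (up to) eight cells surrounding `cell` on a grid x grid lattice."""
    x, y = cell
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and 0 <= x + dx < grid and 0 <= y + dy < grid]

def cells_to_wake(detecting_cell, target_has_left, wake_on_exit=True):
    """Return the set of cells that should sample during the next timestep.

    detecting_cell  -- cell S_t that currently observes the target (None if lost)
    target_has_left -- True if S_t has just observed the target leaving its cell
    wake_on_exit    -- True selects wake-on-exit, False selects wake-on-entry
    """
    if detecting_cell is None:
        return set()                               # target lost: nothing is woken
    awake = {detecting_cell}                       # the detecting node stays on
    if wake_on_exit:
        if target_has_left:                        # neighbours woken only on exit
            awake |= set(neighbours8(detecting_cell))
    else:                                          # wake-on-entry: neighbours kept awake
        awake |= set(neighbours8(detecting_cell))
    return awake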

Reinforcement learning

The goal of reinforcement learning (RL) is to design a policy, Π, that specifies the optimal action to take at each timestep (i.e. to either sample or sleep) given a current system state [4]. The attractive feature of RL is that it is a form of unsupervised learning and can be performed online, if the state space is small enough. We will now briefly describe how the RL technique works. In a particular timestep, a node will be in state s ∈ S. It can take one of two actions a ∈ A: to sample, a_s, or to remain idle, a_i. The external environment (in this case, the node itself) will give a reward, r ∈ R, based on the outcome of the chosen action. The two outcomes are: D, that the target was detected, and D̄, that the target was not detected. For example, if the node samples and the target was detected, then it is given a positive reward, r1. Conversely, if the node samples and does not detect a target, it is penalized with a negative reward, r2, as the action of sampling consumed energy. If a node does not sample in a timestep, it gets a zero-value reward. Rewards update the action-value function Q(s, a), which represents the expected reward of undertaking an action a given a particular node state s. The action-values are updated using the following: Q′(s, a) ← (1 − α)Q(s, a) + αr, where α is the innovation rate, which controls how rapidly new information is incorporated into the action-value function, and r is the reward given by the node when the action was undertaken. Thus, given a state, the objective is to choose the action which maximizes the expected reward. In order to explore the state space, we use an ε-greedy policy. This means that the agent most often selects the action that will maximize the reward, but with probability ε selects an action at random. We now discuss how a node evaluates its current state. Each node creates a tuple of the outcomes from the last two actions it performed, which we refer to as a partial state. As an example, if a node has not detected a target in the last two timesteps, its partial state will be D̄D̄. To evaluate its state, a node concatenates its

own partial state, and the unique partial states from its eight neighbours. For example, if a node has the partial state D̄D̄ and the partial states of its neighbours are {DD̄; D̄D; D̄D̄; D̄D̄; D̄D̄; D̄D̄; D̄D̄; D̄D̄}, the resulting state s will be ⟨D̄D̄{DD̄; D̄D; D̄D̄}⟩. By only considering the unique states, the size of the state space is greatly reduced. Note that the partial state from the two-hop neighbours incurs a one cycle delay. It must be noted that the reinforcement learning technique is sensitive to the choices of the rewards and the two parameters α and ε. Thus, we had to search exhaustively for values in the ranges 10⁻⁴ ≤ α ≤ 0.1 and 0.01 ≤ ε ≤ 0.1 which would result in the best performance for each of the operating points. In addition, this technique required at least 100,000 steps of simulation time before the action-values converged. This suggests that in some applications it might not be feasible to employ online reinforcement learning.

Oracle

This is the lowest possible sampling rate that can be achieved if the target motion were known in advance, and corresponds to a single node tracking the target at each time step. This places a lower bound on the achievable duty cycle.

Our motivation for providing both a wake-on-exit and a wake-on-entry strategy is as follows. We noted that the performance of each strategy depends on the frequency at which nodes can communicate, and how that relates to the target speed. We provide an example with three scenarios where we fix the communication frequency to one broadcast per second, and vary the target speed. For simplicity, in this example, we consider a 1-D tracking problem, where a target moves from the current cell to either the right or left cell, and the user requires a tracking accuracy of 90%. In the first scenario, a person is moving on foot, travelling up to 1 m/s (3.6 km/h). It is simple to obtain the required tracking accuracy in this case by simply informing neighbouring nodes when the target has exited a particular node's sensing area (wake-on-exit). Even if the target is not being actively observed for 1 s, it is still being tracked with high accuracy (9 out of 10 s). In the second scenario, the sensor network is tracking a target on a bicycle, moving at 1–10 m/s (3.6–36 km/h). To obtain the required tracking accuracy a node should send notification to its neighbours when it first detects the target, and for as long as the target remains there (wake-on-entry). If it waits until the target is out of its sensing range, it will fail to meet the accuracy requirement. In the third scenario, the target is a car, moving at speeds greater than 10 m/s (36 km/h). In this case, the rate of communication is not sufficient to maintain tracking performance and the target will eventually be lost.



FIGURE 6: Variation in average sampling rate with accuracy requirements for the various algorithms with a target speed of 1 m/s. Note that the wake-on-exit algorithm is unable to achieve an accuracy higher than 87% for this particular speed due to the communication latency of 1 s.

This example shows that the tracking performance depends on the ratio between the communication and target speeds, and that the tracking strategy has to be selected for each particular scenario. It thus highlights the interplay between the MAC layer and collaborative sensing in wireless sensor networks.
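To make the reinforcement-learning baseline concrete, a minimal Python sketch of its ε-greedy action selection and action-value update is given below; the reward constants and the dictionary-based Q-table are illustrative choices, not the exact values used in the comparison above.

import random
from collections import defaultdict

ALPHA, EPSILON = 0.01, 0.05    # innovation rate and exploration probability (example values)
R_DETECT, R_MISS = 1.0, -0.2   # rewards for sampling with/without a detection (example values)

Q = defaultdict(float)         # action-value table keyed by (state, action)
ACTIONS = ("sample", "idle")

def choose_action(state):
    """Epsilon-greedy policy over the two actions: sample or remain idle."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update_q(state, action, target_detected):
    """Apply the update Q'(s, a) <- (1 - alpha) Q(s, a) + alpha r."""
    if action == "sample":
        r = R_DETECT if target_detected else R_MISS
    else:
        r = 0.0                # idling earns a zero-value reward
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * r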


FIGURE 7: Variation in average sampling rate with accuracy requirements for a target speed of 5 m/s. Note that the wake-on-exit algorithm is unable to track the target at this speed.




3.3. Varying target tracking accuracy

In this experiment, we examined how various tracking accuracies and speeds of motion impact both the energy used and the temporal activation patterns required. The desired accuracy was varied from 70% to 95% and the speed was varied to be either 5 m/s or 1 m/s. Here, we used a binary mapping, that is, nodes were either on with a 100% sampling rate, or off with a 0% sampling rate, depending on the value of the actuator protein. The average sampling rate of the various controllers for a target speed of 1 m/s is shown in Fig. 6. It can be seen that the average sampling rate of the dGRN controllers is lower than that of all other algorithms (with the exception of the Oracle lower bound). More importantly, note that the dGRN approach is able to construct solutions where the wake-on-exit algorithm fails, namely for tracking accuracies of greater than 87%. The wake-on-exit algorithm cannot achieve a higher accuracy than this due to the impact of the communication latency. Average sampling rates for dGRN obtained solutions for a target speed of 5 m/s are shown in Fig. 7. At this speed, the target dwell time within a cell is 2 s, and the wake-on-exit algorithm loses track of the target. As there is no exploratory action (random sampling), the target will never be located again once it is lost using this algorithm. The results show that the performance of all the approaches is largely similar, except at accuracies above 85%, where the dGRN

FIGURE 8: Timeseries of typical activation patterns for the three scenarios (85% accuracy at 1 m/s, 95% at 1 m/s and 85% at 5 m/s). Cell shading indicates each node's chosen sampling rate, from 0% to 100%, and ● marks the target position. Observe how the area of activation becomes larger the more challenging the tracking task becomes, i.e. as the required accuracy or the target speed increases.

approach starts to use a higher average sampling rate. The dGRN curves shown in Figs. 6 and 7 form a Pareto frontier which shows the relationship between sampling rate (which is proportional to energy consumption) and tracking accuracy for varying target speeds. These curves can help to guide the user in choosing a dGRN controller which best satisfies the application requirements in terms of network lifetime and tracking fidelity. Note that each of these points on the curves corresponds to an actual dGRN controller that has been evolved. Thus, once the operating point has been chosen, the controller corresponding to that particular point can be instantly sent to the network. The time dynamic behaviour of obtained solutions is shown in Fig. 8 for three different scenarios. The shading of each cell indicates each node's chosen sampling rate, with white blocks representing nodes which are on (100% sampling rate). Observe that depending on the target speed, cells around the target are also awake, attempting to detect the target as it moves. Thus, the dGRN controllers, whose



FIGURE 9: Box-plot showing the effect of different actuator mappings on the distribution of sampling rates for 30 evolved controllers. This shows that although mappings play an important part in the performance of dGRN controllers, they can be evolved as part of the controller. Target speed is 1 m/s and tracking accuracy is 85%.

rules are automatically created, have disseminated the target's presence to nodes which are not their one-hop neighbours (and in some cases are four hops distant). It must be strongly emphasized that the mechanism by which this information is communicated is an emergent property of the dGRN controller and is generated by the evolutionary process to satisfy the application requirements. Of special interest is the fact that some dGRN controllers locate the initial position of the target by turning all nodes on in the first timestep, followed by turning off all nodes except the one which located the target. This is an evolved behaviour (as all nodes start with all protein states equal to zero) and is identical to the human designed algorithms. This highlights the power of the dGRN method in that it is able to automatically design distributed algorithms to meet the specified objectives, obviating the need for a human to design different algorithms for different tracking regimes. Note that it does so without modification to any parameters other than the target's mobility pattern. This experiment has demonstrated that communication between nodes is emergent, and that activation patterns alter according to the particular application requirements.

3.4. The impact of alternative mappings

Mappings have a large impact on the performance of the dGRN controllers, as they provide the interface between the discrete protein levels and real world values. In Fig. 3, we used a fixed, designer-specified mapping where concentrations of the actuator protein P1 were mapped to a corresponding sampling rate S, 0 ≤ S ≤ 1. The problem with an arbitrary mapping is that it might not result in the best possible

dGRN controller for a particular application scenario. Furthermore, manually specifying a good mapping is likely to involve a large amount of tuning, precisely what the dGRN approach is attempting to minimize. In this section, the effect of different mappings (between the actuator protein, P1, and the node sampling rate, S) on the evolved dGRN controllers is investigated. In particular, we determine if these mappings can be automatically evolved as part of the dGRN controller, rather than being specified externally by the designer. We examine the effect of five different mappings (two fixed, and three evolved) on the evolvability of dGRN controllers. These mappings are specified below. Firstly, let P̂1 be the normalized value of the actuator protein P1:

$$\hat{P}_1 = \frac{P_1}{B - 1} \qquad (4)$$

In general, the aim is to specify a function, S = f(P̂1), which relates the normalized value of the actuator protein to the sampling rate. The first fixed mapping which was used is a binary mapping where the sensor is either on, S = 100%, or off, S = 0%.

Fixed Binary: $S = \begin{cases} 1 & : \hat{P}_1 > 0.5 \\ 0 & : \hat{P}_1 \leq 0.5 \end{cases}$

The other fixed mapping which was investigated is a direct linear mapping between the protein concentration P̂1 and S:

Fixed Linear: $S = \hat{P}_1$

We now consider a more general linear mapping, with two parameters b and c which are evolved as part of the dGRN controller:

Evolved Linear: $S = b\hat{P}_1 + c$

Note that parameters b and c do not have to be positive and hence an inverse relationship between protein concentration and sampling rate can be evolved if required for the application. Note that the mapping does not have to be linear. We also investigated a higher order polynomial model:

Evolved Quadratic: $S = a\hat{P}_1^2 + b\hat{P}_1 + c$

where a, b and c are all evolved coefficients. Another possible mapping, which is commonly used in artificial neural networks [5], is a sigmoidal basis function:

Evolved Sigmoidal: $S = \dfrac{1}{1 + e^{-b\hat{P}_1 + c}}$

e−bPˆ1 +c )

where b and c are parameters which are evolved as part of the dGRN controller rule. Vol. ??,

No. ??,

????

For each mapping, 30 dGRN evolutionary runs (each consisting of a 300 generation search process) were executed. The target speed was fixed at 1 m/s and the desired tracking accuracy was set to 85%. As the evolutionary algorithm is a guided randomized search, each run typically results in a different solution. Ideally, one would like a mapping which consistently results in the best solution possible, with the smallest spread between the best and worst controllers. The distribution of the results from the experiment is shown in Fig. 9. It can be seen that, for this problem, the fixed binary mapping results in the lowest average sampling rate. The evolved linear mapping can achieve a similarly low duty cycle, but the solutions are more widely dispersed than for the fixed binary case. Note that the results from the evolved linear mapping are better than the fixed linear mapping. The mapping which performs the worst is the sigmoidal mapping. For the remainder of this paper, unless otherwise specified, an evolved linear mapping will be used, as it resulted in the best dGRN controllers out of the evolved mappings we considered. This experiment demonstrates that mappings between proteins and real world values can be evolved as part of the controller, and do not have to be specified by the user.

3.5. Effect of dGRN parameters on evolved solutions

In this section, we perform a sensitivity analysis on the dGRN model parameters, such as the base, in order to determine their individual impact on the obtained controllers. The target speed was set to 1 m/s and the tracking accuracy to 85%. The basic test parameters were B = 16, 4 proteins and a maximum rule length of 4. A linear mapping was evolved as part of the dGRN controller. For each particular operating point we examine the distribution of fitness scores from 30 runs (each consisting of 300 generations). The effect of varying the controller base, B (whilst holding the number of proteins and the maximum rule length constant), is shown in Fig. 10(a). It can be seen that a base B = 2 (i.e. binary protein concentrations) is unsuitable for the target tracking application as it is unable to create any controller that results in a duty cycle under 10%. Bases greater than 2 show a largely similar distribution of sampling rates. This implies that as long as the base chosen has a sufficiently fine granularity (i.e. that the state space is large enough), the resulting dGRN controllers' performance is likely to be much the same. The result of altering the number of proteins used in the dGRN (whilst holding the base and the maximum rule length constant) is shown in Fig. 10(b). For this

scenario, using two proteins (i.e. just the actuator and sensor proteins) leads to poorly performing dGRN controllers, where the lowest sampling rate achieved is 10%. Using a total of four proteins, corresponding to two internal protein state variables, results in the best dGRN controllers. Note that simulation time increases linearly with increasing numbers of proteins. Lastly, the impact of changing the maximum rule length is shown in Fig. 10(c). This demonstrates that a short rule length results in dGRN controllers which do not exhibit rich enough behaviour to achieve low average sampling rates for the required accuracy. As the maximum rule length is increased, the dGRN controllers become better. Again, note that simulation time increases linearly with rule length. These results show that in order to evolve good dGRN controllers, the dGRN representation needs to be rich enough to create multi-state behaviour. However, once the parameters chosen enable the controllers to exhibit the required objectives, further increases generally do not lead to markedly improved controllers. Note, however, that a practical limit is placed on both the rule length and the number of proteins, as increasing these values results in slower simulations. Choosing the right parameters to use for a particular scenario could be undertaken using a meta-search algorithm, that is, both the dGRN controllers (including the mapping) and the dGRN model parameters could be evolved in tandem, removing the need for a user to specify them. Thus the user would only have to specify a fitness function. These experiments have shown that certain minimum values of the model parameters must be met in order to evolve good dGRN controllers. However, once these have been met, the resulting controllers are relatively insensitive to further variations in these parameters.

3.6. Training for communication failures

In a wireless sensor network, link quality dynamically varies with time. Links may fail due to obstructions or environmental effects such as rainfall. In addition, protein packets can be lost through collisions when two or more nodes attempt to send their messages at the same time. These link failures will affect the operation of the dGRN controllers, as they will only have partial state information from their neighbours. In this section, we investigate whether dGRN controllers can be trained to handle communication failures. We would expect that as the probability of link failure increases, the average sampling rate would increase as nodes would have to take more exploratory action in order to compensate for an incomplete view of their neighbourhood. For this experiment, we consider a scenario where the target tracking accuracy is set to 90% and the target moves at a speed of 2 m/s. The base is set to B = 4. We vary the probability of communication failure from 5%



FIGURE 10: Impact of altering dGRN parameters on the sampling rates of obtained controllers. (a) Effect of altering the base. (b) Effect of altering the number of proteins. (c) Effect of altering the maximum rule length.

to 100% (no communication). A communication failure is defined as the loss of a protein update message from a dGRN controller to all its peers in the neighbourhood. We compare the performance of the obtained dGRN controllers against a non-communicating probabilistic wakeup model, where all nodes have the same default sampling rate. In this algorithm, when the target is detected, the node sets its sampling rate to 100% until the target leaves the node, upon which the sampling rate is returned to the default rate. We use the lowest default sampling rate which achieves the application objectives. We expect that as the amount of successful communication decreases, the performance of the evolved dGRN controllers will tend towards that of the non-communicating wakeup model. The results from this experiment are shown in Fig. 11. It can be seen that a 100% communications failure results in an average sampling rate for the dGRN controller very close to the non-communicating algorithm. As expected, the more information a dGRN controller can obtain about its neighbours' state, the better the decisions it can make to achieve the application objectives, as witnessed by the decrease in average sampling rate as the probability of communications failure drops. This experiment showed that dGRN controllers can be trained to operate in lossy communication environments. The less information that can be obtained from peers, the higher the average sampling rate needs to be to compensate for incomplete knowledge.
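A one-line model of the link failures used during training can be sketched in Python as follows; the failure probability argument and the helper names are illustrative assumptions of this sketch.

import random

def broadcast_with_failures(proteins, neighbours, p_fail):
    """Deliver a protein update to all neighbours, or to none of them with probability p_fail.

    A failure models the loss of one node's protein update message to its whole
    neighbourhood, as defined in the training scenario above.
    """
    if random.random() < p_fail:
        return                              # update lost: peers keep their buffered values
    for peer in neighbours:
        peer.receive_proteins(proteins)     # peer.receive_proteins is a placeholder API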

50%

40%

30%

20%

dGRN Non-communicating probabilistic wakeup

10%

0% 100

80

60

40

20

0

Link Failure Probability [%]

FIGURE 11: Variation in average sampling rate with communication failures. It can be seen that when no communication can occur (100% failure), the average sampling rate of the dGRN controller is very close to that of the non-communicating algorithm. With lower probabilities of failure, the average sampling rate of the dGRN becomes lower, as nodes can act on information from their peers about the presence or absence of the target.



3.8. Evolving dGRN controllers for real world traces

So far in our evaluation of dGRN controller performance, we have only considered tracking a target moving with a random walk. In this section, we examine whether a controller can be evolved using real traces. As an example, we use traces available online from a cattle monitoring application [6]. Cows in a field were equipped with GPS-enabled mobile phones which logged location (latitude and longitude) every 3 s for a period of a few hours. 64 nodes, each with a 20 m sensing range, were uniformly placed in an 8 × 8 lattice, thus covering a region of 160 m × 160 m. The dGRN controllers were specified to use 4 proteins, have a maximum rule length of 4, and use a base of B = 16. A linear mapping was evolved as part of the controller. The required tracking accuracy was set to 90%. The sensing protein indicated either target presence or absence (i.e. no speed information was embedded in the mapping). In the previous experiments, we fixed the communication rate to 1 Hz and varied the target speed. In this experiment, the target speed is dictated by the traces themselves, and thus we consider the impact of altering the communication rate from 1 Hz to 0.1 Hz. In order to test whether a generic dGRN controller can be created for a particular communication rate, controllers were trained using a 10 000 s trace from one cow (ID number 125) and tested using a 10 000 s trace from a different cow (ID number 067). The traces for the two cows (superimposed on the node grid) are shown in Fig. 14.

The results from this experiment are shown in Fig. 15, which compares the performance of the evolved dGRN controllers against the wake-on-entry and wake-on-exit algorithms for two different communication rates. Fig. 15(a) shows that all approaches can achieve a very high accuracy (Qa > 98%) when the communication rate is 1 Hz. The dGRN controller has evolved a strategy which has the lowest average sampling rate. Note that the wake-on-entry strategy has a tracking accuracy of 100%, but a considerably higher average sampling rate. The performance of the various algorithms with a 0.1 Hz communication rate is shown in Fig. 15(b). The accuracy of wake-on-exit is 78% and the accuracy of wake-on-entry is 88%. At the end of the test trace, both of these strategies had lost the target; as they do not randomly sample, they cannot reacquire it. The accuracy of the dGRN controller is also 88%. However, note that the dGRN controller evolved a linear mapping which created a baseline average sampling rate of 0.9%. Thus, if the target is lost (for example, if it is moving more rapidly than the rate of communication), the dGRN controller can reacquire it. Although a random sampling rate could be included in the other two algorithms, the level of this would require tuning by a human. This test has demonstrated that a dGRN controller can be evolved for a real world mobility trace and a particular communication rate. The dGRN controller was able to track the trace of a cow which was not part of the original training set.
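As a rough illustration of how a trace is replayed against the grid, the sketch below maps each logged position to the nodes whose 20 m sensing discs cover it, and scores the fraction of timesteps in which at least one sampling node could detect the cow; the node layout, function names and accuracy definition are our own assumptions, and the trace is assumed to be already projected into the 160 m × 160 m local frame.

```python
import math

GRID = [(20 * i + 10, 20 * j + 10) for i in range(8) for j in range(8)]  # assumed 8 x 8 layout
SENSE_RANGE = 20.0  # metres

def nodes_in_range(x, y):
    """Indices of nodes whose sensing disc covers position (x, y)."""
    return [k for k, (nx, ny) in enumerate(GRID)
            if math.hypot(nx - x, ny - y) <= SENSE_RANGE]

def tracking_accuracy(trace, sampling_decisions):
    """Fraction of timesteps where at least one node that chose to sample
    also had the target within its sensing range.
    trace: list of (x, y); sampling_decisions: list of sets of node ids."""
    hits = sum(1 for (x, y), active in zip(trace, sampling_decisions)
               if any(k in active for k in nodes_in_range(x, y)))
    return hits / len(trace)
```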

FIGURE 12: dGRN controller adapting to the speed of the target. Note that when the target is moving rapidly, the average sampling rate is higher.

FIGURE 13: Dynamic protein concentrations for the adaptive speed tracking example. The horizontal axis is the timestamp and the vertical axis is the node's ID. The intensity of each block is proportional to the protein concentration, with white corresponding to B − 1 and black to 0. At time t = 650, the target changes speed from 5 m/s to 1 m/s. Observe how the number of nodes with sensors enabled (white) decreases. (a) Actuator protein values. (b) Hidden protein 1 values. (c) Hidden protein 2 values. (d) Sensor protein values.

FIGURE 14: Traces from cattle. (a) Trace used to train the dGRN controllers (ID 125). (b) Trace used to test the dGRN controllers (ID 067).

FIGURE 15: Accuracy vs. sampling rate plots for the various algorithms for the real traces. (a) Communication rate of 1 Hz. (b) Communication rate of 0.1 Hz.

4. TESTBED IMPLEMENTATION

In order to demonstrate that the dGRN approach can simplify sensor network configuration, we present results in this section from a real world testbed, shown in Fig. 16. We investigated a simple target tracking application where all the nodes are arranged in a line (1-D tracking). An example of such a scenario could be tracking people in a corridor, tracking vehicles on a road or monitoring animals along a fence. Note that in each of these scenarios, there are a number of variables which impact the configuration of the sensor network, such as the expected speeds of the target and the required tracking accuracy. We compare the evolved dGRN controllers against a generic algorithm (1-hop activation) which simply instructs neighbouring nodes to turn on their sensors when a target is detected. The dGRN controllers used a base B = 4, and there were two hidden proteins, one sensor protein (target presence or absence) and one actuator protein (effective sampling rate). The mapping used to control the probability of sampling at each dGRN cycle was the same as that used in Fig. 3. Each node was equipped with a light sensor which measured incident illumination from the “target” (a torch beam) which was moved over the nodes. When the sensor was powered up, if the light level exceeded a threshold, the sensor protein was set to 3 (target present), otherwise it was set to 0 (target absent). The target tracking accuracy was set to 90%.

The dGRN system was implemented using the Contiki OS [7] on a T-Mote SKY equivalent platform, using a total of 8 tracking nodes (code for the dGRN Contiki implementation, and for the other simulations, is freely available from the authors via e-mail request). The dGRN controller used an additional 3.6 kB of Flash and 176 bytes of RAM. The node at the bottom of the picture is a sniffer which simply captures all packets and relays them to the PC for realtime display and logging. Due to the close proximity of the nodes, the routing tables are fixed such that a node only listens to transmissions from its immediate neighbours. Access to the wireless channel is random and protein concentrations are disseminated in unacknowledged broadcast packets. The dGRN update cycle time was set to 2 s and the interval between transmitted protein beacons to 0.5 s. We tested the performance of the controllers with a target moving at a high speed (pause time at each node varied uniformly between 1 and 4 dGRN cycles) and a slow speed (pause time at each node varied uniformly between 4 and 15 cycles). Each experiment was run for 200 dGRN cycles, and was repeated 3 times in order to obtain an average measure of performance. The logfiles obtained from the dGRN experiment were then post-processed to compare against the performance of the 1-hop activation tracking algorithm.

The results from these experiments are shown in Fig. 17, and demonstrate that the dGRN controllers achieve a lower average sampling rate than the 1-hop activation algorithm. The difference is greater when the target is moving slowly, as the dGRN selectively powers down nodes, resulting in a lower average sampling rate. This can be seen in the time series plots in Fig. 18. The top plot shows the activation patterns when the target is moving quickly, and the bottom plot when the target is moving slowly. There are a number of observations that can be made from these two time series. The first is that the dGRN controllers have placed all nodes which are not in the immediate vicinity of the target into the lowest sampling rate of 12.5%. Next, observe that, on average, the number of nodes which are sampling at 100% is less than three. This shows that the dGRN controllers have created a strategy which is able to track the target whilst using a lower average sampling rate than the 1-hop activation scheme, which sets three nodes to a 100% sampling rate (the node that detected the target and its two immediate neighbours). Lastly, examining the difference between the two time series, it can be seen that the dGRN controller evolved to track a slow moving target uses a strategy of setting both nodes adjacent to the target to a sampling rate of 25%, whereas the dGRN controller for fast targets alternately sets the immediate neighbours of the target to a 100% sampling rate. Thus, each dGRN controller has been specifically configured for a particular application scenario, where prior knowledge of the average speed of the target has influenced the strategies that have been automatically generated. It must be emphasized that the process involved in creating these different dGRN controllers consists of simply changing the speed distribution in the dGRN simulation. Once the speed distribution has been specified, no further human input is required. Thus the operation of a sensor network can be automatically configured for a particular application scenario by an inexperienced network user. This is the power of the dGRN approach.
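The per-node behaviour described above can be summarized by the following simplified, single-threaded sketch; it is written in Python rather than the Contiki C of the actual implementation, the evolved rule evaluation is abstracted behind `evolved_rules`, and the exponential actuator-to-probability mapping is only an assumption consistent with the sampling-rate levels (12.5%, 25%, 50%, 100%) visible in Fig. 18.

```python
import random

B = 4                 # protein concentrations are integers 0 .. B-1
UPDATE_PERIOD = 2.0   # dGRN update cycle [s]
BEACON_PERIOD = 0.5   # interval between protein beacons [s]

class DGRNNode:
    """Sketch of a testbed node: protein state, beaconing and the update cycle."""

    def __init__(self, node_id, evolved_rules):
        self.node_id = node_id
        self.proteins = [0, 0, 0, 0]        # [actuator, hidden1, hidden2, sensor]
        self.neighbour_proteins = {}        # latest beacon from each immediate neighbour
        self.evolved_rules = evolved_rules  # automatically evolved controller (not shown)

    def on_beacon(self, sender_id, proteins):
        # Fixed routing: only beacons from immediate neighbours are stored.
        self.neighbour_proteins[sender_id] = list(proteins)

    def beacon(self):
        # Broadcast own concentrations in an unacknowledged packet (every 0.5 s).
        return (self.node_id, list(self.proteins))

    def update_cycle(self, read_light_sensor):
        # Runs every 2 s: possibly power the sensor, then apply the evolved rules.
        sampling_prob = (2 ** self.proteins[0]) / (2 ** (B - 1))  # assumed mapping
        if random.random() < sampling_prob:
            target_seen = read_light_sensor()               # light level above threshold?
            self.proteins[3] = B - 1 if target_seen else 0  # sensor protein
        self.proteins = self.evolved_rules(self.proteins, self.neighbour_proteins)
```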


FIGURE 16: Photograph of the 1-D target tracking testbed. Each node is equipped with a light sensor which is shielded for directionality. The torch acts as the target. The node at the bottom of the picture is a sniffer which logs all of the packets.

FIGURE 17: Results from the 1-D tracking experiment, showing that the average sampling rate of the dGRN controller is lower than that of the 1-hop activation algorithm.

FIGURE 18: Time series showing dGRN-controlled sampling rates from the tracking tests. The top series was captured whilst tracking a target moving at high speed, and the bottom series from a slow-speed target. Note the difference in the activation patterns.
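For reference, the 1-hop activation baseline used in this comparison is simple enough to state directly; the sketch below only models the forced activation around a detection (how nodes far from the target sample, e.g. for initial acquisition, is left unspecified, as in the text).

```python
def one_hop_activation_step(detections, num_nodes):
    """One cycle of the 1-hop activation baseline on a 1-D line of nodes:
    each detecting node and its two immediate neighbours sample at 100%."""
    active = set()
    for i in detections:  # indices of nodes that saw the target last cycle
        active.update({i - 1, i, i + 1} & set(range(num_nodes)))
    return active
```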

5. RELATED WORK

An overview of the various representations of GRNs, ranging from nonlinear differential equations to random Boolean networks, can be found in [8]. However, our approach of using discrete protein concentrations is most similar to that of Thomas on the logical representations of regulatory networks as discrete dynamical systems [9]. One interesting representation of a discrete system is to use discrete polynomials, allowing techniques from number and field theory to be used to reconstruct GRN models [10]. To date, there has been only one application of GRNs to a wireless network problem.



Das et al. present the use of a GRN model to address the problem of coverage in a sensor network [11]. Each node has a randomly assigned sensing radius in which it can sense a certain phenomenon. Nodes can either be active or inactive, and the goal is to maximize the coverage of the region as a whole for a given number of active nodes. Each node is aware of its location and its sensing radius. Nodes which have overlapping sensor regions are allowed to exert a suppressive effect on gene expression. The results from this paper are interesting, as they show that a distributed optimization algorithm can match the performance of a centralized algorithm (NSGA-II). However, the GRN structure (i.e. the differential equations specifying how proteins interact with one another) has to be specified by the designer. In our approach, all that is required is the specification of a fitness function. Operators and the rules determining how proteins interact with one another are evolved, and the other parameters, such as the base, can be determined using a meta-search algorithm. In addition, their nodes have to use the computationally expensive Runge-Kutta method in order to simulate the (floating point) gene levels, whereas our method uses integer-valued concentrations. Their presented method is a discrete optimization technique for binary (i.e. on or off) gene expression, whereas we allow nodes to finely control their output levels. Lastly, their results are based on simulations and have not been implemented in a real sensor network.

One of the first applications of GRNs to realistic distributed systems was the work by Taylor on multi-robot configuration using GRNs [12]. The goal was to configure a group of underwater robots to perform a particular task, namely minimizing the area that they covered by forming a tight cluster. Each robot was treated as a 'cell', running a GRN. These GRNs communicated with each other, transferring proteins. Protein interactions were controlled by the GRN, which was evolved using a genetic algorithm.

The main contribution of their work is the demonstration of how a GRN can be evolved to perform a distributed task. Related to this, Quick et al. showed how a GRN could be evolved to form a control system for a single robot [13]. A more biological investigation is the work of Knabe et al. on evolving biological clocks [14]. Living organisms have an internal oscillator which affects their behaviour (the circadian cycle is a particular example). These oscillators are able to adjust and synchronize to environmental stimuli, such as variations in day length. In this work, the authors present a two-level model of protein modulation of gene expression. Their results show how GRNs can be evolved to oscillate in synchrony with an applied reference signal, and how noise and phase changes affect the resulting oscillation.

Various types of genetic algorithms have been used to optimize network coverage [15], topology [16] and routing [17]. The genetic algorithm is used as a centralized, global optimization method which specifies parameters for each and every node in the network. The problem with this approach is that the solution found is static, and thus cannot adapt to changes in network topology or account for node failure. One way of tackling the static nature of the solutions is to run the genetic algorithm periodically, as in [18]. In this way, the nodes can adjust to changes in their environment; however, nodes need to send their current state (e.g. energy levels, number of neighbours) to the base-station where the genetic algorithm is typically run. This results in scalability issues and also an overhead cost of acquiring the information. However, the genetic algorithm can choose the best behaviour for each node as it has a global view of the network. Boonma and Suzuki present a biologically inspired method to control the behaviour of mobile agents in a wireless sensor network [19]. Parameters affecting agent operation are encoded in each agent's genetic information. Elite agents which outperform their peers on competing objectives are disseminated back into the network. This is an interesting approach to dynamically tuning agent behaviour using an online genetic algorithm. However, it differs from our work in that gross agent behaviour is specified a priori and the genetic algorithm is used to fine-tune parameters affecting agent operation; our work seeks to design the distributed behaviour itself.

A distributed approach which has been used to control node behaviour is resource allocation through reinforcement learning, where nodes learn the expected payoff for certain actions based on their prior behaviour [20]. Nodes independently choose what actions to take (such as sampling, routing or sleeping) to maximize their overall reward. Some actions consume more energy than others, and thus, to prevent premature exhaustion, nodes are allocated a certain energy budget. The authors showed how nodes could adjust their sampling rates based on the presence of a target in the area, and send this information to the end user.


The problem with this approach is that the reward parameters need to be tuned by the user, and they have a large impact on the behaviour of the nodes. In addition, nodes do not communicate with one another to disseminate state, but act in isolation. Similar approaches have been presented in [21, 22]. Other adaptive approaches involve each node constructing a model of the environment in order to predict when to sample, based on prior information [23, 24]. Although these are decentralized methods, there is no distributed communication amongst neighbouring nodes. Game theoretic constructs have also been used to alter node behaviour in response to information, by treating nodes as players in a mathematical game [25]. Again, the difficulty is in writing these strategies, which typically requires an expert. Node behaviour can be controlled by executing a distributed optimization algorithm, such as is presented in [26]. In their work, nodes interact with one another in an auction to dynamically form clusters in order to react to a particular stimulus in an energy efficient manner. However, in order to achieve optimal performance, a large amount of parameter tuning is required. It can thus be seen that existing work for network configuration suffers from the need for “fine-tuning” of node parameters for particular application scenarios.

6. CONCLUSIONS AND FUTURE DIRECTIONS

In this paper we have presented a general framework for evolving distributed wireless sensor network controllers, inspired by the way in which cells regulate their behaviour by exchanging proteins with each other. The key advantages of the proposed dGRN approach over existing techniques are that: 1) it generates adaptive code for parameter tuning in an automatic manner; 2) the manner in which nodes communicate with one another is an emergent property of the system and is not defined by human experts; 3) it provides a generic framework that can be applied to a variety of problems. The applicability of this approach to the problem of tuning the sampling rate in a target tracking application was illustrated. We showed that the dGRN framework automatically generates distributed code, arriving at node activation patterns similar to those that would be designed by human experts. Moreover, it was demonstrated that nodes are able to adapt their activation patterns in response to changes in target speed. These results indicate that dGRNs could be a promising new paradigm for automatically and adaptively configuring sensor network operation.

Through our experiments, we identified a number of limitations of the dGRN approach, and we highlight some avenues for further exploration. The first is that evolving dGRN controllers, especially for more complex problems, can take a large amount of time. Brute-force increases in computational power, such as using GPGPUs, will reduce the simulation time required [27]. However, we are investigating ways to prune the controller search space to reduce the time taken to converge to good solutions. A simple technique is to parse controllers before they are executed: if, for example, they do not use the sensor protein, nor communicate, they can be penalized to remove them from the population. Another method is to refine good controllers using a different optimization algorithm. Secondly, the dGRN controllers have been configured to act as static agents. However, dGRN controllers could also act as mobile agents, with decisions (such as where to move and what actions to undertake) evolved rather than designed. Furthermore, as dGRNs are essentially black box modules, we are also investigating how multiple dGRNs can be connected together at different layers of the networking stack and what effect this interconnection has on their behaviour. Thirdly, an inherent drawback of this method is that it uses an offline, centralized optimization algorithm. Although we have demonstrated that we can evolve controllers which are able to switch between multiple behaviours, the controllers themselves are static once deployed. We are conducting research into determining whether it is possible to evolve controllers online, such as those presented in [19]. Lastly, we need to investigate how to cope with real-world network dynamics, such as node insertion and removal. We showed that dGRN controllers could be trained to operate with link failures; it would be beneficial for the dGRN controllers to adapt to varying network conditions. We believe that it would be possible to do so if knowledge about the losses was presented as an additional sensor protein.
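As an example of the kind of pruning mentioned above, a static pre-filter over candidate controllers might look as follows; `reads_sensor_protein` and `emits_protein_messages` are hypothetical syntactic checks over the evolved rule set, and the exact penalty scheme is an open design choice rather than something we have fixed.

```python
def quick_reject(controller):
    """Static pre-filter applied before paying the cost of simulation
    (a sketch; the two checks below are assumed helper methods)."""
    if not controller.reads_sensor_protein():
        return True   # never reads its sensor: cannot react to the target
    if not controller.emits_protein_messages():
        return True   # never communicates: cannot coordinate with peers
    return False

def fitness(controller, simulate):
    # Rejected controllers receive the worst possible fitness.
    return float("-inf") if quick_reject(controller) else simulate(controller)
```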

ACKNOWLEDGEMENT

This work was supported by the EPSRC under grant EP/E013678/1 (WildSensing). Many thanks to Sonia Waharte for her suggestions. We would also like to thank the anonymous reviewers for their comments, and for the proposal of the wake-on-exit algorithm.

REFERENCES

[1] Jacob, F. and Monod, J. (1961) Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3, 318–356.
[2] Davidson, E. and Levin, M. (2005) Gene regulatory networks for development. Proceedings of the National Academy of Sciences of the United States of America, 102, 4936–4942.
[3] Price, K. (1999) An introduction to differential evolution. In Corne, D., Dorigo, M., and Glover, F. (eds.), New Ideas in Optimization, pp. 79–108. McGraw Hill International, UK.
[4] Sutton, R. S. and Barto, A. G. (1998) Reinforcement Learning: An Introduction. The MIT Press, Cambridge, USA.
[5] Haykin, S. (2008) Neural Networks: A Comprehensive Foundation. Prentice Hall, New Jersey, USA.


[6] Wietrzyk, B. and Radenkovic, M. (2007) CRAWDAD data set nottingham/cattle (v. 2007-12-20). Downloaded from http://crawdad.cs.dartmouth.edu/nottingham/cattle.
[7] Contiki operating system (available at http://www.sics.se/contiki/).
[8] de Jong, H. (2002) Modeling and simulation of genetic regulatory systems: a literature review. Journal of Computational Biology, 9, 67–103.
[9] Thomas, R. (1991) Regulatory networks seen as asynchronous automata: a logical description. Journal of Theoretical Biology, 153, 1–23.
[10] Stigler, B. (2006) Polynomial dynamical systems in systems biology. AMS Proceedings of Symposia in Applied Mathematics, pp. 59–84.
[11] Das, S., Koduru, P., Cai, X., Welch, S., and Sarangan, V. (2008) The gene regulatory network: an application to optimal coverage in sensor networks. GECCO '08, New York, USA, pp. 1461–1468.
[12] Taylor, T. (2004) A genetic regulatory network-inspired real-time controller for a group of underwater robots. Eighth Conference on Intelligent Autonomous Systems (IAS-8), Amsterdam, The Netherlands, pp. 403–412.
[13] Quick, T., Nehaniv, C., Dautenhahn, K., and Roberts, G. (2003) Evolving embodied genetic regulatory network-driven control systems. Seventh European Conference on Artificial Life (ECAL 2003), Dortmund, Germany.
[14] Knabe, J. F., Nehaniv, C. L., Schilstra, M. J., and Quick, T. (2006) Evolving biological clocks using genetic regulatory networks. Artificial Life X Conference (Alife 10), Indiana, USA.
[15] Jourdan, D. and de Weck, O. (2004) Layout optimization for a wireless sensor network using a multi-objective genetic algorithm. Vehicular Technology Conference (VTC), Los Angeles, USA.
[16] Wang, Y. (2008) Wireless Sensor Networks and Applications. Springer US.
[17] Badia, L., Botta, A., and Lenzini, L. (2009) A genetic approach to joint routing and link scheduling for wireless mesh networks. Ad Hoc Networks, 7, 654–664.
[18] Ferentinos, K. P. and Tsiligiridis, T. A. (2007) Adaptive design optimization of wireless sensor networks using genetic algorithms. Computer Networks, 51, 1031–1051.
[19] Boonma, P. and Suzuki, J. (2008) Exploring self-star properties in cognitive sensor networking. International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS), Edinburgh, UK.
[20] Mainland, G., Parkes, D., and Welsh, M. (2005) Decentralized, adaptive resource allocation for sensor networks. Proceedings of the 2nd Symposium on Networked Systems Design & Implementation (NSDI), Berkeley, CA, USA, pp. 315–328.
[21] Lim, H., Lam, V., Foo, M. C., and Zeng, Y. (2006) Adaptive distributed resource allocation in wireless sensor networks. Proc. of the 2nd International Conference on Mobile Ad-hoc and Sensor Networks (MSN 2006), Hong Kong, China.


[22] Bian, F., Kempe, D., and Govindan, R. (2006) Utility-based sensor selection. Information Processing in Sensor Networks (IPSN 2006), Nashville, USA.
[23] Kho, J., Rogers, A., and Jennings, N. (2007) Decentralised adaptive sampling of wireless sensor networks. Proc. First International Workshop on Agent Technology for Sensor Networks (ATSN 07), Honolulu, Hawaii, USA.
[24] Jain, A. and Chang, E. (2004) Adaptive sampling for sensor networks. Proceedings of the 1st International Workshop on Data Management for Sensor Networks, Toronto, Canada, pp. 10–16.
[25] Galstyan, A., Krishnamachari, B., and Lerman, K. (2004) Resource allocation and emergent coordination in wireless sensor networks. AAAI Workshop on Sensor Networks, California, USA.
[26] Melodia, T., Pompili, D., Gungor, V. C., and Akyildiz, I. F. (2005) A distributed coordination framework for wireless sensor and actor networks. MobiHoc '05: Proceedings of the 6th ACM International Symposium on Mobile Ad Hoc Networking and Computing, New York, NY, USA, pp. 99–110. ACM.
[27] Harding, S. and Banzhaf, W. (2007) Fast genetic programming on GPUs. Genetic Programming, 10th European Conference, EuroGP 2007, Valencia, Spain.
