Self-organizing and Self-stabilizing Role Assignment in Sensor ...

Self-organizing and Self-stabilizing Role Assignment in Sensor/Actuator Networks

Torben Weis (1), Helge Parzyjegla (2), Michael A. Jaeger (2), and Gero Mühl (2)

(1) Application of Parallel and Distributed Systems, University Stuttgart, Universitätsstraße 38, 70569 Stuttgart, Germany
(2) Communication and Operating Systems Group, Berlin University of Technology, Einsteinufer 17, 10587 Berlin, Germany
[email protected], {parzyjegla, michael.jaeger, g muehl}@acm.org

Abstract. Developing distributed applications for sensor/actuator networks is challenging, particularly, with regard to unreliable nodes and communication links. Splitting applications into roles eases the development significantly but presumes a reliable role management that autonomously assigns roles to devices depending on their capabilities. In this paper, we present a self-organizing and self-stabilizing role assignment mechanism as an integral part of a light-weight, flexible middleware. The deployed algorithms tolerate the addition and removal of devices at runtime and are also able to recover from any transient fault. Considering that resources are restricted on many devices, we analyze the proposed algorithms with respect to communication overhead, memory usage, and stabilization time.

1 Introduction

The development of distributed applications is difficult, especially if the devices that run the code are resource-constrained and error-prone. A role-based programming model can leverage a model-driven approach for application development in sensor/actuator networks (SA-nets) and provides advantages regarding fault tolerance and management. A typical scenario where SA-nets are used is the e-home. Programming applications for such an environment, where no administrator is present and complex network structures can evolve, requires the developer to provide a solution that does not rely on manual intervention and that is able to adapt to changes in its environment. These changes include, for example, adding and replacing a device. It is also essential that the application is able to get back to a correct working state (if possible) even if wireless links are disturbed or devices that run (parts of) the application crash unexpectedly. Thus, applications for SA-nets should be self-organizing as well as self-stabilizing. We developed an integrated stack of algorithms to provide a self-organizing and self-stabilizing substrate for the development of role-based applications for

Funded by Deutsche Forschungsgemeinschaft. Funded by Deutsche Telekom Stiftung.

R. Meersman, Z. Tari et al. (Eds.): OTM 2006, LNCS 4276, pp. 1807–1824, 2006. © Springer-Verlag Berlin Heidelberg 2006


SA-nets. It consists of three algorithms that feature a spanning tree, a publish/subscribe infrastructure, and a role assignment mechanism. The algorithms are tightly integrated to reduce the overhead. For example, the publish/subscribe infrastructure uses the heartbeats of the spanning tree, and the role activation relies on the publish/subscribe algorithm to discover device capabilities. The paper is structured as follows: in the next section, we discuss the role concept in more detail and explain what self-organization and self-stabilization mean. In Sect. 3 we present the algorithm stack and analyze it in Sect. 4. In Sect. 5 we discuss related work. Section 6 features conclusions and outlook.

2 Roles

Roles provide a powerful abstraction concept by specifying a certain behavior that can be adopted by different entities (e.g., nodes, components, or objects) and abandoned again. Furthermore, a particular entity can fulfill more than one role at a given time, while a particular role can also be performed by many entities. The concept of roles was first introduced by Bachmann and Daya [2] in data modeling and, thereafter, frequently adapted. Here, we use it to ease the development of flexible applications for SA-nets. To this end, an application is split into several roles that collaborate to provide the desired service. A role consists of the code and data structures (possibly comprising several components or objects) needed to fulfill a certain task of an application. Each role presumes a set of capabilities that describe the minimum requirements for a node to be able to serve this role (e.g., a set of sensors or actuators). Thereby, roles decouple the distributed application from the actual nodes it runs on since a role only addresses other roles and not a concrete node itself. A robust role assignment mechanism is an important precondition for this role-based development. It is responsible for first assigning all roles that are needed to nodes that are capable of serving them. Afterwards, it has to monitor the assigned roles and reassign them if necessary in order to ensure that all applications can fulfill their function. Due to the dynamic and error-prone nature of SA-nets, it is important that the mechanism works as autonomously as possible, i.e., it automatically adapts to dynamic changes and recovers from transient faults without external intervention. The ability to dynamically reconfigure an application is a main motivation for the introduction of the role-based programming abstraction.
Achieving automatic adaptation in SA-net applications is hard since there is generally no fixed central computer which has global knowledge about available devices, software components, and network connectivity. Many devices in such a system are too weak to maintain global knowledge, and the availability of a more powerful machine cannot be assumed. Therefore, every single device must make decisions based solely on its local knowledge such that the global result yields the expected system functionality. Organizing a system of independent entities based on local decisions is a key property of self-organizing systems [10]. A traditional way of dealing with faults is trying to mask them, but obviously not every fault can be masked. If such a fault occurs, the system can get into an invalid state from which it may not recover without human intervention. In


Fig. 1. The role assignment algorithm stack

our application domain, this is not acceptable since the user is hardly able to administer this complex system: a crash can occur on any kind of appliance ranging from the coffee machine to the TV set. Self-stabilization can be applied to tackle this problem: a self-stabilizing system is guaranteed to recover from any transient fault after a bounded time [5]. However, this comes at a cost because the system cannot, in the general case, detect whether it is currently in a legal state or not. It can only guarantee that an illegal state will be left after a fixed time. This is done by constantly pushing the system towards a legal state.

3 Role Assignment Algorithm Stack

In this section, we present a self-organizing and self-stabilizing role assignment mechanism that consists of three layers, each implemented by its own algorithm. Figure 1 gives an overview of our algorithm stack, which presumes only a simple radio interface able to receive and broadcast messages. The purpose of the lowest layer is to structure the network. To this end, it applies a spanning tree algorithm designed for wireless networks. Please note that it is possible to support networks with a fixed infrastructure, too. In this case, a different algorithm must be used in the stack to construct the spanning tree. The second layer comprises a lightweight publish/subscribe algorithm that provides flexible communication by decoupling publishers and subscribers, i.e., senders and receivers of messages. After nodes have subscribed for the roles they perform, messages can be published related to some role instead of being addressed directly to the node the role runs on. The publish/subscribe algorithm takes care that a message is forwarded to the correct receiver(s). Furthermore, this indirection also eases the migration of roles, since publishers are no longer required to know their subscribers. Finally, the third algorithm layer manages the role assignment within the network. It monitors the routing tables of the underlying publish/subscribe layer to determine whether a node is properly subscribed for each role. If this is not the case, a message is sent to a capable node to activate the missing role. The presented algorithm stack is tightly integrated by exploiting several cross-layer optimizations. This results in higher performance and less communication overhead. For example, the role activation directly inspects the publish/subscribe routing tables to discover devices, their capabilities, and their currently assigned roles.


The drawback is that one algorithm of the stack cannot easily be replaced by another one even if it exhibits the same functionality. In particular, the role activation algorithm relies on the underlying publish/subscribe algorithm. Furthermore, aspects of self-organization and self-stabilization must be considered on each layer anew and reconciled with the measures taken on other layers. In the following, we discuss each layer of the algorithm stack in more detail.

3.1 Spanning Tree Algorithm

The spanning tree algorithm of the first layer serves several functions. In wireless ad-hoc networks, it structures the network, providing the basis for the hierarchical publish/subscribe algorithm which comprises the second layer. To this end, it constructs the hierarchy which is used to disseminate messages within the network. The algorithm determines the node with the highest ID and chooses it as the tree's root node. This fact is exploited by the role activation afterwards. The self-stabilizing spanning tree algorithm is an adaptation of an algorithm published by Afek, Kutten, and Yung [1]. A pseudo code implementation is shown in Listing 1. The algorithm uses only two functions, which must be implemented by the firmware. Send() broadcasts a message and Receive() waits for a message or a timeout. Furthermore, we require a clock on each node, but the clocks need only be loosely synchronized. These primitives can be expected even on very tiny sensor boards. Whenever a new node wants to join the tree, it initializes the fields that store the IDs of the root and the parent node with its own ID. Afterwards, it waits to receive a TreeMessage from a neighbor node. TreeMessages are broadcast periodically. They serve as heartbeats and make it possible to detect failed nodes and to refresh the tree's state. Each TreeMessage contains the ID of its sender, the ID of the root node as seen by the sender, and the sender's distance to the root node measured in hops. If a TreeMessage contains a root node with a higher ID, the new node selects the sender as parent and updates its fields appropriately (cf. line 16). Furthermore, among all suitable parent nodes from which a heartbeat is received, the one that provides the shortest path to the root is chosen (cf. line 19). Heartbeats sent by the parent node are forwarded by re-broadcasting an own TreeMessage. However, if the new node does not hear of another node having a higher ID, it finally times out and issues a broadcast claiming itself as new root node.
Since its ID is then the highest within the network, its neighbors will accept it as the new tree's root and propagate its claim. Interestingly, much the same thing happens if a heartbeat times out on any other node. The node then assumes that an existing tree is broken or that a network partition occurred. Therefore, it simply resets its fields, broadcasts a TreeMessage, and starts over again. In consequence, the root node repeatedly times out and resets itself, because no higher ID exists within the network. Nevertheless, no harm is done, since resetting its fields to its own ID is an idempotent operation in this case. Please notice the usage of a shorter timeout (TIMEOUT/p) for the root node that forces it to initiate heartbeats with a higher frequency (cf. line 31). This allows any other node to miss a heartbeat without running into the risk of a timeout causing a


Listing 1. Spanning tree algorithm

 1  // The node's ID and predetermined timeout values as built-in constants.
 2  const uint ID; const long TIMEOUT; const int p ← 3, t ← 10;
 3  // Assumed root node, current parent node, and number of hops to the root.
 4  uint Root ← ID, Parent ← ID, Distance ← 0;
 5  // Timestamp of the last heartbeat received.
 6  long Timestamp ← GetCurrentTime();
 7
 8  void Main() {
 9    Message msg;
10    while ( true ) {
11      // Wait for an incoming message (or a timeout).
12      if ( (msg ← Receive( TIMEOUT/t )) ≠ null ) {
13        if ( msg is TreeMessage ) {
14          // Check whether a new root or a better parent is found.
15          if ( msg.Root > Root ) {
16            Root ← msg.Root; Parent ← msg.Sender; Distance ← msg.Distance + 1;
17          }
18          else if ( msg.Root = Root ∧ msg.Distance + 1 < Distance ) {
19            Parent ← msg.Sender; Distance ← msg.Distance + 1;
20          }
21          // Process a received heartbeat.
22          if ( msg.Root = Root ∧ msg.Sender = Parent ) {
23            Timestamp ← GetCurrentTime(); OnHeartbeat();
24            Send( new TreeMessage( ID, BROADCAST, Root, Distance ) );
25          }
26        } else if ( IsAddressedToNode( ID, msg.Receiver ) ) {
27          OnProcessMessage( msg );
28        }
29      }
30      // Do a reset if the heartbeat timed out.
31      long timeout ← ( Root = ID ? TIMEOUT/p : TIMEOUT );
32      if ( ¬(0 ≤ GetCurrentTime() − Timestamp ≤ timeout) ) {
33        Root ← ID; Parent ← ID; Distance ← 0;
34        Timestamp ← GetCurrentTime(); OnHeartbeat();
35        Send( new TreeMessage( ID, BROADCAST, Root, Distance ) );
36    } }
37  }

subsequent reset. This makes the constructed tree more robust with regard to unreliable communication links. However, as Receive() blocks for at most TIMEOUT/t, it is guaranteed that a failed parent node is detected at the latest after ((t+1)/t) · TIMEOUT, when the current time is compared to the timestamp of the last heartbeat received (cf. line 32). The same comparison also tests whether the timestamp lies in the future (i.e., the difference to the current time is negative), which achieves self-stabilization even in case of arbitrary memory perturbations that possibly affect any stored value.
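The parent selection and the timeout test of Listing 1 can be sketched as two small functions on a node's state. This is only an illustrative Python sketch, not the authors' implementation: the names `NodeState`, `new_node`, `on_tree_message`, and `check_timeout` are hypothetical, the value of `TIMEOUT` is an assumed number of milliseconds, and time is passed in explicitly instead of being read from a clock.

```python
from dataclasses import dataclass

P, T = 3, 10      # divisors p and t as given in Listing 1
TIMEOUT = 3000    # assumed timeout in milliseconds (hypothetical value)


@dataclass
class NodeState:
    node_id: int
    root: int
    parent: int
    distance: int
    timestamp: int  # local time of the last accepted heartbeat


def new_node(node_id: int, now: int) -> NodeState:
    """A joining node initially considers itself the root."""
    return NodeState(node_id, node_id, node_id, 0, now)


def on_tree_message(s: NodeState, sender: int, root: int,
                    distance: int, now: int) -> bool:
    """Apply the selection rule; return True if a heartbeat must be re-broadcast."""
    if root > s.root:
        # A root with a higher ID wins unconditionally.
        s.root, s.parent, s.distance = root, sender, distance + 1
    elif root == s.root and distance + 1 < s.distance:
        # Same root, but the sender offers a shorter path.
        s.parent, s.distance = sender, distance + 1
    if root == s.root and sender == s.parent:
        # Heartbeat from the current parent: refresh the timestamp.
        s.timestamp = now
        return True
    return False


def check_timeout(s: NodeState, now: int) -> bool:
    """Reset the node if the heartbeat is overdue or the timestamp lies in the future."""
    limit = TIMEOUT // P if s.root == s.node_id else TIMEOUT
    if 0 <= now - s.timestamp <= limit:
        return False
    s.root, s.parent, s.distance, s.timestamp = s.node_id, s.node_id, 0, now
    return True
```

The lower bound `0 <=` in `check_timeout` is what catches a corrupted timestamp that lies in the future. Without it, such a value would keep the node from ever timing out.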


The other layers hook up with the spanning tree algorithm in two ways. First, the OnHeartbeat() method is called whenever a TreeMessage is sent. This enables the node to perform certain tasks periodically, e.g., to trigger the routing table propagation of the publish/subscribe algorithm. Second, the spanning tree provides the means for a hop-wise communication along its edges. Therefore, all messages contain a field with the ID of the intended receiver. To address all direct children in a tree, we set the highest bit of the receiver address. To simply broadcast a message, we set all bits of the whole address. This way, a node can send a message to its parent (ID = Parent), to its children (ID = 0x80000000), to its parent and its children (ID = Parent|0x80000000), or to all nodes within broadcast range (ID = 0xFFFFFFFF). The IsAddressedToNode() method tests whether a received message is intended for the current node according to the introduced addressing scheme. If successful, the message is propagated upwards in the algorithm stack via the OnProcessMessage() method.

3.2 Publish/Subscribe Algorithm

To send information across the network, we use a self-stabilizing hierarchical publish/subscribe algorithm [14]. It enables nodes to address messages to a certain role without knowing the device on which the role is currently executed. The publish/subscribe algorithm offers essentially three functions:

– Subscribe(type): subscribes to messages of a given type usually belonging to a certain role.
– Publish(type,message): publishes a message of a given type that is forwarded to all matching subscribers.
– PublishSingle(type,message): sends a message of a given type to just one subscriber provided that at least one exists.

Nodes interested in messages related to a certain role subscribe to a corresponding message type. Their subscriptions are forwarded along the edges of the tree towards the root. Along their way, they install the necessary routing entries to deliver matching publications on the reverse path back to interested subscribers. Therefore, each node has a routing table which stores its own subscriptions as well as subscriptions issued by its direct and indirect children. A routing entry contains the ID of the node the subscription was received from, the type of messages the subscriber is interested in, and an additional timestamp to detect stale entries. Figure 2 provides an example. Nodes are shown together with their IDs and routing tables. A routing entry of (80,A) means that node 80 has subscribed for messages of type A. We omitted the timestamp for reasons of clarity. Nodes can be direct and/or indirect subscribers. Direct subscribers (e.g., node 80) consume the messages they are subscribed for as indicated by a routing entry with their own ID. Instead, indirect subscribers (e.g., node 62) issue a subscription on behalf of an interested child in order to propagate this interest to the root. Whenever a published message is received, a node searches its routing table for matching subscriptions and forwards the publication to its children provided


Fig. 2. Routing tables

that at least one child is subscribed for this type of message. If the node itself is interested, the publication is also delivered locally to the subscribing role and consumed there. If the message comes from a child or is published on the node itself, it is additionally addressed to the parent node in order to reach subscribers in different subtrees, too. For example, a message of type A published on node 8 first bubbles up the tree towards the root, before it is disseminated in the left subtree down to its subscribers 80 and 42. When a message is published by the PublishSingle() method, the nodes behave differently. Instead of broadcasting the publication to all interested children, only one child is directly addressed at each hop until a node is reached that consumes the message. This function is heavily exploited by the role activation algorithm. The timestamp in each routing entry is used to detect expired subscriptions. Subscriptions are only leased and must be renewed periodically. By applying leasing, the algorithm is able to recover from corrupted routing tables and to insert missing entries and thus achieves self-stabilization. Since corrupted entries are not refreshed by any subscriber, they will finally time out and get removed. All three aspects, removing expired subscriptions, inserting missing ones, and refreshing the remaining ones, are handled by the OnHeartbeat() method as shown in Listing 2. First, the role activation algorithm is forced to renew its subscriptions by invoking RoleActivation(). Afterwards, each routing entry is inspected (cf. line 4). If it possesses a valid timestamp, the entry's message type is added to a set s. Otherwise, the routing entry is removed. Finally, every node (except the root node) sends one Subscription message to its parent that contains all message types stored in s (cf. line 11). Only this way do new subscriptions bubble up the tree and existing ones get refreshed.
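The leasing behavior described above can be illustrated with a small routing table that stores one timestamp per (subscriber, type) pair. The following Python sketch is written under assumptions: the class name `RoutingTable`, its method names, and the lease duration are hypothetical, and the current time is passed in explicitly.

```python
from __future__ import annotations

TIMEOUT = 3000  # assumed lease duration in milliseconds (hypothetical value)


class RoutingTable:
    """Maps (subscriber ID, message type) to the time of the last refresh."""

    def __init__(self) -> None:
        self.entries: dict[tuple[int, str], int] = {}

    def update(self, sender: int, types: set[str], now: int) -> None:
        """Handle a Subscription: insert missing entries, refresh existing ones."""
        for msg_type in types:
            self.entries[(sender, msg_type)] = now

    def collect_valid_types(self, now: int) -> set[str]:
        """Remove expired or future-dated entries and return the remaining types."""
        valid: set[str] = set()
        for key, stamp in list(self.entries.items()):
            if 0 <= now - stamp <= TIMEOUT:
                valid.add(key[1])
            else:
                del self.entries[key]
        return valid
```

On each heartbeat, a non-root node would send the set returned by `collect_valid_types` to its parent, which in turn calls `update` with it. An entry that nobody refreshes, for instance one created by a memory fault, simply ages out.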
When the parent node receives the Subscription message, it invokes OnProcessMessage(). Subscriptions are handled by updating the timestamp of the sender’s routing entry for each message type the subscription contains. If a corresponding routing entry does not exist yet, it is simply added. Additionally, OnProcessMessage() also handles publications. First, it checks, whether the node itself is interested. In this case, the message is delivered to the subscribing role by OnConsumeMessage(). If the message was


Listing 2. Self-stabilizing publish/subscribe algorithm

 1  void OnHeartbeat() {
 2    RoleActivation();
 3    Set s ← new Set();
 4    foreach ( Entry e ∈ RoutingTable ) {
 5      if ( 0 ≤ CurrentTime() − e.Timestamp ≤ TIMEOUT ) {
 6        s.add( e.Type );
 7      } else {
 8        RoutingTable.remove( e );
 9    } }
10    if ( Root ≠ ID ) {
11      Send( new Subscription( ID, Parent, s ) );
12  } }
13
14  void OnProcessMessage( Message msg ) {
15    if ( msg is Subscription ) {
16      RoutingTable.Update( msg.Sender, msg.SubscribedTypes );
17    } else if ( msg is Publication ) {
18      if ( RoutingTable.ContainsEntry( ID, msg.Type ) ) {
19        OnConsumeMessage( msg );
20        if ( msg.IsPublishedSingle ) return;
21      }
22      [...] // Propagate the message to subscribed children
23    }
24  }

published by the PublishSingle() method, its propagation then ends. Otherwise, it is propagated according to the forwarding scheme already discussed above. We omit the particular code for reasons of brevity.

3.3 Role Activation Algorithm

The role activation algorithm is layered on top of the publish/subscribe algorithm. Since hierarchical routing is used, the root node knows whether a subscription for a given message type exists somewhere in the tree. However, it does not know who exactly subscribed to this message type as it only stores the next hop to the subscriber(s) in its routing table. This knowledge about existing subscriptions is exploited by the role activation algorithm. It enables the root node to decide if it is possible to assign a role and whether a role is currently active. The algorithm's structure can be divided into different phases which we describe in the following.

Phase 1. The devices analyze their capabilities and compare them with the capabilities required by a certain role. We assume that devices are not uniform: some might have special sensors, for example, others have special actuators. Furthermore, their memory, CPU, and energy resources can vary. Imagine an


application A with roles R_A ⊆ R, while a device D can play roles R_D ⊆ R. Depending on the mode of deployment, different approaches are possible. Option one (the preferred one) is to execute phase 1 once upon deployment. The result, i.e., R_D^A = R_D ∩ R_A for every device D and application A, is stored in non-volatile memory (i.e., flash memory) of D. Thus, it cannot be altered by RAM errors. The second option is to calculate R_D^A upon startup and store it in RAM. The drawback is that R_D^A has to be recalculated periodically to achieve self-stabilization in the face of RAM errors.

Phase 2. After having determined their capabilities, the devices signal which roles they can activate. For every role r ∈ R_D^A, the device subscribes in our example to a message of type t = (A|r|possible). The GUID A uniquely identifies the application, r is encoded as an integer, and the flag possible indicates that the device is able to execute this role, but has not yet activated it. The subscriptions are periodically renewed by the RoleActivation() function (cf. Listing 3), which is in turn invoked on every heartbeat (cf. function OnHeartbeat() in Listing 2). It is important that the subscription is refreshed directly before the publish/subscribe algorithm sends its subscriptions towards the root node to avoid running into a timeout which would signal an error.

Phase 3. The root node waits until it has received at least one subscription of type (A|r|possible) for all roles r ∈ R_A. If this is the case, then it is possible to find one device for each role. Subsequently, the root node proceeds to phase 4.

Phase 4. Now, the root node assigns the roles to individual nodes. Therefore, it publishes a message of type (A|r|possible) for all roles r ∈ R_A using PublishSingle(). The published message has a boolean payload: true means "activate this role", while false means "deactivate this role". Role deactivation is used if more than one node is activated for one role. This can happen due to duplicated or delayed messages or because of corrupted data structures and is part of the self-stabilizing mechanism of the role assignment algorithm.

Phase 5. The recipient of the message (A|r|possible) will activate role r of application A. Therefore, the recipient subscribes to the message type (A|r|active). Notice that the node is now subscribed to two message types: (A|r|possible) and (A|r|active). If the root node sees a subscription of type (A|r|active) for every r ∈ R_A, then all roles are correctly assigned and activated.

Phase 6. This phase is the runtime loop of the application. To send a message to the node executing role r, it suffices to simply publish a message of type (A|r|active). The publish/subscribe infrastructure will then forward it to the node executing role r. This is a key advantage of using the publish/subscribe infrastructure for role assignment.

Figure 3 illustrates which subscriptions the algorithm issues and which messages are published. First, two nodes subscribe to role r1 (phase 2, message m1). If at least one subscription for every role is available (after phase 3), the root node publishes a message matching the subscription (phase 4, message m2). The


Listing 3. Role activation algorithm

 1  void RoleActivation() {
 2    foreach ( GUID A ∈ APPLICATIONS ) {
 3      foreach ( Role r ∈ R_A ) {
 4        // Signal ability to execute role r.
 5        if ( r ∈ R_D^A ) {
 6          Subscribe( (A|r|possible) ); // Phase 2.
 7          // Signal activation of role r?
 8          if ( IsActive( r ) ) {
 9            Subscribe( (A|r|active) ); // Phase 5.
10        } }
11        // Reset multiple activations of the same role, when detected.
12        if ( |RoutingTable.GetSubscriptions( A, r, active )| > 1 ) {
13          Publish( (A|r|possible), false );
14      } }
15      // Role assignment is only performed by the root node.
16      if ( ID ≠ Root ) continue;
17      // Check if at least one device is available for every role. Phase 3.
18      if ( RoutingTable.HasSubscriptions( R_A, possible ) ) {
19        // (Re)assign roles if they are not yet active.
20        foreach ( Role r ∈ R_A ) {
21          if ( ¬RoutingTable.HasSubscriptions( A, r, active ) ) {
22            PublishSingle( (A|r|possible), true );
23        } }
24      } else {
25        // Deactivate whole application if one of its roles cannot be reassigned.
26        foreach ( Role r ∈ R_A ) {
27          if ( RoutingTable.HasSubscriptions( A, r, active ) ) {
28            Publish( (A|r|possible), false );
29  } } } } }
30
31  void OnConsumeMessage( Message msg ) {
32    foreach ( GUID A ∈ APPLICATIONS ) {
33      foreach ( Role r ∈ R_D^A ) {
34        // (De)activate role r on receipt of a matching message.
35        if ( msg.Type = (A|r|possible) ) {
36          if ( (bool)msg.Data ) Activate(r); else Deactivate(r);
37        }
38        // Deliver the message's content to an active role r.
39        if ( msg.Type = (A|r|active) ) {
40          OnHandleMessage( r, msg.Data );
41  } } } }

usage of PublishSingle() makes sure that only one of the two nodes activates role r1 . Finally, the node that consumed the published message activates role r1 (phase 5) by subscribing to (A|r1 |active) (message m3 ). Any message of the type (A|r1 |active) will now be forwarded to the node executing role r1 .
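The decision the root node takes in phases 3 and 4 can be sketched as a pure function over the message types found in its routing table. This Python sketch is illustrative only: the function name `root_decisions`, the tuple encoding of a type (A|r|flag) as `(app, role, flag)`, and the result format are assumptions made for the example.

```python
def root_decisions(app, roles, subscribed):
    """Return the publications the root issues for one application.

    `subscribed` is the set of (application, role, flag) types present in
    the root's routing table. Each result is a triple
    (message type, boolean payload, publish_single).
    """
    out = []
    if all((app, r, "possible") in subscribed for r in roles):
        # Phases 3 and 4: every role has a capable device, so each role
        # that is not active yet is assigned to exactly one subscriber.
        for r in roles:
            if (app, r, "active") not in subscribed:
                out.append(((app, r, "possible"), True, True))
    else:
        # At least one role cannot be served: deactivate the roles of
        # this application that are still active.
        for r in roles:
            if (app, r, "active") in subscribed:
                out.append(((app, r, "possible"), False, False))
    return out
```

For example, with roles `["r1", "r2"]`, both roles possible, and only `r1` active, the function yields a single activation for `r2` sent with single-subscriber delivery. Because the function is evaluated anew on every heartbeat, a role whose active subscription has expired is reassigned without any extra bookkeeping.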


Fig. 3. Example message exchange in the role activation algorithm

Considering self-stabilization, the role activation is in a correct state if every role needed for an application is assigned to exactly one node that has the capabilities to perform it. If no capable node exists for a particular role, the other roles must also stay deactivated to prevent partially running applications from behaving arbitrarily. However, due to faulty messages or memory perturbations, nodes may start or stop executing a role without a correct (de)activation by the root. Nodes also fail and stop working when they run out of energy. When a role r of application A gets deactivated for any reason, its subscription (A|r|active) is not refreshed anymore and will vanish after a constant time due to the self-stabilizing publish/subscribe layer. Eventually, the root node detects the missing subscription and returns to phase 4 to reassign role r. If a subscription (A|r|possible) is also missing, no node is left that is able to perform r. The root node then falls back to phase 3 and stops the whole application. Therefore, a deactivation message is sent for all remaining active roles of the respective application. Afterwards, a legal state is reached again. The case of an unintended role activation is treated differently. If two devices activate the same role r, they both subscribe to (A|r|active), which is detected by their first common ancestor in the tree when two different subscriptions for the same role arrive. Thereafter, the detecting node causes a reset by publishing a deactivation message telling every node to unsubscribe from this role (cf. line 13). Eventually, the root node detects the missing role and reassigns it as described in the first case.

4 Analysis

In the following, we analyze the three algorithms presented in the previous section. We analyze the worst case message complexity (i.e., the number of calls to Send()) and determine the stabilization time Δ, i.e., the time it takes for a system to reach a legal state starting from an arbitrary one. Therefore, let n be the number of nodes within the network and d its diameter in network hops. Assuming a uniform, planar distribution of nodes, d is in O(√n). Regarding the stabilization time, we introduce π as the refresh period with which heartbeat messages are sent ((1/p + 1/t) · TIMEOUT in Listing 1) and τ as the maximum timeout


after which stale routing entries are removed (((t+1)/t) · TIMEOUT in Listing 1). For the message delay δ between neighboring nodes, we require that δ ≪ π holds to prevent heartbeat messages from congesting the radio channel. Considering the role assignment, |R| denotes the number of possible roles within the system.

4.1 Spanning Tree Algorithm

Whenever a node broadcasts a TreeMessage claiming itself as root, the message is only accepted and forwarded by nodes with a smaller ID provided that these do not already know a better candidate. Nodes having a higher ID always ignore the claim. Therefore, a TreeMessage containing the ID = x is forwarded by at most x − 1 smaller nodes. Hence, the worst case regarding the message complexity occurs as depicted in Fig. 4, if all nodes in ascending order time out and issue a TreeMessage that is forwarded by the smaller ones. Thus, we encounter n · (n − 1)/2 messages within the system, which is O(n²).
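The bound can be retraced by summing the forwards over all claims. The following derivation is a sketch under the assumption, implicit in the argument above, that the node IDs are 1, ..., n, so that the claim of node x is forwarded by at most x − 1 nodes:

```latex
\[
  \sum_{x=1}^{n} (x - 1) \;=\; \sum_{k=0}^{n-1} k \;=\; \frac{n\,(n-1)}{2} \;=\; O(n^2)
\]
```

For n = 4, for instance, the sum is 0 + 1 + 2 + 3 = 6 = 4 · 3/2.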

Fig. 4. Worst-case scenario for the self-stabilizing tree algorithm

Concerning self-stabilization, we only have to wait until the node with the highest ID times out after τ and issues its TreeMessage. Then, it takes no more than d · δ to reach every node within the network. Afterwards, it is guaranteed that all nodes know the root which has the highest ID and that every node has selected a parent node closest to the root. Hence, for the stabilization time Δ_tree of the spanning tree algorithm the following holds:

Δ_tree ≤ τ + d · δ = O(τ).    (1)

The memory consumption is constant because every node only stores its children, its parent, the last timestamp, and the root ID plus the distance from the root.

Load Distribution. The spanning tree that results from the algorithm is a shortest path tree since every node chooses the node as its parent that is closest to the root. This reduces the average number of messages required by the publish/subscribe algorithm. However, the number of children is not evenly distributed among the nodes. To illustrate this effect, we simulated the algorithm with a large number of nodes as shown in Fig. 5. The "rings" that emerge here have a distance equal to the maximum range of the radio r_max. Due to the shortest path property, nodes are more likely to have many children if their distance to the root is a multiple of r_max. In conjunction with our publish/subscribe algorithm this can cause an inhomogeneous energy consumption among the nodes,

Self-organizing and Self-stabilizing Role Assignment in SA Networks

1819

Fig. 5. Simulation of the spanning tree algorithm

since nodes with many children will on average spend more energy on forwarding messages and maintaining routing tables than nodes with less children. However, this problem can be fixed. In the original version of the algorithm, a child always selects the node with the shortest distance to the root node as parent. Instead of broadcasting the real distance in hops to the root, potential parent nodes in an improved version divide their real distance by the percentage of energy left. Thus, the lower the energy reserves are, the less interesting is the node for potential children. The drawback of this solution is that the resulting tree will not necessarily have the shortest path property. Therefore, the overall energy consumption might increase. 4.2

Publish/Subscribe Algorithm

The periodic update of subscriptions requires in the worst case n − 1 messages for n nodes. The leaf nodes send their subscriptions to their parent, which in turn forwards the aggregated subscriptions to its parent, and so on. Thus, we send one message per edge of the tree. The worst-case message complexity of forwarding a publication arises when every leaf is subscribed to a filter matching the publication. An invocation of Publish() then results in at most n − 1 messages as the publication has to be sent to every node. An invocation of PublishSingle() causes at most 2 · h messages, which happens if the sender and the receiver are both leaves on the lowest level of the tree, where h ≤ d is the height of the tree. A call to Subscribe() causes at most h messages.

Stale subscriptions will be removed after h · (τ + δ) since the propagation of a refresh takes at most time δ and it takes at most time τ until a node removes the entry if it has not been refreshed meanwhile. Interestingly, new subscriptions need at most h · π + 2 · δ to finally arrive at the root. With every heartbeat, they bubble up one level in the tree towards the root. Additionally, the last heartbeat needs δ to reach a node next to the root, which then responds by refreshing its subscriptions, which takes another δ. Alternatively, we could forward subscriptions immediately as discussed earlier [14]. This would decrease the time until the subscription becomes active at the cost of h additional messages for every issued subscription. In total, the publish/subscribe algorithm has a stabilization time $\Delta_{\text{p/s}}$ of

\[ \Delta_{\text{p/s}} \le \max\{h \cdot \pi + 2 \cdot \delta,\; h \cdot (\tau + \delta)\} = O(h \cdot \tau). \tag{2} \]
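The message counts stated for the tree (one refresh message per edge, at most 2 · h messages for PublishSingle()) can be checked on a small example. The sketch below is illustrative only: the tree, the node names, and the helper functions are hypothetical, and it assumes that a single-receiver publication travels up to the lowest common ancestor of sender and receiver and then down again.

```python
def path_to_root(parent, node):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def publish_single_hops(parent, sender, receiver):
    """Count tree edges on the path sender -> common ancestor -> receiver."""
    up = path_to_root(parent, sender)
    down = path_to_root(parent, receiver)
    on_sender_path = set(up)
    # First node on the receiver's root path that also lies on the sender's.
    meet = next(node for node in down if node in on_sender_path)
    return up.index(meet) + down.index(meet)


# Hypothetical 7-node tree; the root "r" has no parent.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b", "f": "e"}

n = len(parent)
height = max(len(path_to_root(parent, v)) - 1 for v in parent)        # h
refresh_messages = sum(1 for p in parent.values() if p is not None)  # one per edge
worst_single = max(publish_single_hops(parent, s, t)
                   for s in parent for t in parent)
print(height, refresh_messages, worst_single)
```

For this tree, h = 3, a full subscription refresh needs 6 = n − 1 messages, and the longest single-receiver path has 5 hops, which stays within the 2 · h bound.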

The memory consumption is limited by the size of the routing table, which depends on the maximum degree D of a node and is, thus, not greater than D · |R|, since each child node can issue a subscription for every possible role. This upper bound on the size of the routing table is important because many nodes will have restricted memory resources in practice.

4.3

Role Activation Algorithm

A node that subscribes to (A|r|possible) causes at most h messages. The actual role assignment in phase 4 (which is the invocation of PublishSingle()) results in at most h messages for one role and, thus, in at most |R| · h messages for all roles together. It then takes at most time h · π + 2 · δ until all subscriptions (A|r|active) for active roles arrive at the root (which again takes at most h messages per role). Meanwhile, the root node times out at most h times, each time calling PublishSingle() again. Hence, the total message complexity of phases 4 and 5 amounts to at most $|R| \cdot h^2 + |R| \cdot h$ messages, which is $O(|R| \cdot h^2)$. The messages needed for subscriptions of (A|r|possible) are already counted in the analysis of the periodic costs of the publish/subscribe algorithm.

The role activation algorithm monitors the correct assignment of all required roles and will detect errors, provided that the publish/subscribe routing tables are stable. It then takes no more than h · δ (i.e., one call to PublishSingle()) to assign and activate a role, i.e., to make a node subscribe accordingly. These subscriptions need an additional time of at most h · π + 2 · δ to reach the root. Thus, the stabilization time of the role activation algorithm sums up to

\[ \Delta_{\text{ra}} \le h \cdot \delta + h \cdot \pi + 2 \cdot \delta = O(h \cdot \pi). \tag{3} \]
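One way to read the message bound for phases 4 and 5 is as the sum of the initial assignment and the repeated assignments triggered by root timeouts. The grouping below is our interpretation of the counting argument, not a statement taken from the algorithm description:

\[
\underbrace{|R| \cdot h}_{\text{initial PublishSingle() per role}}
\;+\;
\underbrace{h \cdot \bigl(|R| \cdot h\bigr)}_{\text{at most } h \text{ timeouts, each repeating the calls}}
\;=\; |R| \cdot h^2 + |R| \cdot h
\;=\; O(|R| \cdot h^2).
\]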

The additional memory consumed by the role activation algorithm is bounded by the number of possible roles, i.e., it is O(|R|), given that a flag has to be set for each role indicating that the role is active.

4.4

Role Assignment Algorithm Stack

For layered self-stabilizing algorithms, the overall worst-case stabilization time is not greater than the sum of the stabilization times of the individual algorithms [5]. The stabilization time Δ of our algorithm stack is thus:

\begin{align}
\Delta &\le \Delta_{\text{tree}} + \Delta_{\text{p/s}} + \Delta_{\text{ra}} \tag{4} \\
       &= O(\tau) + O(h \cdot \tau) + O(h \cdot \pi) \tag{5} \\
       &= O(h \cdot \tau). \tag{6}
\end{align}

The stabilization time Δ of the whole algorithm stack is bounded by the timeout period τ and the shape of the tree. With h ≤ d and $d = O(\sqrt{n})$, we finally get

\[ \Delta = O(\sqrt{n} \cdot \tau). \tag{7} \]

If no fault occurs, the algorithm stack produces 2 · (n − 1) messages per refresh period τ: n − 1 messages for the heartbeat mechanism and another n − 1 messages for subscription propagation. As long as all required roles are active and the routing tables are consistent, the role assignment stack does not send additional messages.

The overall memory consumption of the three layers consisting of the spanning tree, the publish/subscribe, and the role activation algorithm sums up to O(D + |R| + D · |R|) = O(D · |R|). It is thus linear in the number of possible roles and does not depend on the actual size of the network. This is ideal for small devices since they cannot store a global view of all available nodes and their capabilities.
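To get a feeling for the magnitudes involved, the individual bounds (1) to (3) and their sum (4) can be evaluated for concrete parameters. The following sketch is purely illustrative: the class and function names are hypothetical, the numeric values are made up, and it reads τ as the timeout period, π as the heartbeat period, and δ as the maximum delay of a message over one hop.

```python
from dataclasses import dataclass


@dataclass
class Params:
    tau: float    # timeout period
    pi: float     # heartbeat period (assumed meaning of the symbol)
    delta: float  # maximum delay of one message hop (assumed meaning)
    d: int        # network diameter in hops
    h: int        # height of the spanning tree, h <= d


def stabilization_bounds(p):
    """Evaluate the upper bounds (1)-(3) and their sum (4)."""
    tree = p.tau + p.d * p.delta
    pubsub = max(p.h * p.pi + 2 * p.delta, p.h * (p.tau + p.delta))
    role = p.h * p.delta + p.h * p.pi + 2 * p.delta
    return {"tree": tree, "pubsub": pubsub, "role": role,
            "total": tree + pubsub + role}


def routing_table_bound(max_degree, num_roles):
    """Upper bound D * |R| on the number of routing table entries per node."""
    return max_degree * num_roles


# Made-up example values (seconds and hops), chosen only for illustration.
example = Params(tau=30.0, pi=10.0, delta=0.5, d=8, h=6)
bounds = stabilization_bounds(example)
print(bounds)
print(routing_table_bound(max_degree=5, num_roles=4))
```

With these example values the publish/subscribe layer dominates the total bound, which matches the O(h · τ) result in (6).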

5

Related Work

Self-organization in sensor networks has been an active research field in the past. Algorithms for several specific problems have been presented, ranging from routing and election [18], over data aggregation [19], coverage [17], and lookup services [9], to backbone infrastructures [12] built by using a fixed set of roles.

In [6,15] the authors present an algorithm for generic role assignment in sensor networks. However, their aim is different from ours, as their algorithm is not self-stabilizing and is intended for another scenario. They use role assignment to determine cluster heads or in-network aggregation nodes. For example, they consider clustering to be a special discipline of role assignment. Furthermore, they have a rule-based language to define which properties a node must fulfill to be assigned a certain role. A cluster head must, for example, not have another cluster head within a distance of two hops. This is a very powerful approach, but the authors do not prove that the algorithm always terminates, especially in the face of cyclic dependencies. These can occur, for example, if node A states that it becomes a cluster head if node B is not a cluster head and vice versa. The authors counter this using heuristics and evaluation cycles. In [7] they show how to map a role assignment specification for a given network to an integer linear program formulation in order to better analyze its properties at design time.

The TinyCubus [13] project uses application-specific role assignment for deployment. When sending code updates across the sensor network, they determine which sensor could execute which role. Their code distribution algorithm leverages this knowledge to route code updates only through the set of nodes that really need them, i.e., that execute the matching role. However, they do not provide a concrete role assignment algorithm.


Two projects have investigated self-stabilizing publish/subscribe systems. In [14] an efficient algorithm is presented that builds on a second-chance algorithm comparable to the approach presented in this paper. However, this approach forwards subscriptions immediately, causing much more network traffic. On the other hand, the stabilization time is much lower, i.e., on the order of h · δ + 2 · τ. Another project uses a probabilistic approach where the nodes exchange lossy compressed routing tables to fix inconsistencies [16]. However, this leads only to probabilistic self-stabilization, which is weaker than the approach we have taken.

Many different algorithms for self-stabilizing spanning trees have been proposed. A decent overview and comparison has been published by Gärtner [8].

We rely on the publish/subscribe communication paradigm to disseminate data in the network. Other approaches have been proposed, too, most notably directed diffusion [11]. However, to the best of our knowledge, no role assignment algorithm based on directed diffusion has been devised yet.

The aim of our work is to provide algorithms and tools for self-organizing and self-stabilizing pervasive applications. BASE [4] and PCOM [3] represent a component-based approach to this problem domain. Devices such as PDAs and other appliances build an ad-hoc network, and a configuration algorithm tries to compose an application using the components available on the devices. Their approach has three drawbacks. First, it is not self-stabilizing. Second, the configuration algorithm is very expensive since it must solve a constraint satisfaction problem. Finally, the dependencies between the components must build a tree. Our approach does not have these limitations and is more efficient.

6

Conclusion and Outlook

We presented an algorithm stack for self-organizing role assignment in actuator and sensor networks. The contribution of this paper is threefold: First, the entire algorithm stack is self-stabilizing. It can recover from any transient fault, including temporary network and memory faults. Furthermore, the algorithms can adapt to a changing network topology and can handle the addition and removal of nodes at runtime. Second, our role assignment algorithm builds directly on a data dissemination algorithm, i.e., a publish/subscribe system. When roles are assigned, every node can immediately send messages to the executors of the roles via the publish/subscribe system. Thus, role assignment and routing to the respective nodes are solved together. Finally, the memory usage of our algorithms is linear in the number of roles and independent of the size of the network, which is especially important for resource-restricted devices.

In the future, we want to optimize the placement of roles in the tree. If we knew that two roles are tightly coupled, it would be beneficial to locate them close to each other in the tree. We are currently investigating how we could exploit the information stored in the routing tables to optimize role placement. Furthermore, we are also extending our algorithm stack. On top of the spanning tree algorithm, we are building a clustering algorithm that operates just by snooping the messages exchanged by the spanning tree algorithm. In addition, we are currently evaluating election and process monitoring algorithms. Both are self-stabilizing and utilize the routing table of the publish/subscribe algorithm. The process monitoring is required to ensure that each application is executing exactly once in a network. Otherwise, it could happen that all roles are assigned but the application is not active. Moreover, due to temporary network partitions, it could happen that several instances of one application run concurrently on a network.

Currently, developers have to manually define the roles and implement dedicated functionality for every role. Our long-term goal is to provide development tools that automatically determine roles through program analysis. Thus, developers will be able to implement self-organizing applications without having to deal with self-organization themselves. This will be supported by special development tools and our algorithm stack.

References

1. Y. Afek, S. Kutten, and M. Yung. Memory-efficient self stabilizing protocols for general networks. In J. van Leeuwen and N. Santoro, editors, WDAG, volume 486 of Lecture Notes in Computer Science, pages 15–28, New York, NY, USA, 1991. Springer.
2. C. W. Bachman and M. Daya. The role concept in data models. In Proceedings of the Third International Conference on Very Large Data Bases, pages 464–476, Tokyo, Japan, Oct. 1977. IEEE Computer Society.
3. C. Becker, M. Handte, G. Schiele, and K. Rothermel. PCOM - A Component System for Pervasive Computing. In Proceedings of the Second IEEE International Conference on Pervasive Computing and Communications, pages 67–77. IEEE Computer Society, Mar. 2004.
4. C. Becker, G. Schiele, H. Gubbels, and K. Rothermel. BASE - A Micro-broker-based Middleware For Pervasive Computing. In Proceedings of the First IEEE International Conference on Pervasive Computing and Communication (PerCom), March 23-26, Fort Worth, USA, pages 443–451. Los Alamitos: IEEE Computer Society, Mar. 2003. ISBN 0-7695-1895.
5. S. Dolev. Self-Stabilization. MIT Press, 2000.
6. C. Frank and K. Römer. Algorithms for generic role assignment in wireless sensor networks. In Proceedings of the 3rd ACM Conference on Embedded Networked Sensor Systems (SenSys), pages 230–240, San Diego, CA, USA, Nov. 2005.
7. C. Frank and K. Römer. Solving generic role assignment exactly. In Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), Rhodes, Greece, Apr. 2006.
8. F. C. Gärtner. A survey of self-stabilizing spanning-tree construction algorithms. Technical Report 200338, Swiss Federal Institute of Technology (EPFL), School of Computer and Communication Sciences, June 2003.
9. K. Herrmann, G. Mühl, and M. A. Jaeger. A self-organizing lookup service for dynamic ambient services. In 25th International Conference on Distributed Computing Systems (ICDCS 2005), pages 707–716, Columbus, Ohio, USA, June 2005. IEEE Press.
10. K. Herrmann, M. Werner, and G. Mühl. A methodology for classifying self-organizing software systems. In International Conference on Self-Organization and Autonomous Systems in Computing and Communications (SOAS’2006), Sept. 2006. (accepted for publication).
11. C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva. Directed diffusion for wireless sensor networking. ACM/IEEE Transactions on Networking, 11(1):2–16, Feb. 2002.
12. M. Kochhal, L. Schwiebert, and S. Gupta. Role-based hierarchical self organization for wireless ad hoc sensor networks. In Proceedings of the 2nd ACM International Workshop on Wireless Sensor Networks and Applications (WSNA’03), pages 98–107, New York, NY, USA, 2003. ACM Press.
13. P. J. Marrón, A. Lachenmann, D. Minder, J. Hähner, R. Sauter, and K. Rothermel. TinyCubus: A Flexible and Adaptive Framework for Sensor Networks. In E. Cayirci, S. Baydere, and P. Havinga, editors, Proceedings of the Second European Workshop on Wireless Sensor Networks (EWSN 2005), pages 278–289. Istanbul, Turkey: IEEE, Jan. 2005. ISBN 0-7803-8801-1.
14. G. Mühl, M. A. Jaeger, K. Herrmann, T. Weis, L. Fiege, and A. Ulbrich. Self-stabilizing publish/subscribe systems: Algorithms and evaluation. In J. C. Cunha and P. D. Medeiros, editors, Proceedings of the 11th European Conference on Parallel Processing (Euro-Par 2005), volume 3648 of Lecture Notes in Computer Science (LNCS), pages 664–674, Lisboa, Portugal, Aug. 2005. Springer.
15. K. Römer, C. Frank, P. Marrón, and C. Becker. Generic Role Assignment for Wireless Sensor Networks. In Proceedings of the 11th ACM SIGOPS European Workshop, pages 7–12. Leuven, Belgium: self-publisher, Sept. 2004.
16. Z. Shen and S. Tirthapura. Self-stabilizing routing in publish-subscribe systems. In 3rd International Workshop on Distributed Event-Based Systems (DEBS’04), pages 92–97, Edinburgh, Scotland, UK, May 2004. IEE.
17. S. Slijepcevic and M. Potkonjak. Power efficient organization of wireless sensor networks. In Communications, 2001. ICC 2001. IEEE International Conference on, volume 2, pages 472–476, Helsinki, Finland, 2001.
18. K. Sohrabi, J. Gao, V. Ailawadhi, and G. J. Pottie. Protocols for self-organization of a wireless sensor network. IEEE Personal Communications [see also IEEE Wireless Communications], 7(5):16–27, 2000.
19. L. Subramanian and R. H. Katz. An architecture for building self-configurable systems. In Mobile and Ad Hoc Networking and Computing, 2000. MobiHOC. 2000 First Annual Workshop on, pages 63–73, Boston, MA, USA, 2000.
