
Long-term Reliable Data Gathering Using Wireless Sensor Networks

Volker Turau and Christoph Weyer
Hamburg University of Technology, Institute of Telematics
Schwarzenbergstrasse 95, 21073 Hamburg, Germany
Email: [email protected]

Abstract: This paper presents the design of a long-term reliable data gathering service for many-to-one communication in wireless sensor networks: all nodes periodically send delay-tolerant data to a single node. The service operates reliably despite strict resource constraints, poor link qualities, and frequent disconnects. The paper presents a novel protocol for gathering periodically measured data, including a solution for the wear-leveling problem of the EEPROM. Furthermore, theoretical limits of this approach based on available bandwidth and local memory are provided. Finally, preliminary results of an implementation of the service are presented, in particular a detailed analysis of the energy consumption during the different phases of the experiment.

Keywords: Wireless sensor networks, reliability, data gathering, TDMA.

I. INTRODUCTION

Some applications of wireless sensor networks require that all nodes distributed over a geographical region measure values at regular time intervals. All measured values must be transported through the network to a central location called the sink. The objective is that the sink receives all sensor values, i.e., a reliable service is desired. Another objective is that this service must be provided over long periods of time without replacing batteries. There is no hard deadline for the arrival of the data at the sink, i.e., we consider delay-tolerant networks. Despite a considerable amount of research on sensor networking, the problem of reliable transport has yet to be studied and addressed efficiently.

A reliable service is needed, for example, if the data from all sensors are used to build a model where only the entirety of the data allows a correct interpretation. This is the case in structure analysis such as monitoring the effects of forces on buildings or of waves on coastline fortifications. Any loss of data items requires extrapolation, reducing the quality of the analysis or even rendering the analysis impossible.

Reliability in wireless networks in harsh environments is a challenging task for several independent reasons. The first problem is rooted in wireless communication: high loss rates, asymmetric links, weak correlation between quality and distance, hidden terminal problems, and dynamic changes of connectivity. The other main problem comes from the constrained resources of sensor nodes: limited energy supply, small computational power and memory space, and narrow communication bandwidth. Consequently, nodes can neither buffer large amounts of data nor implement complex compression algorithms.

Unreliability in sensor networks can have several causes: malfunction of sensors, errors in measurements, data corruption, and loss of data during transport. The focus of this work is on errors rooted in the communication layer. Well-known techniques from the wired world for avoiding data corruption are not considered here. Instead, this work concentrates on the provision of a reliable data transport.

This paper presents the design and preliminary evaluation of a reliable data gathering service for periodic data in the face of poor link quality and frequent disconnects. The service is based on a packet-level, hop-by-hop routing protocol which buffers data using persistent storage provided by the nodes. An upper limit for the sampling rate that can be handled reliably by a given set of resources is provided.

The paper is organized as follows: Section II reviews related work, and Section III states the goals and explains the assumptions underlying this work. Section IV contains the core of our work: statements about limits of sampling rates for reliable transport over a routing tree, and the protocol. In Section V preliminary results of an implementation of the service, including an analysis of energy consumption, are presented.

II. RELATED WORK

Reliability in wireless sensor networks can be considered for several different communication patterns: one-to-one, one-to-many, and many-to-one. The first problem is addressed by Kim et al. by considering link-level retransmissions and erasure codes [1]. They observe that link-level retransmissions are very important, but their experiments show that even with 5 retransmissions some data is still lost. One reason is that a link may have gone down, but until this information reaches the routing layer, packets are lost. This requires some costly measures such as holding packets in buffers for extended periods, or backtracking packets along the reverse path.
They propose the usage of erasure codes, which allow lost packets to be reconstructed to some extent at the cost of higher bandwidth requirements.

Wan et al. propose a hop-by-hop NACK transport protocol called PSFQ as a solution to the one-to-many problem [2]. PSFQ distributes the data from a source node by transmitting data at a slow speed, but allowing nodes that experience losses to recover missing data aggressively. RMST is also a hop-by-hop selective NACK transport protocol that includes link layer recovery via automatic repeat requests [3]. The link layer implements a stop-and-wait protocol using explicit acknowledgments for each packet sent. Both protocols work well in systems with high connectivity and low link error rates; otherwise packets will be lost. Another solution for the one-to-many problem is contained in [4].

ESRT addresses the many-to-one problem for reliable event detection using minimum energy expenditure without intermediate caching requirements [5]. ESRT does not provide a guaranteed end-to-end data delivery service. DTNLite and ESS are architectures for the many-to-one problem providing packet-level, hop-by-hop reliability for delay-tolerant data using sequential storage for buffering during long queue delays [6], [7]. Our work is based on different assumptions in considering the transport of periodically taken data samples to a sink. Hence, we are concerned with single-packet messages, and thus fragmentation of storage space is not an issue. Furthermore, the data rate is fixed and known in advance. Finally, data transport is the main task of our network. A survey on protocols for reliable data transport in wireless sensor networks is given by Willig and Karl [8].

III. THE GOALS

The bounded memory of the nodes of a wireless sensor network does not allow large amounts of data to be stored locally. Therefore, the data measured by the nodes must be routed to an external storage system. This makes the data available for interpretation without any manual activity. The goal of this work is to provide a long-term reliable transport of periodic sensor measurements from all nodes to a sink. For each measurement received by the sink, the time the value was measured and the identifier of the source node must be available.
In a network with bounded resources the loss of data cannot be precluded completely. Therefore, the algorithm must fulfill the following requirements:

(A) In case not all data can be transported to the sink, the sink must be able to determine which data is missing.
(B) The system should attempt to balance the loss of data equally over all nodes.
(C) The operation time for a given energy budget should be maximized.

A. Assumptions

This work considers a wireless sensor network with n nodes. The following infrastructure is presumed. Each node has a unique identifier. The link layer provides a neighborhood protocol, i.e., each node knows the set of nodes that it can communicate with using bidirectional links. The qualities of the links are available to the nodes. All nodes are assumed to share the same frequency band, and access to the wireless medium is controlled by a time division multiple access protocol (TDMA). Time is divided into rounds of constant size Tr and each round is divided into ns ≤ n + 1 time slots, all but one of length Ts. There is one time slot assigned to each node, during which the node is allowed to transmit data. If several nodes share the same time slot, they can transmit concurrently without interference. There is a special time slot of length Tp ≥ Ts at the beginning of every round. This slot is not used for transmitting data, but to perform data processing. It is assumed that all data processing necessary to execute the algorithm can be performed during this time slot and that all other slots are fully available for data transmission. To simplify the exposition, we assume Tp = Ts. The establishment of a TDMA schedule is outside the scope of this paper. There are several algorithms available to set up and maintain a TDMA scheme [9].

Each node is equipped with a battery and a set of sensors. Each sensor is assumed to record samples of the sensed field with a fixed sampling rate, and to scalar-quantize and encode the data. Because a sensor value is known only at its own location, this process must be done independently at each sensor location. Without loss of generality, we assume that all sensors take samples with the same frequency. All samples have constant size. It is assumed that received packets are not corrupted (e.g., by using error correcting codes). When the sink has received the value corresponding to a particular sampling time from each node, it can reconstruct a snapshot for that instant of time. This paper does not consider the latency between the time when the values were taken by the sensors and the time the sink received the data needed for its reconstruction.

This work does not discuss data compression; this is regarded as a useful concept but largely orthogonal to the presented approach. It is straightforward to combine run-length compression with the presented algorithm. A protocol for data gathering that adapts opportunistically to changes in its environment is proposed in [10]. Each node compresses its gathered data locally, transmitting a burst of data when communication conditions are good. It is not known whether the concept of distributed compression as proposed in [11] can be used for reliable data transport, as the loss of one packet may render many other packets unusable. See [12] for a discussion of compression in the many-to-one transport problem. A transport scheme that allows nodes to predict data from other nodes is presented in [13]; the scheme is used to reduce communication in a reliable data gathering service.

IV. LONG-TERM RELIABLE DATA GATHERING

This section describes our main contribution, the reliable transport protocol, and derives theoretical limits for reliable transport of periodic data.

A. Packets

To have a homogeneous memory layout and to simplify the code, all data packets have the same size. Thus, each packet contains a constant number np of sensor readings.

B. Memory Layout

To achieve a reliable service, each node stores all received packets and its own measurements directly in the EEPROM as opposed to RAM. This way the data survives a temporary outage of the node, for example when a node is reset. The storage set aside for packets is organized as a ring buffer, called the packet buffer, consisting of frames. Each frame can store a single data packet and some meta-data (e.g., a flag indicating whether the packet has been successfully sent). Pointers to the head and the tail of the packet buffer are stored in EEPROM (see Section IV-I for details). New measurement values are written directly into a frame in the packet buffer. A separate pointer stored in EEPROM always points to the frame currently used for new measurements. When this frame is full, a new frame from the ring buffer is allocated for the next set of samples and the full frame is copied into the frame at the head of the ring buffer. After this step, there is no difference between local and foreign packets. If no free frame is available, the node stops taking measurements. This simplifies the handling of the packet buffer. Writing data into persistent storage takes more time and requires more energy than RAM accesses, but it provides the required reliability.

C. Sampling Rate

To guarantee a reliable data gathering process, the sampling rate of the sensors cannot be arbitrarily high. The sampling rate is limited by several aspects of the system, foremost the bandwidth of the transceiver and the memory available for buffering. Let s be the sampling rate measured in bits/s. The maximum throughput is achieved when the sink is 100% busy. Hence, the sampling rate must be less than C/n (see also [14]). Taking the maximal channel utilization γ into account, the following inequality holds:

$$ s \le \frac{\gamma C}{n} \tag{2} $$

[Fig. 1, top panel (receiver sending the acknowledgment): current [mA] over time [ms]; annotated intervals: 14 ms and 112.3 ms.]

Packets contain samples of a single node only. Each packet also contains some meta-data, e.g., a time stamp and the identifier of the sampling node (in the application header). This data enables the sink to associate a time stamp and the source node with each measurement. Hence, requirement (A) is fulfilled. During each time slot a single packet is sent. The length of Ts is chosen such that it is just long enough to send a packet and to receive an acknowledgment from the receiver (plus additional time for packet handling, communication delay, and a time window to allow for synchronization errors). From the optimal total packet size Ptot of the transceivers the actual payload Pp of a packet can be derived (allowing for error correction and packet-specific headers). If a single sample of all sensors takes sb bits, then Pp = sb np. Let C be the bandwidth of the transceiver and let γ ≤ 1 be the channel utilization. Let Ta be the time required to send an acknowledgment; then the following constraint for Ts can be derived:

$$ T_s > \frac{P_{tot}}{\gamma C} + T_a \tag{1} $$

[Fig. 1, bottom panel (sender): current [mA] over time [ms]; annotated intervals: 56 ms and 112.3 ms.]

Fig. 1. Current draw of an ESB node during a single slot of the TDMA scheme, in mA. A node sends a single packet (bottom figure); the receiving node responds with an acknowledgment packet (top figure). Table I summarizes the configuration of the node; the measurement was made with a transmission power of 80% of the maximal power.

This calculation does not consider the characteristics of the TDMA scheme. A single round has the duration Tr = ns Ts. In the worst case, a node needs to be able to send the amount of data sampled by all n sensors during a round in a single time slot, i.e., in a single packet. Hence s Tr n ≤ Pp, which implies the following upper bound for the sampling rate:

$$ s \le \frac{P_p}{T_r n} \tag{3} $$

This is a very rough upper bound; it does not consider issues such as the time needed for acknowledgments. Assuming θPp ≤ Ptot with θ > 1, this leads to a sharper bound for s than Equation (2):

$$ s \le \frac{\gamma C T_s}{\theta T_r n} = \frac{\gamma C}{\theta n_s n} \tag{4} $$
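The step from Equation (3) to Equation (4) is not spelled out in the text. The following chain is one way to reconstruct it, assuming that Equation (1) is read as implying P_tot < γ C T_s (the acknowledgment time T_a only strengthens this):

```latex
\begin{align*}
s &\le \frac{P_p}{T_r\, n}
   && \text{Equation (3)} \\
  &\le \frac{P_{tot}}{\theta\, T_r\, n}
   && \text{since } \theta P_p \le P_{tot} \\
  &< \frac{\gamma C\, T_s}{\theta\, T_r\, n}
   && \text{since } P_{tot} < \gamma C\, T_s \text{ by Equation (1)} \\
  &= \frac{\gamma C}{\theta\, n_s\, n}
   && \text{since } T_r = n_s T_s .
\end{align*}
```

Because θ n_s > 1, the right-hand side is smaller than the bound γC/n of Equation (2), which is the sense in which Equation (4) is sharper.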

In a single round a node v can receive a packet from each of its vs direct successors in the routing tree, hence at most vs Pp bits. The node must temporarily store these packets. Let M be the storage available for this purpose (in bits), then vs Pp ≤ M. If a node is not able to transfer data in λ successive rounds, it needs to buffer the data received during this period. Using Equation (3) this implies another constraint on the sampling rate:

$$ s \le \frac{M}{\lambda \Delta T_r n} \tag{5} $$

where ∆ is the maximum number of direct successors of a node in the routing tree. The value for λ depends on the algorithm; in our case it depends on the number of retransmissions (due to lost packets or lost acknowledgments) and the number of secondary parents (see Section IV-F for details).

[Fig. 2, state diagram. States: Waiting for Tree Construction, Waiting for Synchronization, Tree Constructing, Data Forwarding. Transition labels recovered from the extraction: sync message, connected, tree timeout, synchronization timeout, no invitation, failed.]
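The bounds of Equations (2) to (5) can be compared numerically. The sketch below is only an illustration: the function name and dictionary keys are hypothetical, the values C = 19.2 kbit/s, γ = 0.5, n = 15, Ts = 120 ms, Pp = 26 bytes, Ptot = 111 bytes, and ∆ = 2 are taken from the experiment described in Section V, and three further inputs are assumptions made here: all 16 slots are taken to have length Ts (the text's simplification Tp = Ts), M is taken to be the whole packet buffer counted in payload bits, and λ = 10 is an arbitrary example value. The resulting numbers are not results reported by the paper.

```python
def sampling_rate_bounds(C, gamma, n, n_s, T_s, P_p, P_tot, M, lam, delta):
    """Return the upper bounds of Equations (2)-(5) on the sampling rate s in bit/s."""
    T_r = n_s * T_s          # round length, assuming T_p = T_s as in the text
    theta = P_tot / P_p      # largest theta satisfying theta * P_p <= P_tot
    return {
        "eq2": gamma * C / n,                    # channel shared by n nodes
        "eq3": P_p / (T_r * n),                  # one payload per round
        "eq4": gamma * C / (theta * n_s * n),    # eq. (3) relaxed via eq. (1)
        "eq5": M / (lam * delta * T_r * n),      # buffering for lam silent rounds
    }


bounds = sampling_rate_bounds(
    C=19200,            # bit/s, transceiver bandwidth (Section V)
    gamma=0.5,          # channel utilization (Section V)
    n=15,               # number of nodes (Section V)
    n_s=16,             # 15 communication slots + 1 processing slot
    T_s=0.120,          # s, slot length (Section V)
    P_p=26 * 8,         # bits, payload (Section V)
    P_tot=111 * 8,      # bits, total packet on air (Section V)
    M=1600 * 26 * 8,    # bits, ASSUMPTION: full packet buffer, payload only
    lam=10,             # ASSUMPTION: hypothetical number of silent rounds
    delta=2,            # binary tree (Section V)
)
tightest = min(bounds, key=bounds.get)
print(tightest, round(bounds[tightest], 2))
```

With these inputs the per-round payload limit of Equation (3) is the binding constraint, which suggests that in this configuration the TDMA round length limits the sampling rate well before the raw channel capacity does.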

D. TDMA

In order to use a TDMA scheme it is necessary to fix the length of a single slot. During a single slot a node needs to send a packet, and the receiver needs to send an acknowledgment that the sender must receive. To avoid unnecessary power consumption, a slot should be just long enough to fulfill this task. Nodes do not have to perform complex computations during a normal slot, since a separate processing slot exists. On the other hand, the slot must be long enough to tolerate small deviations of the clocks. To choose an appropriate slot length, the energy expenditure of the sender and the receiver for the standard packet size needs to be monitored precisely. The following considerations are valid for the hardware used in the real implementation, see Section V for details.

Internally the node measures time in ticks; 1024 ticks correspond to one second. Figure 1 depicts such a measurement for a slot length of 100 ticks (i.e., about 97.7 ms). When a node has turned off its transceiver, the current draw is 1.46 mA; with the AD converter switched off as well, it is lowered further to 0.38 mA. To accommodate the turn-on time of the receiver, a node turns on its receiver 15 ticks before the start of a slot. The lower figure shows the sender. When the node turns on its receiver, the current jumps to 6.31 mA. The pure sending time for the packet is 56 ms. Then the node waits for the acknowledgment. After the receiver has received the packet, it immediately sends an acknowledgment (depicted at the top of Figure 1). This takes another 14 ms. This analysis shows that for the given configuration the slot length should be at least 70 ms. In Figure 1, the transceivers of the nodes are turned on for 112.3 ms, i.e., the 100-tick slot plus the 15-tick lead time.

E. Overall Procedure

The overall procedure consists of five phases and is time-triggered (see Figure 2). At the beginning all nodes are waiting for the tree construction phase to begin.
At time 0 the tree construction begins at all nodes. Each node sets two timers:

• a tree construction timer and
• a synchronization timer.

The sink initiates this phase by broadcasting an invitation message. At the end of this phase a routing tree has been set up, i.e., each node knows the identifier of its parent and those of its children. Furthermore, the clocks of the nodes have been synchronized. As soon as a node knows its parent and its children in the routing tree, it switches over to the data-forwarding phase. The details of these two phases are described in Sections IV-F and IV-G. If a node fails to be integrated into the routing tree, it enters the sleeping state. At the end of the data-forwarding phase (when all own packets and those of all descendants have been sent) nodes also change into the sleeping state.

Fig. 2. The different application states (state diagram; the transition labeled "all data forwarded" leads into the state "Sleeping").

If the tree construction timer expires, all nodes change into the initial state and the tree construction phase begins again. If the synchronization timer expires, nodes enter a waiting state in order to wait for a synchronization message. In this state, starting from the sink, the current time is sent to every node in the network via the current routing tree. As soon as a node has received a sync message from its parent, it enters the data-forwarding phase. If a node does not receive a time message, it changes into the initial waiting state. The purpose of the synchronization phase is to avoid the tree construction phase and to reuse the routing tree previously built. The length of the tree construction timer is a multiple of the length of the synchronization timer. The purpose of the tree construction timer is to allow new nodes, or nodes not connected to the current routing tree, to join the routing tree. To save energy, the communication-intensive tree construction phase is performed at longer intervals; in between, only the clocks of the nodes are synchronized.

Figure 3 depicts a sequence of phases and the fill state of the packet buffer. The different states require different resources to be provided by a node. To save energy, only those services that are needed are provided. During all phases, all nodes take measurements. All nodes have switched on their transceiver during the tree construction phase.
During the data-forwarding phase, nodes have switched on their transceiver

• during their time slot and
• during the time slot of their children.

While in the synchronization phase, nodes have switched on their transceiver

• during their time slot and
• during the time slot of their parent.
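The duty-cycling rules above can be condensed into a single predicate. This is a minimal sketch rather than the authors' implementation: the names `Phase`, `NodeSlots`, and `transceiver_on` are hypothetical, slots are identified by plain integers, and the special case of a node that missed synchronization and keeps listening for an invitation is not modeled.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, Optional


class Phase(Enum):
    TREE_CONSTRUCTION = auto()
    DATA_FORWARDING = auto()
    SYNCHRONIZATION = auto()
    SLEEPING = auto()


@dataclass(frozen=True)
class NodeSlots:
    """TDMA slots relevant to one node (hypothetical representation)."""
    own_slot: int
    parent_slot: Optional[int] = None
    child_slots: FrozenSet[int] = frozenset()


def transceiver_on(phase: Phase, slots: NodeSlots, current_slot: int) -> bool:
    """Decide whether the transceiver must be powered during current_slot."""
    if phase is Phase.TREE_CONSTRUCTION:
        # Invitations and requests can arrive at any time in this phase.
        return True
    if phase is Phase.DATA_FORWARDING:
        # Send in the own slot, listen in the slots of the children.
        return current_slot == slots.own_slot or current_slot in slots.child_slots
    if phase is Phase.SYNCHRONIZATION:
        # Receive the time from the parent, pass it on in the own slot.
        return current_slot == slots.own_slot or current_slot == slots.parent_slot
    # Sleeping: transceiver (and AD converter) stay off.
    return False
```

For a node with own slot 3, parent slot 1, and children in slots 5 and 6, the predicate is true in slots 3, 5, and 6 while forwarding data, and in slots 1 and 3 while synchronizing.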

Nodes have the lowest energy requirements when they are in the sleeping state. Then the transceiver and the AD converter are switched off. Note that a node that misses time synchronization has to keep its transceiver turned on until the next invitation message. Therefore, the tree construction timer should not be too large.

F. Tree Constructing Phase

In order to transport the data towards the sink, a routing tree is built. The tree is similar to a breadth-first tree rooted at the sink. Each node maintains a reference to its immediate parent and an ordered list of secondary parents; these are used in case the primary parent can no longer be reached or refuses to accept packets. The protocol is implemented as a finite state machine; each node is in one of the following states:

WAITING: Node is not part of the routing tree; it is waiting for invitation messages; data packets are not accepted.
BROADCAST: Node is part of the routing tree; it sends invitations to join the routing tree.
CONNECTED: Node is part of the routing tree; data packets are accepted for further routing.

This phase uses a three-way handshake protocol to establish bidirectional links between nodes. Initially all nodes are in state WAITING. In the first round, the sink commences the procedure by changing to state BROADCAST. A node in this state broadcasts an invitation message and changes into state CONNECTED. Nodes in state WAITING that receive invitations store the identifiers of the inviting nodes along with the quality of the link. In the following round, they broadcast a parent-request message along with the identifiers of the nodes from which they received an invitation. Nodes in state CONNECTED that receive such a request message including their own identifier may accept this node as a child. Each node accepts only a limited number of children; this is done in recognition of the limited memory resources and in order to store the list of all children of a node in a single packet. In the following round the parent node sends a parent-acknowledge message including the identifiers of those nodes accepted as children. Nodes receiving this message including their own identifier accept this node as a parent node. A node that does not receive a parent-request regards itself as a leaf. At the end of the current round the node selects the node with the best quality as its parent and selects up to nsf secondary parents (according to signal strengths or a link quality metric provided by the link layer). Then it changes its state to BROADCAST and sends an invitation in the next round. Eventually all nodes reachable from the sink are in state CONNECTED. Altogether it takes 3d rounds to build the routing tree, d being the depth of the established routing tree. Nodes remain in state WAITING only for a fixed span of time; if they are not able to join the routing tree during this time, they switch off their receiver and change into deep sleep mode. They do not participate in the next phase of data forwarding. They will get a new opportunity in the next round.

[Fig. 3, timeline: phases Tree Construction, Synchronization, Data Forwarding, and Sleeping; below, the packet buffer alternately filling and emptying.]

Fig. 3. A sequence of phases: one tree construction phase and two synchronization phases, each followed by a data-forwarding and a sleeping phase. The lower part shows the fill level of the EEPROM.

G. Data Forwarding

In principle, a node may start taking measurements at any time before the routing tree is built. As soon as a node changes into state CONNECTED it starts sending data packets via its parent to the sink (as soon as a packet is available). Upon sending a packet, the data is read byte-wise from the packet buffer in the EEPROM into working memory, from where the transceiver can access it. After sending the packet to the primary parent, the node waits for an acknowledgment. If an acknowledgment is received by the end of this slot, the node marks this packet as sent and moves the head pointer of the ring buffer to the next packet during the next processing slot. Otherwise, it increases the number of failed transmission attempts. The node tries to send a packet via its primary parent nr times. If all these attempts fail, the node discards the primary parent, selects the next best secondary parent as its new primary parent, and repeats this step. If no secondary parent is available, the node stops sending packets and changes into the sleeping state.

Since a parent keeps on receiving packets, the section of the packet buffer reserved for foreign packets will eventually be full. Once the buffer is full, the node answers each received packet with a negative acknowledgment to the corresponding child. After all children know that this node is no longer able to receive packets, the node changes into the sleeping state (but keeps recording measurements of its own sensor until the packet buffer is full). Upon receiving a negative acknowledgment, a child removes the sender of this message from its list of parents. In case this was the primary parent, a new primary parent is selected from the list of secondary parents. If this leaves a node without a primary parent, this node also behaves as described above and changes into the sleeping state.

If a leaf node has only one more packet to send, it sets a particular flag in this packet to indicate to its parent that it will send no further packets. In future rounds the parent node will not turn on its receiver during the slot of this child. Upon receiving the corresponding acknowledgment message, the node changes into the sleeping state. If a node has been informed by all its children that they will send no more packets, or if a node does not receive any packet during three consecutive rounds, it also changes into the sleeping state.
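The retransmission and parent-fallback rules of the data-forwarding phase can be sketched as a small state holder. This is an illustration under stated assumptions, not the authors' code: the names `Reply`, `ParentList`, and `on_reply` are hypothetical, the default of three attempts for nr is an arbitrary example value, and resetting the failure counter after a successful transmission is an assumption the text does not spell out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Reply(Enum):
    ACK = "ack"          # parent stored the packet
    NACK = "nack"        # parent's section for foreign packets is full
    TIMEOUT = "timeout"  # nothing received by the end of the slot


@dataclass
class ParentList:
    primary: Optional[int]
    secondary: List[int] = field(default_factory=list)  # ordered, best first
    max_attempts: int = 3   # n_r in the text; the value 3 is an assumption
    failed: int = 0         # failed attempts via the current primary parent

    def _promote(self) -> None:
        """Drop the primary parent and promote the best secondary parent."""
        self.primary = self.secondary.pop(0) if self.secondary else None
        self.failed = 0

    def on_reply(self, reply: Reply) -> str:
        """Process the outcome of one slot; return 'sent', 'retry', or 'sleep'."""
        if self.primary is None:
            return "sleep"
        if reply is Reply.ACK:
            self.failed = 0          # assumption: counter is per packet
            return "sent"
        if reply is Reply.NACK:
            self._promote()          # parent refuses packets: remove it at once
        else:
            self.failed += 1
            if self.failed >= self.max_attempts:
                self._promote()      # all n_r attempts failed
        return "retry" if self.primary is not None else "sleep"
```

A caller would mark the packet as sent on `'sent'`, resend the same packet in its next slot on `'retry'`, and enter the sleeping state on `'sleep'`.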

H. Congestion Detection

The algorithm must be prepared to react in case all available frames of the packet buffer are taken. To fulfill requirement (B) the packet buffer is logically divided into two sections: one section is reserved for packets with measurements of the node itself; the remaining frames can hold packets of other nodes. Each node determines the number of frames reserved for each section individually. Denote by sts(v) the number of nodes in the subtree of the routing tree with root v. If the packet buffer can hold f frames in total, then

$$ f_{local}(v) = f / sts(v) \tag{6} $$

frames are reserved for data of the node itself; the remaining frames are used to store packets in transit towards the sink. This leads to a ratio between the sizes of the two sections of 1/(sts(v) − 1), i.e., the ratio is determined by the position of the node in the routing tree. For leaf nodes, flocal(v) = f, i.e., all frames can be used to store local data. Note that the division of the packet buffer into two sections is not enforced by a physical separation; it is controlled by monitoring the number of packets inserted and removed. This simplifies the implementation of the buffer considerably. Since the exact value of sts(v) is not known, an estimate stse(v) is introduced. Each node v includes in the packets with its own measurements the value

$$ sts_e(v) = 1 + \sum_{v_i \in Succ(v)} sts_e(v_i) \tag{7} $$

where Succ(v) is the set of neighbors of v sending their packets to v (stse(v) = 1 for a leaf node v). Each node keeps the numbers stse(vi) in its neighbor list; if a node does not receive a packet from a node vi for some fixed time, it sets stse(vi) to 0 and recalculates flocal(v).

When a new packet arrives at a node v, the packet is inserted at the tail of the packet buffer and the values of stse(v) and flocal(v) are recalculated. If the packet buffer now contains f − flocal(v) foreign data packets, the node can no longer accept new packets. The node changes its state to the sleeping state and behaves as described above. If the packet buffer contains flocal(v) data packets with measurements of the node itself, the node stops taking new samples. When a packet from the packet buffer is sent later, the node may resume taking samples. Nevertheless, if the node can no longer transmit packets, it will eventually change to the sleeping state. A frame holding a packet that has not been sent is never overwritten; this avoids unnecessary writing to the EEPROM.

Note that in case of congestion, back-pressure messages are not propagated upstream toward the source, as is done for example in the CODA system [15]. The sink monitors incoming packets and can detect if the number of nodes sending packets decreases dramatically. In this case, the sink could initiate a rebuild of the routing tree simply by starting a tree construction phase the next time a synchronization phase is scheduled.

I. Wear-Leveling

Each location of the EEPROM memory can endure a maximum number of writes (between 10,000 and 1,000,000). Repeated writes to the same location will exhaust the lifetime of the EEPROM (i.e., storage becomes unreliable). It is important to ensure that writes are evenly distributed, a process called wear-leveling. A ring buffer is well suited for this purpose, since all modifications are done at either end of the ring. This provides even wear-leveling, as the frames that are written move sequentially through the EEPROM, continuously wrapping around the specified storage area. To provide wear-leveling for the pointer to the head of the ring buffer, another ring buffer, called the head buffer, is introduced. This buffer has the same number of components, and each component can store a pointer to a component of the first buffer. The head pointers of both ring buffers point to the same position in their corresponding ring buffers. All components of the head buffer are initialized with 0. When writing to a component of the packet buffer, the corresponding component of the head buffer is also updated: the value written to this component is the value written to the previous component incremented by 1 modulo c, the total number of components. After a crash followed by a reset, the components of the head buffer are read sequentially. The head pointer points to the component at which the values of the current and the next component differ by more than 1. In the experiment the ring buffer can store up to 1600 packets, thus a width of 2 bytes for a component of the head buffer suffices. Since the algorithm maintains three pointers into the packet buffer, three ring buffers are needed. A solution for wear-leveling in case of writing records of different length to flash memory is presented in [7].

J. Eliminating Duplicates

The algorithm does not prevent different nodes from holding a copy of the same packet. For example, if a packet is transmitted successfully but the acknowledgment for this packet is lost, the node will send the packet again (possibly to a different node). To avoid unnecessary forwarding of packets and hence to save energy, duplicates are eliminated at the link and at the network layer. A packet is uniquely identified by the source node identifier and the time stamp; this combination is called the packet identifier. Packet identifiers are mapped to a one-byte value using a hash function; the resulting values are called hashed identifiers. These enable the sink to discard duplicates. To save memory, bandwidth, and energy, duplicates should be eliminated in the network if possible. To this end, each node stores the hashed identifiers of the last L packets seen in a ring buffer in RAM, needing L + 1 bytes altogether. If the identifier of a newly received packet is found in this list, the packet is discarded immediately.
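The in-network duplicate check can be sketched as follows. The text does not name the hash function, so folding a CRC-32 to one byte is an assumption made here, as are the field widths (2 bytes for the node identifier, 4 bytes for the time stamp) and the names `hashed_identifier` and `DuplicateFilter`. A bounded `deque` stands in for the ring buffer in RAM.

```python
import zlib
from collections import deque


def hashed_identifier(source_id: int, timestamp: int) -> int:
    """Map a packet identifier (source node, time stamp) to a one-byte value.

    ASSUMPTION: CRC-32 folded to 8 bits; the paper does not specify the hash.
    """
    key = source_id.to_bytes(2, "big") + timestamp.to_bytes(4, "big")
    return zlib.crc32(key) & 0xFF


class DuplicateFilter:
    """Remembers the hashed identifiers of the last `capacity` packets seen."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which mirrors overwriting the oldest slot of a ring buffer.
        self._recent = deque(maxlen=capacity)

    def is_duplicate(self, source_id: int, timestamp: int) -> bool:
        """Return True if the packet should be discarded; otherwise record it."""
        h = hashed_identifier(source_id, timestamp)
        if h in self._recent:
            return True
        self._recent.append(h)
        return False

    def clear(self) -> None:
        """Forget all identifiers, e.g. when a new tree is constructed."""
        self._recent.clear()
```

Note that with a one-byte hash two different packets can map to the same hashed identifier; the sketch makes no attempt to resolve such collisions.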

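Returning to the head buffer used for wear-leveling of the ring buffer pointer, the recovery scan after a reset can be sketched as below. This is an illustration under assumptions, not the authors' code: the class `HeadBuffer` and its methods are hypothetical, a Python list stands in for the EEPROM, and the counter is assumed to run over the full 2-byte range rather than modulo c. The condition "difference bigger than 1" is read here as "the next value is not the successor of the current one".

```python
SEQ_MOD = 1 << 16  # assumed counter range: 2-byte components


class HeadBuffer:
    """Sequence numbers stored next to a ring buffer to locate its head."""

    def __init__(self, components):
        assert 0 < components < SEQ_MOD
        self.slots = [0] * components  # stands in for EEPROM, initialized with 0
        self.head = 0                  # next write position (RAM only, lost on reset)

    def advance(self):
        """Record one write to the ring buffer at the current head position."""
        previous = self.slots[(self.head - 1) % len(self.slots)]
        self.slots[self.head] = (previous + 1) % SEQ_MOD
        self.head = (self.head + 1) % len(self.slots)

    def recover(self):
        """Find the next write position from the stored values alone."""
        if not any(self.slots):
            return 0  # never written: start at the beginning
        count = len(self.slots)
        for i in range(count):
            successor = (self.slots[i] + 1) % SEQ_MOD
            if self.slots[(i + 1) % count] != successor:
                return (i + 1) % count  # component i was written last
        return 0  # not reached while components < SEQ_MOD
```

Each ring buffer write costs exactly one head buffer write at the matching position, so both buffers wear at the same rate, and `recover()` needs only one sequential pass over the head buffer.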
TABLE I
SYSTEM SETTINGS FOR DIFFERENT PHASES OF THE PROTOCOL

Mode             TDMA Slot   TXPower   LPM   Serial   Transceiver   ADC   Timer [Hz]   Current [mA]
Waiting          -           -         1     off      on            on    1024         6.31
Data forwarding  receiving   -         1     off      on            on    1024         6.31
Data forwarding  sending     70        1     off      on            on    1024         4.05
Data forwarding  sending     80        1     off      on            on    1024         4.32
Data forwarding  sending     90        1     off      on            on    1024         5.02
Data forwarding  sending     100       1     off      on            on    1024         7.62
Data forwarding  inactive    -         3     off      off           off   20           0.38
Sleeping         -           -         3     off      off           off   20           0.38
Since the routing tree is recomputed periodically, it is possible that a parent-child relation in one phase becomes a child-parent relation in the following phase. This can lead to a packet being routed twice through the same node, which the duplicate list would then wrongly discard. For this reason, the duplicate lists are cleared at the beginning of each tree construction phase.

V. PRELIMINARY IMPLEMENTATION

In order to provide a proof of concept for the proposed reliable data gathering service and to verify the validity of the derived theoretical limits, a prototype of the service was implemented on a real sensor network consisting of ESB and ECR nodes of the ScatterWeb platform developed at the FU Berlin [16]. Each node was equipped with 2 KB RAM and 64 KB EEPROM, the TI MSP430 controller, and a wireless communication device working at 19.2 kbit/s. The experiment was conducted with 15 nodes and implemented an office monitoring system. The sink was connected to a PC, and all data received by the sink was stored in a database. This simplified the analysis of the data and allowed remote monitoring of the application. In order to create a communication bottleneck, the sink was in the transmission range of a single node only. All other nodes were geographically distributed such that the communication graph constituted a binary tree.

The payload Pp of a packet is 26 bytes, leading to a packet size of 48 bytes: in addition to the payload, 10 bytes application header, 10 bytes firmware header, and 2 bytes CRC. Manchester coding and pre/postamble led to a total of Ptot = 111 bytes. The packet buffer in EEPROM allowed the storage of 1,600 packets. According to Equation (1), a single time slot should have a duration of at least 89 ms (assuming γ = 0.5). To deal with issues such as clock drift, Ts was given a value of 120 ms (see also Section IV-D). To simplify the implementation, the TDMA protocol used a fixed slot allocation with 15 communication slots and one processing slot of length 200 ms.
This yielded Tr = 2 s.

The system was tested with different sampling rates. In the first experiment a temperature value is measured every 5 seconds; each value occupies 2 bytes. This leads to a sampling rate s of 3.2 bit/s. Together with a relative time stamp of 4 bytes, 11 values fit into a single packet. Every 55 seconds a packet is placed into the packet buffer in the EEPROM. Every 3 hours a new tree is constructed; in between, after 1 and 2 hours, a synchronization phase is initiated.

To analyze the dynamics of the routing tree, and in particular the usage of the ring buffer, the application generates packets with debugging information. After each tree construction phase each node generates one packet with the list of all neighbors including link quality values. A second packet with internal status information (e. g., battery level) is produced afterwards. To examine the flow control mechanism, another packet with the status of the buffer is sent to the sink every 75 EEPROM operations (i. e., packet insertions and deletions in the packet buffer).

To fulfill requirement (C), i. e., to prolong the running time of the overall application, all possibilities to save energy were considered and implemented if possible. First of all, the serial line and all additional sensors of the ESB nodes were deactivated all the time. The temperature sensor was activated only for measuring. During the tree construction phase the transceiver needs to be activated all the time; this results in a current draw of up to 6.31 mA. During the following phase, the data-forwarding phase, a node only needs to have its transceiver activated when the node itself or one of its children is sending. In the latter case, the node is waiting to receive a packet. During all other time slots (including the processing slot) and the sleeping phase, nodes switch off their transceiver and the AD converter. At 0.38 mA, the current draw is much lower during this time. Obviously the consumption increases with the number of packets received and sent, thus nodes near the sink consume more power than those lower in the routing tree.
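The packet, round, and sampling figures quoted earlier in this section follow from simple arithmetic, which the sketch below reproduces. The constants are taken from the text; the names and helper functions are illustrative only. Debugging packets are not included in the hourly rate.

```python
# Packet layout in bytes, as given in the text.
PAYLOAD_BYTES = 26
APP_HEADER_BYTES = 10
FIRMWARE_HEADER_BYTES = 10
CRC_BYTES = 2
PACKET_BYTES = PAYLOAD_BYTES + APP_HEADER_BYTES + FIRMWARE_HEADER_BYTES + CRC_BYTES

# TDMA round: 15 communication slots of 120 ms plus one 200 ms processing slot.
SLOT_MS = 120
COMM_SLOTS = 15
PROCESSING_SLOT_MS = 200
ROUND_MS = COMM_SLOTS * SLOT_MS + PROCESSING_SLOT_MS

# Payload contents: one 4-byte relative time stamp plus 2-byte samples.
TIMESTAMP_BYTES = 4
VALUE_BYTES = 2
VALUES_PER_PACKET = (PAYLOAD_BYTES - TIMESTAMP_BYTES) // VALUE_BYTES


def sampling_rate_bit_s(interval_s):
    """Sampling rate s in bit/s for one 2-byte value per interval."""
    return 8 * VALUE_BYTES / interval_s


def packet_interval_s(interval_s):
    """Seconds needed to fill one packet with samples."""
    return VALUES_PER_PACKET * interval_s


def packets_per_hour(interval_s):
    """Data packets generated per node and hour."""
    return 3600 / packet_interval_s(interval_s)


if __name__ == "__main__":
    print(PACKET_BYTES, ROUND_MS, VALUES_PER_PACKET)
    for interval in (2, 5):
        print(interval, packet_interval_s(interval), packets_per_hour(interval))
```

For the 5 s interval this gives a packet every 55 s; for a 2 s interval a packet is filled every 22 s, so the load on the buffer grows by a factor of 2.5.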
One month after the deployment of the system with several sampling intervals, the debugging information revealed that the nodes spent 94.5% of the time in the low power mode when configured to use a sampling interval of 5 seconds. No packets were lost during this time. Table I gives an overview of the settings of the system in the different phases of the protocol and the corresponding current consumption. Using the data shown in Table I, the application requires about 0.055 × 6.31 + 0.945 × 0.38 ≈ 0.49 mA. Standard AA batteries have a capacity of 2200 to 3000 mAh on average. Thus, this would lead to an upper bound of 186 to 254 days.
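The lifetime estimate combines a duty-cycle-weighted average current with the battery capacity. A small sketch of that calculation follows; the function names are illustrative, and effects such as battery self-discharge are ignored, which is why the result is only an upper bound. Evaluated as written, the weighted sum comes to roughly 0.71 mA, whereas the stated 0.49 mA is the value that reproduces the quoted 186 to 254 days (to within a day). Which input accounts for the difference cannot be recovered from the text, so the sketch simply exposes both quantities.

```python
def average_current_ma(active_fraction, active_ma, sleep_ma):
    """Duty-cycle-weighted mean current in mA."""
    return active_fraction * active_ma + (1.0 - active_fraction) * sleep_ma


def lifetime_days(capacity_mah, current_ma):
    """Upper bound on the running time in days for a given mean current."""
    return capacity_mah / current_ma / 24.0


if __name__ == "__main__":
    # 5.5% of the time at 6.31 mA, 94.5% at 0.38 mA (Table I values).
    weighted = average_current_ma(0.055, 6.31, 0.38)
    for current in (weighted, 0.49):
        low = lifetime_days(2200, current)
        high = lifetime_days(3000, current)
        print(f"{current:.2f} mA: {low:.0f} to {high:.0f} days")
```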

[Figure 4: two panels, (a) sampling interval 2 s and (b) sampling interval 5 s. Vertical axis: number of packets stored in EEPROM (0 to 1600). Horizontal axis: time of day (03:00 to 21:00). Curves for nodes 257, 261, and 264.]

Fig. 4. The course of the EEPROM fill level for two different sampling intervals over 24 h.

With a sampling interval of 2 seconds, the application would still run for 152 to 207 days.

The experiment used a TDMA scheme with a one-to-one relationship between nodes and slots. In more elaborate schemes more than one node may use the same slot. This would result in fewer rounds during data forwarding and hence a smaller delay at the sink, but would require a more complex algorithm. The number of slots during which an individual node needs to send or receive data would be unchanged. However, since such an algorithm requires more communication (i. e., control messages), the overall energy consumption would increase.

Figure 4 shows the course of the EEPROM fill level over 24 h for sampling intervals of 2 and 5 seconds, respectively. Leaf nodes, such as node 264, have no difficulty sending their EEPROM content within one data-forwarding phase. Even after a temporary disconnection from the tree, see (a) between 3 pm and 8 pm, node 264 is able to empty the EEPROM within one data-forwarding phase. A sampling interval of 2 s (resp. 5 s) results in nearly 164 (resp. 65) packets per hour. The application is able to handle a sampling interval of 5 s. Figure 4 (b) depicts some characteristic nodes: a node just 1 hop from the sink (node 257) and a node having a bad connection to its parent (node 261). Shortening the sampling interval to 2 s leads to EEPROM congestion, see Figure 4 (a).

VI. DISCUSSION

There are several open issues that have not been addressed in this work. Introducing several sinks is a possibility to allow for applications with high sampling rates. This requires a more elaborate routing scheme in order to achieve higher network utilization. If the network experiences very high failure rates over a longer period of time, the data delivery ratio will decrease. The sink will notice that the network is oversaturated and could react by requesting a lower data sampling rate.
For this purpose, the sink could estimate an advisable sampling rate and include this information in the invitation messages during the next tree-rebuilding phase, and nodes would adjust the sampling rate accordingly (i. e., a kind of closed-loop flow control). In case the local storage capacity is exhausted, application-specific countermeasures could be initiated. A possible solution is to free memory by removing every other data sample, thereby guaranteeing even wear-leveling.

The results of the experiments indicate that a TDMA schedule with a constant number of time slots is a limiting factor, and that the assignment of slots must consider the depth of a node in the routing tree. Buffering packets in memory is absolutely necessary in harsh environments to cope with complete communication failures over longer periods of time (e. g., networks in tidal environments). In such applications, predicting the load of a channel is extremely important, because the queued data can only be transmitted while communication is available.

ACKNOWLEDGMENT

The authors would like to thank Stefan Unterschütz for his help during the implementation and maintenance of the application.

REFERENCES

[1] S. Kim, R. Fonseca, and D. Culler, “Reliable Transfer on Wireless Sensor Networks,” in 1st IEEE Int. Conf. on Sensor and Ad hoc Communications and Networks, Santa Clara, USA, 2004.
[2] C.-Y. Wan, A. T. Campbell, and L. Krishnamurthy, “PSFQ: A Reliable Transport Protocol for Wireless Sensor Networks,” in 1st ACM Int. Workshop on Wireless Sensor Networks and Applications, USA, 2002.
[3] F. Stann and J. Heidemann, “RMST: Reliable Data Transport in Sensor Networks,” in 1st Int. Workshop on Sensor Net Protocols and Applications, Anchorage, USA, 2003.
[4] S.-J. Park, R. Vedantham, R. Sivakumar, and I. F. Akyildiz, “A Scalable Approach for Reliable Downstream Data Delivery in Wireless Sensor Networks,” in 5th Int. Symp. on Mobile Ad Hoc Networking and Computing, Tokyo, Japan, 2004.
[5] Y. Sankarasubramaniam, O. B. Akan, and I. F. Akyildiz, “ESRT: Event-to-Sink Reliable Transport in Wireless Sensor Networks,” in 4th Int. Symp. on Mobile Ad Hoc Networking and Computing, USA, 2003.
[6] S. Nedenshi and R. Patra, “DTNLite: A Reliable Data Transfer Architecture for Sensor Networks,” in 8th Int. Conf. on Intelligent Engineering Systems, Cluj-Napoca, Romania, 2004.
[7] D. Estrin, “Reliability and Storage in Sensor Networks,” CENS, Technical Report 59, 2005.
[8] A. Willig and H. Karl, “Data transport reliability in wireless sensor networks – a survey of issues and solutions,” Praxis der Informationsverarbeitung und Kommunikation, vol. 28, 2005.
[9] W. Ye and J. Heidemann, “Medium Access Control in Wireless Sensor Networks,” USC/ISI, Technical Report ISI-TR-580, 2003.
[10] R. Cardell-Oliver, “ROPE: A Reactive, Opportunistic Protocol for Environment Monitoring Sensor Networks,” in The Second IEEE Workshop on Embedded Networked Sensors (EmNetS-I), San Francisco, USA, May 2005, pp. 63–70.
[11] J. Chou, D. Petrovic, and K. Ramchandran, “A Distributed and Adaptive Signal Processing Approach to Reducing Energy Consumption in Sensor Networks,” in 22nd Conf. of the IEEE Computer and Communications Societies, San Francisco, USA, 2003.
[12] D. Marco, E. J. Duarte-Melo, M. Liu, and D. L. Neuhoff, “The Many-to-One Transport Capacity of a Dense Wireless Sensor Network and the Compressibility of its Data,” in 2nd Int. Workshop on Information Processing in Sensor Networks, Palo Alto, USA, 2003.
[13] E.-O. Blass, L. Tiede, and M. Zitterbart, “An Energy-Efficient and Reliable Mechanism for Data Transport in Wireless Sensor Networks,” in Third International Conference on Networked Sensing Systems (INSS), Chicago, USA, May 2006, pp. 211–216.
[14] E. J. Duarte-Melo and M. Liu, “Data-gathering wireless sensor networks: Organization and capacity,” EECS Department, University of Michigan, Technical Report CSPL-333, 2002.
[15] C.-Y. Wan, S. B. Eisenman, and A. T. Campbell, “CODA: Congestion Detection and Avoidance in Sensor Networks,” in 1st Int. Conf. on Embedded Networked Sensor Systems, New York, USA, 2003.
[16] ScatterWeb, http://www.scatterweb.net, 2006.