Distributed time management in Wireless Sensor Networks

Tomasz Surmacz¹, Bartosz Wojciechowski¹, Maciej Nikodem¹, and Mariusz Słabicki²

¹ Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, Poland
² Institute of Theoretical and Applied Informatics of Polish Academy of Sciences, Gliwice, Poland
Abstract. Keeping time synchronization between nodes in wireless sensor networks (WSNs) is an important task, allowing reduction of power consumption and better bandwidth usage due to collision avoidance. In this article we discuss requirements and practical limitations of time synchronization and present an evaluation of a simple algorithm that maintains a common clock between WSN nodes and persists even across repeated node restarts, as long as network connectivity is maintained. We also provide measurements and a temperature-dependent model of clock drift in popular TelosB WSN nodes.
1 Introduction
A common requirement for Wireless Sensor Network (WSN) nodes is low cost and the ability to operate for a long time without an external power source. This allows setting up networks consisting of tens or hundreds of elements that can operate unattended for extensive periods of time, thus reducing maintenance costs. For this reason WSN nodes are usually equipped with low-cost and low-power components and do not have any modules that can be considered superfluous, such as Real-Time Clocks (RTC) and high-quality oscillators. This makes synchronization and time keeping in WSN nodes a challenge. Agreeing on some common time in WSN nodes is needed for synchronous data transmission (TDMA modes), asynchronous sleep modes, collision avoidance, and data time-stamping, just to name a few uses.

There are two closely related topics: synchronization of the nodes and keeping real time. Most applications require only local time synchronization, i.e. between nodes in close vicinity of each other. This allows improving power conservation and communication throughput by synchronizing sleep/communication duty cycling and scheduling transmission times. In most cases the time between taking a measurement at a node and the moment the packet with the measured quantity reaches the Base Station (BS) is short enough to time-stamp the measurement at its destination. However, time-stamping the data at its source may be necessary, for instance if some form of data aggregation is employed. This would allow for delayed transfer of non-time-critical data to reduce the total number of transmissions.

Many characteristics of WSN networks affect time-keeping and synchronization in the nodes. The most important ones are: clock frequency skews (between nodes), time drift (causing offsets), temperature dependence of oscillator frequency, random communication latencies, and network topology variations, which may be caused by routing and network-management algorithms or by the intrinsically low reliability of individual nodes causing failures and forcing changes in data transmission paths. In many low-power or high-throughput applications the amount of communication overhead associated with synchronization may also be a problem.

Most nodes in a typical WSN for environment monitoring operate in similar ambient conditions. This causes their clocks to drift at a similar pace. However, if some nodes' temperature differs from the others by a significant margin, e.g. due to heat build-up from direct sunlight, their clock drift may be much higher. In such cases these nodes may have a negative impact on time synchronization of the whole network. A time synchronization algorithm may use oscillator models to adjust the level of confidence assigned to a given node.

This paper presents an evaluation of a simple yet communication-effective approach to time-keeping and node synchronization targeted at WSNs for environment monitoring. Such networks operate with long sleep periods to conserve energy. Time synchronization is only one of the tasks that the WSN must perform – the main one (in our case) is taking periodic measurements and sending the results to the BS. We focus both on extending the network lifetime and on the reliability of data collection [?, ?].

Contributions of this paper are twofold: 1) evaluation of a simple world-time synchronization algorithm for environment monitoring WSNs using a real-world network deployed in a greenhouse, 2) measurement-based models of temperature-related differences in time drift of popular WSN nodes. Our approach is to focus on collective time keeping in WSNs in cases where nodes may spontaneously reboot due to watchdog conditions caused by reliability requirements and need new time synchronization without any external time source. We show that even simple solutions to synchronization problems are satisfactory in environment monitoring applications while preserving low communication and energy overhead.
2 Related work
Most works focus on improving the precision of time synchronization. This usually comes at the cost of multiple messages that have to be sent. One option is to transmit synchronization data by piggybacking on other types of packets, e.g. standard measurement reports. In most synchronization schemes it is necessary to measure transmission delays between two points. The precision of message delay measurement depends on 2-way communication, which is influenced by send, access, transmission, propagation, reception and receive times and is subject to high uncertainty, even more so in WSNs with multi-hop communication and frequent transmission collisions. The other important factor is the drift of nodes' internal clocks [?].

For reasons such as computational requirements, memory footprint and communication bandwidth, traditional approaches to network synchronization known from TCP/IP networks, such as the Network Time Protocol (NTP) or the IEEE 1588 Precision Time Protocol (PTP), are not suitable for low-communication-intensity, low-power sensor networks. Elson and Römer [?] explain why NTP is ill-suited for sensor networks and suggest various design principles for WSN time synchronization. They suggest adapting methods specific to the application and exploiting domain knowledge. The authors place a few known algorithms in a parameter space with energy, precision, cost, synchronization scope and lifetime as dimensions, but do not present any specific algorithm. In [?] Elson et al. presented Reference-Broadcast Synchronization (RBS), in which nodes send reference broadcasts to their neighbours to remove the sender's nondeterminism from timestamp calculation. The broadcasts are used as reference points for comparing clocks. The authors claim that this method improves precision with respect to two-point sender-receiver schemes. Cho et al. [?] consider using PTP for WSNs, but their approach requires redesigning the hardware of a basic WSN node to contain a WSN-to-Ethernet gateway and is not suitable for typical low-energy WSNs.

Ganeriwal et al. proposed a two-phase hierarchical synchronization protocol called TPSN (Timing-Sync Protocol for Sensor Networks) [?]. In the first phase (level discovery) each node obtains a level, with only one root node having level 0. The root node is usually the BS (or packet sink) of the network. In the synchronization phase each node synchronizes to a node with a lower level. After the second phase of the protocol the network is globally synchronized to the root node. TPSN was implemented and tested on Berkeley's Mica nodes and achieved an average error of less than 20 µs. The authors of [?] presented two lightweight synchronization algorithms called tiny-sync and mini-sync. These are based on the assumption that oscillators have a fixed frequency and that two clocks can be linearly related; then a two-way communication exchange is sufficient to obtain data points (3-tuples) that allow bounding the relative clock drift and offset of any pair of neighbouring nodes. LTS, proposed by Greunen and Rabaey [?], takes a different approach, aiming not at absolute accuracy but at minimizing the overhead associated with synchronizing the nodes to the required precision. This follows from the observation that the required accuracy in WSNs is not very high (on the order of fractions of a second). TSync [?] follows a bidirectional approach where synchronization can be initiated by a central time source (e.g. a node equipped with a GPS receiver) for lightweight global synchronization or pulled on demand by individual sensors. In [?] a lightweight and energy-efficient time synchronization scheme called LEETS is proposed. It can be applied to all kinds of TDMA power-saving MAC schemes. The main objective of the LEETS design was to remove the communication overhead associated with other time synchronization schemes. LEETS operates in two phases, initial time synchronization and synchronization maintenance, and assumes that a root node with a GPS receiver is present and sends the original SYNC packet. Fontanelli and Macii [?] evaluate their master-less local synchronization algorithm by means of simulation. They assume a pessimistic clock rate distribution spanning 10⁻⁴ (100 ppm) in total. These assumptions stand in contradiction to our own experimental evaluation of clock drift in WSN nodes (cf. Sec. 5).

A good introduction to clock synchronization can be found in a survey by Sundararaman et al. [?], where challenges and design principles relating strictly to the WSN domain are presented and several algorithms are compared qualitatively and quantitatively.
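To make the linear-clock assumption behind tiny-sync and mini-sync concrete (our paraphrase, not the notation of the original paper): if node 1 sends a probe at time to (by its own clock), node 2 timestamps it at tb (by node 2's clock) and replies immediately, and node 1 receives the answer at tr, then, assuming the two clocks are related linearly as

    C1(t) = a12 · C2(t) + b12,

each probe yields a data point (to, tb, tr) with the constraints

    to < a12 · tb + b12 < tr.

Collecting several such 3-tuples progressively tightens the admissible region for the relative drift a12 and offset b12; the two algorithms differ in how many of these constraints they retain.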
3 Time synchronization approach
There are two main goals to achieve by using time synchronization: local synchronization within a group of nodes (which can span the whole network) and synchronization with real-world time. Local synchronization provides a means of scheduling sleep and alert modes at the same time at all nodes, so they may communicate efficiently and conserve power. Global network synchronization with universal time is not needed to achieve these goals, but it is desirable for data timestamping. If nodes know the global time, they may locally optimize scheduling of data transmissions by aggregating several measurements over time and reducing the overall number of transmitted packets.

We have designed and implemented an algorithm (shown in Fig. 1) which maintains the global time throughout the network with low message overhead. It combines reference broadcasts [?] with peer-to-peer node synchronization to achieve self-correcting behaviour of the WSN. In real implementations time synchronization is only one of the tasks that a WSN node must perform, so the algorithm is shown as a packet handling routine for received packets. Initially, all network nodes start in the unsynchronized state. Each node sends a time synchronization request just after booting, but these remain unanswered, as none of the nodes running so far is able to answer the request. The first synchronization is initiated by a broadcast of a SETTIME message from a network node maintaining the universal time reference.
Fig. 1: Message handling in the implemented synchronization method (flowchart of the packet handling routine: packets not addressed to the node are forwarded; a GETTIME with stratum 255 triggers a SETTIME reply if the node is synchronized; a SETTIME carrying a better stratum sets the local time and is forwarded if needed; other packet types are processed normally)
This is usually a Base Station, but it may also be a wireless node with a Real Time Clock or GPS module attached. Once the message containing the global time reference is propagated throughout the whole network, all nodes change their state to synchronized. From that point the whole network is synchronized and any new node appearing on the network will get its global synchronization from any other node in its vicinity. This will work properly as long as there is at least one synchronized node in the running network. Sync requests are broadcast and retransmitted throughout the network. The distance from the time source, measured in the number of retransmissions of the sync packet, is called the stratum – when setting local time, messages with lower stratum are preferred (stratum 0 meaning the clock source, stratum 1: one hop from the source, and so on). Let us assume a sync request message is sent at time t0 from node X and received at t1 at node Y. Before retransmission, node Y checks its local time and, if the node is in the synchronized state, a response message is sent (at time t2) which is eventually delivered to node X at time t3. The message processing time (t2 − t1) is marginal (and usually constant). The accuracy of the time extracted at X is influenced mostly by the message propagation delay (t3 − t2), which increases with the number of retransmissions. To compensate for that, the time at X is set to t2 + (t3 − t0)/2. Each node maintains its local time by examining a millisecond-resolution timer which is started at system bootup, thus reflecting the node uptime. The time synchronization status is kept in 3 variables: the time sync source, the stratum (i.e. the distance from the clock source), and the millisecond-accuracy difference between the real time and the local uptime clock. Initially, the stratum is set to 255, which means that the real time is unknown. Global time synchronization is acquired by propagating a time setup message (SETTIME) throughout the network. Each node that receives such a message compares the message's stratum and noOfHops fields with its local stratum parameter and, if (message stratum + noOfHops) < local stratum, it stores the difference between the received and local time. If the time setting message was a broadcast message, it is resent to other nodes (and noOfHops increases on each retransmission). The Base Station (or any node) may request another node's time info by sending a SETTIME message with the stratum parameter set to 255. The responding node replies with a GETTIME message (using the same message layout) containing its own source reference, stratum and freshly calculated real time. A GETTIME with stratum equal to 255 heard at other nodes triggers sending a SETTIME with local time info, which allows for automatic self-correction of "lost in time" nodes – i.e. nodes that overhear an "unsynchronized" response automatically respond with their own sync data. The algorithm has been implemented and tested in TinyOS running on TelosB and XM1000 motes.
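A minimal sketch of the packet handling routine described above is given below (plain C; the structure, field and function names are illustrative and do not reflect the actual TinyOS/nesC implementation):

#include <stdint.h>

#define STRATUM_UNSYNC 255
enum { GETTIME, SETTIME };

typedef struct {
    uint8_t  type;       /* GETTIME or SETTIME */
    uint8_t  stratum;    /* sender's distance from the time source */
    uint8_t  noOfHops;   /* number of retransmissions so far */
    uint32_t rtc_ms;     /* sender's view of the real time [ms] */
} sync_msg_t;

static uint8_t local_stratum = STRATUM_UNSYNC;  /* 255 = real time unknown */
static int32_t rtc_offset_ms = 0;               /* real time minus local uptime */

/* Platform services, stubbed here only to keep the sketch self-contained. */
static uint32_t uptime_ms(void) { return 0; }                 /* ms timer from boot */
static void send_broadcast(const sync_msg_t *m) { (void)m; }  /* radio broadcast */

/* Broadcast our current notion of the real time. */
static void send_settime(void)
{
    sync_msg_t m = { SETTIME, local_stratum, 0, uptime_ms() + rtc_offset_ms };
    send_broadcast(&m);
}

/* Handle a received synchronization message (cf. Fig. 1). */
void handle_sync_msg(const sync_msg_t *m)
{
    if (m->type == GETTIME && m->stratum == STRATUM_UNSYNC) {
        /* an unsynchronized node asks for the time: reply if we know it */
        if (local_stratum != STRATUM_UNSYNC)
            send_settime();
    } else if (m->type == SETTIME) {
        /* accept the time only if it comes from a closer (better) source */
        if ((uint16_t)m->stratum + m->noOfHops < local_stratum) {
            local_stratum = (uint8_t)(m->stratum + m->noOfHops);
            rtc_offset_ms = (int32_t)(m->rtc_ms - uptime_ms());
        }
        /* broadcast SETTIME messages are re-sent with noOfHops + 1 (omitted) */
    }
}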
4 Test cases
The algorithm has been tested in a real-life network set up for a greenhouse environment monitoring application. Our test network is based on popular TelosB motes programmed with TinyOS. For the sake of simplicity and reliability a multi-layered protocol is used that can either build a routing tree between WSN nodes or use message flooding with broadcasts. Each message is identified by a source node and a sequence number, which are preserved when the message is retransmitted. All nodes keep track of the (source, seq) pairs of messages seen from their neighbours, so that transmission loops and retransmission of duplicates are avoided (a sketch of this bookkeeping follows below).
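A minimal sketch of that duplicate suppression, with assumed data structures (the cache size and names are illustrative, not taken from the actual implementation):

#include <stdbool.h>
#include <stdint.h>

#define SEEN_CACHE_SIZE 32            /* illustrative history length */

typedef struct {
    uint16_t source;                  /* originating node id */
    uint16_t seq;                     /* sequence number assigned by the source */
} msg_id_t;

static msg_id_t seen[SEEN_CACHE_SIZE];
static uint8_t  seen_next = 0;

/* Returns true if this (source, seq) pair was already seen; otherwise records
 * it in a small ring buffer and returns false, so the message can be
 * processed and forwarded without creating loops or duplicates. */
bool is_duplicate(uint16_t source, uint16_t seq)
{
    for (uint8_t i = 0; i < SEEN_CACHE_SIZE; i++)
        if (seen[i].source == source && seen[i].seq == seq)
            return true;

    seen[seen_next].source = source;
    seen[seen_next].seq    = seq;
    seen_next = (uint8_t)((seen_next + 1) % SEEN_CACHE_SIZE);
    return false;
}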
We use the standard Low Power Listening mechanism to prolong nodes' lifetime through radio duty-cycling. Eleven wireless nodes and two Base Stations have been placed in total throughout 3 greenhouse buildings. Both Base Stations were used for data gathering during the experiment and only one initial synchronization message was sent at the beginning. After that, all nodes used only their own clocks to maintain the global time reference. The data shown below describes the experiment that started on November 13th, 2013 and lasted until Nov 28th (over 2 weeks). As can be seen in Fig. 2, during this test the nodes steadily accumulated a clock drift of 2 to 3 seconds relative to universal time, which increased to 5-7 s in the last 2 days, when network connectivity deteriorated rapidly. The relative difference between all nodes was kept within fractions of a second. For the clarity of the plots not all nodes are shown, but the omitted ones exhibit the same data patterns. Nodes were programmed with watchdog features that rebooted a node whenever any malfunction was detected, such as a stalled timer, buffer overrun, etc. In normal operation such reboots would occur much less frequently, but for the sake of the timekeeping experiments this was the desired situation, allowing us to test the algorithm's performance in extreme conditions. The increasing value of stratum in all nodes indicates repeated node reboots and a slowly deteriorating base time reference. Fig. 3 shows a sample message exchange in the timesync protocol. First, node 79 reboots and sends its time request message (seq = 1; see line 1 of the listing).
Fig. 2: Stratum and time drift experiment data (time difference from universal time [s] and stratum/10 for nodes 12, 25, 31 and 49, 14.11–28.11)
1  2013-11-13 12:22:30.2638, from=79, to=65535, via=79, seq=1,  hops=1, msg_t=GETTIME, stratum=255, rtc=1970-01-01 01:00:00.331000, rtc_s=0.331,
2  2013-11-13 12:22:30.2812, from=32, to=65535, via=32, seq=56, hops=1, msg_t=SETTIME, stratum=6,   rtc=2013-11-13 12:22:30.272, rtc_s=1384341750.272,
3  2013-11-13 12:22:30.3425, from=31, to=65535, via=31, seq=29, hops=1, msg_t=SETTIME, stratum=5,   rtc=2013-11-13 12:22:31.466, rtc_s=1384341751.466,
4  2013-11-13 12:22:30.3595, from=31, to=65535, via=82, seq=29, hops=2, msg_t=SETTIME, stratum=5,   rtc=2013-11-13 12:22:31.466, rtc_s=1384341751.466,
5  2013-11-13 12:22:30.3932, from=31, to=65535, via=5,  seq=29, hops=2, msg_t=SETTIME, stratum=5,   rtc=2013-11-13 12:22:31.466, rtc_s=1384341751.466,
6  2013-11-13 12:22:30.3934, from=31, to=65535, via=49, seq=29, hops=2, msg_t=SETTIME, stratum=5,   rtc=2013-11-13 12:22:31.466, rtc_s=1384341751.466,
7  2013-11-13 12:22:30.4192, from=31, to=65535, via=25, seq=29, hops=2, msg_t=SETTIME, stratum=5,   rtc=2013-11-13 12:22:31.466, rtc_s=1384341751.466,
8  ...
9  2013-11-13 12:26:19.1149, from=22, to=65535, via=22, seq=1,  hops=1, msg_t=GETTIME, stratum=255, rtc=1970-01-01 01:00:00.337, rtc_s=0.337,
10 2013-11-13 12:26:19.2269, from=79, to=65535, via=79, seq=6,  hops=1, msg_t=SETTIME, stratum=6,   rtc=2013-11-13 12:26:19.144, rtc_s=1384341979.144,

Fig. 3: Sample time sync exchange between nodes
Then node 32 responds with its own stratum-6 data (line 2) and node 31 with stratum 5 (line 3), which is also received later as retransmitted copies (lines 4-7). Later on, when node 22 reboots (line 9), node 79 is one of the nodes responding with its own time (line 10), referenced with stratum 6. This means that after the initial setup from node 32 (to stratum 7) it readjusted its time info to stratum 6 after receiving the stratum-5 message shown in line 3. Further messages did not improve the time sync data, as their stratum + hops values exceeded the already set value. Although the final result of 5 seconds of time drift over 2 weeks may not look astonishing, it corresponds to a precision of 3.9·10⁻⁶ for collective network timekeeping without any external time reference (except the initial time setting). This has to be compared with the quality of the crystal oscillators available in WSN nodes. To keep WSN node costs within reasonable bounds, these are just ordinary commercial-grade parts. We have performed several tests to determine the clock drift behaviour of these devices; the results are shown in Fig. 4. In these experiments each node was running a simple program that periodically sent the value of its free-running uptime counter, thus reporting its local time. Packets from all nodes were received directly by a Base Station located nearby (no retransmissions). Each logged packet contained three timestamps: the uptime counter of the originating node, the Base Station's timestamp added as soon as the packet was received over the radio and queued on the serial interface to the NetServ [?] program running on a Raspberry Pi computer, and finally a NetServ timestamp of the packet received over the serial line. The first two are relative to when the particular WSN node was booted, while the last one is an NTP-based accurate global time. Thus, an increasing or decreasing difference between a node's and NetServ's timestamps measures the clock drift of that particular WSN node. Nodes 16 and 47 in the first experiment (Fig. 4, left), showing a negative clock drift, were placed in a cooled environment (around 5 °C), while all other nodes were kept at a room temperature of 22 °C.
Fig. 4: Clock drift comparison of TelosB nodes in 2 experiments (clock drift [s] over time for individual nodes and the BS; left: 09.01–13.01, right: 14.01–18.01)
The second experiment (Fig. 4, right) started with nodes 19 and 42 placed in the cooled environment, but these were replaced with nodes 9 and 20 at the 15.01 16:00 mark on the timescale. The effect of this replacement can easily be seen in the corresponding time drift measurements. The momentary negative drifts visible in Fig. 4 (right) until 17.01 03:00 are the effect of a slightly overloaded BS system (Raspbian Linux on a Raspberry Pi); they ended when the system was rebooted after a mains power failure. As can be seen, the drift rate of all nodes falls around 1 s/day (i.e. 10⁻⁵ clock precision), but it is also node-specific and depends heavily on ambient temperature [?]. In order to achieve better timekeeping precision of WSN nodes, two factors would have to be considered: individual node clock calibration and temperature compensation. The first has to be determined individually for each node before network deployment, while the second requires constant temperature monitoring of the node and applying appropriate corrections. One option would be to use TCXOs (Temperature Compensated Crystal Oscillators) instead of SPXOs (Simple Packaged Crystal Oscillators) in the node design, but that would increase the node cost while improving the oscillator accuracy from 10⁻⁵ to 10⁻⁶. If frequent temperature measurements are the core functionality of the WSN, they can be used for time corrections at each node (i.e. microcontroller-driven time compensation), but if the measurements are rare and the nodes spend most of the time in sleep modes to conserve the batteries, the extra wakeups would not be justified.
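For reference, the drift values discussed above can be extracted from pairs of logged records, assuming each record carries the node's uptime counter and the NTP-based NetServ timestamp (a minimal sketch; the function name and the example values in main() are illustrative):

#include <stdio.h>

/* Estimate a node's clock drift rate from two log records; each record holds
 * the node's uptime counter [s] and the NTP-synchronized NetServ time [s].
 * Returns the drift in ppm (positive = the node's clock runs fast). */
static double drift_ppm(double uptime_a, double ntp_a,
                        double uptime_b, double ntp_b)
{
    double node_elapsed = uptime_b - uptime_a;   /* time elapsed by node clock */
    double real_elapsed = ntp_b - ntp_a;         /* real (NTP) time elapsed    */
    return (node_elapsed - real_elapsed) / real_elapsed * 1e6;
}

int main(void)
{
    /* a node gaining about 1 s per day corresponds to roughly 11.6 ppm */
    printf("%.1f ppm\n", drift_ppm(0.0, 0.0, 86401.0, 86400.0));
    return 0;
}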
5 Clock drift measurements
To find out how the clock drift changes with temperature, we measured both quantities in outdoor conditions, during a week when outside temperatures steadily dropped by 15 degrees, from 6 to −9 °C, and in controlled indoor/lab conditions with temperatures reaching 60 °C. For this experiment we used three TelosB nodes (10, 15 and 45) and two XM1000 nodes (52, 53). We averaged the temperature over dt = 900 s windows and plotted the time drift in each window as a function of temperature (see Fig. 5). We then fitted polynomial functions to all the data sets.
Fig. 5: Clock drift of nodes as a function of temperature (drift [ppm] vs. temperature [°C] for nodes 10, 15, 45, 52 and 53)
According to [?] a crystal oscillator's drift can be fitted by a cubic function. Our measurements show that for the microcontroller clock drift a quadratic function is sufficient; using higher-order polynomials did not improve the fit quality substantially (measured by the norm of the residuals). For example, for node 52 (XM1000 architecture) the fitted quadratic and cubic functions are −0.036x² + 1.7x + 16 and −0.000034x³ − 0.033x² + 1.7x + 16, respectively, and for node 45 (TelosB): −0.037x² + 1.7x − 11 and −0.000011x³ − 0.036x² + 1.7x − 11. The drift data shown in Fig. 5 is subject to noise, mainly caused by random delays in interrupt handling in the nodes and by transmission delays due to collisions (a 10 ms drift measurement error over a 900 s measurement period results in an 11 ppm error). However, removing the noise by applying median filtering did not change the relation between operating temperature and the average time drift of the oscillator. For all the nodes the maximal drift change is close to 1.8 ppm per K. TelosB nodes have a drift close to 0 at around 6 °C and 40 °C, while the XM1000 nodes have minimal clock drift around −3 °C and 57 °C.
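If a node measures its own temperature periodically anyway, the fitted quadratic model can be turned into a simple software correction. A minimal sketch using the coefficients quoted above for node 45 (the routine itself is illustrative, not the deployed firmware):

#include <stdio.h>

/* Quadratic drift model fitted for node 45 (TelosB); drift in ppm as a
 * function of temperature T in degrees Celsius. */
static double drift_ppm_model(double temp_c)
{
    return -0.037 * temp_c * temp_c + 1.7 * temp_c - 11.0;
}

/* Correction [ms] to add to the local clock after interval_ms of operation at
 * an approximately constant temperature temp_c. A positive drift means the
 * local clock runs fast, so the correction has the opposite sign. */
static double clock_correction_ms(double temp_c, double interval_ms)
{
    return -drift_ppm_model(temp_c) * interval_ms / 1e6;
}

int main(void)
{
    /* over a 900 s period at 30 deg C the model predicts about +6.7 ppm of
     * drift, i.e. roughly -6 ms of correction */
    printf("%.1f ms\n", clock_correction_ms(30.0, 900.0 * 1000.0));
    return 0;
}

The constant term of such a model would come from the individual node calibration performed before deployment, as discussed in the previous section.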
6 Conclusions
Time synchronization is an important issue in WSNs, as it allows reducing energy consumption by scheduling transmissions and sleep/awake duty cycling. Practical experiments show that even simple synchronization methods, used without an external time source, can keep the global network time drift below 5·10⁻⁶, which is comparable to the accuracy of precision quartz crystal oscillators. For synchronous sleep modes this is acceptable, as the algorithms provide self-adjustment to minor discrepancies. This level of precision is also sufficient for data gathering applications in environment monitoring. We have also measured the characteristics of clock drift as a function of temperature for TelosB and XM1000 nodes. These can be described by simple quadratic functions, so if periodic temperature measurements are taken by WSN nodes in their typical mode of operation, they can be used to calculate appropriate clock drift compensation.

Acknowledgement. This work was supported by National Science Centre grant no. N 516 483740.