Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2007-050
October 31, 2007
Fast Self-Healing Gradients Jacob Beal, Jonathan Bachrach, Dan Vickery, and Mark Tobenkin
Massachusetts Institute of Technology, Cambridge, MA 02139 USA (www.csail.mit.edu)
Fast Self-Healing Gradients Jacob Beal, Jonathan Bachrach, Dan Vickery, Mark Tobenkin MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139
[email protected],
[email protected],
[email protected],
[email protected]
ABSTRACT We present CRF-Gradient, a self-healing gradient algorithm that provably reconfigures in O(diameter) time. Self-healing gradients are a frequently used building block for distributed self-healing systems, but previous algorithms either have a healing rate limited by the shortest link in the network or must rebuild invalid regions from scratch. We have verified CRF-Gradient in simulation and on a network of Mica2 motes. Our approach can also be generalized and applied to create other self-healing calculations, such as cumulative probability fields.
Categories and Subject Descriptors D.1.3 [Programming Techniques]: Concurrent Programming—Distributed programming; C.2.1 [Computer-Communication Networks]: Network Architecture and Design—Wireless communication; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—Geometrical problems and computations
General Terms Algorithms, Reliability, Theory, Experimentation
Keywords Amorphous computing, spatial computing
1. CONTEXT
A common building block for pervasive computing systems is a gradient—a biologically inspired operation in which each device estimates its distance to the closest device designated as a source of the gradient (Figure 1). Gradients are commonly used in systems with multi-hop wireless communication, where the network diameter is likely to be high. Applications include data harvesting (e.g. Directed Diffusion[11]), routing (e.g. GLIDER[9]), distributed control
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC'08 March 16-20, 2008, Fortaleza, Ceará, Brazil Copyright 2008 ACM 978-1-59593-753-7/08/0003 ...$5.00.
Figure 1: A gradient is a scalar field where the value at each point is the shortest distance to a source region (blue). The value of a gradient on a network approximates shortest path distances in the continuous space containing the network.
(e.g. co-fields[14]) and coordinate system formation (e.g. [2]), to name just a few.

In a long-lived system, the set of sources may change over time, as may the set of devices and their distribution through space. It is therefore important that the gradient be able to self-heal, shifting the distance estimates toward the new correct values as the system evolves. Self-healing gradients are subject to the rising value problem, in which local variation in effective message speed leads to a self-healing rate constrained by the shortest neighbor-to-neighbor distance in the network. Previous work on self-healing gradients has either used repeated recalculation or assumed that all devices are the same distance from one another (e.g. measuring distance via hop-counts). Neither of these approaches is usable in large networks with variable distance between devices.

We present an algorithm, CRF-Gradient, that uses a metaphor of "constraint and restoring force" to address these problems. We have proved that CRF-Gradient self-stabilizes in O(diameter) time and verified it experimentally both in simulation and on Mica2 motes. We also show that the constraint and restoring force framework can be generalized and applied to create other self-healing calculations, such as cumulative probability fields.
2. GRADIENTS Gradients are generally calculated through iterative application of a triangle inequality constraint. In its most basic form, the calculation of the gradient value gx of a device x is simply
[Figure 2 shows eight panels of example networks: (a) Safe (1 round), (b) Safe (2 rounds), (c) Safe (3 rounds), (d) Safe (9 rounds), (e) Problem (1 round), (f) Problem (2 rounds), (g) Problem (3 rounds), (h) Problem (9 rounds).]
Figure 2: The rising value problem occurs when the round-trip distance between devices is less than the desired rise per round. The figures above show self-healing after the left-most device in Figure 1 stops being a source; for this example, updates are synchronous and unconstrained devices (black edges) attempt to rise at 2 units per round. The original graph rises safely (a-d), completing after 9 rounds. With one short edge (e-h), all rising becomes regulated by the short edge and proceeds much more slowly.
    g_x = 0                                    if x ∈ S
    g_x = min{ g_y + d(x, y) | y ∈ N_x }       if x ∉ S
where S is the set of source devices, N_x is the neighborhood of x (excluding itself), and d(x, y) is the estimated distance between neighboring devices x and y. Whenever the set of sources S is non-empty, repeated fair application of this calculation converges to the correct value at every point. We will call this limit value ḡ_x.
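The calculation above can be sketched as a simple relaxation loop. This is an illustrative reconstruction, not the paper's code; the graph representation, the function name, and the synchronous sweep order are our own choices.

```python
# Illustrative sketch of the basic gradient calculation: repeatedly apply
# the triangle-inequality constraint on a static graph until it converges.
INF = float("inf")

def relax_gradient(neighbors, dist, sources, rounds=100):
    """neighbors: dict node -> set of neighbor nodes
       dist: dict (x, y) -> estimated distance between neighbors
       sources: set of source nodes"""
    g = {x: (0.0 if x in sources else INF) for x in neighbors}
    for _ in range(rounds):
        changed = False
        for x in neighbors:
            if x in sources:
                continue  # sources are pinned at 0
            best = min((g[y] + dist[(x, y)] for y in neighbors[x]), default=INF)
            if best < g[x]:
                g[x] = best
                changed = True
        if not changed:
            break  # fixed point reached: g holds the limit values
    return g

# Tiny line network s -- a -- b with unit edge distances:
nbrs = {"s": {"a"}, "a": {"s", "b"}, "b": {"a"}}
d = {("s", "a"): 1.0, ("a", "s"): 1.0, ("a", "b"): 1.0, ("b", "a"): 1.0}
print(relax_gradient(nbrs, d, {"s"}))  # {'s': 0.0, 'a': 1.0, 'b': 2.0}
```

On a static network this is just distributed Bellman-Ford; the interesting behavior, discussed next, appears when values must rise after the network changes.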
2.1 Network Model
The gradient value of a device is not, however, instantaneously available to its neighbors, but must be conveyed by a message, which adds lag. We will use the following wireless network model:

• The network of devices D may contain anywhere from a handful of devices to tens of thousands.
• Devices are immobile and are distributed arbitrarily through space (generalization to mobile devices is relatively straightforward, but beyond the scope of this paper).
• Memory and processing power are not limiting resources.¹
• Execution happens in partially synchronous rounds, once every ∆t seconds; each device has a clock that ticks regularly, but frequency may vary slightly and clocks have an arbitrary initial time and phase.
• Devices communicate via unreliable broadcasts to their neighbors (all other devices within r meters). Broadcasts are sent at most once per round.
• Devices are provided with estimates of the distance to their neighbors, but naming, routing, and global coordinate services are not provided.

¹ Excessive expenditure of either is still bad, and memory is an important constraint for the Mica2 implementation.
• Devices may fail, leave, or join the network at any time, which may change the connectedness of the network.
2.2 Separation in Space and Time
We can reformulate the gradient calculation to take our network model into account. Let the triangle inequality constraint c_x(y, t) from device y to device x at time t be expressed as

    c_x(y, t) = g_y(t − λ_x(y, t)) + d(x, y)

where λ_x(y, t) is the time-lag in the information about y that is available to its neighbor x. The time-lag is itself time-dependent (though generally bounded) due to dropped messages, differences in execution rate, and other sources of variability. The gradient calculation is then

    g_x(t) = 0                                    if x ∈ S(t)
    g_x(t) = min{ c_x(y, t) | y ∈ N_x(t) }        if x ∉ S(t)
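One way to realize the time-lagged constraint c_x(y, t) is for each device to cache the last value heard from each neighbor along with its arrival time. The record layout and class below are our own sketch, not the paper's implementation.

```python
# Sketch of per-neighbor state for computing time-lagged constraints:
# each broadcast deposits the neighbor's (stale) gradient value, so the
# constraint uses g_y(t - lag) + d(x, y) with a measurable lag.
class NeighborTable:
    def __init__(self):
        self.entries = {}  # neighbor id -> (g_y, d_xy, heard_at)

    def heard(self, ident, g_y, d_xy, now):
        """Record a broadcast from neighbor `ident` arriving at time `now`."""
        self.entries[ident] = (g_y, d_xy, now)

    def constraints(self, now):
        """Return (constraint value c_x(y, t), lag) pairs for all neighbors."""
        return [(g_y + d_xy, now - heard_at)
                for (g_y, d_xy, heard_at) in self.entries.values()]

tab = NeighborTable()
tab.heard("y", 3.0, 1.5, now=10.0)
print(tab.constraints(now=12.0))  # [(4.5, 2.0)]
```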
Our definitions of the set of sources S(t) and the neighborhood N_x(t) have also changed to reflect the fact that both may vary over time. The most important thing to notice in this calculation is that the rate of convergence depends on the effective speed at which messages propagate through space. Over many hops, this speed may be assumed to be close to r/∆t (cf. [12]). Over a single hop, however, messages may move arbitrarily slowly: the time separation of two neighbors x and y is always on the order of ∆t, while the spatial separation d(x, y) may be any arbitrary distance less than r. A device and its neighbor constrain one another. Thus, when the value of a device rises from a previously correct value, it can rise no more than twice the distance to its closest neighbor in one round; if it rises higher, then it is constrained by the neighbor's value. This applies to the neighbor as well, so after each round of rising the constraints
are no looser. Since successive round trips between neighbors must take at least ∆t seconds, a pair of neighbors constrain one another's distance estimates to rise at a rate no greater than 2d(x, y)/∆t meters per second. When a device x has a value less than the correct value, its time to converge is at least

    max{ (ḡ_x − g_x(t)) · ∆t / (2 · d(x, y)) | y ∈ N_x(t) }

which means that close neighbors can only converge slowly. Worse, the dependency chains from this retarded convergence can reach arbitrarily far across the network, so that the entire network is limited in its convergence rate by the closest pair of devices. We will call this phenomenon the rising value problem (illustrated in Figure 2). This can be very bad indeed, particularly given that many proposals for large networks involve some randomness in device placement (e.g. aerial dispersal). Consider, for example, a randomly distributed sensor network with 100 devices arranged in a 10-hop network with an average of 50 meters separation between devices that transmit once per second. Let us assume that the random distribution results in one pair of devices ending up only 50 cm apart. If the source moves one hop farther from this pair, increasing the correct distance estimate by 50 meters, then the close pair and every device farther out in the network will take 50/(2 · 0.5) = 50 seconds to converge to the new value. If they had landed 5 cm apart rather than 50 cm, it would take 50/(2 · 0.05) = 500 seconds to converge, nearly 10 minutes!
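The worst-case arithmetic above can be checked with a one-line helper (our own, purely for illustration): the rise rate is capped at 2d(x, y)/∆t, so the time to cover a given rise is the rise times ∆t over 2d.

```python
# Worked check of the rising value problem's numbers: a pair of neighbors
# at distance d can raise their estimates at most 2*d per round of dt
# seconds, so rising by `rise_needed` meters takes rise_needed*dt/(2*d).
def convergence_time(rise_needed, d, dt=1.0):
    """Seconds for a value to rise `rise_needed` meters when constrained
    by a neighbor at distance d, with one transmission every dt seconds."""
    return rise_needed * dt / (2 * d)

print(convergence_time(50, 0.5))   # 50 cm pair: 50.0 seconds
print(convergence_time(50, 0.05))  # 5 cm pair: 500.0 seconds
```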
2.3 Previous Self-Healing Gradients
To the best of our knowledge, the rising value problem has not previously been formalized. Previous work in self-healing gradients, however, may still be categorized into two general approaches. An invalidate and rebuild gradient discards previous values and recalculates sections of the network from scratch, avoiding the rising value problem by only allowing values to decrease. An incremental repair gradient moves its values bit by bit until they arrive at the desired values; previous incremental approaches have avoided the rising value problem by limiting the minimum link distance.

Invalidate and rebuild gradients periodically discard values across part or all of the network. This effectively avoids the rising value problem by ensuring that values that may need to change are raised above their correct values; often the value is not allowed to rise at all except during a rebuild. For example, GRAB[15] uses a single source and rebuilds when its error estimate is too high, and TTDD[13] builds the gradient on a static subgraph, which is rebuilt in case of delivery failure. These approaches work well in small networks and are typically tuned for a particular use case, but the lack of incremental maintenance means that there are generally conditions that will cause unnecessary rebuilding, persistent incorrectness, or both.

Incremental repair does not discard values, but instead allows the gradient calculation to continue running and reconverge to the new correct values. Previous work on incremental repair (by Clement and Nagpal[7] and Butera[6]) has measured distance using hop-count. This has the effect of setting d(x, y) to a fixed value and therefore producing a consistent message speed through the network. If generalized to use actual distance measures, however, these approaches suffer from the rising value problem and may converge extremely slowly. Finally, our previous work in [1] uses a hybrid solution that adds a fixed amount of distortion at each hop, producing a gradient that does not suffer from the rising value problem, but produces inaccurate values.
3. THE CRF-GRADIENT ALGORITHM
We have seen that a key problem for self-healing gradients is how to allow values to rise quickly yet still converge to good distance estimates. The CRF-Gradient algorithm handles this problem by splitting the calculation into constraint and restoring force behaviors (hence the acronym CRF). When constraint is dominant, the value of a device g_x(t) stays fixed or decreases, set by the triangle inequality from its neighbors' values. When restoring force is dominant, g_x(t) rises at a fixed velocity v_0. The switch between these two behaviors is made with hysteresis, such that a device's rising value is not constrained by a neighbor that might still be constrained by the device's old value.

The calculation for CRF-Gradient implements this by tracking both a device's gradient value g_x(t) and the "velocity" of that value v_x(t). We use this velocity to calculate a relaxed constraint c′_x(y, t) that accounts for communication lag when a device is rising:

    c′_x(y, t) = c_x(y, t) + (λ_x(y, t) + ∆t) · v_x(t)

We will use the original c_x(y, t) to exert constraint and the new c′_x(y, t) to test whether any neighbor is able to exert constraint. Let the set of neighbors exerting constraint be

    N′_x(t) = { y ∈ N_x(t) | c′_x(y, t) ≤ g_x(t − ∆t) }

CRF-Gradient may thus be formulated
    g_x(t) = 0                                    if x ∈ S(t)
    g_x(t) = min{ c_x(y, t) | y ∈ N′_x(t) }       if x ∉ S(t) and N′_x(t) ≠ ∅
    g_x(t) = g_x(t − ∆t) + v_0 · ∆t               otherwise
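A single synchronous round of this update can be sketched as follows. This is our own reconstruction from the description above, not the paper's implementation; the function signature, the per-neighbor record format (value, distance, lag), and the constant names are assumptions.

```python
# Sketch of one synchronous CRF-Gradient round: neighbors that pass the
# relaxed-constraint test c' exert constraint via c; otherwise the device
# rises at the fixed restoring-force velocity v0.
V0 = 1.0   # restoring-force velocity (meters per second), our choice
DT = 1.0   # round length Delta-t (seconds), our choice

def crf_step(g_prev, v_prev, is_source, nbr_info):
    """nbr_info: list of (g_y, d_xy, lag) tuples -- a neighbor's (possibly
    stale) gradient value, estimated distance, and time-lag of that value.
    Returns the new (value, velocity) pair."""
    if is_source:
        return 0.0, 0.0
    # c = g_y + d exerts constraint; the relaxed c' = c + (lag + DT)*v tests
    # whether a neighbor *can* constrain a rising device (the hysteresis).
    constraining = [g_y + d for (g_y, d, lag) in nbr_info
                    if g_y + d + (lag + DT) * v_prev <= g_prev]
    if constraining:
        return min(constraining), 0.0   # constraint dominant: hold or drop
    return g_prev + V0 * DT, V0         # restoring force dominant: rise

# A device at 5.0 with a neighbor asserting 3.0 + 1.0 is pulled down to 4.0:
print(crf_step(5.0, 0.0, False, [(3.0, 1.0, 0.0)]))   # (4.0, 0.0)
# With no constraining neighbor, it rises at v0:
print(crf_step(5.0, 0.0, False, [(10.0, 1.0, 0.0)]))  # (6.0, 1.0)
```

Note how a rising device (v_prev = v_0) inflates each neighbor's constraint test by (lag + ∆t)·v_0, so a neighbor that merely echoes the device's own stale value cannot re-capture it; this is what breaks the slow lock-step rise of Figure 2.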