FAULT-TOLERANT CLOCK SYNCHRONIZATION

Joseph Y. Halpern, Barbara Simons, Ray Strong
IBM Research Laboratory, San Jose, California 95193

Danny Dolev
Hebrew University, Givat Ram, 91904 Jerusalem, Israel
Abstract: This paper gives two simple efficient distributed algorithms: one for keeping clocks in a network synchronized and one for allowing new processors to join the network with their clocks synchronized. The algorithms tolerate both link and node failures of any type. The algorithm for maintaining synchronization will work for arbitrary networks (rather than just completely connected networks) and tolerates any number of processor or communication link faults as long as the correct processors remain connected by fault-free paths. It thus represents an improvement over other clock synchronization algorithms such as [LM1,LM2,LL1]. Our algorithm for allowing new processors to join requires that more than half the processors be correct, a requirement which is provably necessary.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1984 ACM 0-89791-143-1/84/008/0089 $00.75

1. Introduction

In a distributed system it is often necessary for processors to perform certain actions at roughly the same time. In such a system each processor usually possesses its own independent clock. However, despite the marvels of modern technology, clocks tend to drift apart. Therefore, clocks must be resynchronized periodically.

Recently, many protocols for resynchronization in the presence of faults have received wide attention (cf. [LM1,LM2,Ma,LL1]). The algorithms mentioned above are all based on an averaging process that involves reading the clocks of all the other processors. Because of this use of averaging, there must be more nonfaulty than faulty processors for these algorithms to work. Two of the algorithms presented in [LM1,LM2] and the algorithm of [LL1] require $3f+1$ processors in order to handle $f$ faults; a third algorithm of [LM1,LM2], which assumes the existence of unforgeable signatures, requires $2f+1$ processors. The algorithms of [Ma], for which no worst case analysis is provided, deal with ranges of times rather than a single logical clock time and therefore are not directly comparable.

In this paper a synchronization algorithm is presented that does not require any minimum number of processors to handle $f$ processor faults, so long as the network remains connected. The crucial point is that since we do not use averaging, it is not necessary that the majority of processors be correct. Moreover, our algorithm requires the transmission of at most $n^2$ messages per synchronization (where $n$ is the total number of processors in the system). The algorithm of [LL1] and one of the algorithms of [LM1,LM2] also require only $n^2$ messages; the other two algorithms of [LM1,LM2] might need as many as $n^{f+1}$ messages to tolerate $f$ faults.
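To get a feel for the gap between the two message bounds quoted above, the following sketch evaluates $n^2$ and $n^{f+1}$ for a few sizes. The function names and the sample values of $n$ and $f$ (chosen so that $n = 3f+1$) are illustrative assumptions, not taken from the paper.

```python
def quadratic_bound(n: int) -> int:
    """Upper bound n^2 on messages per synchronization."""
    return n ** 2


def exponential_bound(n: int, f: int) -> int:
    """Worst-case n^(f+1) messages quoted for two of the [LM1,LM2] algorithms."""
    return n ** (f + 1)


# Hypothetical system sizes with n = 3f + 1.
for f in (1, 2, 3):
    n = 3 * f + 1
    print(n, f, quadratic_bound(n), exponential_bound(n, f))
```

For a single fault the two bounds coincide; the difference only appears once $f \geq 2$.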
A final advantage of our algorithm is that it can deal with either processor or link faults in any network, provided the network remains connected. The algorithms of [LM1,LM2,LL1] deal only with processor faults in a completely connected network.

The algorithm is based on the following simple observation. If there are no faulty processors, a processor can be chosen to be a synchronizer and to broadcast a message with its current time once an hour (or day, or week, depending on the frequency of synchronization required). Each processor would then adjust its clock accordingly, making minor allowances if necessary for the transmission time of the message.

If there are faults, however, then there are obvious problems with the above approach. A faulty synchronizer might broadcast different messages (i.e. different times) to different processors, or it might broadcast the same message but at different times, or it might "forget" to broadcast the message to some processors. Note that it is not necessary to assume "malevolence" on the part of the synchronizer for such behavior to occur. For example, a synchronizer might fail in the middle of broadcasting the message "The time is 9 A.M.," spontaneously recover five minutes later, and continue broadcasting the same message. Thus, some of the processors would receive the message "The time is 9 A.M." at 9 A.M., while the remainder would receive it at 9:05.

Nevertheless, the idea of using a synchronizer can be modified to obtain an efficient synchronization algorithm which is correct even in the presence of faults. The key idea is to distribute the role of the synchronizer: every (correct) processor will try to act as a synchronizer at roughly the same time, and at least one will succeed. To ensure that this really happens at "roughly the same time", we use a protocol that guarantees that all the correct processors agree on the expected time for the next synchronization.

In practice the periodic resynchronization algorithm must be supplemented by a method for synchronizing the original participants and for bringing in new processors. Our techniques can also be used to construct such a join algorithm, which can be used to allow new processors to join the network with their clocks synchronized to those of already existing processors. This algorithm can also be applied to repaired (previously faulty) processors that must be resynchronized with the rest of the network. The join algorithm requires that fewer than half the processors in the network be faulty in order to work, a requirement which is provably necessary.

The remainder of the paper is organized as follows. In the next section the problem is formalized and the precise assumptions underlying the algorithm are described. These assumptions include the existence of a bounded rate of drift between the clocks of nonfaulty processors, a known upper bound on the transmission time of messages between nonfaulty processors, and the ability to authenticate signatures. The resynchronization algorithm is described in Section 3 and analyzed in Section 4. The degree of synchronization obtained is almost as tight as possible, but a careful discussion of this property is beyond the scope of this paper (v. [DHS] and [LL2]). Finally, the join algorithm is presented and analyzed in Section 5.

2. A specification of the algorithm

In this section both the properties (CS1-CS3) that the clock synchronization algorithm satisfies and the assumptions (A1-A3) that are made in the model are presented.
The clock of a processor is defined to be a particular time service delivered by that processor. In response to a time query the service responds with a number indicating the "time." In particular, the no[...]

[...] That is:

(A1) $(1+\rho)^{-1}(v-u) < C(v)-C(u) < (1+\rho)(v-u)$.

For technical reasons the leftmost term has a factor of $(1+\rho)^{-1}$ rather than the more common $1-\rho$; for small $\rho$ both approaches are essentially the same. An advantage of (A1) is that it implies the symmetric condition

$(1+\rho)^{-1}(C(v)-C(u))$ [...]

[...] clocks at the beginning of the period. [...] schedule the synchronization process tends to dominate the time required to transmit a message along the communication links. Therefore, we have not analyzed a refined version of assumption (A2) (such as that used in [LL1,LM1,LM2]) that, if $t$ is as above, then $\delta-\varepsilon < t < \delta+\varepsilon$. [...] that our results could also be obtained using this re[...]
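The symmetric condition mentioned above can be recovered from (A1) by rearranging each half of the inequality. The derivation below is a reconstruction from (A1) as printed, assuming only that $1+\rho > 0$; it is not the authors' own wording.

```latex
\begin{align*}
C(v)-C(u) < (1+\rho)(v-u)
  &\;\Longrightarrow\; (1+\rho)^{-1}\bigl(C(v)-C(u)\bigr) < v-u, \\
(1+\rho)^{-1}(v-u) < C(v)-C(u)
  &\;\Longrightarrow\; v-u < (1+\rho)\bigl(C(v)-C(u)\bigr), \\
\text{hence}\quad
(1+\rho)^{-1}\bigl(C(v)-C(u)\bigr) &< v-u < (1+\rho)\bigl(C(v)-C(u)\bigr).
\end{align*}
% For small \rho, (1+\rho)^{-1} = 1 - \rho + \rho^2 - \cdots \approx 1 - \rho,
% which is why the two forms of the lower bound are essentially the same.
```

The result has the same shape as (A1) with the roles of real-time and clock-time differences exchanged, which is the sense in which the condition is symmetric.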
(1.5) [...] $end_k$,

(1.6) the $k$th clocks of correct processors differ by at most DMAX in the interval $[end_k, end_{k+1}]$; thus CS1 holds in this interval,

(1.7) conditions (a) and (b) hold with $k$ replaced by $k+1$ and $D$ replaced by any $D^* \geq \mathrm{DMAX}$.

[...] MSG. In either case, it passes a message on to $p_j$, which arrives within time $trt_{G,F}(h,j)$. Either $p_j$ has already started its $k$th clock by the time the message arrives, or, as we now show, the message will pass the validity test of Task MSG, so that $p_j$ will start its $k$th clock within $trt_{G,F}(i,j)$ of $p_i$.

Let $X$ be the value of ET shared by all correct processors according to hypothesis (b). When the $k$th clock of a correct processor is started, it is set to $X$. Suppose $p_j$ has not started its $k$th clock when the message from $p_h$ arrives. If $p_h$ sent the message as a result of initiating Task TM, this must have happened at time $X$ on $p_h$'s $(k-1)$st clock. Since, by hypothesis (a), $p_j$'s clock differs from $p_h$'s by at most $D$, this happens at a time later than $X-D$ on $p_j$'s clock. Thus $p_j$ receives the message from $p_h$ at a time later than $\mathrm{ET}-D$ (since $\mathrm{ET} = X$, by hypothesis, until $p_j$ starts its $k$th clock). Since the message has one signature ($p_h$'s), it passes the validity test. Now suppose $p_h$ sent the message to $p_j$ as a result of getting a valid message with $s$ distinct signatures. The message must come at a time after $X-sD$ on $p_h$'s clock. By a similar argument to that above, it comes to $p_j$ at a time after $X-(s+1)D$ on $p_j$'s clock, and since it now has $s+1$ signatures (including $p_h$'s), the message also passes the validity test for $p_j$. □

[...] this time ($beg_k$). Thus, the $(k-1)$st clocks of correct processors read a time after $\mathrm{ET}-(f_p+1)D = \mathrm{ET}-\mathrm{ADJ}$ at $beg_k$. This proves (1.4). □

Proof of (1.5). Suppose $p_i$ is the first correct processor to start its $(k+1)$st clock, and let $v'$ be its value of ET immediately before the $(k+1)$st clock is started. Let $v$ be the value on its $k$th clock when the $k$th clock is started. From the definition of the algorithm, it follows that $v' = v + \mathrm{PER}$. An identical argument to that of (1.3) above shows that $p_i$ starts its $(k+1)$st clock later than $v' - f_p D$ on its $k$th clock; i.e., $C_i^k(beg_{k+1}) > v' - f_p D$. From (1.1), it follows that [...] $C_i^k(end_k) \geq v + (1+\rho)\,dmin$. By the Interval Separation inequality, $v + (1+\rho)\,dmin$ [...]
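The validity argument in the proof above suggests that a message carrying $s$ distinct signatures is accepted when it arrives later than $\mathrm{ET} - sD$ on the receiver's clock. The following minimal sketch encodes that reading. The exact test is defined in a part of the paper not reproduced here, so the rule below, the requirement that the announced value equal ET, and the names `SyncMessage`, `is_valid` and `relay` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncMessage:
    value: float           # clock value X announced by the message
    signatures: frozenset  # ids of the distinct processors that signed it


def is_valid(msg: SyncMessage, local_time: float, et: float, d: float) -> bool:
    """Assumed test: msg announces ET and arrives later than ET - s*D locally."""
    s = len(msg.signatures)
    return s >= 1 and msg.value == et and local_time > et - s * d


def relay(msg: SyncMessage, own_id: str) -> SyncMessage:
    """Add this processor's signature before passing the message on."""
    return SyncMessage(msg.value, msg.signatures | {own_id})


# Hypothetical values: ET = X = 100, D = 2.
ET, D = 100.0, 2.0
first = SyncMessage(ET, frozenset({"p_h"}))
print(is_valid(first, 98.5, ET, D))   # one signature, local clock later than ET - D
second = relay(first, "p_j")
print(is_valid(second, 96.5, ET, D))  # two signatures, local clock later than ET - 2D
```

Each relay adds one signature and so widens the acceptance window by $D$, which mirrors the step in the proof from $X - sD$ on $p_h$'s clock to $X-(s+1)D$ on $p_j$'s clock.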