Failure Detectors, Consensus and Self-Stabilization

Master 2 IFI, CSSR

Distributed Algorithms
Francesco Bongiovanni, INRIA Sophia Antipolis Research Center, OASIS Team
Course web site: deptinfo.unice.fr/~baude/AlgoDist
Nov. 2009

Chapter 7 : Failure Detectors, Consensus, Self-Stabilization


Acknowledgement

The slides for this lecture are based on ideas and materials from the following sources:

- Introduction to Reliable Distributed Programming, Rachid Guerraoui and Luís Rodrigues, 2006, 300 p., ISBN: 3-540-28845-7 (+ teaching material)
- ID2203 Distributed Systems Advanced Course by Prof. Seif Haridi from KTH – Royal Institute of Technology (Sweden)
- CS5410/514: Fault-tolerant Distributed Computer Systems Course by Prof. Ken Birman from Cornell University
- Distributed Systems: An Algorithmic Approach, Sukumar Ghosh, 2006, 424 p., ISBN: 1-584-88564-5 (+ teaching material)
- Various research papers

Outline

1. Failure Detectors
   - Definition
   - Properties: completeness and accuracy
   - Classes of FDs
   - Two algorithms: PFD and EPFD
   - Leader Election vs Failure Detector
2. Consensus
   - Definition
   - Properties
   - Types of Consensus: regular and uniform
   - Algorithm: hierarchical consensus
3. Self-stabilization
   - Principle
   - Example: Dijkstra's token ring

Failure detectors


System models

- Synchronous distributed system
  - each message is received within bounded time
  - each step in a process takes lb < time < ub
  - each local clock's drift has a known bound
- Asynchronous distributed system
  - no bounds on process execution
  - no bounds on message transmission delays
  - arbitrary clock drifts

The Internet is an asynchronous distributed system.

Failure model

First we must decide what we mean by failure. Different types of failures:

- Crash-stop (fail-stop): a process halts and does not execute any further operations
- Crash-recovery: a process halts, but then recovers (reboots) after a while

[Figure: failure classes: crashes, omissions, crashes and recoveries, arbitrary (Byzantine).]

- Crash-stop failures can be detected in synchronous systems
- Next: detecting crash-stop failures in asynchronous systems

What's a Failure Detector?

[Figure: two processes Pi and Pj. Pj suffers a crash failure, and Pi needs to know about Pj's failure.]

1. Ping-ack protocol

[Figure: Pi sends a ping to Pj and Pj replies with an ack. Pi needs to know about Pj's failure.]

- Pi queries Pj once every T time units
- Pj replies with an ack
- if Pj does not respond within T time units, Pi marks Pj as failed

If Pj fails, then within T time units Pi will send it a ping message, and will time out within another T time units. Worst-case detection time = 2T.
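The 2T bound can be checked with a little arithmetic. The sketch below is only an illustration under assumptions that are not on the slide: Pi pings at times 0, T, 2T, ..., a live Pj acks instantly, and a ping sent at the exact crash instant is already lost. The function names are hypothetical.

```python
import math


def ping_ack_detection_time(crash_time, period):
    """Time at which Pi marks Pj as failed.

    Assumptions (not from the slides): pings are sent at 0, T, 2T, ...,
    acks are instantaneous, and a ping sent at the crash instant is lost.
    """
    # First ping sent at or after the crash: it will never be acked.
    first_unanswered_ping = math.ceil(crash_time / period) * period
    # Pi waits one more period for the ack before giving up.
    return first_unanswered_ping + period


def detection_delay(crash_time, period):
    """How long after the crash Pi marks Pj as failed."""
    return ping_ack_detection_time(crash_time, period) - crash_time
```

Under these assumptions the delay lies between T (crash exactly when a ping is sent) and just under 2T (crash right after a ping was answered).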

2. Heart-beating protocol

[Figure: Pj sends heartbeats to Pi. Pi needs to know about Pj's failure.]

- Pj maintains a sequence number
- Pj sends Pi a heartbeat with an incremented sequence number every T' (= T) time units
- if Pi has not received a new heartbeat for the past T time units, Pi declares Pj as failed

If Pj has sent x heartbeats by the time it fails, then Pi will time out within (x+1)*T time units in the worst case, and will detect Pj as failed.
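Pi's side of the heart-beating protocol can be sketched as a small class. This is a minimal illustration, not the course's implementation: time is passed in explicitly instead of using a real clock or network, and the class and method names are hypothetical.

```python
class HeartbeatDetector:
    """Pi's view of Pj under heart-beating, driven by explicit timestamps."""

    def __init__(self, timeout, start=0.0):
        self.timeout = timeout      # T: longest tolerated silence
        self.last_seq = -1          # highest sequence number seen so far
        self.last_heard = start     # arrival time of the last new heartbeat

    def on_heartbeat(self, seq, now):
        # Only a strictly newer sequence number counts as a sign of life;
        # duplicates or reordered old heartbeats are ignored.
        if seq > self.last_seq:
            self.last_seq = seq
            self.last_heard = now

    def suspects(self, now):
        # Pj is declared failed once no new heartbeat arrived for more than T.
        return now - self.last_heard > self.timeout
```

The sequence number is what makes "new heartbeat" well defined: a delayed copy of an old heartbeat does not reset the timer.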

Failure Detectors

- Abstracting time
  - FDs provide information (not necessarily fully accurate) about which processes have crashed
- Use failure detectors to encapsulate timing assumptions
  - Black box giving suspicions regarding node failures
  - Accuracy of suspicions depends on model strength

Failure Detectors

Basic properties:

- Completeness: every crashed process is suspected
- Accuracy: no correct process is suspected

Both properties come in two flavours: strong and weak.

Failure Detectors

- Strong Completeness: every crashed process is eventually suspected by every correct process
- Weak Completeness: every crashed process is eventually suspected by at least one correct process
- Strong Accuracy: no correct process is ever suspected
- Weak Accuracy: there is at least one correct process that is never suspected
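The four definitions can be turned into predicates over a snapshot of suspicion sets. This is a simplified sketch: the real properties are temporal ("eventually", "ever"), whereas the functions below only inspect one final snapshot, and the data layout (`suspects` maps each correct process to the set it suspects) is an assumption made for illustration.

```python
def strong_completeness(crashed, correct, suspects):
    """Every crashed process is suspected by every correct process."""
    return all(crashed <= suspects[q] for q in correct)


def weak_completeness(crashed, correct, suspects):
    """Every crashed process is suspected by at least one correct process."""
    return all(any(p in suspects[q] for q in correct) for p in crashed)


def strong_accuracy(correct, suspects):
    """No correct process appears in any suspicion set."""
    return all(not (suspects[q] & correct) for q in suspects)


def weak_accuracy(correct, suspects):
    """At least one correct process appears in no suspicion set."""
    return any(all(p not in s for s in suspects.values()) for p in correct)
```

For example, with `crashed = {4}`, `correct = {1, 2, 3}` and `suspects = {1: {4}, 2: {3, 4}, 3: {4}}`, the snapshot satisfies strong completeness and weak accuracy but not strong accuracy, because process 2 wrongly suspects process 3.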

Failure Detectors: classes of FDs

Completeness | Strong accuracy | Weak accuracy | Eventually strong accuracy | Eventually weak accuracy
Strong       | Perfect P       | Strong S      | Eventually Perfect ◊P      | Eventual Strong ◊S
Weak         | Q               | Weak W        | ◊Q                         | Eventual Weak ◊W

Labels on the figure: Synchronous Systems, Asynchronous Systems.

Perfect Failure Detector (P)


Correctness of P

- PFD1 (strong completeness)
  - A crashed node doesn't send heartbeats
  - Eventually every node will notice the absence of heartbeats
- PFD2 (strong accuracy)
  - Assuming local computation is negligible
  - Maximum time between 2 heartbeats: γ + δ time units
  - If alive, all nodes will receive the heartbeat in time
  - No inaccuracy

Eventually Perfect Failure Detector


Correctness of EPFD

- PFD1 (strong completeness)
  - Same as before
- PFD2 (eventual strong accuracy)
  - Each time p is inaccurately suspected by a correct q, timeout T is increased at q
  - Eventually the system becomes synchronous, and T becomes larger than the unknown bound δ (T > γ + δ)
  - q will then receive heartbeats on time, and never suspect p again
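The timeout-increase argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the algorithm from the slides (whose listing is not reproduced here): q monitors a single peer p, time is passed in explicitly, and the class, the additive `increment`, and the helper `count_false_suspicions` are hypothetical.

```python
class EventuallyPerfectDetector:
    """q's view of one peer p; the timeout grows after each wrong suspicion."""

    def __init__(self, initial_timeout, increment):
        self.timeout = initial_timeout
        self.increment = increment
        self.deadline = initial_timeout   # latest time the next heartbeat may arrive
        self.suspected = False

    def on_heartbeat(self, now):
        if self.suspected:
            # The suspicion was wrong: restore p and be more patient from now on.
            self.suspected = False
            self.timeout += self.increment
        self.deadline = now + self.timeout

    def check(self, now):
        if now > self.deadline:
            self.suspected = True
        return self.suspected


def count_false_suspicions(detector, heartbeat_times):
    """Feed heartbeats from a live p; count how often q wrongly suspected it."""
    count = 0
    for t in heartbeat_times:
        if detector.check(t):   # q looks at its timer just before the heartbeat lands
            count += 1
        detector.on_heartbeat(t)
    return count
```

With heartbeats arriving every 5 time units, an initial timeout of 2 and an increment of 2, q wrongly suspects p twice; after that the timeout has grown to 6, which exceeds the heartbeat gap, so no further false suspicion occurs. If p then really crashes, the deadline eventually passes and the suspicion becomes permanent.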

Leader Election


Leader Election vs Failure Detection

- Failure detection captures failure behavior
  - Detect failed nodes
- Leader election (LE) also captures failure behavior
  - Detect correct nodes (a single one, the same for all)
- Formally, leader election is an FD
  - Always suspects all nodes except one (the leader)
  - Ensures some properties regarding that node

Leader Election vs Failure Detection

We'll define two leader election algorithms:

- Leader election (LE), which “matches” P
- Eventual leader election (Ω), which “matches” eventual P

Matching LE and P

- P's properties
  - P always eventually detects failures (strong completeness)
  - P never suspects correct nodes (strong accuracy)
- Completeness of LE
  - Informally: eventually ditch crashed leaders
  - Formally: eventually every correct node trusts some correct node
- Accuracy of LE
  - Informally: never ditch a correct leader
  - Formally: no two correct nodes trust different correct nodes
- Is this really accuracy?
  - Yes! Assume two nodes trust different correct nodes
  - One of them must eventually switch, i.e. leave a correct node

LE desirable properties

- LE always eventually detects failures
  - Eventually every correct node trusts some correct node
- LE is always accurate
  - No two correct nodes trust different correct nodes
- But the above two permit the following [figure not reproduced]
  - But P1 is “inaccurately” leaving a correct leader
- To avoid “inaccuracy” we add Local Accuracy:
  - If a node is elected leader by pi, all previously elected leaders by pi have crashed
  - The execution in the figure is then not allowed, as P1 is correct

Leader election - interface

Leader election - algorithm
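The algorithm listing for this slide is not reproduced in the text. As a stand-in, here is a minimal sketch of one way to build LE on top of P, assuming a fixed ranking in which the smallest id has the highest rank; the class name and callback are hypothetical and this is not claimed to be the listing from the slide.

```python
class LeaderElector:
    """Leader election driven by crash notifications from a perfect failure detector."""

    def __init__(self, processes):
        self.alive = set(processes)     # processes not (yet) reported crashed by P
        self.leader = min(self.alive)   # assumption: smallest id has the highest rank
        self.history = [self.leader]    # every leader this node has ever trusted

    def on_crash(self, p):
        """Called when P reports that p has crashed (never wrong, by strong accuracy)."""
        self.alive.discard(p)
        # Switch only when the current leader itself is reported crashed,
        # which is what local accuracy demands.
        if p == self.leader and self.alive:
            self.leader = min(self.alive)
            self.history.append(self.leader)
```

Because P never reports a correct process as crashed, a leader is abandoned only after it has really crashed, and because every correct node eventually hears about every crash, all correct nodes end up trusting the same highest-ranked survivor.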

Matching Ω and EPFD

- Eventual P weakens P by only providing eventual accuracy
  - Weaken LE to Ω by only guaranteeing eventual agreement

LE properties (annotated "eventual" on the slide):

- LE1 (eventual completeness). Eventually every correct node trusts some correct node
- LE2 (agreement). No two correct nodes trust different correct nodes
- LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pi have crashed

Eventual Leader election - interface

Eventual Leader election - algorithm

- See the book.

Consensus (agreement)


Consensus

- In the consensus problem, the processes propose values and have to agree on one among these values

[Figure: processes proposing the values A, B and C.]

- Solving consensus is key to solving many problems in distributed computing (e.g., total order broadcast, atomic commit, terminating reliable broadcast)

Consensus – canonical application

- a set of servers implement a distributed database
  - a subset of servers participate in a particular transaction
  - some of the servers may fail
  - the remaining servers must agree on whether to install the results of the transaction to the database or discard them

Consensus – basic properties

- Termination: every correct node eventually decides
- Agreement: no two correct processes decide differently
- Validity: any value decided is a value proposed
- Integrity: a node decides at most once
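The four properties can be checked mechanically on a finished execution. The function below is an illustrative sketch; the data layout (`proposals`, `decisions`, `correct`) and the function name are assumptions, not part of the course material.

```python
def check_consensus(proposals, decisions, correct):
    """Evaluate the four basic properties on one finished execution.

    proposals: {node: proposed value}
    decisions: {node: list of values the node decided, in order}
    correct:   set of nodes that never crashed
    """
    first = {n: d[0] for n, d in decisions.items() if d}
    proposed = set(proposals.values())
    return {
        # every correct node has decided something
        "termination": all(n in first for n in correct),
        # correct nodes that decided all chose the same value
        "agreement": len({first[n] for n in correct if n in first}) <= 1,
        # nothing is decided that was not proposed
        "validity": all(v in proposed for d in decisions.values() for v in d),
        # nobody decided more than once
        "integrity": all(len(d) <= 1 for d in decisions.values()),
    }
```

Note that agreement only constrains correct nodes: an execution in which node 1 decides "a" and then crashes, while nodes 2 and 3 decide "b", still passes all four checks when `correct = {2, 3}`.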

FLP impossibility result

- Consensus in asynchronous systems
  - Impossibility of consensus in the fail-silent model
  - FLP (Fischer, Lynch and Paterson, 1985): consensus is impossible in the fail-silent model with deterministic processes, even if only one process crashes
  - No way to satisfy agreement (safety) and termination (liveness) together

How to solve consensus in asynchronous systems with crashes?

How to solve consensus in the presence of crashes?

- Either we relax our system model, that is, we assume partial synchrony
- Or we modify the specifications
  - Constraining the set of inputs
  - Changing the termination property: terminates with some probability
- Or...

How to solve consensus in asynchronous systems with crashes?

- Intuitively, consensus is impossible to solve because:
  1) the decision depends on one process
  2) we have no idea if this process is alive (we have to wait for its message) or dead
- Thus we add to the asynchronous system what it needs in order to solve consensus:
  - Failure detectors

(regular) Consensus

Sample execution [figure not reproduced].

Question: does it satisfy consensus?

Uniform consensus

Uniform consensus strengthens agreement: no two processes, correct or not, decide differently.

Question: does it satisfy uniform consensus?

Hierarchical consensus

- Use perfect FD (P) and best-effort broadcast (BEB)
- Each node stores its proposal in proposal
  - Possible to adopt another proposal by changing proposal
  - Store identity of last adopted proposer in lastprop
- Loop through rounds 1 to N
  - In round i, node i is leader: it broadcasts its proposal v, and decides proposal v
  - Other nodes adopt i's proposal v and remember lastprop = i, or detect the crash of i

Hierarchical consensus idea

- Basic idea of hierarchical consensus
  - There must be a first correct leader p
  - p decides its value v and broadcasts v
  - BEB ensures all correct nodes get v
    - Every correct node adopts v
    - Future rounds will only propose v

Problem with orphan messages...

Only adopt from node i if i > lastprop?

Invariant to avoid orphans

- The leader in round r might crash, but much later affect some node in a round > r
- Invariant
  - adopt if proposer p is ranked lower than lastprop (that is, p leads a later round: p > lastprop)
  - otherwise p has crashed and should be ignored

Execution without failure [figure not reproduced]

Execution with failure [figure not reproduced]

Is it uniform?

Hierarchical consensus Impl. (1)

Annotations on the algorithm listing [listing not reproduced]:

- Last adopted proposal and last adopted proposer id

Hierarchical consensus Impl. (2)

Annotations on the algorithm listing [listing not reproduced]:

- Set node's initial proposal, unless it has already adopted another node's
- If I am leader
- Trigger once per round
- Trigger if I have a proposal
- Permanently decide
- Next round if deliver or crash
- Invariant: only adopt “newer” than what you have
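Since the listings are not reproduced, the round structure can be sketched as a sequential simulation. This is an illustration under simplifying assumptions, not the event-driven algorithm of the slides: rounds run in lockstep (so late "orphan" messages cannot occur), node ids are 1..N, and a node may only crash either before sending anything (`silent_crashes`) or while broadcasting in its own round (`partial_broadcast`). All names are hypothetical.

```python
def hierarchical_consensus(proposals, partial_broadcast=None, silent_crashes=()):
    """Run rounds 1..N over nodes 1..N and return {node: decided value}.

    proposals:         {node_id: value}, node ids 1..N
    partial_broadcast: {leader_id: set of receivers} for leaders that crash
                       while broadcasting; they still decide locally, but
                       only the listed nodes receive the proposal
    silent_crashes:    nodes that crash before ever sending anything
    """
    partial_broadcast = partial_broadcast or {}
    n = len(proposals)
    proposal = dict(proposals)            # current proposal of each node
    lastprop = {i: 0 for i in proposals}  # id of the last adopted proposer
    crashed = set(silent_crashes)
    decided = {}

    for leader in range(1, n + 1):        # round i: node i is leader
        if leader in crashed:
            continue                      # P reports the crash; everyone moves on
        decided[leader] = proposal[leader]
        receivers = partial_broadcast.get(leader, set(proposals))
        if leader in partial_broadcast:
            crashed.add(leader)           # crashed in the middle of its broadcast
        for node in receivers:
            # Invariant: only adopt from a proposer newer than the last one adopted.
            if node > leader and node not in crashed and leader > lastprop[node]:
                proposal[node] = proposal[leader]
                lastprop[node] = leader
    return decided
```

With proposals `{1: "a", 2: "b", 3: "c"}` and no failure, every node decides "a". If node 1 crashes right after deciding, before anyone receives its broadcast (`partial_broadcast={1: set()}`), node 1 has decided "a" while nodes 2 and 3 decide "b": the correct nodes agree, but the execution is not uniform.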

Correctness

- Validity
  - Always decide own proposal or adopted value
- Integrity
  - Rounds increase monotonically
  - A node only decides once, in the round in which it is leader
- Termination
  - Every correct node makes it to the round in which it is leader
  - If some leader fails, completeness of P ensures progress
  - If the leader is correct, validity of BEB ensures delivery

Correctness (2)

- Agreement
  - No two correct nodes decide differently
- Take the correct leader with minimum id i
  - By termination it will decide v
  - It will BEB v
    - Every correct node gets v and adopts it
    - No older proposals can override the adoption
    - All future proposals and decisions will be v
- How many failures can it tolerate?
  - N-1

Self-stabilization


Recall

- Main challenges in distributed systems:
  - Failures
  - Concurrency
- In the presence of (permanent) failures, a robust algorithm guarantees:
  - Liveness properties are eventually achieved
  - Safety properties are never violated

Self-Stabilization

- Self-stabilization is a different approach to fault tolerance
  - it considers transient (temporary) failures
  - it is more optimistic
- If a bad thing happens (safety is violated), the system will recover within a finite time, and will behave nicely afterwards.

Definition

“A system is self-stabilizing when, regardless of its initial state, it is guaranteed to arrive at a legitimate state in a finite number of steps.” [1] Edsger W. Dijkstra

[1] Edsger W. Dijkstra, Self-stabilizing systems in spite of distributed control, Communications of the ACM, v.17 n.11, p.643-644, Nov. 1974

Self-Stabilization

System S is self-stabilizing with respect to predicate P that identifies the legitimate states, if:

- Convergence: starting from any arbitrary configuration, S is guaranteed to reach a configuration satisfying P within a finite number of state transitions.
- Closure: P is closed under the execution of S. That is, once in a legitimate state, it will stay in a legitimate state.

Some advantages of Self-Stabilizing systems

- No need for consistent initialization.
  - Starting in any arbitrary state, the system will converge to a legitimate state.
- Possibility of sequential composition without the need for termination detection.

A self-stabilizing algorithm: Dijkstra's Token ring


Dijkstra's Token ring

- A single token circulates over the ring and grants privilege to the process holding it.
- N+1 processes: P0, P1, ..., Pn (with n = N)
- Connected in a ring
  - Predecessor of Pi: pred(Pi) = P((i-1) mod (N+1))
  - Successor of Pi: succ(Pi) = P((i+1) mod (N+1))

Token Ring stabilization

- Pi has a local variable Xi
- Xi can take values from 0 to K-1 (K >= N)
- Each process can read the value of its predecessor (shared memory model)
- There is a scheduler, which selects a process at each step, in a random but fair manner.

Token Ring stabilization

Transition rule for P1 to Pn:

    if Xi != Xi-1 then Xi := Xi-1

Transition rule for P0:

    if X0 = Xn then X0 := (X0 + 1) mod K
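The two transition rules can be tried out in a short simulation. This is a minimal sketch rather than the course's code: the scheduler is modelled as a seeded random choice among privileged processes, a state counts as legitimate when exactly one process is privileged, and all function names are hypothetical.

```python
import random


def privileged(x):
    """Indices of the processes whose guard is true, i.e. that hold a token."""
    n = len(x) - 1
    tokens = [0] if x[0] == x[n] else []          # guard of P0
    tokens += [i for i in range(1, n + 1) if x[i] != x[i - 1]]  # guards of P1..Pn
    return tokens


def fire(x, i, k):
    """Apply process i's transition rule in place (i must be privileged)."""
    if i == 0:
        x[0] = (x[0] + 1) % k
    else:
        x[i] = x[i - 1]


def stabilize(x, k, rng, max_steps=10_000):
    """Let a random scheduler fire privileged processes until one token remains."""
    steps = 0
    while len(privileged(x)) != 1 and steps < max_steps:
        fire(x, rng.choice(privileged(x)), k)
        steps += 1
    return steps
```

At least one process is always privileged: if no Pi with i >= 1 differs from its predecessor, all values are equal and P0's guard holds. For instance, starting from `x = [3, 0, 2, 2, 1]` with `k = 5`, calling `stabilize(x, 5, random.Random(0))` leaves the ring with exactly one token, and further firings keep it that way.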


[Figures: ring animation. A privileged process is told "You have the token." and then "Fire: change your state." A later configuration is shown with the question "Legitimate or illegitimate?"]
Proof of closure

- If there is only a single token in the ring, when the machine that owns the token fires, it loses the token and gives it to its successor, and to no one else.
- This single token is handed over along the ring.
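For small rings the closure argument can also be confirmed by brute force. The sketch below enumerates every state, keeps those with exactly one token, fires the token holder, and checks that the only token afterwards is at its successor. The function names are hypothetical, and the check assumes K >= 2.

```python
from itertools import product


def tokens(x):
    """Indices of privileged processes in state x (a tuple X0..Xn)."""
    n = len(x) - 1
    held = [0] if x[0] == x[n] else []
    return held + [i for i in range(1, n + 1) if x[i] != x[i - 1]]


def successor(x, i, k):
    """State reached when privileged process i fires."""
    y = list(x)
    y[i] = (x[0] + 1) % k if i == 0 else x[i - 1]
    return tuple(y)


def closure_holds(num_processes, k):
    """Check every legitimate state of a small ring: firing keeps exactly one
    token, and that token moves to the successor of the process that fired."""
    for x in product(range(k), repeat=num_processes):
        held = tokens(x)
        if len(held) != 1:
            continue                      # not a legitimate state
        i = held[0]
        if tokens(successor(x, i, k)) != [(i + 1) % num_processes]:
            return False
    return True
```

This only checks closure on the sizes it is run with (for example 4 processes and K = 4); it complements the proof rather than replacing it.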

Proof of convergence

Lemma 1. P0 eventually receives the token.

- assume it does not have the token, i.e. X0 != Xn
- let j be the minimum value such that Xj != X0
- for all i < j: Xi = X0 → Xj != Xj-1 → Pj is privileged
- Pj will fire, thus increasing j if j < N, or making X0 = Xn if j = N

→ P0 will eventually receive the token.

Proof of convergence

- Initially all the process states are white.
- Whenever P0 fires, we colour its state.
- Whenever a state is copied from a coloured state, it gets the colour.
- Whenever a state is checked upon a coloured state, it gets the colour.

[Figures: ring animation of the colouring argument.]

Lemma 2. After at most N firings at P0, all the local states are coloured.

- Assume h is the number of times that P0 fires while Pn is white.
- For each firing, X0 has to be the same as Xn.
- So Xn has taken h distinct values.
- These values can only be copied from other nodes in the ring.
- At the time of the first firing, we can have at most N distinct values in the ring.
- Therefore, h is bounded by N.


- If P0 initially starts at state K-1, the first N firings of P0 have created states 0 to N-1 (K >= N).
- When P0 is in state N-1:
  - All the nodes are coloured.
  - Scanning from P0 to Pn, the states of the nodes are in nonincreasing order.
  - The next firing at P0 will happen when X0 = Xn = N-1.

→ At the time of the Nth firing of P0, all the states are N-1.


Proof of correctness

- Starting in an arbitrary state, we ended up in a legitimate state. => Convergence
- We also showed that, once in a legitimate state, we will remain in a legitimate state. => Closure
