Failure Detectors, Consensus and Self-Stabilization

Master 2 IFI, CSSR

Distributed Algorithms
Francesco Bongiovanni
INRIA Sophia Antipolis Research Center, OASIS Team
[email protected]
Course web site: deptinfo.unice.fr/~baude/AlgoDist
Nov. 2009

Chapter 7: Failure Detectors, Consensus, Self-Stabilization

1

Acknowledgement

The slides for this lecture are based on ideas and materials from the following sources:

- Introduction to Reliable Distributed Programming, Rachid Guerraoui and Luís Rodrigues, 2006, 300 p., ISBN 3-540-28845-7 (+ teaching material)
- ID2203 Distributed Systems Advanced Course by Prof. Seif Haridi from KTH – Royal Institute of Technology (Sweden)
- CS5410/514: Fault-tolerant Distributed Computer Systems, course by Prof. Ken Birman from Cornell University
- Distributed Systems: An Algorithmic Approach by Sukumar Ghosh, 2006, 424 p., ISBN 1-584-88564-5 (+ teaching material)
- Various research papers

2

Outline

1. Failure Detectors
   - Definition
   - Properties: completeness and accuracy
   - Classes of FDs
   - Two algorithms: PFD and EPFD
   - Leader Election vs Failure Detector
2. Consensus
   - Definition
   - Properties
   - Types of consensus: regular and uniform
   - Algorithm: hierarchical consensus
3. Self-stabilization
   - Principle
   - Example: Dijkstra's Token Ring

3

Failure detectors

4

System models

- Synchronous distributed system
  - each message is received within bounded time
  - each step in a process takes lb < time < ub
  - each local clock's drift has a known bound
- Asynchronous distributed system
  - no bounds on process execution
  - no bounds on message transmission delays
  - arbitrary clock drifts
  - the Internet is an asynchronous distributed system

5

Failure model

- First we must decide what we mean by failure
- Different types of failures: crashes, omissions, crashes and recoveries, arbitrary (Byzantine)
- Crash-stop (fail-stop): a process halts and does not execute any further operations
- Crash-recovery: a process halts, but then recovers (reboots) after a while
- Crash-stop failures can be detected in synchronous systems
- Next: detecting crash-stop failures in asynchronous systems

6

What's a Failure Detector?

(Figures, slides 7–9: processes Pi and Pj; Pj suffers a crash failure; Pi needs to know about Pj's failure.)

1. Ping-ack protocol

- Pi queries Pj with a ping message once every T time units
- Pj replies with an ack
- If Pj does not respond within T time units, Pi marks Pj as failed
- If Pj fails, then within T time units Pi will send it a ping message and will time out within another T time units. Detection time = 2T

10
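The slides describe the protocol informally; here is a minimal Python sketch of Pi's side. The names (`send_ping`, `T`, `on_failure`) are placeholders of mine, not identifiers from the course material, and the network is abstracted into a callback.

```python
import queue
import time

def ping_ack_monitor(send_ping, T, on_failure):
    """Pi queries Pj once every T time units; if no ack arrives within
    T time units, Pi marks Pj as failed (detection time <= 2T)."""
    acks = queue.Queue()
    while True:
        start = time.monotonic()
        send_ping(lambda: acks.put("ack"))   # Pj answers by invoking the callback
        try:
            acks.get(timeout=T)              # wait at most T for the ack
        except queue.Empty:
            on_failure()                     # no ack within T: mark Pj as failed
            return
        time.sleep(max(0.0, T - (time.monotonic() - start)))  # keep a period of T
```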

2. Heart-beating protocol

- Pj maintains a sequence number
- Pj sends Pi a heartbeat with an incremented sequence number every T' (= T) time units
- If Pi has not received a new heartbeat for the past T time units, Pi declares Pj as failed
- If Pj has sent x heartbeats until the time it fails, then Pi will time out within (x+1)*T time units in the worst case and will detect Pj as failed

11
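A matching sketch of Pi's side of the heart-beating protocol, again with invented names (`latest_seqno`, `on_failure`); delivering heartbeats and updating the sequence number is assumed to happen elsewhere, e.g. in a network thread.

```python
import time

def heartbeat_monitor(latest_seqno, T, on_failure):
    """Declare Pj failed if no new heartbeat arrived in the past T time units."""
    last_seen = latest_seqno()
    while True:
        time.sleep(T)                 # check once every T time units
        current = latest_seqno()
        if current == last_seen:      # no new heartbeat since the last check
            on_failure()              # declare Pj as failed
            return
        last_seen = current           # Pj is alive; remember the newest seqno
```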

Failure Detectors  Abstracting time  FD provide information (not necessary fully accurate) about which processes have crashed



Use failure detectors to encapsulate timing assumptions  Black box giving suspicions regarding node failures  Accuracy of suspicions depends on model strength

12

Failure Detectors  Basic properties 

Completeness  Every crashed process is suspected



Accuracy  No correct process is suspected

Both properties comes in two flavours 

Strong and Weak 13

Failure Detectors  Strong Completeness 

Every crashed process is eventually suspected by every correct process

 Weak Completeness 

Every crashed process is eventually suspected by at least one correct process

 Strong Accuracy 

No correct process is ever suspected

 Weak Accuracy 

There is at least one correct process that is never suspected

14

Failure Detectors  Classes of FDs Accuracy

Weak

Weak

Strong

Eventually Weak

W

Q

Eventual Weak ◊W

◊Q

Strong S

Perfect P

Eventual Strong ◊S

Eventually Perfect ◊P

Completeness

Strong

Synchronous Systems

Eventually Strong

Asynchronous Systems 15

Perfect Failure Detector (P)

16

Perfect Failure Detector (P)

17

Correctness of P

- PFD1 (strong completeness)
  - A crashed node doesn't send heartbeats
  - Eventually every node will notice the absence of its heartbeats
- PFD2 (strong accuracy)
  - Assuming local computation is negligible
  - Maximum time between two heartbeats: γ + δ time units
  - If a node is alive, all nodes will receive its heartbeat in time → no inaccuracy

18
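The algorithm itself appears only as a figure in the slides; the following is a hedged sketch of the usual "heartbeat and exclude on timeout" construction of P under the synchrony assumptions above, where γ is the heartbeat period and δ the maximum message delay. The identifiers (`peers`, `received`, `crash_callback`, ...) are placeholders, and collecting incoming heartbeats into `received` is assumed to happen in a separate network thread.

```python
import time

def perfect_fd(peers, send_heartbeat, received, gamma, delta, crash_callback):
    """Every node heartbeats all peers every gamma time units; any peer whose
    heartbeat did not arrive within gamma + delta is (permanently) detected."""
    detected = set()
    while True:
        for p in peers:
            send_heartbeat(p)                     # our own heartbeat to everyone
        time.sleep(gamma + delta)                 # max gap between two heartbeats
        for p in set(peers) - received - detected:
            detected.add(p)                       # never heard back in time: crashed
            crash_callback(p)                     # strong accuracy relies on the bounds
        received.clear()                          # start collecting the next round
```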

Eventually Perfect Failure Detector

19

Eventually Perfect Failure Detector

20

Correctness of EPFD

- EPFD1 (strong completeness)
  - Same as before
- EPFD2 (eventual strong accuracy)
  - Each time p is inaccurately suspected by a correct q, the timeout T is increased at q
  - Eventually the system becomes synchronous, and T becomes larger than the unknown bound δ (T > γ + δ)
  - q will then receive the heartbeat on time and never suspect p again

21
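A sketch of the "increasing timeout" idea behind ◊P, with invented names (`alive` is the set of peers that have answered since the last check, filled in elsewhere): every wrong suspicion makes the timeout grow, so once the system behaves synchronously no correct node is suspected any more.

```python
import time

def eventually_perfect_fd(peers, send_heartbeat_request, alive, delta, notify):
    """Suspect peers that miss the current timeout; when a suspicion turns out
    to be wrong, restore the peer and increase the timeout."""
    timeout = delta                                 # optimistic initial guess
    suspected = set()
    while True:
        for p in peers:
            send_heartbeat_request(p)
        time.sleep(timeout)
        for p in peers:
            if p not in alive and p not in suspected:
                suspected.add(p)                    # no reply in time: suspect p
                notify("suspect", p)
            elif p in alive and p in suspected:
                suspected.discard(p)                # we were wrong about p ...
                timeout += delta                    # ... so increase the timeout
                notify("restore", p)
        alive.clear()                               # collect replies for next round
```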

Leader Election

22

Leader Election vs Failure Detection

- Failure detection captures failure behaviour
  - detect failed nodes
- Leader election (LE) also captures failure behaviour
  - detect correct nodes (a single one, the same for all)
- Formally, leader election is an FD
  - it always suspects all nodes except one (the leader)
  - it ensures some properties regarding that node

23

Leader Election vs Failure Detection

- We'll define two leader election abstractions:
  - Leader election (LE), which "matches" P
  - Eventual leader election (Ω), which "matches" eventual P

24

Matching LE and P

- P's properties
  - P always eventually detects failures (strong completeness)
  - P never suspects correct nodes (strong accuracy)
- Completeness of LE
  - Informally: eventually ditch crashed leaders
  - Formally: eventually every correct node trusts some correct node
- Accuracy of LE
  - Informally: never ditch a correct leader
  - Formally: no two correct nodes trust different correct nodes
  - Is this really accuracy? Yes! Assume two nodes trust different correct nodes; one of them must eventually switch, i.e. leave a correct node

25

LE desirable properties

- LE always eventually detects failures
  - eventually every correct node trusts some correct node
- LE is always accurate
  - no two correct nodes trust different correct nodes
- But the above two permit the execution shown on the slide, where P1 "inaccurately" leaves a correct leader

26

LE desirable properties

- To avoid this "inaccuracy" we add
  - Local accuracy: if a node is elected leader by pi, all leaders previously elected by pi have crashed
- The previous execution is not allowed, as P1 is correct

27

Leader election - interface

28

Leader election - algorithm

29
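The slide shows the algorithm only as a figure; below is a hedged sketch of one standard way to build LE on top of the perfect failure detector (function and parameter names are mine): whenever P reports a crash, every node deterministically trusts the lowest-ranked node that has not been detected.

```python
def monarchical_leader_election(all_nodes, detected, trust):
    """Re-evaluate the leader whenever P detects a crash: trust the
    lowest-ranked node not yet detected. Because P never detects a correct
    node, a correct leader is never abandoned, and the deterministic rule
    gives agreement across nodes."""
    candidates = sorted(n for n in all_nodes if n not in detected)
    if candidates:
        trust(candidates[0])
```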

Matching Ω and EPFD  Eventual P weakens P by only providing eventual accuracy 

Weaken LE to Ω by only guaranteeing eventual agreement

LE Properties: eventual

LE1 (eventual completeness). Eventually every correct node trusts some correct node 

LE2 (agreement). No two correct nodes trust different correct nodes 

LE3 (local accuracy).If a node is elected leader by pi, all previously elected leaders by pi have crashed 

30

Eventual Leader election - interface

31

Eventual Leader election - algorithm: see the book...

32
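Since the slide defers to the book, here is a hedged sketch of the usual construction of Ω on top of ◊P (class and method names are mine): the trusted node is recomputed whenever the eventually perfect FD suspects or restores somebody.

```python
class EventualLeaderDetector:
    """Sketch of Omega built on an eventually perfect FD: after the FD
    stabilizes, every correct node ends up trusting the same correct node."""

    def __init__(self, all_nodes, trust):
        self.all_nodes = set(all_nodes)
        self.suspected = set()
        self.trust = trust              # callback announcing the current leader
        self.leader = None
        self._update()

    def on_suspect(self, p):            # called by the eventually perfect FD
        self.suspected.add(p)
        self._update()

    def on_restore(self, p):            # called when a suspicion is revoked
        self.suspected.discard(p)
        self._update()

    def _update(self):
        candidates = sorted(self.all_nodes - self.suspected)
        new_leader = candidates[0] if candidates else None
        if new_leader != self.leader:   # may change while the FD is unstable
            self.leader = new_leader
            self.trust(new_leader)
```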

Consensus (agreement)

33

Consensus  In the consensus problem, the processes propose values and have to agree on one among these values

B

A C

 Solving consensus is key to solving many problems in distributed computing (e.g., total order broadcast, atomic commit, terminating reliable broadcast)

34

Consensus – canonical application

- A set of servers implements a distributed database
  - a subset of the servers participates in a particular transaction
  - some of the servers may fail
  - the remaining servers must agree on whether to install the results of the transaction in the database or discard them

35

Consensus – canonical application

36

Consensus – canonical application

37

Consensus – basic properties

- Termination: every correct node eventually decides
- Agreement: no two correct processes decide differently
- Validity: any value decided is a value proposed
- Integrity: a node decides at most once

38

FLP impossibility result

- Consensus in asynchronous systems
  - Impossibility of consensus in the fail-silent model
  - FLP (Fischer, Lynch and Paterson, 1985): consensus is impossible in the fail-silent model with deterministic processes, even if only one process crashes
  - No way to satisfy agreement (safety) and termination (liveness) together

39

How to solve consensus in asynchronous systems with crashes?

- Either we relax our system model, that is, we assume partial synchrony
- Or we modify the specification
  - constrain the set of inputs
  - change the termination property: terminate with some probability
- Or...

40

How to solve consensus in asynchronous systems with crashes?

- Intuitively, consensus is impossible to solve because:
  1) the decision depends on one process
  2) we have no idea whether this process is alive (we have to wait for its message) or dead
- Thus we add to the asynchronous system what it needs in order to solve consensus: failure detectors

41

(regular) Consensus

42

(regular) Consensus  Sample execution

Question : does it satisfy consensus ? 43

Uniform consensus

44

Uniform consensus

Question: does it satisfy uniform consensus?

45

Hierarchical consensus  Use perfect fd (P) and best-effort bcast (BEB)  Each node stores its proposal in proposal  

Possible to adopt another proposal by changing proposal Store identity of last adopted proposer in lastprop

 Loop through rounds 1 to N 

In round i  node i is leader and  broadcasts proposal v, and decides proposal v

 other nodes  

adopt i’s proposal v and remember lastprop i or detect crash of i 46

Hierarchical consensus idea

- Basic idea of hierarchical consensus
  - There must be a first correct leader p
  - p decides its value v and broadcasts v
  - BEB ensures all correct nodes get v
  - Every correct node adopts v
  - Future rounds will only propose v

47

Problem with orphan messages...

Only adopt from node i if i > lastprop?

48

Invariant to avoid orphans

- The leader in round r might crash, but its message may arrive much later and affect some node already in a round > r
- Invariant
  - adopt only if the proposer p is ranked higher ("newer") than lastprop
  - otherwise p has already crashed and its orphan message should be ignored

49

Execution without failure...

50

Execution with failure...

Is it uniform?

51

Hierarchical consensus Impl. (1)

Last adopted proposal and Last adopted proposer id

52

Hierarchical consensus Impl. (2)

(Annotations on the algorithm figure:)
- set the node's initial proposal, unless it has already adopted another node's
- if I am the leader, trigger once per round
- trigger if I have a proposal
- permanently decide
- next round upon delivery or crash
- invariant: only adopt "newer" than what you have

53
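The full event-based implementation is in the book; as an illustration, here is a simplified round-by-round simulation in Python that abstracts P and BEB away and does not model late (orphan) messages, which is exactly what the lastprop invariant guards against in the real algorithm. All names (`proposals`, `crashed_in_round`, ...) are mine.

```python
def hierarchical_consensus(proposals, crashed_in_round):
    """Simulate hierarchical consensus with nodes ranked 1..N.
    proposals[p] is node p's initial proposal; crashed_in_round[p] is the
    round in which p crashes (absent if p is correct).
    Returns the value decided by each node that decides."""
    n = len(proposals)
    proposal = dict(proposals)               # current proposal of each node
    lastprop = {p: 0 for p in proposals}     # rank of the last adopted proposer
    decided = {}
    for r in range(1, n + 1):                # round r: node with rank r leads
        leader_alive = crashed_in_round.get(r) is None or crashed_in_round[r] > r
        if leader_alive:
            decided[r] = proposal[r]         # the leader decides its proposal ...
            for p in proposals:              # ... and beb-broadcasts it
                alive = crashed_in_round.get(p) is None or crashed_in_round[p] > r
                if p != r and alive and r > lastprop[p]:
                    proposal[p] = proposal[r]    # adopt only "newer" proposals
                    lastprop[p] = r
        # if the leader crashed, completeness of P lets everyone move on
    return decided

# Example: node 1 crashes in round 1, so node 2's proposal "b" wins.
print(hierarchical_consensus({1: "a", 2: "b", 3: "c"}, {1: 1}))
# -> {2: 'b', 3: 'b'}
```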

Correctness  Validity 

Always decide own proposal or adopted value

 Integrity  

Rounds increase monotonically A node only decide once in the round it is leader

 Termination 

Every correct node makes it to the round it is leader in  If some leader fails, completeness of P ensures progress  If leader correct, validity of BEB ensures delivery

54

Correctness (2)  Agreement 

No two correct nodes decide differently

 Take correct leader with minimum id i  

By termination it will decide v It will BEB v  Every correct node gets v and adopts it  No older proposals can override the adoption  All future proposals and decisions will be v

 How many failures can it tolerate? 

N-1

55

Self-stabilization

56

Recall  Main challenges in distributed systems:  

Failures Concurrency

 In presence of (permanent) failures, a robust algorithm guarantees  

Liveness properties are eventually achieved Safety properties are never violated

57

Self-Stabilization  Self-stabilization is a different approach to fault tolerance  

it considers transient (temporary) failures it is more optimistic  If bad thing happen (safety is violated), the system will recover within a finite time, and will behave nicely afterwards.

58

Definition

"A system is self-stabilizing when, regardless of its initial state, it is guaranteed to arrive at a legitimate state in a finite number of steps." [1]
Edsger W. Dijkstra

[1] Edsger W. Dijkstra, Self-stabilizing systems in spite of distributed control, Communications of the ACM, v. 17, n. 11, p. 643-644, Nov. 1974

59

Self-Stabilization  System S is self-stabilizing with respect to predicate P that identifies the legitimate states, if: 



Convergence  Starting from any arbitrary configuration, S is guaranteed to reach a configuration satisfying P, within a finite number of state transitions. Closure  P is closed under the execution of S. That is, once in a legitimate state, it will stay in a legitimate state.

60

Some advantages of self-stabilizing systems

- No need for consistent initialization
  - starting in any arbitrary state, the system will converge to a legitimate state
- Possibility of sequential composition without the need for termination detection

61

A self-stabilizing algorithm: Dijkstra's Token ring

62

Dijkstra's Token ring

- A single token circulates over the ring and grants a privilege to the process holding it
- N+1 processes P0, P1, ..., Pn connected in a ring
  - predecessor of Pi: pred(Pi) = P((i-1) mod (N+1))
  - successor of Pi: succ(Pi) = P((i+1) mod (N+1))

63

Token Ring stabilization

- Pi has a local variable Xi
- Xi can take values from 0 to K-1 (K >= N)
- Each process can read the value of its predecessor (shared-memory model)
- There is a scheduler which selects a process at each step, in a random but fair manner

64

Token Ring stabilization

Transition rule for P1 to Pn:
  if Xi != Xi-1 then Xi := Xi-1

65

Token Ring stabilization

Transition rule for P1 to Pn:
  if Xi != Xi-1 then Xi := Xi-1

Transition rule for P0:
  if X0 = Xn then X0 := (X0 + 1) mod K

66

Token Ring stabilization – animation (slides 67–72)

The same transition rules are applied step by step: the process holding the token ("You have the token") fires and changes its state ("Fire: change your state"), handing the privilege along the ring.
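Below is a small Python simulation of the K-state token ring under a random fair scheduler; it is only an illustration (the function name and the example configuration are mine). Starting from an arbitrary, possibly illegitimate configuration, the ring converges to states with exactly one privileged process.

```python
import random

def dijkstra_token_ring(x, K, steps=200):
    """Simulate Dijkstra's K-state token ring: x[i] is Pi's local value,
    P0 is privileged when X0 = Xn, Pi (i > 0) when Xi != Xi-1.
    A fair random scheduler fires one privileged process per step."""
    n = len(x) - 1                          # processes P0..Pn
    trace = [list(x)]
    for _ in range(steps):
        privileged = ([0] if x[0] == x[n] else []) + \
                     [i for i in range(1, n + 1) if x[i] != x[i - 1]]
        i = random.choice(privileged)       # at least one process is privileged
        if i == 0:
            x[0] = (x[0] + 1) % K           # P0's rule
        else:
            x[i] = x[i - 1]                 # Pi copies its predecessor's value
        trace.append(list(x))
    return trace

# Arbitrary initial state for 5 processes with K = 5 (K >= N = 4).
final = dijkstra_token_ring([3, 1, 4, 1, 0], K=5)[-1]
tokens = (1 if final[0] == final[-1] else 0) + \
         sum(1 for i in range(1, len(final)) if final[i] != final[i - 1])
print(final, "tokens:", tokens)             # after enough steps, tokens == 1
```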

Legitimate or illegitimate?

(Slides 73–78: example configurations of the ring; for each one, ask whether it is a legitimate state, i.e. whether exactly one process is privileged.)


Proof of closure

- If there is only a single token in the ring, then when the machine that owns the token fires, it loses the token and hands it to its successor, and to no one else.
- This single token is handed over along the ring.

82

Proof of convergence

Lemma 1. P0 eventually receives the token.
- Assume P0 does not have the token, i.e. X0 != Xn.
- Let j be the minimum index such that Xj != X0.
- For all i < j we have Xi = X0, so Xj != Xj-1, i.e. Pj is privileged.
- Pj will fire, thus increasing j if j < N, or making X0 = Xn if j = N.
- Therefore P0 will eventually receive the token.

83

Proof of convergence (colouring argument, slides 84–89)

- Initially all the process states are white.
- Whenever P0 fires, we colour its state.
- Whenever a state is copied from a coloured state, it gets the colour.
- Whenever a state is checked against a coloured state, it gets the colour.

Proof of convergence

Lemma 2. After at most N firings of P0, all the local states are coloured.
- Let h be the number of times that P0 fires while Pn is still white.
- For each such firing, X0 has to be equal to Xn.
- So Xn has taken h distinct values.
- These values can only have been copied from other nodes in the ring.
- At the time of the first firing, there are at most N distinct values in the ring.
- Therefore h is bounded by N.

90

Proof of convergence

- If P0 initially starts in state K-1, the first N firings of P0 create states 0 to N-1 (K >= N).
- When P0 is in state N-1:
  - all the nodes are coloured
  - scanning from P0 to Pn, the states of the nodes are in non-increasing order
  - the next firing at P0 will happen when X0 = Xn = N-1
- → At the time of the Nth firing of P0, all the states are N-1.

91

Proof of convergence

92

Proof of correctness

- Starting in an arbitrary state, we ended up in a legitimate state ⇒ Convergence
- We also showed that, once in a legitimate state, we remain in legitimate states ⇒ Closure

93
