A Distributed System for Detection, Isolation, and Recovery

1

DIRNET

arXiv:1504.07281v2 [cs.DC] 29 Apr 2015

The DIR Net A Distributed System for Detection, Isolation, and Recovery Vincenzo De Florio Katholieke Universiteit Leuven Departement Elektrotechniek Afdeling ESAT/ACCA

Technical Report ESAT/ACCA/1998/1

(pre-release TEX/0.6) 7 May, 1998 (revised on 27 April, 2015)

2

DIRNET

1.

The DIR net. This document describes the DIR net [1], a distributed environment which is part of the EFTOS fault tolerance framework [2]. The DIR net is a system consisting of two components, called DIR Manager (or, shortly, the manager) and DIR Backup Agent (shortly, the backup). One manager and a set of backups is located in the system to be ‘guarded’, one component per node. At this point the DIR net weaves a web which substantially does two things: • makes itself tolerant to a number of possible faults, and • gathers information pertaining the run of the user application. As soon as an error occurs within the DIR net , the system executes built-in recovery actions that allow itself to continue processing despite a number of hardware/software faults, possibly doing a graceful degradation of its features; when an error occurs in the user application, the DIR net, by means of custom- and user-defined detection tools, is informed of such events and runs one or more recovery strategies, both built-in and coded by the user using an ancillary compile-time tool, the rl translator [3]. Such tools translates the user-defined strategies into a binary “R-code”, i.e., a pseudo-code interpreted by a special component of the DIR net, the Recovery Interpreter, rint (in a sense, rint is a r-code virtual machine.) This document describes the generic component of the DIR net, a function which can behave either as manager or as backup. 2.

The first mailbox-id available to the DIR-net

#define DIR_MBOX_OFFSET 20 3.

Same as above, but for Alias-id’s

#define DIR_ALIAS_OFFSET 20

§4

DIRNET

THE DIR NET

3

4. On each node n ∈ [0, MAX_PROCS[ a DIR net component is to run. This component can be addresses internally by means of mailbox MBOX(n) and externally (from other nodes) by means of alias ALIAS(n). #define MBOX(i) DIR_MBOX_OFFSET #define IAT_MBOX DIR_MBOX_OFFSET + 1 #define RINT_MBOX DIR_MBOX_OFFSET + 2 #define DB_MBOX DIR_MBOX_OFFSET + 3 #define TOM_MBOX DIR_MBOX_OFFSET + 4 #define ALIAS(i) DIR_ALIAS_OFFSET #define IAT_ALIAS IAT_MBOX #define IA_FLAG_TIMEOUT 10 #define IA_FLAG_CYCLIC TOM_CYCLIC #define IA_FLAG_DEADLINE IMALIVE_CLEAR_TIMEOUT #define MIA_TIMEOUT 15 #define MIA_CYCLIC TOM_CYCLIC #define MIA_DEADLINE MIA_SEND_TIMEOUT #define TAIA_TIMEOUT 20 #define TAIA_CYCLIC TOM_CYCLIC #define TAIA_DEADLINE TAIA_RECV_TIMEOUT #define TEIF_TIMEOUT 30 #define TEIF_CYCLIC TOM_NON_CYCLIC #define TEIF_DEADLINE IMALIVE_SET_TIMEOUT #define IA_FLAG_TIMEOUT_B 50 #define IA_FLAG_CYCLIC_B TOM_CYCLIC #define IA_FLAG_DEADLINE_B IMALIVE_CLEAR_TIMEOUT #define MIA_TIMEOUT_B 55 #define MIA_CYCLIC_B TOM_CYCLIC #define MIA_DEADLINE_B MIA_RECV_TIMEOUT #define TAIA_TIMEOUT_B 60 #define TAIA_CYCLIC_B TOM_CYCLIC #define TAIA_DEADLINE_B TAIA_SEND_TIMEOUT #define TEIF_TIMEOUT_B 70 #define TEIF_CYCLIC_B TOM_NON_CYCLIC #define TEIF_DEADLINE_B IMALIVE_SET_TIMEOUT #define IAT_TIMEOUT 40 #define IAT_CYCLIC TOM_CYCLIC #define IAT_DEADLINE IMALIVE_SET_TIMEOUT #define INJECT_FAULT_TIMEOUT 6 #define INJECT_FAULT_DEADLINE 6000000 /∗ 6 secs ∗/ h Global Variables and # include’s 5 i h Generic component of the DIR net 6 i h Alarm function 52 i h I’m Alive Task 61 i h GetState and SetState 56 i h TEX routines simulated on EPX 59 i h DIR Print Message 60 i

4

THE DIR NET

5.

We need to include a number of header files, i.e., those pertaining the timeout manager, ...

h Global Variables and # include’s 5 i ≡ #include #include #include #include #include #include #include #include #include #include #include "tom.h" #include "timeouts.h" #include "dirdefs.h" #ifdef EPX1_2 typedef int IDF; #endif #include "dirtypes.h" #include "rcode.h" #include "trl.h" /∗ TEX-specific includes types ∗/ #ifdef TEX #include #include #include #else int TEXFirstActivation = 1; typedef int Alias t; typedef int IDF; #define MSG_OK 0 #define INFINITE 0 #endif #ifdef EPX1_2 #include int RemoteSendMessage (int, int, char ∗, int); int TEXSendMessage (int, char ∗, int); int TEXReceiveMessage (int, char ∗, int ∗, int); int GetRoot (void); #endif int TEXGetState ( status t ∗ ) ; int TEXSetState ( status t ∗ ) ; #ifndef EPX int Export (Alias t, IDF); STATUSTEXGetTaskStatus (IDF); #endif int TEXGetNumTasks (void); int TEXStopTask (void); int TEXRestartTask (void); int flag = 0; char ∗role2ascii (int); char ∗DIRPrintTimeout (int); char ∗DIRPrintMessage (int);

DIRNET

§5

§5

DIRNET

char ∗DIRPrintCode (int); int send timeout message ( TOM ∗ ) ; extern Semaphore t sem , ∗tom sem ; #ifndef _TOM__H_ typedef struct { int running ; int deadline ; int(∗alarm )(struct TOM ∗); unsigned char id , subid ; unsigned char cyclic ; unsigned char suspended ; } timeout t; typedef struct block t { struct block t ∗next ; timeout t timeout ; unsigned char used ; } block t; typedef struct TOM { block t ∗top ; int tom id ; block t ∗block stack ; int block sp ; int(∗default alarm )(struct TOM ∗); LinkCB t ∗ link [2]; unsigned int starting time ; } TOM; typedef struct { timeout t ∗timeout ; unsigned char code ; } tom message t; #endif #ifndef __DIR_TYPES_ typedef struct { char configuration ; int runlevel ; } DIR state t; typedef struct { char primary ; char role ; } status t; #endif #define MAXARG 5 typedef struct { int arg [MAXARG]; int subid ; int type ; char local ; } message t; int IA flag ; message t message ; extern DIR db t db ; int errors ;

THE DIR NET

5

6

THE DIR NET

DIRNET

§5

This code is used in section 4.

6. On every node a DIRNetGenericComponent ( ) function is run. This function first runs a protocol to understand its role and who is the manager, and to build or re-build a global database; after this phase it runs either as a manager or as a backup agent. Merging the codes for the backup and the manager into one running component has two major benefits: (1) you need to foresee n replicas of this generic component, without specifying which role each replica has to play, and (2) a lot of code is shared between the two components. h Generic component of the DIR net 6 i ≡ void DIRNetGenericComponent (void) { h Variables local to the Manager and the Backup InitSem (tom sem , 1); h DIR net initialisation 8 i h Spawn the I’m Alive Task 69 i if (role ≡ DIR_MANAGER) { h DIR net manager 9 i } else { h DIR net backup agent 33 i } TEXStopTask ( ); }

7i


7. These variables are shared between the Manager and the Backup Agent. h Variables local to the Manager and the Backup 7 i ≡ status t mystate ; STATUSstatus ; int role , managerid ; TOM ∗tom ; timeout t mia [MAX_PROCS], taia [MAX_PROCS], teif , ia , inject ; char suspicion period [MAX_PROCS]; int n, sender ; This code is used in section 6.

§8

DIRNET

THE DIR NET

7

8. This function initialises the DIR net. It checks whether this node and this components are both “new” (a node is new if it has never been rebooted; a component is “new” if it is the original component of this node, i.e., a primary). If so, it gets the role from the state, as stored in the database, otherwise, it asks the neighbouring nodes who is the manager. This section also takes care of building or receiving the system database: if the node is new, then a local database is built and the global database is created by a broadcast session; otherwise, the global database is requested from a remote node. h DIR net initialisation 8 i ≡ TEXGetState (&mystate ); if (TEXFirstActivation ∧ mystate .primary ) { /∗ only two roles are possible – backup or manager ∗/ h Read your role from the RL script 57 i; if (db .role ≡ DIR_BACKUP) role = DIR_BACKUP; else role = DIR_MANAGER; LogError (EC_ERROR, "Generic", "role of %d is %s", GetRoot ( ), role2ascii (role )); fork ( ); /∗ export your main mailbox ∗/ Export (ALIAS(GetRoot ( )), MBOX(GetRoot ( ))); /∗ export your I’m Alive Task’s mailbox ∗/ Export ((Alias t) IAT_ALIAS, (IDF) IAT_MBOX); /∗ export the database mailbox ∗/ Export ((Alias t) DB_MBOX, (IDF) DB_MBOX); h Check who is the manager according to the RL script 58 i; } else { h send WITM to all 54 i h wait for NMI messages to come 55 i if (message .arg [0] ≡ GetRoot ( )) role = DIR_MANAGER; else role = DIR_BACKUP; managerid = message .arg [0]; } if (TEXFirstActivation ) { int i; n = TEXGetNumTasks ( ); h store number of tasks (n); 44 i for (i = 0; i < n; i ++ ) { status = TEXGetTaskStatus ((IDF) i); h store in local database(i, status) 45 i } h broadcast local database and receive the others’ databases 46 i h build a global database 49 i h mark as ‘reboot-resistant’ the whole database via DataReset ( ) 50 i } else { h request a copy of the global database 51 i } LogError (EC_MESS, "Generic", "global database OK"); This code is used in section 6.

8

THE DIR NET

9.

The code for the manager of the DIR net.

DIRNET

§9

h DIR net manager 9 i ≡ { managerid = GetRoot ( ); LogError (EC_ERROR, "Manager", "Manager starts..."); /∗ the alarm of these timeouts simply sends a message of id , subid = subid to the manager ∗/ h insert timeouts (IA-flag-timeout, MIA-timeouts, TAIA-timeouts) 10 i h clear the suspicion period [ ]’s 14 i h clear IA-flag 16 i LogError (EC_ERROR, "Manager", "Manager activates IAT..."); h activate IAT 17 i LogError (EC_ERROR, "Manager", "Manager loop starts..."); h manager loop (waiting for incoming messages) 18 i } /∗ end manager ∗/ This code is used in section 6.

10.

The manager initialises a set of timeout objects and inserts them into the timeout list [4] [5].

h insert timeouts (IA-flag-timeout, MIA-timeouts, TAIA-timeouts) { tom = tom init (send timeout message ); h declare and insert MIA timeouts 11 i h declare and insert the IA-flag timeout 12 i h declare and insert TAIA timeouts 13 i }

10 i

≡


11. At most every MIA_DEADLINE ticks a “Manager Is Alive” (MIA) message needs to start towards a backup. h declare and insert MIA timeouts { int i;

11 i

≡

for (i = 0; i < MAX_PROCS; i ++ ) { if (i ≡ GetRoot ( )) continue; tom declare (mia + i, MIA_CYCLIC, TOM_SET_ENABLE, MIA_TIMEOUT, i, MIA_DEADLINE); tom insert (tom , mia + i); } #ifdef INJECT tom declare (&inject , TOM_NON_CYCLIC, TOM_SET_ENABLE, INJECT_FAULT_TIMEOUT, i, INJECT_FAULT_DEADLINE); tom insert (tom , &inject ); #endif } This code is used in section 10.

§12

DIRNET

THE DIR NET

9

12. Every IMALIVE_CLEAR_TIMEOUT ticks at most the IA-flag must be cleared, or the IAT will consider this component as crashed. This is accomplished by means of the following cyclic timeout: h declare and insert the IA-flag timeout 12 i ≡ { tom declare (&ia , IA_FLAG_CYCLIC, TOM_SET_ENABLE, IA_FLAG_TIMEOUT, 1, IMALIVE_CLEAR_TIMEOUT); tom insert (tom , &ia ); } This code is used in sections 10 and 34.

13. At most every TAIA_DEADLINE ticks a “This Agent Is Alive” (TAIA) message coming from a backup needs to be received by the Manager. h declare and insert TAIA timeouts { int i;

13 i

≡

for (i = 0; i < MAX_PROCS; i ++ ) { if (i ≡ GetRoot ( )) continue; tom declare (taia + i, TAIA_CYCLIC, TOM_SET_ENABLE, TAIA_TIMEOUT, i, TAIA_DEADLINE); tom insert (tom , taia + i); } } This code is used in section 10.

14.

Suspicion periods are globally set to off

h clear the suspicion period [ ]’s 14 i ≡ { int i; for (i = 0; i < MAX_PROCS; i ++ ) { suspicion period [i] = 0; } } This code is used in section 9.

15. This is the same as above, but for the backup agent. It only manages one suspicion period, viz. the one of the manager; let us choose 0 as the manager’s suspicion period. h clear suspicion period 15 i ≡ ∗suspicion period = 0; This code is used in section 33.

16. IA flag is shared between this component and its I’m Alive Task. At initialisation time and at each new message of type IA_FLAG_TIMEOUT this flag has to be cleared. h clear IA-flag 16 i ≡ IA flag = 0; This code is used in sections 9, 18, 33, and 37.

10

THE DIR NET

DIRNET

§17

17. This message wakes up the I’m Alive Task, which will start checking periodically whether the IA-flag has been cleared. (Note: each component knows that there are nProcs-1 fellows in the dirnet. It is assumed that fellow i gets mail on mailbox i, so the first nProcs -1 integers are reserved for this reason. Furthermore, each node shall export its mailbox via Export ((Alias t) GetRoot ( ), (IDF) GetRoot ( )).) Mailbox whose id and Alias is nProcs is assumed to be the I’m Alive Task, mailbox nProcs +1 is the recovery thread. h activate IAT 17 i ≡ message .type = ROUSE; message .subid = GetRoot ( ); TEXSendMessage (IAT_MBOX, (char ∗) &message , sizeof (message )); This code is used in sections 9, 18, 33, and 37.

§18

DIRNET

THE DIR NET

11

18. This loop is the real core of the manager. It has to deal with a number of messages coming from the timeout manager, its fellow backups, the recovery thread, the remote I’m Alive Tasks. The core of the fault-tolerant strategy of the DIR net is in here. h manager loop (waiting for incoming messages) 18 i ≡ while (1) { h wait for an incoming message 53 i tom dump (tom ); switch (message .type ) { case INJECT_FAULT_TIMEOUT: LogError (EC_ERROR, "Manager loop", "Fault injection"); /∗ the time-out manager is detached ∗/ tom close (tom ); break; case IA_FLAG_TIMEOUT: LogError (EC_ERROR, "Manager loop", "IA_FLAG_TIMEOUT message −> clear IA−flag."); /∗ time to clear the IA-flag! ∗/ h clear IA-flag 16 i break; case MIA_TIMEOUT: LogError (EC_ERROR, "Manager loop", "MIA_TIMEOUT message (time to send a MIA to Backup %d).", message .subid ); /∗ time to send a MIA to a backup ∗/ h send MIA to backup subid 19 i tom dump (tom + message .subid ); tom renew (tom , mia + message .subid ); break; case DB: LogError (EC_ERROR, "Manager loop", "DB message."); /∗ a message which modifies the db ∗/ if (message .subid ≡ GetRoot ( )) /∗ the message is local ∗/ { h renew MIA timeout on all subid ’s 20 i h update your copy of the db 22 i h broadcast modifications to db 21 i break; } else /∗ if it’s a remote message, it’s also a piggybacked TAIA ∗/ { h update your copy of the db 22 i /∗ don’t break ∗/ } /∗ don’t break ∗/ case TAIA: LogError (EC_ERROR, "Manager loop", "TAIA message from node %d.", message .subid ); /∗ if a TAIA comes in or a remote DB message comes in... ∗/ if (¬tom ispresent (tom , taia + message .subid )) { h insert TAIA-timeout, subid=subid 23 i h broadcast NIUA 24 i /∗ node is up again! ∗/ } /∗ if you get a TAIA while expecting a TEIF, then no problem, simply get out of the suspicion period ∗/ if (suspicion period [message .subid ] ≡ TRUE) { LogError (EC_ERROR, "Manager", "got a TAIA while expecting a TEIF => get out of the suspicion period"); suspicion period [message .subid ] = FALSE; } else {

12

THE DIR NET

DIRNET

§18

LogError (EC_ERROR, "Manager", "got a due TAIA in time −− renewing TAIA timeout %d.", message .subid ); h renew TAIA-timeout, subid 25 i } break; case TAIA_TIMEOUT: LogError (EC_ERROR, "Manager loop", "TAIA_TIMEOUT message: no heartbeat from Backup %d −− entering suspicion period", message .subid ); /∗ no heartbeat from a remote component. . . enter a suspicion period then. ∗/ suspicion period [message .subid ] = TRUE; h insert TEIF timeout 26 i h delete timeout (TAIA-timeout, subid ) 27 i break; case TEIF: LogError (EC_ERROR, "Manager loop", "TEIF message: IAT@node%d sent an alarm and went to sleep.", message .subid ); /∗ a TEIF message has been sent from a IAT: as its last action, the IAT went to sleep ∗/ if (suspicion period [message .subid ] ≡ TRUE) { /∗ ∗/ h delete timeout (TEIF-timeout, subid ) 28 i suspicion period [message .subid ] = FALSE; /∗ agent recovery will spawn a clone of the backup subid . If no backup clones are available, the entire node will be rebooted. ∗/ h Agent-Recovery(subid ) 29 i } else { if (message .subid ≡ GetRoot ( )) { h clear IA-flag 16 i h activate IAT 17 i } else { int to = message .subid ; message .type = ENIA; /∗ i.e., “ENable IAt” ∗/ message .subid = GetRoot ( ); RemoteSendMessage (to , ALIAS(to ), (char ∗) &message , sizeof (message )); } } break; case TEIF_TIMEOUT: LogError (EC_ERROR, "Manager loop", "TEIF_TIMEOUT message −− the Manager con\ cludes that the suspected node (%d) has crashed.", message .subid ); if (suspicion period [message .subid ] ≡ TRUE) { h delete timeout (TAIA-timeout, subid ) 27 i /∗ the entire node will be rebooted. ∗/ suspicion period [message .subid ] = FALSE; h Node-Recovery(subid ) 31 i } break; case ENIA: LogError (EC_ERROR, "Manager loop", "ENIA message."); h clear IA-flag 16 i /∗ if an IAT gets an “activate” message while it’s active, that message is ignored ∗/ h activate IAT 17 i break;

§18

DIRNET

THE DIR NET

13

case WITM: LogError (EC_ERROR, "Manager loop", "WITM message."); sender = message .subid ; message .subid = message .arg [0] = GetRoot ( ); message .type = NMI; RemoteSendMessage (sender , ALIAS(sender ), (char ∗) &message , sizeof (message )); break; case NIUA: LogError (EC_ERROR, "Backup", "NIUA message −− node %d is up again −− watching restarts...", message .subid ); /∗ a Node Is Up Again: restart watchin’ it ∗/ if (¬tom ispresent (tom , taia + message .subid )) tom insert (tom , taia + message .subid ); break; case REQUEST_DB: LogError (EC_ERROR, "Backup", "REQUEST_DB message."); /∗ a node is requesting a full copy of the database ∗/ RemoteSendMessage (message .subid , ALIAS(message .subid ), (char ∗) &db , sizeof (db )); break; default: LogError (EC_ERROR, "Backup", "Other messages."); h deal with these other messages 32 i } /∗ end switch (message.type) ∗/ h clear IA-flag 16 i h renew IA-flag-timeout 43 i } /∗ end manager loop ∗/ This code is used in section 9.

19.

A “Manager Is Alive” message needs to be sent to component message .subid :

h send MIA to backup subid 19 i ≡ message .type = MIA; /∗ Note: MIA.arg[0] == managerid!! ∗/ message .arg [0] = GetRoot ( ); RemoteSendMessage (message .subid , ALIAS(message .subid ), (char ∗) &message , sizeof (message )); This code is used in section 18.

20. A broadcast is about to take place—this implies that a suite of implicit MIA’s will be sent in piggybacking. As a consequence, all MIA_SEND_TIMEOUT’s needs to be renewed. h renew MIA timeout on all subid ’s { int i;

20 i

≡

for (i = 0; i < GetRoot ( ); i ++ ) { tom renew (tom , mia + i); } for (i ++ ; i < MAX_PROCS; i ++ ) { tom renew (tom , mia + i); } } This code is used in section 18.

14

THE DIR NET

DIRNET

§21

21. A message modified the database, and that message was local, i.e., generated on this node. To keep all instances of the database up to date, we need to propagate this message to all the other components: h broadcast modifications to db 21 i ≡ { int i; for (i = 0; i < GetRoot ( ); i ++ ) { RemoteSendMessage (i, ALIAS(i), (char ∗) &message , sizeof (message )); } for (i ++ ; i < MAX_PROCS; i ++ ) { RemoteSendMessage (i, ALIAS(i), (char ∗) &message , sizeof (message )); } } This code is used in sections 18 and 37.

22. The just arrived message is of type DB, i.e., it concerns the database. Our copy of the database needs then to be updated. h update your copy of the db 22 i ≡ switch (message .arg [0]) { case DB_NEW_STATUS: db .node [message .subid ].status = message .arg [1]; break; case DB_NEW_ROLE: db .node [message .subid ].role = message .arg [1]; break; case DB_INC_REBOOT: db .node [message .subid ].reboot nr ++ ; break; case DB_NEW_TASK_STATUS: db .node [message .subid ].task [message .arg [1]].status = message .arg [2]; break; case DB_NEW_TASK_ERROR: db .node [message .subid ].task [message .arg [1]].status = message .arg [2]; break; } This code is used in sections 18 and 37.

23. A TAIA_SEND_TIMEOUT needs to be inserted again in the timeout list. h insert TAIA-timeout, subid=subid 23 i ≡ tom insert (tom , taia + message .subid ); This code is used in section 18.

24.

A node is up again. Broadcast the news (of course, skipping yourself and the node back to life. . . )

h broadcast NIUA 24 i ≡ { int i, n, this; /∗ note: message.subid is the id of the Node which Is Up Again ∗/ for (message .type = NIUA, n = MAX_PROCS, i = 0, this = GetRoot ( ); i < n; i ++ ) if (i 6= this ∧ i 6= message .subid ) RemoteSendMessage (i, ALIAS(i), (char ∗) &message , sizeof (message )); } This code is used in section 18.

25.

A node is up again, so I need to start again watching its component.

h renew TAIA-timeout, subid 25 i ≡ tom renew (tom , taia + message .subid ); This code is used in section 18.

§26

DIRNET

THE DIR NET

15

26. We just entered a suspicion period—we need to discriminate the case ‘an agent is down’ from the case ‘a complete node is down’. To do so, we start waiting for at most IMALIVE_SET_TIMEOUT for some signs of life coming from the I’m Alive Task on node message .subid . This is managed bu simply inserting an acyclic timeout in the timeout list as follows: h insert TEIF timeout 26 i ≡ tom declare (&teif , TEIF_CYCLIC, TOM_SET_ENABLE, TEIF_TIMEOUT, message .subid , IMALIVE_SET_TIMEOUT); tom insert (tom , &teif ); This code is used in sections 18 and 37.

27. If we are suspecting an agent, there’s no need to expect something from it (dosn’t it sound like philosophy? ;-) so we need to suspend the cyclic TAIA timeout; we do that deleting it temporarily from the list. h delete timeout (TAIA-timeout, subid ) 27 i ≡ tom delete (tom , taia + message .subid ); This code is used in section 18.

28. Luckily, only the remote component is crashed, not the entire node where it was running onto. This seems to be true because a TEIF message has been sent from the I’m Alive Task of that node. The TEIF timeout is consequently deleted. h delete timeout (TEIF-timeout, subid ) tom delete (tom , &teif );

28 i

≡


29. Agent recovery will spawn a clone of the backup subid . If no backup clones are available, the entire node will be rebooted. Now the problem is—who can do that? The only component being alive on that node is. . . the I’m Alive Task, so the only way to accomplish this task should be by sending a “spawn new component” (SPAN) message to that I’m Alive Task: h Agent-Recovery(subid ) 29 i ≡ message .type = SPAN; RemoteSendMessage (message .subid , IAT_ALIAS, (char ∗) &message , sizeof (message )); This code is used in section 18.

30. Same as above, but for the manager. h Manager-Recovery(managerid ) 30 i ≡ message .type = SPAN; RemoteSendMessage (managerid , IAT_ALIAS, (char ∗) &message , sizeof (message )); This code is used in section 37.

31. This section covers the case no sign of life seems to come from a suspected node. The node is therefore rebooted. Open problem: if TEX has a local scope, who can do this? Probably this will require TEX to send a message to its host via a (burst of) UDP writes; then the host will take care of rebooting the node in question or to do something else, e.g., triggering an alarm for the operator. Anyway, for us this is simply a TEXReboot ( ). h Node-Recovery(subid )

31 i

≡

This code is used in sections 18 and 37.

/∗ TEXReboot(message.subid); TEXReset(message.subid); ∗/

16

THE DIR NET

DIRNET

§32

32. Other messages are foreseen—their management will be put in here. Note: for the moment, this section is shared amongst manager and backups. h deal with these other messages 32 i ≡ { switch (message .type ) { /∗ ...to be added... ∗/ } /∗ end switch ∗/ } This code is used in sections 18 and 37.

33. A backup agent can be considered as a manager of a system collapsing to its node. It only takes localscope decisions and actions. Nevertheless, quite a lot of its code is inherited almost without modification from the Manager, and its structure is basically the same of this latter. h DIR net backup agent 33 i ≡ { LogError (EC_ERROR, "Backup", "Backup starts..."); /∗ the alarm of these timeouts simply sends a message of id ¡timeout-type¿, subid = subid to the backup agent ∗/ h insert four timeouts (IA-flag, MIA, TAIA, and TEIF) 34 i /∗ TimeWaitHigh(TimeNowHigh() + 300000); ∗/ /∗ suspicion period is set to off ∗/ h clear suspicion period 15 i h clear IA-flag 16 i LogError (EC_ERROR, "Backup", "activating IAT..."); LogError (EC_ERROR, "Backup", "Backup activates IAT..."); h activate IAT 17 i LogError (EC_ERROR, "Backup", "Backup loop starts..."); h backup loop (waiting for incoming messages) 37 i } /∗ end backup ∗/ This code is used in section 6.

34. The backup initialises a set of timeout objects and inserts them into the timeout list, more or less the way the manager does; only, there’s just one MIA timeout comin’ in and one TAIA coming out. h insert four timeouts (IA-flag, MIA, TAIA, and TEIF) 34 i ≡ { tom = tom init (send timeout message ); h declare and insert the one MIA timeout 35 i h declare and insert the IA-flag timeout 12 i h declare and insert the one TAIA timeout 36 i } This code is used in section 33.

35. At most every MIA_DEADLINE_B ticks a “Manager Is Alive” (MIA) message needs to be received by a backup. Note how, regardless the actual value of managerid , the entry being filled in is always entry 0. h declare and insert the one MIA timeout 35 i ≡ { LogError (EC_ERROR, "backup", "managerid == %d", managerid ); tom declare (mia + managerid , MIA_CYCLIC_B, TOM_SET_ENABLE, MIA_TIMEOUT_B, managerid , MIA_DEADLINE_B); tom insert (tom , mia + managerid ); tom dump (tom ); } This code is used in section 34.

§36

DIRNET

THE DIR NET

17

36. At most every TAIA_DEADLINE_B ticks a “This Agent Is Alive” (TAIA) message needs to be sent from this backup to the Manager. Note how, regardless the actual value of managerid , the entry being filled in is always entry 0. h declare and insert the one TAIA timeout 36 i ≡ { tom declare (taia , TAIA_CYCLIC_B, TOM_SET_ENABLE, TAIA_TIMEOUT_B, 0, TAIA_DEADLINE_B); tom insert (tom , taia ); } This code is used in section 34.

18

THE DIR NET

DIRNET

§37

37. This loop is the real core of the backup agent, the way the manager loop was for the manager. It has do deal with a number of messages coming from the timeout manager, the manager, remote I’m Alive Tasks. As we said in the corresponding section of the manager, this is the core of the fault-tolerant strategy of the DIR net. h backup loop (waiting for incoming messages) 37 i ≡ while (1) { h wait for an incoming message 53 i LogError (EC_ERROR, "Backup", "message received %d [type == %s] from node %d", message .type , DIRPrintCode (message .type ), message .local ? GetRoot ( ) : message .subid ); switch (message .type ) { case IA_FLAG_TIMEOUT: /∗ time to clear the IA-flag! ∗/ h clear IA-flag 16 i LogError (EC_ERROR, "Backup loop", "IA_FLAG_TIMEOUT −> IA−flag cleared"); break; case TAIA_TIMEOUT_B: /∗ time to send a TAIA to the manager ∗/ LogError (EC_ERROR, "Backup loop", "TAIA_TIMEOUT_B −> send TAIA to manager"); h send TAIA to the manager 38 i break; case DB: /∗ a message which modifies the db ∗/ LogError (EC_ERROR, "Backup loop", "DB message."); if (message .subid ≡ GetRoot ( )) /∗ the message is local ∗/ { h update your copy of the db 22 i h broadcast modifications to db 21 i /∗ this is sent also to the manager, therefore it is a TAIA in piggybacking. ∗/ h renew TAIA timeout 39 i } else { h update your copy of the db 22 i if (message .subid ≡ managerid ) tom renew (tom , mia + managerid ); } break; case MIA: LogError (EC_ERROR, "Backup loop", "MIA message: managerid==%d, arg[0]==%d, %s.", managerid , message .arg [0], (managerid ≡ message .arg [0]) ? "equal" : "different"); /∗ if a MIA comes in... ∗/ if (¬tom ispresent (tom , mia + message .arg [0])) { LogError (EC_ERROR, "Backup loop", "MIA timeout %d is not present!", message .arg [0]); tom dump (tom ); /∗ a new manager has been chosen ∗/ tom delete (tom , mia + managerid ); LogError (EC_ERROR, "Backup loop", "MIA timeout %d deleted!", managerid ); managerid = message .arg [0]; LogError (EC_ERROR, "Backup loop", "New manager is %d", managerid ); } /∗ if you get a MIA while expecting a TEIF, then no tom renew (tom , mia + managerid ); problem, simply get out of the suspicion period ∗/ if (∗suspicion period ≡ TRUE) { ∗suspicion period = FALSE; LogError (EC_ERROR, "Backup", "got a MIA while expecting a TEIF −− got out of the suspicion period!"); } break;

§37

DIRNET

THE DIR NET

19

case MIA_TIMEOUT_B: LogError (EC_ERROR, "Backup loop", "MIA_TIMEOUT_B message: no heartbeat fro\ m the manager −− suspicion period entered"); /∗ no heartbeat from the manager... enter a suspicion period then ∗/ ∗suspicion period = TRUE; message .subid = managerid ; LogError (EC_ERROR, "Backup loop", "About to insert a TEIF"); h insert TEIF timeout 26 i LogError (EC_ERROR, "Backup loop", "TEIF inserted"); tom delete (tom , mia + managerid ); break; case TEIF: LogError (EC_ERROR, "Backup loop", "TEIF message −− received a message from IAT@node%d", message .subid ); /∗ a TEIF message has been sent from a IAT: as its last action, the IAT went to sleep ∗/ if (∗suspicion period ≡ TRUE ∧ message .subid ≡ managerid ) { /∗ delete timeout (MIA-timeout, subid ); tom delete(tom, mia + managerid); ∗/ tom delete (tom , &teif ); ∗suspicion period = FALSE; LogError (EC_ERROR, "Backup loop", "Manager needs to be recovered!"); /∗ Manager recovery will spawn a clone of the backup subid . If no more manager clones are available, the entire node will be rebooted. ∗/ h Manager-Recovery(managerid ) 30 i } else { LogError (EC_ERROR, "Backup loop", "IAT@node%d needs to be awaken −− sent ENIA to component@node%d", message .subid , message .subid ); /∗ send ENIA (i.e., “ENable IAt”) to subid ∗/ message .type = ENIA; RemoteSendMessage (message .subid , ALIAS(message .subid ), (char ∗) &message , sizeof (message )); tom renew (tom , mia + managerid ); } break; case TEIF_TIMEOUT_B: LogError (EC_ERROR, "Backup loop", "TEIF_TIMEOUT_B message."); if (∗suspicion period ≡ TRUE) { LogError (EC_ERROR, "Backup loop", "the node is suspected => has crashed."); h delete MIA-timeout coming from subid , if any exists 40 i /∗ the entire node will be rebooted. ∗/ ∗suspicion period = FALSE; LogError (EC_ERROR, "Backup loop", "node recovery!"); h Node-Recovery(subid ) 31 i LogError (EC_ERROR, "Backup loop", "‘a node is down’ message sent around"); h send ANID to all except managerid 41 i LogError (EC_ERROR, "Backup loop", "choice of new manager"); h choose next manager 42 i /∗ if this backup is to be the new manager... ∗/ if (managerid ≡ GetRoot ( )) { mystate .role = DIR_MANAGER; TEXSetState (&mystate ); TEXRestartTask ( ); } } else {

20

THE DIR NET

DIRNET

LogError (EC_ERROR, "Backup loop", "IAT of the manager needs to be awaken −− sent ENIA to Manager@node%d", message .subid ); /∗ send ENIA (i.e., “ENable IAt”) to ¡managerid¿ ∗/ message .type = ENIA; RemoteSendMessage (managerid , ALIAS(managerid ), (char ∗) &message , sizeof (message )); /∗ renew MIA-timeout ∗/ tom renew (tom , mia + managerid ); } break; case WITM: LogError (EC_ERROR, "Backup loop", "WITM message."); sender = message .subid ; message .arg [0] = managerid ; message .subid = GetRoot ( ); message .type = NMI; RemoteSendMessage (sender , ALIAS(sender ), (char ∗) &message , sizeof (message )); break; case ENIA: LogError (EC_ERROR, "Backup loop", "ENIA message."); h clear IA-flag 16 i /∗ if an IAT gets an “activate” message while it’s active, the message is ignored ∗/ h activate IAT 17 i break; case NMI: LogError (EC_ERROR, "Backup loop", "NMI message."); if (message .arg [0] ≡ managerid ) break; /∗ something is going wrong – someone has a different managerid ∗/ LogError (EC_ERROR, "Backup loop", "node %d thinks the manager is %d, while I think it is %d.", message .subid , message .arg [0], managerid ); break; case REQUEST_DB: LogError (EC_ERROR, "Backup loop", "REQUEST_DB message."); /∗ a node is requesting a full copy of the database ∗/ RemoteSendMessage (message .subid , ALIAS(message .subid ), (char ∗) &db , sizeof (db )); break; default: LogError (EC_ERROR, "Backup loop", "other messages."); h deal with these other messages 32 i } /∗ end switch (message.type) ∗/ h clear IA-flag 16 i h renew IA-flag-timeout 43 i } /∗ end backup loop ∗/ This code is used in section 33.

38. A “This Agent Is Alive” message needs to be sent to the manager. h send TAIA to the manager 38 i ≡ message .type = TAIA; message .subid = GetRoot ( ); LogError (EC_ERROR, "Backup", "sending TAIA to manager..."); RemoteSendMessage (managerid , ALIAS(managerid ), (char ∗) &message , sizeof (message )); LogError (EC_ERROR, "Backup", "...TAIA sent to manager."); This code is used in section 37.

§37

§39

DIRNET

THE DIR NET

21

39. The local database has been modified—this calls for a broadcast of the modifications. One of those who will receive these modifications is the manager. Such information holds implicitly (in piggybacking) a TAIA, so we can renew that timeout. h renew TAIA timeout 39 i ≡ tom renew (tom , taia ); This code is used in section 37.

40. A TEIF_TIMEOUT_B has come from node message .subid . Here we build a timeout object and try to delete an entry pertaining node message .subid and holding a MIA_TIMEOUT_B. h delete MIA-timeout coming from subid , if any exists { timeout t t;

40 i

≡

tom declare (&t, MIA_CYCLIC_B, TOM_SET_ENABLE, MIA_TIMEOUT_B, message .subid , MIA_DEADLINE_B); tom delete (tom , &t); } This code is used in section 37.

41.

“A Node Is Down” (ANID) is sent to everyone but the down node.

h send ANID to all except managerid 41 i ≡ { int i, n, this; for (i = 0, message .type = ANID, n = MAX_PROCS, this = GetRoot ( ); i < n; i ++ ) if (i 6= managerid ∧ i 6= this) RemoteSendMessage (i, ALIAS(i), (char ∗) &message , sizeof (message )); } This code is used in section 37.

42. This is the na¨ivest strategy for choosing a new manager. Some more sophisticated (and safer) protocol will be used in the future. h choose next manager 42 i ≡ managerid ++ ; if (managerid ≥ MAX_PROCS) managerid = 0; This code is used in section 37.

43.

Renew the IA-flag timeout.

h renew IA-flag-timeout 43 i ≡ tom renew (tom , &ia ); This code is used in sections 18 and 37.

44.

The number of tasks on this node is stored in the database.

h store number of tasks (n); 44 i ≡ { db .node [GetRoot ( )].task nr = n; } This code is used in section 8.

22

THE DIR NET

45.

Information pertaining task i is stored in the database.

DIRNET

§45

h store in local database(i, status) 45 i ≡ { int this node = GetRoot ( ); /∗ is thread i running or waiting? Is it isolated / faulty / ok... ∗/ /∗ initialize error nb to zero ∗/ db .node [this node ].task [i].status = status ; db .node [this node ].task [i].error nr = 0; } This code is used in section 8.

46. This section runs the algorithm used in the Voting Farm to manage the problem of global broadcasts (each component has to broadcast some data to all the others and has to deliver data sent by all the others). See also “the Algorithm of Pipelined Broadcast”. h broadcast local database and receive the others’ databases { int nProcs = MAX_PROCS; int task nr , i; for (i = 0; i < nProcs ; i ++ ) { if (i ≡ GetRoot ( )) { h broadcast the local part of the database } else { h deliver remote database 47 i } } } This code is used in section 8.

48 i

46 i

≡

§47

DIRNET

THE DIR NET

23

47. A remote database is delivered in three steps: first, the sender sends an integer with its node-id; second, the number of tasks to be transfered is sent; and third, the task information is received into the proper ‘slot’. The first two integers are received in the node’s “normal” mailbox, the bulk of data is received via the DB_MBOX mailbox. Errors are also sent, if any. h deliver remote database 47 i ≡ { int n; n = sizeof (int); TEXReceiveMessage (MBOX(GetRoot ( )), (char ∗) &sender , &n, INFINITE); fflush (stdout ); n = sizeof (int); TEXReceiveMessage (MBOX(GetRoot ( )), (char ∗) &task nr , &n, INFINITE); if (task nr ) { n = task nr ∗ sizeof (DIR task t ); TEXReceiveMessage (DB_MBOX, (char ∗) db .node [sender ].task , &n, INFINITE); } db .node [sender ].task nr = task nr ; n = sizeof (int); TEXReceiveMessage (MBOX(GetRoot ( )), (char ∗) &db .node [sender ].error nr , &n, INFINITE); if (db .node [sender ].error nr ) { n = db .node [sender ].error nr ∗ sizeof (DIR task t ); TEXReceiveMessage (DB_MBOX, (char ∗) db .node [sender ] . error , &n, INFINITE ) ; } } This code is used in section 46.

48. This is symmetrical to the previous section. The order of broadcast is optimal with respect to maximizing the throughput of a fully connected (crossbar) system (see “the Algorithm of Pipelined Broadcast ”). The first two integers are sent to the destination node’s “normal” mailbox, the bulk of data is sent to the destination node’s DB_MBOX mailbox. Error informationation is also sent the same way. h broadcast the local part of the database { int i, nprocs , me ;

48 i

≡

me = GetRoot ( ); nprocs = MAX_PROCS; for (i = 0; i < nprocs ; i ++ ) { if (i ≡ me ) continue; RemoteSendMessage (i, ALIAS(i), (char ∗) &me , sizeof (int)); RemoteSendMessage (i, ALIAS(i), (char ∗) &db .node [me ].task nr , sizeof (int)); if (db .node [me ].task nr ) { RemoteSendMessage (i, DB_MBOX, (char ∗) db .node [me ].task , db .node [me ].task nr ∗ sizeof (DIR task t )); } RemoteSendMessage (i, ALIAS(i), (char ∗) &db .node [me ].error nr , sizeof (int)); if (db .node [me ].error nr ) { RemoteSendMessage (i, DB_MBOX, (char ∗) db .node [me ] . error , (db .node [me ].error nr ) ∗ (sizeof (DIR error t )) ) ; } } } This code is used in section 46.

24

THE DIR NET

DIRNET

§49

49. Here we fill in with default values the “dynamic” (modifiable) part of the database, i.e., errors and the like. h build a global database 49 i ≡ { int nProcs = MAX_PROCS; for (i = 0; i < nProcs ; i ++ ) { db .node [i].error nr = 0; db .node [i].update nr = 0; db .node [i].reboot nr = 0; } } This code is used in section 8.

50. The whole database needs to survive to reboots. This is accomplished by means of the DataReset ( ) function, which marks data as ‘reboot-resistant’. h mark as ‘reboot-resistant’ the whole database via DataReset ( ) 50 i ≡ { /∗ TXT has to supply this information ∗/ } This code is used in section 8.

51. Sends a REQUEST_DB message to one or more of the neighbouring dirnet components, and waits until it gets a full copy of the global db. Note: it uses a special mailbox for that, because the size of those messages is extremely larger with respect to the size of messages for ‘normal’ mailboxes. Each node, say node n, sends requests first to node n + 1 mod MAX_PROCS, on a circular basis. h request a copy of the global database 51 i ≡ { int this node = GetRoot ( ); int nProcs = MAX_PROCS; int i; int size ; int retval ; message .type = REQUEST_DB; message .subid = this node ; for (i = this node + 1; i < nProcs ; i ++ ) { RemoteSendMessage (i, ALIAS(i), (char ∗) &message , sizeof (message )); size = sizeof (db ); retval = TEXReceiveMessage (DB_MBOX, (char ∗) &db , &size , REPLY_DB_TIMEOUT); if (retval ≡ MSG_OK) break; } if (retval 6= MSG_OK) for (i = 0; i < this node ; i ++ ) { RemoteSendMessage (i, ALIAS(i), (char ∗) &message , sizeof (message )); size = sizeof (db ); retval = TEXReceiveMessage (DB_MBOX, (char ∗) &db , &size , REPLY_DB_TIMEOUT); } } This code is used in section 8.

§52

DIRNET

THE DIR NET

25

52. This is the function which is called when a timeout occurs. In a sense, it translates a timeout event into a message event which is sent to the local DIR net component. h Alarm function 52 i ≡ int send timeout message (TOM ∗tom ) { message .type = tom~ top~ timeout .id ; message .subid = tom~ top~ timeout .subid ; message .local = 1; return TEXSendMessage (MBOX(GetRoot ( )), (char ∗) &message , sizeof (message )); /∗ no test on return value at the moment ∗/ } This code is used in section 4.

53. Note: at the moment, this is shared among backup and manager! The message should be received into a global variable called “message”, holding fields like “type”, “subid”, “arg[0..argc]”. . . Open question: is it possible to specify a loop like the following one, or is it better to have even a small timeout? h wait for an incoming message 53 i ≡ { int size = sizeof (message t); LogError (EC_MESS, "", "waiting..."); TEXReceiveMessage (MBOX(GetRoot ( )), (char ∗) &message , &size , INFINITE); } This code is used in sections 18 and 37.

54.

Broadcast a ‘Who Is The Manager’ (WITM) message.

h send WITM to all { int i;

54 i

≡

message .type = WITM; for (i = 0; i < GetRoot ( ); i ++ ) { RemoteSendMessage (i, ALIAS(i), (char ∗) &message , sizeof (message )); } for (i ++ ; i < MAX_PROCS; i ++ ) { RemoteSendMessage (i, ALIAS(i), (char ∗) &message , sizeof (message )); } } This code is used in section 8.

26

THE DIR NET

DIRNET

§55

55. If a node has rebooted or a new ‘generic component’ took the place of another one, then the running component needs to clarify its own role in the dirnet. This is done by sending a WITM around. The following step waits until some NMI (‘New Manager Is. . . ’) messages come in. h wait for NMI messages to come { int i, n, size ;

55 i

≡

for (i = 0, n = MAX_PROCS, size = sizeof (message t); 1; /∗ forever ∗/ i = ++ i % n) if (TEXReceiveMessage (MBOX(i), (char ∗) &message , &size , 0) ≡ MSG_OK) if (message .type ≡ NMI) break; else { LogError (EC_ERROR, "Backup", "NMI message expected, %d message received from node %d", message .type , i); } } This code is used in section 8.

56. These two functions resp. get and set the overall state of this generic component. h GetState and SetState 56 i ≡ TEXGetState (status t ∗s) { /∗ a dirty trick for the time being... ∗/ memcpy (s, &db .status , sizeof (status t)); s~ primary = 1; } TEXSetState (status t ∗s) { memcpy (&db .status , s, sizeof (status t)); } This code is used in section 4.

57.

If we are at true initialisation time, then the roles are read from the RL script.

h Read your role from the RL script 57 i ≡ { int i; for (i = 0; i < RCODE_CARD; i ++ ) { if (rcodes [i][0] ≡ R_SET_ROLE ∧ rcodes [i][1] ≡ GetRoot ( )) { switch (rcodes [i][2]) { case R_AGENT: db .role = DIR_AGENT; break; case R_BACKUP: db .role = DIR_BACKUP; break; case R_MANAGER: db .role = DIR_MANAGER; break; } break; } } } This code is used in section 8.

§58 58.

DIRNET

THE DIR NET

If we are at true initialisation time, then the identity of the manager is read from the RL script.

h Check who is the manager according to the RL script 58 i ≡ { int i; for (i = 0; i < RCODE_CARD; i ++ ) { if (rcodes [i][0] ≡ R_SET_ROLE ∧ rcodes [i][2] ≡ R_MANAGER) { managerid = rcodes [i][1]; break; } } LogError (EC_ERROR, "init", "the manager is on node %d.", managerid ); } This code is used in section 8.

27

28

THE DIR NET

DIRNET

§59

59. h TEX routines simulated on EPX 59 i ≡ #ifndef TEX RemoteSendMessage (int node , int alias , char ∗message , int size ) { int retval ; #ifdef EPX_MAILBOXES LogError (EC_DEBUG, "sender", "about to send a %d byte msg to node %d, req−id %d", size , node , alias ); retval = PutMessage (node , alias , MSG_TYPE_USER_START, 1, −1, message , size ); if (retval < 0) { char st [80]; LogError (EC_ERROR, "RemoteSendMessage", "PutMessage error:"); sprintf (st , "PutMessage on node %d: ", GetRoot ( )); perror (st ); } #else retval = SendNode (node , alias , message , size ); if (retval < 0) { printe ("%d: RemoteSendMessage:\n", GetRoot ( )); perror ("Error sending with SendNode"); } else if (retval > 0) { printe ("%d: RemoteSendMessage:\n", GetRoot ( )); printe ("%d: send message size is greater than the receive buffer\n", GetRoot ( )); } #endif } TEXSendMessage (int mbox , char ∗message , int size ) { int retval ; #ifdef EPX_MAILBOXES LogError (EC_DEBUG, "sender", "about to send a %d byte msg to node %d, req−id %d", size , GET_ROOT( )~ ProcRoot~ MyProcID , mbox ); retval = PutMessage (GET_ROOT( )~ ProcRoot~ MyProcID , mbox , MSG_TYPE_USER_START, 1, −1, message , size ); if (retval < 0) { char st [80]; LogError (EC_ERROR, "RemoteSendMessage", "PutMessage error:"); sprintf (st , "PutMessage on node %d: ", GetRoot ( )); perror (st ); } #else retval = SendNode (GET_ROOT( )~ ProcRoot~ MyProcID , mbox , message , size ); if (retval < 0) { printe ("%d: RemoteSendMessage:\n", GetRoot ( )); perror ("Error sending with SendNode"); } else if (retval > 0) { printe ("%d: RemoteSendMessage:\n", GetRoot ( )); printe ("%d: send message size is greater than the receive buffer\n", GetRoot ( ));

§59

DIRNET

THE DIR NET

29

} #endif } TEXReceiveMessage (int mbox , char ∗message , int ∗size , int timeout ) { int retval ; #ifdef EPX_MAILBOXES RR Message t ∗ m; m = malloc (sizeof (RR Message t )); #endif if (flag ) LogError (EC_ERROR, "RecvMessage", "TEXReceiveMessage about to receive a msg (size=%d) from any node on req−id %d", ∗size , mbox ); #ifdef EPX_MAILBOXES retval = GetMessage (−1, mbox , MSG_TYPE_USER_START, −1, m); ∗size = m~ Header .Size ; memcpy (message , m~ Body , ∗size ); if (retval < 0) { char st [80]; LogError (EC_ERROR, "RemoteSendMessage", "GetMessage error:"); sprintf (st , "GetMessage on node %d: ", GetRoot ( )); perror (st ); } #else ∗size = RecvNode (−1, mbox , message , ∗size ); if (flag ) LogError (EC_ERROR, "RecvMessage", "TEXReceiveMessage received a msg on req−id %d", mbox ); if (∗size < 0) printe ("%d: TEXReceiveMessage error: received message is greater th\ an receive buffer\n", GetRoot ( )); #endif } GetRoot ( ) { return GET_ROOT( )~ ProcRoot~ MyProcID ; } Export (Alias t a, IDF b) {} int TEXGetNumTasks ( ) { extern int num tasks [ ]; return num tasks [GetRoot ( )]; } /∗ this function should return a value meaning that the task is running (DIR_RUNNING) or waiting for being enabled (DIR_WAITING). For the time being, as we don’t have access to this information, we only return DIR_RUNNING ∗/ STATUSTEXGetTaskStatus (IDF t) { return DIR_RUNNING; } TEXStopTask ( ) {

30

THE DIR NET

DIRNET

exit (1); } TEXRestartTask ( ) { DIRNetGenericComponent ( ); TEXStopTask ( ); } #endif This code is used in section 4.

60.

This function converts a message-id (as defined in dirdefs.h) into a human-intelligible message.

h DIR Print Message 60 i ≡ char ∗DIRPrintMessage (int i) { if (i ≥ FIRST_MESSAGE ∧ i ≤ LAST_MESSAGE) return DIRMessage [i − FIRST_MESSAGE]; return ""; } char ∗DIRPrintTimeout (int i) { switch (i) { case IA_FLAG_TIMEOUT: return "IA flag timeout"; case MIA_TIMEOUT: return "MIA timeout"; case TAIA_TIMEOUT: return "TAIA timeout"; case TEIF_TIMEOUT: return "TEIF timeout"; case IA_FLAG_TIMEOUT_B: return "IA flag ‘B’ timeout"; case MIA_TIMEOUT_B: return "MIA ‘B’ timeout"; case TAIA_TIMEOUT_B: return "TAIA ‘B’ timeout"; case TEIF_TIMEOUT_B: return "TEIF ‘B’ timeout"; case IAT_TIMEOUT: return "IA Task timeout"; case INJECT_FAULT_TIMEOUT: return "F. Injecting timeout"; default: return ""; } } char ∗DIRPrintCode (int i) { char ∗s; s = DIRPrintTimeout (i); if (strcmp (s, "") ≡ 0) return DIRPrintMessage (i); return s; } char ∗role2ascii (int role ) { switch (role ) { case DIR_MANAGER: return "manager"; case DIR_BACKUP: return "backup agent"; default: return ""; } } This code is used in section 4.

§59

§61

DIRNET

THE DIR NET

31

61. The I’m Alive Task. After initialisation time, the IAT enters a waiting status in which it waits for an activation message from the agent it guards... h I’m Alive Task 61 i ≡ message t amessage ; timeout t aia ; TOM ∗atom ; h The IAlarm function 63 i h The DIRAlive function 62 i This code is used in section 4.

62. This is the code for the I’m Alive Task. h The DIRAlive function 62 i ≡ int DIRAlive (void) { int subid ; int IAlarm (TOM ∗); atom = tom init (IAlarm ); restart : LogError (EC_ERROR, "IAT", "IAT starts..."); h wait for activation message 64 i LogError (EC_ERROR, "IAT", "IAT activated ..."); subid = amessage .subid ; /∗ the alarm of these timeouts simply sends a message of id ¡timeout-type¿, subid = ¡subid¿ to the IAT ∗/ h insert timeout (IA-flag-timeout, CYCLIC, subid ) 65 i LogError (EC_ERROR, "IAT", "timeout activated, now enter the main loop..."); while (1) { LogError (EC_ERROR, "IAT", "waiting for a message"); h IATask: wait for an incoming message 66 i LogError (EC_ERROR, "IAT", "a message came in : %d (%s)", amessage .type , DIRPrintCode (amessage .type )); if (amessage .type ≡ IAT_TIMEOUT) { /∗ time to check the IA-flag! ∗/ if (IA flag ≡ 0) { LogError (EC_ERROR, "IAT", "IA flag is zero, correct."); IA flag = 1; LogError (EC_ERROR, "IAT", "IA flag has been set again."); } else { LogError (EC_ERROR, "IAT", "A L A R M!!!"); h send TEIF to all except subid 67 i h delete timeout (IA-flag-timeout) 68 i break; } } } /∗ end loop ∗/ /∗ TEXRestartTask(); ∗/ goto restart ; } /∗ end alive ∗/ This code is used in section 61.

32

THE DIR NET

63.

The Alarm function for the I’m Alive Task.

DIRNET

§63

h The IAlarm function 63 i ≡ int IAlarm (TOM ∗atom ) { amessage .type = IAT_TIMEOUT; TEXSendMessage (IAT_MBOX, (char ∗) &amessage , sizeof (amessage )); } This code is used in section 61.

64. The I’m Alive Task here is waiting for an activation message from its generic component. h wait for activation message 64 i ≡ { int size = sizeof (message t); LogError (EC_ERROR, "", "waiting for messages..."); /∗ flag=1; ∗/ TEXReceiveMessage (IAT_MBOX, (char ∗) &amessage , &size , INFINITE); LogError (EC_ERROR, "", "got a message (%s)", DIRPrintCode (amessage .type )); } This code is used in section 62.

65. Also the I’m alive thread makes use of the timeout manager. h insert timeout (IA-flag-timeout, CYCLIC, subid ) 65 i ≡ { int IAlarm (TOM ∗); tom declare (&aia , IAT_CYCLIC, TOM_SET_ENABLE, IAT_TIMEOUT, subid , IAT_DEADLINE); tom insert (atom , &aia ); } This code is used in section 62.

66.

Same as , but for the I’m Alive Task.

h IATask: wait for an incoming message { int size = sizeof (message t);

66 i

≡

TEXReceiveMessage (IAT_MBOX, (char ∗) &amessage , &size , INFINITE); LogError (EC_ERROR, "IAT", "got message %d (%s)", amessage .type , DIRPrintCode (amessage .type )); } This code is used in section 62.

§69

DIRNET

THE DIR NET

33

67. h send TEIF to all except subid { int i; int you = GetRoot ( );

67 i

≡

for (i = 0; i < MAX_PROCS; i ++ ) { if (i 6= subid ∧ i 6= you ) { amessage .type = TEIF; /∗ i.e., “ENable IAt” ∗/ amessage .subid = GetRoot ( ); RemoteSendMessage (i, ALIAS(i), (char ∗) &amessage , sizeof (amessage )); } } } This code is used in section 62.

68. h delete timeout (IA-flag-timeout) tom delete (atom , &aia );

68 i

≡


69. h Spawn the I’m Alive Task 69 i ≡ { int DIRAlive (void); StartThread (DIRAlive , 65536, &errors , 0, Λ); } This code is used in section 6.

/* eof dirnet.w */ __DIR_TYPES_: 5. _TOM__H_: 5. a: 59. aia : 61, 65, 68. alarm : 5. ALIAS: 4, 8, 18, 19, 21, 24, 37, 38, 41, 48, 51, 54, 67. alias : 59. Alias t: 5, 8, 17, 59. amessage : 61, 62, 63, 64, 66, 67. ANID: 41. arg : 5, 8, 18, 19, 22, 37. atom : 61, 62, 63, 65, 68. b: 59. block sp : 5. block stack : 5. block t: 5. Body : 59. code : 5. configuration : 5. cyclic : 5. DataReset : 50.

db : 5, 8, 18, 22, 37, 44, 45, 47, 48, 49, 51, 56, 57. DB: 18, 22, 37. DB_INC_REBOOT: 22. DB_MBOX: 4, 8, 47, 48, 51. DB_NEW_ROLE: 22. DB_NEW_STATUS: 22. DB_NEW_TASK_ERROR: 22. DB_NEW_TASK_STATUS: 22. deadline : 5. default alarm : 5. DIR_AGENT: 57. DIR_ALIAS_OFFSET: 3, 4. DIR_BACKUP: 8, 57, 60. DIR db t : 5. DIR error t : 48. DIR_MANAGER: 6, 8, 37, 57, 60. DIR_MBOX_OFFSET: 2, 4. DIR_RUNNING: 59. DIR state t: 5. DIR task t : 47, 48. DIR_WAITING: 59. DIRAlive : 62, 69.

34

THE DIR NET

DIRMessage : 60. DIRNetGenericComponent : 6, 59. DIRPrintCode : 5, 37, 60, 62, 64, 66. DIRPrintMessage : 5, 60. DIRPrintTimeout : 5, 60. EC_DEBUG: 59. EC_ERROR: 8, 9, 18, 33, 35, 37, 38, 55, 58, 59, 62, 64, 66. EC_MESS: 8, 53. ENIA: 18, 37. EPX: 5. EPX_MAILBOXES: 59. EPX1_2: 5. error nr : 45, 47, 48, 49. errors : 5, 69. exit : 59. Export : 5, 8, 17, 59. FALSE: 18, 37. fflush : 47. FIRST_MESSAGE: 60. flag : 5, 59. fork : 8. GET_ROOT: 59. GetMessage : 59. GetRoot : 5, 8, 9, 11, 13, 17, 18, 19, 20, 21, 24, 37, 38, 41, 44, 45, 46, 47, 48, 51, 52, 53, 54, 57, 59, 67. Header : 59. i: 8, 11, 13, 14, 20, 21, 24, 41, 46, 48, 51, 54, 55, 57, 58, 60, 67. ia : 7, 12, 43. IA flag : 5, 16, 62. IA_FLAG_CYCLIC: 4, 12. IA_FLAG_CYCLIC_B: 4. IA_FLAG_DEADLINE: 4. IA_FLAG_DEADLINE_B: 4. IA_FLAG_TIMEOUT: 4, 12, 16, 18, 37, 60. IA_FLAG_TIMEOUT_B: 4, 60. IAlarm : 62, 63, 65. IAT_ALIAS: 4, 8, 29, 30. IAT_CYCLIC: 4, 65. IAT_DEADLINE: 4, 65. IAT_MBOX: 4, 8, 17, 63, 64, 66. IAT_TIMEOUT: 4, 60, 62, 63, 65. id : 5, 52. IDF: 5, 8, 17, 59. IMALIVE_CLEAR_TIMEOUT: 4, 12. IMALIVE_SET_TIMEOUT: 4, 26. INFINITE: 5, 47, 53, 64, 66. InitSem : 6. inject : 7, 11. INJECT: 11.

DIRNET

§69

INJECT_FAULT_DEADLINE: 4, 11. INJECT_FAULT_TIMEOUT: 4, 11, 18, 60. LAST_MESSAGE: 60. link : 5. LinkCB t : 5. local : 5, 37, 52. LogError : 8, 9, 18, 33, 35, 37, 38, 53, 55, 58, 59, 62, 64, 66. malloc : 59. managerid : 7, 8, 9, 30, 35, 36, 37, 38, 41, 42, 58. MAX_PROCS: 4, 7, 11, 13, 14, 20, 21, 24, 41, 42, 46, 48, 49, 51, 54, 55, 67. MAXARG: 5. mbox : 59. MBOX: 4, 8, 47, 52, 53, 55. me : 48. memcpy : 56, 59. message : 5, 8, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 29, 30, 32, 37, 38, 40, 41, 51, 52, 53, 54, 55, 59. message t: 5, 53, 55, 61, 64, 66. MIA: 11, 18, 19, 20, 34, 35, 37. mia : 7, 11, 18, 20, 35, 37. MIA_CYCLIC: 4, 11. MIA_CYCLIC_B: 4, 35, 40. MIA_DEADLINE: 4, 11. MIA_DEADLINE_B: 4, 35, 40. MIA_RECV_TIMEOUT: 4. MIA_SEND_TIMEOUT: 4, 20. MIA_TIMEOUT: 4, 11, 18, 60. MIA_TIMEOUT_B: 4, 35, 37, 40, 60. MSG_OK: 5, 51, 55. MSG_TYPE_USER_START: 59. MyProcID : 59. mystate : 7, 8, 37. n: 7, 24, 41, 47, 55. next : 5. NIUA: 18, 24. NMI: 18, 37, 55. node : 22, 44, 45, 47, 48, 49, 59. nprocs : 48. nProcs : 17, 46, 49, 51. num tasks : 59. perror : 59. primary : 5, 8, 56. printe : 59. ProcRoot : 59. PutMessage : 59. R_AGENT: 57. R_BACKUP: 57. R_MANAGER: 57, 58. R_SET_ROLE: 57, 58. RCODE_CARD: 57, 58.

§69

DIRNET

rcodes : 57, 58. reboot nr : 22, 49. RecvNode : 59. RemoteSendMessage : 5, 18, 19, 21, 24, 29, 30, 37, 38, 41, 48, 51, 54, 59, 67. REPLY_DB_TIMEOUT: 51. REQUEST_DB: 18, 37, 51. restart : 62. retval : 51, 59. RINT_MBOX: 4. role : 5, 6, 7, 8, 22, 37, 57, 60. role2ascii : 5, 8, 60. ROUSE: 17. RR Message t : 59. runlevel : 5. running : 5. s: 56, 60. sem : 5. Semaphore t : 5. send timeout message : 5, 10, 34, 52. sender : 7, 18, 37, 47. SendNode : 59. size : 51, 53, 55, 59, 64, 66. Size : 59. SPAN: 29, 30. sprintf : 59. st : 59. starting time : 5. StartThread : 69. status : 7, 8, 22, 45, 56. STATUS: 5, 7, 59. status t: 5, 7, 56. stdout : 47. strcmp : 60. subid : 5, 9, 17, 18, 19, 22, 23, 24, 25, 26, 27, 29, 33, 37, 38, 40, 51, 52, 62, 65, 67. suspended : 5. suspicion period : 7, 14, 15, 18, 37. t: 40, 59. taia : 7, 13, 18, 23, 25, 27, 36, 39. TAIA: 13, 18, 27, 34, 36, 38. TAIA_CYCLIC: 4, 13. TAIA_CYCLIC_B: 4, 36. TAIA_DEADLINE: 4, 13. TAIA_DEADLINE_B: 4, 36. TAIA_RECV_TIMEOUT: 4. TAIA_SEND_TIMEOUT: 4, 23. TAIA_TIMEOUT: 4, 13, 18, 60. TAIA_TIMEOUT_B: 4, 36, 37, 60. task : 22, 45, 47, 48. task nr : 44, 46, 47, 48. teif : 7, 26, 28, 37.

THE DIR NET

35

TEIF: 18, 28, 37, 67. TEIF_CYCLIC: 4, 26. TEIF_CYCLIC_B: 4. TEIF_DEADLINE: 4. TEIF_DEADLINE_B: 4. TEIF_TIMEOUT: 4, 18, 26, 60. TEIF_TIMEOUT_B: 4, 37, 40, 60. TEX: 5, 59. TEXFirstActivation : 5, 8. TEXGetNumTasks : 5, 8, 59. TEXGetState : 5, 8, 56. TEXGetTaskStatus : 5, 8, 59. TEXReboot : 31. TEXReceiveMessage : 5, 47, 51, 53, 55, 59, 64, 66. TEXRestartTask : 5, 37, 59. TEXSendMessage : 5, 17, 52, 59, 63. TEXSetState : 5, 37, 56. TEXStopTask : 5, 6, 59. this: 24, 41. this node : 45, 51. timeout : 5, 52, 59. timeout t: 5, 7, 40, 61. to : 18. TOM: 5, 7, 52, 61, 62, 63, 65. tom : 7, 10, 11, 12, 13, 18, 20, 23, 25, 26, 27, 28, 34, 35, 36, 37, 39, 40, 43, 52. tom close : 18. TOM_CYCLIC: 4. tom declare : 11, 12, 13, 26, 35, 36, 40, 65. tom delete : 27, 28, 37, 40, 68. tom dump : 18, 35, 37. tom id : 5. tom init : 10, 34, 62. tom insert : 11, 12, 13, 18, 23, 26, 35, 36, 65. tom ispresent : 18, 37. TOM_MBOX: 4. tom message t: 5. TOM_NON_CYCLIC: 4, 11. tom renew : 18, 20, 25, 37, 39, 43. tom sem : 5, 6. TOM_SET_ENABLE: 11, 12, 13, 26, 35, 36, 40, 65. top : 5, 52. TRUE: 18, 37. type : 5, 17, 18, 19, 24, 29, 30, 32, 37, 38, 41, 51, 52, 54, 55, 62, 63, 64, 66, 67. update nr : 49. used : 5. WITM: 18, 37, 54, 55. you : 67.

36

NAMES OF THE SECTIONS

h Agent-Recovery(subid ) 29 i Used in section 18. h Alarm function 52 i Used in section 4. h Check who is the manager according to the RL script 58 i Used in section 8. h DIR Print Message 60 i Used in section 4. h DIR net backup agent 33 i Used in section 6. h DIR net initialisation 8 i Used in section 6. h DIR net manager 9 i Used in section 6. h Generic component of the DIR net 6 i Used in section 4. h Global Variables and # include’s 5 i Used in section 4. h I’m Alive Task 61 i Used in section 4. h IATask: wait for an incoming message 66 i Used in section 62. h Manager-Recovery(managerid ) 30 i Used in section 37. h Node-Recovery(subid ) 31 i Used in sections 18 and 37. h Read your role from the RL script 57 i Used in section 8. h Spawn the I’m Alive Task 69 i Used in section 6. h TEX routines simulated on EPX 59 i Used in section 4. h The DIRAlive function 62 i Used in section 61. h The IAlarm function 63 i Used in section 61. h Variables local to the Manager and the Backup 7 i Used in section 6. h activate IAT 17 i Used in sections 9, 18, 33, and 37. h backup loop (waiting for incoming messages) 37 i Used in section 33. h broadcast local database and receive the others’ databases 46 i Used in section 8. h broadcast modifications to db 21 i Used in sections 18 and 37. h broadcast the local part of the database 48 i Used in section 46. h broadcast NIUA 24 i Used in section 18. h build a global database 49 i Used in section 8. h choose next manager 42 i Used in section 37. h clear IA-flag 16 i Used in sections 9, 18, 33, and 37. h clear the suspicion period [ ]’s 14 i Used in section 9. h clear suspicion period 15 i Used in section 33. h deal with these other messages 32 i Used in sections 18 and 37. h declare and insert MIA timeouts 11 i Used in section 10. h declare and insert TAIA timeouts 13 i Used in section 10. h declare and insert the IA-flag timeout 12 i Used in sections 10 and 34. h declare and insert the one MIA timeout 35 i Used in section 34. h declare and insert the one TAIA timeout 36 i Used in section 34. h delete MIA-timeout coming from subid , if any exists 40 i Used in section 37. h delete timeout (IA-flag-timeout) 68 i Used in section 62. h delete timeout (TAIA-timeout, subid ) 27 i Used in section 18. h delete timeout (TEIF-timeout, subid ) 28 i Used in section 18. h deliver remote database 47 i Used in section 46. h insert TAIA-timeout, subid=subid 23 i Used in section 18. h insert four timeouts (IA-flag, MIA, TAIA, and TEIF) 34 i Used in section 33. h insert timeout (IA-flag-timeout, CYCLIC, subid ) 65 i Used in section 62. h insert timeouts (IA-flag-timeout, MIA-timeouts, TAIA-timeouts) 10 i Used in section 9. h insert TEIF timeout 26 i Used in sections 18 and 37. h manager loop (waiting for incoming messages) 18 i Used in section 9. h mark as ‘reboot-resistant’ the whole database via DataReset ( ) 50 i Used in section 8. h renew IA-flag-timeout 43 i Used in sections 18 and 37. h renew TAIA-timeout, subid 25 i Used in section 18. h renew MIA timeout on all subid ’s 20 i Used in section 18. h renew TAIA timeout 39 i Used in section 37.

DIRNET

DIRNET

NAMES OF THE SECTIONS

37

h request a copy of the global database 51 i Used in section 8. h send ANID to all except managerid 41 i Used in section 37. h send MIA to backup subid 19 i Used in section 18. h send TAIA to the manager 38 i Used in section 37. h send TEIF to all except subid 67 i Used in section 62. h send WITM to all 54 i Used in section 8. h store in local database(i, status) 45 i Used in section 8. h store number of tasks (n); 44 i Used in section 8. h update your copy of the db 22 i Used in sections 18 and 37. h wait for activation message 64 i Used in section 62. h wait for an incoming message 53 i Used in sections 18 and 37. h wait for NMI messages to come 55 i Used in section 8. h GetState and SetState 56 i Used in section 4.

References [1] V. De Florio, G. Deconinck, R. Lauwereins. An Algorithm for Tolerating Crash Failures in Distributed Systems. In Proc. of the 7th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS), Edinburgh, Scotland, 3–5 April 2000. IEEE. [2] V. De Florio, G. Deconinck, R. Lauwereins. The EFTOS Voting Farm: a Software Tool for Fault Masking in Message Passing Parallel Environments. In Proc. of the 24th Euromicro Conference (Euromicro ’98), Workshop on Dependable Computing Systems, V¨ aster˚ as, Sweden, August 1998. IEEE. [3] V. De Florio, G. Deconinck, R. Lauwereins. Software Tool Combining Fault Masking with User-defined Recovery Strategies. IEE Proceedings – Software 145(6), 1998. IEE. [4] V. De Florio, C. Blondia. Dynamics of a Time-outs Management System. Complex Systems 16(3), 2006. Complex Systems Publications, Champaign, IL. [5] V. De Florio, C. Blondia. Design Tool To Express Failure Detection Protocols. IET Software 4(2), 2010. IET.

A Distributed System for Detection, Isolation, and Recovery

A Distributed System for Detection, Isolation, and Recovery

Suggest Documents

A Distributed Intrusion Detection System for ...

Distributed Fault Detection and Isolation of ... - ScienceDirect.com

Distributed Coordinate-free Hole Detection and Recovery

a fault detection and isolation system for cooperative ... - SciELO

Distributed Fault Detection and Recovery for Networked Robots

Document Overlap Detection System for Distributed ... - FreeBSD.org

An Actuator Fault Detection, Isolation and Estimation System for an ...

Fault Detection and Isolation for Nonlinear System via ...

Page 1 An Architecture for a Distributed Intrusion Detection System ...

A MA-based System for Information Leakage Detection in Distributed ...

A Distributed Control System for Rapid Astronomical Transient Detection

nnovative Fault Detection, Isolation and Recovery Strategies On ...

A Formal Model of Crash Recovery in a Distributed System

Distributed detection and isolation of topology attacks in power networks

Distributed Fault Detection and Isolation with Imprecise Network Models

A Failure Recovery Mechanism for Distributed ...

Architecture Recovery for Distributed Systems

An Isolation Intrusion Detection System for ... - Semantic Scholar

An Isolation Intrusion Detection System for ... - Semantic Scholar

Deadlock Detection & Deadlock Prevention of Distributed System

Bolidozor-Distributed radio meteor detection system

A distributed system architecture for a distributed ...

A distributed system architecture for a distributed ...

A distributed system architecture for a distributed ... - Google Sites