On Communication Protocols in Unreliable Mesh Networks and their Relation to Phase Transitions

Martin Nehéz∗, Dušan Bernát†
Faculty of Informatics and Information Technologies
Slovak University of Technology
Ilkovičova 3, 842 16 Bratislava, Slovak Republic
{nehez,bernat}@fiit.stuba.sk

Abstract

We present probabilistic and experimental analysis of induced shortest path one-to-all and one-to-one communication protocols on unreliable mesh networks in this paper. We show that the obtained results are closely related to the results concerning the percolation thresholds studied in physics.

Keywords: communication protocols, unreliable mesh, percolation threshold.

1 Introduction

Mesh, lattice or grid topologies are popular architectures for multiprocessor computing systems as well as for distributed systems with interconnection networks. Many parallel and distributed algorithms have been designed for these architectures, cf. [14]. One of the primary objects of study is the problem of reliable (robust) execution of algorithms on parallel and distributed systems with defective or overloaded components. Many unreliable and fault models of parallel and distributed systems have been examined: there are various types of faults in combination with several different models of their distribution. In this paper we adopt the model of interconnection networks with crash-fault communication links satisfying a probabilistic distribution. This model is more realistic in practice than the so-called worst-case model, in which a bounded number of adversarially distributed faults occur. The model used in our paper was formally described in [17] by a generalization of the standard Erdős–Rényi random graph model, and it is also closely related to percolation in physics (especially the model of random meshes).

∗ Institute of Informatics and Software Engineering
† Institute of Computer Networks and Systems

Percolation and Phase Transition Phenomena. Percolation is often described as the propagation of an activity through a connected space (medium). An example is a forest fire, where the source could be a camp fire, the space is the forest and the activity is burning. Another example is conductivity in metals. There are two kinds of percolation: site percolation (the relevant entities are vertices) and bond percolation (the relevant entities are edges). A natural extension of percolation is directed percolation. It is percolation with a special direction along which the activity can propagate only one way but not the other. An example is a forest fire when the wind blows in one direction.

The connected space through which an activity can spread is often some material or environment, and a phase transition phenomenon can often be observed in it. In physics, a phase transition is the transformation of a thermodynamic system from one phase to another. An alternative view identifies it with the breaking of symmetries in the structure of the system. An example is the emergence of superconductivity in certain metals when cooled below the critical temperature. In this case, at some point there is an exponential increase in conduction: the system is between the subcritical and the supercritical phase. This critical point (critical phase) is called the percolation threshold, or threshold value. For a survey on percolation we refer to the monograph [20]. A summary of concepts and results concerning directed percolation can be found in [12]; for some newer results see [13, 16]. The percolation thresholds for several directed percolation problems from [12] are listed in the table in Figure 1.

| Structure and problem | Dimension | Percolation threshold $p_t$ | Reference |
|---|---|---|---|
| Square mesh, bond percolation | 2 | 0.6445 ± 0.0005 | [12] |
| Cubic mesh | 3 | 0.383 ± 0.003 | [12] |
| r-dimensional mesh | large | $r^{-1} + \frac{1}{2} r^{-3} + r^{-4} + O(r^{-5})$ | [12] |

Figure 1: Previous results of phase transition thresholds in bond percolation models.

Phase Transitions in Random Graphs. Random graphs are very interesting structures in discrete mathematics in which phase transition effects can also be observed. The first work devoted to the evolution of random graphs and their phase transitions was the paper of P. Erdős and A. Rényi [6]. A survey of current results on this topic is presented in [10], Chapter 5. The notion of the threshold function plays a crucial role in the phase transition phenomena of random graphs. A. Goerdt [9] stated that $p = 1/(d - 1)$ is a threshold probability for the existence of a linear-sized component in almost all random d-regular graphs. The structure of random regular graphs is closely related to unreliable interconnection networks. Threshold functions for multiconnectivity properties of random regular graphs were given in [17]. The threshold $p = 1/k$ for a linear-sized component in a k-dimensional random hypercube can be found in [2], and the threshold for the same problem on the 2-dimensional n × n random grid is $p = 1/2$, cf. [11]. Lower bounds on thresholds for connectivity properties of random tori and random generalized hypercubes were given by M. Nehéz in [18]. Recently, the phenomenon of phase transition was also observed in small world graphs [15]. All these results can be understood in such a way that one phase in the phase transition of the random graph is more symmetrical than the other one.

2 The Communication Model and the Problem Statement

To describe a model of interconnection networks with faulty communication links satisfying a probabilistic distribution, we use the model of random graphs. Let $B = (V, E)$ be a simple connected base graph, where $V$ is the set of nodes (or processors) and $E$ is the set of edges (or bidirectional communication links). The links of the network are either operational or faulty. Assume that a random graph $G$ is obtained from a graph $B$ with node set $\{1, \dots, |V(B)|\}$ by independently removing each edge of $B$ with failure probability $1 - p$. For each spanning subgraph $G$ of $B$ it holds that $V(G) = V(B)$ and $E(G) \subseteq E(B)$, and we denote the set of all spanning subgraphs of $B$ by $2^B$. Let $\mathcal{G}(B, p) = (\Omega, \Pr)$ be a probability space of random graphs such that $G \in \mathcal{G}(B, p)$ is a uniform labeled random spanning subgraph of $B$, where $\Omega = 2^B$ and

$$\Pr[G] = p^{|E(G)|} (1 - p)^{|E(B)| - |E(G)|}.$$

We consider a point-to-point communication network modeled by an r-dimensional mesh with $s$ nodes in each dimension. For $r \ge 2$ and $s$ fixed, the r-dimensional mesh $M_s^r$ is the Cartesian product of $r$ paths on $s$ nodes each, and it has $n = s^r$ nodes. We define a random r-dimensional mesh $M$ to be a member of the probability space $\mathcal{G}(M_s^r, p)$, where $M_s^r$ is the base graph and $p$ is the probability as before. This model, called the reliability mesh network model, appears in [5] and [17] and is described in more detail in [10].

According to the taxonomy of communication problems due to P. Ružička [19], we examine one-to-all and one-to-one communication patterns. Let $V$ denote the set of nodes of the network. A one-to-all communication pattern is the set $P(S)$ of all ordered pairs $(S, v)$, where $S$ is a source node and $v \in V$ is such that $v \neq S$. A one-to-one communication pattern is a singleton set $P(S, R) = \{(S, R)\}$, where $S \in V$ is a source node and $R \in V$ is a receiver node such that $S \neq R$. The communication problem for the one-to-all communication pattern is called broadcasting, and it is a generalization of the communication problem for the one-to-one communication pattern. The communication problem for the one-to-one communication pattern over an unreliable network is called the end-to-end communication problem (or reliable communication), cf. [1, 7].

According to the described communication patterns, we can distinguish among several kinds of path systems. To optimize the traffic in the network we restrict ourselves to induced shortest paths exclusively. For two nodes $u, v$ of the reliability mesh $M \in \mathcal{G}(M_s^r, p)$, a path $P_{u,v}$ with endpoints $u$ and $v$ is said to be a shortest path induced by the graph $M_s^r$, or briefly an induced shortest path (ISP), if the length of $P_{u,v}$ is equal to the distance of $u$ and $v$ in the graph $M_s^r$. Clearly, an induced shortest path is a subgraph of $M$ that is also a shortest path in the base graph $M_s^r$.

We consider oblivious (memoryless) protocols, which take a routing decision at every node of the network, but the routing decision is based entirely on the content of the packet header. Each node of the network "knows" only its own unique identification number and the uniquely distinguished numbers of its output ports (outgoing edges). The advantage of oblivious protocols is their high tolerance to failures occurring in the network, cf. [1]. Each end-to-end communication protocol must satisfy the requirements of reception and termination. For further details see [1, 7].
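The random mesh model of this section can be made concrete with a short sampling routine. The following is a minimal Python sketch and is not part of the original work: the function name `sample_random_mesh`, the 0-based coordinates (the text labels coordinates from 1) and the representation of a link as a pair of coordinate tuples are illustrative assumptions.

```python
import itertools
import random


def sample_random_mesh(s, r, p, rng=None):
    """Sample one random mesh from G(M_s^r, p).

    Returns the set of operational links; each link of the base mesh
    M_s^r is kept independently with probability p.
    Nodes are r-tuples over {0, ..., s-1} (assumed labeling).
    """
    rng = rng or random.Random()
    operational = set()
    for node in itertools.product(range(s), repeat=r):
        for axis in range(r):
            if node[axis] + 1 < s:
                # Neighbor one step further along this axis.
                neighbor = node[:axis] + (node[axis] + 1,) + node[axis + 1:]
                if rng.random() < p:
                    operational.add((node, neighbor))
    return operational


def base_edge_count(s, r):
    """Number of links |E(M_s^r)| of the base mesh."""
    return r * (s - 1) * s ** (r - 1)
```

For example, `sample_random_mesh(4, 2, 1.0)` returns all `base_edge_count(4, 2) = 24` links of the 4 × 4 mesh, while smaller values of `p` return a random subset of them.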

3 Probabilistic Analysis

Monotonicity and Threshold Functions. Asymptotic notation (such as $O$, $o$, $\Omega$, $\Theta$, cf. [10]) is used in the usual way in this paper. Moreover, for two sequences (or equivalently functions) $a = (a_n)_{n=0}^{\infty}$ and $b = (b_n)_{n=0}^{\infty}$, we write $a_n \ll b_n$ if $a_n \ge 0$ and $a_n = o(b_n)$. Let $B$ be a base graph and $G_1, G_2$ spanning subgraphs of $B$. A family of subsets $Q \subseteq 2^B$ is said to be increasing if $G_1 \subseteq G_2$ and $G_1 \in Q$ imply that $G_2 \in Q$. A family of subsets $Q'$ is decreasing if $2^B \setminus Q'$ is increasing. A family which is either increasing or decreasing is called monotone. A family of graphs from $2^B$ can be identified with a graph property. Threshold functions are defined for monotone properties. In this paper we need to introduce them only for increasing properties. For further details see [10].

Let $B$ be an n-node graph and let $Q$ be an increasing property of graphs from $2^B$. A function $\hat{p} = \hat{p}(n)$ is called a threshold function for $Q$ iff the following condition holds:

$$\Pr[\,\mathcal{G}(B, p) \text{ has } Q\,] \to \begin{cases} 0 & \text{if } p \ll \hat{p}, \\ 1 & \text{if } p \gg \hat{p}. \end{cases}$$

The notion of the threshold function plays a crucial role in the study of the phase transition phenomena of random graphs, cf. [3, 10]. Let $Q_{u,v}$ denote the following property: "graph $G \in 2^B$ contains at least one path $P_{u,v}$ of length $\mathrm{dist}_B(u, v)$". Note that $P_{u,v}$ is then a shortest path also in $G$, hence it is a B-induced shortest path. We will use the following property proved by M. Nehéz in [18].

Proposition 1 [18] The property $Q_{u,v}$ is increasing.

Lower Bounds on the Thresholds. The main theoretical results of the present paper are estimates of the threshold functions for the one-to-all ISP communication problems.

Theorem 1 The threshold function $\hat{p}$ for the problem of the one-to-all $M_s^r$-induced shortest path communication protocol over a reliability mesh network $M \in \mathcal{G}(M_s^r, p)$ satisfies the following inequality:

$$\hat{p} \ge r^{-1}.$$

Sketch of Proof. The proof is based on the argument that was used by M. Nehéz in [18]. Let $S$ be a source node and $R$ be a receiver. Nodes $S$ and $R$ of $V(M_s^r)$ are considered as r-dimensional vectors over $\{1, \dots, s\}$. We assume $s$ to be an increasing sequence $(s_i)_{i=1}^{\infty}$, whereas $r$ is a constant. Let $Q_{S,R}$ denote the following property: "graph $M \in 2^{M_s^r}$ contains at least one path $P_{S,R}$ of length $\mathrm{dist}_{M_s^r}(S, R) = l$". Note that by Proposition 1, $Q_{S,R}$ is an increasing property. Supposing $l$ is related to $s$ ($l \le s$), $l$ can also be considered as an increasing sequence. Let $X_{S,R}$ be a random variable on $\mathcal{G}(M_s^r, p)$ associated with the property $Q_{S,R}$, that is, $X_{S,R}$ counts the number of paths $P_{S,R}$ in the reliability mesh $M \in \mathcal{G}(M_s^r, p)$. The expectation $E(X_{S,R})$ of the random variable $X_{S,R}$ is given by:

$$E(X_{S,R}) = \binom{l}{l_1, \dots, l_r} p^l = \frac{l!}{l_1! \cdots l_r!}\, p^l,$$

where $l_i$ denotes the distance of the $i$th coordinates of $S$ and $R$ for each $i \in \{1, \dots, r\}$. The function $E(X_{S,R})$ attains its maximum value for $l_1 = \dots = l_r = l/r$. Putting $l_1 = \dots = l_r = l/r$ we obtain the following inequality:

$$E(X_{S,R}) \le l! \cdot p^l \cdot [(l/r)!]^{-r}.$$

Note that for $p = 1/r$ it holds that $p^l \cdot r^l = 1$. Stirling's formula $l! \sim \sqrt{2\pi l}\,(l/e)^l$, the fact that $r \ll l$ and Markov's inequality, cf. [10], yield:

$$\Pr[X_{S,R} \ge 1] \le E(X_{S,R}) < 1$$

for each $p < 1/r$. Indeed, the multinomial coefficient is at most $r^l$, so $E(X_{S,R}) \le (pr)^l$, which tends to 0 as $l \to \infty$ whenever $p < 1/r$. It means that the probability that there is a shortest path $P_{S,R}$ induced by the graph $M_s^r$ in $\mathcal{G}(M_s^r, p)$ approaches 0 as $l \to \infty$. Let us consider an arbitrary one-to-all induced shortest path communication protocol $\mathcal{X}$. Consequently, the probability that a packet sent from $S$ will be delivered to any other node according to $\mathcal{X}$ approaches 0 as $l \to \infty$. Hence, the reception requirement does not hold for an arbitrary protocol $\mathcal{X}$ for each $p < 1/r$. ♦

The refinement of the threshold function for the two-dimensional case is the following.

Theorem 2 If $r = 2$ then the threshold function $\hat{p}$ for the same problem in a reliability mesh network $M \in \mathcal{G}(M_s^2, p)$ satisfies:

$$\hat{p} \ge \frac{2^{2/3}}{3}.$$

The proof of Theorem 2 is based on the fact that for $r = 2$ we can substitute the average values $l_1 = s/3 + O(1)$, $l_2 = 2s/3 + O(1)$ and $l = s + O(1)$ into the formula for $E(X_{S,R})$.
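The bounds in Theorems 1 and 2 can be checked numerically from the formula for $E(X_{S,R})$. The following Python sketch is an illustration under stated assumptions rather than part of the original proof: the function names are hypothetical, and `zero_growth_probability` uses only the leading exponential order of the multinomial coefficient (polynomial factors from Stirling's formula are ignored). If the coordinate distances are fixed fractions $f_i = l_i / l$ of the path length, then $E(X_{S,R})$ decays to 0 exactly when $p < \prod_i f_i^{f_i}$.

```python
import math


def log_expected_isp_count(offsets, p):
    """Natural log of E(X_{S,R}) = l! / (l_1! ... l_r!) * p^l.

    `offsets` holds the coordinate distances l_1, ..., l_r between the
    source S and the receiver R; l is their sum.
    """
    l = sum(offsets)
    log_multinomial = math.lgamma(l + 1) - sum(math.lgamma(li + 1) for li in offsets)
    return log_multinomial + l * math.log(p)


def zero_growth_probability(fractions):
    """Value of p at which E(X_{S,R}) neither grows nor decays exponentially.

    `fractions` are the ratios l_i / l and must sum to 1. To leading
    exponential order the multinomial coefficient is exp(l * H), with
    H = -sum f_i ln f_i, so the critical value is exp(-H) = prod f_i^f_i.
    """
    return math.exp(sum(f * math.log(f) for f in fractions if f > 0))
```

With equal fractions $1/r$ the second function returns $1/r$, the bound of Theorem 1, and with fractions $(1/3, 2/3)$ it returns $2^{2/3}/3$, the bound of Theorem 2. The first function shows the same behaviour for finite lengths: for offsets `(100, 200)` the logarithm of the expectation is negative at `p = 0.5` and positive at `p = 0.55`.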

Figure 2: Relative number of nodes reached by the broadcast, alpha [%], as a function of link probability p. Dimension values are r = 2, 3, 4 from the right to the left, respectively.

4 Experiments

We have carried out several simulation experiments with an oblivious communication protocol in a network with mesh topology described by a random graph from $\mathcal{G}(M_s^r, p)$. The aim of the simulations is to obtain the number of nodes reached by a broadcast from a single source as a function of the link probability.

Description of the Protocol. The communication protocol for one-to-all (and one-to-one) ISP single packet transfer in a network with the topology modeled by a random mesh $M \in \mathcal{G}(M_s^r, p)$ is based on flooding. The source node $S$ sends the packet to all of its neighbors, and then each node passes the packet to each of its output ports whose neighbor node has a greater distance from the source (in the base graph) than the current node. This prevents infinite loops and guarantees the termination property, though reception is not guaranteed (except for the case $p = 1$). The broadcast stops when all packets reach boundary nodes (those with degree less than $2r$ in the base graph) which have no neighbor with greater distance from the source. For the implementation it is sufficient to store only $O(r)$ bits of information about the direction of the packet in the packet header. On the other hand, it was shown in [8] that the lower bound on the header size is $\Omega(\log s)$ for a two-dimensional mesh of size $s$ and an oblivious end-to-end communication protocol fulfilling both the termination and the reception property. In our case, if we set $r = 2$, the header size remains constant (regardless of the mesh size $s$). It depends only on the mesh dimension $r$, as the header bears the information about the direction of the message, not the whole address.

Simulation Experiments. The simulation program was implemented as a Perl script. For each simulation it is possible to set the mesh parameters $s$ and $r$, the edge probability $p$ and the number of simulation cycles over which the average number of reached nodes is computed. Since each node may have two links in each dimension, the memory needed to store information about the mesh is $\Theta(r s^r)$. This can be halved when links are bidirectional. We examined meshes with the number of nodes on the order of millions. The memory required was hundreds of megabytes. Most of the simulations were performed on a system with a Pentium 4 hyperthreading processor running at 2.6 GHz with 512 MB of memory. A number of other architectures (e.g. 2× UltraSPARC-II 400 MHz, 2 GB RAM; a cluster of 16× Celeron 733 MHz, 256 MB RAM) were also tested in order to speed up the simulations. Simulation time ranged from hours to days, depending on the particular parameter settings.

Experimental Results. Assume we have a random mesh $M \in \mathcal{G}(M_s^r, p)$. Let $R_p(M)$ be the average number of nodes in the mesh $M$ reached from the source. Then we define $\alpha_p(M)$ to be the relative number of nodes reached by the broadcast as a function of the link probability and the mesh parameters:

$$\alpha_p(M) = \frac{R_p(M)}{s^r}.$$

In general, the expected number of nodes reached by the broadcast is a function of three parameters, i.e. $\alpha_p(M) = \alpha(s, r, p)$. For a particular graph with fixed dimension $r$ and size $s$ we consider $\alpha_p(M)$ as a function of the link probability and we denote it $\alpha(p)$. The simulation results for mesh dimensions $r \in \{2, 3, 4\}$ are summarized in the following table and the corresponding functions are depicted in Figure 2.

| Graph | Cycles | Time | $p_c$ |
|---|---|---|---|
| $M_{1280}^{2}$ | 1000 | 235h41m | 0.6492 |
| $M_{50}^{3}$ | 500 | 4h00m | 0.4460 |
| $M_{30}^{4}$ | 50 | 24h41m | 0.3262 |

As we can see from the graphs, the phase transition phenomenon can be observed for each dimension. Let $p_c$ denote the critical value of probability. Then for $p < p_c$, the relative number of nodes reached by the packet is small, tending to zero. Above this value the expected number of reached nodes grows to 100%.
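The flooding protocol and the estimate of $\alpha(p)$ described in this section can be sketched in a few lines. The following Python code is a minimal illustration and not the authors' Perl script: the function names, the 0-based coordinates, the lazy sampling of links and the convention that the source itself counts as reached are assumptions made for this sketch.

```python
import random
from collections import deque


def flood_reached(s, r, p, source, rng):
    """Run distance-increasing flooding once; return the number of reached nodes.

    Each link is sampled lazily (operational with probability p) the one
    time the protocol tries to use it, which is equivalent to sampling
    the whole random mesh in advance.
    """
    def dist(node):
        # Distance from the source in the base graph (Manhattan distance).
        return sum(abs(a - b) for a, b in zip(node, source))

    reached = {source}  # assumption: the source counts as reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for axis in range(r):
            for step in (-1, 1):
                coord = node[axis] + step
                if not 0 <= coord < s:
                    continue  # no output port beyond the mesh boundary
                nxt = node[:axis] + (coord,) + node[axis + 1:]
                if dist(nxt) <= dist(node):
                    continue  # forward only away from the source
                link_up = rng.random() < p
                if link_up and nxt not in reached:
                    reached.add(nxt)
                    queue.append(nxt)
    return len(reached)


def estimate_alpha(s, r, p, source, cycles, seed=0):
    """Average relative number of reached nodes over `cycles` random meshes."""
    rng = random.Random(seed)
    total = sum(flood_reached(s, r, p, source, rng) for _ in range(cycles))
    return total / (cycles * s ** r)
```

Calling `estimate_alpha(s, r, p, source, cycles)` for a range of equidistant values of `p` gives a sampled curve of the kind shown in Figure 2. Because every node forwards only to neighbors that are farther from the source, each node is processed at most once and the procedure always terminates.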

The critical value $p_c$ moves slightly towards higher values as the graph size increases. For infinitely large graphs it reaches the theoretical threshold, so we have $\hat{p} = \lim_{s \to \infty} p_c$. The obtained results are in good correspondence with the theoretical lower bounds derived for the case of communication on diagonal mesh elements. A more precise probabilistic analysis that takes into account all nodes, including non-diagonal ones, may yield better lower bounds; the experimental results suggest this as well.
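The critical values $p_c$ reported in this section are read off the simulated curves $\alpha(p)$ by a tangent construction: at the sample $p_i$ where the first difference of $\alpha$ is largest, the tangent is extended down to the $p$ axis, which gives $p_c = p_i - \lambda \cdot \alpha(p_i)$ with $\lambda = (p_{i+1} - p_i) / (\alpha(p_{i+1}) - \alpha(p_i))$. A minimal Python sketch of this computation follows. The function name `critical_probability` and the sample values in the usage note are hypothetical and are not taken from the experiments.

```python
def critical_probability(ps, alphas):
    """Estimate p_c from samples alpha(p_k) by the tangent construction.

    `ps` are increasing sample probabilities and `alphas` the matching
    values of alpha (as fractions or percentages; the result is the same,
    because lambda carries the inverse unit of alpha).
    """
    if len(ps) != len(alphas) or len(ps) < 2:
        raise ValueError("need at least two matching samples")
    # First differences approximate the derivative on an equidistant grid.
    diffs = [alphas[k + 1] - alphas[k] for k in range(len(ps) - 1)]
    i = max(range(len(diffs)), key=diffs.__getitem__)  # steepest step
    if diffs[i] <= 0:
        raise ValueError("alpha does not increase anywhere")
    lam = (ps[i + 1] - ps[i]) / diffs[i]  # inverse slope at p_i
    return ps[i] - lam * alphas[i]
```

For the made-up samples `ps = [0.0, 0.25, 0.5, 0.75, 1.0]` and `alphas = [0.0, 0.0, 0.2, 0.9, 1.0]`, the steepest step starts at $p_i = 0.5$, so $\lambda = 0.25 / 0.7$ and the estimate is $0.5 - 0.2 \cdot 0.25 / 0.7 = 3/7$, approximately 0.4286.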

Figure 3: Expected value E (+ points) and standard deviation V (• points) of the number of nodes reached in the $M_{30}^{2}$ mesh experiment, plotted against the link probability p.

The value of $p_c$ moves to the left significantly with increasing mesh dimension $r$. As the nodes possess more connections, even a relatively lower fraction of available links suffices to deliver the packet to a greater number of nodes. We can see in Figure 3 that the standard deviation of the number of reached nodes is zero for probabilities zero and one, where the result is known exactly. It has its maximum at the point of the highest slope of the expectation. When the slope of the curve, and thus its derivative, is high, the result is more sensitive to small changes of the probability $p$, hence the deviation of the resulting value is high too.

The critical value of probability corresponding to a particular curve was established as the intersection of the $p$ axis and the tangent to the curve at its point of inflection $p_i$. We approximate the derivative by the first difference, hence the point of inflection $p_i$ can be found as the probability value where the difference takes its maximum. Thus, for the value of the critical probability we have:

$$p_c = p_i - \lambda \cdot \alpha(p_i), \quad \text{where} \quad \lambda = \frac{p_{i+1} - p_i}{\alpha(p_{i+1}) - \alpha(p_i)} \approx \left( \left. \frac{d\alpha(p)}{dp} \right|_{p = p_i} \right)^{-1}.$$

The probability values $p_k \in [0, 1]$ used for the simulation are equidistant. The quantity $\lambda$ is an approximation of the inverse of the first derivative at the point $p_i$ in which the difference takes its maximum. Although this point might be independent of the graph size $s$, unfortunately the first derivative depends on it anyway.

5 Conclusions

We have shown that the problems of one-to-all and one-to-one induced shortest path communication in reliability mesh networks are related to the phase transition phenomena observed in the directed bond percolation model. The following table summarizes our theoretical and experimental values of the thresholds for the problem, in comparison with the known result for directed bond percolation.

| r | Lower bound on $\hat{p}$ | $p_c$ | Bond percolation threshold $p_t$ |
|---|---|---|---|
| 2 | 0.523 [Thm. 2] | 0.6492 | 0.6445 ± 0.0005 [12] |
| 3 | 1/3 [Thm. 1] | 0.4460 | |
| 4 | 1/4 [Thm. 1] | 0.3262 | |

Acknowledgement. This paper was completed within a project of the Gratex Research Centre. The work was supported by the Gratex Research Centre and the Slovak Science Grant Agency VEGA, projects No. 1/0162/03 and 1/0157/03. The authors thank Mrs. Mária Markošová and Mr. Viliam Solčány for the fruitful discussions and valuable comments on this work. Many thanks also to the anonymous referees for their suggestions on the manuscript.

Figure 4: Results of two simulations on meshes $M_{16}^{2}$ with a total of n = 256 nodes. The source of broadcasting is marked with a double circle. White circles represent unreached nodes while solid circles represent visited ones. On the left side the link probability is p = 0.7 and the result is $R_M(0.7) = 76$, $\alpha(0.7) = 29.6875\%$. On the right side $R_M(0.9) = 217$, $\alpha(0.9) = 84.7656\%$. The source S was chosen randomly within the mesh.

References

[1] M. Adler, F. Fich: The Complexity of End-to-End Communication in Memoryless Networks, In Proc. 18th ACM Symposium on Principles of Distributed Computing, PODC'99, 239–248.
[2] M. Ajtai, J. Komlós, E. Szemerédi: Largest random Component of a k-Cube, Combinatorica, 2 (1), 1982, 1–7.
[3] N. Alon, P. Erdős, J. Spencer: The Probabilistic Method, John Wiley & Sons, New York, 1992.
[4] H. Attiya, J. Welch: Distributed Computing: Fundamentals, Simulations and Advanced Topics, McGraw-Hill, London, 1998.
[5] B. Bollobás: Random Graphs, Academic Press, New York, 1985.
[6] P. Erdős, A. Rényi: On the evolution of random graphs, Publ. Math. Inst. Hungar. Acad. Sci., 5 (1960), 17–61.
[7] F. Fich: End-to-End Communication, OPODIS 1998, 37–44.
[8] P. Fraigniaud, C. Gavoille: Lower bounds for oblivious single-packet end-to-end communication, In Proc. 17th Int. Symp. on Distrib. Comp. (DISC 2003), Springer, LNCS vol. 2848, 2003, 211–223.
[9] A. Goerdt: The Giant Component Threshold for Random Regular Graphs with Edge Faults, Theor. Comp. Sci. 259 (2001), 307–321.
[10] S. Janson, T. Łuczak, A. Ruciński: Random Graphs, John Wiley & Sons, New York, 2000.
[11] H. Kesten: The Critical Probability of Bond Percolation on the Square Lattice Equals 1/2, Communications in Math. Physics, 74 (1980), 41–59.
[12] W. Kinzel: Directed Percolation, In Percolation Structures and Processes, G. Deutscher, R. Zallen, J. Adler (Eds.), Bristol, England: Adam Hilger, 1983, 425–445.
[13] K. B. Lauritsen, K. Sneppen, M. Markošová, M. H. Jensen: Directed percolation with an absorbing boundary, Physica A 247 (1997), 1–9.
[14] F. T. Leighton: Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann Publishers, San Mateo, California, 1992.
[15] M. Markošová: Small World Networks (in Slovak), In Proc. Cognition and Artificial Life III, elfa, Košice, 2003, 227–232.
[16] M. Markošová, J. Antala: Directed site percolation on the square lattice - Pascal like triangle approach, unpublished manuscript, 2004.
[17] S. Nikoletseas, K. Palem, P. Spirakis, M. Yung: Connectivity Properties in Random Regular Graphs with Edge faults, Int. J. Foundation Comp. Sci., 11 (2), 2000, 247–262.
[18] M. Nehéz: Robustness Properties Thresholds in Reliability Networks, In Proc. 6th Int. MultiConference Information Society IS-TCS 2003, ABO Grafika, Ljubljana, 2003, 288–291.
[19] P. Ružička: On Efficiency of Path Systems Induced by Routing and Communication Schemes, Computing and Informatics, 20 (2001), 181–205.
[20] D. Stauffer, A. Aharony: Introduction to Percolation Theory, 2nd ed. London: Taylor & Francis, 1992.