Neighbourhood Broadcasting Schemes for Cayley Graphs with Background Traffic
D. D. Kouvatsos and I. M. Mkwawa
Department of Computing, University of Bradford
D.D.Kouvatsos, [email protected]
Abstract
The increase of computationally intensive parallel applications has prompted increased research into the analysis, design and development of high performance parallel computers. The success of parallel computers is highly dependent on the efficiency of their underlying interconnection networks (INs), and an important feature characterizing the suitability of an IN is its ability to disseminate information effectively among its processors. This paper proposes new neighbourhood broadcasting schemes for Cayley graphs of permutation groups formed by transposition with bursty background traffic and a single-port mode of message passing communication. The schemes utilise a maximum entropy (ME) based queue-by-queue decomposition algorithm for arbitrary queueing network models (QNMs) [1] and are based on binomial trees and graph theoretic concepts. It is shown that the cost of the neighbourhood broadcasting scheme for Cayley graphs of permutation groups formed by transposition is determined by summing, over all time steps, the largest link weight used at each step, where the number of time steps has an upper bound equal to … .

1. Introduction
Interconnection networks (INs) consist of a set of processing elements (or nodes) and communication channels, having their own local memory and inter-process communication schemes. These networks can generally be classified into static and dynamic networks, depending on the mode of communication amongst parallel computers. Dynamic networks are built out of switching components and are used in the design of shared memory multiprocessors, in which processors use a shared memory to send information to one another (c.f., Akers et al [2]). Static networks may not have switching devices and represent a fixed interconnection of stand-alone processors, each of which has its own memory. Examples of this type of network are distributed memory multicomputers such as the Intel Hypercube and Ncube machines (c.f., Akers et al [2]), in which all processors exchange or forward jobs through the network, as appropriate. Fast and efficient broadcasting schemes are fundamental prerequisites underpinning the computational algorithms of INs, such as those relating to fast Fourier transforms (FFTs), parallel matrix and graph manipulations and distributed network applications. For hypercube INs, the broadcasting of a message is required to support the transfer of control messages necessary for synchronization and/or the support of remote file access.

Two of the main generic schemes of broadcasting in static networks (c.f., Jang-Ping et al [3], Mendia and Sakar [4]), namely the one-to-all and all-to-all broadcasting schemes, support the process of passing one unit of information from a single processor and from every processor, respectively, to all other processors in the network. This is achieved by a sequence of calls over the links of a static network, where a processor can only call an adjacent processor and each call from one processor to another incurs a unit or a weighted time cost, as appropriate. Moreover, both these schemes may be combined with the neighbourhood broadcasting scheme, which is based on sending a job from a source node to all its adjacent nodes. Fertin and Raspaud [5], [6] studied neighbourhood broadcasting in paths, trees, cycles, 2-dimensional grids and 2-dimensional tori under the store-and-forward, single-port, unit cost model. For most of these families the optimal neighbourhood broadcasting scheme was given within an additive constant never exceeding 2. Neighbourhood broadcasting schemes have also been studied in [7] for star graphs, where it was shown that the upper bound is … unit time steps. However, these schemes are only applicable in the absence of background traffic. Neighbourhood broadcasting schemes without background traffic for Cayley graphs of permutation groups formed by transposition have been investigated by Mkwawa and Kouvatsos [8], with the tighter bound of … unit time steps.

This paper proposes new neighbourhood broadcasting schemes for a light traffic of special messages traversing Cayley graphs of permutation groups formed by transposition with bursty background multiple class traffic. The schemes facilitate inter-process communication amongst INs and are applicable to a single-port mode of message passing communication. Moreover, they make use of an extended product form approximation and a queue-by-queue decomposition algorithm, based on the principle of maximum entropy (ME) and the generalized exponential (GE) distribution (c.f., Kouvatsos [9]), for an arbitrary open queueing network model (QNM) with multiple job classes, random routing with class switching, head-of-line (HOL) priorities, complete buffer sharing (CBS) and repetitive service blocking with random destination (RS-RD) (c.f., Kouvatsos and Awan [1]). More specifically, the background traffic's mean aggregate delay (or response time) at each queueing station is used to determine a transmission cost for transforming the Cayley graphs into weighted directed graphs. Consequently, a binomial tree is embedded prior to the implementation of the neighbourhood broadcasting schemes. Cayley graphs of permutation groups are introduced in Section 2, together with the transformation of the original hypercube into a weighted graph. Neighbourhood broadcasting algorithms are devised in Section 3 and conclusions follow in Section 4.

This work is jointly supported by EU IST Programme Grant IST-200132392 and the School of Informatics, University of Bradford, UK.
ISBN: 1-9025-6009-4 © 2003 PGNet
Remarks
The CBS management scheme stipulates that jobs of any class can join a finite capacity queue as long as there is free buffer space.
RS-RD blocking occurs when a job, upon service completion at queue $i$, is rejected by its destination queue $j$ because that queue is full. Consequently, the blocked job immediately receives another service at queue $i$. This is repeated until the job completes service at queue $i$ at a moment when the downstream queue selected, independently of the initially chosen destination queue $j$, is not at full capacity.
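The RS-RD rule can be illustrated by a small simulation of one blocked job. This is a minimal sketch under our own simplifying assumptions: the occupancies of the downstream queues are frozen while the job is being re-served at its source queue, routing is given as a plain list of probabilities, and the function name `rs_rd_service` is hypothetical rather than part of [1].

```python
import random


def rs_rd_service(capacities, occupancy, routing_probs, rng):
    """Hypothetical sketch of RS-RD blocking for one job finishing service at queue i.

    capacities[j], occupancy[j]: buffer size and current number of jobs at
    downstream queue j (assumed frozen while the job is re-served).
    routing_probs[j]: probability of choosing queue j after each service.
    Returns (destination, number of services received at queue i).
    """
    reachable = [j for j, p in enumerate(routing_probs) if p > 0]
    if all(occupancy[j] >= capacities[j] for j in reachable):
        raise ValueError("every reachable destination queue is full")
    services = 0
    while True:
        services += 1  # one (possibly repeated) service at queue i
        # A new destination is drawn after every service, independently of
        # the destination that blocked the job previously.
        dest = rng.choices(range(len(routing_probs)), weights=routing_probs)[0]
        if occupancy[dest] < capacities[dest]:
            return dest, services
        # Destination full: the job is blocked and immediately served again.


rng = random.Random(7)
print(rs_rd_service(capacities=[2, 3], occupancy=[2, 1], routing_probs=[0.5, 0.5], rng=rng))
```

Drawing a fresh destination after every service is what distinguishes random destination (RD) from fixed destination blocking, where the job would keep retrying the same full queue.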
2. Cayley Graphs of Permutation Groups with Background Traffic
In this section, the ME decomposition algorithm of an open QNM (c.f., Kouvatsos and Awan [1]) is employed to transform Cayley graphs of permutation groups with bursty background traffic into weighted directed graphs, which are subsequently used to devise optimal neighbourhood broadcasting schemes for a light traffic of special messages in the network.
Consider a directed graph $G(U, L)$, comprising a set $U$ of vertices (or nodes) and a set $L$ of directed links, where a link $(u, v) \in L$ indicates a connection from vertex $u$ to vertex $v$. Cayley graphs of permutation groups with background traffic may generally be modelled as weighted directed graphs $G(U, L, W)$ whose vertices represent the processing elements (or nodes), whose links correspond to the communication channels, and where $W$ is a set of weights (or costs), one for each communication channel. Let $\Gamma$ be a finite group with '$\cdot$' called the product and $e$ the identity. Let $S \subseteq \Gamma$ be a generator set for $\Gamma$ such that $e \notin S$ and, if $g \in S$, then $g^{-1} \in S$. Given $(\Gamma, S)$, a Cayley graph $C(U, L)$ can be defined by $U = \Gamma$ and
$$L = \{(x, y) \mid x, y \in \Gamma,\ g \in S,\ y = x \cdot g\},$$
i.e., two directed links $(x, y)$ and $(y, x)$ are viewed as an undirected edge $(x, y)$ in the graph $C$. Since $S$ is a generator set for $\Gamma$, clearly $C$ is connected; $|S|$ is the degree of the Cayley graph $C$, and the choice of $S$ also determines its diameter.
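To make the definition concrete, the following Python sketch (ours, not taken from [8]) enumerates a Cayley graph by breadth-first search from the identity. The product convention $(x \cdot g)[i] = x[g[i]]$ and the helper names `compose`, `transposition` and `cayley_graph` are illustrative assumptions. Every transposition is its own inverse, so the generator sets used below satisfy $g \in S \Rightarrow g^{-1} \in S$ and every link $(x, y)$ is matched by $(y, x)$.

```python
from collections import deque

Perm = tuple  # a permutation of 0..n-1 stored as a tuple of images


def compose(x: Perm, g: Perm) -> Perm:
    """Product y = x.g, taken here as (x.g)[i] = x[g[i]].

    For a transposition g this swaps two positions (symbols) of x.
    """
    return tuple(x[g[i]] for i in range(len(x)))


def transposition(n: int, a: int, b: int) -> Perm:
    """Permutation of 0..n-1 exchanging a and b."""
    p = list(range(n))
    p[a], p[b] = p[b], p[a]
    return tuple(p)


def cayley_graph(generators: list) -> dict:
    """Adjacency sets of the Cayley graph C(Gamma, S) for S = generators.

    Breadth-first search from the identity enumerates the group Gamma
    generated by S; each vertex x is linked to x.g for every g in S.
    """
    n = len(generators[0])
    identity = tuple(range(n))
    adj = {identity: set()}
    queue = deque([identity])
    while queue:
        x = queue.popleft()
        for g in generators:
            y = compose(x, g)
            adj[x].add(y)
            if y not in adj:
                adj[y] = set()
                queue.append(y)
    return adj


# Star graph S_4: transpositions exchanging the first symbol with each other one.
star4 = cayley_graph([transposition(4, 0, i) for i in (1, 2, 3)])
# 3-cube: three disjoint (commuting) transpositions generate a group of order 8.
q3 = cayley_graph([transposition(6, 0, 1), transposition(6, 2, 3), transposition(6, 4, 5)])
print(len(star4), len(q3))  # 24 8
```

Representing $Q_3$ by three disjoint transpositions is one way to view the hypercube as a Cayley graph of a permutation group formed by transposition; both graphs above have degree $|S| = 3$.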
Note that a hypercube is taken as a case study to represent all Cayley graphs of permutation groups formed by transposition. An n-dimensional hypercube is a vertex symmetric graph with edge connectivity equal to $n$. Let a network of $N = 2^n$ processors be connected as an n-dimensional hypercube $Q_n$ with $2^n$ nodes and $n2^{n-1}$ links, i.e., the graph whose nodes are all binary strings (or symbols) of length $n$, where two nodes $u$ and $v$ are connected by a link iff $v = u \oplus g$ for a generator $g$, and $\oplus$ is an exclusive-or operation. In other words, a generator makes the corresponding strings differ in precisely one position.

The binomial tree $B_n$ is an ordered tree defined recursively: $B_0$ is a single node and $B_n$ is formed by linking the roots of two copies of $B_{n-1}$. The binomial tree $B_n$ has $2^n$ nodes, its height is $n$ and its root has degree $n$. The maximum degree of any node in an N-node binomial tree is $\log_2 N$. The binomial tree is the only tree that gives an optimal broadcasting time in a single-port mode of message passing communication.

In the context of this paper, a weighted directed graph of an IN with background traffic may be interpreted as an arbitrary open QNM with finite capacity queueing stations corresponding to the congested links of the Cayley graphs. In such networks, it is assumed that special messages occasionally have to be broadcast according to an optimal scheme in the presence of an external bursty background traffic of multiple class messages, emanating from a source node (or nodes) of the IN and propagating to other nodes according to a compound Poisson process (CPP) with geometrically distributed batches (or, equivalently, GE-type interarrival times). Under random routing with class switching, these messages visit some, but not necessarily all, link queues of the network, subject to HOL priority scheduling, the CBS management scheme and RS-RD blocking.

An optimal broadcasting scheme dealing with the light traffic of special messages should exploit, at any given time, those routes of the network consisting of the less congested link queues utilized by the bursty background traffic. As a consequence, special messages will be broadcast in an optimal fashion, causing minimal interference with the background traffic. Thus, as soon as a special message arrives at a source node of the Cayley graph for broadcasting, it is essential for an optimal scheme to have timely, reliable and up-to-date information regarding the state of the network's congestion and, in particular, basic performance metrics for the link queues of the network. Without loss of generality, the communication cost experienced by a special message arriving at a link queue of the QNM prior to broadcasting is taken to be equivalent to the mean aggregate delay of bursty background messages at the same link. To this end, the cost-effective analytic ME decomposition algorithm of [1] is employed to compute the mean aggregate delay of the bursty traffic at each link queue, thus facilitating the transformation of the Cayley graph into a weighted directed graph $G(U, L, W)$.

Consider an arbitrary open QNM at equilibrium with $M$ single GE-type server queues, $R$ distinct HOL priority classes (indexed from 1 to $R$ in descending order of priority), GE-type external interarrival times, random routing with class switching, the CBS buffer management scheme and RS-RD blocking. Each queueing station $i$ is assumed to be modelled by a building block GE/GE/1/$N_i$/HOL/CBS queue with finite capacity $N_i$.

Notation
For each queue $i$ ($i = 1, \ldots, M$) and job class $r$ ($r = 1, \ldots, R$), let
$\lambda_{ir}, C^2_{air}$ be the mean rate and SCV of the overall actual interarrival process of class $r$ jobs at queue $i$, respectively,
$\mu_{ir}, C^2_{sir}$ be the mean rate and SCV of the actual service process of class $r$ jobs at queue $i$, respectively,
$n_{ir}$ be the number of jobs of class $r$ at queue $i$ waiting and receiving service, and $\mathbf{n}_i = (n_{i1}, n_{i2}, \ldots, n_{iR})$ be the aggregate joint state of queue $i$,
$\lambda_{0ir}, C^2_{0ir}$ be the mean arrival rate and SCV of the actual external interarrival process of class $r$ jobs at queue $i$, respectively,
$\lambda_{ijrs}, C^2_{aijrs}$ and $\hat{\lambda}_{ijrs}, \hat{C}^2_{aijrs}$ be the mean arrival rates and SCVs of the actual and effective, respectively, interarrival processes of class $r$ jobs transmitted from queue $i$ to queue $j$ as class $s$ jobs,
$\alpha_{ir0}, \alpha_{ijrs}$ be the transition probabilities (first order Markov chain) that a class $r$ job transmitted from queue $i$ leaves the network or attempts to join queue $j$ as a class $s$ job, respectively.

The notation of [1] also applies in this and subsequent sections, with the difference that each link queue is simply represented as $kl$, whilst the class subscript is separated by a comma. To this end, the weight at each link queue $kl$ may be defined by
$$\omega_{kl} = D_{kl} + \frac{1}{\mu_{\max}}, \tag{1}$$
where $D_{kl}$ is the mean aggregate delay of bursty traffic at link $kl$, and $1/\mu_{\max}$ is the mean transmission time of a special message at the link, with $\mu_{\max}$ being the largest transmission rate, associated with the highest priority class of the background traffic. A computational procedure named Prebroad, describing the transformation of the hypercube (as a case study) into a weighted directed graph $G(U, L, W)$ via the ME analysis of a corresponding QNM, is presented below. Procedure Prebroad utilizes the ME decomposition algorithm and leads to the initiation of the proposed neighbourhood broadcasting schemes (c.f., Section 3). A diagrammatic illustration of procedure Prebroad can be seen in Figure 1.

Procedure Prebroad
Input Data: the parameters of the open QNM, as defined in the Notation above
Begin
Step 1 Formulate, initially, a directed graph $G(U, L)$ of the hypercube;
Step 2 Create an open QNM corresponding to the directed graph $G(U, L)$;
Step 3 Apply the queue-by-queue ME decomposition algorithm to the QNM;
Step 4 Determine the mean aggregate delay $D_{kl}$ at each link queue $kl \in L$ of the QNM;
Step 5 Transform the hypercube, via the QNM, into a directed weighted graph $G(U, L, W)$ with weights $\omega_{kl} \in W$;
Step 6 Initiate neighbourhood broadcasting schemes;
End
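As a rough illustration of Steps 4 and 5 of Prebroad, the Python sketch below builds link weights for the 3-cube of Figure 1. It assumes that (1) simply adds the two quantities described after it, the mean aggregate delay $D_{kl}$ and the mean transmission time $1/\mu_{\max}$. The ME decomposition of [1] (Steps 3 and 4) is not reproduced; a plain M/M/1 mean response time $1/(\mu - \lambda)$ is used purely as a stand-in, and the rates are hypothetical. The node labelling and the directed link queues are read off Figure 1.

```python
# Node labels of Figure 1 and the binary strings they stand for.
LABELS = {1: "000", 2: "010", 3: "100", 4: "001",
          5: "011", 6: "101", 7: "110", 8: "111"}

# Directed link queues kl of the open QNM drawn in Figure 1.
LINKS = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 6), (3, 7),
         (4, 5), (4, 6), (5, 8), (6, 8), (7, 2), (8, 7)]


def is_hypercube_link(k: int, l: int) -> bool:
    """True if the labels of k and l differ in exactly one position."""
    return sum(a != b for a, b in zip(LABELS[k], LABELS[l])) == 1


def mm1_delay(arrival_rate: float, service_rate: float) -> float:
    """Stand-in for Steps 3-4: M/M/1 mean response time 1/(mu - lambda).

    NOTE: this replaces the ME decomposition of [1] purely for illustration.
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable")
    return 1.0 / (service_rate - arrival_rate)


def prebroad_weights(delays: dict, mu_max: float) -> dict:
    """Step 5 (assumed form of (1)): omega_kl = D_kl + 1/mu_max for each link kl."""
    if mu_max <= 0:
        raise ValueError("mu_max must be positive")
    return {kl: d + 1.0 / mu_max for kl, d in delays.items()}


delays = {kl: mm1_delay(0.5, 1.0) for kl in LINKS}  # hypothetical rates
weights = prebroad_weights(delays, mu_max=2.0)
print(weights[(1, 2)])  # 2.5
```

Swapping `mm1_delay` for the output of a real ME decomposition would leave `prebroad_weights` unchanged, which is the point of separating Step 4 from Step 5.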
Figure 1. Hypercube transformation with background traffic into a weighted hypercube. Panels: initial directed hypercube with background traffic generated at the source node 1 (nodes 1 to 8 label 000, 010, 100, 001, 011, 101, 110 and 111); hypercube G(U,L) transformation into an open QNM (link queues 12, 13, 14, 25, 36, 37, 45, 46, 58, 68, 72 and 87); ME queue-by-queue decomposition of the open QNM and computation of mean aggregate delays; QNM transformation into weighted hypercube G(U,L,W) and computation of weights; a revised weighted hypercube G(U,L,W) with weights ω_kl.

3. Neighbourhood Broadcasting Scheme
For any given graph $G$, let $nb(G)$ be the optimal number of neighbourhood broadcast time steps for $G$. It is easy to show that
$$\lceil \log_2(\Delta(G) + 1) \rceil \le nb(G) \le \Delta(G), \tag{2}$$
where $\Delta(G)$ is the maximum degree of $G$. The lower bound above is derived by counting the number of informed vertices, which at most doubles in each time step, while the source and its $\Delta(G)$ neighbours must all be informed; the upper bound corresponds to the trivial scheme in which the source calls each neighbour directly. In this section an upper bound is proposed for the neighbourhood broadcasting scheme, assuming that the trivial neighbourhood broadcasting, i.e., direct communication from the source node to its neighbouring nodes, accrues a very large cost. Focusing on Cayley graphs of permutation groups, one of the major requirements is to find the minimum number of steps before establishing the least cost involved. To this end, the following proposition can be used to define an optimal neighbourhood broadcasting scheme for hypercubes:
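The doubling argument behind the lower bound can be checked numerically. Starting from the source alone, a single-port step can at most double the number of informed vertices, and the source plus its $\Delta(G)$ neighbours must all be informed, so at least $\lceil \log_2(\Delta(G)+1) \rceil$ steps are needed. The Python sketch below (the function name `nb_lower_bound` is ours) counts doublings directly and compares the result with the closed form.

```python
import math


def nb_lower_bound(max_degree: int) -> int:
    """Fewest single-port steps able to reach the source plus max_degree neighbours.

    The informed set can at most double per step, so we count doublings
    until it could hold max_degree + 1 vertices.
    """
    if max_degree < 0:
        raise ValueError("degree must be non-negative")
    informed, steps = 1, 0
    while informed < max_degree + 1:
        informed *= 2
        steps += 1
    return steps


# The doubling count agrees with ceil(log2(Delta + 1)); e.g. Q_10 has Delta = 10.
assert all(nb_lower_bound(d) == math.ceil(math.log2(d + 1)) for d in range(1, 200))
print(nb_lower_bound(10))  # 4
```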
Proposition 1 Given a hypercube graph $Q_n$, for any sequence with $k$ ($2 \le k \le n$) altered symbols, a cycle of length $2k$ is formed.

Clearly, Proposition 1 holds since, if any symbol is not in its original position, it takes one step to restore it to its original position: $k$ steps alter the symbols and $k$ further steps restore them. The following is an example of Proposition 1.

Example 1 For a $Q_n$, the sequence of generators is …, and the sequence of permutations is … .
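Proposition 1 can be checked on small cases. In the Python sketch below (our own illustration, with the hypothetical name `cycle_from_altered_symbols`), nodes of $Q_n$ are integers, each generator flips one bit, and the $k$ altered positions are restored in the same order in which they were altered. The walk visits the prefix sets $\{p_1\}, \{p_1, p_2\}, \ldots$ on the way out and suffix sets not containing $p_1$ on the way back, so for $k \ge 2$ its $2k$ vertices are distinct and it is a cycle of length $2k$.

```python
def cycle_from_altered_symbols(start: int, positions: list) -> list:
    """Closed walk in Q_n: alter each position once, then restore them in the same order.

    Each step applies one generator (flips one bit), so k altered symbols
    take k steps out and k steps back.
    """
    walk = [start]
    node = start
    for p in positions + positions:
        node ^= 1 << p
        walk.append(node)
    return walk


walk = cycle_from_altered_symbols(0b000, [0, 1])
print([format(v, "03b") for v in walk])  # ['000', '001', '011', '010', '000']
```

With two altered symbols the walk is the four-node cycle that the hypercube scheme uses to forward a message along a path of length 2.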
Other propositions for Cayley graphs of permutation groups formed by transposition, such as the star, bubble sort, modified bubble sort and extended hypercube (versions 1 and 2) graphs, can be seen in [8]. The scheme for hypercubes takes several forwarding phases, depending on the number of neighbours to be informed. Message dissemination is depicted in Figure 2, where other binomial trees emanating from the rest of the nodes are omitted for clarity of presentation. At each forwarding phase a message is broadcast to a group of binomial trees, each consisting of at most 8 vertices, so that each forwarding phase takes at most three time steps. By grouping the vertices of each forwarding phase into sets of at most 8, … forwarding phases are obtained and at most … vertices are informed. At each time step the source vertex repeatedly disseminates the message to its neighbours which, in turn, forward the message along a path of length 2 using a cycle which consists of four nodes. Upon receiving the message, other vertices will also forward the message along their … . It takes at most … time steps to inform … vertices in at most … forwarding phases.

Figure 2. Message dissemination for the hypercube (labels: source node 00000000,...,0; generators g1 to g10; phase one; second phase; return path).

Let … be the number of forwarding phases. It can easily be seen that for …, the return phase for the hypercube takes at most 2 time steps. For …, the return phase takes at most … time steps. The overall scheme for the hypercube takes … time steps and hence,
$$nb(Q_n) \le \;\ldots \tag{3}$$

Next, one needs to associate each time step with its corresponding accumulated weight. In this context, the complexity of a time step is defined as the largest weight of the links used in that step and, hence, the upper bound of the total cost for the neighbourhood broadcasting scheme is given by
$$C(Q_n) \le \sum_{t=1}^{T(Q_n)} \max_{(k,l) \in L_t} \omega_{kl}, \tag{4}$$
where $T(Q_n)$ is the number of time steps of the overall scheme for the hypercube, bounded as in (3), and $L_t$ is the set of links used at time step $t$.
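The step-cost rule (the cost of a time step is its largest link weight, and the total cost is the sum over steps) can be sketched in Python. This is a simplified illustration under our own assumptions: it performs a plain one-to-all binomial-tree broadcast on $Q_n$, in which every informed node $u$ sends at step $t$ to $u \oplus 2^t$, rather than the forwarding-phase neighbourhood scheme of Figure 2, and the weight function passed in is hypothetical.

```python
def binomial_broadcast(n: int, weight, source: int = 0):
    """Single-port binomial-tree broadcast on Q_n with per-step max-weight cost.

    At step t every informed node u sends once, to u XOR 2**t, so the set of
    informed nodes doubles; the cost of the step is the largest link weight used.
    Returns the list of step costs and the set of informed nodes.
    """
    informed = {source}
    step_costs = []
    for t in range(n):
        # Every informed node agrees with the source on bit t, so u ^ (1 << t)
        # is always a newly informed node and no two senders collide.
        links = [(u, u ^ (1 << t)) for u in sorted(informed)]
        step_costs.append(max(weight(u, v) for u, v in links))
        informed.update(v for _, v in links)
    return step_costs, informed


costs, informed = binomial_broadcast(2, lambda u, v: float(u + v))
print(costs, sum(costs), len(informed))  # [1.0, 4.0] 5.0 4
```

Here the total cost is `sum(costs)`, mirroring the summation in the bound on the total cost: a single heavily loaded link raises the cost of its whole step, which is why the scheme prefers routes through less congested link queues.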
The proposed optimal neighbourhood broadcasting scheme can be described by the following algorithm:

Neighbourhood broadcasting scheme
Input Data G(U, L).
Begin
Step 1 Run procedure Prebroad to create a weighted hypercube G(U,L,W);
Step 2 Embed the binomial tree into the hypercube;
// the inner for loop is executed in parallel
for t = 1 to T(Q_n) do  // T(Q_n): number of time steps of the hypercube scheme
    w_t := 0  // initialize weight at step t
    for all vertices i informed before step t do  // execute in parallel
        if i has a child j in the embedded binomial tree at step t then
            i sends to j
        end if
    end for
    assign the largest weight ω_ij of the links used at step t as the total weight w_t at step t
end for
C(Q_n) := w_1 + w_2 + ... + w_T(Q_n)  // obtain the neighbourhood broadcasting cost
End

Since it has been shown in [8] that for Cayley graphs of permutation groups formed by transposition the tighter bound is … unit time steps, the association of the accumulated weight at each time step follows the same course as for the hypercube and, hence, the neighbourhood broadcasting cost for Cayley graphs of permutation groups formed by transposition is given by
$$C(G) \le \sum_{t=1}^{T(G)} \max_{(k,l) \in L_t} \omega_{kl}, \tag{5}$$
where $T(G)$ is the bound on the number of time steps established in [8] and $L_t$ is the set of links used at time step $t$.

4. Conclusions
This paper proposes new neighbourhood broadcasting schemes for Cayley graphs of permutation groups formed by transposition with bursty background traffic having HOL priorities, the CBS scheme and RS-RD blocking. The schemes facilitate inter-process communication amongst parallel computers made up of Cayley graphs of permutation groups formed by transpositions and are applicable to a single-port mode of message passing communication. The schemes utilize a queue-by-queue ME decomposition algorithm for arbitrary open QNMs, based on the principle of ME, in conjunction with binomial trees and graph theoretic concepts. It is shown that, for any Cayley graph of a permutation group formed by transposition, the cost of the neighbourhood broadcasting scheme is given by
$$C(G) \le \sum_{t=1}^{T(G)} \max_{(k,l) \in L_t} \omega_{kl}. \tag{6}$$

References
[1] D. D. Kouvatsos and I. Awan. Entropy Maximization and Open Queueing Networks with Priorities and Blocking. Performance Evaluation 2003; 51, pp. 191–227.
[2] S. B. Akers, D. Harel and B. Krishnamurthy. The Star Graph: An Attractive Alternative to the n-cube. Internat. Conf. on Parallel Processing 1987; pp. 393–400.
[3] S. Jang-Ping, W. Chao-Tsung and C. Tzung-Shi. An Optimal Broadcast Algorithm Without Message Redundancy in Star Graphs. IEEE Parallel and Distributed Systems 1995; 6(6), pp. 653–658.
[4] V. E. Mendia and D. Sakar. Optimal Broadcasting on the Star Graph. IEEE Parallel and Distributed Systems 1992; 3(4), pp. 389–396.
[5] G. Fertin and A. Raspaud. Neighbourhood Communications in Networks. In Proc. Euroconference on Combinatorics, Graph Theory and Applications (COMB'01), Electronic Notes in Discrete Mathematics 2001; 10.
[6] G. Fertin and A. Raspaud. k-Neighbourhood Broadcasting. 8th Int. Colloquium on Structural Information and Communication Complexity (SIROCCO 2001) 2001; 11, pp. 133–146.
[7] I. M. Mkwawa and D. D. Kouvatsos. An Optimal Neighbourhood Broadcasting Scheme for Star Interconnection Networks. Journal of Interconnection Networks 2003; 4(1), pp. 103–112.
[8] I. M. Mkwawa and D. D. Kouvatsos. Neighbourhood Communication Schemes for Cayley Graphs of Permutation Groups. Technical Report, Department of Computing, University of Bradford 2003.
[9] D. D. Kouvatsos. Entropy Maximization and Queueing Network Models. Annals of Operations Research 1994; 48, pp. 63–126.
[10] V. M. Lo, S. Rajopadhye, S. Gupta, D. Keldsen, M. A. Mohamed and J. Telle. Mapping Divide and Conquer Algorithms to Parallel Architectures. Proc. of the 1990 International Conference on Parallel Processing, March 1990, pp. 128–135.
[11] J. Wu, E. B. Fernandez and Y. Luo. Embedding of Binomial Trees in Hypercubes with Link Faults. Journal of Parallel and Distributed Computing 1998; 54(1), pp. 59–74.
[12] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.