
JOURNAL OF NETWORKS, VOL. 4, NO. 10, DECEMBER 2009

Distributed Control Routing Algorithms for Rearrangeably Nonblocking Optical Banyan Networks

Md. Mamun-ur-Rashid Khandker
Dept. of Applied Physics and Electronic Engineering, University of Rajshahi, Bangladesh
Email: [email protected]

Abstract—Optical banyan networks can be made rearrangeably nonblocking by using multiple copies of the network in unison. Such rearrangeably nonblocking networks are known as vertically stacked optical banyan (VSOB) networks. The centralized control routing algorithm for these networks has time complexity O(N log N). A distributed control routing algorithm with time complexity O(log² N) has been proposed recently, in which the authors consider N completely connected processors to take the routing decision; in practice this would be very difficult to implement for large N. In this paper we propose two distributed control routing algorithms in which the processors are loosely connected. In the first algorithm, O(√N) processors work in pipelined fashion and take the routing decision in linear time. In the second algorithm, N inputs synchronously hunt for a free route among O(√N) copies of a banyan network using a lookup table. This algorithm has time complexity O(√N). Moreover, the processors do not need to be completely connected, which makes it more practical to implement.

Index Terms—Vertically Stacked optical Switch networks, Nonblocking Switch Networks, Completely connected graph, Bipartite graph.

I. INTRODUCTION

The ubiquitous presence of multimedia applications and the tremendous popularity of the World Wide Web have dramatically increased the amount of traffic over the Internet, which creates an urgent requirement for high-bandwidth links and switches. Owing to progress in DWDM technology, link bandwidth for carrying such traffic is no longer a problem. In DWDM networks, the optical cross-connects (OXCs) in the nodes are key functional elements that steer the whole network traffic by simultaneously switching a huge number of optical flows. To build a large-scale OXC, small (e.g., 2×2) switching elements (SEs), such as directional couplers, are generally used as the basic building blocks and are interconnected in a particular fashion to satisfy the required connectivity. Here we refer to the interconnection pattern of the optical links, the basic SEs, and the input/output ports of the

Corresponding author: [email protected]

© 2009 ACADEMY PUBLISHER doi:10.4304/jnw.4.10.952-959

switch as the optical switch network. Directional couplers (DCs) [1] are considered promising candidates for the basic SE, since they can handle optical signals of several terabits per second using WDM technology. However, a DC suffers from an intrinsic crosstalk problem [1][2]: a portion of the optical power in one waveguide of a DC is unintentionally coupled into the other waveguide when two optical flows pass through the DC at the same time. This undesirable coupling effect is called first-order crosstalk; it may propagate downstream stage by stage, leading to higher-order crosstalk of decreasing magnitude in each downstream stage.

According to their blocking properties, switching networks are classified as blocking or nonblocking. A network is blocking if a free input cannot be connected to a free output because of internal link blocking. If multiple inputs intend to connect to the same output, however, the resulting output contention blocks all requests but one; this is mainly regarded as a property of the traffic pattern offered to the network. Crosstalk in photonic switching networks adds a new dimension of blocking, called crosstalk blocking, which happens when an SE has two active inputs/outputs. A cost-effective solution to the crosstalk problem is to make sure that only one signal passes through each SE of a network at a time. Although this kind of space dilation can eliminate blocking and crosstalk completely, it may increase the hardware cost significantly if the interconnection pattern of the switching network is not chosen carefully.

Since optical memory is not yet mature enough to be used in an efficient queuing system, nonblocking networks have been favored in optical switching systems. There are three types of nonblocking networks: strictly nonblocking (SNB), wide-sense nonblocking (WSNB) and rearrangeably nonblocking (RNB). In both SNB and WSNB networks, a connection between a free input and a free output can always be established without interrupting existing connections; however, there is a difference. In SNB networks, every input/output pair has a dedicated path, whereas in WSNB networks a smart algorithm chooses the path from several alternatives so that the connection does not block any future connection. Therefore, SNB networks have higher hardware cost


than WSNB networks, but the latter have higher routing time complexity than the former. RNB networks can also establish connections between idle inputs and outputs, but they may have to rearrange existing connections. They have the lowest hardware cost among the three, and their routing time complexity often lies between those of WSNB and SNB networks.

Banyan networks [3], although blocking, are attractive for constructing DC-based optical switches because of their small depth, absolute loss uniformity (each path goes through exactly the same number of DCs), simple switch setting (self-routing, and therefore time complexity O(log N)), and low hardware cost (switch complexity O(N log N)). However, optical implementation of banyan networks results in severe blocking and crosstalk. One approach to this problem is to keep the whole network nonblocking as well as crosstalk-free by vertically stacking multiple copies (planes) of an optical banyan network [4]. The resulting network is called a vertically stacked optical banyan (VSOB) network; it preserves all the properties of banyan networks except self-routing.

There have been many works on the performance of VSOB networks. However, performance has mainly been characterized by blocking probability and switch count [4]-[7], [11]. It has been shown that a total of $T = 2^{\lfloor (\log N + 1)/2 \rfloor}$ planes are required to make the switch network rearrangeably nonblocking, with a routing algorithm that has time complexity O(N log N). This time complexity may be too high when N is large. For optical interconnections, where data rates are very high and no viable optical memory exists, routing decisions should be taken as fast as possible. Lu et al. [8] have proposed an O(log² N) parallel routing algorithm in which N completely connected processors are used and information is exchanged among the N nodes at every computation step.
They recursively use a balanced 2-coloring algorithm on a bipartite graph G(N,k,g) to obtain a g-edge coloring of the graph. G(N,k,g) is a bipartite graph with N/g vertices (in each of V1 and V2) and k edges, where at most g edges are incident to any vertex. When k = N, creating such a graph would take O(N) time on a single processor, but it is counted as O(1) because all N processors are completely connected. In practice it is very difficult to build a completely connected network when N is large. This has encouraged us to develop a more pragmatic solution to the problem. In our first algorithm, T/2 processors work in pipelined fashion and are therefore connected very loosely: information is exchanged only between two adjacent processors. The routing time complexity of this algorithm is O(N). In our second algorithm, each of the N inputs is provided with a controller of limited processing ability, and the inputs hunt for the right plane, in which the signal can be routed without blocking. Every input looks into a ‘plane status table’ to learn the busy/free status of a particular plane for that input. The plane status table keeps the status information in such a way that an input can resolve blocking in O(T) time, where T is the number of planes.

All logarithms used in this article are of base 2.

Figure 1. (a) N×N VSB network. (b) A 16×16 banyan network.

The rest of this paper is organized as follows. Section II briefly introduces the problem and its background. Section III proposes our first distributed control algorithm. Section IV describes the second distributed control algorithm and compares its performance with that of previous works. We conclude the paper in Section V.

II. BACKGROUND

A. Vertically Stacked Banyan (VSB) Networks

Vertically stacked banyan (VSB) networks [4][7] are constructed by stacking T copies of an N×N optical banyan network. If T is the number of banyan networks required to make the network rearrangeably nonblocking as well as crosstalk-free, then

$$T = 2^{\lfloor (n+1)/2 \rfloor} \tag{1}$$

Here n is the number of stages in the banyan network, i.e., n = log N. Fig. 1 shows an N×N VSB network. Circles at the inputs represent 1:T input switches and circles at the outputs represent T:1 combiners. An input request is sent to only one banyan network (plane) at a time, so only one of the T links is busy at any time. On the output side, a combiner receives a signal from only one plane, since the request is considered to be a permutation. Therefore, switching is not required at the output side. The input request is split into partial permutations (CRPPs) in such a way that each CRPP can be realized in a banyan network without first-order crosstalk. A centralized control circuit determines the T partial permutations and sets the input switches accordingly to route all signals of a CRPP to one banyan plane. Being a banyan network, each plane is self-routing.
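As a quick numerical check of Eq. (1), the following Python sketch computes T for a few network sizes. The function name `num_planes` is ours, chosen only for illustration, and N is assumed to be a power of two.

```python
def num_planes(N: int) -> int:
    """Number of banyan planes T = 2**floor((n + 1) / 2), with n = log2(N)."""
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of two and at least 2")
    n = N.bit_length() - 1      # number of stages, n = log2(N)
    return 2 ** ((n + 1) // 2)


if __name__ == "__main__":
    for N in (8, 16, 32, 64):
        print(f"N = {N:3d}  ->  T = {num_planes(N)}")
```

For even n this gives T = √N, and for odd n it gives T = √(2N), so the number of planes grows as the square root of the network size.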


1) Crosstalk-free Realizable Partial Permutation (CRPP)

A full permutation is split into T partial permutations in such a way that each of them can be realized in a banyan network without any crosstalk or blocking. Each of these partial permutations is called a crosstalk-free realizable partial permutation (CRPP) [4]. Assuming that traffic arrives in the form of a permutation, a request is routed by (a) decomposing the permutation into CRPPs and (b) sending the T CRPPs to the T planes of banyan networks.

2) CRPP Decomposition Algorithm

As indicated in [4], a CRPP decomposition algorithm can be obtained by repeated two-coloring of a bipartite graph. Due to space limitations we describe the algorithm only briefly. A bipartite graph G = (V1, V2, E) is constructed in which each input switch is a node in set V1 (input set) and each output switch is a node in set V2 (output set). An edge represents a connection request from an input switch to an output switch. Since each switch is a 2×2 DC, the node degree is always 2. Using Euler's split, G is split into G1 (consisting of all forward-oriented edges) and G2 (consisting of all reverse-oriented edges). Each of these bipartite graphs has N/2 input nodes, N/2 output nodes and N/2 edges, and represents a partial permutation. However, these partial permutations are only guaranteed to be crosstalk-free in the input and output stages.

In the next iteration, G1 is reconstructed by merging every two adjacent nodes (in both the input set and the output set). The new G1 therefore has N/4 input nodes, N/4 output nodes and N/2 edges, and the node degree is again 2. Using Euler's split in the same way as before, G1 is split into two graphs. G2 is split in the same way. We thus get 4 partial permutations, each represented by a graph. Using this technique recursively, T partial permutations are generated, each of which is a CRPP. Xiang et al. [4] have shown that it takes O(N log N) time to generate these CRPPs. Considering that it takes O(N) time to send the T CRPPs to the T planes, the time complexity of routing a permutation in VSB networks is O(N + N log N) = O(N log N).

Figure 2. Illustration of the CRPP algorithm. (a) The input request. (b) Bipartite graph G = (V1, V2, E). (c) Two partial permutations resulting from G. (d) Graph G1′ for the left partial permutation. (e) Two CRPPs resulting from graph G1′.

(a) $$\begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 8 & 2 & 0 & 12 & 4 & 13 & 3 & 11 & 9 & 1 & 6 & 7 & 5 & 10 & 15 & 14 \end{pmatrix}$$

(b) Nodes: I0 = {0,1}, I1 = {2,3}, I2 = {4,5}, I3 = {6,7}, I4 = {8,9}, I5 = {10,11}, I6 = {12,13}, I7 = {14,15}; O0 = {0,1}, O1 = {2,3}, O2 = {4,5}, O3 = {6,7}, O4 = {8,9}, O5 = {10,11}, O6 = {12,13}, O7 = {14,15}.

(c) $$\begin{pmatrix} 1 & 2 & 5 & 7 & 8 & 10 & 12 & 14 \\ 2 & 0 & 13 & 11 & 9 & 6 & 5 & 15 \end{pmatrix} \qquad \begin{pmatrix} 0 & 3 & 4 & 6 & 9 & 11 & 13 & 15 \\ 8 & 12 & 4 & 3 & 1 & 7 & 10 & 14 \end{pmatrix}$$

(d) Input nodes {1,2}, {5,7}, {8,10}, {12,14}; output nodes {0,2}, {5,6}, {9,11}, {13,15}.

(e) $$\begin{pmatrix} 1 & 7 & 10 & 14 \\ 2 & 11 & 6 & 15 \end{pmatrix} \qquad \begin{pmatrix} 2 & 5 & 8 & 12 \\ 0 & 13 & 9 & 5 \end{pmatrix}$$

III. DISTRIBUTED CONTROL ALGORITHM 1

A. Problem Formulation

First we show that the input-output request patterns can be represented by a $2^t$-regular bipartite graph. Then we prove that the graph has perfect matchings, each of which can be routed through a banyan plane without any blocking. We use a few pipelined processors to find and route these perfect matchings in O(N) time.

For a network with an odd number of stages n, stage (n+1)/2 is the middle stage; for even n, stages n/2 and n/2+1 are the middle stages. Since each SE is connected to two SEs of the previous stage (as well as of the next stage), the number of inputs (outputs) whose paths intersect at a middle-stage SE is $2^{\lfloor (n+1)/2 \rfloor}$, i.e., T. Therefore, we define input and output groups as follows.

Definition 1: An input group, $I_i$, is the set of inputs that intersect each other at a middle-stage SE. Therefore,

$$I_i = \{\, iT + 0,\ iT + 1,\ \ldots,\ iT + (T-1) \,\}, \quad 0 \le i \le \frac{N}{T} - 1 \tag{2}$$

Definition 2: An output group, $O_i$, is the set of outputs that intersect each other at a middle-stage SE. Therefore,

$$O_i = \{\, iT + 0,\ iT + 1,\ \ldots,\ iT + (T-1) \,\}, \quad 0 \le i \le \frac{N}{T} - 1 \tag{3}$$

Definition 3: Two connections, xy (from input x to output y) and pq, are said to have a group conflict if $p, x \in I_i$ and/or $q, y \in O_j$, where $0 \le i, j \le \frac{N}{T} - 1$.

Figure 3. Illustration of input groups (IG) and output groups (OG). (a) Even number of stages. (b) Odd number of stages.

Fig. 3 shows the grouping of inputs and outputs for a 16×16 banyan network. It is interesting to see that the two requests $\binom{2}{0}$ and $\binom{4}{7}$ have different input groups and different output groups, and they have disjoint paths from input to output. Therefore, these two requests can be routed through a 16×16 banyan network without blocking or crosstalk. We need to know the minimum number of such connections for a banyan network. We give two essential lemmas in this regard.

Lemma 1: Any middle-stage switch can carry connections generated from (destined to) only one input (output) group.

Proof: Let us first prove it for inputs. A switch in stage i is connected to two adjacent switches of stage i−1 (and i+1). A switch at stage i is therefore connected to only $2^i$ adjacent inputs, so a middle-stage switch is connected to only $2^{\lfloor (n+1)/2 \rfloor}$ (i.e., T) adjacent inputs. For example, the first T inputs are connected to the 0-th and $(k \cdot 2^{\lfloor n/2 \rfloor})$-th SEs of the middle stage, where $0 < k < 2^{\lfloor n/2 \rfloor}$ (for odd n) and $0 < k < 2^{\lfloor (n-1)/2 \rfloor}$ (for even n). No other connections can pass through these switches. This group of inputs corresponds to an input group. Therefore, any SE in the middle stage carries connections generated from only one input group.

An SE in the middle stage is connected to two $2^{\,n-\lfloor (n+1)/2 \rfloor} \times 2^{\,n-\lfloor (n+1)/2 \rfloor}$ banyan networks. Therefore, an SE of the middle stage carries connections destined to these two networks. The total number of inputs of these two networks is $2 \cdot 2^{\,n-\lfloor (n+1)/2 \rfloor} = 2^{\lfloor (n+1)/2 \rfloor} = T = |O_i|$. So, we conclude that a middle-stage SE can carry connections destined to only one output group. QED

Lemma 2: Connections without group conflicts can be established in a banyan network without crosstalk.

Proof: Consider any two connections xy and pq that do not have a group conflict. Since $x \in I_i$, $p \in I_j$ and $i \ne j$, they belong to different middle-stage SEs. Therefore, their routes up to the middle stage are switch-disjoint. Similarly, $y \in O_i$, $q \in O_j$ and $i \ne j$, so their routes from the output to the middle stage are also switch-disjoint. Since a banyan network is a unique-path network, the two sections of the route of one connection must meet at one and only one middle-stage SE. Therefore, the whole routes of xy and pq are switch-disjoint. Since they are disjoint, every SE along their paths carries only one signal. Thereby, both pq and xy can be established without crosstalk.

When n is even, there are two middle stages: stage n/2 and stage n/2+1. Two connections generated from different input groups are switch-disjoint up to stage n/2, and two connections destined to different output groups are switch-disjoint back to stage n/2+1. An SE in stage n/2+1 is connected to two SEs in stage n/2 that belong to different input groups. On the other hand, two SEs in stage n/2 are connected to two SEs in stage n/2+1 that belong to different output groups. That means there is only one link between these two middle stages for a connection generated in a given input group and destined to a particular output group. Therefore, if two connections do not have a group conflict they pass through different SEs in stages n/2 and n/2+1, which ensures that both connections have switch-disjoint paths from input to output. QED

Needless to say, if two connections have disjoint paths from input to output they will certainly not block each other. Therefore, a crosstalk-free permutation must be nonblocking in the network. We now give Theorem 1, which determines the minimum number of crosstalk-free connections that can be established in an N×N optical banyan switch network.

Theorem 1: An N×N optical banyan network can establish at least N/T connections crosstalk-free.

Proof: Let S be the size of an input or output group. Then $S = |I_i| = |O_i|$. By Lemma 2, one pair of input-output groups can have at least one connection crosstalk-free. For a switch network of size N×N, there are N/S pairs of input-output groups. Therefore, at least N/S connections can be established without any crosstalk. Again, $S = 2^{\lfloor (n+1)/2 \rfloor} = T$. Therefore, an N×N optical banyan switch network can establish at least N/T connections crosstalk-free.

Since an N×N VSB network has T vertically stacked N×N optical banyan networks, it can establish $T \cdot \frac{N}{T} = N$ connections simultaneously without any crosstalk. Considering a group as a node, we construct a bipartite graph G(V1, V2, E), in which V1 is the set of input nodes, V2 is the set of output nodes and E is the set of edges connecting them. An edge corresponds to a connection request. Therefore, E represents a request pattern, which is essentially a permutation. Fig. 4(b) shows the bipartite graph for the connection pattern in Fig. 4(a). Finding a set of N/T crosstalk-free connections is the problem of finding a perfect matching in G. To realize all N connections, T such perfect matchings must be found. We prove that G is a T-regular ($T = 2^t$) bipartite graph and that it has T perfect matchings (PMs). Then we propose an algorithm for calculating these PMs in O(N) time.
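To make the grouping of Definitions 1 to 3 and the decomposition into perfect matchings more concrete, here is a minimal sequential Python sketch. It builds the group graph for a request permutation and splits it into T perfect matchings by repeated Euler splits. All function names are hypothetical and chosen for illustration, T is assumed to be a power of two, and the sketch runs on a single processor, so it does not model the pipelined processors discussed in this section.

```python
from collections import defaultdict


def has_group_conflict(c1, c2, T):
    """True if connections c1 = (x, y) and c2 = (p, q) share an input or output group."""
    (x, y), (p, q) = c1, c2
    return x // T == p // T or y // T == q // T


def group_edges(perm, T):
    """One edge (input group, output group, input) per request x -> perm[x]."""
    return [(x // T, y // T, x) for x, y in enumerate(perm)]


def euler_split(edges):
    """Split a bipartite multigraph with even degrees into two halves.

    Closed trails are walked from input-group nodes; edges traversed from
    V1 to V2 go to e1 and edges traversed from V2 to V1 go to e2, so each
    half keeps exactly half of the edges at every node.
    """
    by_in, by_out = defaultdict(list), defaultdict(list)
    for e in edges:
        by_in[e[0]].append(e)
        by_out[e[1]].append(e)
    used, e1, e2 = set(), [], []

    def take(bucket):
        # Pop edges until an unused one is found; None if the bucket is exhausted.
        while bucket:
            e = bucket.pop()
            if e not in used:
                used.add(e)
                return e
        return None

    for start in list(by_in):
        fwd = take(by_in[start])
        while fwd is not None:
            e1.append(fwd)
            back = take(by_out[fwd[1]])   # exists because all degrees are even
            e2.append(back)
            fwd = take(by_in[back[0]])    # None only when the trail closes at start
    return e1, e2


def perfect_matchings(perm, T):
    """Decompose the T-regular group graph into T perfect matchings."""
    parts = [group_edges(perm, T)]
    while len(parts) < T:
        parts = [half for part in parts for half in euler_split(part)]
    return parts


if __name__ == "__main__":
    # Request pattern of Fig. 4(a), with N = 16 and T = 4.
    perm = [8, 2, 0, 12, 4, 13, 3, 11, 9, 1, 6, 7, 5, 10, 15, 14]
    print(has_group_conflict((2, 0), (4, 7), 4))
    for plane, pm in enumerate(perfect_matchings(perm, 4)):
        print(plane, sorted((x, perm[x]) for _, _, x in pm))
```

Each returned matching uses every input group and every output group exactly once, so by Lemma 2 it can be sent to one banyan plane. Which connections end up in which matching depends on the order in which the trails are walked.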

Each PM corresponds to a CRPP. First we discuss PMs and their properties. A matching in G is a subset of E such that no two edges share a common vertex. A perfect matching in G is a matching in which every $v \in V_1$ is the endpoint of an edge. For example, the set of red edges in Fig. 4 is a perfect matching in G.

Figure 4. (a) A connection pattern. (b) Corresponding G(V1, V2, E).

(a) $$\begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 8 & 2 & 0 & 12 & 4 & 13 & 3 & 11 & 9 & 1 & 6 & 7 & 5 & 10 & 15 & 14 \end{pmatrix}$$

P. Hall's theorem [10] states that in a bipartite graph G(X, Y, E) there exists a perfect matching of X into Y if and only if $|S| \le |\Gamma(S)|$ for every $S \subseteq X$, where $\Gamma(S)$ is the set of nodes adjacent to nodes in S. Our bipartite graph G(V1, V2, E) is a T-regular graph, and for any $\nu \in V_1$, $1 \le |\Gamma(\nu)| \le T$. The $T|S|$ edges leaving any $S \subseteq V_1$ all end in $\Gamma(S)$, and each node of $\Gamma(S)$ can absorb at most T of them. Therefore,

$$|\Gamma(S)| \ge \frac{1}{T} \sum_{\nu \in S} \deg(\nu) = |S| \tag{3}$$

So, G has a perfect matching.

B. Routing Algorithm 1

Our algorithm for decomposing a permutation into CRPPs is based on Gabow's algorithm for edge coloring of bipartite multigraphs [9], in which the author showed that a PM can be found in O(m) time, where m is the number of edges in the graph. We introduce parallelism into the process of partitioning a $2^t$-regular graph into $2^{t-1}$-regular graphs. The algorithm is summarized as follows:

Step 1: Construct the graph G(V1, V2, E).
Step 2: Using Euler orientation, split G into two $2^{t-1}$-regular subgraphs G1(V1, V2, E1) and G2(V1, V2, E2). All forward edges (oriented from V1 to V2) form E1 and all backward edges (oriented from V2 to V1) form E2.
Step 3: Process each of the subgraphs simultaneously on different processors.
Step 4: Repeat Step 2 until all subgraphs are 1-regular bipartite graphs.

Step 1 and Step 2 require O(N) time. A processor has to visit all N edges in the first call of Step 2. In the second call the processor visits N/2 edges, in the third call N/4 edges, and so on. Therefore the time complexity for one processor is O(N + N/2 + N/4 + …) = O(2N) = O(N). The 2nd processor, however, has to visit only N/2 edges, the 3rd processor N/4 edges, and so on. Therefore, all the processors end their operation at the same time. Since all PMs are found at the same time, the overall time complexity is O(N). Each of the T/2 processors takes responsibility for sending 2 CRPPs to 2 planes: the first two planes are assigned to the first processor, the next two planes to the second processor, and so on.

IV. DISTRIBUTED CONTROL ALGORITHM 2

In this algorithm the control is distributed to every input. Every input hunts for its route in a plane. A request may be blocked in a plane at any of the log N stages. The input then searches all other planes one after another. Therefore, it is easy to see that a simple hunting algorithm would result in time complexity O(T log N). Our proposed algorithm, however, reduces this complexity to O(T). Since these N processors (at the N inputs) are not connected to each other by dedicated links, they are said to be loosely connected.

A. Status registers

We assume that there is a status register for every output group in all planes. These registers together are represented by a table of T rows and N/T columns (the status table), as shown in Fig. 5(a). Row Rgi corresponds to plane Pi. If the element in the k-th column of row Rgi is zero, then output group k is free to be assigned in plane Pi. An input can only invoke a particular column of Rgi.

Figure 5: (a) 16×16 VSOBN with status registers. (b) An input request pattern and its I/O grouping. (c) PPA of the request and status table.

(b)
Input:         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Output:        8  2  0 12  4 13  3 11  9  1  6  7  5 15 10 14
Conn. name:    a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p
Input group:   0  0  0  0  1  1  1  1  2  2  2  2  3  3  3  3
Output group:  2  0  0  3  1  3  0  2  2  0  1  1  1  3  2  3

(c)
Status table        PPA
Rg0: 0 1 1 0        P0: 2 1 2 1
Rg1: 1 0 0 1        P1: 0 3 0 3
Rg2: 1 1 1 0        P2: 0 0 1 2
Rg3: 0 1 1 1        P3: 3 2 1 3
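The tables of Fig. 5(c) can be reproduced with a short Python sketch. It assumes our reading of the primary plane assignment, namely that the i-th input of every input group is placed on plane i, and that a status bit is 1 whenever some connection on that plane is destined to the corresponding output group. The function names are illustrative only.

```python
def primary_plane_assignment(perm, T):
    """Row i holds, for each input group, the output group requested by its i-th input."""
    groups = len(perm) // T
    return [[perm[g * T + i] // T for g in range(groups)] for i in range(T)]


def status_table(ppa, groups):
    """Rg[i][k] = 1 if a connection on plane i is destined to output group k."""
    return [[1 if k in row else 0 for k in range(groups)] for row in ppa]


def contentious_columns(row):
    """Columns of one PPA row whose output group occurs more than once in that row."""
    return [c for c, g in enumerate(row) if row.count(g) > 1]


if __name__ == "__main__":
    # Request pattern of Fig. 5(b), with N = 16 and T = 4.
    perm = [8, 2, 0, 12, 4, 13, 3, 11, 9, 1, 6, 7, 5, 15, 10, 14]
    ppa = primary_plane_assignment(perm, 4)
    rg = status_table(ppa, 4)
    for i in range(4):
        print(f"Rg{i}: {rg[i]}   P{i}: {ppa[i]}   contention at columns {contentious_columns(ppa[i])}")
```

A plane whose row yields an empty list of contentious columns already carries connections to distinct output groups and needs no further hunting.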


B. Primary plane assignment (PPA)

The shared register array, Rgi, keeps the inputs' plane assignment information. A ‘0’ in column 0 of Rg0 means that no connection destined to output group 0 has been assigned to plane P0. A ‘1’ in column 0 of Rg1 means that a connection destined to output group 0 has been assigned to plane P1. If another connection destined to the same output group invokes Rg1, it learns that plane P1 is busy. When the inputs receive a set of requests, they temporarily assign themselves to the planes following Eq. (4). This assignment is called the primary plane assignment. Let $P_i$ be the group of inputs connected to plane i, and let $x_i \in I_j$ be an input that belongs to this plane; then

$$P_i = \left\{\, x_i \;\middle|\; x_i \in I_j,\ j = 0, 1, 2, \ldots, \frac{N}{T} - 1 \,\right\}, \quad i = 0, 1, \ldots, T-1 \tag{4}$$

Here input $x_i$ originates from input group $I_j$. It can be seen from Eq. (4) that the N inputs are always evenly distributed among the T planes. A PPA can be represented by an array, as shown in Fig. 5(c). Each row of the array corresponds to a plane and each column corresponds to the input group of an input. Each entry is the output group of the output that the input requests. For example, a ‘0’ in column 0 means that the request arrives at an input belonging to input group 0 and is destined to output group 0. Similarly, the entry ‘2’ in column 3 means that the request arrives at an input belonging to input group 3 and wants to be connected to an output belonging to output group 2. Since a plane always holds inputs that belong to different input groups, there is no contention up to the middle stage from the input side. However, contention may arise in any stage after the middle stage if more than one connection has the same output group as its destination (see Lemma 2). In Fig. 5, P0 has 2 in columns 0 and 2; this means that two inputs, one from input group 0 and the other from input group 2, want to be connected to outputs belonging to the same output group 2. Therefore, there may be a contention. Our algorithm replaces these contentious elements in such a way that every Pi ends up with unique elements. A plane is contention-free if all the elements in the corresponding Pi are different. In the final plane assignment (FPA), every Pi has unique elements.

Definition 4: Contention: If an element has multiple instances in the same row of the PPA, then there is a contention in that plane. Let $a_k = a_m$, $k \ne m$, where k and m are column indices in the PPA; then they are said to be contentious elements. We use the symbols $a^*_k$ and $a^*_m$ to represent their contention status.

Definition 5: Free plane: By the term ‘free plane for q’ we mean that the plane does not contain the element q. Pi is a free plane for q if and only if $q \notin P_i$.

Definition 6: Target plane: A free plane is a target plane for the element for which it is free.

Definition 7: Swap: The source plane interchanges its element with the corresponding element (in the same column) on the target plane.

Definition 8: Target element: If for $a^*_k \in P_i$ there exists $b^*_k \in P_j$, where Pj is the target plane for $a_k$, then $b^*_k$ is called the target element.

Definition 9: Dummy element: If for $a^*_k \in P_i$ there exists $b_k \in P_j$, where Pj is the target plane of $a_k$, then $b_k$ is called the dummy element.

Definition 10: Invoke operation $P_i \to P_j$: The inputs presently assigned to Pi invoke Pj and check its assignment status with the help of Rgj. Let $x \subseteq P_i$ be the set of contentious elements; if it finds a target element set $y \subseteq P_j$, $j \ne i$, then x and y are swapped. If $|x| \ge 2$, multiple contentious elements belonging to the same output group find a free plane, P, at the same time. In that case only one is allowed to swap.

C. Routing Algorithm 2

1. Create the PPA.
2. Set the iteration counter j = 0.
3. $P_i \to P_{(i+j+1) \bmod T}$ for i = 0, 1, ..., T−1. Only the contentious connections of Pi take part in the invoke operation.
4. If there is a target element, swap with it; else swap with any dummy element.
5. If contention exists, repeat from step 3 with j = j+1; else stop hunting. The FPA has been created.
6. Each input routes itself to the plane according to the FPA.

Now we prove that the algorithm is nonblocking, i.e., all contentions are removed within T iterations.

Lemma 3: Contentions happen in pairs and they are also resolved in pairs.

If an element x appears twice in plane Pi, there exists a plane Pj, $i \ne j$, where x does not appear. This plane, Pj, is called a free plane for x. The absence of x in Pj also means that an element y, $y \ne x$, appears twice in Pj and creates a contention. Therefore, contentions happen in pairs. To resolve the contention, one x is removed from Pi and placed in Pj, $i \ne j$, and at the same time one y is removed from Pj and placed in Pi. If this swap removes a contention in Pi, the contention in Pj is also removed. However, replacing x with y in Pi may create a contention for y in Pi (when y is a dummy element for x). A contention for y in Pi creates a contention for another element z in another plane Pk. When a swap takes place between Pi and Pk, these two contentions are resolved. Therefore, contentions are resolved in pairs. A connection has to visit T cells along a column to find the target element. In the first three iterations, where j = 0, 1 and 2 respectively, the search sequences are as follows:

j = 0:  P0 → P1,  P1 → P2,  …,  P(T−1) → P0
j = 1:  P0 → P2,  P1 → P3,  …,  P(T−1) → P1
j = 2:  P0 → P3,  P1 → P4,  …,  P(T−1) → P2
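The invoke sequences follow directly from step 3 of Routing Algorithm 2. The small Python sketch below (with an illustrative function name) tabulates them for a given T. It lists j = 0, ..., T−2 only, since j = T−1 would make every plane invoke itself.

```python
def invoke_schedule(T):
    """For each iteration j, the (source, target) pairs P_i -> P_((i + j + 1) mod T)."""
    return [[(i, (i + j + 1) % T) for i in range(T)] for j in range(T - 1)]


if __name__ == "__main__":
    for j, pairs in enumerate(invoke_schedule(4)):
        print(f"j = {j}: " + ", ".join(f"P{s} -> P{t}" for s, t in pairs))
```

Over these T−1 iterations every plane invokes every other plane exactly once, which is why a column of the status table can be searched completely within T iterations.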


This process continues until j = T−1. In the worst case a plane can have contentions at T−1 inputs. Therefore, the T planes of the network can have $\frac{T(T-1)}{2}$ pairs of contentions. Since an input controller can resolve T/2 pairs of contentions in one iteration, T−1 inputs can resolve $\frac{T(T-1)}{2}$ pairs of contentions in T iterations. Therefore, the algorithm can establish all connections in T iterations without any blocking.

D. Illustration

Consider the example of the 16×16 VSOBN shown in Fig. 5. When requests arrive at the inputs, they create the PPA using Eq. (4) and the status table. Contentions in any output group are marked by an asterisk; see Fig. 5(c).

Figure 6: (a) Status table of the 16×16 VSOBN. (b) PPA. (c) FPA. (d), (e) and (f) show an alternative path to reach an FPA.

(a) Status table    (b) PPA              (c) FPA
Rg0: 0 1 1 0        P0: 2* 1* 2* 1*      P0: 0 1 2 3
Rg1: 1 0 0 1        P1: 0* 3* 0* 3*      P1: 2 3 0 1
Rg2: 1 1 1 0        P2: 0* 0* 1  2       P2: 3 0 1 2
Rg3: 0 1 1 1        P3: 3* 2  1  3*      P3: 0 2 1 3

(d)                 (e)                  (f)
P0: 0  1  2  3      P0: 0  1  2  3       P0: 0 1 2 3
P1: 2  0* 0* 3      P1: 2  0  1  3       P1: 2 0 1 3
P2: 3* 3* 1  2      P2: 3* 3* 1  2       P2: 0 3 1 2
P3: 0  2  1* 1*     P3: 0* 2  0* 1       P3: 3 2 0 1

… be in state (d) instead of (c), as shown in Fig. 6. Now j is increased to 1 for the next iteration of hunting.

ii) Iteration j = 1: The invoke sequences are P0 → P2, P1 → P3, P2 → P0, P3 → P1, i.e., the elements of P0 look into the elements of P2, those of P1 into P3, P2 into P0, and P3 into P1. P0 does not have any contention, so its element controllers remain idle. Also, the 0*s in P1 do not find P3 to be a target plane. However, P3 finds P1 to be a target plane, since P1 is a free plane for 1*. According to step 4, the 1* in column 2 of P3 swaps with the corresponding 0* in P1 (see Fig. 6(e)).

iii) Iteration j = 2: The invoke operations are P0 → P3, P1 → P0, P2 → P1, P3 → P2. P3 finds in P2 the target element for its element 0* in column 0. They swap, resulting in Fig. 6(f), which is the FPA.

E. Time complexity

Creating the PPA in step 1 takes constant time. For steps 3 to 5, at most T iterations are required to remove all contentions. Since all connections search for their free planes simultaneously, the time complexity of these steps is O(T). Routing the N requests in step 6 according to the FPA takes constant time, as it is done by the N individual input controllers. Therefore, the overall time complexity of the algorithm is O(T).

F. Comparison with O(log² N) algorithm

Table I shows a comparison on time complexity. For N