A General Expansion Architecture for Large-Scale Multicast ATM Switches

Sung Hyuk Byun and Dan Keun Sung
[email protected], [email protected]
Dept. of EE, KAIST, 373-1 Kusong-dong Yusong-gu, Taejon 305-701, Korea

Abstract - This paper proposes a multicast Universal Multistage Interconnection Network (multicast UniMIN) switch architecture for constructing large-scale multicast ATM switches with any type of small multicast switch. The proposed architecture consists of a buffered distribution network that can perform cell routing and replication simultaneously, and a column of output switch modules (OSMs). The adoption of channel grouping and virtual first-in-first-out (FIFO) buffers results in high delay/throughput performance, and the distributed lookup table scheme for multicast addressing greatly reduces the size of a single lookup table. Analytical and simulation results show that high delay/throughput performance is obtained for both unicast and multicast traffic, and the proposed architecture yields an even better performance for multicast traffic than for unicast traffic. In addition, the multicast UniMIN switch has such good features as modular expandability, simple hardware, and no internal speed-up operation.

I. Introduction

Various multicast (point-to-multipoint) services, such as teleconferencing, video on demand (VOD), and distributed data processing, are expected to be among the major applications of the future Broadband Integrated Services Digital Network (B-ISDN). Multicast functions of ATM switches can be implemented either with a separate copy network preceding the routing network, or with a single switching fabric performing both replication and routing functions. The latter will be called the unified approach in this paper. The copy network approach is widely employed in many space-division switches [1]-[4]. The copy network replicates each incoming multicast cell (called a master cell) according to its fanout value. The header of each copied cell is translated to its destined output port address through a translation table lookup at the trunk number translator (TNT) [2]; the copied cells are then routed to the destination output ports through the following point-to-point switch. One problem in copy network architectures is overflow, which occurs when the number of copy requests exceeds the number of output ports of the copy network. To overcome this problem, Lee [2] and Turner [3] proposed asymmetric copy networks, which have a larger number of output ports than input ports at the expense of additional hardware complexity. Another related problem is head-of-line (HOL) fanout blocking [5]. When the fanout of an incoming cell is greater than the number of remaining idle output ports, the cell is blocked; further, the rest of the cells entering in the same time slot are also blocked. To resolve this problem, a fanout limiting scheme [3] and the cell-splitting method [5] have been proposed. In addition, the memory size of the TNT tables increases significantly as the fanout and the switch size increase, because each table per output port needs to hold the routing data for all the copied cells of each multicast connection [2].
The unified approach is preferred in shared-medium (bus or ring) output buffering [6] or shared buffering [7] ATM switches, because cell replication can be implemented by simply modifying their switching fabrics. As a rule, the unified approach is simpler than the copy network approach, and it does not suffer from HOL fanout blocking or overflow problems. Both approaches face difficulties as the switch size grows: the large translation tables of copy-network-type switches, and the limited bandwidth of the shared medium or shared buffer in the unified types. Furthermore, all switch architectures suffer fundamental physical limitations in the number of

input/output ports and the circuit density that a VLSI chip or a PBA can accommodate. Therefore, large-scale multicast switches should have multistage structures built from moderately sized modules, and a general expansion architecture that can be applied to any type of small multicast ATM switch in building large-scale switches is a promising candidate. The Clos network [8] is one of the most popular general expansion architectures [6], [7], [9], because it can be built by simply interconnecting switch modules in a regular pattern. However, its multipath property requires efficient path hunting methods [10], and performance degradation due to the multistage configuration is inevitable [7]. Besides, the internal routing for multicast connections is even more complicated [11]. In this paper, we apply a unified multicast approach to our general expansion architecture for large-scale point-to-point ATM switches, called the UniMIN (Universal Multistage Interconnection Network) [12], [13], [14]. In the UniMIN switch architecture, the output ports are divided into multiple groups accommodated by output switch modules (OSMs), which are ATM switches by themselves. The preceding distribution network provides a self-routing path to the destined OSM for each incoming cell. Though the distribution network is a buffered multistage architecture, we can keep the queueing delay in the distribution network within several cell times and can increase its throughput up to 100%. Therefore, the performance degradation due to the multistage configuration is negligible. The multicast UniMIN switch architecture retains such good features of the unicast UniMIN [14] as self-routing capability, modular expandability, no internal speed-up, low hardware complexity, and high delay/throughput performance. In addition, multicast capability is added by simply modifying the distribution modules, without an additional large copy network. The remainder of this paper is organized as follows.
In Section II, the architecture of the multicast UniMIN switch and its distribution module are introduced, and the multicast operations are described. In Section III, the performance of the proposed multicast switch is evaluated analytically and then verified by computer simulations. Finally, conclusions are given in Section IV.

II. Structure of the Multicast UniMIN Switch

A. Distribution Network-Based General Expansion Architecture

The N × N (N = 2^s n; s = 0, 1, 2, ...) UniMIN switch is comprised of a distribution network with s stages of 2^s n × n output-buffered modules and 2^s OSMs at the last stage. Fig. 1 shows an example of the UniMIN switch for N = 4n and s = 2. The internal links of the UniMIN switch are grouped into groups of n/2 links, and the links are shared within each link group. This is the concept of channel grouping [15], and it results in a small delay and a small buffer size in the distribution modules. Though successive cells of a VC can take different links within a link group, cell sequences in the distribution network are preserved during cell switching. This cell sequence integrity is guaranteed by assigning a higher priority to the upper input link when buffering cells, and by transmitting the first cell of the virtual FIFO on the top output link, the second cell on the next link, and so on. It is also maintained at the OSM by processing the uppermost incoming cell first, provided the OSMs have no input buffers; an input-queueing OSM may require a resequencing operation. Since the link groups follow the interconnection pattern of a delta network with 2 × 2 modules, this network can inherently

0-7803-4201-1/97/$10.00 (c) 1997 IEEE

[Fig. 1. Multicast UniMIN switch architecture for N = 4n: N input ports enter a distribution network of n × n distribution modules (each with upper and lower n/2-link output groups), followed by n × n output switch modules serving the N output ports.]

[Fig. 2. n × n distribution module: n IPCs feed two virtual FIFOs, one per n/2-link output group; each virtual FIFO comprises a running adder (R.A.), a reverse banyan network, n FIFO buffers, and an n-to-n/2 selector.]

support the self-routing feature based on the destination output port address. Unicast cells use this feature directly: the first s bits of the destination address serve as a routing tag in the distribution network, and the k-th bit of the routing tag determines the outgoing link group at the k-th stage, i.e., "0" for the upper link group and "1" for the lower link group. The remaining log2 n address bits are used for routing at the OSM. Multicast cells have multiple destinations belonging to one or more OSMs. In the proposed architecture, cell replication takes place as far away as possible from the input ports. Under this scheme, only one copy of a multicast cell is sent to each destined OSM, and each OSM performs further cell replication and routing within the module. This significantly reduces the actual cell traffic in the distribution network, and thus improves the delay/cell loss performance of multicast traffic. The routing and replication of multicast cells is performed based on the lookup tables of each distribution module and OSM. More detailed explanations of this multicasting scheme are given in the following subsections.
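As a concrete illustration, the bit-slicing rule above can be sketched in a few lines of Python. The function name and return shape are ours, not the paper's; the destination port is assumed to be an integer in [0, N-1] with N = 2^s * n and n a power of two.

```python
# Illustrative sketch of the self-routing tag rule (not the authors' code).
# The first s bits of the destination address pick the outgoing link group
# at each distribution stage ("0" = upper, "1" = lower); the remaining
# log2(n) bits are used for routing inside the OSM.

def split_routing_tag(dest_port: int, s: int, n: int) -> tuple[list[int], int]:
    """Return ([stage-1 bit, ..., stage-s bit], OSM-internal address)."""
    addr_bits = s + n.bit_length() - 1          # total bits = s + log2(n)
    tag = [(dest_port >> (addr_bits - 1 - k)) & 1 for k in range(s)]
    osm_addr = dest_port & (n - 1)              # low log2(n) bits
    return tag, osm_addr
```

For example, with s = 2 and n = 8 (so N = 32), destination port 19 = 10011b yields the tag [1, 0] (lower group at stage 1, upper group at stage 2) and OSM-internal address 3.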

[Fig. 3. Internal cell format: AF | MF | BA | DA/MCN | 53-byte cell body.
AF: activity flag (1 bit; '1': active cell, '0': idle cell)
MF: multicast flag (1 bit; '1': multicast cell, '0': unicast cell)
BA: buffer address at the virtual FIFO
DA: destination address for a unicast cell (MF='0')
MCN: multicast channel number for a multicast cell (MF='1')]

B. Distribution Module

An n × n distribution module consists of n IPCs (input port controllers) and two virtual FIFO blocks, as shown in Fig. 2. The output links are divided into two groups of n/2 links each, and each output link group has a virtual FIFO block shared by all the links of the group. The distribution module routes each incoming cell to one or both of the output link groups according to its destination address (for a unicast cell) or lookup table entry (for a multicast cell); the IPC makes this routing decision. The internal cell format of the multicast UniMIN switch is shown in Fig. 3. It has four additional fields in front of the 53-byte cell body: activity flag (AF), multicast flag (MF), buffer address (BA), and destination address (DA) for a unicast cell or multicast channel number (MCN) for a multicast cell. The AF is set to "1" for active cells and "0" for idle cells. The MF identifies whether the cell belongs to a multicast connection (MF="1") or a unicast connection (MF="0"). The BA temporarily stores the buffer address in which the cell should be stored inside the virtual FIFO of each stage of the distribution network, and it is calculated by the running adder of each stage. The fourth field has a different meaning according to the cell type: for a unicast cell (MF="0"), it is the destination output port address used as the routing tag; for a multicast cell (MF="1"), it is the multicast channel number (MCN) to which the cell belongs. The IPC comprises a multicast lookup table, a 1 × 2 demultiplexer, and some control logic circuits. The two-bit entry for each MCN in the lookup table determines whether a multicast cell should be replicated to both output link groups or just routed to one of the two. If the entry is "10" or "01," the cell is routed to the upper or the lower virtual FIFO, respectively. Cells are replicated when the lookup table entry is "11." If a multicast connection is released, the entry is set to "00." The IPC checks the AF, and if AF="0," it ignores the remaining bit stream of the idle cell. If MF="0," it routes the cell to one of the virtual FIFOs according to the routing bit of the DA. If MF="1," it regards the fourth field as the MCN, obtains the routing information from the multicast lookup table, and then performs cell replication and routing based on the lookup table entry. The virtual FIFO is a kind of output buffer for each output link group; it can accept up to n cells and transmit up to n/2 cells in a cell time. Though a time-multiplexed shared buffer architecture with internal speed-up of n times [7] could be an alternative, the virtual FIFO architecture is employed to avoid internal speed-up operation. Each virtual FIFO consists of a running adder (RA), a reverse banyan network, n separate FIFO buffers, and an n-to-n/2 selector. The n separate buffers are operated externally as a single virtual FIFO buffer by storing and transmitting cells to and from the n buffers in a cyclic order. The running adder and reverse banyan perform the cyclic-order cell buffering, and the n-to-n/2 selector ensures the cyclic cell transmissions. This "virtual FIFO" concept was first applied in the output buffer design of the Knockout switch to implement an output buffer without internal speed-up [16], and several implementation examples can be found in [17] and [18]. A more detailed description of the virtual FIFO operation can be found in [14].

C. Multicasting Scheme

The multicast UniMIN switch utilizes a distributed lookup table scheme as its multicast addressing method. Multicast cells carry only their multicast channel numbers, and each distribution module holds two bits of information per MCN. The OSMs also have multicast lookup tables for their own output ports and perform their own internal multicast functions. Since the multicast routing information is distributed over all the modules, each lookup table of a distribution module can be built with an extremely small memory of 2B bits (where B is the number of multicast channels of the whole switch), compared with NBW (N: switch size, W: size of address) for Lee's copy network [2]. The lookup table entries of the distribution modules and OSMs are updated at multicast call setup. The main principle in the construction of a multicast tree is to replicate the cell as far away as possible from the input ports.
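The IPC's per-cell decision can be sketched as below. The entry encoding follows the text ("10" upper, "01" lower, "11" both, "00" released); the function and table names are illustrative, not the authors' implementation.

```python
# Illustrative IPC routing/replication decision (names are hypothetical).
# Each distribution module keeps a 2-bit entry per multicast channel
# number (MCN): "10" -> upper group only, "01" -> lower group only,
# "11" -> replicate to both, "00" -> connection released (drop).

def route_cell(mf: int, da_or_mcn: int, mcast_table: dict[int, str],
               stage_bit: int) -> list[str]:
    """Return the output link groups ('upper'/'lower') for one active cell."""
    if mf == 0:                               # unicast: use the routing-tag bit
        return ['lower' if stage_bit else 'upper']
    entry = mcast_table.get(da_or_mcn, '00')  # fourth field carries the MCN
    groups = []
    if entry[0] == '1':
        groups.append('upper')
    if entry[1] == '1':
        groups.append('lower')
    return groups                             # [] when the connection is released
```

A cell whose MCN entry is "11" is replicated to both virtual FIFOs; a unicast cell follows its per-stage routing bit.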

[Fig. 4. Routing example of multicast cells in the 4n × 4n multicast UniMIN. A, B: multicast channel numbers; destinations of MCN A = {2, 5, 3n+3, 3n+4}; destinations of MCN B = {2, 6, n+3, 2n+1, 2n+2, 2n+3}. Per-stage lookup table entries (e.g., A: 10, B: 11) steer and replicate each cell toward its destined OSMs, and each OSM's lookup table lists only its own destination ports (e.g., A: 2, 5; B: 2, 6).]

Hence, only one copy of a multicast cell is sent to each destined OSM, and this reduces the actual cell traffic of the distribution network. Fig. 4 shows a routing example of multicast cells in the 4n × 4n UniMIN network. The destination set of MCN A is {2, 5, 3n+3, 3n+4}, and that of MCN B is {2, 6, n+3, 2n+1, 2n+2, 2n+3}. The destined OSMs of MCN A are then {0, 3}, and those of MCN B are {0, 1, 2}. The lookup tables of the distribution network are set to deliver the multicast cells to their destined OSMs, and each OSM holds only the multicast routing information associated with itself.

III. Performance Evaluation of the Proposed Multicast Switch

We consider an N × N (N = 2^s n) UniMIN switch which consists of s stages of n × n distribution modules and n × n OSMs of shared-medium-type ATM switch modules such as the ATOM switch [6]. The switching system is assumed to operate synchronously. In the first half of a time slot, incoming cells of each stage are stored in the corresponding buffers, and the buffered cells are then transferred to the next stage in the second half of the time slot. If the virtual FIFO has no room to accept a cell, a cell loss occurs; when a multicast cell with k copy cells is lost, it is counted as k cell losses. Under the assumption that the destinations of a multicast cell are uniformly distributed over all the output ports of the symmetric architecture, the performance of one virtual FIFO represents that of the stage to which it belongs, without loss of generality. Therefore, we consider the upper virtual FIFO of the uppermost distribution module of each stage. We first analyze the distribution of the output process of the first-stage distribution network. Since this is also the input process of the second stage, we can evaluate the performance of the second stage. Continuing this procedure up to the final OSM stage, we obtain the cell loss probability and mean delay of the distribution network and the UniMIN switch.

A. Distribution Network

We assume that the arrivals of multicast cells follow a Bernoulli process with load λ, and consider three fanout models for multicast cells as follows:

i) Constant distribution

F_const(k) = 1 if k = C; 0 otherwise.    (1)

ii) Uniform distribution

F_uni(k) = 1/U if 1 ≤ k ≤ U; 0 otherwise.    (2)

iii) Truncated geometric distribution

F_tgeom(k) = (1 − q) q^{k−1} / (1 − q^N) if 1 ≤ k ≤ N; 0 otherwise.    (3)
For a traffic load λ of incoming master cells, the effective offered load of multicast traffic with fanout F is given by λE[F]. We first introduce some notation related to copy cells:

F_m(k) = Pr{an incoming cell of stage m carries k copy cells},
P_m(l) = Pr{an incoming cell of stage m carries l copy cells destined to the upper virtual FIFO},
D_m = Pr{an incoming cell of stage m has at least one copy cell destined to the upper virtual FIFO}.

F_m is the fanout distribution of incoming cells at stage m, and P_m describes how many copy cells of an incoming multicast cell are going to the upper virtual FIFO. F_1(k) takes one of the three fanout models (1)-(3). In the m-th stage, F_m(k) is written as

F_m(k) = P_{m−1}(k) / D_{m−1},  1 ≤ k ≤ N/2^{m−1},  m ≥ 2.    (4)

As the destinations of the copy cells in a multicast cell are uniformly distributed, P_m(k) can be obtained as

P_m(k) = Σ_{j=k}^{N/2^{m−1}} [ C(N/2^m, k) C(N/2^m, j−k) / C(N/2^{m−1}, j) ] F_m(j),  0 ≤ k ≤ N/2^m,    (5)

where C(a, b) denotes the binomial coefficient. A copy cell whose destination address lies in [0, N/2^m − 1] is routed to the upper virtual FIFO in the m-th stage. D_m is the probability that an incoming cell is routed or duplicated to the upper virtual FIFO, and it is given by

D_m = Σ_{j=1}^{N/2^m} P_m(j).    (6)
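The stage recursion (4)-(6) can be exercised numerically as below; symbols mirror the equations, and the function name is ours. F is the fanout PMF over copy counts at a stage whose cells can reach Nm_prev = N/2^{m−1} ports.

```python
from math import comb

# Numerical sketch of recursion (4)-(6): given the fanout PMF F_m at a
# stage reaching Nm_prev ports, compute P_m (hypergeometric split of the
# copies toward the upper link group), D_m, and the next-stage fanout.

def stage_step(F, Nm_prev):
    """F: dict k -> F_m(k) for 1 <= k <= Nm_prev."""
    half = Nm_prev // 2                       # N/2^m ports per link group
    P = {}
    for k in range(half + 1):                 # eq. (5)
        P[k] = sum(comb(half, k) * comb(half, j - k) / comb(Nm_prev, j) * F[j]
                   for j in range(max(k, 1), Nm_prev + 1))
    D = sum(P[j] for j in range(1, half + 1))              # eq. (6)
    F_next = {k: P[k] / D for k in range(1, half + 1)}     # eq. (4)
    return P, D, F_next
```

As a check, a cell broadcast to all Nm_prev = 4 ports splits deterministically into two copies per link group: P(2) = 1 and the next-stage fanout concentrates at k = 2; for any F, the P_m values sum to 1 by the Vandermonde identity.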

Next, we derive the arrival and departure processes of the virtual FIFO. Let C_in,m(k) denote the probability of k cell arrivals during a time slot in the m-th module, and C_out,m(k) the probability that k cells depart from the m-th stage virtual FIFO during a time slot. For m ≠ 1, C_in,m(k) is the sum of the two output processes of the previous-stage virtual FIFOs, which are assumed to be independent and identically distributed:

C_in,m(k) = Σ_{i+j=k, 0 ≤ i, j ≤ n/2} C_out,m−1(i) C_out,m−1(j),  0 ≤ k ≤ n,  m ≥ 2.    (7)

In the first stage, C_in,1(k) has a binomial distribution:

C_in,1(k) = C(n, k) λ^k (1 − λ)^{n−k}.    (8)

As the virtual FIFO has n inputs and n/2 outputs with a finite buffer size M, we can model it as a finite-buffer multi-server queueing system.


Let A_{m,t} and Q_{m,t} denote the number of cell arrivals during the t-th time slot and the total number of buffered cells just after the t-th time slot in the m-th stage, respectively. Then Q_{m,t} is given by

Q_{m,t} = min{ max{ Q_{m,t−1} + A_{m,t−1} − n/2, 0 }, M − n/2 }.    (9)
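Recursion (9) is easy to exercise directly; a toy trajectory, with our own function name:

```python
# Direct simulation of the buffer recursion (9): up to n/2 cells are
# served per slot, and at most M - n/2 cells survive a slot boundary.

def queue_trajectory(arrivals, n, M):
    q, history = 0, []
    for a in arrivals:                        # a plays the role of A_{m,t-1}
        q = min(max(q + a - n // 2, 0), M - n // 2)
        history.append(q)
    return history
```

With n = 4 and M = 6, for example, the arrival bursts [4, 4, 0] drive the queue through [2, 4, 2]: the second burst saturates the M − n/2 = 4-cell residual capacity.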


We can model Q_{1,t} by a discrete-time Markov chain with finite state space {0, 1, ..., M − n/2}. The steady-state probability of k cell arrivals at the m-th stage virtual FIFO during a time slot, a_m(k), is expressed as

a_m(k) = Σ_{l=k}^{n} C(l, k) (D_m)^k (1 − D_m)^{l−k} C_in,m(l),  0 ≤ k ≤ n.    (10)

A set of balance equations for the steady-state queue length distribution at departure epochs, q_m(k), can be derived from eq. (9). We obtain {q_m(k); 0 ≤ k ≤ M − n/2} by solving the balance equations together with the normalization equation Σ_{i=0}^{M−n/2} q_m(i) = 1. The output process of a virtual FIFO, C_out,m(k), can then be obtained as

C_out,m(k) = Σ_{i+j=k, 0 ≤ i < n/2, 0 ≤ j ≤ M−n/2} a_m(i) q_m(j),  if 0 ≤ k < n/2;
C_out,m(k) = Σ_{i+j ≥ n/2, 0 ≤ i ≤ n, 0 ≤ j ≤ M−n/2} a_m(i) q_m(j),  if k = n/2.    (11)
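Eq. (11) just says that min(arrivals + queue, n/2) cells leave each slot; a compact sketch (our function name):

```python
# Assemble the departure PMF C_out,m of eq. (11) from the arrival PMF
# a_m (support 0..n) and the queue-length PMF q_m (support 0..M-n/2):
# k < n/2 cells depart iff exactly k cells are present, and n/2 cells
# depart whenever at least n/2 are present.

def c_out(a_m, q_m, n):
    half = n // 2
    out = [0.0] * (half + 1)
    for i, pi in enumerate(a_m):
        for j, pj in enumerate(q_m):
            out[min(i + j, half)] += pi * pj
    return out
```

For instance, with n = 2, binomial arrivals [0.25, 0.5, 0.25], and an always-empty queue (q_m = [1.0]), one cell departs with probability 0.75.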

This procedure is repeated up to the final-stage distribution network, yielding the queue length PDFs of all the virtual FIFOs in the distribution network. The delay of a copy cell at a virtual FIFO is that of the multicast cell carrying it. Hence, the mean delay at the m-th stage distribution module, W_m, is given by

W_m = Q̄_m / λ'_m + 1 = [ Σ_{k=1}^{M−n/2} k q_m(k) ] / [ Σ_{k=1}^{n/2} k C_out,m(k) ] + 1,    (12)

where Q̄_m is the mean queue length of the virtual FIFO, and λ'_m is the actual input traffic load excluding the cells blocked due to buffer overflow, which is identical to the output link load in steady state. W_DN, the total delay at the s-stage distribution network, is written as

W_DN = Σ_{m=1}^{s} W_m.    (13)

The cell loss probability at the distribution network, P_loss,DN, can be obtained as

P_loss,DN = 1 − E[C_out,s] E[F_s] / ( E[a_1] E[F_1] ).    (14)

In (14), the loss of a multicast cell with k copy cells is counted as the loss of k cells. W_DN and P_loss,DN are performance indices of the distribution network of UniMIN switches, regardless of the OSM architecture.

B. Output Switch Module

We assume that the OSM is an n × n output-buffering ATM switch with output buffer size B. In the OSM, each copy cell within an incoming multicast cell is routed to its destined output

[Fig. 5. Mean delay at the distribution network (DN) and whole switch (DN+OSM) of the 1024 × 1024 UniMIN switch (constant fanout, n=32, s=5; mean delay in cell times vs. effective load, for E[F]=1 (unicast) and E[F]=10)]

port. If D*_om is defined as Pr{the tagged output port is one of the destinations of the incoming cell}, it can be represented as

D*_om = Σ_{j=1}^{n} [ 1 − C(n−1, j) / C(n, j) ] F_om(j),    (15)

where F_om is the fanout distribution of incoming cells and has the same form as F_m of the distribution network. The probability of k cell arrivals at the tagged output port in a time slot, a_om(k), is given by

a_om(k) = Σ_{l=k}^{n} C(l, k) (D*_om)^k (1 − D*_om)^{l−k} C_in,om(l),  0 ≤ k ≤ n,    (16)

where C_in,om(l) is the probability of l cells arriving at the given OSM in a time slot; it can be derived from C_out,s(k) using (11). The steady-state queue length distribution {q_om(k); 0 ≤ k ≤ B − 1} can be obtained from the classical output-queueing switch analysis procedure [19]. The mean OSM delay, W_om, is derived using Little's law:

W_om = Q̄_om / λ'_om + 1 = [ Σ_{k=1}^{B−1} k q_om(k) ] / [ 1 − q_om(0) a_om(0) ] + 1,    (17)

where 1 − q_om(0) a_om(0) is the output link utilization. The total cell switching delay in the UniMIN switch is

W_total = W_DN + W_om = Σ_{m=1}^{s} W_m + W_om.    (18)
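Eq. (17) is plain Little's-law arithmetic; a numeric sketch with made-up PMF values (illustrative numbers, not analysis results):

```python
# Mean OSM delay of eq. (17): mean queue length over the output-link
# utilization 1 - q_om(0) * a_om(0), plus one slot of service.

def mean_osm_delay(q_om, a0):
    mean_queue = sum(k * p for k, p in enumerate(q_om))   # Q-bar_om
    utilization = 1 - q_om[0] * a0                        # output-link load
    return mean_queue / utilization + 1
```

For q_om = [0.5, 0.3, 0.2] and a_om(0) = 0.4, this gives 0.7 / 0.8 + 1 = 1.875 cell times.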

Finally, the cell loss probability of the UniMIN switch, P_loss,total, is written as

P_loss,total = 1 − [ 1 − q_om(0) a_om(0) ] / ( λ E[F_1] ).    (19)

C. Results

We now present various numerical results of the delay/cell loss analysis of the multicast UniMIN switch. Fig. 5 compares the mean delay of multicast traffic with that of unicast traffic in a 1024 × 1024 UniMIN switch. Both cases show very small delay (11 cell times for load = 0.9), and the delay of multicast traffic is smaller than that of unicast traffic. It is noteworthy that the delay of the distribution network for multicast traffic remains at the minimum value (5 cell times for a five-stage distribution network) even for a high traffic load of 0.95. This is because cell replications take place as close as possible to their destinations. Fig. 6 illustrates that different fanout models do not incur any difference in the mean delay of the


[Fig. 6. Mean delay for various fanout distribution models (n=16, s=3, E[F]=5; analysis vs. simulation for truncated geometric, uniform, and constant fanouts)]


[Fig. 7. Cell loss probability of the distribution network in the multicast UniMIN switch vs. mean fanout E[F] (constant fanout, s=3, n=16; M=16 and M=32; loads 0.8 and 0.9, analysis vs. simulation)]

multicast UniMIN switch. The analytical results agree very well with the simulation results. As the mean fanout increases, the actual cell arrival rate in the distribution network decreases. This effect results in a lower cell loss probability in the distribution network for larger mean fanout, as shown in Fig. 7. Fig. 8 shows the cell loss probability of the 1024 × 1024 UniMIN switch with 32 × 32 output-buffering OSMs. The cell loss probability decreases exponentially as the output buffer size B increases, and when B exceeds a certain value, the cell loss probability settles at a constant value determined by the buffer size of the virtual FIFO at the distribution module. This figure also illustrates that the cell loss performance for multicast traffic is better than that for unicast traffic. From the above results, we can expect that 1024 × 1024 or even larger multicast ATM switches meeting given performance objectives in terms of cell loss and delay can be constructed by adopting the proposed UniMIN switch architecture.

IV. Conclusions


[Fig. 8. Cell loss probability of the 1024 × 1024 multicast UniMIN switch vs. output buffer size B (constant fanout, n=32, s=5, M=64, eff. load=0.9)]

In this paper, we proposed the large-scale multicast UniMIN switch architecture. It consists of a buffered distribution network performing cell replication and routing simultaneously and a column of OSMs. The proposed architecture does not require an additional large copy network to support multicast traffic, and it does not suffer from the overflow or HOL fanout blocking problems of copy network architectures. The distributed lookup table scheme of the multicast UniMIN also greatly reduces the size of a single lookup table, to 2B bits (where B is the number of multicast channels). The performance for multicast traffic is even better than that for unicast traffic, the opposite of conventional copy-network-type multicast switches, and the multicast UniMIN switch can support a multicast traffic ratio of up to 100% with improved delay/loss performance. It thus has excellent multicast capability in addition to such good features as a general expansion architecture for large-scale ATM switches [14], and it is expected to be a promising solution for large-scale ATM switches for the future public B-ISDN.

Acknowledgement

This study is supported in part by the Ministry of Information and Communications, Korea.

References

[1] J. S. Turner, "Design of a broadcast packet network," IEEE INFOCOM'86, pp. 667-675, 1986.
[2] Tony T. Lee, "Nonblocking copy networks for multicast packet switching," IEEE J. Select. Areas Commun., vol. 6, pp. 1455-1467, Dec. 1988.
[3] J. S. Turner, "A practical version of Lee's multicast switch architecture," IEEE Trans. Commun., vol. 41, pp. 1166-1169, Aug. 1993.
[4] Wen De Zhong, Y. Onozato, and J. Kaniyil, "A copy network with shared buffers for large-scale multicast ATM switching," IEEE/ACM Trans. Networking, vol. 1, pp. 157-165, Apr. 1993.
[5] Xinyi Liu and H. T. Mouftah, "A dynamic cell-splitting copy network design for ATM multicast switching," IEEE GLOBECOM'94, San Francisco, California, USA, pp. 458-462, Nov. 1994.
[6] H. Suzuki, H. Nagano, T. Suzuki, T. Takeuchi, and S. Iwasaki, "Output-buffer switch architecture for asynchronous transfer mode," IEEE ICC'89, vol. 1, pp. 99-103, 1989.
[7] Y. Shobatake, M. Motoyama, E. Shobatoke, T. Kamitake, S. Shimizu, M. Noda, and K. Sakaue, "A one-chip scalable ATM switch LSI employing shared buffer architecture," IEEE J. Select. Areas Commun., vol. 9, pp. 1248-1254, Oct. 1991.
[8] C. Clos, "A study of non-blocking switching networks," Bell Syst. Tech. J., vol. 32, pp. 406-424, Mar. 1953.
[9] Y. Sakurai, N. Ido, S. Gohar, and N. Endo, "Large-scale ATM multistage switching network with shared buffer memory switches," IEEE Commun. Magazine, vol. 29, pp. 90-104, Jan. 1991.
[10] Martin Collier and Tommy Curran, "Path allocation in a three-stage broadband switch with intermediate channel grouping," IEEE INFOCOM'93, San Francisco, California, USA, pp. 927-934, Mar. 1993.
[11] Soung C. Liew, "Multicast routing in 3-stage Clos ATM switching networks," IEEE Trans. Commun., vol. 42, pp. 1380-1390, Feb./Mar./Apr. 1994.
[12] Sung Hyuk Byun and Dan Keun Sung, "A universal multistage interconnection network for large scale ATM switches," IEEE GLOBECOM'93, Houston, Texas, USA, pp. 19-23, Nov. 1993.
[13] Sung Hyuk Byun and Dan Keun Sung, "A UniMIN switch architecture with shared output link type interconnection modules," JC-CNSS'94, Taejon, Korea, pp. 263-268, Jul. 1994.
[14] Sung Hyuk Byun and Dan Keun Sung, "The UniMIN switch architecture for large-scale ATM switches," submitted to IEEE/ACM Trans. Networking, Jul. 1996.
[15] A. Pattavina, "Multichannel bandwidth allocation in a broadband packet switch," IEEE J. Select. Areas Commun., vol. SAC-9, pp. 1489-1499, Dec. 1988.
[16] K. Y. Eng, M. G. Hluchyj, and A. S. Acampora, "The Knockout switch: a simple, modular architecture for high-performance packet switching," IEEE J. Select. Areas Commun., vol. 5, pp. 1274-1283, Oct. 1987.
[17] H. J. Chao, "A recursive modular terabit/second ATM switch," IEEE J. Select. Areas Commun., vol. 9, pp. 1161-1172, Oct. 1991.
[18] Hyong S. Kim, "Multichannel ATM switch with preserved packet sequence," IEEE ICC'92, Chicago, Illinois, USA, pp. 1634-1638, Jun. 1992.
[19] M. G. Hluchyj and M. J. Karol, "Queueing in high-performance packet switching," IEEE J. Select. Areas Commun., vol. 6, no. 9, pp. 1587-1597, Dec. 1988.
