DESIGN AND IMPLEMENTATION OF CROSSBAR SWITCH IN NS2

Sujeet Kumar∗ and Anand Srivastava†
School of Computing and Electrical Engineering
Indian Institute of Technology Mandi, India, 175001
∗ sujeet [email protected], † [email protected]

ABSTRACT

Crossbar switches (XS) are predominantly used as the switching fabric in network components such as switches, routers and bridges, which form an important part of communication networks and the Internet. While many implementations of various types of XS exist in the research literature, they are limited to a few types. NS2 is a commonly used network simulator which allows exhaustive simulations of network systems with different parameters; however, NS2 currently does not support the various types of buffered and bufferless XS. In this work, we implement buffered and bufferless XS in NS2 and analyse their delay and throughput with respect to varying switch size and traffic load. The main advantage of our implementation is that it allows simulation and performance analysis while changing XS parameters such as buffer size, switch size and bandwidth under varied network conditions.

Index Terms: crossbar switch, NS2, TCP, UDP, buffer, scheduling

I. INTRODUCTION

Day by day, Internet traffic is increasing and the Internet is becoming more congested. Congestion degrades performance, e.g., delay, loss rate and throughput. In order to study and improve the performance of traffic in the Internet, we need to carry out simulation exercises. In this paper, we use the network simulator NS2 and implement various types of crossbar switches (XS) for studying and analysing the performance of network traffic with respect to XS.

NS2 is an open-source, discrete-event network simulator [1]. It is designed using two languages: C++ and OTcl (object-oriented Tool Command Language). C++ defines the internal mechanisms of NS2, while OTcl is used to configure and set up an NS2 simulation; TclCL (Tcl with classes) links the C++ and OTcl objects. NS2 provides modules for network components such as packets, nodes, links, queues, routing algorithms and protocols. NS2 generates text-based output (trace files) and animation-based output (Network AniMator), and it can also generate graphs using Xgraph. The relevant information can be extracted from a trace file using a scripting language such as awk, Perl or Python, and graphs can be plotted using tools like gnuplot, qtplot or grace.

We use NS2 version 2.35 on a Linux platform to implement XS and run the simulations. In this paper we implement a buffered crossbar switch (BXS) and a bufferless crossbar switch (BLXS), and compare the performance achieved by both switches. The earlier work [2] is related to the design and implementation of switches in NS2; it implemented only a BLXS and compared the throughput metric with theoretical results.

The paper is organized as follows. Section II gives an overview of related work. Section III introduces XS. Section IV describes the implementation of XS in NS2. In Section V, we present the simulation results and analysis, and finally Section VI gives the conclusion and future work.

II. RELATED WORK

There are many existing hardware and software implementations of XS, although the software implementations are few. This section gives an overview of existing software implementations of XS using programming languages, network simulators or both. The SIM simulator [3] and the VOQ simulator [4] are implemented in C and Java respectively; both simulate a BLXS only. The network simulator NS2 is used to implement a BLXS only in [2], which also analyses throughput and validates it against the theory presented in [5]. Another network simulator, NS3, is used to implement both a BLXS and a BXS in [6]. Compared with the above implementations, ours implements both a BLXS and a BXS and shows better delay and throughput performance.

III. CROSSBAR SWITCH

A crossbar switch includes M inputs and N outputs with M × N crosspoints to connect each input port to each output port, as shown in Fig. 1a. The maximum number of simultaneous connections possible in a crossbar switch

is min(M, N). XS can be categorized as lossy vs. lossless, buffered vs. bufferless, blocking vs. non-blocking, etc. [7], [8]. In this paper, we classify crossbar switches into the buffered and bufferless categories, covering types such as output queued (OQ), input queued (IQ), combined input output queued (CIOQ), crosspoint queued (CQ), combined input crosspoint queued (CICQ), combined input crosspoint output queued (CICOQ) and virtual output queued - input queued (VOQ-IQ), as depicted in Fig. 1.

IQ, OQ, VOQ-IQ, CIOQ and VOQ-CIOQ switches have no buffers at the crosspoints and belong to the BLXS category. In a BLXS, an input port transmits packets to an output port only if the links connecting the two ports are free, which means a matching algorithm is required to connect distinct input and output ports. In one time slot, a packet can be transmitted from any input port to any output port, but no two input ports can transmit to the same output port. If a BLXS uses a single queue at each input port, head-of-line (HoL) blocking is possible [9]: a packet in the input queue destined for a free output port is blocked by the packet at the head of the queue, which degrades the performance of the XS. To solve the HoL blocking problem and achieve good performance, the single queue is replaced by virtual output queues (VOQ) at the input port; a VOQ is a set of queues in which each queue stores the packets destined for a separate output port. Another way to mitigate HoL blocking is to speed up packet transmission: an XS with speedup S can transmit up to S packets from each input port and receive up to S packets at each output port in each time slot [10]. Speedup also improves performance when no buffers are used at the input ports, e.g., in an OQ switch.
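The effect of HoL blocking, and how a VOQ removes it, can be illustrated with a small standalone sketch (this is not part of the NS2 implementation; a packet is reduced to the id of the output port it is destined for):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Standalone illustration of HoL blocking (not NS2 code): a "packet" is
// simply the id of the output port it is destined for.

// Single FIFO per input: service stops at the first head-of-line packet
// whose output is busy. Returns the number of packets served this slot.
int serveSingleFifo(std::deque<int>& q, const std::vector<bool>& busy,
                    int maxServe) {
    int served = 0;
    while (served < maxServe && !q.empty() && !busy[q.front()]) {
        q.pop_front();
        ++served;
    }
    return served;
}

// VOQ: one FIFO per output, so a busy output only stalls its own queue.
// Returns the number of packets served this slot.
int serveVoq(std::vector<std::deque<int> >& voq,
             const std::vector<bool>& busy) {
    int served = 0;
    for (std::size_t out = 0; out < voq.size(); ++out) {
        if (!busy[out] && !voq[out].empty()) {
            voq[out].pop_front();
            ++served;
        }
    }
    return served;
}
```

With output 0 busy and output 1 free, a single FIFO holding packets for outputs 0, 1, 1 serves nothing (the head packet blocks the rest), while a VOQ still serves output 1.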
A BLXS also needs a scheduling algorithm at the input port, the output port, or both, for scheduling the queued packets. A BXS is a switch with buffers at the crosspoints; CICQ, CICOQ, VOQ-CICQ, VOQ-CICOQ and CQ switches belong to this category. Here, an input port transmits packets to the crosspoint queue connected to the packets' destined output port, and an output port receives packets from a non-empty crosspoint queue. Thus, in one time slot, many input ports can transmit packets destined for the same output port, and many output ports can receive packets coming from the same input port. A BXS does not need a matching algorithm like a BLXS, because of the queues at the crosspoints. VOQ and speedup can also be applied in a BXS. Scheduling algorithms are needed for the crosspoint queues and for the input port, the output port, or both, depending on where queues are present: an input scheduler schedules the packets at the input port and sends them to the crosspoint, where an output scheduler schedules the packets and sends them to the output port.

Therefore, to implement any type of XS in NS2, we need to implement VOQs, crosspoint queues, and scheduling algorithms for single and multiple queues.

[Fig. 1: Types of XS: (a) XS architecture, (b) OQ switch, (c) IQ switch, (d) CIOQ switch, (e) CQ switch, (f) CICQ switch, (g) CICOQ switch, (h) VOQ-IQ switch, (i) VOQ-CICQ switch, (j) VOQ-CIOQ switch, (k) VOQ-CICOQ switch]

IV. DESIGN AND IMPLEMENTATION OF CROSSBAR SWITCH IN NS2

A. Issues in design and implementation
1) NS2 requires a network topology for simulation; the topology should be easy to simulate and should work for any type of XS.
2) NS2 provides a single queue per link, whereas we need either multiple links or multiple queues per link to design the VOQ and the set of crosspoint queues connected to a single output port. Multiple links cannot be used to implement VOQ, as they make the implementation complex; therefore, multiple queues per link have to be implemented.

3) In an XS, up to S (speedup) packets can be transmitted over each link in one time slot, whereas in NS2 packets are transmitted at the link rate. A module has to be designed in NS2 to control packet transmission accordingly.

B. Design and implementation processes of BXS
The design and implementation of the BXS is divided into five steps: design of the topology for simulation in NS2, building queues for the VOQ and crosspoint queues, addition of queue scheduling algorithms, addition of a stop-and-wait (SnW) protocol module in the link to control packet transmission, and writing the tcl script for the simulation.

1) Design of the topology for simulation in NS2: We have designed a network topology for the simulation of a BXS, shown in Fig. 2; the notation used in the figure is described in Table I. The part of the topology inside the dashed box is a CICOQ BXS; to create the topology for other types of BXS, the buffers inside the dashed box are changed accordingly. To simplify the design, the numbers of inputs and outputs are made equal, i.e., M = N.

[Fig. 2: Network topology of N×N BXS for simulation in NS2]

2) Building queues for VOQ and crosspoint queues: In the NS2 queue implementation, an object of class PacketQueue creates the queues and buffers the packets. To build a new queue in NS2, we implement a class for it; Fig. 3 shows the class diagram of NewQueueClass. Queue is the predefined base class of NewQueueClass, and NewQueueClass must include the PacketQueue object and the enque and deque functions. An array of PacketQueue objects is used instead of a single PacketQueue object to implement VOQ.

Table I: Notation used in the network topology of an N × N BXS (1 ≤ i ≤ N, 1 ≤ j ≤ N)
  Si: source node i
  Ii: input node i of the XS
  X: crosspoint node, which receives packets from each input node and sends them to the output nodes
  Oj: output node j of the XS
  Dj: destination (sink) node j
  A: acknowledgement node, which receives acknowledgements of TCP packets from the sink nodes and sends them to the source nodes
  QSi: simple buffer storing packets coming from source node i
  QIi: input buffer storing packets coming from input node i
  QXj: crosspoint buffer storing packets coming from the crosspoint node destined for output node j
  QOj: simple buffer storing packets coming from output node j
  λSi & dSi: packet transmission rate and delay between source node i and input node i
  λIi & dIi: packet transmission rate and delay between input node i and the crosspoint node
  λXj & dXj: packet transmission rate and delay between the crosspoint node and output node j
  λOj & dOj: packet transmission rate and delay between output node j and sink node j
  λA & dA: packet transmission rate and delay between sink node and acknowledgement node, and between acknowledgement node and source node
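The VOQ structure built in this step can be mocked standalone as follows (NS2's actual Queue and PacketQueue classes carry more machinery; here, as a simplifying assumption, an int holding the destination output port stands in for Packet*):

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Standalone mock of the VOQ queue class described above. In the NS2
// implementation the element type is Packet* and the destination port is
// read from the packet header; here a packet is just its destination port.
class VoqMock {
public:
    explicit VoqMock(int nOutputs) : q_(nOutputs) {}

    // enque: place the incoming packet into the per-output queue it is
    // destined for (the routing step performed in the enque function).
    void enque(int destPort) { q_[destPort].push(destPort); }

    // deque: remove one packet destined for the given output port,
    // or return -1 if that queue is empty.
    int deque(int outPort) {
        if (q_[outPort].empty()) return -1;
        int pkt = q_[outPort].front();
        q_[outPort].pop();
        return pkt;
    }

    std::size_t len(int outPort) const { return q_[outPort].size(); }

private:
    std::vector<std::queue<int> > q_;  // stands in for the PacketQueue array
};
```

The crosspoint queue class differs only in its enque function, which routes packets arriving from the VOQs, as described below.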

The enque function enqueues an incoming packet into the relevant queue, and the deque function dequeues a packet from a queue. In a BXS, one crosspoint queue from each set of crosspoint queues connected to an output port transmits packets; since this behaviour is similar to a VOQ, the same implementation idea is used to build the crosspoint queues. Each set of crosspoint queues stores the packets coming from all VOQs: as shown in Fig. 1k, packets from the first queue of each VOQ are transmitted to the respective crosspoint queue connected to the first output port, packets from the second queue of each VOQ to the crosspoint queue connected to the second output port, and so on. This routing is implemented in the enque function of the crosspoint queue class. To add a new queue in NS2, see [11], [12].

3) Addition of queue scheduling algorithms: In NS2, a queue scheduling algorithm is implemented in the deque function of the queue class: a packet is dequeued according to the scheduling algorithm. A new queue class is added for each scheduling algorithm, since the deque functions differ. Moreover, for the same scheduling algorithm two queue classes are added, one for the VOQ and one for the crosspoint queue, which differ in their enque functions. For example, in this paper two scheduling algorithms have

[Fig. 5: Architecture of the SimpleLink object with the SnW module (the tSnW_ and snwacker_ objects inserted into the chain of queue_, link_ and ttl_ connectors, with drop targets drophead_ and drop_)]

[Fig. 3: Class diagram of NewQueueClass]

been used in the simulation. To implement these scheduling algorithms, four queue classes are created, two per scheduling algorithm: one for the VOQ and one for the crosspoint queue.

4) Addition of a stop-and-wait (SnW) protocol module in the link to control packet transmission: This solves issue 3) described in subsection IV-A. The main function of SnW is to transmit a packet from the SnW transmitter and wait for an acknowledgement (ACK) from the SnW receiver before transmitting the next packet. This module is used with all links connected to node X in Fig. 2. Implementing this module in NS2 requires both C++ and OTcl code. Fig. 4 shows the class diagram of the classes used in the C++ implementation of the SnW module, and Fig. 5 shows the architecture of the SimpleLink object with the SnW module, which is used in the OTcl implementation.
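The transmitter behaviour, detailed in Table II, amounts to a small state machine. A standalone sketch follows (illustrative only, not the NS2 code: the callback handler_->handle(0), which fetches the next packet from the upstream queue, is modelled by a counter, and the drop() entry point standing for the link's drop path is our own addition):

```cpp
#include <cassert>

// Standalone sketch of the SnWT state machine from Table II.
enum SnWStatus { IDLE, SENT, ACKED, DROP };

struct SnWTMock {
    int blocked = 0;
    SnWStatus status = IDLE;
    int fetched = 0;   // stands in for handler_->handle(0) callbacks
    int dropped = 0;

    // recv: transmit the packet downstream, then block until resume().
    void recv() { status = SENT; blocked = 1; }

    // ack: the downstream SnWAcker confirmed reception.
    void ack() { status = ACKED; }

    // drop: the in-flight packet was dropped (illustrative entry point).
    void drop() { status = DROP; }

    // resume: unblock; on ACKED fetch the next packet, on DROP discard
    // the stored packet first, then fetch the next one.
    void resume() {
        blocked = 0;
        if (status == ACKED) {
            status = IDLE;
            ++fetched;
        } else if (status == DROP) {
            ++dropped;
            status = IDLE;
            ++fetched;
        }
    }
};
```

A normal cycle is recv() (status SENT, blocked), ack() (status ACKED), resume() (status IDLE, next packet fetched); at most one packet is in flight per link at any time, which is exactly the per-slot transmission control needed for issue 3).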

[Fig. 4: Class diagram of the SnW module: SnWHandler (derived from Handler), SnWT and SnWR (derived from Connector), SnWAcker (derived from SnWR), and the SnWStatus enumeration {IDLE, SENT, ACKED, DROP}]

In the C++ implementation, four classes are implemented: SnWT, SnWR, SnWHandler and SnWAcker. SnWT is the SnW transmitter and SnWR is the SnW receiver;

both classes are derived from the class Connector, which is used to connect two NsObjects. SnWHandler is derived from the class Handler and is used as a handler for the callback mechanism. SnWAcker is derived from the class SnWR and is responsible for transmitting the ACK message. All members of the four classes are described in Table II.

In the OTcl implementation, two instprocs are created and inserted into the SimpleLink object. The first instproc, SimpleLink::snwt-link{}, configures the link as shown in Fig. 5, where tSnW_ and snwacker_ are objects of SnWT and SnWAcker respectively. The second instproc, SimpleLink::snwt-link{from to}, creates the SnW link connecting node "from" to node "to". The two instprocs are appended to the files /ns2.3X/tcl/lib/link.tcl and /ns2.3X/tcl/lib/lib.tcl respectively.

5) Writing the tcl script for simulation: The tcl script uses the network topology shown in Fig. 2. After creating the topology in the script, the snw-link instproc is used to add the SnW module to the respective links, and code is added to generate the trace file for analysing the simulation results. Details on writing tcl scripts for NS2 are given in [11], [13] and [12].

C. Design and implementation processes of BLXS

1) Design of the BLXS topology for simulation in NS2: A similar topology is used, with no queues on the links between the crosspoint node and the output nodes (i.e., the size of QXj = 0); the queues QIi, QOj or both are used according to the type of BLXS. For example, the topology of a CIOQ switch is shown in Fig. 6.

2) Building queues for VOQ and addition of queue scheduling algorithms: The same implementation is used as in the BXS.

3) Addition of the stop-and-wait (SnW) protocol module in the link to control packet transmission: We use the same implementation as in the BXS. However, in a BLXS an input port can transfer a packet to an output port only if the links connecting these ports are idle.
To achieve this, a bipartite matching algorithm is needed to find a matching between input ports and output ports. An external array of size N is used in the SnW module to indicate the link status (busy or idle). This array is used to implement the bipartite matching algorithm in the queue class of the input port: before dequeuing packets from the input queue, the matching algorithm is called and, according to the resulting match, a packet is dequeued from the PacketQueue in the deque function of the queue class.

Table II: Description of the members of the SnW module

SnWT class:
  blocked_: used to block/unblock the SnWT object; if blocked, the SnWT object will not transmit any packet
  handler_: handler of the upstream queue object
  pkt_: pointer to a packet
  snwh_: handler sent to the downstream object
  status_: SnWStatus variable indicating the current status (IDLE, SENT, ACKED or DROP) of the SnWT object
  recv(): receives a packet from the upstream object and sends it to the downstream object, then sets status_ to SENT and blocked_ to 1 to block the SnWT object
  ack(): receives the ACK message and sets status_ to ACKED
  resume(): sets blocked_ to 0 to unblock the SnWT object, then checks status_; if it is ACKED, sets it to IDLE and fetches the next packet from the queue by invoking handler_->handle(0); if it is DROP, drops the packet stored in pkt_, sets status_ to IDLE and fetches the next packet

SnWR class:
  handler_: handler of the upstream queue object
  pkt_: pointer to a packet
  event_: object of the Event class
  snw_t_: pointer to an SnWT object, associated with SnWR from OTcl via the command function
  SnWR(): constructor; initialises snw_t_ to 0
  recv(): virtual function, implemented by the derived class SnWAcker
  command(): associates SnWR with an SnWT object via an OTcl command

SnWHandler class:
  snw_t_: reference to an SnWT object
  SnWHandler(): constructor, taking a reference to an SnWT object
  handle(): calls the resume function of SnWT

SnWAcker class:
  SnWAcker(): constructor
  recv(): receives a packet from the upstream object, sends it to the downstream object, and sends the ACK by calling the ack function of SnWT
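The matching step over the link-status array can be sketched standalone as follows (the paper does not specify the exact matching algorithm, so this greedy first-free-output grant, a simple maximal matching, is an illustrative assumption):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone sketch of the matching step for the BLXS: requests[i] lists
// the output ports input i has packets for, and linkBusy is the external
// N-entry status array. Each input is granted at most one free output,
// which is then marked busy, so no two inputs share an output.
// Returns match[i] = granted output for input i, or -1 if none.
std::vector<int> matchInputs(const std::vector<std::vector<int> >& requests,
                             std::vector<bool>& linkBusy) {
    std::vector<int> match(requests.size(), -1);
    for (std::size_t i = 0; i < requests.size(); ++i) {
        for (std::size_t r = 0; r < requests[i].size(); ++r) {
            int out = requests[i][r];
            if (!linkBusy[out]) {
                linkBusy[out] = true;  // claim the output link
                match[i] = out;
                break;
            }
        }
    }
    return match;
}
```

For example, with inputs requesting {0, 1} and {0, 2}, input 0 is granted output 0, so input 1 skips the now-busy output 0 and is granted output 2; the deque function then releases only the matched packets.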


4) Writing the tcl script for simulation: The same procedure is used as in the simulation of the BXS.

V. SIMULATION RESULTS AND ANALYSIS

A. Simulation Setup
In this simulation, TCP and UDP packet flows are transmitted: TCP uses NewReno with a file transfer protocol (FTP) traffic generator, UDP uses a constant bit rate (CBR) traffic generator, and default settings are used for the other TCP and UDP parameters.


[Fig. 6: Network topology of N×N BLXS for simulation in NS2]

We send two UDP and eight TCP NewReno flows from each input port to each output port, where the UDP traffic generation rate is fifty percent of the rate of the link between the source node and the input node. The DropTail (FCFS) queuing mechanism is used for packet scheduling from the source nodes to the input nodes and from the output nodes to the sink nodes. For scheduling packets from the input nodes to the crosspoint node and from the crosspoint node to the output nodes, combinations of two scheduling algorithms are used: longest queue first (LQF) [14] and round robin (RR) [15]. We run the simulation for 10 seconds; all flows start at 0 seconds and stop at 10 seconds. The other parameter values for the network topology of Fig. 2 are given in Table III.

Table III: Parameter values for Fig. 2 (N = 2, 5, 10, 20; load = 0.1, 0.2, ..., 1.0; simulation time = 10 sec)
  Queue sizes (packets): QSi = 200, QIi = 100 × N, QXj = 1, QOj = 100
  Link capacities (Mbps): λSi = 1000 × load, λIi = 1000, λXj = 1000, λOj = 1000 × load, λA = 1000 × load
  Link delays (msec): dSi = 10, dIi = 0, dXj = 0, dOj = 10, dA = 10
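The deque-side difference between the two schedulers can be sketched standalone (queues reduced to their current lengths; this mirrors the policies named above, not the NS2 classes themselves):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone sketch of the two deque policies used in the simulations.
// Queues are represented only by their current lengths.

// LQF: index of the longest non-empty queue, or -1 if all are empty
// (ties broken in favour of the lowest index).
int lqfPick(const std::vector<int>& lens) {
    int best = -1;
    for (std::size_t i = 0; i < lens.size(); ++i)
        if (lens[i] > 0 && (best < 0 || lens[i] > lens[best]))
            best = static_cast<int>(i);
    return best;
}

// RR: first non-empty queue scanning circularly after `last`
// (the previously served index), or -1 if all are empty.
int rrPick(const std::vector<int>& lens, int last) {
    int n = static_cast<int>(lens.size());
    for (int k = 1; k <= n; ++k) {
        int i = (last + k) % n;
        if (lens[i] > 0) return i;
    }
    return -1;
}
```

LQF favours the most backlogged queue, which drains hotspots faster, while RR guarantees each non-empty queue is visited within N slots; the simulations below compare both at the input and crosspoint stages.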

B. Results and Analysis
After the simulation, the relevant data are extracted from the trace file and graphs are plotted. We compute the throughput (the ratio of the total size of packets received at the output ports to the total size of packets arriving at the input ports) in percent, and the delay (the average delay from input ports to output ports) in µsec, for the BXS and BLXS.

Figs. 7 and 8 show the delay and throughput results, respectively, with respect to switch size for the BLXS and BXS with the RR and LQF schedulers at full load. It is seen that the delay

and throughput performance of the BLXS is poor compared to the BXS, due to the lack of crosspoint queues in the BLXS: a packet waits in the input queue until the link connecting its input port and output port becomes free.
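The two metrics defined above can be sketched standalone (assuming the per-packet byte counts and input/output timestamps have already been parsed from the trace file; the names are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone sketch of the two metrics. Throughput is the percentage of
// arrived bytes that left the switch; delay is the mean input-to-output
// latency of delivered packets, converted to microseconds.
double throughputPercent(long arrivedBytes, long deliveredBytes) {
    if (arrivedBytes == 0) return 0.0;
    return 100.0 * deliveredBytes / arrivedBytes;
}

double meanDelayUs(const std::vector<double>& inTimeSec,
                   const std::vector<double>& outTimeSec) {
    if (inTimeSec.empty()) return 0.0;
    double sum = 0.0;
    for (std::size_t k = 0; k < inTimeSec.size(); ++k)
        sum += (outTimeSec[k] - inTimeSec[k]) * 1e6;  // sec -> microsec
    return sum / inTimeSec.size();
}
```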

[Fig. 7: Delay with switch size of BXS and BLXS for different scheduling algorithms at load 1.0]

[Fig. 8: Throughput with switch size of BXS and BLXS for different scheduling algorithms at load 1.0]

Figs. 9 and 10 show the delay and throughput results, respectively, with respect to load for the BLXS and BXS with the RR and LQF schedulers at switch size 20. Here too, the delay and throughput performance of the BLXS is poor compared to the BXS, due to the lack of crosspoint queues in the BLXS. It is also evident from the figures that at loads 0.3 and 0.5, both delay and throughput degrade for the LQF and RR schedulers, respectively, in the BLXS: as the traffic load increases, the input queues fill up and the packet drop rate increases, which affects the delay and throughput.

[Fig. 9: Delay with load of BXS and BLXS for different scheduling algorithms at switch size 20]

[Fig. 10: Throughput with load of BXS and BLXS for different scheduling algorithms at switch size 20]

In summary, the RR scheduler is better than the LQF scheduler in the BLXS, while in the BXS the different combinations of RR and LQF schedulers perform better and approximately similarly to one another. Also, the performance of both types of XS degrades with increasing switch size and traffic load.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we designed and implemented XS in NS2. We showed the implementation procedure for one BLXS and one BXS, and compared the performance metrics (delay and throughput) of both. From the results, we observed that packet delay increases and throughput decreases as the size of the XS and the traffic load increase. The main advantage of our design and implementation over other works is that researchers can carry out simulations for various types of XS by changing parameters like buffer size, bandwidth and switch size. In future, we plan to develop mathematical models for the BXS and BLXS to understand their behaviour theoretically.

VII. REFERENCES

[1] "Nsnam," http://nsnam.isi.edu/nsnam/index.php/Main Page.
[2] Hongyun Zheng, Yongxiang Zhao, and Changjia Chen, "Design and implementation of switches in network simulator (ns2)," Proceedings of the First International Conference on Innovative Computing, Information and Control, vol. 1, pp. 721–724, 2006.
[3] "SIM Simulator," http://subversion.assembla.com/svn/PBC SIM/.
[4] D. Banovic and I. Radusinov, "VOQ simulator - software tool for performance analysis of VOQ switches," International Conference on Internet and Web Applications and Services / Advanced International Conference on Telecommunications (AICT-ICIW '06), pp. 71–71, Feb 2006.
[5] M.J. Karol, M.G. Hluchyj, and S.P. Morgan, "Input versus output queueing on a space-division packet switch," IEEE Transactions on Communications, vol. 35, no. 12, pp. 1347–1356, 1987.
[6] Xia Yu, Zeng Huaxin, and Shen Zhijun, "Design and implementation of switch module for ns-3," Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools, pp. 3:1–3:10, 2009.
[7] M.G. Hluchyj and M.J. Karol, "Queueing in high-performance packet switching," IEEE Journal on Selected Areas in Communications, vol. 6, no. 9, pp. 1587–1597, Dec 1988.
[8] F.A. Tobagi, "Fast packet switch architectures for broadband integrated services digital networks," Proceedings of the IEEE, vol. 78, no. 1, pp. 133–167, Jan 1990.
[9] Jingshown Wu, Hsien-Po Shiang, Kun-Tso Chen, and Hen-Wai Tsao, "Delay and throughput analysis of the high speed variable length self-routing packet switch," IEEE Workshop on High Performance Switching and Routing, pp. 314–318, 2002.
[10] Shang-Tse Chuang, A. Goel, N. McKeown, and B. Prabhakar, "Matching output queueing with a combined input/output-queued switch," IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp. 1030–1039, Jun 1999.
[11] Jae Chung and Mark Claypool, "NS by example," http://nile.wpi.edu/NS/.
[12] Teerawat Issariyakul and Ekram Hossain, Introduction to Network Simulator NS2, Springer US, 2009.
[13] "The network simulator - ns-2," http://www.isi.edu/nsnam/ns/.
[14] Hongyuan Shi and Harish Sethu, "An evaluation of the longest-queue-first scheduler for routers in a differentiated services domain," 2001.
[15] E.S. Shin, V.J. Mooney, and G.F. Riley, "Round-robin arbiter design and generation," 15th International Symposium on System Synthesis, pp. 243–248, Oct 2002.