I/O Performance of X-Y Routing in 2-D Meshes under Various Node-to-Disk Assignments

S.R. Subramanya, Dept. of Computer Science, University of Missouri-Rolla, Rolla, MO 65409 ([email protected])
Rahul Simha, Dept. of Computer Science, College of William and Mary, Williamsburg, VA 23185 ([email protected])
Bhagirath Narahari, Dept. of EE and CS, George Washington University, Washington, DC 20052 ([email protected])
Abstract
High-performance parallel computers with 2-D mesh topologies are beginning to be used as media-on-demand (MOD) servers. Multimedia data such as audio and video are stored on disks, and the server needs to serve as many clients as possible while satisfying the data requests of the clients in a timely fashion. Different patterns of node-to-disk assignments emerge over time as the requests from clients are served, and these patterns affect the I/O performance. This paper studies the performance of I/O transfers in a 2-D mesh using X-Y routing under various node-to-disk assignments. The transfers are done using wormhole routing. The data transfers under various node-to-disk assignments are simulated and the results are presented.
1 Introduction

A media-on-demand (MOD) server contains repositories of multimedia data, such as video and audio, stored on disks. Requests for media data arrive from clients at the server, which should satisfy the requests of as many clients as possible in a timely fashion. MOD servers need to satisfy several demands: high storage capacity, sustaining a maximum number of streams, delivering minimum response time, meeting quality-of-service requirements, and providing reliability and availability, among others. Although high-performance systems with multiple processors (nodes) connected by a high-speed interconnection network have been used in scientific computing for several years, their use as MOD servers, where fast data retrieval and real-time guarantees are more crucial, is just beginning [1, 3, 4]. As requests for data arrive at a media-on-demand system, a node in the server is assigned to handle each request. The node gets the required data from one of the disks and transfers it to the client. Thus, there are requests for data from the nodes to the disks and actual data transfers from the disks to the nodes in the mesh. The data from the disks to the nodes need to
be transferred within certain deadlines, which are required to meet a given quality of service (QoS). In this paper, we consider a high-performance parallel system whose nodes are connected by a 2-D mesh (exemplified by the Intel Touchstone Delta, the Paragon, and a few others), used as the server. The data transfers from the disks to the nodes are assumed to be done using wormhole routing [6]. Wormhole routing is a switching scheme proposed in [5] which effectively reduces network latency and buffer requirements, and it is used in most current high-performance parallel systems. In the simulations, only the data transfers (reads) from the disks to the nodes are considered, since the requests for data from nodes to disks are of negligible size compared to the data itself, and, with duplex channels, messages traversing a link in opposite directions do not conflict with each other. We assume the disks are connected to one side of the mesh, although the results are easily extended to the case with disks on two opposite sides. In the remainder of the paper, we use the terms mesh and server interchangeably. The next section briefly describes the various node-to-disk assignment schemes; Section 3 presents the simulation results, followed by conclusions.
2 Node-to-Disk Assignment Schemes

When a request for media data from a client arrives at the MOD server, a node is assigned to handle the request. This usually involves: (1) locating the disk(s) on which the desired data resides, (2) requesting and receiving the desired data from the disk(s), (3) buffering the data, and (4) transferring the data to the client. If node j requires the data resident on disk k, then a request for the data is sent to disk k and then the actual transfer of data takes place. We say node j is assigned to disk k for the entire duration until the data transfer is complete. All the data, however, may not be transferred continuously. Due to the inherent nature of wormhole routing, data may intermittently get blocked at intermediate nodes. Buffering is usually needed to ensure a smooth flow of data to the client in spite of possible intermittent data delays from the disks to the nodes. As new requests arrive, free nodes are assigned to handle the requests. When a request has been completely serviced, the node handling that request becomes free. Thus, over a period of time, new patterns of node-to-disk assignments emerge. In our study, various patterns of node-to-disk assignments reflecting various application scenarios were used. A few of them are briefly described below.
Non-uniform assignment: In this scheme, the number of nodes assigned to a disk varies from disk to disk. Three different distributions of the number of nodes assigned to each disk have been used: scheme 1 uses a Zipf distribution, scheme 2 uses a mid-weighted distribution, and scheme 3 uses a reverse mid-weighted distribution. The distributions in the three non-uniform assignments and in the scheme with an equal number of nodes per disk are shown in Figure 1.
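As an illustration of how the counts for scheme 1 might be produced, disk k can be given a share of the n × n nodes proportional to the Zipf weight 1/(k+1)^s. The following is our sketch, not code from the paper; the exponent s and the rounding policy are assumptions, since the paper does not specify them.

    /* Sketch: number of nodes assigned to each of m disks under a Zipf
       distribution (non-uniform scheme 1). Exponent s is an assumption. */
    #include <math.h>

    void zipf_counts(int n_nodes, int m, double s, int counts[])
    {
        double norm = 0.0;
        for (int k = 0; k < m; k++)
            norm += 1.0 / pow(k + 1, s);      /* normalizing constant */

        int assigned = 0;
        for (int k = 0; k < m; k++) {
            /* disk k gets a share proportional to 1/(k+1)^s */
            counts[k] = (int)(n_nodes * (1.0 / pow(k + 1, s)) / norm);
            assigned += counts[k];
        }
        counts[0] += n_nodes - assigned;      /* rounding remainder to disk 0 */
    }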
Balanced load on disks: The basic idea of the disk-load balancing heuristic is to transfer excess weight from rows whose weights exceed the average load per disk to rows whose weights fall below the average. The disk-load balancing is a two-step process: (1) in the row with excess weight, select the node(s) whose weight(s) are to be shifted; (2) select the row to which the chosen weight is to be transferred. The details of four schemes for disk-load balancing are described in [8]. In this paper, one of those schemes, the zigzag scan scheme, is used as a representative of the schemes that assign nodes to disks while ensuring a uniform load on the disks. Starting at a corner of the mesh, a zigzag scan is made through the entire mesh. During the scan, successive nodes are assigned to a disk until the sum of the packet sizes of the assigned nodes exceeds the average; when this happens, assignment proceeds with the next disk. (A sketch of this scan is given below, after Figure 1.)

Random assignment: Random numbers are uniformly generated in the interval [0, m-1], where m is the number of disks. Successive numbers are assigned to nodes of the mesh (either randomly or in some order). A number k assigned to a node means that disk k is the disk with which the node does I/O.

Equal number of nodes per disk: In this scheme, an equal number of nodes is assigned to each disk. In a mesh with n × n nodes and m disks, each disk has ⌈n²/m⌉ nodes assigned to it. The particular nodes assigned to each disk may be chosen randomly or selected in some order.
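The two uniform schemes are straightforward to realize. The sketch below is our illustration, not code from the paper; the array disk_of[i][j], which records the disk assigned to node (i, j), and the function names are hypothetical.

    /* Sketch of the random and equal-number node-to-disk assignments. */
    #include <stdlib.h>

    void random_assign(int n, int m, int disk_of[n][n])
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                disk_of[i][j] = rand() % m;   /* uniform over [0, m-1] */
    }

    void equal_assign(int n, int m, int disk_of[n][n])
    {
        int per_disk = (n * n + m - 1) / m;   /* ceil(n^2 / m) nodes per disk */
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                disk_of[i][j] = (count++) / per_disk;  /* row-major order */
    }

For example, with n = 8 and m = 4, equal_assign gives each disk ⌈64/4⌉ = 16 nodes.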
Figure 1: Various node-to-disk assignments.
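The zigzag scan admits a compact implementation. The sketch below is ours, under the assumption that each node carries a request of packet size pkt[i][j]; the paper's actual data structures are described in [8]. The scan walks the mesh row by row, reversing direction on alternate rows, and advances to the next disk once the accumulated packet size exceeds the per-disk average.

    /* Sketch of the zigzag-scan disk-load balancing assignment. */
    void zigzag_assign(int n, int m, int pkt[n][n], int disk_of[n][n])
    {
        long total = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                total += pkt[i][j];
        double avg = (double)total / m;       /* average load per disk */

        int disk = 0;
        double load = 0.0;
        for (int i = 0; i < n; i++) {
            for (int c = 0; c < n; c++) {
                int j = (i % 2 == 0) ? c : n - 1 - c;  /* reverse odd rows */
                disk_of[i][j] = disk;
                load += pkt[i][j];
                if (load > avg && disk < m - 1) {      /* average exceeded: */
                    disk++;                            /* move to next disk */
                    load = 0.0;
                }
            }
        }
    }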
3 Simulation Results

I/O transfers are performed using X-Y routing under the wormhole routing scheme. Mesh sizes of 8×8, 16×16, and 32×32, with varying numbers of disks and the various node-to-disk assignment schemes shown in Figure 1, were considered. The simulations were implemented in C and run on a Sparcstation. The overall I/O completion time, referred to as the schedule time, is used as the performance metric. This is a better metric than alternatives such as link congestion, average packet delay between nodes, or average packet delay between source and destination, since it accurately reflects when the requested data becomes available to the application (user). The results are shown in Figure 2 and represent averages over 100 runs for each case. (Since there was no remarkable deviation in the average schedule times between runs of 100, 250, and 500 for a few randomly chosen combinations of mesh size, number of disks, and node-to-disk assignment, 100 runs were used for all cases.) Requests for data are generated at the nodes using a uniform random distribution.
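For reference, dimension-ordered X-Y routing fixes each worm's path deterministically: the message first corrects its X coordinate, then its Y coordinate. The following minimal sketch (our illustration, not the simulator's code) enumerates the links such a worm occupies between source (sx, sy) and destination (dx, dy).

    /* Sketch of the X-Y route from (sx,sy) to (dx,dy) in a 2-D mesh. */
    #include <stdio.h>

    void xy_route(int sx, int sy, int dx, int dy)
    {
        int x = sx, y = sy;
        while (x != dx) {                 /* travel along X first */
            x += (dx > x) ? 1 : -1;
            printf("hop to (%d,%d)\n", x, y);
        }
        while (y != dy) {                 /* then travel along Y */
            y += (dy > y) ? 1 : -1;
            printf("hop to (%d,%d)\n", x, y);
        }
    }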
[Figure 2 consists of three plots, one each for the 8×8, 16×16, and 32×32 meshes under X-Y routing, showing schedule time versus number of disks for the equal-number, uniform-load, non-uniform (schemes 1-3), and random node-to-disk assignments.]
Figure 2: Performance of various node-to-disk assignments.

It is observed that for the 8×8 mesh, the schemes with an equal number of nodes per disk and with a uniform load on the disks provide the best performance, followed by the random assignment scheme. For the 16×16 mesh, the scheme with an equal number of nodes per disk performs best, and the random assignment matches its performance as the number of disks grows. For the 32×32 mesh, the scheme with uniform load on the disks provides the best performance. The non-uniform distribution (scheme 1) has the worst performance in all cases. In general, skewness in the distribution of nodes to disks concentrates the data transfer paths on a few links, leading to uneven link congestion and, in turn, higher schedule times.
4 Conclusions

High-performance parallel computers with 2-D mesh topologies are beginning to be used in media-on-demand (MOD) applications. The nodes of the mesh act as servers, retrieving the requested data from the disks and transferring it to the clients. The mapping of mesh nodes to disks changes as new requests arrive and the service of old requests finishes. This paper studied, using simulation, the I/O performance of X-Y routing in 2-D meshes under wormhole routing, for different mesh sizes, different numbers of disks, and various mappings of nodes to disks. The performance metric was the overall disk-to-node I/O completion time. In general, the assignments with an equal number of nodes per disk and with a uniform load on the disks perform better than the skewed node-to-disk assignments; for larger meshes, the assignment with uniform load on the disks performs best. The results could be used to develop a strategy for maintaining node-to-disk allocations that provide the best I/O performance.
References

[1] Jadav, D. and Choudhary, A., "Designing and Implementing High-Performance Media-on-Demand Servers", IEEE Parallel and Distributed Technology, Summer 1995.
[2] Jadav, D., et al., "Design and Evaluation of Data Access Strategies in a High Performance Multimedia-on-Demand Server", International Conference on Multimedia Computing and Systems, May 1995, pp. 286-291.
[3] Rangan, P.V., Vin, H., and Ramanathan, S., "Designing an On-Demand Multimedia Service", IEEE Communications, Vol. 30, No. 7, July 1992.
[4] Gemmell, D.J., et al., "Multimedia Storage Servers: A Tutorial", IEEE Computer, May 1995, pp. 40-51.
[5] Dally, W.J. and Seitz, C.L., "The Torus Routing Chip", Journal of Distributed Computing, Vol. 1, No. 3, 1986, pp. 187-196.
[6] Ni, L.M. and McKinley, P.K., "A Survey of Wormhole Routing Techniques in Direct Networks", IEEE Computer, Vol. 26, No. 2, February 1993, pp. 62-76.
[7] Narahari, B., et al., "Routing and Scheduling I/O Transfers on Wormhole-Routed Mesh Networks", International Conference on High Performance Computing, New Delhi, December 1995.
[8] Subramanya, S.R., et al., "Schemes for Balancing Disk Loads in a 2-D Mesh", unpublished manuscript.