I/O Performance of X-Y Routing in 2-D Meshes under Various Node-to-Disk Assignments

S.R. Subramanya, Dept. of Computer Science, University of Missouri-Rolla, Rolla, MO 65409 ([email protected])
Rahul Simha, Dept. of Computer Science, College of William and Mary, Williamsburg, VA 23185 ([email protected])
Bhagirath Narahari, Dept. of EE and CS, George Washington University, Washington, DC 20052 ([email protected])
Abstract
High-performance parallel computers with 2-D mesh topologies are beginning to be used as media-on-demand (MOD) servers. Multimedia data such as audio and video are stored on disks, and the server needs to serve as many clients as possible while satisfying the data requests of the clients in a timely fashion. Different patterns of node-to-disk assignments emerge over time as the requests from clients are served, and these patterns affect the I/O performance. This paper studies the performance of I/O transfers in a 2-D mesh using X-Y routing under various node-to-disk assignments. The transfers are done using wormhole routing. The data transfers under various node-to-disk assignments are simulated and the results are presented.
1 Introduction

A media-on-demand (MOD) server contains repositories of multimedia data, such as video and audio, stored on disks. Requests for media data arrive from clients at the server, which should satisfy the requests of as many clients as possible in a timely fashion. MOD servers need to satisfy several demands: high storage capacity, sustaining a maximum number of streams, delivering minimum response time, meeting quality-of-service requirements, and providing reliability and availability, among others. Although high-performance systems with multiple processors (nodes) connected by a high-speed interconnection network have been used in scientific computing for several years, their use as MOD servers, where fast data retrieval and real-time guarantees are more crucial, is just beginning [1, 3, 4]. As requests for data arrive at a media-on-demand system, a node in the server is assigned to handle each request. The node gets the required data from one of the disks and transfers it to the client. Thus, there are requests for data from the nodes to the disks and actual data transfers from the disks to the nodes in the mesh. The data from the disks to the nodes need to
be transferred within certain deadlines, which are required to meet a given quality of service (QoS). In this paper, we consider a high-performance parallel system whose nodes are connected by a 2-D mesh (exemplified by the Intel Touchstone Delta, the Paragon, and a few others), used as the server. The data transfers from the disks to the nodes are assumed to be done using wormhole routing [6]. Wormhole routing is a switching scheme proposed in [5] which effectively reduces network latency and buffer requirements, and it is used in most current high-performance parallel systems. In the simulations, only the data transfers (reads) from the disks to the nodes are considered, since the requests for data from nodes to disks are of negligible size compared to the data itself, and, with duplex channels, messages traversing a link in opposite directions do not conflict with each other. We assume the disks are connected to one side of the mesh, although the results are easily extended to the case with disks on two opposite sides. In the remainder of the paper, we use the terms mesh and server interchangeably. The next section briefly describes the various node-to-disk assignment schemes; Section 3 presents the simulation results, followed by conclusions.
2 Node-to-Disk Assignment Schemes

When a request for media data from a client arrives at the MOD server, a node is assigned to handle the request. This usually involves: (1) locating the disk(s) on which the desired data resides, (2) requesting and receiving the desired data from the disk(s), (3) buffering the data, and (4) transferring the data to the client. If node j requires the data resident on disk k, then a request for the data is sent to disk k and then the actual transfer of data takes place. We say node j is assigned to disk k for the entire duration until the data transfer is complete. All the data, however, may not be transferred continuously. Due to the inherent nature of wormhole routing, data may intermittently get blocked at intermediate nodes. Buffering is usually needed to ensure a smooth flow of data to the client in spite of possible intermittent data delays from the disks to the nodes. As new requests arrive, free nodes are assigned to handle the requests. When a request has been completely serviced, the node handling that request becomes free. Thus, over a period of time, new patterns of node-to-disk assignments emerge. In our study, various patterns of node-to-disk assignments reflecting various application scenarios were used. A few of them are briefly described below.
Non-uniform assignment: In this scheme, the number of nodes assigned to a disk varies from disk to disk. Three different distributions of the number of nodes assigned to each disk have been used: scheme 1 uses a Zipf distribution, scheme 2 uses a mid-weighted distribution, and scheme 3 uses a reverse mid-weighted distribution. The distributions in the three non-uniform assignments and in the scheme with an equal number of nodes per disk are shown in Figure 1.
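As an illustration of how the counts for scheme 1 might be produced, disk k can be given a share of the n × n nodes proportional to the Zipf weight 1/(k+1)^s. The following is our sketch, not code from the paper; the exponent s and the rounding policy are assumptions, since the paper does not specify them.

    /* Sketch: number of nodes assigned to each of m disks under a Zipf
       distribution (non-uniform scheme 1). Exponent s is an assumption. */
    #include <math.h>

    void zipf_counts(int n_nodes, int m, double s, int counts[])
    {
        double norm = 0.0;
        for (int k = 0; k < m; k++)
            norm += 1.0 / pow(k + 1, s);      /* normalizing constant */

        int assigned = 0;
        for (int k = 0; k < m; k++) {
            /* disk k gets a share proportional to 1/(k+1)^s */
            counts[k] = (int)(n_nodes * (1.0 / pow(k + 1, s)) / norm);
            assigned += counts[k];
        }
        counts[0] += n_nodes - assigned;      /* rounding remainder to disk 0 */
    }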
Balanced load on disks: The basic idea of the disk-load balancing heuristic is to transfer excess weight from rows whose weights exceed the average load per disk to rows whose weights fall below the average. The disk-load balancing is a two-step process: (1) in the row with excess weight, select the node(s) whose weight(s) are to be shifted; (2) select the row to which the chosen weight is to be transferred. The details of four schemes for disk-load balancing are described in [8]. In this paper, one of those schemes, the zigzag scan scheme, is used as a representative of the schemes that assign nodes to disks while ensuring a uniform load on the disks. Starting at a corner of the mesh, a zigzag scan is made through the entire mesh. During the scan, successive nodes are assigned to a disk until the sum of the packet sizes of the assigned nodes exceeds the average; when this happens, assignment proceeds with the next disk. (A sketch of this scan is given below, after Figure 1.)

Random assignment: Random numbers are uniformly generated in the interval [0, m-1], where m is the number of disks. Successive numbers are assigned to nodes of the mesh (either randomly or in some order). A number k assigned to a node means that disk k is the disk with which the node does I/O.

Equal number of nodes per disk: In this scheme, an equal number of nodes is assigned to each disk. In a mesh with n × n nodes and m disks, each disk has ⌈n²/m⌉ nodes assigned to it. The particular nodes assigned to each disk may be chosen randomly or selected in some order.
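The two uniform schemes are straightforward to realize. The sketch below is our illustration, not code from the paper; the array disk_of[i][j], which records the disk assigned to node (i, j), and the function names are hypothetical.

    /* Sketch of the random and equal-number node-to-disk assignments. */
    #include <stdlib.h>

    void random_assign(int n, int m, int disk_of[n][n])
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                disk_of[i][j] = rand() % m;   /* uniform over [0, m-1] */
    }

    void equal_assign(int n, int m, int disk_of[n][n])
    {
        int per_disk = (n * n + m - 1) / m;   /* ceil(n^2 / m) nodes per disk */
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                disk_of[i][j] = (count++) / per_disk;  /* row-major order */
    }

For example, with n = 8 and m = 4, equal_assign gives each disk ⌈64/4⌉ = 16 nodes.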
Figure 1: Various node-to-disk assignments.
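The zigzag scan admits a compact implementation. The sketch below is ours, under the assumption that each node carries a request of packet size pkt[i][j]; the paper's actual data structures are described in [8]. The scan walks the mesh row by row, reversing direction on alternate rows, and advances to the next disk once the accumulated packet size exceeds the per-disk average.

    /* Sketch of the zigzag-scan disk-load balancing assignment. */
    void zigzag_assign(int n, int m, int pkt[n][n], int disk_of[n][n])
    {
        long total = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                total += pkt[i][j];
        double avg = (double)total / m;       /* average load per disk */

        int disk = 0;
        double load = 0.0;
        for (int i = 0; i < n; i++) {
            for (int c = 0; c < n; c++) {
                int j = (i % 2 == 0) ? c : n - 1 - c;  /* reverse odd rows */
                disk_of[i][j] = disk;
                load += pkt[i][j];
                if (load > avg && disk < m - 1) {      /* average exceeded: */
                    disk++;                            /* move to next disk */
                    load = 0.0;
                }
            }
        }
    }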
3 Simulation Results

I/O transfers are performed using X-Y routing under the wormhole routing scheme. Mesh sizes of 8×8, 16×16, and 32×32, with varying numbers of disks and the various node-to-disk assignment schemes shown in Figure 1, were considered. The simulations were implemented in C and run on a Sparcstation. The overall I/O completion time, referred to as the schedule time, is used as the performance metric. This is a better metric than alternatives such as link congestion, average packet delay between nodes, or average packet delay between source and destination, since it accurately reflects when the requested data becomes available to the application (user). The results are shown in Figure 2 and represent averages over 100 runs for each case. (Since there was no remarkable deviation in the average schedule times between runs of 100, 250, and 500 for a few randomly chosen combinations of mesh size, number of disks, and node-to-disk assignment, 100 runs were used for all cases.) Requests for data are generated at the nodes using a uniform random distribution.
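For reference, dimension-ordered X-Y routing fixes each worm's path deterministically: the message first corrects its X coordinate, then its Y coordinate. The following minimal sketch (our illustration, not the simulator's code) enumerates the links such a worm occupies between source (sx, sy) and destination (dx, dy).

    /* Sketch of the X-Y route from (sx,sy) to (dx,dy) in a 2-D mesh. */
    #include <stdio.h>

    void xy_route(int sx, int sy, int dx, int dy)
    {
        int x = sx, y = sy;
        while (x != dx) {                 /* travel along X first */
            x += (dx > x) ? 1 : -1;
            printf("hop to (%d,%d)\n", x, y);
        }
        while (y != dy) {                 /* then travel along Y */
            y += (dy > y) ? 1 : -1;
            printf("hop to (%d,%d)\n", x, y);
        }
    }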
[Figure 2 consists of three plots, one each for the 8×8, 16×16, and 32×32 meshes under X-Y routing, showing schedule time versus number of disks for the equal-number, uniform-load, non-uniform (schemes 1-3), and random node-to-disk assignments.]
Figure 2: Performance of various node-to-disk assignments.

It is observed that for the 8×8 mesh, the schemes with an equal number of nodes per disk and with a uniform load on the disks provide the best performance, followed by the random assignment scheme. For the 16×16 mesh, the scheme with an equal number of nodes per disk performs best, and the random assignment matches its performance as the number of disks grows. For the 32×32 mesh, the scheme with uniform load on the disks provides the best performance. The non-uniform distribution (scheme 1) has the worst performance in all cases. In general, skewness in the distribution of nodes to disks concentrates the data transfer paths on a few links, leading to uneven link congestion and, in turn, higher schedule times.
4 Conclusions

High-performance parallel computers with 2-D mesh topologies are beginning to be used in media-on-demand (MOD) applications. The nodes of the mesh act as servers, retrieving the requested data from the disks and transferring it to the clients. The mapping of mesh nodes to disks changes as new requests arrive and the service of old requests finishes. This paper studied, using simulation, the I/O performance of X-Y routing in 2-D meshes under wormhole routing, for different mesh sizes, different numbers of disks, and various mappings of nodes to disks. The performance metric was the overall disk-to-node I/O completion time. In general, the assignments with an equal number of nodes per disk and with a uniform load on the disks perform better than the skewed node-to-disk assignments; for larger meshes, the assignment with uniform load on the disks performs best. The results could be used to develop a strategy for maintaining node-to-disk allocations that provide the best I/O performance.
References

[1] Jadav, D. and Choudhary, A., "Designing and Implementing High-Performance Media-on-Demand Servers", IEEE Parallel and Distributed Technology, Summer 1995.
[2] Jadav, D., et al., "Design and Evaluation of Data Access Strategies in a High Performance Multimedia-on-Demand Server", International Conference on Multimedia Computing and Systems, May 1995, pp. 286-291.
[3] Rangan, P.V., Vin, H., and Ramanathan, S., "Designing an On-Demand Multimedia Service", IEEE Communications, Vol. 30, No. 7, July 1992.
[4] Gemmell, D.J., et al., "Multimedia Storage Servers: A Tutorial", IEEE Computer, May 1995, pp. 40-51.
[5] Dally, W.J. and Seitz, C.L., "The Torus Routing Chip", Journal of Distributed Computing, Vol. 1, No. 3, 1986, pp. 187-196.
[6] Ni, L.M. and McKinley, P.K., "A Survey of Wormhole Routing Techniques in Direct Networks", IEEE Computer, Vol. 26, No. 2, February 1993, pp. 62-76.
[7] Narahari, B., et al., "Routing and Scheduling I/O Transfers on Wormhole-Routed Mesh Networks", International Conference on High Performance Computing, New Delhi, December 1995.
[8] Subramanya, S.R., et al., "Schemes for Balancing Disk Loads in a 2-D Mesh", unpublished manuscript.