Performance Evaluation of PVM on PC-LAN Distributed Computing∗

Guo Qingping 1, Yakup Paker 2, Dennis Parkinson 2, Xiao JinSheng 1

1 Dept. of Computer Science and Engineering, Wuhan Transportation University, Wuhan, P. R. China 430063
2 Dept. of Computer Science, Queen Mary & Westfield College, University of London, E1 4NS
E-mail: [email protected], [email protected]

Abstract. This paper evaluates the performance of network distributed computing in the PVM environment. It points out that the essential difference between network computing and MPP computing under PVM lies in their communication behaviour: the former is sequential, the latter concurrent. A uniform formula for system performance evaluation is derived that covers both of these major classes of parallel processing system. Using this formula, the paper shows that the speedup of distributed network computing obeys Amdahl's Law, while for MPP Gustafson's modification is verified. Furthermore, criteria for speedup, efficiency and granularity in PVM network computing are deduced and confirmed by measured results.

Keywords. PVM, Performance Analysis, Network Computing

1. Introduction

In the mid-1980s Professor C.A.R. Hoare proposed the Communicating Sequential Processes (CSP) concept [1] to handle concurrency and parallel processing. Based on this concept, the OCCAM language was designed in Britain, and the Transputer chip [2] was designed and manufactured for building distributed-memory multi-transputer systems. The kernel idea of CSP is the use of message passing for parallel processing. In parallel processing research, several paradigms have been tried in recent decades, including shared memory, parallelising compilers and message passing. The message passing model has become the paradigm of choice because of the number and variety of multiprocessing systems that support it, as well as the software systems, languages and applications that use it. Parallel processing today has two major lines of development: massively parallel processor (MPP) systems and the widespread use of network computing. A feature common to distributed network computing and MPP, however, is the concept of message passing. In the message passing paradigm, two aspects and their relationship play important roles: computation and communication [3]. This paper analyses the relations

∗ Research supported by the UK Royal Society joint project (Royal Society Q724) and the Natural Science Foundation of China (NSFC Grant No. 69773021).

between computation and communication in the PVM environment on network computing. Related concepts such as speedup, efficiency and granularity, and their behaviour in PVM network computing, are also addressed. In Section 2 the features of PVM network computing are analysed and described by a formula. Section 3 compares analytical results with measured results for a series of problem sizes, each double the previous one. Section 4 discusses speedup, efficiency and granularity; the difference between network computing and multiprocessor computing is explored using the uniform performance formula. Section 5 summarises the behaviour of PVM network computing and points out the pros and cons of PVM in network computing. Finally, Section 6 gives conclusions.

2. Features of PVM on Network Computing

PVM (Parallel Virtual Machine) is an integrated set of software tools and libraries that emulates a general-purpose, flexible, heterogeneous concurrent computing framework on interconnected computers of varied architecture [4]. Let us concentrate the discussion on the popular Ethernet LAN situation. In this environment, communication between any pair of networked computers can take place if and only if no other station is using the Ethernet. Communication in the LAN does not follow a point-to-point strategy, as in a distributed-memory multiprocessor system (e.g. a multi-Transputer system), but a bus contention-based technology. In effect, all the concurrent communications between the computers have a sequential nature: they take place not simultaneously but one after another. Supposing for simplicity that the LAN is homogeneous, the execution time of an application can be represented as

T = A/N + C2·N + C3 + ε·A/N        (1)

where T is the execution time, A is the total calculation amount, and N is the number of PCs involved, so that A/N represents the concurrent amount executed by each PC. The term C2·N describes the sequential nature of communication, C3 is a constant initial latency, and ε·A/N represents a variable latency related to the concurrent amount. Letting C1 = (1 + ε)·A, formula (1) can be rewritten as

T = C1/N + C2·N + C3        (2)
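As an illustration of how formula (2) behaves, the following sketch evaluates the model over a range of N. The coefficient values here are purely hypothetical; in practice they are obtained by fitting measured execution times, as done in Section 3.

```python
# Illustrative coefficients only; real values must be fitted from
# measured execution times as described in Section 3.
C1, C2, C3 = 400.0, 1.0, 2.0   # computation amount, comm. slope, fixed latency

def exec_time(n, c1=C1, c2=C2, c3=C3):
    """Execution-time model of equation (2): T = C1/N + C2*N + C3."""
    return c1 / n + c2 * n + c3

for n in (1, 2, 4, 8, 16, 32):
    print(n, exec_time(n))
```

With these numbers the C2·N term eventually dominates, so the time falls at first and then rises again, which is the qualitative shape of the curves in Figures 1 and 2.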

3. Network Computing Performance Measurement

3.1. Network Structure and Measuring Method

The network we used is a local area network of 30 PCs connected by Ethernet in a large laboratory. Each PC has a Pentium 166 processor and 64 Mbytes of EDO RAM. The PVM version used is PVM 3.3. To reduce interference from other users, we remotely logged in to one of the 30 PCs in the middle of the night, then took that PC as the host (master), which spawns a number of slave processes running on the corresponding slave PCs in the same LAN. Obviously the best way to measure is to isolate the LAN from the outside network, severing any random traffic impact from other networks.

3.2 Algorithm Characteristics

As far as algorithm design is concerned, there are several paradigms, such as the master-slave model (e.g. the processor farm), the tree model, data decomposition and function decomposition [4]. For the performance measurement we chose our recently developed PVM version of a Modified Tridiagonal Matrix algorithm, which uses an implicit method to solve cyclical temperatures in ceramic/metal composites [5]. The algorithm adopts a master-slave paradigm with a data decomposition methodology. It divides the transient time into a number of time steps and the spatial extent of a cylinder into a number of space segments. Because of the implicit method's nature, we can choose a large time interval and fix the number of time steps while varying the number of space segments from the largest to the smallest. The series of space segment counts chosen for the measurement runs from 90720, the largest, down to 5670, the smallest, each half the size of the previous one. Running the program on the LAN for each number of space segments, from one slave PC up to at most 24 PCs, and measuring the execution time, we obtained a set of measured results, plotted in Figure 1. Using standard mathematical techniques, e.g. the least squares method, we determined the coefficients C1, C2 and C3 in equation (2) and plotted a set of theoretical execution time curves, shown in Figure 2. Figure 3 plots the measured and theoretical results together; it explicitly shows that formula (2) captures the behaviour of the PVM environment in network computing.
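The least-squares step can be sketched as follows. Since equation (2) is linear in C1, C2 and C3 with basis functions 1/N, N and 1, ordinary least squares applies directly. The (PCs, time) data below are synthetic, generated from assumed coefficients, because the raw measurements are not reproduced in the text.

```python
# Synthetic stand-in for the measured (PCs, time) pairs of Figure 1.
true_c1, true_c2, true_c3 = 400.0, 1.0, 2.0
ns = [float(n) for n in range(1, 25)]                  # 1..24 slave PCs
ts = [true_c1 / n + true_c2 * n + true_c3 for n in ns]

# Equation (2) is linear in (C1, C2, C3) with basis [1/N, N, 1], so the
# normal equations (A^T A) x = A^T t give the least-squares fit.
rows = [[1.0 / n, n, 1.0] for n in ns]
ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
atb = [sum(r[i] * t for r, t in zip(rows, ts)) for i in range(3)]

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x

c1, c2, c3 = solve3(ata, atb)
print(c1, c2, c3)   # recovers the assumed coefficients on noise-free data
```

On noise-free synthetic data the fit recovers the assumed coefficients exactly; on real measurements it returns the best-fitting ones.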

[Figure 1: Measured Execution Time (time in seconds vs. number of PCs)]

[Figure 2: Theoretical Execution Time (time in seconds vs. number of PCs)]

[Figure 3: Comparison of Measured and Theoretical Execution Time (time in seconds vs. number of PCs)]

4. Speedup, Efficiency and Granularity

4.1. Turning Point of the Execution Time and Speedup Curves

Comparing the program execution time on several slave PCs with that on one slave PC gives the speedup. It is clear from Figures 1 and 2 that the execution time in every case has a turning point. Treating N as continuously variable in equation (2) and applying the standard extremum determination method, we can easily determine the turning point of the execution time, and likewise of the speedup. In fact they coincide:

Nmax = √(C1/C2)        (3)
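A quick numerical check of the turning point (3), with made-up coefficients: the brute-force minimum of T over integer N should land exactly at the analytic value.

```python
import math

C1, C2, C3 = 400.0, 1.0, 2.0   # illustrative coefficients, not measured ones

def exec_time(n):
    return C1 / n + C2 * n + C3   # equation (2)

n_max = math.sqrt(C1 / C2)        # equation (3): the turning point
print(n_max)

# Brute-force check: the minimum of T over integer N sits at n_max.
best = min(range(1, 101), key=exec_time)
print(best, exec_time(best))
```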

In formula (3), C1 mainly represents the computation amount and C2 represents the sequential communication effect.

4.2 Difference between Network Computing and Multiprocessor Computing

Most advanced multiprocessor systems employ a communication engine for message passing, so the sequential character of communication found in network computing disappears, replaced by a separate concurrent communication mechanism. In this situation formula (2) reduces to

T = C1/N + C3        (4)

which has no extreme point. So the speedup can be written as

S = T1/TN = (C1 + C3)/(C1/N + C3)

where C1/N represents the parallel portion of the algorithm and C3 represents the non-parallel portion. Setting the whole execution time to 1 for algebraic simplicity, we can write

C1/N + C3 = 1        (5)

Then the speedup S can be expressed as

S = C1 + C3 = N·(1 − C3) + C3 = N − (N − 1)·C3        (6)
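The linear growth can be sketched directly. Here C3 plays the role of the serial fraction in Gustafson's scaled-speedup formulation; the value 0.25 is arbitrary.

```python
def scaled_speedup(n, c3):
    """Gustafson-style scaled speedup with the total runtime
    normalised to 1 as in equation (5): S = N - (N - 1)*c3."""
    return n - (n - 1) * c3

for n in (1, 8, 64, 512):
    print(n, scaled_speedup(n, 0.25))
```

For any fixed serial fraction the speedup grows linearly with the number of processors, with slope 1 − c3.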

Formula (6) means the speedup is proportional to the number of processors involved. This is exactly the result of Gustafson's modification of Amdahl's Law for the speedup of MPP machines [6]. From this point of view we can say that Gustafson's modification of Amdahl's Law for speedup is suitable only for the non-sequential communication situation; in that case, as the problem scale increases, the speedup is linear in the number of processors. In the network computing environment, however, the precondition of Gustafson's modification no longer holds, and the speedup has a turning point, as shown in Figure 4, which can be determined by formula (3).

[Figure 4: Speedup with Turning Point (speedup vs. number of PCs)]

4.3. Efficiency: a Magic Number of 0.50

A definition of processor efficiency is

E = S/N        (7)

where S represents the speedup. According to formula (2), E can be written as

E = (C1 + C2 + C3)/(C1 + C2·N² + C3·N)        (8)

Replacing N by its value at the turning point, given by equation (3), equation (8) becomes

E_turning = (C1 + C2 + C3)/(2·C1 + C3·√(C1/C2))        (9)

It is common knowledge that C1 >> C2, C3. Moreover, because C1 >> C3²/C2, it follows that C1 >> C3·√(C1/C2). Therefore from equation (9) we get

E_turning ≈ 0.50
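This limit is easy to check numerically. The sketch below uses hypothetical coefficients with C1 dominating, as the approximation requires, and evaluates equation (8) at the turning point of equation (3).

```python
import math

# Illustrative coefficients with C1 dominating, as the text assumes.
C1, C2, C3 = 10000.0, 1.0, 2.0

def efficiency(n):
    # E = S/N with S = T(1)/T(N), which expands to equation (8).
    t1 = C1 + C2 + C3
    tn = C1 / n + C2 * n + C3
    return t1 / (n * tn)

n_max = math.sqrt(C1 / C2)          # turning point, equation (3)
print(n_max, round(efficiency(n_max), 4))   # efficiency close to 0.50
```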

Figure 5 plots a set of efficiency curves. Comparing it with Figure 4 shows explicitly that the speedup turning points correspond to an efficiency of 0.50.

[Figure 5: Efficiency (efficiency vs. number of PCs)]

This leads to an important conclusion: once the efficiency drops to 0.50, involving more computers in network computing makes no sense, because beyond this point adding computers only reduces performance further, with no additional gain.

4.4. Maximum Speedup

The maximum speedup is achieved at the turning point. Therefore

Smax = (C1 + C2 + C3)/(C1/Nmax + C2·Nmax + C3) = (C1 + C2 + C3)/(2·√(C1·C2) + C3)

Considering that C1 >> C2 and C3, Smax can be written as

Smax ≈ (1/2)·√(C1/C2) = Nmax/2        (10)

4.5. Granularity

It is clear that for PVM environments based on networks, large granularity generally leads to better performance. The lower limit of granularity is

Gmin ≈ C1/Nmax = √(C1·C2)        (11)

Therefore a poor-quality network (larger C2) needs bigger granularity, and a bigger computation task (larger C1) yields higher granularity. As mentioned before, C1 mainly represents the calculation amount of the application, while C2 represents the communication character of the network, which is independent of applications. From the formulas for Nmax and Gmin, equations (3) and (11), the following relations are easily derived:

C1 = Gmin·Nmax        (12)

C2 = Gmin/Nmax        (13)
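Since C2 characterises the network independently of the application, equations (12) and (13) support a calibrate-then-predict use of the model: measure one application to recover C2, then forecast Nmax and Gmin for another. A sketch with purely hypothetical figures:

```python
import math

# Suppose one earlier application on this LAN gave (hypothetical figures)
# an observed optimum of n_max = 20 PCs at granularity g_min = 400 units.
n_max_obs, g_min_obs = 20.0, 400.0

# Equations (12) and (13) recover the model coefficients:
C1 = g_min_obs * n_max_obs        # application computation amount
C2 = g_min_obs / n_max_obs        # network communication coefficient
print(C1, C2)

# C2 is a property of the network, so for a new application with a
# predicted computation amount C1_new we can forecast its behaviour:
C1_new = 32000.0                     # hypothetical prediction
n_max_new = math.sqrt(C1_new / C2)   # equation (3)
g_min_new = math.sqrt(C1_new * C2)   # equation (11)
print(n_max_new, g_min_new)
```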

Thus, from equations (12) and (13), the network communication parameter C2 can be determined from experience with one application. If the computation amount C1 of another application can be predicted, then its Gmin and Nmax, and hence its behaviour, can be predetermined.

5. Pros and Cons of PVM

PVM is a software system that permits a heterogeneous collection of networked computers to be viewed by a user's program as a single virtual parallel computer. When PVM is used in a network environment, the sequential character of communication limits system speedup and efficiency; such a system is suitable only for large-granularity computing. However, PVM is easy to use and ports well from network computing to multiprocessor computing. It is also cost efficient: a local network can serve two purposes, one as a normal network for information sharing and exchange, the other for parallel computing, which is especially useful for small or medium enterprises and institutions/universities. Its portability makes it a useful R&D and education tool in parallel computing. Nevertheless, if PVM is used in a network environment for industrial real-time applications, it must be borne in mind that the network should be dedicated to the application and isolated from all other network communication; otherwise random traffic from outside networks can cause a real disaster.

6. Conclusions

From the above analysis we reach the following conclusions:

(1) The nature of a network-based PVM environment is parallel calculation plus sequential communication, which differs from modern multiprocessor systems (e.g. MPP) with a separate communication engine.

(2) The speedup of network computing obeys Amdahl's Law, while that of MPP is described by Gustafson's modification. The difference between them derives from their different communication features.

(3) In a network computing PVM environment, the maximum speedup Smax equals half the optimum number of computers Nmax. The maximum number of computers (Nmax) and the minimum granularity (Gmin) for an application are simply determined by formulas (3) and (11).

(4) The application parameter C1 and the network communication parameter C2 have very simple relations with Nmax and Gmin, described by formulas (12) and (13).

All these results have been verified by measurement.

References

1. C.A.R. Hoare, Communicating Sequential Processes, Prentice Hall International Series in Computer Science, 1985. ISBN 0-13-153271-5 (0-13-153289-8 PBK).
2. David May and Mark Homewood, Compiling Occam into Silicon, Proceedings of the 20th Annual Conference on Microprogramming, IEEE, 1987.
3. Guo Qingping and Yakup Paker, 1992.
4. Al Geist et al., PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing, The MIT Press, 1994.
5. Guo Qingping, Dennis Parkinson, Xiao Jinsheng, Yakup Paker, Parallel Computing Using Domain Decomposition for Cyclical Temperatures in Ceramic/Metal Composites, to be published, 1998.
6. John L. Gustafson, Reevaluating Amdahl's Law, chapter for the book Supercomputers and Artificial Intelligence, edited by Kai Hwang, 1988.