PAVIS: A PARALLEL VIRTUAL ENVIRONMENT FOR SOLVING LARGE MATHEMATICAL PROBLEMS

DANA PETCU AND DANA GHEORGHIU
Western University of Timişoara, Computer Science Department, B-dul V. Pârvan 4, 1900 Timişoara, Romania, E-mail: [email protected]

The efficient solution of large problems is an ongoing thread of research in scientific computing. An increasingly popular method of solving these types of problems is to harness disparate computational resources and to use their aggregate power as if it were contained in a single machine. We describe a prototype system that allows data and commands to be exchanged between different mathematical software kernels running on multiple processors of a cluster.

1 Introduction

Many scientific and engineering problems are characterized by a considerable run-time expenditure on one side and a medium-grained logical problem structure (e.g. complex simulations) on the other side. Often such problems cannot be solved in a single SCE (scientific computing environment, referring here to a computer algebra system - CAS - or a specialized problem solving environment - PSE), but they can be parallelized effectively even on networked computers. For such problems we do not necessarily need a parallel SCE. Instead we need methods to integrate the SCE and other autonomous tools into a parallel virtual system (e.g. couplings with external programs). On the other hand, the numerical computing facilities of a CAS can be improved by coupling it with a specialized PSE. Several parallel virtual or distributed systems have been constructed on top of three frequently used SCEs, Maple, Matlab and Mathematica (see Section 2). We propose a prototype system, shortly described in Section 3, namely PaViS (PArallel VIrtual mathematical Solver), which interconnects kernels of the above SCEs running on different processors. Section 4 benchmarks the system on several computationally intensive problems.

2 Related work

Various mechanisms have been developed to perform computations across diverse platforms. The most common mechanism involves software libraries (Table 1).


Table 1. Examples of SCEs with parallel and distributed facilities

SCE          Multiprocessor version   Built upon   Type         Parallel   Cluster
Maple        Distributed [13]         Java         Extended     No         Yes
Maple        FoxBox [3]               MPI          Extended     Yes        Yes
Maple        For Paragon [1]          Kernel       New          Yes        No
Maple        ||Maple|| [14]           Strand       Extended     Yes        No
Maple        Sugarbush [2]            Linda        Extended     Yes        No
Matlab       AlphaBridge [6]          Mex          New          Yes        No
Matlab       ConLab [5]               PICL         Extended     Yes        Yes
Matlab       DPToolbox [7]            PVM          Extended     Yes        Yes
Matlab       Falcon [12]              F90          Translation  Yes        Yes
Matlab       MultiMatlab [15]         MPI          Extended     Yes        Yes
Mathematica  Distributed [13]         Java         Extended     No         Yes
Mathematica  Parallel Toolkit [16]    RSH          Extended     Yes        Yes

Unfortunately, some of these libraries are highly optimized only for certain platforms and do not provide a convenient interface to other computer systems. Other libraries demand considerable programming effort from the user. While several tools have been developed to alleviate these difficulties, such tools are themselves usually available on only a limited number of computer systems and are rarely freely distributed. Maple, Matlab and Mathematica are examples of such tools. Moreover, CAS users are often frustrated by the time needed to rerun a program for different conditions, parameters or initial guesses. Such problems might be solved by a system that makes it convenient to spawn CAS processes on multiple processors of a parallel computer or a cluster. In many cases the communication needs between the processors are rather small compared with the necessary computations.
Some of the available parallel SCE implementations involve completely new codes rather than the reuse of existing systems (with good reason if the aim is high performance). Such a rebuild is impossible without access to the SCE source code. Another disadvantage is that the existing SCEs are at present so widely used, and so extensive in their capabilities, that it is unrealistic and inefficient to try to duplicate them. Another option is to build upon existing sequential SCEs and to produce some extensions.
Several attempts have been made to combine Maple with parallel or distributed computation features. ||Maple|| [14] is a portable system for parallel symbolic computations built as an interface between the parallel programming language Strand and the sequential CAS Maple. Sugarbush [2] combines the parallelism of C/Linda with Maple. FoxBox [3] provides an MPI-compliant distribution mechanism allowing parallel and distributed execution of FoxBox programs; it has a client/server style interface to Maple.


Distributed Maple [13] is a portable system for writing parallel programs in Maple, which allows the creation of concurrent tasks that are executed by Maple kernels running on different machines of a network. The system consists of two components: a Java class library, which implements a general purpose communication and scheduling mechanism for distributed applications, and a binding that allows the Java scheduler to be accessed from Maple.
The Parallel Computing Toolkit [16] introduces parallel computing support for Mathematica; this commercial tool can take advantage of existing Mathematica kernels on a multiprocessor or a cluster. Distributed Mathematica [13] is a public domain system allowing the creation of concurrent tasks that are executed by Mathematica kernels running on different machines of a cluster.
The aim of MultiMatlab [15] is to provide a tool that facilitates the use of a parallel computer or a cluster to solve coarse-grained large-scale problems using Matlab. The system runs on top of MPI. The user operating within one Matlab session can start Matlab processes on other machines and then pass commands and data between these various processes. Several other projects take the route of translating Matlab into a compilable language, either a parallel language or a standard language with the addition of message passing constructs. In Falcon [12], for example, the Matlab code is translated into F90 and the parallelization is left to the parallel computer; in ConLab [5] the code is translated into C. DPToolbox [7] runs on top of PVM and has three components: a Matlab/PVM interface that provides the communication primitives of the PVM system in Matlab, an interface for developing distributed and parallel Matlab applications, and an interface for managing parallel Matlab machines.

3 PaViS overview

PaViS intends to provide a uniform, portable and efficient way to access the computational resources of a cluster or even of a parallel computer. It is built upon PVM, thus enabling users to create low-cost virtual parallel computers for solving mathematical problems. PaViS implements a master-slave paradigm. All message-passing details are hidden as far as possible from the user. The master, controlled by the user, contacts PaViS and sends computation requests (input parameters). PaViS runs the appropriate slaves and returns the computation results (outputs or error status) to the master. High-level commands are provided in SCE source form, so they can serve as templates for building additional parallel programs. The system comprises a set of connected processing elements. Wrapping the SCEs is done by an external system which takes care of the concurrent execution of tasks: PVM daemons (PVMDs) ensure the inter-processor communication. Each node connected to a PaViS session comprises three main components: the command messenger, the SCE interface and the PVMD daemon.


Figure 1. Parallel virtual environment created to solve computationally intensive problems using the message-passing strategy (CM denotes the command messenger, a processing element denotes a processor and its local memory, a dashed line indicates an optional system component, and an SCE can be Maple, Matlab or Mathematica).

Table 2. SCE interface with PaViS: function package pvm

Function          Meaning
spawn             create local or remote SCE processes
send              send commands to SCE processes
receive           receive results from SCE processes
settime           start a chronometer
exit              kill the slave kernels and command-messengers
time              post-processing time diagram in SCE graphic format
ProcID            process identifier on one machine
MachID, TaskID    station/process identifier within the virtual machine
Tasks             list the process identifiers

The command messenger is a daemon process that awaits and interprets master requests and slave responses (Figure 1). It assists the message exchanges between the SCE processes, coordinates the interaction between SCE kernels via the PVM daemons (it receives SCE commands from other SCE processes, local or remote, and sends its results to other processes), and schedules tasks among the processing elements (it activates or destroys other SCE processes). A special file in SCE format, a function package, implements the interface between the SCE kernel and the command messenger. Table 2 describes the currently available functions.
PaViS is built on top of three existing binding programs: PVMaple, PVMatlab and PVMathematica. Each one connects kernels of the same type. The recently described PVMaple [9] prototype can be freely downloaded from www.info.uvt.ro/~petcu/pvmaple (binaries for Win32 and Unix platforms); PVMatlab and PVMathematica will also be available soon. Parallel Virtual Maple is a prototype system built to study the interconnection of PVM and Maple.
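To make the interplay of the functions in Table 2 concrete, the following minimal Maple sketch combines them in the order a typical session would use them; the host name host1 and the command string are only illustrative and do not come from the experiments reported below.

# Minimal PVMaple-style session (sketch; host name host1 is hypothetical)
> read `pvm.m`:                            # load the pvm function package
> pvm[settime]();                          # start the chronometer
> pvm[spawn]([`host1`,2]);                 # two Maple slave kernels on host1
> t:=pvm[send](`all`,`ifactor(2^64+1);`):  # send a command string to all slaves
> r:=pvm[receive](t,`all`);                # collect the tagged results
> pvm[exit](): pvm[time](`all`);           # kill the slaves, show the time diagram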


Its design principles are very similar to those of Distributed Maple. The interface between Maple and the command messenger is described in a Maple function package, pvm.m. The user interacts with the system via the text-oriented Maple front-end. Initialization of a PVMaple session, activation of local or remote Maple processes and message-passing facilities are provided by the function package. The first six functions from Table 2 have equivalents in Distributed Maple. PVMaple does not provide shared objects or on-line visualization of active processes, as Distributed Maple does, but it allows more than one Maple command to be sent at once to remote processes (in a string). Tests performed on Unix networks where both applications are available have not revealed significant differences in application execution times [8]. The similar tool, PVMathematica, is closer to Distributed Mathematica [13] than to the Parallel Computing Toolkit [16]. The PVMatlab project is similar to DPToolbox. The same idea was used in both tools: to give the user access to PVM functions supporting process management and message exchanges from a frequently used SCE. DPToolbox is more complex than PVMatlab; the post-processing visualization of a session and the PaViS extension are the main arguments in favour of our tool.

4 Performance results

We show by experiments that the response time of the SCE solvers can be improved by using cooperation between different kernels of the same type or by using the solving capabilities of other SCEs. We present here the test results obtained using a cluster of 4 dual-processor SGI Octanes linked by three 10-Mbit Ethernet sub-networks, Maple V Release 5 and Matlab 6.
Table 3 shows an example of using two components of PaViS (PVMaple and PVMatlab) in order to improve the response time of the basic SCEs. In this case the sequential time necessary to factorize a list of integers increases linearly with the problem dimension (list length). The sequential execution times reported by both basic SCEs are roughly the same (around 140 s for 3000 integers). By splitting the list of integers and the factorization requests among several SCE kernels running on remote processors, we can obtain a shorter response time. Indeed, for the given case, a speed-up factor of 2.3 was registered using 3 Maple kernels (2.2 for Matlab), one on the same processor as the user interface and two on remote processors. We expect to obtain higher speed-up values by increasing the problem dimension, since the communication time increases more slowly with the number of integers than the computation time.
To allow collaboration between different SCE kernels, the syntax of the spawning, sending and receiving commands has been slightly modified.


Table 3. Code for distributed integer factorization within PVMaple/PVMatlab sessions: 3000 integers are randomly generated (intfac function), distributed in a round-robin fashion to 3 SCE kernels, and factorized (nfactor procedure), the final list of factors being presented in the user's SCE interface.

# file intfac.txt
intfac:=proc(dim,big,p) local n,r,i;
  r:=rand(big); n:=[];
  for i to dim do n:=[op(n),r()]; od;
  nfactor(n,p);
end:
nfactor:=proc(n,p) local tag,mes;
  mes:=cat(`readlib(ifactors): n:=`, convert(n,string),
    `: s:=[]: for i to nops(n) do `,
    `if (i mod `, convert(p,string), `)=pvm[TaskId]-1 `,
    `then s:=[op(s),ifactors(n[i])]: fi: od: s;`);
  tag:=pvm[send](`all`,mes);
  pvm[receive](tag,`all`)
end:

% file intfac.m
function f=intfac(dim,big,p)
n=ceil(rand(1,dim)*big);
f=nfactor(n,p);

% file nfactor.m
function r=nfactor(n,p)
mes=['s=[]; n=[',num2str(n),'];',...
  'for i=1:length(n), if mod(i,',...
  num2str(p),')==pvm[TaskId]-1,',...
  's=[s,factor(n(i))]; end; end; s'];
tag=pvm('send','all',mes);
r=pvm('receive',tag,'all');

# Interactive session: Maple commands
> read `pvm.m`: read `intfac.txt`;
> pvm[settime]();
> pvm[spawn]([`sgi1`,2],[`sgi3`,1]);
> intfac(3000,2^32,nops(pvm[tasks]())-1);
> pvm[exit]();
> pvm[time](`all`);

% Interactive session: Matlab commands
> pvm('init');
> pvm('settime')
> pvm('spawn','sgi1',2,'sgi3',1)
> intfac(3000,2^32,3)
> pvm('exit')
> pvm('time')

The modification consists of a third parameter, the SCE type ('maple', 'matlab' or 'mathe'). The user must send strings of commands according to the destination SCE type.
The second example concerns the graphics facilities of Maple. We consider the problem of plotting a Julia fractal: the complex numbers z_0 for which z_n → 0, where z_n := z_{n-1}^2 + c, n ≥ 1, c ∈ C (Figure 2). Given a rectangle in the complex plane and a large integer N, an approximate Julia set can be plotted using a regular grid inside the rectangle and drawing those grid points z_0 for which |z_N| < 1. Our tests have shown that Maple is three times slower than Matlab in the computation of the z_N values (in the faster variant, with a loop instead of a function composition), and Matlab is twenty times slower than a similar C program. Two improvements are possible: to use more than one Maple kernel to construct parts of the plot, or to let a Matlab kernel compute the z values (a special function is needed to convert the results sent by Matlab into a Maple plot structure). In the case depicted in the left part of Figure 2, we obtained a speed-up of 1.8 using 2 Maple kernels (with equal loads; code suggested by the right part of Figure 2), 3.1 using 4 Maple kernels (on 4 regular subdomains; small load imbalance), 1.9 using the couple [master:Maple, slave:Matlab], 1.3 using 2 Matlab kernels, and 1.1 (inefficient) using the couple [master:Matlab, slave:Maple].


# Maple sequential code
> f:=(x,y)->(x^2-y^2+0.32, 2*x*y+0.043):
> g:=(x,y)->x^2+y^2:
> h:=proc(x,y) if g((f@@130)(x,y))<1 then 0.75 else 0 fi end:
> plot3d(`h(x,y)`, x=-1..1, y=-1.15..1.15, grid=[400,400],
>   view=[-1..1,-1.15..1.15,0..0.75], orientation=[90,0]);

# Maple distributed code
> read `pvm.m`; f:=... g:=... h:=...; pvm[settime]();
> pvm[spawn]([`sgi1`,1,`maple`],[`sgi2`,1,`maple`]);
> m:=pvm[send]([`sgi1`,1,`maple`],
>   `f:=... g:=... h:=... r1:=plot3d(...x=0..1...);`):
> r2:=plot3d(...x=-1..0...):
> r1:=pvm[receive]([`sgi1`,1,`maple`],m);
> plots[display3d]([r1,r2]); pvm[exit](); pvm[time](`all`);

Figure 2. Maple plot of a Julia fractal in [−1, 1] × [−1.15, 1.15] with c = 0.32 + 0.043i, a grid of 400 × 400 points and N = 300: the plotting time is O(10^3) seconds in the sequential case and can be reduced to O(10^2) by using other Maple kernels or Matlab kernels.
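For the couple [master:Maple, slave:Matlab] mentioned above, a hedged sketch of the delegation might look as follows; the host name sgi2, the grid steps and the embedded Matlab loop are illustrative only, and the routine that converts the returned 0/1 grid into a Maple plot structure is not shown.

# Maple master delegating the z_N computation to a Matlab slave (sketch)
> read `pvm.m`: pvm[settime]();
> pvm[spawn]([`sgi2`,1,`matlab`]);
> m:=pvm[send]([`sgi2`,1,`matlab`],
>   `[X,Y]=meshgrid(-1:0.005:1,-1.15:0.00575:1.15); W=X+i*Y; for k=1:300, W=W.^2+0.32+0.043i; end; double(abs(W)<1)`):
> r:=pvm[receive]([`sgi2`,1,`matlab`],m);   # 0/1 grid computed by Matlab
> pvm[exit](); pvm[time](`all`);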

A more complex test concerns the use of PaViS to solve large initial value problems arising from the semi-discretization of partial differential equations. The test problem is a mathematical model of the movement of a rectangular plate under the load of a car passing across it [4]. Applying the method of lines, a large ODE system arises. The number of ODE equations depends on the accuracy required in the PDE solution; usually such a system has hundreds of equations. Maple cannot solve such a big system symbolically, and for numerical computations it is too slow compared with Matlab or other programs written in standard programming languages. A first solution for solving the large ODE system in Maple is to use parallel Runge-Kutta methods, for which subsets of stage equations can be solved independently at each time step (a sketch of the general procedure was shortly presented in [9]). For the semi-discretized plate problem with 32 ODEs, PVMaple has reported a speed-up of 2.5 using 3 Maple kernels running on the cluster (further tests with PVMaple concerning ODEs are presented in [10]). A second solution is to use an SCE specialized in numerical computations, like Matlab, to improve the response time of the user's preferred solver. By replacing the Maple processes that solve the stage equations with Matlab processes, we obtained approximately one half of the initial response time, with a small degradation in system efficiency (further tests are presented in [11]). A third solution is to use PaViS as an interface between different SCEs: for example, the user describes the ODE problem in Maple and remotely activates an ODE numerical solver from Matlab; currently the user must translate the data, the commands and the results between the different SCEs, which is acceptable only if they are inputs or outputs of other computationally intensive procedures described in the user's preferred SCE.
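A minimal sketch of this third solution is given below; it assumes, hypothetically, that the right-hand side of the semi-discretized 32-equation system has already been written as a Matlab function plateode.m available on the slave machine, and that the integration interval and initial condition are only placeholders.

# Maple master using a Matlab slave as a numerical ODE solver (sketch)
> read `pvm.m`: pvm[spawn]([`sgi1`,1,`matlab`]);
> m:=pvm[send]([`sgi1`,1,`matlab`],
>   `y0=zeros(32,1); [t,y]=ode45(@plateode,[0 1],y0); y(end,:)`):
> r:=pvm[receive]([`sgi1`,1,`matlab`],m);   # final state vector, still in Matlab form
# the values in r must then be translated by hand into a Maple structure, as noted above
> pvm[exit]();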


5 Future improvements

PaViS is by no means in its final form. Various extensions of its functionality are under development. The current system needs improvement in the area of robustness with respect to various kinds of errors, and in its documentation. We also intend to couple the system with some PSEs and to include in PaViS an automatic translator of commands and data structures between Maple, Matlab and Mathematica. PaViS is potentially useful for education in parallel programming, for prototyping parallel algorithms, and for fast and convenient execution of easily parallelizable computations on multiple processors.

References

1. L. Bernadin, Maple on a massively parallel distributed memory machine, in PASCO '97, eds. M. Hitz et al (ACM Press, New York, 1997).
2. B. W. Char, Progress report on a system for general-purpose parallel symbolic algebraic computation, in ISSAC '90 (ACM Press, New York, 1990).
3. A. Diaz and E. Kaltofen, FoxBox: a system for manipulating symbolic objects in black box representation, in ISSAC '98, ed. O. Gloor (ACM Press, 1998).
4. E. Hairer and G. Wanner, Solving Ordinary Differential Equations II. Stiff and Differential-Algebraic Problems (Springer-Verlag, 1991).
5. P. Jacobson, B. Kagstrom and M. Rannar, Algorithm development for distributed memory computers using ConLab, Sci. Programming 1, 185-203 (1992).
6. J. Kadlec and N. Nakhaee, AlphaBridge: parallel processing with Matlab, in Proceedings of the 2nd MathWorks Conference (1995).
7. S. Pawletta, Distributed and parallel application toolbox for use with Matlab, anson.ucdavis.edu/~bsmoyers/parallel.htm (1997).
8. D. Petcu, Working with multiple Maple kernels connected by Distributed Maple or PVMaple, Preprint RISC 18-01 (Linz, 2001).
9. D. Petcu, PVMaple: a distributed approach to cooperative work of Maple processes, in LNCS 1908: PVM-MPI'00, eds. J. Dongarra et al (2000).
10. D. Petcu, A networked environment for solving large mathematical problems, accepted for publication in LNCS: Proc. of EuroPar 2001 (Springer, 2001).
11. D. Petcu, Solving large systems of differential equations with PaViS, accepted for publication in LNCS: Proc. of PPAM 2001 (Springer, 2001).
12. L. de Rose et al, Falcon: a Matlab interactive restructuring compiler, in Languages and Compilers for Parallel Computing (Springer, 1995).
13. W. Schreiner, Developing a distributed system for algebraic geometry, in Euro-CM-Par'99, ed. B. Topping (Civil-Comp Press, Edinburgh, 1999).
14. K. Siegl, Parallelizing algorithms for symbolic computation using ||Maple||, in 4th Symp. on Principles and Practice of Parallel Programming (ACM Press, San Diego, 1993).
15. A. Trefethen, MultiMatlab, www.cs.cornell.edu/Info/people/Int/multimatlab.
16. Wolfram Research, Parallel Computing Toolkit, www.wolfgang.co.jp/news/pct.

