MPI Toolbox for Octave J. Fernández1, M. Anguita1, S. Mota1, A. Cañas1, E. Ortigosa1, F.J. Rojas 1 1
Depto. Arquitectura y Tecnología de Computadores, ETSI Informática, Univ. Granada, c/Periodista Daniel Saucedo Aranda s/n, 18071-Granada, Spain Tlf: +34 958 248994 FAX: +34 958 248993 {javier, mancia, sonia, acanas, eva, frojas} @atc.ugr.es 1 http://atc.ugr.es
Abstract. Here we describe our LAM/MPI interface for the Octave programming environment, similar to our previous MPITB (MPI toolbox) for MATLAB and based on the experience gained with that work. Despite a series of attempts by other developers, no complete MPI interface was yet available for Octave, and some of the previous, partially successful attempts are no longer downloadable. An introductory section summarizes the degree of success and availability of these attempts, and justifies the development of our new Octave package in view of the continuing user demand and the current lack of support. Section 2 explains the main design criteria and implementation details of MPITB. Section 3 describes the performance measurements obtained with our Octave toolbox and compares them to the available previous packages. A final concluding section summarizes the main goals achieved with this work.
Topics: Problem Solving Environments. Cluster and Grid Computing. Parallel and Distributed Computing.
1. Introduction

Recently [9] we developed an Octave [4] version of our MPITB toolbox for MATLAB [8], which was presented at a previous VecPar meeting. This toolbox makes the MPI library calls available to Octave users from within the Octave environment, thus letting them develop their own parallel Octave applications on a Linux cluster. GNU Octave [4, 5] is a high-level language, primarily intended for numerical computations. It provides a convenient command-line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments, using a language that is mostly compatible with MATLAB. MATLAB [15] handles a range of computing tasks in engineering and science, from data acquisition and analysis to application development. The MATLAB environment integrates mathematical computing, visualization, and a powerful technical language.
This work has been partially supported by the EU project SpikeFORCE (IST-2001-35271).
The main information sources for locating previous parallel Octave prototypes have been the Octave and Octave-Forge web pages [4, 13], as well as the Octave mailing list archives and Internet search engines. We tried to download, install, use and compare the four parallel Octave projects found: PVMOCT [14], the Octave-MPI patches [11], Parallel Octave (included in Octave-Forge) [10, 13] and D-Octave (distributed Octave) [1]. Table 1 summarizes the outcome of our survey:

Table 1. Survey of parallel Octave prototypes

Package      Based on        #cmds  Date            Status
PVMOCT       PVM             46     Feb. 1999 [14]  developed for octave-2.0.13; would require major editing
Octave-MPI   LAM/MPI         7      Nov. 2000 [11]  prepatched for octave-2.1.31; would require major editing
P-Octave     LAM/MPI         18     Jun. 2001 [10]  works with octave-2.1.44; requires minor source code edit
D-Octave     LAM/MPI, MPICH  2+6    Mar. 2003 [1]   prepatched for octave-2.1.45; proof-of-concept only
MPITB        LAM/MPI         160    Apr. 2004 [9]   precompiled for octave-2.1.50-57; supported
The development of a complete interface to the main parallel programming libraries for Octave had never been accomplished [7] before MPITB. The Octave mailing list keeps record of a handful of partial successes; some of them never went public [17], or their download web page is now a broken link [12]. Users periodically ask about parallel support in Octave, and the most recent messages at the time of writing have not been answered yet [2, 3]. The proactive interest of J.W. Eaton (the main Octave developer and maintainer) in having parallel programming support for Octave [6, 7] should also be noted. The main features of the surveyed prototypes are briefly commented on below.

PVMOCT [14], developed by R.A. Lippert in February 1999, consists of a single file, pvm.cc, which can be compiled simply with the Octave utility mkoctfile instead of editing the Octave sources as suggested. It would need major rewriting to be adapted to a current Octave version; the author points out both facts in [14].

The Octave-MPI patches [12], developed by Andy Jacobson in November 2000, consist of 7 loadable files and 12 patches for the octave-2.1.31 version. They would require major rewriting to be adapted to a current Octave version, and messages like [11] suggest that the author will not update or support them anymore.

Parallel-Octave (P-Octave), developed by Hayato Fujiwara in June 2001, is a complete autoconfigurable package which builds 18 loadable MPI functions and 1 internal wrapper to run on slave processes. The web site is still available, and the last version (Dec. 2003) works with octave-2.1.44 after minor editing (moving the sendhost/recvhost methods in liboctnet/netbase.h from protected: to public:). It is the only eligible package against which to compare our MPITB.

Distributed-Octave (D-Octave), developed by J.D. Cole in March 2003, consists of 6 loadable MPI functions and 2 new reserved words (29 patches) for octave-2.1.45. The interfaced functions are mpi_init/_finalize, mpi_comm_rank/_comm_size and mpi_send/_recv. We have been able to make them work on one computer, but no instructions are given on how to build a parallel application, and there is no reply from the documented support address. The author points out that it is a proof-of-concept release only [1].

Our MPITB package is composed solely of Octave scripts (85 .m files) and loadable functions (160 .oct files). Like P-Octave, it does not rely heavily on the Octave version, since no Octave patches are required. It does require minor updates as the liboctave API and the internal representation of octave_values change; for instance, from octave-2.1.50 to 2.1.57 a change in the internal representation of structs required editing 3 loadable functions (6 lines in MPI_Pack, _Pack_size and _Unpack) and one include file (Stat.h). We hope it fulfills the Octave user community's demand for a parallel interface to the main parallel programming libraries: it interfaces the whole MPI-1.2 standard as provided by LAM/MPI, plus some of the most useful MPI-2.0 calls.
2. MPITB implementation details

The key implementation details of MPITB are:

Loadable files: For each MPI call interfaced to the Octave environment, an .oct file has been built which includes the doc-string and invokes the call pattern. The source for each loadable file includes at most 2 files: the global include (mpitb.h) and possibly a function-specific include (Send.h, Topo.h, Info.h, etc.).

Call patterns: MPI calls are classified according to their input-output signature, and a preprocessor macro is designed for each class. This call-pattern macro takes the name of the function as its argument and substitutes it into the function-call code.

Building blocks: Following the criterion of not writing things twice, each piece of self-contained code that is used at least twice (by two call patterns or building blocks) is itself promoted to a building block and assigned a macro name, to be substituted later where appropriate. This feature keeps MPITB maintenance tractable.

Include files: Building blocks and call patterns are grouped by affinity, so that only the functions that use them have to preprocess them. A building block or call pattern shared by two or more such affine groups is moved to the global include file.

Pointers to data: MPI requires pointers to the buffers from/to which the transmitted information is read/written. For arrays, the pointer and size are obtained using the methods data() and capacity(), common to all Octave array types. For scalars, C++ data encapsulation is circumvented (abused), taking the object address as the starting point from which to obtain the scalar's address. Shallow-copy mechanisms are also taken into account by invoking the method make_unique() if the Octave variable is shared.

RHS value return: MPI arguments with In/Out data signature must appear on the Octave right-hand side. One option would be to return the MPI Out data as an Octave LHS, but this is a poor performance decision, since the returned object would have to be constructed. We instead force such input arguments to be Octave symbols (named variables), so the output value can be returned in the input variable itself (see the usage sketch below).
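As an illustration of the two last points from the user's side, a minimal sketch follows. The exact argument order, return values and constants (e.g. MPI_COMM_WORLD, the destination rank and tag) are assumed here to mirror the C MPI binding; the doc-strings embedded in the loadable files give the real call patterns. The essential point is that the receive buffer is a preallocated, named Octave variable that is overwritten in place.

    MPI_Init;                                        % assumed: no arguments needed
    [info, rank] = MPI_Comm_rank (MPI_COMM_WORLD);   % assumed return signature

    buf = zeros (1, 1000);                     % preallocated NAMED buffer
    if rank == 0
      buf = rand (1, 1000);
      MPI_Send (buf, 1, 7, MPI_COMM_WORLD);    % to rank 1, tag 7
    else
      MPI_Recv (buf, 0, 7, MPI_COMM_WORLD);    % buf is overwritten in place:
    end                                        % no new LHS object is built

    MPI_Finalize;

Because buf already exists as a symbol in the caller's workspace, the receive can write directly through the pointer obtained with data(), which is precisely what avoids the object construction penalized in the LHS-return alternative.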
According to our experience [8], these two last features guarantee that MPITB performance will be superior to that of any toolbox lacking them. In fact, P-Octave loadable functions eventually call mpi_read/save_binary_data(), which are based on the earlier Octave-MPI patches and ultimately on Octave's load-save() functions (see the acknowledgement in file dld/libdld_common/libdld_common.cc), with their costly object copy/create and type/symbol/format check operations (see $OCTAVE/src/loadsave.cc and ls-oct-binary.cc), which are not required for MPI operation.
3. Performance measurements

As in our previous VecPar work [8], we base our performance measurements on the usual ping-pong test and on a toy problem frequently cited in the literature, the computation of pi by numeric integration [16]. In the ping-pong experiment below (Fig. 1) we compared the point-to-point and collective operation performance of MPITB (octave-2.1.57) and P-Octave (octave-2.1.44), using LAM-7.0.4 for both. The comparison is admittedly not fair, since P-Octave is stuck at octave-2.1.44 and the Octave API has changed (improved) significantly since then. These results simply describe what can be expected from:
- MPITB point-to-point (MPI_Send/_Recv) and collective (MPI_Reduce) operation;
- P-Octave point-to-point (mpi_send/_recv) and collective (mpi_bcast) operation.
MPITB performance is very good and fully reproducible. P-Octave shows greater overhead and less reproducible timing, due to the load-save() functions invoked and the object construction for LHS return in msg = mpi_recv(size, src, tag, comm). A minimal sketch of the timed loop is given after Fig. 1.
[Figure: PingPong test MPITB/P-Octave, 1,000,000 bytes. Left panel "Time": ping time (s) vs. message size (bytes); right panel "Bandwidth": bandwidth (B/s) vs. message size (bytes). Traces: point2point and collective operation under the rpi tcp and rpi lamd modules.]
Fig. 1. Results of the ping-pong test. Best performance traces come from MPITB tests.
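For reference, a minimal ping-pong loop of the kind timed in Fig. 1 could be written in Octave roughly as follows; rank is obtained as in the sketch of Sect. 2, and the MPITB call signatures are again assumptions rather than the exact toolbox interface.

    NLOOP = 100;                                 % repetitions at this message size
    LEN   = 1000000 / 8;                         % 1,000,000 bytes of doubles
    buf   = zeros (1, LEN);                      % preallocated named buffer
    if rank == 0
      t0 = clock ();
      for i = 1:NLOOP
        MPI_Send (buf, 1, 0, MPI_COMM_WORLD);    % to rank 1, tag 0
        MPI_Recv (buf, 1, 0, MPI_COMM_WORLD);    % echoed back in place
      end
      t  = etime (clock (), t0) / (2 * NLOOP);   % one-way ping time (s)
      bw = 1000000 / t;                          % bandwidth (B/s)
    else
      for i = 1:NLOOP
        MPI_Recv (buf, 0, 0, MPI_COMM_WORLD);
        MPI_Send (buf, 0, 0, MPI_COMM_WORLD);
      end
    end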
Our other experiment (Fig. 2 below) tries to determine how performance is affected when some computation is inserted between successive message-passing calls, which is their normal use in real applications. Both general-purpose LAM RPIs (Request Progression Interface), TCP and the LAM daemon, are tested. The additional memory-copy and load-save operations in P-Octave result in a more pronounced overhead. A sketch of the pi computation is given after Fig. 2.
[Figure: Scalability of the MPITB/P-Octave pi algorithm, 800,000 subdivisions. Left panel "Time": sequential/parallel time (s) vs. number of computers (1-8); right panel "Scalability": speedup vs. number of computers. Traces: point2point and collective operation under the rpi tcp and rpi lamd modules; cluster of one 400 MHz server and 333 MHz slaves.]
Fig. 2. Computation time and speedup for up to 8 PCs. Best performance traces come from MPITB tests.
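The pi toy problem itself is the classical midpoint-rule evaluation of the integral of 4/(1+x^2) over [0,1] [16]. A hedged Octave sketch under the same assumed MPITB signatures is shown below; rank and nprocs are assumed to come from MPI_Comm_rank and MPI_Comm_size, and the argument list of MPI_Reduce (operator MPI_SUM, root rank 0) is an assumption.

    N = 800000;                                  % subdivisions, as in Fig. 2
    h = 1 / N;
    partial = 0;
    for i = (rank + 1):nprocs:N                  % interleaved work decomposition
      x = h * (i - 0.5);                         % midpoint of subdivision i
      partial = partial + 4 / (1 + x^2);
    end
    partial = partial * h;                       % this rank's share of pi
    pisum = 0;                                   % named variable for the result
    MPI_Reduce (partial, pisum, MPI_SUM, 0, MPI_COMM_WORLD);
    if rank == 0
      printf ("pi approx. %.12f\n", pisum);
    end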
We stress, however, the ease of coding attained with P-Octave and the other toolboxes' syntax (LHS return), so the final user decision comes down to either:
- choosing the other toolboxes and benefiting from easier Octave coding, at the price of a performance loss and a fixed, obsolete Octave version; or
- choosing MPITB and benefiting from minimal overhead with current Octave versions, at the price of more complex Octave coding.
A short fragment contrasting both styles is sketched below.
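In the hypothetical fragment below, the P-Octave line follows the signature quoted in Sect. 3 and the MPITB lines follow the in-place convention of Sect. 2; len, src, tag and comm are assumed to be already defined, and the exact signatures are assumptions.

    % LHS-return style (P-Octave, as quoted in Sect. 3): convenient, but a
    % new Octave object is constructed for every received message
    msg = mpi_recv (len, src, tag, comm);

    % RHS in-place style (MPITB, Sect. 2): the named buffer is preallocated
    % once and overwritten in place, with no object construction on receive
    msg = zeros (1, len);
    MPI_Recv (msg, src, tag, comm);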
4. Conclusion

MPITB allows Octave users to build their own LAM/MPI-based parallel applications by simply installing our package and adding the required MPI calls to their Octave source code, without patching their Octave installation. The performance and degree of functionality provided by MPITB are excellent compared to those of other parallel packages. Since the three required components (Octave, LAM/MPI, MPITB) are free, and since this is a key advantage of Octave (emphatically highlighted in [7]), we hope that an ever-growing Octave user community gets started in parallel computing on clusters of PCs using MPITB. The porting of MPITB to Octave will show us whether the limiting factor in the popularization of high-level HPC is MATLAB's price or a lack of users willing to try free high-level parallel environments such as Octave on Linux PC clusters.
There is a gap of more than 4 years between the earliest discussions on the subject [14] and the availability of a full MPI interface for Octave. In [7] §2.2.3, Eaton is astonished that such work has already been done for MATLAB, at the price of one license per process, and not for Octave, which is free; he later concludes that a few simple interfaces have been written, but that the non-trivial task of writing complete ones is required if Octave is ever to have higher-level parallel support. We hope our MPITB fills that gap.
References
1. Cole, J.D.: D-Octave package, http://www.transientresearch.com/d-octave/, announced in http://www.octave.org/octave-lists/archive/octave-maintainers.2003/msg00080.html, Apr. 2003.
2. Creel, M.: "Any news on the parallel front?", http://www.octave.org/octave-lists/archive/help-octave.2003/msg00547.html, unanswered, September 2003.
3. Dushaw, B.: "Parallel octave - working fine?", http://www.octave.org/octave-lists/archive/help-octave.2003/msg00407.html, unanswered, August 2003.
4. Eaton, J.W.: Octave web pages, http://www.octave.org/, http://www.gnu.org/software/octave/octave.html
5. Eaton, J.W.: "Octave Manual", http://www.network-theory.co.uk/octave/manual/, announced in http://www.octave.org/mailing-lists/help-octave/2003/619; see also http://www.octave.org/docs.html
6. Eaton, J.W.: "Octave: Past, Present and Future", in DSC 2001, Proceedings of the 2nd International Workshop on Distributed Statistical Computing, March 2001, Vienna, Austria. http://www.ci.tuwien.ac.at/Conferences/DSC-2001/Proceedings/Eaton.pdf
7. Eaton, J.W.; Rawlings, J.B.: "Ten years of Octave - Recent developments and plans for the future", in DSC 2003, Proceedings of the 3rd International Workshop on Distributed Statistical Computing, March 2003, Vienna, Austria. http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings/EatonRawlings.pdf
8. Fernández, J.; Cañas, A.; Díaz, A.F.; González, J.; Ortega, J.; Prieto, A.: "Performance of Message-Passing MATLAB Toolboxes", in VECPAR 2002, LNCS 2565, pp. 228-241, 2003. http://link.springer.de/link/service/series/0558/papers/2565/25650228.pdf. Toolboxes available at http://atc.ugr.es/javier-bin/pvmtb_eng and http://atc.ugr.es/javier-bin/mpitb_eng
9. Fernández, J.: MPITB for Octave web page, http://atc.ugr.es/javier-bin/mpitb, announced in http://www.lam-mpi.org/MailArchives/lam/msg07840.php, April 2004.
10. Fujiwara, H.: Parallel Octave package, http://www.higuchi.ecei.tohoku.ac.jp/octave/, seen in http://www.octave.org/octave-lists/archive/octave-sources.2001/msg00105.html, June 2001.
11. Jacobson, A.: "MPI: The example of R", message in the octave-maintainers mailing list, http://www.octave.org/octave-lists/archive/octave-maintainers.2003/msg00049.html, Mar. 2003.
12. Jacobson, A.: Octave-MPI patches, http://www.octave.org/octave-lists/archive/octave-sources.2000/msg00065.html; web link in http://www.octave.org/octave-lists/archive/help-octave.2000/msg00433.html broken, http://corto.icg.to.infn.it/andy/octave-mpi/, Nov. 2000.
13. Kienzle, P. et al.: Octave-Forge repository, http://octave.sourceforge.net/, announcement seen in http://www.octave.org/octave-lists/archive/octave-sources.2001/msg00010.html, Oct. 2001.
14. Lippert, R.: PVMOCT web page, http://www.eskimo.com/~ripper/research/pvmoct.html; see http://www.octave.org/octave-lists/archive/octave-maintainers.1999/msg00084.html, Feb. 1999.
15. MATLAB web page, http://www.mathworks.com/. See also http://www.mathworks.com/products/matlab/description1.html for a detailed description.
16. Quinn, M.J.: "Parallel Computing: Theory and Practice", 2nd Edition, McGraw-Hill, New York, 1994. Cited in Dietz, H.: "Parallel Processing HOWTO", http://aggregate.org/PPLINUX
17. Verstak, A.: MPI bindings, announced in http://www.octave.org/octave-lists/archive/help-octave.2001/msg00433.html