Automatic Binding of Native Scientific Libraries to Java

Sava Mintchev and Vladimir Getov
School of Computer Science, University of Westminster, London, UK
http://perun.scsise.wmin.ac.uk/
s.m.mintchev, [email protected]

TR-CSPE-09, September 5, 1997 To appear in the Proceedings of ISCOPE'97, Springer LNCS

Abstract. We have created a tool for automatically binding existing native C libraries to Java. With the aid of the Java-to-C Interface generating tool (JCI), the abundance of existing C and Fortran-77 scientific libraries can more easily be made available to Java programmers. We have applied JCI to bind MPI, PBLAS, ScaLAPACK and other libraries to Java. The approach of automatic binding ensures both portability across different platforms and full compatibility with the library specifications. In order to evaluate the performance of Java code which accesses native libraries, we have run Java versions of parallel benchmarks from the ParkBench suite. The results obtained on a distributed-memory IBM SP2 machine demonstrate the viability of our approach.

1 Introduction

As a programming language, Java has the basic qualities needed for writing high-performance applications. With the maturing of compilation technology, such applications written in Java will doubtless appear. Since Java is a fairly new language, however, it lacks the extensive scientific libraries of languages like Fortran-77 and C. The need for access to scientific libraries in Java can be satisfied by:

- writing new libraries in Java;
- manually or automatically translating Fortran-77/C library code into Java (e.g. with the f2j tool [5]);
- manually or automatically creating a Java wrapper for an existing native Fortran-77/C library.

The last approach, which we are primarily interested in, has the obvious advantage of involving the least amount of work, thus dramatically reducing development time. Moreover, it guarantees the best performance results, at least in the short term, because the well-established scientific libraries usually have multiple implementations carefully tuned for maximum performance

on different hardware platforms. Last but not least, by applying the software reuse tenet, each native library can be linked to Java without any need for recoding or translating its implementation.

2 Binding an existing native library to Java

The binding of a native library to Java amounts to either dynamically linking the library to the Java virtual machine, or linking the library to the object code produced by a stand-alone Java compiler. At first sight it appears that this should not be a problem, as Java implementations support a native interface via which C functions can be called¹. There are some hidden problems, however. First of all, native interfaces are reasonably convenient when writing new C code to be called from Java, but rather inadequate for linking pre-existing native code. The difficulty stems from the fact that Java has in general different data formats from C, and therefore existing C code cannot be called from Java without prior modification. Binding a native library to Java is also accompanied by portability problems. The native interface is not part of the Java language specification [9], and different vendors offer incompatible interfaces. Furthermore, native interfaces are not yet stable and are likely to undergo change with each new major release of a Java implementation². Thus, to maintain the portability of the binding, one may have to cater for a variety of native interfaces.

2.1 The Java-to-C interface generator

In order to call a C function from Java, we have to supply for each formal argument of the C function a corresponding actual argument in Java. Unfortunately, the disparity between data layout in the two languages is large enough to rule out a direct mapping in general. For instance:

- primitive types in C may be of varying sizes, different from the standard Java sizes;
- there is no direct analog to C pointers in Java;
- multidimensional arrays in C have no direct counterpart in Java;
- C structures can be emulated by Java objects, but the layout of fields of an object may be different from the layout of a C structure;
- C functions passed as arguments have no direct counterpart in Java.

We want to link a large C library (e.g. MPI [16]) to a Java virtual machine. Because of the disparity between C and Java data types, we are faced with two options:

¹ For simplicity we shall focus on C in the discussion that follows, but the main points generalize to Fortran-77 and to other C-linkable languages.
² JNI in Sun's JDK 1.1 is regarded as the definitive native interface, but it is not yet supported in all Java implementations on different platforms by other vendors.


1. Rewrite the library C functions so that they conform to the particular native interface of our Java VM; or
2. Write an additional layer of "stub" C functions which would provide an interface between the Java VM (or rather its native interface) and the library.

Software engineering considerations make option (1) a non-starter: it is not our job to tamper with a library supported by others. But option (2) is not very attractive either, considering that a native library like MPI can have more than a hundred accessible functions. The solution is to choose (2), and automate the creation of the additional interface layer. The Java-to-C interface generator, or JCI, takes as input a header file containing the C function prototypes of the native library. It outputs a number of files comprising the additional interface:

- a file of C stub-functions;
- files of Java class and native method declarations;
- shell scripts for doing the compilation and linking.

The JCI tool generates a C stub-function and a Java native method declaration for each exported function of the native library. Every C stub-function takes arguments whose types correspond directly to those of the Java native method, and converts the arguments into the form expected by the C library function. As we mentioned in Section 2, different Java native interfaces exist, and thus different code may be required for binding a native library to each Java implementation. We have tried to limit the implementation dependence of JCI output to a set of macro definitions describing the particular native interface. Thus it may be possible to re-bind a library to a new Java machine simply by providing the appropriate macros.
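To make the generation step concrete, the toy program below sketches the kind of prototype-to-declaration mapping JCI performs. This is purely illustrative: the real JCI type-mapping table and output format are not reproduced in this paper, and the method and class names here are our own invention.

```java
import java.util.Map;

public class JciSketch {
    // Assumed C-to-Java mapping for a few primitive cases (hypothetical;
    // the actual JCI table covers many more types and is platform-aware).
    static final Map<String, String> C_TO_JAVA = Map.of(
        "int", "int",
        "double", "double",
        "char*", "String"
    );

    // Turn a simplified C prototype (return type, name, argument types)
    // into the text of a Java native method declaration.
    static String toNativeDecl(String ret, String name, String... argTypes) {
        StringBuilder sb = new StringBuilder();
        sb.append("public static native ")
          .append(C_TO_JAVA.get(ret)).append(' ').append(name).append('(');
        for (int i = 0; i < argTypes.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(C_TO_JAVA.get(argTypes[i])).append(" a").append(i);
        }
        return sb.append(");").toString();
    }

    public static void main(String[] args) {
        // Emits: public static native int MPI_Barrier(int a0);
        System.out.println(toNativeDecl("int", "MPI_Barrier", "int"));
    }
}
```

The real tool additionally emits the matching C stub-function, which converts each Java-side value into the form the library expects before forwarding the call.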

2.2 Binding C libraries (MPI, BLACS, PBLAS)

The largest native library we have bound to Java so far is MPI: it has in excess of 120 functions [15]. The JCI tool allowed us to bind all those functions to Java without extra effort. Since MPI libraries are standardized, the binding generated by JCI should be applicable without modification to any MPI implementation. As the Java binding for MPI has been generated automatically from the C prototypes of MPI functions, it is very close to the C binding. This similarity means that the Java binding is almost completely documented by the MPI-1 standard, with the addition of a table of the JCI mapping of C types into Java types. So far we have bound MPI to two varieties of the Java virtual machine: JDK 1.0.2 [13] for Solaris and for AIX 4.1 [11]. The MPI implementation we have used is LAM of the Ohio Supercomputer Center [4]. Other libraries written in C for which we have created Java bindings are the Parallel Basic Linear Algebra Subprograms (PBLAS) and the Basic Linear Algebra Communication Subprograms (BLACS). The library function prototypes have been taken from the ParkBench 2.1.1 distribution [17]. Table 1 gives some idea of the sizes of JCI-generated bindings for individual libraries. In addition, there are some 2280 lines of Java class declarations produced by JCI which are common to all libraries.

                                  Size of Java binding
Library     Written in   Functions   C lines   Java lines
MPI         C            125         4434      439
BLACS       C            76          5702      489
BLAS        F77          21          2095      169
PBLAS       C            22          2567      127
PB-BLAS     F77          30          4973      241
LAPACK      F77          14          765       65
ScaLAPACK   F77          38          5373      293

Table 1. Native libraries bound to Java
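A hypothetical fragment of what the Java side of such a binding might look like is shown below. The class name, the choice of `long` for an opaque MPI communicator handle, and the `int[]` out-parameter are our assumptions for illustration, not the actual JCI output; the point is only that each exported C function becomes an ordinary Java `native` method whose body lives in a generated C stub.

```java
import java.lang.reflect.Modifier;

public class MPIStubSketch {
    // Each exported library function maps to one native method; the
    // implementations are supplied by the JCI-generated C stub library.
    public static native int MPI_Init();
    public static native int MPI_Comm_rank(long comm, int[] rank);

    // Check, via reflection, that a declaration exists and is native.
    // This runs even without the stub library loaded; only *invoking*
    // the methods would throw UnsatisfiedLinkError.
    static boolean isNativeDecl(String name, Class<?>... params) {
        try {
            return Modifier.isNative(
                MPIStubSketch.class.getDeclaredMethod(name, params).getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isNativeDecl("MPI_Init"));
    }
}
```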

2.3 Binding Fortran-77 libraries (BLAS, PB-BLAS, ScaLAPACK)

The JCI tool can be used to generate Java bindings for libraries written in languages other than C, provided that the library can be linked to C programs, and prototypes for the library functions are given in C. We have created Java bindings for a number of libraries written in Fortran-77: the Basic Linear Algebra Subprograms (BLAS Levels 1-3, PB-BLAS) [6], and the Scalable Linear Algebra Package (LAPACK, ScaLAPACK) [3]. The C prototypes for the library functions have been inferred by f2c [8]. The bindings generated by JCI are fairly large in size (see Table 1) because they are meant to be portable and to support different data formats. On a particular hardware platform and Java native interface, much of the binding code may be eliminated during the preprocessing phase of its compilation. As our experiments on IBM SP2 machines so far have shown, a negligible amount of time is spent in the binding itself during execution of Java programs. The use of native numerical code in Java programs is certain to improve performance, as recent experiments with the Java Linpack benchmark [7] and some BLAS Level 1 functions written in C have shown [2, 12]. By binding the original native libraries like BLAS, Java programs can gain in performance on all those hardware platforms where the libraries are efficiently implemented.
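One reason the generated bindings must support different data formats is that Java fixes its primitive sizes in the language specification, while C and Fortran-77 sizes vary by platform. A minimal demonstration of the Java side of this asymmetry (the `SIZE` constants are standard Java; the C/Fortran comparisons in the comments are general facts, not measurements from this paper):

```java
public class SizeSketch {
    public static void main(String[] args) {
        // Java guarantees these widths on every platform:
        System.out.println("int:    " + Integer.SIZE + " bits");  // always 32
        System.out.println("long:   " + Long.SIZE + " bits");     // always 64
        System.out.println("double: " + Double.SIZE + " bits");   // always 64
        // By contrast, a C 'long' may be 32 or 64 bits depending on the
        // platform, and a Fortran INTEGER is implementation-defined, so a
        // generated stub cannot assume the two sides agree and must convert.
    }
}
```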

3 Experimental results

In order to evaluate the performance of the Java binding to native libraries, we have translated into Java a C + MPI benchmark: the IS kernel from the NAS Parallel Benchmark suite NPB2.2 [1]. The program sorts in parallel an array of N integers; N = 8M for IS Class A. The original C and the new Java versions of IS are quite similar, which allows a meaningful comparison of performance results. We have run the IS benchmark on two platforms: a cluster of Sun Sparc workstations, and the IBM SP2 system at the Cornell Theory Center. Each SP node used has a 120 MHz POWER2 Super Chip processor, 256 MB of memory, 128 KB data cache, and a 256-bit memory bus. The results obtained on the SP2 machine are shown in Table 2 and Figure 1. The Java implementation we have used is IBM's port of JDK 1.0.2D (with the JIT compiler enabled), and the MPI library is a customized version of LAM 6.1³. We opted for LAM rather than the proprietary IBM MPI library because the version of the latter available to us does not support the re-entrant C library required for Java [10]. The results for the C version of IS under both LAM and IBM MPI are also given for comparison.

                                Execution time (sec)                Mop/s total
Class  Language  MPI impl.   1      2      4      8      16     1     2     4     8      16
A      Java      LAM         -      48.04  24.72  12.78  6.94   -     1.75  3.39  6.56   12.08
A      C         LAM         42.16  24.52  12.66  6.13   3.28   1.99  3.42  6.63  13.69  25.54
A      C         IBM MPI     40.94  21.62  10.27  4.92   2.76   2.05  3.88  8.16  14.21  30.35

Table 2. Execution statistics for the C and Java IS benchmarks on the IBM SP2 machine at Cornell Theory Center, July 1997
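For readers unfamiliar with the IS kernel, its core is a counting/ranking step over integer keys. The sequential toy version below conveys the idea; the real benchmark distributes the key histogram across MPI processes and handles ties per the NPB rules, neither of which this sketch attempts.

```java
import java.util.Arrays;

public class IsSketch {
    // Rank each key: rank[i] = number of keys strictly smaller than keys[i].
    // (Duplicate keys share a rank in this simplified version.)
    static int[] rank(int[] keys, int maxKey) {
        int[] count = new int[maxKey + 1];
        for (int k : keys) count[k]++;               // histogram of key values
        int[] prefix = new int[maxKey + 1];          // exclusive prefix sum
        for (int v = 1; v <= maxKey; v++)
            prefix[v] = prefix[v - 1] + count[v - 1];
        int[] rank = new int[keys.length];
        for (int i = 0; i < keys.length; i++)
            rank[i] = prefix[keys[i]];
        return rank;
    }

    public static void main(String[] args) {
        int[] keys = {3, 1, 4, 1, 5};
        // Ranks of 3,1,4,1,5 among themselves: [2, 0, 3, 0, 4]
        System.out.println(Arrays.toString(rank(keys, 5)));
    }
}
```

In the parallel benchmark, it is the exchange of histogram buckets between processes that exercises MPI, which is why IS is a communication-sensitive test of the binding.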

It is important to identify the sources of the slowdown of the Java version of IS with respect to the C version. To that end we have instrumented the JavaMPI binding and gathered additional measurements. It turns out that the cumulative time spent in the C functions of the JavaMPI binding is approximately 20 milliseconds in all cases, and thus has a negligible share in the breakdown of the total execution time for the Java version of IS. Clearly the JavaMPI binding does not introduce a noticeable overhead in the results from Table 2. Further experiments have been carried out with a Java translation of the MATMUL benchmark from the ParkBench suite [18, 17]. The original benchmark is in Fortran-77 and performs dense matrix multiplication in parallel. It accesses the BLAS, BLACS and LAPACK libraries included in the ParkBench 2.1.1 distribution. MPI is used indirectly through the BLACS native library. We have run MATMUL on a Sparc workstation cluster, and on the IBM SP2 machine at Southampton University (66 MHz POWER2 "thin1" nodes with 128 MB RAM, 64-bit memory bus, and 64 KB data cache). The results are shown in Table 3 and Figure 2. It is evident from Figure 2 that Java MATMUL execution times are only 5-10% longer than Fortran-77 times. These results may seem surprisingly good, given that Java IS is two times slower than C IS (Figure 1). The explanation is that in MATMUL most of the performance-sensitive calculations are carried out by the native library routines (which are the same for both Java and Fortran-77 versions of the benchmark). In contrast, IS uses a native library (MPI) only for

³ Earlier results obtained with the original LAM 6.1, as reported in [15], show poor scalability w.r.t. the number of processors.


[Figure 1 is a log-scale plot of execution time (sec) against number of processors for Java IS + LAM, C IS + LAM, and C IS + IBM MPI.]

Fig. 1. Execution time for IS class A on the IBM SP2 system at Cornell Theory Center, July 1997

Problem                           Execution time (sec)                 Mflop/s total
size (N)  Lang  MPI impl.   1      2      4      8     16      1      2      4      8      16
1000      Java  LAM         -      17.09  9.12   5.26  3.53    -      117.0  219.4  380.2  566.9
1000      F77   LAM         -      16.45  8.61   5.12  3.13    -      121.6  232.3  390.4  638.3
1000      F77   IBM MPI     33.25  15.16  7.89   3.91  2.20    60.16  132.0  253.6  511.2  910.0

Table 3. Execution statistics for the Fortran and Java MATMUL benchmarks on the IBM SP2 machine at Southampton University, July 1997

communication, and all calculations are done by the benchmark program. The performance results from Figure 2 are a persuasive argument for linking native scientific libraries to Java!
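The instrumentation approach described above (accumulating the time spent inside the binding's C stubs) can be sketched with a simple timing harness. The code below is our illustration, not the actual instrumentation code; `dummyCall` stands in for a cheap bound native call, and the per-call figure it prints will of course differ from the SP2 measurements.

```java
public class OverheadSketch {
    static long counter = 0;

    // Stand-in for an inexpensive native call reached through a binding stub.
    static void dummyCall() { counter++; }

    // Average wall-clock cost per call over a large number of invocations,
    // in the spirit of the cumulative-time measurement described above.
    static long measurePerCallNanos(int calls) {
        long t0 = System.nanoTime();
        for (int i = 0; i < calls; i++) dummyCall();
        return (System.nanoTime() - t0) / calls;
    }

    public static void main(String[] args) {
        System.out.println("approx. per-call cost: "
            + measurePerCallNanos(1_000_000) + " ns");
    }
}
```

When the cumulative stub time measured this way is on the order of milliseconds against multi-second runs, the binding overhead is negligible, which is exactly the conclusion drawn for JavaMPI above.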

4 Conclusion

In this paper we have summarised our work on high-performance computation in Java. We have written a tool for automating the creation of portable interfaces to native libraries (whether for scientific computation or message passing). We have applied the JCI tool to create Java bindings for MPI, BLAS, LAPACK, etc., which are fully compatible with the library specifications. With performance-tuned implementations of those libraries available on different machines, the

[Figure 2 is a log-scale plot of execution time (sec) against number of processors for Java + LAM, F77 + LAM, and F77 + IBM MPI.]

Fig. 2. Execution time for MATMUL (N = 1000) on the IBM SP2 system at Southampton University, July 1997

potential exists for efficient numerical programming in Java. Our future work will focus on further experiments with Java numerical benchmarks on the IBM SP2 and other parallel platforms, as well as on making the PMPI [14] high-level message-passing interface available in Java.

Acknowledgments This work has been carried out as part of our collaboration with colleagues from the University of Southampton (U.K.) and the Cornell Theory Center (U.S.A.). In particular, we are grateful to Tony Hey (Southampton) and Susan Flynn Hummel (Cornell) for their continuous support and for making the IBM SP2 experiments possible.

References

1. D. Bailey et al. The NAS parallel benchmarks. Technical Report RNR-94-007, NASA Ames Research Center, 1994. http://science.nas.nasa.gov/Software/NPB.
2. A.J.C. Bik and D.B. Gannon. A note on native Level 1 BLAS in Java. In [12], 1997.
3. L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley. ScaLAPACK: A linear algebra library for message-passing computers. In SIAM Conference on Parallel Processing, 1997.
4. G. Burns, R. Daoud, and J. Vaigl. LAM: An open cluster environment for MPI. In Supercomputing Symposium '94, Toronto, Canada, June 1994. http://www.osc.edu/lam.html.
5. H. Casanova, J.J. Dongarra, and D.M. Doolin. Java access to numerical libraries. In [12], 1997.
6. J. Choi, J. Dongarra, and D. Walker. PB-BLAS: A set of parallel block basic linear algebra subroutines. In Proceedings of the Scalable High Performance Computing Conference, Knoxville, TN, pages 534-541. IEEE Computer Society Press, 1994.
7. J. Dongarra and R. Wade. Linpack benchmark, Java version. http://www.netlib.org/benchmark/linpackjava.
8. S.I. Feldman and P.J. Weinberger. A Portable Fortran 77 Compiler. UNIX Time Sharing System Programmer's Manual, Tenth Edition. AT&T Bell Laboratories, 1990.
9. J. Gosling, W. Joy, and G. Steele. The Java Language Specification, Version 1.0. Addison-Wesley, Reading, Mass., 1996.
10. IBM. PE for AIX: MPI Programming and Subroutine Reference. http://www.rs6000.ibm.com/resource/aix resource/sp books/pe/.
11. IBM UK Hursley Lab. Centre for Java Technology Development. http://ncc.hursley.ibm.com/javainfo/hurindex.html.
12. ACM Workshop on Java for Science and Engineering Computation, Las Vegas, Nevada, June 21, 1997. To appear in Concurrency: Practice and Experience. http://www.cs.rochester.edu/u/wei/javaworkshop.html.
13. JavaSoft. Home page. http://www.javasoft.com/.
14. S. Mintchev and V. Getov. PMPI: High-level message passing in Fortran 77 and C. In B. Hertzberger and P. Sloot, editors, High-Performance Computing and Networking (HPCN'97), pages 603-614, Vienna, Austria, 1997. Springer LNCS 1225.
15. S. Mintchev and V. Getov. Towards portable message passing in Java: Binding MPI. In Proceedings of EuroPVM-MPI, Krakow, Poland, November 1997. To appear in Springer LNCS.
16. MPI Forum. MPI: A message-passing interface standard. International Journal of Supercomputer Applications, 8(3/4), 1994.
17. PARKBENCH Committee. Parallel kernels and benchmarks home page. http://www.netlib.org/parkbench.
18. PARKBENCH Committee (assembled by R. Hockney and M. Berry). PARKBENCH report 1: Public international benchmarks for parallel computers. Scientific Programming, 3(2):101-146, 1994.

This article was processed using the LaTeX macro package with LLNCS style

