Using Java for High Performance Scientific Computing

J. M. Bull and S. D. Telford

Edinburgh Parallel Computing Centre, James Clerk Maxwell Building, The King’s Buildings, The University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, Scotland, U.K.

email: [email protected]
Abstract

Java has many attractions for users of high performance computing facilities, even in traditional scientific fields. This paper examines the potential offered by Java for such applications. We review the hardware and system software requirements for supporting the Java environment, with emphasis on exploiting commodity components. Further to this, we discuss paradigms and implementation techniques for parallel programming in Java, and finally we explore the possibilities for translating existing legacy software to a Java-based system.
1 Introduction

Java has yet to make a significant impact in the field of traditional scientific computing. However, there are a number of reasons why it may do so in the not too distant future. Perhaps the most obvious benefits are those of portability and ease of software engineering. The former will be particularly important when grid computing comes of age, as a user may not know, at the time they submit a job, what architecture it will run on. The absence of pointers, automatic garbage collection and thorough type checking make Java development significantly less error prone than more traditional languages such as C, C++ and Fortran. Of course, using Java is not without its problems. Perhaps the prime concern for scientific users is performance, though JIT compilers are making rapid advances in this area. Other issues include the lack of support for complex numbers and multi-dimensional arrays. In this paper we consider the opportunities for providing flexible, low cost supercomputing facilities using commodity components coupled with Java technology. Our focus is on parallel systems; we will not consider the issue of single processor performance here. Section 2 addresses the issues involved in building suitable hardware platforms and the system software they will need to run. Section 3 looks at possible parallel programming models for Java, Section 4 considers the possibility of running existing legacy codes on a Java-based system, and Section 5 provides a summary.
2 Hardware and System Software

2.1 Single vs. Multiple Java Virtual Machines

One fundamental characteristic of any Java-based parallel system is the distinction between single virtual machine environments (henceforth referred to as single-VM environments) and multiple virtual machine (multi-VM) environments. A single-VM environment, as the name suggests, provides a single Java virtual machine, running a single multi-threaded Java application, with the Java threads distributed across the processors in the system. A multi-VM environment provides multiple Java VMs, each running a discrete Java application, with some form of inter-VM communication facility allowing interaction between the applications running on different VMs. Each VM in a multi-VM environment may be running on either a single processor or an SMP multiprocessor system and may be running a multi-threaded application. In the most general case, therefore, a multi-VM system can be thought of as a collection of multi-threaded Java applications communicating (outwith the Java thread model) via some unspecified mechanism.

A single-VM environment is naturally suited to a hardware architecture which provides a single system image, i.e. one which presents a single address space to applications, irrespective of the physical distribution or proximity of memory and processors. Such a hardware architecture enables the use of existing Java VM software implementations with little or no modification. In addition, the widely used and understood Java threads model can be employed in the application, and no extra APIs or nonstandard changes to the code are required, assuming there is a sufficient degree of threading to keep all the processors reasonably busy. However, such platforms tend to be difficult to scale past a few hundred processors. For a hardware architecture without a single system image, e.g. a classic MPP supercomputer or Beowulf cluster, a multi-VM environment is the obvious choice.
However, research projects at the IBM Research Laboratory in Haifa [1] and Rice University, Texas [11] have investigated the design and implementation of Java VMs capable of providing a single-VM environment across such a distributed-memory architecture. This is a complex and ambitious approach, and the efficiency of such a VM is uncertain and likely to be highly variable, depending on the behaviour of the running Java application. A multi-VM environment would map Java VMs to separate address spaces (e.g. individual nodes in a Beowulf cluster). This requires the definition of a Java API (such as RMI [22] or the Java Grande Forum’s forthcoming MPJ) for the inter-VM communication which would, ideally, map efficiently to the processor interconnect hardware of the chosen platform. As mentioned above, each address space may correspond to multiple processors, thus allowing the use of multi-threaded applications. Again, little or no change would be required to existing Java VMs, with the possible exception of the addition of an efficient native interface to the interprocessor communication mechanism. In the general case, this gives a two-level parallelism model (threads plus inter-VM communication). While this results in more complex applications software, a multi-VM environment can exploit the greater scalability of non-single system image hardware platforms. Moreover, for reasonably large numbers of processors, non-single system image platforms need much less sophisticated hardware and are hence simpler and cheaper to build.

In summary, there is a trade-off between potential scalability and ease of programming: using the thread model native to the Java programming model will probably limit the scalability of the application. However, the use of another (possibly nonstandard) communications model in addition to, or instead of, threads, as required by a multi-VM environment, may prove awkward.
2.2 Hardware

The possible hardware architectures for a parallel Java platform are many and varied. To provide a single-VM environment, existing systems such as the Sun Enterprise 10000 (Starfire) SMP system, or WildFire [10] or WildCat clusters, running the standard Java 2 Platform on Solaris, are obvious options. However, these systems presently scale up to only a few hundred processors. Multi-VM environments could be implemented on either conventional Solaris-based systems or more unorthodox “thin” architectures. Using conventional (non-WildFire) clustering (e.g. using Ethernet or SCI) of SMP Enterprise systems would potentially allow scaling up to many hundreds of processors, with as many Java VMs as systems in the cluster. It would also be possible to run several Java VMs on each system, although there are no obvious benefits to this configuration. Alternatively, a multi-VM Java environment could be provided on a cluster of “thin” nodes running Java VMs over an embedded operating system which provides the necessary communication facilities and other essential environmental features such as network filesystem access. The only existing hardware that could serve in this scenario that the authors are aware of is the now-discontinued Sun JavaStation and the similar JavaEngine 1 [23] board. While a cluster of JavaStations or JavaEngines could be used as a proof-of-concept prototype, the comparatively poor performance of their 100 MHz MicroSPARC IIep processor (with a SPECfp95 benchmark rating of around 2, compared to more than 60 predicted for the UltraSPARC III) and Fast Ethernet networking would render such a cluster irrelevant from a performance point of view. Looking to the future, Sun’s MAJC [24] processor architecture could provide the basis of a much higher-performance “thin”-node-based multi-VM cluster.
The MAJC architecture allows for multiple processors (processor units in MAJC terminology) to be incorporated into a single chip (a processor cluster in MAJC terminology) and to share memory through a common cache or on-chip coherency bus. It may also be possible for multiple processor clusters to, in turn, share memory in a conventional SMP fashion, although it is not clear if this will be a supported configuration. In the initial MAJC-5200 implementation [28], a processor cluster will comprise two processor units, although this number will presumably increase in future implementations. Assuming the existence of a MAJC OS capable of providing the necessary facilities and a full Java VM implementation for MAJC, MAJC-based parallel systems become a very attractive proposition from a performance point of view, if the published performance predictions for MAJC are correct [28]. In the absence of announced MAJC-based products, the authors envisage such a system being designed from scratch at the board level, with many MAJC chips per board. Some form of on-board (and off-board, for multi-board configurations) inter-chip interconnect would have to be devised, possibly utilising the built-in UPA ports included in the MAJC-5200 chip. A single Java VM could run across all the processor units in a MAJC processor cluster, with multiple VMs communicating, at the hardware level, over the UPA interconnect. The level of scalability of this type of configuration would very much depend on the design of the interconnect hardware employed, but the 250 MHz UPA ports of the MAJC-5200 would provide very good interconnect bandwidth. Both UltraSPARC/Solaris- and MAJC-5200-based systems support the PCI bus, so existing high-performance interconnect technologies such as SCI, Myrinet [29], Gigabit Ethernet [9] and Quadrics QsNet [19], which are available in PCI implementations, could also be used for cluster-type multi-VM environments based on these processors. This would be a very cost-effective off-the-shelf solution and would give interconnect performance comparable to existing cluster systems.
2.3 System Software

For the SMP-based single-VM environment, the system software requirements are largely met by existing software: Solaris and the Java 2 Platform would provide a full environment. The same applies to the clustered SMP multi-VM environment, with the additional requirement of an efficient communications API for the cluster interconnect technology employed. “Thin”-node clusters would presumably employ variants of existing lightweight OSs developed for “thin client” and embedded Java environments. An obvious choice for such an OS would be Sun’s JavaOS [21]. The current JavaOS for Consumers 3.0 release is based on ChorusOS [20] and retains many of ChorusOS’s features. As ChorusOS was designed as a scalable distributed OS, it would appear that it (and therefore JavaOS for Consumers) is eminently well suited to clusters of “thin” nodes. JavaOS for Consumers includes Chorus Distributed IPC, which may be particularly useful for providing an inter-node communications layer on top of which an inter-VM Java API can be implemented. However, JavaOS for Consumers 3.0 is currently only available for IA-32, PowerPC and StrongARM processors. The earlier JavaStation/JavaEngine version of JavaOS (1.x) [13] differs somewhat from JavaOS for Consumers 3.0 in that (from the available documentation) it appears not to be based on ChorusOS. It also provides a full Java 1.1 environment, whereas JavaOS for Consumers 3.0 (intended for embedded consumer appliance products) only provides the reduced PersonalJava [25] environment. No information was available at the time of writing on proposed or planned OSs for MAJC.
3 Parallel Programming Models

3.1 Single-VM Programming Models

3.1.1 Java Threads
For single-VM Java environments, as defined in [27], it is natural and appropriate to use Java threads as the parallel programming paradigm, since the thread class is part of the standard Java libraries. Unlike earlier Java implementations, most current Java VMs implement the thread class on top of the native OS threads (for example, Sun’s JVM for Solaris uses Solaris threads). This means that the threads are visible to the operating system and are capable of being scheduled on different processors, allowing useful parallel execution. However, writing parallel programs using Java threads requires a somewhat different approach from writing threaded programs for concurrent execution on a single processor, which is the more traditional use of the thread class. Firstly, it is important to avoid thread context switching by ensuring that the number of threads matches the number of processors used. Instead of creating a new thread to run every object capable of parallel execution, it is necessary to create and manage a task pool, in which a fixed number of threads execute tasks assigned to them. Creating and killing threads is expensive, so the threads must stay alive for the duration of the parallel program. A further feature necessary for good parallel performance is a fast barrier synchronisation method; no such method is provided by the thread class, and significant care must be taken to design a scalable solution.

3.1.2 OpenMP
Much of the parallelism in scientific applications occurs at the loop level. The most obvious way to exploit parallelism using Java threads is at the method level, which is often not sufficiently scalable. To execute a loop in parallel using Java threads, it is necessary to create a new method containing the loop, compute appropriate loop bounds for each thread, and pass an instance of the method’s class to the task pool manager for execution on each thread. This is a somewhat clumsy approach, and it is not unique to Java threads: the same issues arise in writing shared memory parallel programs using Fortran or C with, for example, the POSIX threads library. However, much of this process is readily automatable using compiler directives. In the past, compiler directives for shared memory parallel programming were usually vendor specific and suffered from a lack of standardisation. However, OpenMP, an open standard for shared memory directives, has recently been published [16, 17]. OpenMP has been widely adopted in the industry and is now supported by most of the major vendors, including Cray, Compaq, IBM, Sun and SGI. Currently, OpenMP defines directives for Fortran and C/C++ only. Recent work at EPCC has defined and implemented a prototype Java version of OpenMP [12], using Java threads as the underlying parallel model. This raises the possibility of a straightforward and familiar route to using Java for shared memory numerical applications, and of porting existing Fortran, C or C++ applications that use OpenMP to Java whilst keeping the same parallelism model.
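The manual route described above, which such directives would automate, can be sketched as follows. This is a minimal illustration of our own (class and method names are invented): the loop body is moved into a Runnable, each thread is given its own block of iterations, and join() provides the end-of-loop barrier. For brevity it creates the threads directly rather than reusing a task pool.

```java
public class ParallelLoop {
    // Sum the elements of a[] using nthreads Java threads, each working
    // on a contiguous block of iterations.
    public static double sum(final double[] a, int nthreads) throws InterruptedException {
        final double[] partial = new double[nthreads];   // one result slot per thread
        Thread[] workers = new Thread[nthreads];
        int chunk = (a.length + nthreads - 1) / nthreads;
        for (int t = 0; t < nthreads; t++) {
            final int id = t;
            final int lo = t * chunk;
            final int hi = Math.min(a.length, lo + chunk);
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    double s = 0.0;
                    for (int i = lo; i < hi; i++) s += a[i];
                    partial[id] = s;                     // each thread writes only its own slot
                }
            });
            workers[t].start();
        }
        double total = 0.0;
        for (int t = 0; t < nthreads; t++) {             // join() acts as the barrier
            workers[t].join();
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        double[] a = new double[1000];
        for (int i = 0; i < a.length; i++) a[i] = 1.0;
        System.out.println(sum(a, 4));   // prints 1000.0
    }
}
```

The boilerplate here (bounds computation, worker class, reduction of partial results) is exactly what an OpenMP-style directive would generate automatically.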
3.1.3 Auto-parallelisation
Much research has been directed towards the goal of completely automating the task of porting sequential codes to shared memory parallel architectures using threads libraries. This has proven to be a very difficult task, even when restricted to loop parallelisation, which involves both analysing data dependencies and using loop transformation strategies to remove them where possible. Most of this work has been directed at Fortran, and more recently C, but much of the technology carries over to Java, as illustrated by the prototype compiler javar [4]. Auto-parallelisation has not, however, gained wide acceptance amongst applications developers. The compiler must perforce be conservative in cases where dependency analysis indicates possible race conditions, and can therefore fail to parallelise loops which the programmer knows are in fact safe to execute in parallel. The data dependency analysis can struggle to cope with large loop bodies and with loops which contain calls to other procedures. Furthermore, the transformations applied are essentially local to each loop, since global optimisation strategies to optimise for data locality are extremely difficult to design.
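The kind of loop that forces such conservatism can be seen in the following sketch (our own illustration, not taken from javar): unless the compiler can prove that the index array contains no duplicate entries, iterations may update the same element, so the loop must be left sequential even when the programmer knows it is safe.

```java
public class DependenceSketch {
    // An auto-paralleliser must assume that a[idx[i]] may refer to the same
    // element on different iterations, and so cannot safely run this loop
    // in parallel -- even if the programmer knows idx is a permutation.
    public static double[] scatterAdd(double[] a, double[] b, int[] idx) {
        for (int i = 0; i < b.length; i++) {
            a[idx[i]] += b[i];
        }
        return a;
    }

    public static void main(String[] args) {
        double[] a = new double[3];
        double[] b = {1.0, 2.0, 3.0};
        int[] idx = {2, 0, 2};       // a genuine duplicate: parallel execution would race
        scatterAdd(a, b, idx);
        System.out.println(a[2]);    // prints 4.0
    }
}
```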
3.2 Multi-VM Programming Models

A multi-VM environment requires some mechanism for inter-VM communication. The three obvious approaches to providing this are:

- To use or adapt a suitable existing Java API. The only current official Java API which is potentially suitable for this purpose is Java RMI.

- To adopt an existing, widely-used, non-Java native inter-processor communication library and provide a Java interface to it. Examples of this are Java interfaces to MPI and VIA.

- To extend the Java programming language to add intrinsic parallelism features. An example of this is JavaParty.
All of these approaches are described more fully below.

3.2.1 Java RMI
Java RMI (Remote Method Invocation) [22] is an API which supports seamless remote method invocation on objects in different Java VMs on a network. This allows the implementation of distributed applications using the Java object model. RMI is primarily intended for use in a client-server paradigm: a server application creates a number of objects, and a client application then invokes methods on these objects remotely. This paradigm is not commonly used in high performance scientific applications (except perhaps in cases where there is trivial task parallelism). Most decomposition techniques require the passing of data between processes, the data then being processed locally, rather than the invocation of computation on remote data. While it is possible to write new applications from scratch in this paradigm, porting existing applications is likely to be awkward.
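The client-server flavour of RMI can be seen in a minimal sketch like the following (the interface and all names are our own invention, not from any real application). Note the direction of traffic: the "client" pulls data by invoking a method on a remote object, rather than peers exchanging messages; both halves are shown in one process here purely for compactness.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote interface: a server-side object holding a block of data.
interface DataService extends Remote {
    double[] getBlock(int n) throws RemoteException;
}

class DataServiceImpl implements DataService {
    public double[] getBlock(int n) throws RemoteException {
        double[] block = new double[n];
        for (int i = 0; i < n; i++) block[i] = i * 0.5;
        return block;                      // the whole array is serialized per call
    }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // "Server" side: export the object and register it under a well-known name.
        DataService impl = new DataServiceImpl();
        DataService stub = (DataService) UnicastRemoteObject.exportObject(impl, 0);
        Registry reg = LocateRegistry.createRegistry(51099);
        reg.rebind("data", stub);

        // "Client" side: look the object up and invoke a method remotely.
        Registry clientReg = LocateRegistry.getRegistry("localhost", 51099);
        DataService remote = (DataService) clientReg.lookup("data");
        double[] block = remote.getBlock(4);
        System.out.println(block[3]);

        // Shut down so the JVM can exit.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(reg, true);
    }
}
```

Every getBlock call pays the full cost of remote invocation plus object serialization of the result, which is why this style maps poorly onto the fine-grained data exchange typical of domain-decomposed codes.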
Furthermore, RMI suffers from significant performance penalties, resulting in latencies which are likely to prove unacceptable in a high performance context. The internal design of Java RMI is intended to have an abstract, protocol-independent underlying transport layer, with protocol-specific code in subpackages, thus easing the implementation of alternative transports. As of the Java 2 SDK 1.2 release of RMI, only a TCP-based transport is implemented, although provision for the use of other socket-based protocols is included in the RMI API. Due to this dependence on TCP/IP, the current RMI implementation is unsuitable for use in a closely-coupled parallel system. In theory, however, the internal structure of RMI should allow alternative underlying transport layers, based on more appropriate interfaces such as MPI or VIA, to be implemented. EPCC has recently investigated the feasibility of porting RMI to an MPI-based transport using mpiJava [26]. Unfortunately, in the 1.2 release RMI source code, there are many TCP/IP-centric assumptions made in the allegedly “transport-independent” packages which make such an adaptation infeasible. From communication with the RMI developers, it is understood that future releases of RMI will have these assumptions removed. This should make alternative RMI transport implementations more feasible. However, the performance of Sun RMI is still likely to be disappointing, even with a high-performance transport. Both RMI and Java object serialization, as currently implemented by Sun, are computationally expensive. Alternative third-party implementations of both, such as the University of Karlsruhe’s KaRMI [15] and UKA-Serialization, have shown considerable improvements in efficiency, but the use of RMI is always likely to incur significant extra costs compared to using the underlying transport mechanism more directly.
Using RMI simply to pass data between multiple VMs also requires the creation of additional threads to manage remote requests for data, incurring context switching and synchronisation overheads. The implicitly blocking nature of remote methods makes latency hiding techniques difficult to implement. It therefore seems unlikely that RMI will prove a popular choice for parallel applications. It is worth noting here some other work carried out at the University of Karlsruhe to provide a higher-level interface to distributed/parallel applications using RMI, called JavaParty [18]. This consists of an extension to the Java language (a remote qualifier for objects), a compiler to translate the augmented language to Java with RMI calls, and a run-time system which manages the distribution of the remote objects. Some useful speedups are reported for a geophysics application using this system.
3.3 Message Passing

In high-performance computing, the most widely-used standard for inter-process communication is the Message Passing Interface, or MPI [8]. This is an open standard for a message-passing library which defines bindings for C and Fortran 77, with the later MPI-2 standard also defining C++ and Fortran 90 bindings. Due to its origins as a C/Fortran library, the design of MPI is distinctly non-object-oriented and somewhat alien to the Java environment. However, there have been several research efforts to provide MPI for Java. Due to the reluctance of the MPI Forum to become involved in defining further language bindings for MPI, these efforts are now focussed within the Message Passing Working Group of the Java Grande Forum. A draft specification for a standard Java API for message passing has been published
[5] and this is now evolving towards a more Java-centric design, and away from strict MPI conformance. This work, now referred to as MPJ, is still in progress and no further specifications have been released, although some discussion documents have been written [2]. It is worth noting that MPI for Java can be (and has been) implemented in two quite different ways: as a Java wrapper around a native MPI library, or entirely in Java. However, an implementation entirely in Java requires the use of a core Java API as the underlying communications layer, such as java.net, which in turn implies the use of TCP/IP. Hence, for a closely-coupled parallel system with some fast processor interconnect, only a native MPI implementation is appropriate, for performance and architectural reasons. An example of the “native library wrapper” approach is NPAC’s mpiJava [3], which uses the MPI C++ binding model as a basis and interfaces to several different native MPI implementations. A future possibility would be the implementation of MPI in Java using some non-standard Java API, such as VIA (see below). As an illustration of previous work on parallel Java environments using MPI, a past commercial EPCC project involved the design and implementation of a parallel Java environment on the Hitachi SR2201 MPP supercomputer [30]. This project comprised the following work:

- Porting JDK 1.1.4 to the Hitachi HARP-1E processor and the HI-UX/MPP (OSF/1 AD) operating system on the SR2201.

- Integration of an early Java MPI binding (Vladimir Getov’s JavaMPI) and the SR2201 native MPI library into the JDK 1.1.4 Java VM.

- Design and implementation of a high-level task-farming library in Java, using the JavaMPI API, which allows multi-threaded Java applications to be ported to the environment. This library was intended for use in a client-server scenario where a client can send Java classes to the MPP compute server for execution.

- Development of a demonstrator application.
As with the RMI model, however, this environment is not well suited to traditional scientific applications which do not have a task-farm-like structure.
4 Porting Legacy Codes

In this section we consider the possibilities for automatically porting legacy codes (for example, Fortran or C) to Java. The main reason for wishing to do this is portability: the resulting byte code can be run anywhere a JVM is available, including on systems which lack a Fortran or C compiler. There are three main routes by which we could consider producing Java byte code from legacy source:

Source-to-source translation: translate existing source to Java source and compile this to byte code.

Direct compilation: emit Java byte code directly from the Fortran/C compiler.
Binary translation: compile the Fortran/C to an executable and translate this to byte code.

We will consider each of these possibilities in turn.
4.1 Source-to-source translation

This route raises some difficult issues, because Java is a sufficiently restricted language to make the translation of certain constructs (e.g. GOTOs, pointers) particularly awkward or inefficient. The principal research effort in this area is the f2j project at the University of Tennessee [7], whose aim was to produce a source-to-source translator with sufficient capability to produce a Java version of the BLAS and LAPACK libraries. We are not aware of any comparable effort to translate C or C++. The f2j compiler takes a reasonably large subset of Fortran as input and produces Java source. Handling of GOTOs proves difficult: not all can be transformed to Java constructs, and some have to be implemented by subsequent post-processing at the byte code level. Nevertheless, the feasibility of the approach, at least for Fortran, has been clearly demonstrated. Another difficulty of this approach is that, in general, it requires translation of, or provision of interfaces to, all the Fortran/C libraries, including I/O, which is not a trivial problem.
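As a small illustration of the issue, consider a Fortran backward GOTO. Our own hand translation (not f2j output) shows the easy case: a jump back to an earlier label maps cleanly onto a Java loop. It is jumps that cross block boundaries which have no Java source-level equivalent and force f2j into byte code post-processing.

```java
public class GotoSketch {
    // Fortran original (schematic):
    //        k = 0
    //   10   k = k + 1
    //        if (k .lt. 5) goto 10
    // The backward GOTO becomes the do-while below; a forward GOTO into the
    // middle of a construct would have no such direct translation.
    public static int countUp() {
        int k = 0;
        do {
            k = k + 1;
        } while (k < 5);      // if (k .lt. 5) goto 10
        return k;
    }

    public static void main(String[] args) {
        System.out.println(countUp());   // prints 5
    }
}
```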
4.2 Direct compilation

Instead of emitting source code, it is in fact a simpler proposition to emit Java byte code directly from a compiler. The principal argument for the source-to-source route is performance, as there are greater opportunities for optimisation at the source level than at the byte code level, though whether these are usefully exploited by current Java compilers is open to question. The UQBT project [6] at the University of Queensland has produced a modified version of gcc which produces Java assembler, which can then be compiled to byte code using the Java assembly tool Jasmin [14]. How much performance is lost by taking this route remains an open question. The issue of handling libraries is no different from the source-to-source case.
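For orientation, the level at which such a compiler must emit its output can be seen in the disassembly of a trivial two-argument integer add (an annotated javap-style listing of standard JVM byte code, not output from the modified gcc). The JVM is a stack machine, so even simple expressions decompose into push/operate/return sequences:

```
static int add(int, int);
  Code:
     0: iload_0      // push first int argument onto the operand stack
     1: iload_1      // push second int argument
     2: iadd         // pop both, push their sum
     3: ireturn      // return the top of stack
```

A backend targeting this level must map register-oriented intermediate code onto stack operations, which is one source of the performance uncertainty mentioned above.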
4.3 Binary translation

Considerable research has been devoted to the issue of binary translation for conventional instruction sets. While it can be argued that JIT compilation is a special case of binary translation with Java byte code as the source, very little interest has been shown in using byte code as the target. The UQBT project mentioned above has recently added Java byte code to the set of targets which its translation system can emit. Preliminary results suggest that the performance implications may not be as bad as might be expected. The concern is that a single source instruction will translate to many byte code instructions, and that the subsequent JIT compilation will be unable to reverse this. However, some very small test programs show performance within a factor of two of the source binary. One advantage of this approach is that (statically linked) libraries can be handled transparently. System calls are more of a challenge, especially if the source and target operating systems differ significantly.
5 Conclusions

In summary, there are no obvious insurmountable obstacles to the design of a usable Java-based parallel platform. Indeed, there are several alternative approaches which could be taken, depending on considerations such as desired scalability range, hardware design effort available, time-to-market, and whether likely applications are more suited to a single- or multi-VM environment. For single-VM systems, Java’s thread class is entirely suitable as a parallel programming model, provided some additional utility classes for efficient scheduling and synchronisation are available. An OpenMP-like extension to Java is feasible, and would provide an interface which is both neater and more familiar to scientific programmers. Auto-parallelisation technology can also be applied to single-VM Java environments, but there is no reason to believe that it will prove any more successful than it has for Fortran or C. For multiple-VM systems the situation is less clear. Java’s distributed communication mechanism, RMI, is clearly the wrong model for parallel computing (no-one in the HPC community uses RPC to do parallel programming in C!) and also suffers from serious efficiency problems. Although performance can be improved by better implementation, this is not likely to be sufficient; if one wishes to use a fast network, latencies will be software-dominated. A message passing interface for Java would be a much better solution for parallel programming in a multiple-VM environment. Although interfaces to native MPI libraries are available, standards are lacking, and a more object-oriented solution such as that currently being considered by the Java Grande Forum would be preferable.
References

[1] Yariv Aridor, Michael Factor, and Avi Teperman. cJVM: a Single System Image of a JVM on a Cluster. In Proc. IEEE International Conference on Parallel Processing 1999 (ICPP-99). IEEE, September 1999.

[2] Mark Baker and Bryan Carpenter. Thoughts on the structure of an MPJ reference implementation. NPAC at Syracuse University, October 1999.

[3] Mark Baker, Bryan Carpenter, Geoffrey Fox, Sung Hoon Ko, and Sang Lim. mpiJava: An Object-Oriented Java Interface to MPI. In Proc. International Workshop on Java for Parallel and Distributed Computing, IPPS/SPDP 1999, April 1999.

[4] Aart J C Bik, Juan E Villacis, and Dennis B Gannon. javar: A Prototype Java Restructuring Compiler. Concurrency: Practice and Experience, 9(11):1181–1191, 1997.

[5] Bryan Carpenter, Vladimir Getov, Glenn Judd, Ton Skjellum, and Geoffrey Fox. MPI for Java: Position Document and Draft API Specification. Technical Report JGF-TR-3, Java Grande Forum, November 1998.

[6] C Cifuentes, M Van Emmerik, D Ung, D Simon, and T Waddington. Preliminary Experiences with the Use of the UQBT Binary Translation Framework. In Proc. Workshop on Binary Translation, Newport Beach, October 1999.

[7] David Doolin, Jack Dongarra, and Keith Seymour. JLAPACK—Compiling LAPACK Fortran to Java. Technical Report UT-CS-98-390, Computer Science Dept., University of Tennessee at Knoxville.

[8] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. University of Tennessee, Knoxville, version 1.1 edition, June 1995.

[9] Gigabit Ethernet Alliance. Gigabit Ethernet: Accelerating the Standard for Speed. White paper, Gigabit Ethernet Alliance, May 1999.

[10] Erik Hagersten and Michael Koster. WildFire: A Scalable Path for SMPs. Sun Microsystems, Inc.

[11] Y Charlie Hu, Weimin Yu, Alan L Cox, Dan S Wallach, and Willy Zwaenepoel. Run-Time Support for Distributed Sharing in Typed Languages. Dept. of Computer Science, Rice University, November 1999.

[12] Mark Kambites. Java OpenMP. EPCC Summer Scholarship Programme Report EPCC-SS99-05, EPCC, 1999.

[13] Peter W Madany. JavaOS: A Standalone Java Environment. White paper, Sun Microsystems, Inc., May 1996.

[14] Jon Meyer. Jasmin Home Page. http://found.cs.nyu.edu/meyer/jasmin/.

[15] Christian Nester, Michael Philippsen, and Bernhard Haumacher. A More Efficient RMI for Java. In Proc. ACM 1999 Java Grande Conference. ACM, June 1999.

[16] OpenMP Architecture Review Board. OpenMP C and C++ Application Programming Interface, Version 1.0, October 1998.

[17] OpenMP Architecture Review Board. OpenMP Fortran Application Programming Interface, Version 1.1, November 1999.

[18] Michael Philippsen and Matthias Zenger. JavaParty — Transparent Remote Objects in Java. Concurrency: Practice and Experience, 9(11):1225–1242, 1997.

[19] Quadrics Supercomputers World Ltd. QsNet High Performance Interconnect. Available at www.quadrics.com, 1999.

[20] Sun Microsystems, Inc. ChorusOS 4.0 Operating System. Data sheet, available at www.sun.com.

[21] Sun Microsystems, Inc. JavaOS for Consumers Software. Data sheet, available at www.sun.com.

[22] Sun Microsystems, Inc. Java Remote Method Invocation Specification, Revision 1.50, October 1998.

[23] Sun Microsystems, Inc. JavaEngine 1 OEM Technical Reference Manual, revision 01 edition, February 1998.

[24] Sun Microsystems, Inc. MAJC Architecture Tutorial. White paper, available at www.sun.com, 1999.

[25] Sun Microsystems, Inc. PersonalJava Application Environment Specification, Version 1.1.3, December 1999.

[26] Scott Telford. Porting Java RMI 1.2 to an MPI-1 Transport. EPCC internal technical note, September 1999.

[27] Scott Telford. Javawulf Study WP1: Report on Hardware and Software Requirements. Technical Report EPCC-JAVAWULF-WP1-REPORT, EPCC, February 2000.

[28] Marc Tremblay. MAJC-5200: A VLIW Convergent MPSOC. In Proc. Microprocessor Forum ’99, 1999.

[29] VITA Standards Organisation. Myrinet-on-VME Protocol Specification. Technical Report VITA 26-1998, VMEbus International Trade Association Standards Organisation, August 1998.

[30] H W Yau, F E Mourao, M I Parsons, S D Telford, M D Westhead, and N Raj. Parallel Java for the Hitachi SR2201. In First UK Workshop on Java for High Performance Network Computing at Euro-Par 98, September 1998. Invited talk.