IceT: Distributed Computing and Java
Paul A. Gray, Vaidy S. Sunderam
Emory University, Dept. of Mathematics and Comp. Sci.
fgray,[email protected]

Submitted to the 1997 ACM Workshop on Java for Science and Engineering Computation

Abstract

Numerous factors relating to the massive embracing of the Internet by the general public have clearly influenced network programming techniques and paradigms. The Java programming language both influences and is influenced by the requisite and dynamic aspects of network programming. The IceT project incorporates the advantages of Java with the established techniques of distributed computing, while maintaining a deference to what each discipline is best suited for. The result is a distributed computing environment that can dynamically merge virtual machines belonging to multiple users, and upon which data and processes are fluid and highly portable. This paper describes IceT, a novel framework for collaborative and high-performance distributed computing. Examples are provided to demonstrate the suitability of IceT for traditional distributed computing and to demonstrate the advantages, brought about by the mobility of processes derived from the use of Java, which make IceT well suited for collaborative computing.

1 IceT: An Overview

New developments in the areas of networks, architectures, processor capacities, programming tools and software design have been the motivating factors behind the development of IceT. IceT builds upon traditional distributed computing techniques and paradigms to include novel aspects which have been brought about by these evolving technologies.

One of the more publicized areas of development has been Internet programming, with considerable attention given to aspects provided by the Java programming language. Many of the topics that have arisen through channels associated with Internet programming and the Java programming language flow naturally into the realm of distributed computing. The major topics from Internet programming which have been driving forces for the development of IceT have been: What are the consequences and implications of executing processes remotely? ...anonymously? ...securely? ...on machines on which the owner of the process has no privileges¹?

These topics have a slightly orthogonal interpretation in terms of distributed computing. Consider the distinct approaches at issue here, namely Internet programming and high-performance, distributed computing. In terms of Internet programming, the most common scenario for the aforementioned topics has been in regard to applets. Applets are programs which are typically located at a geographically-remote server and are downloaded to users' local graphical Web browsers for subsequent execution and interaction. These applets are governed by security measures placed upon them by the local browser. This is, in some sense, reversed in high-performance, distributed computing, where users' programs are local, executed semi-locally (within localized intranets), and topics such as "remote execution of processes" are not found in the mix of security issues.

Research supported by Army Research Office grant DAAH04-96-1-0083, U. S. Department of Energy Grant No. DEFG05-91ER25105, NASA Grant NAG 2-828 and the National Science Foundation Award Nos. ASC-9214149 and ASC9527186.
¹ Lack of login privileges, for example.

The aim of IceT is to mutually incorporate approaches and techniques found in Internet programming with distributed computing paradigms and other paradigm-extending ideas. Necessarily, this includes the ideas of harnessing geographically-remote resources for anonymous utilization, portability of processes and data, and the consequential security issues that arise. The combination of respective Internet and distributed programming techniques in IceT leads to a natural environment for collaborative computing and the introduction of extended features into the distributed computing environment. Examples of extended features incorporated into IceT are: dynamic merging and splitting of virtual machines, multi-user awareness, portability of processes and data across multiple virtual machines, and a provisional framework for multi-user programs.

Specifically, the IceT resource model consists of multiple clusters of virtual environments belonging to multiple users, each with distinct levels of security and accessibility. The resulting collocation of resources comprises a multi-user, multi-level, time-shared virtual machine. This environment provides and facilitates interaction and collaboration amongst variations of users and processes.

Fundamental to the process and data model of IceT is the property of transportability. Consequently, native IceT processes and data are fluid entities which may be uploaded to remote locations and, in the case of processes, executed upon remote systems without regard for the remote architecture or file system structure.
For a more detailed description of the components of IceT, see [5]. Thus, by combining multiple users and their respective resource pools, and by allowing portability and uploadability of tasks and data, IceT provides a new genre for distributed computing in areas such as collaborative computing and significantly extends the possibilities for numerical computations.

In the context of processes, IceT advances the idea of "process spoking." The spoking model of distributed computing is a novel idea which incorporates a significant number of recently-developed tools and ideas and is best described through a comparison with the complementary "resource broker" model or with "remote evaluation" [7]. Conceptually, the spoking model views processes and data as persistent, fluid entities which are uploaded or ported amongst users and computational resources. Processes uploaded to remote hosts are able to exert independence, to manage the subsequent remote or local uploading of additional processes, and to maintain persistent communication with multiple external processes. In the area of collaborative computing, the computational resources are those of participating collaborators or are variations of dedicated resources.

[Figure 1 diagram. Left panel: "Processes and data are uploaded to resources", "Computational Resources". Right panel: "Requests for processing are sent to appropriate Resource Brokers", "Computational Resources".]

Figure 1: The Spoking model for distributed computing (left) views processes and data as portable entities to be distributed to the fixed computational resources. In contrast, the Resource Broker model (right) views service requests and subsequent responses as the portable entities.

In resource broker models such as CORBA [9] and Java's RMI [11], processes and data are bound to a particular resource (or resource pool) and requests are omni-directional in the sense that all requests of type "X" are directed to service brokers of type "Y" (Figure 1). In addition, communication is typically established upon completion of the task and is directed solely back to the originating host. In remote evaluation, a process is ported to a remote host for execution, but remote evaluation processes possess neither the persistent properties of dynamic, arbitrary communication nor the ability to supersede the existence of the spawning (master) process.

The subsequent portion of this document describes some of the novel features of IceT and the role played by the Java programming language. Details on what is currently implemented and the future of IceT are also presented, along with some Java-based examples which serve to further illustrate the novel features currently included in IceT.

2 Java's Benefits to IceT's Design

Buzzwords abound in the description of the Java programming language. Take, for example, the following description of Java from Sun's white paper on the Java Language Programming Environment [3]:

    . . . The Java Programming Language Environment provides a portable, interpreted, high performance, simple, object oriented programming language and supporting runtime environment. . . . . . Java must enable the development of secure, high performance, and highly robust applications on multiple platforms in heterogeneous, distributed networks. . . (and) must be architecture neutral, portable and dynamically adaptable.

While the depth to which these descriptive characteristics are realized in Java is debatable, many of these attributes can be cited as being beneficial to the IceT design to some degree. Java's object-oriented design, security protocols, multi-threaded nature, and architectural neutrality enabled rapid prototyping of the IceT design. In addition, the core of the language distribution includes a wide range of basic tools and packages, such as networking protocols and user interface components, which are fundamental aspects of the IceT substrate.

Admittedly, many of the enhanced and unique abilities of IceT are realized due to aspects of the Java programming language. By introducing portability of processes and imposing security measures readily available through the Java SecurityManager Class, a natural extension to the traditional paradigm of the distributed computing environment results. Specifically, incorporation of these Java attributes into IceT allows creation of facilities for a user to merge his/her resources, anonymously, with remote resources to which the user lacks normal access privileges. Upon these resources, aspects of Java directly support the ability for users to port processes (private or publicly owned) and maintain persistent message-passing communication.

The subsequent sections describe several specific components of Java as they relate to the IceT design.

2.1 Bytecode portability and the Java ClassLoader Class

One of the most utilized features of Java in Internet programming has been the portability of Java bytecode in the form of applets. While Java is not the first language to offer portability, one can assert that its popularity is in some part attributable to its building portability and security aspects into the language. Portability and remote access of Java bytecode are clearly driving forces behind Java's overwhelming popularity².

² In support of this claim, one need only look at the additional features incorporated into the base Java Development Kit v. 1.1: Remote Method Invocation, Object Serialization and the Java ARchive (JAR) specification, to name a few.

From the IceT perspective, the Java ClassLoader Class is the primary feature of Java which can be used to complement distributed computing. Through the Java ClassLoader, processes represented as Java bytecode can be uploaded, instantiated, and executed on remote hosts where the "owner" of the bytecode has no privileges whatsoever. Arguably, the portability of Java bytecode gives rise to the need for the Java SecurityManager Class and, on a much higher level, the Java Virtual Machine (JVM) itself. One could further argue that Java's omission of some key C/C++ features³ may have been done not only to simplify the language, but to maintain portability of the Java bytecode as well.

In IceT, portability of Java bytecode is achieved by extending the ClassLoader class to load, resolve, and instantiate new instances of classes located on remote hosts. Loosely speaking, this component of IceT is similar to a web browser which downloads applets from widespread resources across the Internet. A user wishing to upload his/her process to a remote IceT host would do so through an IceT ClassBootStrapper. The ClassBootStrapper is responsible for obtaining and resolving bytecode representations of class files to be executed, and is configurable to impose different levels of security restrictions on (external) users of the local computational resources through extensions to the Java SecurityManager class (described in the next section).
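The mechanism described above, extending ClassLoader so that bytecode arriving from elsewhere can be defined and loaded locally, can be illustrated with a minimal sketch. This is not IceT's ClassBootStrapper: the names UploadedClassLoader, upload, emptyClassFile and uploadAndLoad are hypothetical, and a synthetic, member-less class file built in memory stands in for bytecode that would really arrive over the network.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical loader (not IceT's): defines classes from bytes held in memory. */
final class UploadedClassLoader extends ClassLoader {
    private final Map<String, byte[]> uploaded = new ConcurrentHashMap<>();

    UploadedClassLoader(ClassLoader parent) {
        super(parent);
    }

    /** Stores bytecode as if it had just arrived from a remote host. */
    void upload(String className, byte[] classFile) {
        uploaded.put(className, classFile.clone());
    }

    /** Called by loadClass() only after the parent loader fails to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] classFile = uploaded.get(name);
        if (classFile == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, classFile, 0, classFile.length);
    }

    /** Assumption for the demo: builds a minimal class file (public class, no members). */
    static byte[] emptyClassFile(String className) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);             // magic number
            out.writeShort(0);                    // minor version
            out.writeShort(52);                   // major version (Java 8 class format)
            out.writeShort(5);                    // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);  // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(className.replace('.', '/'));  // #2 Utf8
            out.writeByte(7); out.writeShort(4);  // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");           // #4 Utf8
            out.writeShort(0x0021);               // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                    // this_class = #1
            out.writeShort(3);                    // super_class = #3
            out.writeShort(0);                    // no interfaces
            out.writeShort(0);                    // no fields
            out.writeShort(0);                    // no methods
            out.writeShort(0);                    // no attributes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Uploads a synthetic class to a fresh loader and loads it by name. */
    static Class<?> uploadAndLoad(String className) {
        UploadedClassLoader loader =
                new UploadedClassLoader(UploadedClassLoader.class.getClassLoader());
        loader.upload(className, emptyClassFile(className));
        try {
            return loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> task = uploadAndLoad("demo.RemoteTask");
        System.out.println(task.getName() + " defined by "
                + task.getClassLoader().getClass().getSimpleName());
    }
}
```

Because loadClass consults the parent loader first, only classes the host does not already know are taken from the uploaded bytes. A real system would additionally decide, per origin, what the loaded code is permitted to do.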

2.2 The Java SecurityManager

Allowing provisional usage of local resources by possibly anonymous users raises significant security issues. The Java SecurityManager class provides a quick and easy means for imposing elementary security restrictions. By extending the Java SecurityManager, SocketImpl, Thread and File classes, the local IceT environment can be configured to accommodate varying levels of restrictions placed upon workload usage, external and local socket connections, process portability and file system access.

2.3 Attributes of the Java Virtual Machine

Java is an interpreted language, which means that Java programs must be parsed in some fashion to be executed. "Compiled" Java programs exist in the form of Java bytecode, and a Java program is executed by running the bytecode through a Java bytecode interpreter. Java provides a least common denominator in the form of a uniform "Java Virtual Machine" (JVM) for managing the task of interpreting the bytecode. The JVM is responsible for verification and execution of the Java bytecode, conditional access to system-dependent attributes, and strong ordering of primitive data types.

Consequently, the standardized JVM provides a uniform substrate for IceT processes and data. Upon the JVM substrate, processes and data are entirely portable. This provides IceT with the mechanisms needed for stand-alone realization of the features described in the opening section. In addition, this allows IceT to serve as a utility for porting system-dependent processes and data to remote locations for anonymous (or governed) linkage/binding/execution and to serve as the mechanism through which users and processes maintain communication with and awareness of remote processes, data, resources or users.

Reiterating, the JVM provides Java bytecode with a uniform, system-independent view of the underlying architecture. With the guarantee of uniformity of the JVM, bytecode is truly architecturally independent. This aspect of the Java programming language allows platform-independent execution of the Java bytecode regardless of the platform on which the bytecode was generated. This allows any IceT user to execute any IceT process. Consequently, there is potential for IceT process warehouses, where users may access existing programs written for specific purposes⁴.
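One reading of the "strong ordering of primitive data types" mentioned above is that the size and byte order of Java primitives do not depend on the host. The sketch below illustrates that reading using the standard java.io.DataOutputStream, which writes multi-byte values high byte first on every platform. The class and method names (PortableEncoding, encode, decodeValue) are hypothetical and do not describe how IceT packs its messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper showing platform-independent encoding of primitives. */
final class PortableEncoding {
    /** Encodes an int followed by a double: 4 + 8 bytes, high byte first. */
    static byte[] encode(int count, double value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(count);
            out.writeDouble(value);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skips the leading int and reads back the double. */
    static double decodeValue(byte[] message) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
            in.readInt();
            return in.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] message = encode(0x01020304, 2.5);
        // The first four bytes are 1, 2, 3, 4 regardless of the host's native byte order.
        System.out.println(message.length + " bytes, first byte " + message[0]);
        System.out.println(decodeValue(message));
    }
}
```

A sender and a receiver on different architectures therefore agree on the layout of such a message without any conversion step.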

³ Pointers and direct memory allocation and deallocation, for example.
⁴ For example, if one required code for solving a large, sparse, diagonally-dominant linear system, such problem-specific code would possibly be available from a Linear Solvers IceT repository.

3 What Cost of Java?

The features associated with the Java programming language come at some cost. One significant penalty for using Java, and the one for which Java has received the majority of its criticism, is process execution speed. Because Java is interpreted, Java programs can run an order of magnitude slower than their C/C++/FORTRAN counterparts⁵. In the area of distributed computing, where speed and computational efficiency are common benchmarks, Java's lagging speed is its Achilles' heel. However, the constant and public criticism of Java's performance has led to the development of just-in-time Java compilers which are able to translate portions of the Java bytecode into executable machine-dependent code.

Another attribute of Java which comes at a cost is Java's dynamic memory management. Java's transparent memory allocation, deallocation, and garbage collection at runtime is a significant feature which allows programmers to concentrate on code content. However, constant allocation and deallocation of objects incurs a significant performance cost. This penalty shifts the programmer's emphasis toward identifying situations where temporary objects may be recycled rather than garbage-collected⁶.

Additionally, despite vast acceptance by the general public, Java is as yet not available for certain architectures. While the number of architectures running the Java Virtual Machine is growing steadily, these Java-less architectures are completely unavailable for use by IceT.
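The point about recycling temporary objects rather than leaving them to the garbage collector can be made concrete with a small sketch. The class ScratchReuse and its methods are hypothetical illustrations, not IceT code: one method allocates a temporary array on every call, while the other writes into a scratch array that the caller allocates once and reuses.

```java
/** Hypothetical illustration of recycling a temporary array across calls. */
final class ScratchReuse {
    /** Allocates a temporary array on every call; each one later becomes garbage. */
    static double distanceAllocating(double[] a, double[] b) {
        double[] diff = new double[a.length];  // fresh temporary per call
        return distanceInto(a, b, diff);
    }

    /** Same computation, but the caller supplies (and can reuse) the temporary. */
    static double distanceInto(double[] a, double[] b, double[] scratch) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            scratch[i] = a[i] - b[i];
            sum += scratch[i] * scratch[i];
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] points = { {3, 4}, {6, 8}, {0, 0} };
        double[] origin = {0, 0};
        double[] scratch = new double[origin.length];  // allocated once, reused below
        for (double[] p : points) {
            System.out.println(distanceInto(p, origin, scratch));
        }
    }
}
```

Whether the reuse pays off depends on the JVM and the allocation rate; the sketch only shows the shape of the change, not a measured benefit.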

4 IceT in Action

The objective of the IceT project has been to blend the current distributed computing framework with new and evolving technologies, including but not limited to the aspects and features found in Java. This section illustrates the suitability of IceT for traditional distributed computational tasks, in the form of a matrix multiplication example, and the novel attributes and collaborative features brought about by the enhanced portability of tasks, through the example of a just-in-time installation of an IceTChat Tool.

4.1 IceT and Traditional Distributed Computing

Consider a master program which is faced with the problem of using slave processes to multiply matrices A and B. One way of distributing this problem is to split A by rows and to pass each block of rows of A, together with the entire matrix B, to a slave process for subsequent multiplication. The following shows how one might address this problem (albeit naively) utilizing IceT and the Java programming language.
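Before turning to the IceT version, the row-wise decomposition itself can be sketched in plain Java with no IceT calls. In this sketch local threads stand in for slave processes, the class RowBlockMultiply and its methods are hypothetical, and giving any leftover rows to the last worker is an assumption made here for the case where the row count is not divisible by the number of workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical plain-Java stand-in for the master/slave decomposition. */
final class RowBlockMultiply {
    /** Multiplies rows [from, to) of a by all of b: the work one slave would do. */
    static double[][] multiplyRows(double[][] a, double[][] b, int from, int to) {
        int cols = b[0].length;
        double[][] part = new double[to - from][cols];
        for (int i = from; i < to; i++) {
            for (int k = 0; k < b.length; k++) {
                double aik = a[i][k];
                for (int j = 0; j < cols; j++) {
                    part[i - from][j] += aik * b[k][j];
                }
            }
        }
        return part;
    }

    /** Splits a by rows among the workers and reassembles the product a*b. */
    static double[][] multiply(double[][] a, double[][] b, int workers) {
        int rows = a.length;
        int block = rows / workers;  // rows per worker, as in SIZE/NUM_WORKERS
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<double[][]>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = w * block;
                // Assumption: the last worker also takes any leftover rows.
                final int to = (w == workers - 1) ? rows : from + block;
                parts.add(pool.submit(() -> multiplyRows(a, b, from, to)));
            }
            double[][] c = new double[rows][];
            int row = 0;
            for (Future<double[][]> part : parts) {
                for (double[] r : part.get()) {
                    c[row++] = r;  // blocks come back in submission order
                }
            }
            return c;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] a = { {1, 2}, {3, 4} };
        double[][] b = { {5, 6}, {7, 8} };
        double[][] c = multiply(a, b, 2);
        System.out.println(c[0][0] + " " + c[0][1] + " / " + c[1][0] + " " + c[1][1]);
    }
}
```

Each worker needs only its own rows of A but all of B, which is why the text sends the entire matrix B to every slave.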

The Master Program

The first task is to make the master program IceT-aware; this is done by importing the IceT-specific commands. In order to interact with IceT components, the program takes the form of an extension to the TaskProtocols class, which defines various IceT functions such as send, recv, etc. In this example, the master and four slaves will be responsible for the distributed multiplication.

    import IceT.*;
    import java.util.*;

    public class MatrixMultiply extends TaskProtocols {
        final static int NUM_SLAVES = 4;
        final static int NUM_WORKERS = NUM_SLAVES+1;

The program initializes its IceT environment and spawns NUM_SLAVES copies of the Slave program (described subsequently).

⁵ For some preliminary comparisons between native IceT programs and their equivalent C/PVM counterparts, see [4].
⁶ Which is, incidentally, consistent with the philosophy behind the Java programming language.

    try {
        TaskElement myTaskElement = IceT();
        TaskElement[] taskArray = new TaskElement[NUM_SLAVES];
        int numSpawned = spawn("Slave", NUM_SLAVES, taskArray);

Problem parameters, the slave's portion of A, and the entire matrix B are packed and sent to the respective processes.

    Buffer bufferB, bufferA;
    int block = (int) SIZE/NUM_WORKERS;
    /* Pack the entire matrix b */
    for (int i = 0; i < SIZE; i++)
        bufferB.pack( B[i] );
    for (int k =0; k