ISCN: Towards a Distributed Scientific Computing Environment

Longsong Lin, National Yunlin Institute of Technology, 123 University Road, Sec. 3, Toliu, Yunlin, Taiwan, ROC, [email protected]

Mark J. Johnson, Swiss Center for Scientific Computing, Via Cantonale, CH-6928 Manno, Switzerland, [email protected]

Karsten M. Decker, Swiss Center for Scientific Computing, Via Cantonale, CH-6928 Manno, Switzerland, [email protected]

Christophe Domain, Electricité de France, 1 av. du General de Gaulle, F-92141 Clamart, France, [email protected]

Yves Souffez, Electricité de France, 1 av. du General de Gaulle, F-92141 Clamart, France, [email protected]

Abstract

Based on the vision that the most important component of the next generation of scientific computing environments is not the High-Performance Computer (HPC) itself, but rather a distributed computing infrastructure of national and/or even global scale, the project Interactive Scientific Computing over Networks (ISCN) aims to conduct feasibility studies and to incrementally prototype a distributed object-oriented framework for scientific computing applications on distributed HPC systems. In the initial stage of the ISCN project, we have implemented a client-server framework supporting simple interactive selection of different remote HPC servers, configurations, and batch queues; interactive access to the running application, even when submitted to batch queues; interactive supervision/steering of the application and immediate visualization of results; and an interactive mechanism to manipulate output data visualization. The communication infrastructure used is the CORBA-based ILU software. The Java language is used to build a portable client that is executed on the scientist's desktop workstation or PC. As remote HPC servers we have used the NEC SX-4 and a Sun SPARCserver 1000. The scientific application selected to demonstrate our framework is the classical molecular dynamics application package DYMOKA, written in FORTRAN. From the desktop, the user can select any available machine to run DYMOKA and choose to launch it on demand or by submission to different job queues, using an interactive panel of the client. The user can interactively change the parameters in the input command file of DYMOKA and visualize intermediate results using the Java applet while DYMOKA is still running on the HPC server.

Appears in Proc. 3rd Int'l Conf. on High-Performance Computing in the Asia/Pacific Region (HPC-Asia'97, April 28 - May 2, Seoul, Korea), pp. 157-162, IEEE Computer Society Press, April 1997. [ISBN: 0-8186-7901-8]

1. Introduction

A world of totally integrated Information Technology (IT), where everyone and everything plays a part in a global network of seamless information exchange, is coming. This world is so appealing to today's IT users that it will reach almost every area of computing, including scientific computing. Scientific computing is a long way behind business computing, not only in terms of user convenience, but also with respect to economic utilization of High-Performance Computing (HPC) resources, because these issues have not been the main priority in the past. It is the goal of the project Interactive Scientific Computing over Networks (ISCN) to prototype the next generation of scientific computing systems, where the basic tool is not the HPC machine, but a distributed computing infrastructure (hardware and software) of national and/or even global scale.

Our objectives for developing such an environment are: to provide transparently networked applications that make specialized resources widely available on the desktop in a portable, transparent and application-driven way; to ease the usage of specialized resources (e.g., interactive remote distributed execution control and application supervision, interactive, over-the-network visualization); to ease application development (e.g., reuse of scientific code, interactive, over-the-network visualization of program behavior, and debugging and performance optimization); to support heterogeneous hardware platforms; and to improve the utilization of specialized resources [7]. Different aspects of such a metacomputing environment [3], such as network services, communication, load-sharing, scheduling, authentication, security and data access, are currently also being investigated in other projects, for instance [2, 5, 8]. The methods of our research comprise the exploitation of novel technologies proven in the field of business computing, feasibility studies, and the development of demonstrator applications for evaluation by scientific users.

2. The ISCN Architecture

2.1. Overview

The ISCN system is based on a client-server architecture and distributed objects, as depicted in Figure 1. An ISCN application consists of a client that runs on the user's desktop machine, and one or more server objects that run on local or remote HPC machines. The server objects perform the expensive scientific computations and may also perform various stages of post-processing, depending on the computational demands of the visualization and the capabilities of the desktop machine. The client performs control, coordination, and visualization, and provides an interactive interface to the user.

The larger circles in Figure 1 depict the server objects. These run on HPC platforms and offer computation services to the users. The services provided will be equivalent in scope to the facilities provided today by a typical scientific package. In fact, the computation code inside a server will typically be existing code, for instance written in FORTRAN or C, previously available as an application package or library. The existing code is embedded in the server by writing a small amount of supporting code to map the application's native Application Programmer Interface (API) to an ISCN computation server interface. Reusable support code will be available as a toolkit to ease the creation of the servers. Many servers will provide services that are widely applicable within their domain, e.g., fluid dynamics, and will therefore be useful to a large user community. These servers will be duplicated on several different HPC machines. Other servers will be based on customized code written for specialized applications, and may only be available on one machine.

The servers are accessible via interfaces specified in a standardized interface specification language, for instance the CORBA Interface Definition Language IDL [9]. It is easy to build new applications that reuse existing services because the code is fully encapsulated and accessed via a clearly specified interface. Furthermore, it matters neither what hardware the server is running on nor what language it is written in: these parameters have no effect on other software that uses the servers, as servers are seen purely as objects that can be communicated with via message passing. Any data conversion necessary is carried out by the distributed objects middleware layer that is part of the ISCN software infrastructure. General-purpose servers have well-known interfaces that are relatively stable, and are developed incrementally while maintaining backwards compatibility with old versions of the interface as far as possible. The development of the interfaces to specialized servers would probably be closely coupled with the development of the clients that access them.

There are two possible life cycles for server objects. They could be created on demand by an object management system (such as a CORBA factory object) and destroyed at the end of the user's session. Alternatively, they could be long-lived objects that can be used multiple times in succession by different users, and remain alive all the time, whether they are active (computing) or sleeping (idle). The life cycle used would be whatever is most convenient for integrating existing scientific code into the ISCN server, or alternatively, whatever is most convenient to coexist with the system management practices of the HPC platform provider.
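The embedding of existing code behind a server interface, as described in this section, might look roughly as follows in Java. This is a minimal sketch under stated assumptions, not ISCN code: the interface `ComputationServer`, the classes `LegacySolver` and `LegacySolverServer`, and the `step <scale>` command are all hypothetical, and `LegacySolver` merely stands in for a FORTRAN or C package that would in practice be reached through a native binding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface; in ISCN this role is played by an IDL specification.
interface ComputationServer {
    void submit(String command);     // pass one command to the wrapped code
    List<double[]> fetchResults();   // return and clear the results produced so far
}

// Stand-in for an existing package with its own native API (assumption for illustration).
class LegacySolver {
    double[] step(double scale) {
        return new double[] {1.0 * scale, 2.0 * scale, 3.0 * scale};
    }
}

// The small amount of supporting code that maps the native API onto the server interface.
class LegacySolverServer implements ComputationServer {
    private final LegacySolver solver = new LegacySolver();
    private final List<double[]> pending = new ArrayList<>();

    @Override
    public void submit(String command) {
        // Only one hypothetical command is understood: "step <scale>".
        String[] parts = command.trim().split("\\s+");
        if (parts.length == 2 && parts[0].equals("step")) {
            pending.add(solver.step(Double.parseDouble(parts[1])));
        } else {
            throw new IllegalArgumentException("unknown command: " + command);
        }
    }

    @Override
    public List<double[]> fetchResults() {
        List<double[]> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        ComputationServer server = new LegacySolverServer();
        server.submit("step 2.0");
        System.out.println(server.fetchResults().size() + " result(s) available");
    }
}
```

Client code depends only on `ComputationServer`, so a different implementation of the same interface could be substituted without changing the client, which is the reuse property the section argues for.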
It is critical for both types of objects that they should only have significant resources allocated on the HPC system (CPU time and primary memory) when they really need them for computation. This resource usage is supported in the same way as for any program in a multiuser system: by blocking on waits for messages to arrive when no computation is being performed, and by using dynamic memory allocation.

The servers are accessed via clients over a local, national, or possibly international network. On the left hand side of Figure 1 we see two clients, one for an industrial user and one for an academic user. The client code follows the distributed objects programming paradigm. This means that distribution issues are largely hidden at the source code level. Local objects, running inside the client on the user's desktop machine, can communicate with the remote server objects via object proxies. These proxies are local objects that provide the full interface of a remote object, and can receive messages in the usual way from other objects inside the client. When a proxy receives a message, it transparently sends that message over the network to the remote object, which then acts upon the message and passes back any return data as necessary. This means that when the client program is written, it is almost as easy to use remote objects as it is to use normal local objects.

[Figure 1 diagram. Labels: Transparent Use of Heterogeneous Hardware; Transparent Use of Local and/or Remote Services; Portable Interactive GUI on Desktop PC or Workstation; Computation Server Objects Wrapping Existing Scientific Code; Distributed Objects Architecture for Easy Client Implementation; Networked National Computing Infrastructure. Server objects shown: FLUID DYN (three instances), QUANT CHEM, MOLEC DYN; one machine is labeled SX-4.]

Figure 1. The ISCN Architecture.

In the envisaged client-server environment, the work which requires high-performance computing capacities is performed on the HPC server. Therefore there is no need to tune the client to a particular architecture, and some efficiency of the client software can be sacrificed in favor of portability and rapid application development. In the ideal case, the client will be written in a fully portable language with a portable GUI API (e.g., Java, Tcl/Tk), so that it can be used across all popular desktop machines, ranging from workstations with a UNIX and X-windows environment to PCs. Rapid application development is possible by building clients from reusable software components implemented using a component object technology such as COM, OpenDoc, or Java Beans.
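The proxy mechanism described in this section can be illustrated with Java's standard dynamic proxies. The sketch below is only an analogy under stated assumptions: the names `Simulation`, `RemoteSimulation`, `ForwardingHandler` and `ProxyDemo` are hypothetical, and the handler invokes the target object directly, whereas a real middleware proxy (such as one generated by ILU) would marshal the call and send it over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical remote interface (not taken from the ISCN sources).
interface Simulation {
    int advance(int steps);
}

// Plays the role of the server object on the HPC machine.
class RemoteSimulation implements Simulation {
    private int time = 0;

    @Override
    public int advance(int steps) {
        time += steps;
        return time;
    }
}

// Forwards every call to the target. A real proxy would marshal the call
// and transmit it over the network instead of invoking the target directly.
class ForwardingHandler implements InvocationHandler {
    private final Object target;
    int forwardedCalls = 0;

    ForwardingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        forwardedCalls++;
        return method.invoke(target, args);
    }
}

class ProxyDemo {
    // Builds a local object that offers the full Simulation interface.
    static Simulation proxyFor(ForwardingHandler handler) {
        return (Simulation) Proxy.newProxyInstance(
                Simulation.class.getClassLoader(),
                new Class<?>[] {Simulation.class},
                handler);
    }

    public static void main(String[] args) {
        ForwardingHandler handler = new ForwardingHandler(new RemoteSimulation());
        Simulation sim = proxyFor(handler);
        // The client calls the proxy exactly as it would call a local object.
        System.out.println(sim.advance(5));
        System.out.println(sim.advance(3));
    }
}
```

The client code in `main` never refers to `RemoteSimulation`, which mirrors the claim that distribution issues are largely hidden at the source code level.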

2.2. Usage

In Figure 1 we can also see the architecture of two typical ISCN applications. In the top half of the diagram, we see an industrial user who is working with a general-purpose fluid dynamics application. He may have a general-purpose client for performing this work, or may have a customized client built from reusable components, for example one for 3D flow visualization and another for 3D object modeling. His application uses only one computation server, a popular general-purpose fluid dynamics simulation system such as FLUENT or Harwell FLOW3D. This is an example of a coarse-grain application, because the computational work is coarsely distributed, i.e., all on a single server.

The service the industrial user needs is duplicated in different places on different architectures, including a machine on his local network (top left). These duplicates of the FLUID DYN service may in fact be different implementations tailored to specific architectures, or different installations of the same fluid dynamics package, but they all have the same interface (syntax and semantics), so that the client is able to use any of them without modification. ISCN applications will be integrated with monitoring, configuration, load balancing, and accounting systems that allow the national computing infrastructure to be utilized seamlessly (these aspects are not depicted in Figure 1). The industrial user in the figure needs to use only one server at a time. Depending on the required performance, the price he wants to pay, the available network connection speed, the policies of service providers (e.g., access, accounting, security, utilization), etc., he will automatically be connected to one of the three FLUID DYN servers.

In the bottom half of Figure 1 we see an academic user of a more sophisticated ISCN application that is built with two different servers running on different machines. This is an example of a medium-grain application, because the computational work is distributed over a small number of servers. If one of the servers were a parallel system, the corresponding part of the application would need to be parallelized.
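The automatic choice among duplicated servers could, for example, be driven by a simple rule such as "cheapest server that is fast enough and well enough connected". The following Java sketch illustrates that idea only; the paper does not specify a selection policy, and the class names, fields and thresholds below are all assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical description of one duplicate of a service; all fields are assumptions.
class ServerOffer {
    final String host;
    final double gflops;        // advertised performance
    final double pricePerHour;  // accounting rate
    final double linkMbps;      // network speed between client and server

    ServerOffer(String host, double gflops, double pricePerHour, double linkMbps) {
        this.host = host;
        this.gflops = gflops;
        this.pricePerHour = pricePerHour;
        this.linkMbps = linkMbps;
    }
}

class ServerSelector {
    // Returns the cheapest offer that meets the minimum performance and link speed,
    // or an empty Optional if no offer qualifies.
    static Optional<ServerOffer> select(List<ServerOffer> offers,
                                        double minGflops, double minMbps) {
        return offers.stream()
                .filter(o -> o.gflops >= minGflops && o.linkMbps >= minMbps)
                .min(Comparator.comparingDouble((ServerOffer o) -> o.pricePerHour));
    }
}
```

Because all duplicates share one interface, the client can use whichever offer is returned without modification; provider policies such as access or security rules would add further filters in the same place.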

3. The ISCN Demonstrator System

To demonstrate the practical capabilities of the proposed ISCN architecture to scientific users, we have realized a first prototype implementation of the system, specifically supporting the industrial application package DYMOKA.

3.1. Demonstrator Application

DYMOKA is a classical molecular dynamics application software package based on the CDCMD application developed at the University of Connecticut. With the help of DYMOKA, different kinds of simulations can be performed, such as adiabatic, constant pressure, and isothermal molecular dynamics simulations. DYMOKA is written in FORTRAN 77. Besides a sequential version, vector and parallel versions have been developed [1]. Parallelization has been achieved by explicit message passing using the standard Message-Passing Interface (MPI), with emphasis on portability and scalability. Currently, DYMOKA's vector version runs on Cray C90, NEC SX-3 and SX-4 machines, while the parallel version runs on NEC Cenju-3, Cray T3D, NEC SX-4 and Sun SPARCserver 1000 parallel processor systems.

DYMOKA is used in the field of applied material science and is in production use at Electricité de France (EDF) to simulate, at the atomic scale, the radiation damage caused in alloys by high-energy neutrons. Because of its advanced algorithmic features, systems with up to several million atoms can be simulated. To specify the simulation, DYMOKA uses an input script language which is executed step-by-step by the application. The main output consists of a stream of 3-dimensional coordinates used to visualize the crystallographic defects created by the radiation in the initial crystal.

DYMOKA qualifies in several respects to demonstrate our approach under realistic conditions. It is a real application of practical value to scientists. It has been ported and validated on several different HPC platforms. Therefore, to demonstrate our approach on heterogeneous hardware, no additional application software development effort is required. The script-oriented application control used in DYMOKA allows easy encapsulation of the application into an object, thus demonstrating software reuse. Finally, the relatively low volume of graphical output makes it possible to demonstrate immediate visualization of results while keeping low the requirements on network bandwidth and on advanced data transfer techniques such as compression.

3.2. Technologies and System Architecture

The communication middleware is realized with the Inter-Language Unification (ILU) system [6], which follows the CORBA standard. ILU supports transparent communication across heterogeneous hardware platforms and between different programming languages via its distributed objects architecture. Java is used to build a portable interactive client that runs on the scientist's desktop workstation or PC and connects to the scientific application running on an HPC platform. The Java client is integrated directly into the ILU platform with Jylu [4], the ILU binding for Java.

In the current prototype implementation, the user can choose from the desktop client either a Sun SPARCserver 1000 or different NQS job queues on a NEC SX-4 as remote HPC servers. After selection of the HPC server and appropriate account accreditation, the user launches the preinstalled DYMOKA application either for interactive execution or by submission to different NQS job queues, using an interactive panel of the client. Objects that perform intensive computations are placed on a remote high-performance compute server, and the results are sent back to the desktop client, where the user interacts with the objects to exercise execution control of the scientific application, to interactively change the parameters in the input command file of DYMOKA, and to visualize intermediate results using the Java applet while DYMOKA is still running on the HPC server.

The architecture of the ISCN demonstrator application is shown in Figure 2. The operation is as follows:

1. Job servers are installed and started on the HPC servers.
2. The job servers perform a trading export operation, sending their object identifiers to the trader.
3. The user starts up a Java-enabled World-Wide Web browser (e.g., Netscape, HotJava) on his desktop machine (workstation or PC).
4. The user goes to the ISCN DYMOKA interface page on the World-Wide Web, and the web browser dynamically downloads the latest version of the ISCN DYMOKA Java applet (which provides the interactive GUI) from the web server.
5. The user selects an HPC machine to use from a pulldown list of well-known names, and the client performs a trading import operation to accredit and fetch the object identifier of a server running on the desired machine. The client creates a proxy object for the server inside the Java runtime environment, and tests the connection to the real server object.
6. The user sets up the simulation script and, when ready to run, presses the "launch" button. This causes the client to request the server to start the DYMOKA application.
7. The server launches the DYMOKA application, either on demand or via an NQS queue (which may be a prioritized queue).
8. When the DYMOKA application is ready, the client sends the command script to the server, which passes it directly to the DYMOKA application. DYMOKA then starts the simulation and starts writing results into the output pipes opened by the server.
9. The client requests results data from the server for immediate visualization, while DYMOKA continues the simulation.

[Figure 2 diagram. Components: ISCN TRADER; ISCN WEB SERVER; WWW BROWSER containing the DYMOKA JAVA APPLET; DYMOKA JOBSERVER on the SPARC 1000E; DYMOKA JOBSERVER and DYMOKA APPLICATION on the SX-4. The numbers 1 to 9 in the diagram mark the operation steps listed above.]

Figure 2. Architecture of the ISCN Demonstrator Application: ISCN DYMOKA.
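The trading and job-control steps above can be sketched in a few lines of Java. This is an in-process illustration under stated assumptions, not the ILU-based implementation: `JobServer`, `EchoJobServer` and `Trader` are hypothetical names, there is no network or NQS queue involved, and the stand-in server simply echoes script lines instead of running DYMOKA.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical job server interface; the real interface is not given in the paper.
interface JobServer {
    void launch();                    // steps 6-7: start the application
    void sendScript(String script);   // step 8: pass the command script
    List<String> fetchResults();      // step 9: collect results produced so far
}

// Stand-in for a job server wrapping the application; it only echoes script lines.
class EchoJobServer implements JobServer {
    private boolean running = false;
    private final Queue<String> output = new ArrayDeque<>();

    @Override
    public void launch() {
        running = true;
    }

    @Override
    public void sendScript(String script) {
        if (!running) {
            throw new IllegalStateException("application not launched");
        }
        for (String line : script.split("\n")) {
            output.add("result of: " + line);
        }
    }

    @Override
    public List<String> fetchResults() {
        List<String> batch = new ArrayList<>(output);
        output.clear();
        return batch;
    }
}

// Minimal trader: maps well-known machine names to server references.
class Trader {
    private final Map<String, JobServer> offers = new HashMap<>();

    // Step 2: a job server registers itself under its machine name.
    void export(String machine, JobServer server) {
        offers.put(machine, server);
    }

    // Step 5: the client looks up a server for the selected machine.
    JobServer importOffer(String machine) {
        JobServer server = offers.get(machine);
        if (server == null) {
            throw new IllegalArgumentException("no server on " + machine);
        }
        return server;
    }
}

class TraderDemo {
    public static void main(String[] args) {
        Trader trader = new Trader();
        trader.export("SX-4", new EchoJobServer());
        JobServer server = trader.importOffer("SX-4");
        server.launch();
        server.sendScript("first command\nsecond command");
        System.out.println(server.fetchResults());
    }
}
```

In the real system the reference returned by the import operation would be a proxy for a remote object, and `fetchResults` would be called repeatedly while the simulation is still producing output.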

3.3. Experience

The functionality most notable to end users was the high degree of interactivity offered by the ISCN prototype. Simple interactive selection of different target systems, configurations, and batch queues, and interactive access to the running application, even when submitted to a batch queue, were considered major assets. The interactive simulation control, step-by-step execution of the application, the immediate visualization of simulation results, and the interactive mechanism to manipulate output data visualization (e.g., stop/go, step-by-step execution, forward/backward replay of critical sequences of data) were considered a major help in understanding the simulation. While the main application of such an environment was envisaged to be the testing of new simulation scenarios, it was also considered useful for supervision of production runs.
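Forward and backward replay of received data can be supported on the client by buffering frames and moving a cursor over them. The sketch below shows one possible way to do this; the class `ReplayBuffer` and its methods are hypothetical and are not taken from the ISCN applet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side buffer of received frames (each frame holds coordinates).
class ReplayBuffer {
    private final List<double[]> frames = new ArrayList<>();
    private int cursor = -1;  // index of the frame currently displayed, -1 if none

    // Called whenever new results arrive from the server.
    void append(double[] frame) {
        frames.add(frame);
    }

    // Moves one frame forward if a later frame exists; returns the current index.
    int stepForward() {
        if (cursor < frames.size() - 1) {
            cursor++;
        }
        return cursor;
    }

    // Moves one frame backward unless already at the first frame; returns the current index.
    int stepBackward() {
        if (cursor > 0) {
            cursor--;
        }
        return cursor;
    }

    // Returns the frame under the cursor, or null if nothing has been displayed yet.
    double[] current() {
        return cursor >= 0 ? frames.get(cursor) : null;
    }
}
```

Stop/go behavior then amounts to calling `stepForward` from a timer while in "go" mode and not calling it while stopped; new frames can keep arriving through `append` in either mode.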

4. Conclusions

With the simple ISCN demonstrator application described in this paper we have shown that a seamless integration of specialized and heterogeneous HPC resources into a distributed computing infrastructure is feasible, and that technologies proven in the field of business computing can be successfully used for scientific computing: object-oriented programming to support software reuse, CORBA for transparent distributed objects communication in a client-server environment, and Java to provide a portable interactive graphical user interface. Demonstration of the prototype environment to scientists working on leading-edge problems in scientific computing has shown that even such a relatively simple environment is very appealing, and that it could both make daily work more effective and contribute to the economics of high-performance computing.

Several important issues of distributed computing environments were not addressed with our demonstrator system and will be the subject of future work. We plan to develop additional portable and reusable objects, together with basic interfaces to facilitate application development by means of scientific program code encapsulation. Another effort will address the support of flexible real-time visualization capabilities for scientific applications with different degrees of interactivity, and support for performance tuning. Finally, security problems encountered in implementing the prototype framework, related to the Java browser's network security, need to be overcome such that client-server computing is secure but remains sufficiently flexible.

The long-term vision is an environment that facilitates transparent and effective composition of distributed, reusable computing objects into highly responsive client-server distributed applications equipped with real-time visualization. Such an environment will allow scientists to carry out efficient, high-performance computing via continuous interactions with geographically distributed compute servers from their desktop workstations or PCs.

5. Acknowledgment

We would like to thank Brian Wylie, Vaibhav Deshpande, Frank Zimmermann and James Brunson for their valuable insight and support.

References

[1] C. S. Becquart, K. M. Decker, C. Domain, J. Ruste, Y. Souffez, J. C. Turbatte, and J. C. Van Duysen. Massively Parallel Molecular Dynamics Simulations for Metal Irradiation. In Proceedings of the 3rd International Conference on Computer Simulation of Radiation Effects in Solids (COSIRES, Surrey, United Kingdom, July 22-26, 1996), 1996. To appear in J. Radiation Effects and Defects in Solids.

[2] H. Casanova and J. J. Dongarra. NetSolve: A Network Server for Solving Computational Science Problems. In Proceedings of the International Conference on High Performance Computing and Communications (Supercomputing'96, Pittsburgh, PA, USA). ACM/IEEE, Nov. 1996. [ISBN: 0-89791-854-1] Also technical report UTK-CS-96-328.

[3] C. Catlett and L. Smarr. Metacomputing. Communications of the ACM, 35:44-52, 1992.

[4] Department of Computer Science, Stanford University, Stanford, CA. ILU for JAVA, 1996. http://coho.stanford.edu/~hassan/Java/Jylu/.

[5] I. Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, 1996. To appear; available from http://www.globus.org.

[6] B. Janssen and M. Spreitzer. ILU 2.0alpha8 Reference Manual. Xerox Corp., May 1996.

[7] L. Lin, K. M. Decker, M. J. Johnson, and C. Domain. Interactive Scientific Computing over Networks. Technical report, Centro Svizzero di Calcolo Scientifico, CH-6928 Manno, Switzerland, 1996.

[8] M. Litzkow, M. Livny, and M. W. Mutka. Condor - A Hunter of Idle Workstations. In Proceedings of the 8th International Conference of Distributed Computing Systems, pages 104-111, 1988.

[9] OMG. The Common Object Request Broker: Architecture and Specification. Technical report, The Object Management Group, Framingham, MA 01701, USA, July 1995. Revision 2.0.
