Proc. HPCN Europe ’98, Amsterdam, 21-23 April 1998, Tech Note DHPC-020.

DISCWorld: A Distributed High Performance Computing Environment

K.A.Hawick∗, H.A.James, C.J.Patten, F.A.Vaughan
Department of Computer Science, University of Adelaide, SA 5005, Australia
November 1997

Abstract

An increasing number of science and engineering applications require distributed and parallel computing resources to satisfy user response-time requirements. Distributed science and engineering applications require a high performance “middleware” which will both allow the embedding of legacy applications as well as enable new distributed programs, and which allows the best use of existing and specialised (parallel) computing resources. We are developing a distributed information systems control environment which will meet the needs of a middleware for scientific applications. We describe our DISCWorld system and some of its key attributes. A critical attribute is architecture scalability. We discuss DISCWorld in the context of some existing middleware systems such as CORBA and other distributed computing research systems such as Legion and Globus. Our approach is to embed applications in the middleware as services, which can be chained together. User interfaces are provided in the form of Java Applets downloadable across the World Wide Web. These form a gateway for user-requests to be transmitted into a semi-opaque “cloud” of high-performance resources for distributed execution.

∗ Author for correspondence. Email: [email protected], Fax: +61 8 8303 4366, Tel: +61 8 8303 4519.

Introduction

Distributed computing for scientific and engineering applications presents higher performance requirements than are generally found in many other distributed computing applications. Decision support applications involving computationally intensive simulations or access to very large datasets often need parallel and high performance computing resources. Integrating software for such systems across distributed networks in an easily used manner is a challenging problem. A solution to the distributed computing software integration problem that is becoming widespread for commercial and business applications is to use a middleware system. The middleware insulates applications software and applications developers from the idiosyncrasies of the network technologies.

There have been a number of attempts at middleware systems such as the Common Object Request Broker Architecture (CORBA) [1] and Distributed Computing Environment (DCE) [14]. These systems have a growing following and have indeed proved capable of supporting reliable distributed computing applications. Current systems do not, however, adequately address the science and engineering applications market in which computational performance is critical. Existing systems also fail to scale well over very wide area networks and across institutional resource ownership boundaries.

In this paper we describe a way of enhancing existing middleware in a sophisticated higher level system that is both scalable and capable of supporting high performance applications and parallel computing hardware in a readily deployable fashion. In particular we describe the coarse grained architecture of our system, the software technologies we are using for prototype implementations and some of the key research issues we have identified for further work in developing the system.

Our Distributed Information Systems Control (DISCWorld) system forms a serverless architecture for a “smart middleware”, providing a software glue for supporting legacy applications by encapsulating them as explicitly named services with well defined interfaces and actions. This approach allows parallel computers running particular applications to provide accelerators for those services, with no special requirement for the end user to be aware that the server is a parallel computer. This high-level architecture is shown in figure 1.

[Figure 1 diagram. Labels: End User, WWW Browser, WWW Server, Communications Infrastructure, DISCWorld Services Broker/Agent, Storage Services Provider, Compute Services Provider.]

Figure 1: DISCWorld architecture of WWW accessible high-performance resources.

We have identified a number of key attributes for a distributed, high-performance computing system which are described more fully in [15]. Foremost of these are scalability, performance, and robustness. It is important for our system to be able to recover from nodes becoming temporarily unavailable due to the inevitable problems that occur over a country-wide network.

Our motivating reasons for developing such a system arise from the computational resources available across Australia, which we believe are representative of likely world-wide distributed computing infrastructures. Computations over wide area networks suffer from latency limits imposed by the speed of light [6]. This is an inherent limitation for the large distances across Australia or across international distances. One way to amortise the effect of latency is for a distributed computing system to exchange only high level service requests across long distances, so that communicating servers can carry out as much computation as possible before transmitting a response. This principle of maximising “action at a distance” is a key factor in the design of DISCWorld.

The applications our system is aimed at are those science and engineering applications where a computationally intensive simulation must be run in near interactive time, drawing data from very large data archives, and perhaps fusing data from multiple sources across a wide area. The end result may be a data reduction operation, perhaps providing decision support to the end user. Examples include Geographic Information Systems (GIS) [7], Spatial Information Systems [8], and Remote Sensing [5].
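The latency argument made earlier in this section can be illustrated with a rough cost model. The following Java sketch is illustrative only: the class name `LatencyModel`, the round-trip time, the compute time, and the request counts are all assumed values, not measurements from DISCWorld. It compares many fine-grained remote calls against a single high level service request that performs the same total computation.

```java
// Illustrative cost model only; every number passed in is an assumption.
final class LatencyModel {
    private LatencyModel() {}

    /**
     * Elapsed time in milliseconds when a job is issued as `requests`
     * sequential remote requests, each paying one network round trip,
     * plus the compute time for the whole job.
     */
    static double totalMillis(int requests, double roundTripMillis, double computeMillis) {
        return requests * roundTripMillis + computeMillis;
    }

    public static void main(String[] args) {
        double roundTrip = 50.0;   // assumed wide-area round trip time
        double compute = 2000.0;   // assumed total compute time for the job
        // Fine-grained: 1000 small remote calls. Coarse-grained: one service request.
        System.out.println("fine-grained   = " + totalMillis(1000, roundTrip, compute) + " ms");
        System.out.println("coarse-grained = " + totalMillis(1, roundTrip, compute) + " ms");
    }
}
```

Under these assumed figures the fine-grained version is dominated by round trips, while the coarse-grained version is dominated by useful computation, which is the effect the high level service request is meant to achieve.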

The DISCWorld System

The coarse grained architecture of the DISCWorld system is shown in figure 1. A key feature is the distinction between high performance server resources inside the “cloud” and low performance resources such as conventional WWW clients outside. Another important aspect is the symmetric relationship between the service providers or server nodes inside the cloud. These are all in principle capable of dealing with a user service request and brokering the sequence of distributed services that executes across the high-performance resources. Key technologies for making this architecture possible are the Java object-oriented programming language and in particular its remote method invocation (RMI) mechanism, which allows remote computing requests to be made by one Java virtual machine to another running on a different host. Our current DISCWorld prototypes work in tandem with WWW servers but this is a software development convenience and not a necessity. Figure 2 shows how Java Applets can be embedded on WWW pages to provide a user interface gateway, communicating with a front end for invoking a complex sequence of computations and data archive manipulations. The Java Applications that run on nodes and interoperate with WWW servers can ultimately be replaced with multi-threaded daemon programs that operate at a peer-to-peer level.

To be scalable over wide areas and over resources owned by different institutions, we believe it is vital that a serverless or peer-to-peer architecture be adopted. A fundamental design feature of our system is that each node has equal status in terms of the core software running on it. There is no intrinsic hierarchy between nodes. Nodes do however have specialist service capabilities, and for a particular job or sequence of services a temporary hierarchical relationship may exist between participating nodes.
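The symmetry between nodes can be sketched as a single remote interface that every node implements. The Java fragment below is a minimal sketch under stated assumptions: the names `PeerNode`, `LocalPeer` and `PeerDemo`, the method signatures, and the string-based requests are hypothetical and are not taken from the DISCWorld code. Exporting the objects over RMI is deliberately omitted so that the sketch runs inside one Java virtual machine.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical remote interface: every node exposes the same operations.
interface PeerNode extends Remote {
    Set<String> serviceNames() throws RemoteException;
    String submit(String serviceName, String argument) throws RemoteException;
}

// Hypothetical node. In a real deployment it would be exported, for example
// with java.rmi.server.UnicastRemoteObject.exportObject; that step is omitted.
final class LocalPeer implements PeerNode {
    private final String host;
    private final Set<String> services;

    LocalPeer(String host, Set<String> services) {
        this.host = host;
        this.services = Collections.unmodifiableSet(new HashSet<>(services));
    }

    @Override public Set<String> serviceNames() { return services; }

    @Override public String submit(String serviceName, String argument) throws RemoteException {
        if (!services.contains(serviceName)) {
            throw new RemoteException(host + " does not offer " + serviceName);
        }
        return serviceName + "(" + argument + ")@" + host;
    }
}

final class PeerDemo {
    private PeerDemo() {}

    // Any node can broker: the first peer in the list that offers the
    // service runs it, forming a temporary hierarchy for this one request.
    static String broker(List<PeerNode> peers, String serviceName, String argument) {
        try {
            for (PeerNode peer : peers) {
                if (peer.serviceNames().contains(serviceName)) {
                    return peer.submit(serviceName, argument);
                }
            }
        } catch (RemoteException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no peer offers " + serviceName);
    }
}
```

All peers share one interface, so the core software is identical on every node; only the set of specialist services differs from node to node.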
[Figure 2 diagram. Labels: Java Applet, Java Application, Native Method, WWW Client Host, WWW Server Host, Compute Server Host, Some Other Host...]

Figure 2: DISCWorld structure showing communicating Java applets and applications

Service based computing is not a new idea, but no other system we are aware of adopts the same very high level of granularity of service request as we use in DISCWorld. These high level service requests are principally targeted at relatively computationally demanding operations on relatively large data entities. DISCWorld is aimed at integrating high performance computational facilities with high-capacity, high-performance storage systems using the best possible bandwidth available. An important architectural concept is shown in figure 3, where a dual communications infrastructure provides a broadband network capability for data and a possibly lower bandwidth infrastructure for control information. In our system, Asynchronous Transfer Mode (ATM) broadband networking is used locally and across wide areas for bulk data, and the Internet is used for control and service request information. Our ATM and hardware infrastructure is described in [6, 13].

[Figure 3 diagram. Labels: DISCWorld WWW Client, DISCWorld Well-Known URL, Database of Client Applet Code, Internet Network (Control), Broadband Network (Data), Storage Server, Compute Server, DISCWorld Dual Network Architecture.]

Figure 3: Dual Connectivity architecture for DISCWorld communications

The service approach is a powerful one for embedding existing applications codes into middleware. The underlying service oriented model for DISCWorld is best described by the sequence of events that arise in response to a user request. Consider the following sequence of seven primary events.

1. User connects to System (Any node in the system)
2. Domain appropriate interface downloaded (applet downloaded on demand)
3. User submits valid query & Node analyses query
4. Node organises execution and brokers services
5. Node invokes & monitors execution
6. System executes service sequence to satisfy user query
7. Results delivered to user (immediately or deferred)

A great amount of detail on the execution model is hidden in this high-level description, and a number of interesting problems arise in fixing the software architecture. Some of these are addressed in [9]. In the present paper we focus on the services model and description. The query submitted by a user can vary widely according to the user's application domain, with various levels of user expertise catered for. We are currently focusing on geographic information systems (GIS) applications and in particular operations on geostationary satellite imagery. We have implemented an early WWW prototype interface to an image archive browser and processing scheduler [10] and are using this to develop more sophisticated Java Applet interface modules for DISCWorld service modules for handling images.
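Events 3 to 7 of the sequence above can be sketched in a few lines of Java. This is a simplified sketch under stated assumptions, not the DISCWorld execution model: the class name `RequestHandler`, the query syntax (service names separated by `>`), and the in-memory service table are all hypothetical, and everything runs on a single node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of events 3 to 7; names and query syntax are assumptions.
final class RequestHandler {
    private final Map<String, UnaryOperator<String>> services;
    final List<String> log = new ArrayList<>();

    RequestHandler(Map<String, UnaryOperator<String>> services) {
        this.services = services;
    }

    // Event 3: analyse the query and reject it if any named service is unknown.
    List<String> analyse(String query) {
        List<String> plan = new ArrayList<>();
        for (String name : query.split(">")) {
            String trimmed = name.trim();
            if (!services.containsKey(trimmed)) {
                throw new IllegalArgumentException("unknown service: " + trimmed);
            }
            plan.add(trimmed);
        }
        return plan;
    }

    // Events 4 to 7: run the planned services in order, record progress
    // in the log, and return the final result to the caller.
    String handle(String query, String input) {
        String data = input;
        for (String name : analyse(query)) {
            log.add("running " + name);
            data = services.get(name).apply(data);
        }
        log.add("done");
        return data;
    }
}
```

A caller would register services in the map and then submit a query such as `"fetch > crop"`; the log stands in for the monitoring step.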

Services Model

The DISCWorld model is based on communicating services that can be instantiated from appropriate templates to handle the sub-service components of a user request. Services are assembled in proper sequences by the brokering operations of a DISCWorld service access point. Services can be decomposed into component services and hence services can invoke other services. Various toolkits provide a mechanism for applications services to be assembled.

Our system design is built around two principal classes of service: processing and storage services. Processing services carry out some computation or processing on the user’s behalf and may range from relatively lightweight services that can be implemented as dynamically loaded Java bytecodes run on almost any participating node in DISCWorld, to computationally demanding operations that must be run as native method code on specialist hardware. For example, we have investigated the use of native code for dense linear algebra using massively parallel processors such as the Connection Machine CM-5 [12]. This code can be invoked remotely as an embedded parallel service from a DISCWorld node.

Within the DISCWorld environment, a processing service may be a single service, with a concrete implementation, or a complex service, which is in effect a placeholder name for a pre-defined group of services (which may or may not be on the same server) that can be executed. Both types of service are equivalent, each accepting a number of input parameters, and producing one or more forms of output. Some examples of services include: querying an image database by image metadata; simple image operations like cropping or multi-channel image fusing; image reduction operations such as calculating percentage cloud cover. Processing services need not be bound to a particular host, nor need be unique within the DISCWorld environment.
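The equivalence of single and complex services can be sketched with a common interface. The Java fragment below is a minimal sketch: the names `ProcessingService`, `SimpleService` and `ComplexService` are hypothetical, data is represented as plain strings, and running the grouped services in sequence is an assumption made for illustration, since the text only says that a complex service names a pre-defined group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical interface: both kinds of service look the same to a caller.
interface ProcessingService {
    String name();
    String run(String input);
}

// A single service with a concrete implementation.
final class SimpleService implements ProcessingService {
    private final String name;
    private final UnaryOperator<String> body;

    SimpleService(String name, UnaryOperator<String> body) {
        this.name = name;
        this.body = body;
    }

    @Override public String name() { return name; }
    @Override public String run(String input) { return body.apply(input); }
}

// A complex service: a placeholder name for a pre-defined group of services.
// Assumption: the group runs in order, each output feeding the next input.
final class ComplexService implements ProcessingService {
    private final String name;
    private final List<ProcessingService> parts;

    ComplexService(String name, ProcessingService... parts) {
        this.name = name;
        this.parts = Arrays.asList(parts);
    }

    @Override public String name() { return name; }

    @Override public String run(String input) {
        String data = input;
        for (ProcessingService part : parts) {
            data = part.run(data);
        }
        return data;
    }
}
```

Because a `ComplexService` is itself a `ProcessingService`, complex services can be nested inside other complex services, which matches the statement that services can invoke other services.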
Because services are not bound to a host, exactly the same instantiation of a frequently used service may be replicated across many machines, or the same service may be implemented in a different way on a number of heterogeneous machines with different performance characteristics. For example, stencil operations may be performed on a farm of Alpha workstations using a message-passing paradigm, or on a massively parallel CM-5 using HDF [11]. Processing services accept a number of input parameters and produce one or more forms of output. Depending on the service, some parameters may have default values that the user can override, while others must be specified. The required parameters, most often in the form of shared data, are supplied by the use of pipes, which store the data produced by some other service for use as input into a later service. Pipes are available across machine boundaries, and more than one consumer may access a given pipe. Services are side-effect free, and many may be run concurrently, by the same or different users.

Storage services are geared towards storing and manipulating large databases of primary data as well as possibly caches of intermediate results and previously computed results. Storage services are also designed to be serverless or symmetric between participating peers. The datasets utilised in science and engineering applications often reach terabytes, and sometimes petabytes, in size. These datasets are often geographically separated from many of the researchers who wish to analyse them, and the computational resources used to do so. Compounding the distances between compute resources, storage, and the researchers who use them, the datasets themselves are sometimes distributed. Yet this “horizontal” distribution across networks is only one dimension in the distribution matrix. Datasets are often distributed “vertically” in a storage hierarchy of local disks, disk arrays, tape libraries, and other media.
Complicating the issue, all of these storage technologies behave differently, and possess different access mechanisms and levels of heterogeneity. Applying the paradigms of “everyday” data access to science and engineering datasets does not adequately address the complicated issue of distributed and hierarchical data storage and access. Standard file systems, whilst widespread providers of well-known I/O interfaces, fail to address the extraordinary and highly variable requirements of high-performance science and engineering applications. A standard file system interface transparently layered on top of a myriad of distributed and hierarchical storage media cannot provide the required adaptability, flexibility, and performance. For these same reasons, explicit staging or prefetching of data is not the solution to this problem either.

We are developing a system to provide flexible and high performance storage services to science and engineering applications. Providing distributed and hierarchical storage is the main element of this system. However, unlike existing storage systems, and following the overall DISCWorld design philosophy, this storage system is latency-tolerant, designed for use across wide-area networks. Key attributes and functionality we are focusing on include flexibility, adaptability, scalability, and heterogeneity.
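Returning to the pipes that connect processing services earlier in this section, the idea of a stored result read by more than one consumer can be sketched as follows. This is a single-JVM sketch under stated assumptions: the class name `Pipe` and its methods are hypothetical, the pipe is assumed to be written once, and the cross-machine transport that DISCWorld pipes provide is not modelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical write-once pipe: one producer, any number of consumers.
final class Pipe<T> {
    private final CompletableFuture<T> value = new CompletableFuture<>();

    // Called by the producing service; returns false if the pipe
    // had already been written.
    boolean write(T data) {
        return value.complete(data);
    }

    // Called by a consuming service; blocks until data is available.
    T read() {
        return value.join();
    }

    // Attach a later service to the pipe without blocking; it runs
    // once the producer has written its output.
    <R> CompletableFuture<R> consume(Function<? super T, ? extends R> service) {
        return value.thenApply(service);
    }

    boolean isReady() {
        return value.isDone();
    }
}
```

Every consumer sees the same stored value and none of them modifies it, which is consistent with services being side-effect free and able to run concurrently.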

Distributed Systems Technology

A number of other software middleware and distributed computing systems have attempted to address the problem requirements described above. Of particular interest are CORBA and DCE as well as the Legion [4], Globus [3] and InfoSpheres [2] research projects.

CORBA is an object-based system with much in common with the distributed services approach of DCE. We have evaluated DCE as a possible technology for constructing DISCWorld, but DCE does not satisfy the portability and performance requirements. A number of CORBA products are still emerging from commercial vendors, and with the trader facilities in CORBA 2.0 which allow object brokers to intercommunicate, CORBA is a viable software technology for part of the DISCWorld design, at least at the local area scale. The Legion and Globus projects build on recent distributed computing research and suggest interesting algorithms and approaches for wide area communications between objects in a system like DISCWorld. The multi-channel communications and moving communications end-point mechanisms in Globus are particularly useful. Recent developments with the remote method invocation (RMI) mechanism available in the Java programming system have led to a number of Java interfaces for CORBA products. The RMI and networking packages available for Java provide a rich set of features for implementing DISCWorld inter-peer communications. The InfoSpheres project is also using Java technology for scalable distributed computing, although it is not clear how well such a system will achieve high performance. The agent object ideas employed in the InfoSpheres project may, however, prove a useful mechanism for long distance computing.

Summary

We have described the coarse grained architecture of our DISCWorld system, with a special focus on the service based approach. We have given some examples of processing and storage services and have emphasised the high granularity approach we are adopting. We believe this approach allows an achievable distributed computing system with high performance capabilities suited to science and engineering applications. Our system provides for user access through a WWW interface in the form of a suite of interface applets as well as a toolset for interfacing applications service components together across high-performance wide area networks.

Acknowledgements

We thank P.D.Coddington, J.F.Hercus, K.E.Kerry, K.J.Maciunas, J.A.Mathew and A.J.Silis for their enthusiasm and contributions in developing the DISCWorld system. Distributed High Performance Computing Infrastructure (DHPC-I) is a project of the Research Data Networks Cooperative Research Center (RDN CRC) and is managed under the On-Line Data Archives (OLDA) Program of the Advanced Computational Systems CRC. RDN and ACSys are established under the Australian Government’s CRC Program.

References

[1] Ron Ben-Natan. CORBA - A Guide to Common Object Request Broker Architecture. McGraw-Hill, 1995.
[2] K. Mani Chandy, Anand Chelian, Boris Dimitrov, Zuzana Dobes, John Garnett, Joseph Kiniry, Huy Le, Jacob Mandelson, Matthew Richardson, Adam Rifkin, Eve Schooler, Paolo A.G. Sivilotti, Wesley Tanaka, and Luke Weisman. A New Approach To Collaborative Distributed Computing. CRPC Newsletter, 1996.
[3] Ian Foster and Carl Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, 1996.
[4] Andrew S. Grimshaw and Wm. A. Wulf. Legion – A View From 50,000 Feet. In IEEE, editor, Proceedings of the Fifth IEEE International Symposium on High Performance Distributed Computing, Los Alamos, California, August 1996. IEEE Computer Society Press.
[5] K. A. Hawick and H. A. James. Distributed High-Performance Computation for Remote Sensing. Technical Report DHPC-009, Computer Science Department, University of Adelaide, May 1997. Accepted for Supercomputing 97.
[6] K. A. Hawick, H. A. James, K. J. Maciunas, F. A. Vaughan, A. L. Wendelborn, M. Buchhorn, M. Rezny, S. R. Taylor, and M. D. Wilson. An ATM-based Distributed High Performance Computing System. In HPCN, editor, Proceedings HPCN, Vienna, Austria, August 1997. IEEE Computer Society Press.
[7] K. A. Hawick, H. A. James, K. J. Maciunas, F. A. Vaughan, A. L. Wendelborn, M. Buchhorn, M. Rezny, S. R. Taylor, and M. D. Wilson. Geographic Information Systems Applications on an ATM-Based Distributed High Performance Computing System. In HPCN, editor, Proceedings HPCN, Vienna, Austria, August 1997. Also DHPC Technical Report DHPC-003.
[8] K. A. Hawick, H. A. James, K. J. Maciunas, F. A. Vaughan, A. L. Wendelborn, M. Buchhorn, M. Rezny, S. R. Taylor, and M. D. Wilson. Geostationary-Satellite Imagery Applications on Distributed, High-Performance Computing. In HPCAsia, editor, Proceedings HPCAsia, Seoul, Korea, August 1997. Also DHPC Technical Report DHPC-004.
[9] K. A. Hawick and F. A. Vaughan. DISCWorld - Distributed Information Systems Cloud of High Performance Computing Resources - Design Discussion Document. DHPC Working Note, March 1997.
[10] H. A. James and K. A. Hawick. Eric: A User and Applications Interface to a Distributed Satellite Data Repository. Technical Report DHPC-008, Computer Science Department, University of Adelaide, April 1997.
[11] H. A. James, C. J. Patten, and K. A. Hawick. Stencil Methods on Distributed High Performance Computers. DHPC Technical Report DHPC-010, June 1997.
[12] K. E. Kerry and K. A. Hawick. Interpolation on Distributed High Performance Computers. Technical Report DHPC-015, Computer Science Department, University of Adelaide, 1997.
[13] J. A. Mathew and K. A. Hawick. ATM Performance Characteristics on Distributed High Performance Computers. Technical Report DHPC-016, Computer Science Department, University of Adelaide, 1997.
[14] W. Rosenberry, D. Kenney, and G. Fisher. Understanding DCE. O’Reilly & Associates, Inc., 1992.
[15] A.J. Silis and K. A. Hawick. World Wide Web Server Technology and Interfaces for Distributed, High-Performance Computing Systems. Technical Report DHPC-017, Computer Science Department, University of Adelaide, August 1997.
