A System for Distributed Computing Based on H2O and JXTA

Pawel Jurczyk1, Maciej Golenia1, Maciej Malawski1,2, Dawid Kurzyniec2, Marian Bubak1,3 and Vaidy S. Sunderam2

1 Institute of Computer Science, AGH, al. Mickiewicza 30, 30-059 Kraków, Poland
2 Emory University, Atlanta, USA
3 Academic Computer Center CYFRONET, ul. Nawojki 11, 30-950 Kraków, Poland
email: {bubak,malawski}@agh.edu.pl
Abstract. The main goal of this work is to build a uniform global computational network using the H2O distributed computing framework and the JXTA P2P technology. This computational network will give users new possibilities in building and utilizing distributed computing systems: H2O kernels behind firewalls will be accessible, and group management in JXTA will make it possible to create virtual groups of kernels, which enables dynamic, ad-hoc collaborations. Our current implementation of H2O over JXTA allows users to export kernels having JXTA endpoints. This allows H2O metacomputing applications to run seamlessly across private networks and NATs, using JXTA as the underlying connection technology. Communication between H2O kernels within the JXTA network was made possible by adding a JXTA socket provider to RMIX. JXTA socket factories are used by RMIX to enable remote method invocations in the P2P environment.
1 Introduction
In the last few years, distributed resource sharing systems have been evolving very fast, and many software platforms and toolkits have been developed. The term resource sharing has a very broad meaning, so it is associated with systems ranging from specialized classes of applications, such as SETI@home [6], through file-sharing platforms, such as Gnutella, to all-purpose Grid computing toolkits. While some peer-to-peer systems may be decentralized, the Grid toolkits are based on the idea of virtual organizations [10]. A virtual organization is based on coordinated and secure resource sharing that is not confined to a single administrative domain, network or organization. Tools that can be classified in this group include well-known software toolkits such as Globus [7] and Globe [18]. Despite the many advantages virtual organizations offer in terms of collaboration between geographically distributed organizations and suitability for grand challenge scientific problems, there are known problems with this approach. The main disadvantage of virtual organizations is their centralized authentication and resource brokering, which generally restricts individual users, who may prefer to share their resources on a peer-to-peer basis rather than through explicit coordination and centralized services.

The H2O framework [14] has been developed taking these considerations into account. H2O was designed as a lightweight and stateless system, based on the premise that providers act independently of each other and of clients. It enables self-organizing distributed computing systems by eliminating coordination middleware. However, H2O does not allow users behind firewalls or NATs to share their resources over the global computer network. We believe this is the main restriction encountered when building P2P metacomputing systems based on the H2O architecture.

In this paper, we propose an extension to the H2O framework that overcomes this problem by using the JXTA [5] P2P network. We propose a complete solution with two main goals: (1) sharing H2O kernels over the JXTA network, and (2) discovering H2O kernels that were started with JXTA-enabled endpoints. By combining the capabilities of H2O as a general-purpose distributed computing platform with P2P discovery and communication mechanisms, we can achieve a powerful P2P metacomputing platform.
2 H2O - lightweight, self-organizing and scalable resource sharing platform

2.1 General overview of H2O
The main goal in the design of H2O is to facilitate distributed resource sharing [17]. Particularly in frameworks that span multiple administrative domains, autonomy and loose coupling are crucial for delivering flexible, dynamic and distributed computing platforms. In the H2O system abstraction, providers, who are independent of each other, make their resources available over the Internet. Each provider starts a software component that provides services through remote interfaces. Clients are considered a dynamic set of users who discover, locate and utilize resources. Current systems, like Web Services, assume that services are static and are deployed by their owners. Even OGSI and WSRF do not specify mechanisms for deploying service code on shared resources. Globus GRAM may be used for this purpose; however, its focus on large computational jobs and queueing systems makes it impractical for deploying lightweight services. H2O assumes that services are deployed by authorized clients, which facilitates reconfiguration of services according to user needs. An important property of the H2O framework is that the abstraction of the distributed computing platform is pushed as close to the client as possible [14]. Clients decide how to build a metacomputing system and how to distribute the workload.
2.2 Kernels, pluglets and component interactions
The H2O framework introduces two new terms: kernel and pluglet. A kernel is the H2O component container, and a pluglet is an H2O component. Before a user can access any component in the container or deploy a new component, the user has to authenticate at the container. In the system, a gate keeper verifies the client's security credentials and returns a kernel context, which is used to perform actions on the kernel. Users are able to load new pluglets on the kernel (if their credentials allow it) or look up and access pluglets that have already been loaded. After a pluglet is loaded or accessed, a pluglet context is provided, which can be used to invoke operations.

Fig. 1: H2O system. (Diagram labels: Providers, Create, Provider host, H2O Kernel, Pluglet A, Pluglet B, Deploy, Lookup & use, Internet, Users.)

Containers in the H2O framework provide hosted services with facilities for effective and structured communication, which makes developing pluglets easier and more effective. The system inherits its communication capabilities from RMIX [16], an extension of the original RMI model. RMIX provides new features, such as asynchronous calls, one-way calls, dynamic serialization semantics and a choice of wire protocol [15]. Additionally, RMIX brings dynamic stubs, runtime binding and customizable virtual endpoints that allow the same remote object to be accessed via different protocols, such as SOAP, JRMP and SunRPC, and/or with different access restrictions. This distinguishes H2O from Web Services, which rely on SOAP, and from CORBA, which uses IIOP. Developers, however, are not obliged to use the RMIX communication protocol in their pluglets. They can decide to communicate using direct socket connections or third-party communication libraries. Examples are the support for PVM [11] and MPI [13], which have been adapted for use on top of the H2O platform.
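The interaction described above (authenticate, obtain a kernel context, then deploy or look up a pluglet and invoke it) can be illustrated with a small self-contained model. This is only a sketch: the types `Kernel`, `KernelContext` and `Pluglet`, the password-based check and the method names are hypothetical stand-ins chosen for illustration, not the actual H2O API, and the looked-up pluglet is returned directly in place of a pluglet context.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class KernelFlowSketch {

    /** Hypothetical stand-in for a pluglet: a component exposing one operation. */
    interface Pluglet {
        String invoke(String argument);
    }

    /** Hypothetical kernel context, handed out only after successful authentication. */
    static class KernelContext {
        private final Map<String, Pluglet> pluglets;
        private final boolean mayDeploy;

        KernelContext(Map<String, Pluglet> pluglets, boolean mayDeploy) {
            this.pluglets = pluglets;
            this.mayDeploy = mayDeploy;
        }

        /** Loads a new pluglet if the caller's credentials allow it. */
        Pluglet deploy(String name, Pluglet pluglet) {
            if (!mayDeploy) {
                throw new SecurityException("credentials do not allow deployment");
            }
            pluglets.put(name, pluglet);
            return pluglet;
        }

        /** Looks up a pluglet that has already been loaded. */
        Pluglet lookup(String name) {
            Pluglet pluglet = pluglets.get(name);
            if (pluglet == null) {
                throw new NoSuchElementException("no pluglet named " + name);
            }
            return pluglet;
        }
    }

    /** Hypothetical kernel whose login method plays the gate keeper role. */
    static class Kernel {
        private final Map<String, Pluglet> pluglets = new HashMap<>();
        private final Map<String, String> passwords = new HashMap<>();
        private final Map<String, Boolean> deployRights = new HashMap<>();

        void addUser(String user, String password, boolean mayDeploy) {
            passwords.put(user, password);
            deployRights.put(user, mayDeploy);
        }

        /** Verifies credentials and returns a context scoped to the user's rights. */
        KernelContext login(String user, String password) {
            String expected = passwords.get(user);
            if (expected == null || !expected.equals(password)) {
                throw new SecurityException("authentication failed");
            }
            return new KernelContext(pluglets, deployRights.get(user));
        }
    }

    public static void main(String[] args) {
        Kernel kernel = new Kernel();
        kernel.addUser("alice", "secret", true);   // may deploy pluglets
        kernel.addUser("bob", "hunter2", false);   // may only look up and use

        KernelContext alice = kernel.login("alice", "secret");
        alice.deploy("upper", text -> text.toUpperCase());

        KernelContext bob = kernel.login("bob", "hunter2");
        System.out.println(bob.lookup("upper").invoke("hello"));
    }
}
```

In this model the deployment right is attached to the context at login time, so a user without that right can still use pluglets loaded by others but cannot load new ones.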
2.3 Security of the platform
The security mechanism of the H2O system is located within the component containers. The gate keeper is conceptually similar to the rsh daemon and verifies the client's security credentials. A client always accesses the kernel and pluglets through security proxies, which make security decisions based on policies specified by the kernel owner. Authentication in the H2O framework is defined in terms of JAAS, and kernel providers may configure the authentication module according to their own preferences. The H2O kernel is a secure container, because each pluglet runs in a separate sandbox environment, which prevents it from interfering with other pluglets.
3 P2P networks and JXTA P2P network implementation

3.1 Overview of P2P networks
There are many P2P networks currently available. Many of them are used to exchange files, but some are used for metacomputing. In this section we present both kinds, showing their strengths and weaknesses. Finally, we present the JXTA P2P framework as one of the most innovative P2P network projects.

One of the first P2P applications was Napster [4] [12], which represents the first generation of peer-to-peer networks. It was developed to exchange mp3 files. This application uses a hybrid topology with a central server that indexes the files shared by peers. Every data lookup causes a query to the central server. The server answers, and the data transfer then takes place between peers. Napster has one big disadvantage: a single point of failure. Without the central server, the network does not exist.

The next P2P application is Gnutella [1] [12], an example of the second generation of peer-to-peer networks. It uses the HTTP protocol and is decentralized. If a peer needs to look up other peers or files, it sends a search packet with a maximum hop value. When the packet reaches another peer, its hop value is decremented by 1. If this value reaches 0, the packet is not propagated to other peers. The strength of this application lies in its decentralization, but it also has several disadvantages. Peers that hold the requested files may not receive the search packet because of the hop limit. Search packets are sent to every known host, which requires high bandwidth. The search response travels back along the same path the query took.

Another P2P network worth mentioning is Morpheus [3]. It uses FastTrack's P2P stack and allows file transfers to be restarted during download. Morpheus is a hybrid network, because peers are associated with peer hubs that assist in search requests. Search queries are propagated only to the hub peer, which processes the query. Files are transferred using HTTP in order to reach peers behind firewalls. Recently Morpheus adopted a new intelligent search engine, which makes it a P2P network of the third generation. A peer asks its closest neighbor about the data, this neighbor asks its own neighbors, which know a little more, and so on, until the data is found.

The idea of exchanging files in a peer-to-peer manner inspired people to exchange free CPU cycles and resources. Today there are millions of computers that could be linked and used as one big supercomputer. A good example is the SETI@home project [6], which uses free CPU cycles of computers connected to the Internet. Clients download data from the SETI server, perform the required computation, and finally send the results back to the server. Unfortunately, this approach has many drawbacks, because the client cannot decide what to download and cannot interact with the computation or change the way it is performed. In real distributed computational engines, users want to interact with their programs. They want to change some parameters on the fly. They want to decide who can start computations on their machines and what programs will be started.

One of the systems that offer such possibilities is Triana [8]. Triana is written in Java. It uses the JXTA network as a peer-to-peer layer. It is based on Globus and disables some of its authentication capabilities. Triana is a very good solution, but its reliance on Globus makes it less well suited for lightweight services. We would like to offer a better solution to this problem by using the H2O computational framework and the JXTA network. H2O is well suited for lightweight processes, and JXTA makes it possible to work with H2O kernels in a peer-to-peer manner.
3.2 JXTA project
There are many concepts of peer-to-peer networks, but we have chosen the JXTA implementation [9]. First of all, it is written in Java. It does not need a central server to exist. Peers behind firewalls can communicate with other peers without any problems. JXTA uses flat addressing, which makes it possible to disconnect from the network and connect again, even from a different place, with the same identity. JXTA messages are based on XML and, importantly, users can create new types of messages to be sent from one peer to another. JXTA makes it possible to create ad-hoc groups of peers. Peers that do not belong to a specific group cannot use the services implemented by this group. Finally, JXTA implements JxtaSockets, a socket abstraction for the JXTA network that may be used like ordinary sockets.
4 The concept of distributed computing framework using P2P network
Even though there are some metacomputing or grid frameworks that allow users to share their resources on a peer-to-peer basis, these frameworks are of no use for sharing resources that are behind firewalls or NATs. A distributed metacomputing system that solves this problem is needed. A promising option is a metacomputing system able to operate over a P2P network. In the last few years P2P networks have been evolving very fast and, moreover, P2P frameworks are becoming easier and easier to use.

Metacomputing systems that can operate in a P2P manner have many advantages. This class of systems allows users to share their resources across private networks and NATs. In other words, a client in any local network is able to use any shared resource in another private network. This is achieved by the flat addressing used in P2P systems. Moreover, an object's address is independent of the host IP address. This feature makes it possible to share a given resource under the same address regardless of its location. These features of a P2P metacomputing framework open new possibilities in building global computing systems. The most important are:
• Simplicity in building a computing network (no need for specialized configuration of routers or firewalls)
• A wider computing network (users from private networks can share resources)
• Clients can use the same resource regardless of its location
• Ad-hoc collaboration of kernels - virtual computing groups (by using peer groups in the P2P network)

Combining H2O with JXTA leads to a new P2P metacomputing platform, which is more general than existing solutions. There are projects that target either a specific problem, such as SETI, or specific classes of problems. Examples include the GPU Project [2], which focuses on Monte Carlo and randomized algorithms, and JNGI [19] for master-worker schemes. In contrast, a P2P network formed of H2O kernels will allow deployment and running of arbitrary code and creation of customized communication topologies.

As a computing network becomes wider and wider, we need a dedicated system that gives users information about the kernels that are currently available in the network. To achieve this we can use internal P2P mechanisms. Our system uses an advertising mechanism, which is the basis of the discovery system. By using advertising, our autodiscovery system, called the P2P Name Service system, is decentralized and can be independent of any global service. Additionally, the P2P Name Service system can form groups of Name Services that collaborate and exchange information without using the advertising mechanism of the P2P network. This can increase the efficiency of the whole P2P Name Service system. Our discovery system also provides network statistics. It monitors the network latency from a Name Service to all registered kernels over the P2P network, giving users full information about the metacomputing network. By using our autodiscovery, a user can simply get information about the currently registered kernels and, based on this information, decide where to deploy work to get results as soon as possible.

Our metacomputing system using a P2P network is based on the H2O metacomputing framework and the JXTA implementation of P2P. The system is called "H2O over JXTA" and is presented in this paper.
5 H2O with RMIX JXTA
The main idea of the H2O over JXTA system is to use the JXTA socket abstraction as a communication layer in the H2O framework. Following the RMIX specification, the first step was to implement an RMIX socket factory using the new type of sockets. We have implemented JxtaClientSocketFactory and JxtaServerSocketFactory, which provide the mechanism for creating JXTA sockets. Additionally, we have created the JxtaSocketAddress class, which represents a single address in the JXTA network, and the JxtaSocketStream class extending SocketTransport, which is needed by H2O to create endpoints using our new communication layer.
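The pluggable socket factory idea can be illustrated with the analogous extension point of standard Java RMI. The RMIX factory interfaces themselves are not shown in this paper, so the sketch below uses `java.rmi.server.RMIClientSocketFactory` and `RMIServerSocketFactory` as a stand-in, and plain loopback TCP sockets in place of JXTA sockets. The class names `LoopbackClientSocketFactory` and `LoopbackServerSocketFactory` and the helper `sendLine` are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.rmi.server.RMIClientSocketFactory;
import java.rmi.server.RMIServerSocketFactory;

public class SocketFactorySketch {

    /** Client-side factory; a JXTA-based one would open a JXTA socket to a peer here. */
    static class LoopbackClientSocketFactory implements RMIClientSocketFactory {
        @Override
        public Socket createSocket(String host, int port) throws IOException {
            // The host argument is ignored: this stand-in always connects to loopback.
            return new Socket(InetAddress.getLoopbackAddress(), port);
        }
    }

    /** Server-side factory; a JXTA-based one would create a JXTA server socket here. */
    static class LoopbackServerSocketFactory implements RMIServerSocketFactory {
        @Override
        public ServerSocket createServerSocket(int port) throws IOException {
            return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        }
    }

    /**
     * Sends one line from client to server using sockets obtained only through
     * the two factories, so the transport can be swapped without touching this code.
     */
    static String sendLine(RMIServerSocketFactory serverFactory,
                           RMIClientSocketFactory clientFactory,
                           String line) {
        try (ServerSocket server = serverFactory.createServerSocket(0);
             Socket client = clientFactory.createSocket("ignored", server.getLocalPort());
             Socket accepted = server.accept()) {
            OutputStream out = client.getOutputStream();
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(accepted.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendLine(new LoopbackServerSocketFactory(),
                                    new LoopbackClientSocketFactory(),
                                    "hello over a pluggable transport"));
    }
}
```

The point of the design is that the invocation layer depends only on the factory interfaces; replacing the two factory classes changes the transport while the calling code stays the same.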
Fig. 2: H2O over JXTA system. (Diagram labels: Providers, Create, Containers, LAN, Internet, JXTA Virtual Network, Register, Request, Response.)
Figure 2 presents an overview of the H2O over JXTA system. Providers deploy containers with a JXTA endpoint enabled. Clients can simply access and use them. All communication between the client's application and the kernel goes through the P2P network by means of JXTA. To start a kernel that provides a JXTA endpoint, the container owner has to modify the kernel configuration file (KernelConfig.xml) and add the following lines:
jxta_user:jxta_password
- 'agh.h2o.kernel1' (used default JXTA net peer group) -->
JXTA_kernel_address@JXTA_group
The configuration presented above will start a JXTA endpoint with the address JXTA_kernel_address@JXTA_group. After the kernel starts, the kernel provider is informed about the address of the endpoint started in the JXTA network. Clients can use kernels in the P2P network simply by specifying the kernel address in the H2O.login() method. Additionally, as mentioned before, the RMIX communication framework provides many new features and, thanks to our JXTA socket factories, it has been extended into a communication framework that implements an easy-to-use yet full-featured RMI system over a P2P network. Developers who decide to use RMIX as their communication framework can now take advantage of a new class of RMI implementation that can be used across private networks. We expect this to be of considerable benefit to them.
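As an illustration of the endpoint address format JXTA_kernel_address@JXTA_group, the following sketch splits such a string into its kernel and group parts. It is not the JxtaSocketAddress class of the implementation: the class name `KernelAddressSketch`, the validation rules, the placeholder group name `NetPeerGroup` and the fallback to a default group when no `@group` part is given are all assumptions made for illustration.

```java
public class KernelAddressSketch {

    /** Placeholder label for the default JXTA net peer group (the name is an assumption). */
    static final String DEFAULT_GROUP = "NetPeerGroup";

    final String kernel;
    final String group;

    KernelAddressSketch(String kernel, String group) {
        this.kernel = kernel;
        this.group = group;
    }

    /**
     * Parses "kernel@group". An address without '@' is assumed to belong to the
     * default group; empty parts or a second '@' are rejected.
     */
    static KernelAddressSketch parse(String text) {
        if (text == null || text.trim().isEmpty()) {
            throw new IllegalArgumentException("empty address");
        }
        String trimmed = text.trim();
        int at = trimmed.indexOf('@');
        if (at < 0) {
            return new KernelAddressSketch(trimmed, DEFAULT_GROUP);
        }
        String kernel = trimmed.substring(0, at);
        String group = trimmed.substring(at + 1);
        if (kernel.isEmpty() || group.isEmpty() || group.indexOf('@') >= 0) {
            throw new IllegalArgumentException("malformed address: " + text);
        }
        return new KernelAddressSketch(kernel, group);
    }

    @Override
    public String toString() {
        return kernel + "@" + group;
    }

    public static void main(String[] args) {
        System.out.println(parse("agh.h2o.kernel1@myGroup"));
        System.out.println(parse("agh.h2o.kernel1"));
    }
}
```

Because such an address carries no host or port, the same string stays valid when the kernel moves to a different machine or network, which is the property the flat addressing of JXTA is meant to provide.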
6 Autodiscovery - finding kernels in JXTA network
By using the H2O over JXTA system, users are able to create a very wide computational network containing many kernels. In this kind of environment a client should be able to learn about the current situation in the network. This kind of information is essential for deciding how to deploy the client's work over the kernels in the computing system. Although H2O has some resource monitoring facilities [20], some improvements are needed to make it work in a P2P manner. To satisfy this need we have created a mechanism called autodiscovery. It provides new features that help the client find kernels in the JXTA network and choose the set of kernels that will solve the client's computational problem as soon as possible.

The basis of the discovery system is the JXTA Name Service. Each kernel or client console can start a Name Service to take advantage of the discovery system. A JXTA Name Service is independent of any centralized system and can act independently of other JXTA Name Services. This is achieved by using the JXTA advertising mechanism, which is responsible for propagating information over JXTA. When a provider wishes to propagate information about its kernel through the discovery system, the service container has to be started with the JXTA Name Service enabled. The JXTA Name Service then informs JXTA about the presence of the kernel in the computing system. When a Name Service is started on the console side, it obtains the information about kernels found in the JXTA advertising system. Alternatively, it can obtain this information not directly from the advertising system, but from other Name Services. A JXTA Name Service can also be started as a stand-alone system. It is then used as a local information system providing information about kernels to the user's console.
Fig. 3: Autodiscovery mechanism of H2O over JXTA system. (Diagram labels: LAN, Client Console, List of currently registered kernels, Notify about kernels, H2O&&NS, NS, Local NS, Register, Negotiate, JXTA Advertisements Virtual Network.)
In this situation, the client obtains complete information about the computing network faster than without a local information system. To avoid overloading the JXTA advertising mechanism, Name Services started in kernels can inform the local information service about their presence instead of sending information directly to JXTA advertising. Moreover, to increase the flexibility of autodiscovery, each kind of JXTA Name Service (started on the kernel side, on the console side, or stand-alone) can collaborate with a defined set of other JXTA Name Services and exchange information without using the advertising mechanism. Figure 3 presents an overview of the proposed autodiscovery system.

Figure 4 presents some usage scenarios of the discovery system. The Name Service system can be used without any centralized units, as shown in scenario 4(a). The information flows over the network in the following steps:
1. Provider A starts a kernel with the JXTA Name Service enabled. The Name Service informs JXTA (using the advertising protocol) about the presence of the kernel.
2. The client starts the GUI with the JXTA Name Service enabled. The Name Service gets information about Kernel A (directly from the JXTA network). The information is provided to the user.
3. Provider B starts a kernel with the JXTA Name Service enabled. The Name Service informs JXTA (using the advertising protocol) about the presence of the kernel.
4. The JXTA Name Service started in the client's console gets information about Kernel B (directly from the JXTA network). The information is provided to the user.

Fig. 4: Example usage scenarios of discovery system. (a) Discovery system without stand alone Name Service (only advertising mechanism of JXTA is used). (b) Discovery system using stand alone Name Service. (c) Discovery system using stand alone Name Service (one of the kernels uses JXTA sockets to register with the stand alone Name Service).

Another way to use the JXTA Name Service system is presented in 4(b). It assumes that a stand-alone JXTA Name Service has already been started. The steps of this usage scenario are:
1. Provider A starts a kernel with the JXTA Name Service enabled. The Name Service informs the local stand-alone JXTA Name Service (without using the advertising mechanism of JXTA) about the presence of Kernel A.
2. Provider B starts a kernel with the JXTA Name Service enabled. The Name Service informs the local stand-alone JXTA Name Service (without using the advertising mechanism of JXTA) about the presence of Kernel B.
3. The local stand-alone JXTA Name Service informs JXTA about the kernels registered in it (the advertising mechanism of JXTA is used).
4. The client starts the GUI with the JXTA Name Service enabled. The Name Service gets information about Kernel A and Kernel B (directly from the JXTA network). The information is provided to the user.
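The difference between scenarios 4(a) and 4(b) can be made concrete with a small in-memory model. This is a sketch under stated assumptions: the `Advertisements` class stands in for the JXTA advertising mechanism, and the `NameService` class with its method names is hypothetical, not the implementation described in this paper. Kernels are identified by plain strings.

```java
import java.util.Set;
import java.util.TreeSet;

public class NameServiceSketch {

    /** Stand-in for the JXTA advertising mechanism: a shared set of advertised kernels. */
    static class Advertisements {
        private final Set<String> published = new TreeSet<>();

        void publish(String kernelAddress) {
            published.add(kernelAddress);
        }

        Set<String> discover() {
            return new TreeSet<>(published);
        }
    }

    /** Hypothetical Name Service holding the kernels it currently knows about. */
    static class NameService {
        private final Advertisements jxta;
        private final Set<String> kernels = new TreeSet<>();

        NameService(Advertisements jxta) {
            this.jxta = jxta;
        }

        /** Scenario (a): a kernel-side Name Service advertises its kernel directly. */
        void advertise(String kernelAddress) {
            kernels.add(kernelAddress);
            jxta.publish(kernelAddress);
        }

        /** Scenario (b), steps 1-2: a kernel registers here without any advertising. */
        void register(String kernelAddress) {
            kernels.add(kernelAddress);
        }

        /** Scenario (b), step 3: the stand-alone service advertises all registered kernels. */
        void advertiseRegistered() {
            for (String kernelAddress : kernels) {
                jxta.publish(kernelAddress);
            }
        }

        /** Console side: merge whatever is currently advertised into the local view. */
        Set<String> refresh() {
            kernels.addAll(jxta.discover());
            return new TreeSet<>(kernels);
        }
    }

    public static void main(String[] args) {
        // Scenario (a): every kernel advertises itself; the console sees them one by one.
        Advertisements networkA = new Advertisements();
        NameService consoleA = new NameService(networkA);
        new NameService(networkA).advertise("kernelA");
        System.out.println(consoleA.refresh());
        new NameService(networkA).advertise("kernelB");
        System.out.println(consoleA.refresh());

        // Scenario (b): kernels register with a stand-alone service, which advertises for them.
        Advertisements networkB = new Advertisements();
        NameService standAlone = new NameService(networkB);
        NameService consoleB = new NameService(networkB);
        standAlone.register("kernelA");
        standAlone.register("kernelB");
        System.out.println(consoleB.refresh()); // nothing has been advertised yet
        standAlone.advertiseRegistered();
        System.out.println(consoleB.refresh());
    }
}
```

In this model scenario (b) issues advertisements from a single place, the stand-alone service, which mirrors the stated motivation of reducing the load on the advertising mechanism.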
The usage scenario presented in Figure 4(c) assumes that a local stand-alone JXTA Name Service has been started and consists of the following steps:
1. Provider A starts a kernel with the JXTA Name Service enabled. The Name Service informs JXTA (using the advertising protocol) about the presence of the kernel.
2. The stand-alone JXTA Name Service gets information about Kernel A (directly from the JXTA network).
3. Provider B starts a kernel with the JXTA Name Service enabled. The Name Service informs the remote stand-alone JXTA Name Service (without using the advertising mechanism of JXTA) about the presence of Kernel B (JXTA sockets are used).
4. Client A starts the GUI with the JXTA Name Service enabled. The Name Service gets information about Kernel A and Kernel B from the local stand-alone JXTA Name Service. The information is provided to the user.
5. Client B starts the GUI with the JXTA Name Service enabled. The Name Service gets information about Kernel A and Kernel B from the local stand-alone JXTA Name Service. The information is provided to the user.

By using our autodiscovery system, the client can take advantage of an additional piece of information about kernels: the network delay. Each JXTA Name Service can monitor the network delay from itself to a kernel. This information helps to choose the best set of available kernels in the computing network from the client's point of view. The monitoring system is based on a simple send/receive communication delay measurement. The Name Service connects to the kernel and sends a request. When the kernel receives the request, it responds with a simple 'HALO' token, and when the Name Service receives this response, it can estimate the network delay to the requested kernel. The estimated delay is then provided to the client.
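The delay measurement described above can be sketched as follows. The transport is abstracted behind a hypothetical `KernelLink` interface so that the example runs without a JXTA network; a real link would wrap a JXTA socket. The request text `PING`, the helper names and the simulated delays are assumptions made for illustration; only the 'HALO' reply token comes from the description above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatencyProbeSketch {

    /** Hypothetical transport to one kernel; a real one would wrap a JXTA socket. */
    interface KernelLink {
        String request(String message) throws Exception;
    }

    /** Reply token sent by the kernel, as described in the text. */
    static final String TOKEN = "HALO";

    /** Returns the round-trip time in nanoseconds, or -1 if the reply is not the token. */
    static long probeNanos(KernelLink link) throws Exception {
        long start = System.nanoTime();
        String reply = link.request("PING"); // request content is an assumption
        long elapsed = System.nanoTime() - start;
        return TOKEN.equals(reply) ? elapsed : -1L;
    }

    /** Orders kernel addresses by measured delay, leaving out kernels that failed the probe. */
    static List<String> rankByDelay(Map<String, KernelLink> kernels) {
        Map<String, Long> delays = new HashMap<>();
        for (Map.Entry<String, KernelLink> entry : kernels.entrySet()) {
            try {
                long delay = probeNanos(entry.getValue());
                if (delay >= 0) {
                    delays.put(entry.getKey(), delay);
                }
            } catch (Exception e) {
                // Unreachable kernel: it is simply not ranked.
            }
        }
        List<String> ranked = new ArrayList<>(delays.keySet());
        ranked.sort(Comparator.comparingLong(delays::get));
        return ranked;
    }

    /** Simulated kernel that answers with the token after a fixed delay. */
    static KernelLink simulated(long millis) {
        return message -> {
            Thread.sleep(millis);
            return TOKEN;
        };
    }

    public static void main(String[] args) {
        Map<String, KernelLink> kernels = new LinkedHashMap<>();
        kernels.put("kernelA@groupX", simulated(50));
        kernels.put("kernelB@groupX", simulated(5));
        kernels.put("kernelC@groupX", message -> "???"); // wrong reply, not ranked
        System.out.println(rankByDelay(kernels));
    }
}
```

A single round trip is a coarse estimate; averaging several probes per kernel would smooth out jitter, at the cost of more traffic.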
7 Summary
In this paper we have presented an approach to a new class of computational systems that can take advantage of a P2P network. Our implementation of this system is based on the H2O framework and the JXTA P2P network. At present, we have finished the adaptation of H2O to the P2P network. Our current and future work is focused on developing the JXTA Name Service system. We plan to create a service that holds information about the currently registered kernels from the local and the JXTA network. Next, we will elaborate a mechanism for measuring network delays from the Name Service to H2O kernels.

Acknowledgement. This research was partially funded by the CoreGRID EU IST Project.
References

1. Gnutella File Sharing Network. http://www.gnutella.org.
2. GPU project. http://gpu.sourceforge.net/.
3. Morpheus File Sharing System. http://www.morpheus.com/.
4. NAPSTER Music Service. http://www.napster.com.
5. Project JXTA. http://www.jxta.org.
6. SETI@home: Search for extraterrestrial intelligence at home. http://setiathome.ssl.berkeley.edu/.
7. The Globus Alliance Website. http://www.globus.org/.
8. The Triana Project. http://www.trianacode.org/.
9. Daniel Brookshier, Navaneeth Krishnan, Darren Govoni, and Juan Carlos Soto. JXTA: Java P2P Programming. Sams, 2002. Available at: http://java.sun.com/developer/Books/networking/jxta/.
10. Ian Foster, Carl Kesselman, and Steven Tuecke. The anatomy of the Grid: Enabling scalable virtual organizations. International J. Supercomputer Applications, 15(3), 2001.
11. A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine: A Users Guide and Tutorial for Networked Parallel Computing. Scientific and engineering computation. MIT Press, Cambridge, MA, USA, 1994.
12. Joe Gradecki. Mastering JXTA: Building Java Peer-to-Peer Applications. Wiley, 2002.
13. William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. High-performance, portable implementation of the MPI Message Passing Interface Standard. Parallel Computing, 22(6):789–828, 1996.
14. D. Kurzyniec, T. Wrzosek, D. Drzewiecki, and V. S. Sunderam. Towards Self-Organizing Distributed Computing Frameworks: The H2O Approach. Parallel Processing Letters, 13(2), 2003.
15. Dawid Kurzyniec and Vaidy S. Sunderam. Semantic aspects of asynchronous RMI: The RMIX approach. In Proc. of 6th International Workshop on Java for Parallel and Distributed Computing, in conjunction with IPDPS 2004, Santa Fe, New Mexico, USA, April 2004. IEEE Computer Society.
16. Dawid Kurzyniec, Tomasz Wrzosek, Vaidy Sunderam, and Aleksander Slomiński. RMIX: A multiprotocol RMI framework for Java. In Proc. of the International Parallel and Distributed Processing Symposium (IPDPS'03), pages 140–146, Nice, France, April 2003. IEEE Computer Society.
17. Vaidy Sunderam and Dawid Kurzyniec. Lightweight self-organizing frameworks for metacomputing. In The 11th International Symposium on High Performance Distributed Computing, Edinburgh, Scotland, July 2002.
18. M. van Steen, P. Homburg, and A. Tanenbaum. Globe: A wide-area distributed system. IEEE Concurrency, 7(1):70–78, Jan-Mar 1999.
19. Jerome Verbeke, Neelakanth Nadgir, Greg Ruetsch, and Ilya Sharapov. Framework for peer-to-peer distributed computing in a heterogeneous, decentralized environment. Lecture Notes in Computer Science, 2536, 2002.
20. Tomasz Wrzosek, Dawid Kurzyniec, Dominik Drzewiecki, and Vaidy Sunderam. Resource monitoring and management in metacomputing environments. In The 10th European PVM/MPI User's Group Meeting, Venice, Italy, September 2003. Springer-Verlag.