The PROVE Trace Visualisation Tool as a Grid service

Gergely Sipos and Péter Kacsuk
MTA SZTAKI Computer and Automation Research Institute, Hungarian Academy of Sciences
1618 Budapest P.O. Box 38.
{sipos, kacsuk}@sztaki.hu



Abstract. This paper introduces how the PROVE trace visualisation tool was separated from the P-GRADE environment into a stand-alone Grid service. The separation resulted in three PROVE implementations: a local service, a servlet-based solution and an OGSA (Open Grid Services Architecture) enabled GT3 (Globus Toolkit 3) Grid service. The paper describes the problems encountered and the decisions made during the development of the different versions. Our experiences show how a local program can be relocated into the Grid as one of the many services the global infrastructure will one day offer.

1 Introduction

PROVE [5] is a visualisation tool integrated into the P-GRADE [12] program development environment. The goal of P-GRADE is to support users in every stage of the development and execution of parallel programs, and performance monitoring is an important part of this functionality. The application monitoring subsystem of P-GRADE consists of two parts: the GRM monitor infrastructure [10] and the PROVE visualisation tool. While GRM performs trace collection, PROVE is responsible for data visualisation. PROVE itself can be separated into two logical parts: a data parser engine and a graphical engine that is tightly connected to the user interface of P-GRADE and performs the presentation itself. In this paper we introduce how this two-layered PROVE became an individual Grid service. Although the described results concern the extraction of one part of a complex software tool, they illustrate the general way of converting a well-functioning local program into a Grid service. The conclusions we draw in this work can therefore be generalised to similar problems. In the next section we overview the role of the integrated PROVE inside P-GRADE. In Section 3 the new tasks of the grid-enabled PROVE service are introduced, while Section 4 gives a detailed overview of the different PROVE service implementations we developed, namely the stand-alone application, the servlet-based one and the GT3-based solution. Section 5 draws conclusions and outlines future work.

The work presented in this paper was supported by the Ministry of Education under No. IKTA5-089/2002, the Hungarian Scientific Research Fund No. T042459 and IHM 4671/1/2003.

2 The integrated version of PROVE

As introduced previously, GRM and PROVE are the main parts of the monitoring subsystem of P-GRADE. GRM uses source code instrumentation, hence an instrumentation API belongs to it [10]. Since P-GRADE provides a high-level graphical interface that hides every low-level layer, users do not have to know anything about the GRM instrumentation API. The first version of P-GRADE supported job execution on local clusters and supercomputers [6]. In such an environment P-GRADE can easily set up the GRM trace collector infrastructure before it starts the parallel program. The infrastructure contains local monitors – one for each node – and one main monitor that has to be started on the machine where P-GRADE is located. The instrumented processes can send trace event messages through the local monitors to the main monitor, hence to P-GRADE. The main monitor controls the local monitors – when and where to send their locally buffered trace – and it creates a global trace file from the received data. Since the original PROVE is integrated into P-GRADE, it runs on the host where the GRM main monitor does, thus they share the same file system. PROVE can open the global trace file at any time and visualise its content from the aspect the client needs. With the instrumentation API of GRM, parallel applications generate trace data in the Tape/PVM format [9], hence PROVE expects this type of trace file as well. It is important to clarify that PROVE does not build on the GRM infrastructure or its instrumentation API, solely on the Tape/PVM file format. One could use PROVE to visualise trace data generated and collected with tools other than the ones discussed here, provided they produce a Tape/PVM formatted global trace file. Unfortunately, the integrated PROVE version receives its necessary start-up and control parameters from the wrapping P-GRADE environment, so users cannot exploit this independence. When we began the development of the PROVE service, one of our motivations was to realise an independent tool.
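Although the full Tape/PVM syntax is outside the scope of this paper, the following minimal Java sketch illustrates the kind of work the PROVE parser layer performs: reading trace events and restoring chronological order before visualisation. The one-event-per-line record layout with a leading time field is an assumption made only for this example, not the actual Tape/PVM format.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Minimal sketch of chronological trace sorting. The record layout is
 * a simplification: we only assume that each event line carries a
 * leading numeric time field, not the real Tape/PVM syntax.
 */
public class TraceSorter {

    /** One trace event: a time stamp plus the unparsed rest of the record. */
    static final class Event {
        final double time;
        final String record;
        Event(double time, String record) {
            this.time = time;
            this.record = record;
        }
    }

    /** Reads events (one per line, time field first) and sorts them by time. */
    static List<Event> loadSorted(String traceFile) throws IOException {
        List<Event> events = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(traceFile))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\\s+", 2);
                if (parts.length == 2) {
                    try {
                        events.add(new Event(Double.parseDouble(parts[0]), parts[1]));
                    } catch (NumberFormatException ignored) {
                        // Skip header or comment lines that carry no time field.
                    }
                }
            }
        }
        // Local monitors buffer and flush independently, so the global file
        // is not necessarily ordered; sorting restores the global time line.
        events.sort(Comparator.comparingDouble(e -> e.time));
        return events;
    }
}
```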

3 The role of PROVE in grid environments

In the past few years P-GRADE has outgrown the boundary of local resources and expanded its functionality into the Grid [7]. At present P-GRADE supports the execution of PVM and MPI programs in Condor and Globus grids, but we are already working on the next version, which will also support JGrid [14], a Jini based Grid environment. Since P-GRADE uses source code instrumentation based application monitoring in grids too, its trace collector infrastructure had to be fundamentally changed [1]. While the job executor client can establish the trace collector infrastructure on local resources, in grids this task is impossible: the submitter entity does not know in advance where the processes will run, and it cannot have the necessary rights to log in to these machines anyway. The new trace collector infrastructure has already been developed in the GridLab project [4]. It consists of two parts: the Mercury monitor service and a main monitor program. Mercury is a software service that has to be installed on grid resources; its

tasks are to collect trace data from the jobs that run on that resource and to forward it to the interested clients. The main monitor is a software component that can register itself at remote Mercury services and create a global trace file from the received data. As can be seen, in the new structure the local monitors (part of the Mercury service) are started by system administrators instead of job submitters. Grid environments do not place new requirements on the PROVE tool the way they did on the GRM infrastructure. If the main monitor cooperates with P-GRADE and registers itself at the appropriate remote Mercury services, then even the "non-grid" version of PROVE can visualise the trace. Nevertheless, there are several reasons why we decided to develop a new, stand-alone PROVE version. The most important of them is to separate its function from P-GRADE into an individual service. One advantage of this step has been discussed in the previous section: it makes PROVE independent from P-GRADE in practice. Another important reason is that only a stand-alone PROVE can appear in the Grid as an OGSA [2] enabled Grid service. The following list contains all the aims we wanted to achieve with the stand-alone PROVE:

1. The service has to be able to visualise Tape/PVM formatted trace files.
2. The service must be able to parse trace files situated anywhere in the network.
3. The result of a service call has to be a picture that illustrates the trace file from the requested point of view.

The first goal is inherited from the previous PROVE version. We did not want to change the supported trace file format, since Tape/PVM provides all the data necessary to generate space-time and statistical diagrams of parallel jobs. Based on the time fields of the Tape/PVM event entries the PROVE service is able to sort the data into chronological order and generate a picture. The sorting phase is sometimes very computation-intensive, so performing it on a powerful PROVE service provider host can lower the load on client machines. The second point of the list supposes that some entity collects the trace of a grid job into a global file the PROVE service provider can access. This entity can be the previously discussed main monitor, running either on the client machine or on a "trace collector service" provider host. The latter solution is more realistic, since trace files can easily outgrow the storage capacity of clients. Such a service has to provide to remote clients functionality similar to what the main monitor provides locally. The last point of the list also lowers the load on grid clients: instead of large trace files they only have to download pictures that graphically delineate the huge amount of data. Obviously, much less bandwidth is needed to download such a picture than to download the raw trace. Fig. 1 presents the role and the usage scenario of the PROVE service in grid environments. The scenario supposes that the client has submitted the job, the processes have already started trace production on the grid resources, and there is a service provider host that collects the local traces into a global file. When the client would like to check the actual state of the remotely running application, he has to find a PROVE service provider. The detailed way in which this can be done is out of the scope of this paper, but generally it happens through the information system of the Grid.
After the client sends the visualisation request to the PROVE provider (1), the provider finds the trace collector and instructs it to collect the actual trace (2).

Fig. 1. The role of the PROVE service in the Grid

The trace collector host finds the job executors and gets the local traces (3). (The trace collector can find the job executors much as the client found the PROVE provider and the PROVE provider found the trace collector.) The most obvious solution for generating the global trace file is to run the Mercury service on the executor hosts and its corresponding client-side main monitor program on the trace collector host. After the trace collector has registered itself at the Mercury services that run on the executor machines, these providers automatically forward the local traces. The PROVE provider can then download the global trace and save it into a local buffer (4). Since the visualisation can be requested while a job is still executing, the results of two calls that request the space-time diagram of the same job can differ. Because the first part of these trace files is identical (the newer file contains the older one), it is quite logical to store job traces on the PROVE provider host more or less persistently. With this technique the network traffic can be radically reduced: when the second client request arrives, the PROVE provider has to download only the locally missing tail of the global file, not the whole file again. After this data synchronisation phase the provider can sort the event entries of the file, generate the requested picture (5) and send it back in a popular image format (6).
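One possible way to realise this tail-only synchronisation, assuming the trace collector exposes the global trace file through an ordinary web server (the paper does not prescribe the transfer mechanism), is an HTTP Range request. The sketch below is illustrative only:

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Sketch of incremental trace synchronisation: the PROVE provider keeps
 * the already downloaded prefix of the global trace and fetches only
 * the missing tail. This assumes the collector serves the file over HTTP.
 */
public class TraceSync {

    /**
     * Downloads the bytes of the remote trace file starting at offset
     * localLength (the size of the locally cached copy).
     */
    static byte[] fetchTail(URL globalTrace, long localLength) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) globalTrace.openConnection();
        // Ask only for the part that is missing from the local buffer.
        conn.setRequestProperty("Range", "bytes=" + localLength + "-");
        conn.connect();
        ByteArrayOutputStream tail = new ByteArrayOutputStream();
        if (conn.getResponseCode() == HttpURLConnection.HTTP_PARTIAL) {
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    tail.write(buf, 0, n);
                }
            }
        }
        // A server that ignores Range replies with 200 and the full file;
        // a production version would detect that and replace, not append.
        conn.disconnect();
        return tail.toByteArray();
    }
}
```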

4 Our different PROVE service implementations

We implemented the introduced PROVE service in three different ways: as a client-side local service, as a Java servlet and as a GT3 Grid service. While the first implementation provides a local service, the other two act as real services in grids. Although clients can use the three distinct versions in fundamentally different ways, their cores are the same. This service core can be seen in Fig. 2. The most important part of the structure is the service program. It consists of two layers: the upper layer is written in Java, the bottom one in C. The double-layered architecture enables us to exploit the advantages of both languages. While the Java layer provides an "easy to use" interface for clients and can be consumed from various platforms, the C layer allows more efficient memory allocation. Efficient memory usage is

Fig. 2. The common structure of the PROVE service implementations

a key issue in our implementation, because the memory is used for the previously discussed trace buffering. By using memory buffers instead of file buffers we could achieve a significant speed-up in the data parsing and image generation phases. The detailed general scenario of the service usage presented in Fig. 2 is the following: after a client request is received (1), the Java layer updates the local trace by downloading the missing part of the global trace from its collector host (2). It forwards the downloaded data with native calls to the C layer (3), which merges the new entries and saves them into a memory buffer (4). Then the C library generates the requested image and saves it as a local file (5). The name of this file has been defined in advance by the Java layer, so the Java layer can easily convert the file name to a URL and return the address to the client (6). The client then downloads the picture through the service-side HTTP server and presents it with an appropriate tool (7). The described scenario is almost the same whichever implementation is used. What differs most among them is the Java layer, that is, how clients can contact the service.
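The coupling between the two layers can be pictured as a small set of native methods declared in Java and implemented by the C library. The class, method and library names below are illustrative only, not the actual PROVE symbols:

```java
/**
 * Sketch of the two-layer split: the Java layer declares native entry
 * points that the C library implements via JNI.
 */
public class ProveCore {

    static {
        // The C layer is loaded as a shared library
        // (hypothetical name, e.g. libprovecore.so).
        System.loadLibrary("provecore");
    }

    /**
     * Merges newly downloaded trace entries into the in-memory buffer
     * kept by the C layer (steps 3-4 of the scenario above).
     */
    public native void mergeTrace(byte[] newEntries);

    /**
     * Generates the requested view into a local image file whose name
     * the Java layer chose in advance (step 5); the caller then turns
     * that file name into a URL for the client (step 6).
     */
    public native void generateImage(String viewParameters, String imageFile);
}
```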

4.1 PROVE as a local service

In the local version of PROVE the Java layer is a stand-alone Java application with a Swing GUI. A client can use this graphical interface directly to interact with the service. Through the graphical front-end one can set the parameters of the trace file to be analysed and can browse the result image. In this case the trace file is usually a local file, but this is not a restriction: even the local version of PROVE can visualise remote trace files that are accessible through web servers. Although this version was developed to demonstrate the correctness of our new service core, we found it a very useful tool for Tape/PVM formatted trace file visualisation. This stand-alone version can even replace the built-in PROVE of P-GRADE, since it provides all of its functionality. The only difference is that the new version presents the result as a static image file, while the old one draws the picture onto the screen dynamically.

4.2 PROVE as a Java servlet

The second version is the servlet-based one. In this case the Java layer is a Java servlet that runs inside a servlet container. The structure of this implementation is presented in Fig. 3.

Fig. 3. The usage of the servlet based PROVE service

Since a servlet container is always embedded in a web server, this servlet-based solution does not require starting a stand-alone HTTP server: the web server that hosts the servlet container can serve the image download requests as well. In this case the service usage scenario is the following. The client first downloads an adequate web page that contains a Java applet (1). Such a web page could be linked, for example, into a Grid portal. The applet of the downloaded page acts as a stub during the service consumption: it knows how to communicate with the remote servlet. Besides, the applet provides local service for the client too, since it presents the resulting trace pictures. When a client sends a visualisation request to the applet through its graphical elements, the applet translates the event into an HTTP request that the servlet can understand (2). The servlet receives the message and – together with the C library – updates the locally stored trace data (3, 4, 5) and generates the requested image file (6). (The data update and file generation happen in the previously discussed way.) After this the servlet wraps the URL of the generated image into an HTTP response message and sends it back to the applet (7). The applet automatically downloads the file (8) and presents it in the client-side browser window (9). As can be seen from this description, the servlet-based PROVE implementation has a big advantage: no special client-side program has to be installed, a simple Web browser with a Java plug-in is enough. Because of this advantageous feature we built this PROVE service version into our P-GRADE Portal [12], which was presented during the Supercomputing 2003 exhibition and in the Grid Demo Session of the IEEE International Conference on Cluster Computing 2003.
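A minimal sketch of such a servlet front end is shown below, reusing the hypothetical ProveCore class from the previous section. The request parameter, the image path and the plain-text reply are assumptions made for illustration; the real applet-servlet protocol is more elaborate, as discussed in Section 4.3:

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Sketch of the servlet front end. Since the servlet keeps no state
 * between calls, every request must fully describe the desired view.
 */
public class ProveServlet extends HttpServlet {

    private final ProveCore core = new ProveCore(); // shared Java/C core

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Full description of the requested diagram or zoom (assumed name).
        String view = req.getParameter("view");
        // Steps 3-5, locating the collector and synchronising the local
        // trace copy (see the Range-request sketch in Section 3), omitted.
        String imageFile = "prove-" + System.currentTimeMillis() + ".png";
        core.generateImage(view, imageFile);
        // Steps 6-7: return the URL the applet will download the image from
        // ("/traces/" is a hypothetical path served by the hosting web server).
        resp.setContentType("text/plain");
        PrintWriter out = resp.getWriter();
        out.println(req.getScheme() + "://" + req.getServerName() + ":"
                + req.getServerPort() + "/traces/" + imageFile);
    }
}
```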

4.3 PROVE as a Globus Toolkit 3 Grid service

The third PROVE service version we developed uses the GT3 [13] framework. Although the OGSI [11] specification does not mandate the Java language for Grid service implementation, we chose Java to realise the GT3 based PROVE. This way we could significantly reduce the development time, since the core of the stand-alone PROVE implementation could be applied again. Although the base structure remains the same, the GT3 framework grants several extra features that neither the stand-alone nor the servlet-based version is able to provide. Globus services run inside containers that provide – among other things – factory functionality for them. These GT3 containers do not automatically create a new service instance for every client call, only when it is explicitly requested. In contrast, servlet containers provide stateless connections, so every incoming request is served independently of the previous ones [8]. In the present case this means that the PROVE servlet retains no state between requests, while a single PROVE GT3 service instance can serve multiple calls. Since most of the visualisation requests that PROVE has to serve refer to the result of a previous one, this difference in instantiation causes a significant deviation in the service request protocols. To illustrate this with an example, imagine that one would like to zoom into a smaller part of a previously generated trace picture. In this case the new result refers to the previous one – because the new image is part of the old one – so the request could be described in terms of the old picture: the relative coordinates of the new picture inside the old one are obviously enough. In contrast, the servlet-based PROVE service needs an exact description of the old picture besides these coordinates, since the servlet that accepts the second visualisation request knows nothing about the previous image. In the servlet-based version we could not eliminate this overhead, so we had to develop a complicated protocol to make the required applet-servlet communication possible. In the GT3 based PROVE the GT3 container starts one individual PROVE service instance per client, and this instance serves every call of its owner. With this technique the previously discussed overhead is avoided, since a service instance knows every previous result of its client. The usage of the GT3 based PROVE is presented in Fig. 4. As can be seen in Fig. 4, the Globus Toolkit 3 must be installed on both the client and the server side. It is necessary because it provides several functions (API, server-side container, engines for the communication) that have to be used. The client application in this case is a stand-alone program, because an applet cannot make the appropriate use of the GT3 API. This client application should have a graphical interface, otherwise it cannot present the result images to the user. Since GT3 is a real grid infrastructure, it contains an information system as well, called the Index service. From an Index service one can get a reference to a factory of PROVE services, to a factory of trace collector services or to a factory of job executor services. Fig. 4 supposes that the client has already obtained those references, instantiated the required number of executor services and created one instance of the trace collector service as well. After this instantiation the parallel job starts at the selected providers and the trace collector continuously saves event data into a global trace file.
The scenario of Fig. 4 begins at this point.

Fig. 4. PROVE as a GT3 Grid service

First the client application sends a service instantiation request to the GT3 container of the PROVE service (1). This request – like all others during this scenario – travels between the client and the PROVE provider as a SOAP message transmitted over HTTP. As a response to the request, the container creates an instance of the PROVE service (2) and sends a reference to it back to the client (3). The client application can now generate a stub to the newly instantiated service and send a visualisation request to it (4). The service, just as in the previous versions, downloads the trace file from its collector (5) and sends it for parsing to the C library (6). The C layer merges the data with the local trace (7) and generates the requested picture (8). The Java service then sends back the URL of the image file (9), which the client application finally downloads and presents in a client-side graphical window (10). Another feature of the GT3 framework is the notification infrastructure. In the GT3 version of PROVE this feature is used to make job observation possible. The integrated version of PROVE can observe parallel applications, which means that it checks the global trace file on a regular basis and automatically performs visualisation when new entries appear. Since now the trace file is situated on the collector host and not on the one that runs PROVE, it would be difficult to regularly check the content of this file through the network. Instead of this obvious – but resource consuming – solution we applied the GT3 notification framework. The trace collector sends a notification message to the PROVE host every time new data arrives from the executor sites. Based on these notification messages the PROVE service always knows whether the locally cached trace is up to date or not. If not, it downloads the new data, so when a client requests a new visualisation the image can be generated at once. This solution for keeping the data current can be applied only in the GT3 version, since with servlets only the client side can initiate communication with the server; the reverse is impossible [3].
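The following plain Java interface sketches the operations a per-client PROVE service instance could expose, combining the stateful zoom protocol and the notification-driven cache refresh described above. In GT3 such a port type would be described in WSDL and accessed through generated stubs, which we omit here; all operation names are illustrative assumptions, not the actual PROVE interface:

```java
/**
 * Hypothetical view of the operations of one per-client PROVE service
 * instance. The instance remembers its previous results, which is what
 * makes the compact zoom request possible.
 */
public interface ProveInstance {

    /** Full visualisation request: returns the URL of the rendered image. */
    String visualise(String traceHandle, String viewParameters);

    /**
     * Zoom into the previous result. Because the instance is stateful,
     * the client only sends coordinates relative to the last image; a
     * stateless servlet would need the full view description again.
     */
    String zoom(double x1, double y1, double x2, double y2);

    /**
     * Callback used with the notification infrastructure: the trace
     * collector notifies the instance when new event data has arrived,
     * so the cached trace can be refreshed before the next request.
     */
    void traceUpdated(long newGlobalTraceLength);
}
```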

5 Conclusion and future work

This paper introduced the process that was applied to separate PROVE from its wrapping P-GRADE environment. The presented result is our first step toward the long-term goal of separating every function of P-GRADE into stand-alone Grid services. Since we would like to use fully OGSA-compliant services to achieve this goal, the GT3 based version of PROVE fits perfectly into this scheme. After the full separation process is finished, P-GRADE will be able to appear in the Grid as a "super service" that can control every member of its underlying Grid service set. Another important purpose is to integrate a modified version of the PROVE service into a Jini based grid framework [14]. In such an environment PROVE can appear as a Jini service, and based on its functionality Jini clients can use a flexible application monitoring infrastructure.

References

1. Z. Balaton, P. Kacsuk, N. Podhorszki and F. Vajda: From Cluster Monitoring to Grid Monitoring Based on GRM, Proc. of the 7th EuroPar'2001 Parallel Processing, Manchester, UK, 2001, pp. 874–881.
2. I. Foster, C. Kesselman, J. Nick and S. Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, Globus Project, 2002, www.globus.org/research/papers/ogsa.pdf
3. M. Harding: Servlets: A Technical Discussion, presentation at the Software Forum Java Developers Special Interest Group meeting, Palo Alto, CA, USA, 2/3/1998.
4. GridLab Monitoring work package (WP11). Available at: http://www.gridlab.org/WorkPackages/wp-11/index.html
5. P. Kacsuk: Performance Visualization in the GRADE Parallel Programming Environment, Proc. of the 5th International Conference/Exhibition on High Performance Computing in Asia-Pacific Region (HPC Asia 2000), Peking, 2000, pp. 446–450.
6. P. Kacsuk, G. Dózsa and R. Lovas: The GRADE Graphical Parallel Programming Environment, Parallel Program Development for Cluster Computing: Methodology, Tools and Integrated Environments, Nova Science Publishers, 2001, pp. 231–247.
7. P. Kacsuk: Parallel Program Development and Execution in the Grid, Proc. of PARELEC 2002, International Conference on Parallel Computing in Electrical Engineering, Warsaw, Poland, 2002, pp. 131–138.
8. B. Kurniawan: How Servlet Containers Work, available at: http://java.sun.com/products/servlet/docs.html
9. É. Maillet: Tape/PVM: An Efficient Performance Monitor for PVM Applications. User's guide, LMC-IMAG, Grenoble, France, 1995. Available at: http://www-apache.imag.fr/software/tape/manual-tape.ps.gz
10. N. Podhorszki and P. Kacsuk: Design and Implementation of a Distributed Monitor for Semi-on-line Monitoring of VisualMP Applications, Distributed and Parallel Systems: From Instruction Parallelism to Cluster Computing, Kluwer Academic Publishers, 2000, pp. 23–32.
11. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham and C. Kesselman: Open Grid Service Infrastructure Version 1.0, Global Grid Forum, Draft 4/5/2003, http://www.gridforum.org/ogsi-wg/drafts/draft-ggf-ogsi-gridservice-29_2003-04-05.pdf
12. P-GRADE Graphical Parallel Program Development Environment: http://www.lpds.sztaki.hu/pgrade
13. The Globus Project: http://www.globus.org
14. JGrid project: http://pds.irt.vein.hu/jgrid
