Evaluating Tools for Performance Modeling of Grid Applications

Mariela Curiel*, Gustavo Alvarez, and Leonardo Flores

Universidad Simón Bolívar, Departamento de Computación y Tecnología de la Información, Apartado 89000, Caracas 1080-A, Venezuela
[email protected], {gjalvarez, floresm.leonardo}@gmail.com

* This work is partially supported by FONACIT, project S1-2002000560.

Abstract. A Grid is a collection of heterogeneous distributed computing resources for solving large-scale computational and data-intensive problems. It is a dynamic environment in which resource attributes, such as load, change constantly, hindering performance evaluation activities. Performance models can be a solution to this problem because they provide a way of performing repeatable and controllable experiments. Several tools have been developed for modeling scheduling algorithms in Grids. We believe, however, that if these tools are to be used for modeling application performance they should be improved by adding some particular features. In this paper we identify such features and evaluate two modeling tools based on them. These tools are used to represent the execution of applications in the Grid SUMA.

1 Introduction

A Grid is a collection of heterogeneous and geographically dispersed computing resources connected by a network, possibly at different sites and organizations. The Grid middleware provides transparent access to resources and, in general, deals with the physical characteristics of the Grid. Grids have a dynamic nature, i.e., some performance characteristics, such as load, may change over time because resources are shared with other applications. This behavior causes performance degradation and makes it difficult to evaluate performance. Application performance analysis is crucial for obtaining high performance. Although factors such as network load or bad scheduling decisions may cause problems, one of the main sources of poor performance is wrong design decisions. One can repair such problems after finishing the application development (tuning) or during the software development process. In the first case, one modifies algorithms, data structures, compiler options, etc., and then observes the results. Additional runs should be made under similar conditions in order to evaluate the effect of each change. However, it is impossible to get repeatable results in Grid experiments. Fortunately, modeling allows us to count on a controllable experimentation environment. Additionally, one can use a model to design applications whose performance is tailored to the dynamic Grid nature. Constructing a representative application model may imply modeling the Grid middleware. Some simulation packages have been developed for modeling Grid scheduling strategies. The objective of this research is to evaluate different tools for modeling Grid applications. The idea is to show drawbacks and strengths that can be useful for improving existing tools or developing new ones. In this work we start by choosing tools from both the Client/Server, distributed applications domain (LQNM analytical [1] and simulation solvers [2]) and the Grid domain (GridSim [3]). The tools are used to model sequential and parallel applications of the Grid SUMA [4].

2 Desirable Characteristics in a Modeling Tool

In this section we present a list of desirable features to support application performance modeling and analysis. The list is not exhaustive and can be enriched as the research progresses. We claim that Grid modeling tools should:

1. Offer capabilities for constructing a representative Grid model. This means providing facilities for modeling basic Grid elements: (a) the network topology, including different graphs, bandwidths and latencies; (b) compute resources, memories, storage resources and other kinds of resources; (c) aspects related to the dynamic nature of the Grid, i.e., load and unavailability of resources; (d) middleware layers; (e) other particular Grid characteristics, such as reservation of resources or location in any time zone.

2. Offer capabilities for easily constructing a representative application model. This includes the possibility of modeling: (a) simple sequential applications that only need special resources or services; (b) different models of distributed and parallel applications; (c) stochastic behavior. Regarding (b), [5] classifies parallel Grid applications into four groups: loosely coupled (compute intensive, with low memory requirements, a small amount of data per task and little communication between tasks), pipelined (very memory and data intensive, with coarse-grained inter-task communication), tightly synchronized (frequent inter-task synchronization, significant computation and memory/data usage) and widely distributed (applications that update and/or unify distributed databases; they have small computational, data and memory requirements).

3. Offer enough metrics for performance analysis. These include classical metrics such as response times, waiting times, number of I/O operations, number of network communications, bytes transferred in I/O or network communications, residency times and resource utilizations, as well as new Grid metrics. The resulting data should be provided both in text format and through graphical tools.

4. Be easy to use. Tools should offer complete and clear documentation, user support and user-friendly interfaces. Additionally, it could be interesting to provide high-level models closer to application developers (UML diagrams, MSCs, etc.), as well as the algorithms to transform them into performance models.

5. Be efficient. The dynamics of a Grid is complex, and it is possible to find NP-complete problems, for example in routing or scheduling strategies, that cannot be treated by analytical techniques. However, diverse kinds of programs run on the Grid; some of them can be very simple, for example requesting a cluster for running a rigid parallel application. In such cases an analytical model may be enough. It is therefore recommendable to provide diverse methods for solving a variety of problems. When simulation is the only option, solutions such as parallel simulation should be explored.

3 Selected Tools

Our final goal is to discover the presence or absence of the mentioned characteristics in an important number of modeling tools (mainly oriented to Grid and distributed systems modeling). Reviewing each tool exhaustively requires installing and using it. Due to a lack of time, we started by evaluating a reduced number of tools. BeoSim, Bricks, SimGrid, GridSim, ChicSim and OptorSim are popular simulation tools frequently referenced in the Grid-related bibliography (see references in [6]). BeoSim, Bricks and ChicSim were discarded because they are not currently available to users. For the time being, we ruled OptorSim out because it is designed for data Grid modeling and we want to model a computational Grid. Although the LQNM analytical and simulation solvers are oriented to Client/Server applications, they were chosen because of the possibility of building analytical models. Between GridSim and SimGrid we first took GridSim for two main reasons: GridSim is Java-based and it apparently has more capabilities for Grid modeling. SimGrid, OptorSim and other simulation tools will be evaluated subsequently. The next paragraphs explain the characteristics of the LQNM solvers and GridSim in detail.

Layered Queuing Network Models (LQNM) are QNM extended to reflect interactions between client and server processes. We chose the LQNM tools (lqns version 3) for the following reasons: 1) The application model can be easily constructed by non-experts in performance evaluation: the tools are embedded in an SPE methodology that derives Layered Queuing Network Models from system scenarios described by means of Use Case Maps (UCM); free software packages are available for process automation under this methodology. 2) Models can be solved by simulation or analytical techniques: the Layered Queuing Network Solver (LQNS) solves the model analytically, whereas the ParaSol Stochastic Rendez-Vous Network Simulator (ParaSRVN) uses simulation. An LQNM can be represented by a graph with nodes for Tasks and Devices, and arrows for service requests. A Task (parallelograms in figure 1) is a software object that has its own thread of execution. Tasks are divided into three categories: Client Tasks (only send requests), Active Server Tasks (can receive and send requests) and Pure Server Tasks (only receive requests). There are three types of interactions between Tasks: asynchronous messages, synchronous interactions and forwarding messages. In a forwarding call, the sending Client Task makes a synchronous call and blocks until it receives a reply. The receiving Task partially processes the call and then forwards it to another server, which becomes responsible for sending a reply to the blocked Client Task. Tasks receive any kind of request message at points called Entries (smaller parallelograms in figure 1). A Task has a different Entry for every kind of service it provides. Internally, an Entry can be composed of sequences of smaller computational blocks called Activities (rectangles), which are related in sequence, loop, parallel configurations, etc.

GridSim [3] is a toolkit for modeling and simulation of heterogeneous resources, users, applications, brokers and schedulers in a Grid computing environment. GridSim was chosen for the following reasons: 1) It is one of the most popular simulation tools for Grid research. 2) It is based on Java, which is a popular language. 3) GridSim is an active project. 4) According to the documentation, GridSim has interesting features for modeling Grid environments, for example: (a) it allows modeling heterogeneous types of resources operating in space- or time-shared mode; (b) resources can be located in any time zone and booked in advance; (c) different parallel application models can be simulated. GridSim adopts a multi-layered design architecture. The bottom layer is the Java interface and the JVM. The second layer is SimJava, which provides an event-driven discrete-event simulation package on top of the JVM to drive the simulation for GridSim. The third layer is the GridSim toolkit itself, which provides the modeling and simulation of core Grid entities, such as resources and Grid Information Services, using the events of the second layer. The simulation of resource aggregators, called Grid brokers or schedulers, is provided by the fourth layer. The top layer focuses on application and resource modeling with different scenarios to evaluate scheduling and resource management policies, heuristics and algorithms. Applications in GridSim are modeled as a number of work packets called Gridlets. A Gridlet is a package that contains all the information related to the job and its execution management details, such as the job length, expressed in MI (or SPEC-based units), disk I/O operations, the sizes of the input and output files, and the job originator. Grid resources, users and brokers are modeled as Entities, and they communicate via messaging events using the advanced network features. Synchronous and asynchronous messages are allowed. A small sketch of a Gridlet description is shown below.
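To make the Gridlet abstraction concrete, the following minimal sketch describes a job as a Gridlet. The numeric values are hypothetical, and the constructor follows the pattern used in the GridSim 4.0 example programs; exact signatures should be checked against the toolkit version in use.

```java
import gridsim.Gridlet;

public class GridletExample {
    public static void main(String[] args) {
        int id = 1;
        double lengthMI = 4200;      // job length in millions of instructions (hypothetical value)
        long inputSizeBytes = 300;   // size of the input file sent to the resource
        long outputSizeBytes = 300;  // size of the output file returned to the user

        // A Gridlet bundles the job description that a user entity submits to a GridResource.
        Gridlet gridlet = new Gridlet(id, lengthMI, inputSizeBytes, outputSizeBytes);
        gridlet.setUserID(0);        // identifies the originating user entity

        System.out.println("Gridlet " + gridlet.getGridletID()
                + " length = " + gridlet.getGridletLength() + " MI");
    }
}
```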

4 SUMA

SUMA (Scientific Ubiquitous Metacomputing Architecture) [4] is a computational Grid that transparently executes Java bytecode on remote machines, with additional support for scientific computing development. SUMA provides access to both single-process (possibly multi-threaded) and parallel Execution Agents, according to the JVM and mpiJava execution models. A user invokes the execution of a program in SUMA through the services suma Execute (on-line execution mode) or suma Submit (off-line execution). We modeled only the scenarios associated with suma Execute. The steps of the suma Execute service are as follows. When a Client wants to send an execution request to SUMA, it must first find a Proxy by invoking the findProxy method in the Scheduler. The Scheduler finds an appropriate Proxy and returns a CORBA reference to the Client. Then the Client invokes the execute method in its Proxy, passing the name of the main class as a parameter. This Proxy invokes user authentication methods in the User Control and asks the Scheduler for a suitable Execution Agent, getting a CORBA reference to the Execution Agent. Execution Agents run on servers, typically high performance machines, and execute user applications. The Proxy then invokes the execute method in the selected Execution Agent, passing all the information the Execution Agent needs to start loading application classes and files; this is done by invoking appropriate methods directly in the Client. As a result of the execute method invocation, the Execution Agent starts a Slave (a new virtual machine) and obtains its CORBA reference. This reference is sent back to the Proxy and subsequently to the Client. The Client uses this reference to open a connection with the Slave in the Execution Agent. Once the connection is established, the Client orders the execution of the application. When the execution finishes, the Execution Agent sends the output files to the Client. Finally, the Execution Agent executes the releaseNode method in the Scheduler, indicating that it is available again. The execution model of parallel applications is similar, with one important difference: in the execution of sequential applications, classes and data are loaded from or sent directly to the Client; in parallel applications, only the Execution Agent loads classes from the Client, and the Slaves address class requests to the Execution Agent. I/O operations are performed directly between Clients and Slaves. A sketch of the client-side call sequence appears below.
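To make the protocol easier to follow, here is a minimal client-side sketch in Java. The method names (findProxy, execute, releaseNode) are taken from the description above; all types, signatures and the CORBA plumbing are assumptions made purely for illustration and do not reflect the actual SUMA API.

```java
// Hypothetical client-side view of the suma Execute protocol; the types and
// signatures below are illustrative stand-ins, not the real SUMA interfaces.
public final class SumaExecuteSketch {

    void runOnline(Scheduler scheduler, String mainClass) {
        // 1. Ask the Scheduler for a Proxy (a CORBA reference in the real system).
        Proxy proxy = scheduler.findProxy();

        // 2. Request execution; the Proxy authenticates the user, asks the Scheduler
        //    for an Execution Agent and forwards the request to it. The Execution
        //    Agent starts a Slave JVM, whose reference travels back through the
        //    Proxy to the Client.
        Slave slave = proxy.execute(mainClass);

        // 3. The Client talks to the Slave directly: it serves class-loading and
        //    I/O requests and finally orders the execution to start.
        slave.startExecution();

        // 4. When execution ends, output files flow back to the Client and the
        //    Execution Agent calls releaseNode on the Scheduler (server side,
        //    not shown here).
    }

    // Placeholder interfaces standing in for the CORBA stubs.
    interface Scheduler { Proxy findProxy(); }
    interface Proxy { Slave execute(String mainClass); }
    interface Slave { void startExecution(); }
}
```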

5 LQNM of SUMA Applications

Figures 1 and 2 show the LQN models of the execution of sequential and parallel applications. The sequential model has seven layers. The Tasks in each layer represent the main SUMA components (Client, SUMA Core and Execution Agent) and the network between each pair of components. Tasks contain Entry points and Activities associated with the main functions of the components. The SUMA Core, for example, finds a suitable Proxy (findProxy Activity), verifies the User identity (VerifyUser Activity), requests an Execution Agent (RequestEA Activity) and requests the execution of the application (RequestExecution Activity). In the Execution Agent one can observe the Activities related to the execution of the application: there is a loop (represented by a circle between the StartExecution and EndExecution Activities) in which the JVM executes pieces of computation (Execution) followed by a network operation for either loading classes (ClassLoading) or executing I/O operations (I/O). The four top layers of the parallel model are similar to the sequential layers. The represented parallel application runs on two nodes (Tasks PN1 and PN2). The Execution Agent creates Slaves in each parallel node (Activities a1 and a2), starts their execution and receives class requests from the Slaves (CV Activity). The code executed by each parallel Slave is modeled as a loop with the following kinds of activities: some instructions (Execution Activity) and a network operation for loading classes (ClassLoading Activity), communicating with another parallel node (ComPNi Activity) or executing I/O operations (I/O Activity).

5.1 Model Development, Parameterization and Validation

The main problems found during model development were the following.

1) Too many layers: both models have at least seven layers. It was necessary to duplicate the Client in both models because the Client executes a synchronous call (the execute command) and therefore cannot respond to requests from the Execution Agent. The Client and its clone run on the same processor. Using asynchronous calls avoids blocking the Client, but then it is impossible to obtain the application response time (one cannot insert instructions to record the starting and ending times). Another possibility is to use forwarding calls; however, these calls are only allowed between Entries, making the Activities useless. We prefer Activities because they allow us to clearly represent the different functions of the SUMA components. There are many copies of the Network Task for similar reasons. On the other hand, some single calls were not explicitly represented in order to avoid further increasing the number of layers: for example, the callback between the Execution Agent and the SUMA Core to report the end of the execution (the releaseNode method) would require a duplicate of the SUMA Core.

2) There is no explicit network model.

3) It was difficult to generate the models automatically: the tools still seem to have some problems generating models of complex systems. However, constructing the model was very easy starting from the UCMs and using the method proposed by Petriu in [7].

4) LQNM are not well suited to modeling parallel applications: because of its complexity, the parallel model cannot be solved by analytical methods. Moreover, the code of each parallel node must be written explicitly, which can be tedious when there is a large number of nodes.

Model parameters were obtained from the Java Grande Forum benchmarks JGFCryptBench, JGFHeapSortBench, JGFSeriesBench and JGFSparseMatmult (sizes A and B). We conducted experiments by running the SUMA modules on three machines with the following characteristics: dual Pentium III processors at 666 MHz with 504 MB of RAM, connected by a LAN (10BaseT Ethernet). We ran each benchmark ten times without interference from other applications. The parameterization process was very easy: model parameters are expressed in time units, so most of them can be obtained directly from the SUMA monitoring tool. However, one must have the application available in order to run it and obtain the parameters. The SUMA monitoring tools also offer data about the SUMA Core components. After parameterizing the model we solved it with the LQNS and with ParaSRVN. The sequential models produced errors below 11.8%; the prediction error for a parallel application was 18%. The parallel model predictions can be improved by enhancing the input parameters.

6 GridSim Model of SUMA Applications

We used GridSim 4.0 to build the first GridSim models of the sequential and parallel applications. In the models we use elements of the third layer of the GridSim architecture. The SUMA components (Client, Scheduler, Proxy, UserControl and Execution Agent) are Entities that extend the GridSim class. Entities are registered with the Grid Information Service. Scheduler, Proxy and UserControl use a GridResource Entity to simulate their processing tasks (i.e., to find a Proxy, to validate Users, etc.). The process of creating a GridResource is as follows: first, Processing Element (PE) objects are created with a suitable MIPS rating. PEs are assembled together to create a Machine; the GridSim Machine class represents a uniprocessor or shared-memory multiprocessor machine. One or more Machines form a GridResource. The SUMA Core components submit Gridlets to the GridResource SumaCore (figure 3). In the sequential model, a GridResource with a single Machine and one or more PEs is bound to the Execution Agent. The execution of one application is modeled by means of a loop in which some instructions (simulated by a Gridlet submitted to a GridResource) are followed by messages to the Client Entity to either load classes or execute I/O operations. A network topology was created to allow the Entities to communicate: each Entity defines an instance of the Link class, and an instance of the Router class is created to forward data from one Entity to another. Figure 3 shows the sequence of messages among the SUMA Core Entities before executing the application. In the parallel model each Slave has its own associated GridResource; each GridResource has one Machine object and one PE. A sketch of the resource construction appears after the figure captions below.

Fig. 1. Sequential LQNM

Fig. 2. Parallel LQNM

Fig. 3. Preliminary steps of the sequential and parallel execution in GridSim
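As an illustration of this construction process, the following sketch creates a single-machine, single-PE GridResource like the one bound to the Execution Agent. It follows the pattern of the example programs distributed with GridSim 4.0; the resource name, MIPS rating and calendar-related values are hypothetical, and the exact constructor signatures should be checked against the toolkit version in use.

```java
import gridsim.*;
import java.util.Calendar;
import java.util.LinkedList;

public class ExecutionAgentResource {
    public static void main(String[] args) {
        try {
            // Initialize the GridSim package before creating any entity.
            int numUsers = 1;
            Calendar calendar = Calendar.getInstance();
            GridSim.init(numUsers, calendar, false);

            // 1. Create a Processing Element with a suitable MIPS rating (hypothetical value).
            PEList peList = new PEList();
            peList.add(new PE(0, 377));

            // 2. Assemble PEs into a Machine (uniprocessor or shared-memory multiprocessor).
            MachineList machineList = new MachineList();
            machineList.add(new Machine(0, peList));

            // 3. Describe the resource: architecture, OS, allocation policy, time zone, cost.
            ResourceCharacteristics characteristics = new ResourceCharacteristics(
                    "Pentium III", "Linux", machineList,
                    ResourceCharacteristics.TIME_SHARED, -4.0, 0.0);

            // 4. Build the GridResource entity; baud rate, seed and load/calendar
            //    parameters are placeholders.
            LinkedList weekends = new LinkedList();
            weekends.add(new Integer(Calendar.SATURDAY));
            weekends.add(new Integer(Calendar.SUNDAY));
            LinkedList holidays = new LinkedList();
            GridResource eaResource = new GridResource("EA_Resource", 10000.0, 11L,
                    characteristics, 0.0, 0.0, 0.0, weekends, holidays);

            System.out.println("Created resource " + eaResource.get_name());
            // In a complete model, user entities and Gridlets would be created here
            // and GridSim.startGridSimulation() invoked to run the simulation.
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```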

6.1 Model Development, Parameterization and Validation

The modeling process was somewhat difficult despite the programmers' experience in Java: GridSim has a steep learning curve, and knowledge of object-oriented programming and Java is required. On the other hand, once the GridSim architecture and philosophy are understood, constructing the models is easy because there is a direct correspondence between SUMA components and GridSim Entities. The use of asynchronous messages allows calls between Entities in both directions (calls and callbacks) without duplicating components. Since the model is a Java program, one can insert special instructions anywhere to obtain execution times, and many identical parallel nodes can be added by changing a single parameter.

We executed the benchmarks used to parameterize the LQNM on five machines with the following characteristics: an AMD Athlon 64 3800 with 1 GB of RAM (running the Client) and Pentium IV 3.4 GHz machines with 512 MB of RAM (for the Scheduler, the User Control, the Proxy and the Execution Agent); the machines are connected by a LAN (10BaseT Ethernet). The main static GridResource parameters were the architecture, operating system, MIPS rating, number of machines and number of PEs. The first two parameters were obtained from OS commands. SiSoftware Sandra Lite 2007 (http://www.sisoftware.co.uk) was used to compute the MIPS rating of each machine. The number of machines and PEs depends on the particular Grid architecture. The main parameters of the Link class are the delay (measured with the ping command), the MTU (standard Ethernet) and the bandwidth (obtained from the router specifications). The main drawback was obtaining the gridletLength value for each SUMA component. The gridletLength is expressed in MI (millions of instructions), and it is difficult to know the value of this parameter for a Java program. The solution was to measure the program, obtain its execution times and convert the measured times into MI. Prediction errors were below 7.9%. A sketch of the time-to-MI conversion is shown below.
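The conversion from a measured execution time to a Gridlet length is a simple multiplication by the machine's MIPS rating. The following sketch illustrates the idea; the numeric values are hypothetical.

```java
public class GridletLengthEstimator {

    /**
     * Converts a measured execution time into a Gridlet length in MI
     * (millions of instructions): length = time (s) * rating (MIPS).
     */
    static double toMillionInstructions(double measuredSeconds, double mipsRating) {
        return measuredSeconds * mipsRating;
    }

    public static void main(String[] args) {
        // Hypothetical values: a benchmark phase measured at 2.5 s on a
        // machine rated at 3000 MIPS by the benchmarking tool.
        double gridletLength = toMillionInstructions(2.5, 3000.0);
        System.out.println("gridletLength = " + gridletLength + " MI");
    }
}
```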

7 Comparison of Tools

After using the selected tools to develop the SUMA models, it is possible to compare them based on the "desirable characteristics":

1. Capabilities for constructing a representative Grid model: in this respect GridSim is so far one of the most complete tools. Some aspects that could be added are: (a) new classes that allow modeling of additional software components (middleware, operating systems, etc.); (b) improvements to the hardware resource models (more complex network topologies, different network protocols, memory models, etc.); (c) background load on processors with probabilistic behavior. The LQNM tools were not designed for modeling Grid systems. They could be useful for modeling small Grids (for example, intra-organizational Grids) or other kinds of distributed systems, such as clusters, with simple application models. Some features could be added to improve distributed systems models, for example network models.

2. Capabilities for easily constructing a representative model of the application: with respect to this characteristic, GridSim also has many advantages, since it allows us to model diverse kinds of parallel and distributed applications. [8] describes GridSim extensions for simulating data Grids. GridSim is based on deterministic simulation where no random events occur. This is a drawback because it limits the types of applications that can be modeled. However, the GridSimRandom class and the eduni.simjava.distributions package can be used to incorporate randomness into the data; this randomness would require additional processing of the outputs. The LQNM tools seem suitable for modeling sequential and some kinds of distributed applications. Stochastic behavior can be included in analytical and simulation models by specifying means and variances.

3. Metrics for performance analysis: GridSim provides a small set of metrics. The metrics about Gridlet processing are CPU time, wall clock time and waiting time. There are no explicit metrics for throughput or resource utilization. There is, however, a variety of metrics in the new network models. GridSim output should be improved by incorporating new metrics. The LQNM tools offer diverse metrics to evaluate application performance (response times, waiting times, device utilizations, etc.) and simulation results (confidence intervals).

4. Ease of use: GridSim has a steep learning curve. [9] describes a Java-based Graphical User Interface (GUI) tool for GridSim, which aims to shorten the learning process and enables fast creation of simulation models. Future research projects could be oriented toward providing higher-level models and methodologies to transform them into GridSim models. Higher-level parameters that can be transformed internally should also be included. These features would help application designers and developers without experience in performance evaluation or Java programming. The GridSim documentation is good. LQNM can be easily constructed from Use Case Maps, so application developers/designers do not need to be queuing network experts.

5. Efficiency: GridSim uses serial simulation. Techniques to reduce simulation times should be incorporated into Grid modeling tools: parallel simulations and the combination of analytical and simulation approaches. Analytical LQNM solution meaningfully reduces model execution times; however, approximate analytical techniques can only be used for very simple application models.

8 Conclusions

The aim of our research is to evaluate simulation tools for performance modeling of Grid applications. We have used two sets of tools to model applications that run on the Java-based computational Grid SUMA. One set of tools solves LQNMs. These tools were not designed to model Grid environments but offer some advantages for modeling sequential and distributed applications (especially Client/Server applications): 1) the possibility of obtaining both analytical and simulation results; 2) LQNM can be derived from Use Case Maps; 3) the tools provide several metrics for evaluating application performance and the quality of simulation results. On the other hand, tools like GridSim allow us to model in detail diverse aspects of Grid platforms and a variety of parallel application models. They have many of the "desirable characteristics", but three aspects need to be improved: efficiency, the set of metrics and the ease of use. These improvements would help designers and application developers construct correct models and obtain results in short times. Future research includes the evaluation of further tools and the modeling of different kinds of Grid applications.

References

1. Franks, G.: Performance Analysis of Distributed Server Systems. PhD thesis, Carleton University (2000)
2. Mascarenhas, E.: A System for Multithreaded Parallel Simulation with Migrant Threads and Objects. PhD thesis, Purdue University (1996)
3. Sulistio, A., Poduval, G., Buyya, R., Tham, C.: Constructing a grid simulation with differentiated network service using GridSim. In: Proc. of the 6th International Conference on Internet Computing (ICOMP'05) (2005)
4. Cardinale, Y., Curiel, M., Figueira, C., García, P., Hernández, E.: Implementation of a CORBA-based metacomputing system. In: Proc. of the Workshop on Java for High Performance Computing, LNCS (2001)
5. Snavely, A., Chun, G., Casanova, H., Van der Wijngaart, R., Frumkin, M.: Benchmarks for grid computing: a review of ongoing efforts and future directions. SIGMETRICS Perform. Eval. Rev. 30(4) (2003) 27–32
6. Quétier, B., Cappello, F.: A survey of grid research tools: simulators, emulators and real life platforms. In: Proc. of the 17th IMACS World Congress (IMACS 2005), France (2005)
7. Petriu, D.C., Woodside, C.: Software performance models from system scenarios in use case maps. In: Proc. of TOOLS, Springer Verlag, LNCS 794 (2002) 159–177
8. Sulistio, A., Cibej, U., Robic, B., Buyya, R.: A Toolkit for Modeling and Simulation of Data Grids with Integration of Data Storage, Replication and Analysis. Technical Report GRIS-TR-2005-13, University of Melbourne (2005)
9. Sulistio, A., Yeo, C.S., Buyya, R.: Visual modeler for grid modeling and simulation (GridSim) toolkit. In: Proc. of the 3rd International Conference on Computational Science (ICCS 2003), Springer Verlag, LNCS (2003)
