Grid-user driven grid research, The CoBRA grid

P. Hellinckx, G. Stuer, W. Hendrickx, F. Arickx and J. Broeckhove
Computational Modelling And Programming, Antwerp University
Middelheimlaan 1, 2020 Antwerp, Belgium
E-mail: [email protected]

Abstract

A new multi-purpose LightWeight Grid system (LWG) is introduced, under the acronym CoBRA grid (Computational Basic Reprogrammable Adaptive grid). It provides the functionality grid users require, and offers grid developers the opportunity to test and alter grid components while the grid is in use. The grid was proven to operate properly by testing several applications and introducing new components. The performance is tested by the “sleep testing” technique, and by implementing a quantum physics calculation. The user friendliness is also evaluated with the latter distributed computational problem. The possibility of porting the system onto well established systems like CONDOR and GLOBUS is discussed.

1. Introduction

Distributed computing techniques can be categorised into different levels of abstraction built on top of heterogeneous hardware (see Fig. 1). At the lowest level, low level programming language features, such as sockets, transfer raw data from one resource to another. To put semantics on the raw data, and to structure the communication process, protocols were introduced. Protocols like SOAP [1], HTTP [2], IIOP [3], MPI [4] and ssh [5] provide interaction between grid entities and permit the transfer of typed data.

On top of these low level communication mechanisms, multiple categories of distribution abstractions were introduced. A first category contains the communication applications like ssh clients. They are generally used by script based distributed applications to communicate with, and transfer data to, the different available resources.

A second category of distributed applications can be distinguished as application dependent grids (ADGs). Such grids solve one particular type of computational problem. APST [6][7] and Nimrod [8] are examples of distributed applications created to distribute and compute parameter sweeps.

HeavyWeight Grids (HWGs) constitute a third category of grid systems. These grid technologies have three main strengths: robustness, reliability and control mechanisms. These strengths allow them to manage a large number of resources, providing extensive computational power, storage, etc. Projects such as GLOBUS [9][10][11] (used by EGEE [12]) and BOINC [13] (used by Seti@Home [14]) demonstrate this. The main drawback of a HWG is its deployment and maintenance cost: the effort needed to install, or even to update, often demands many man-hours and considerable system down time.

The fourth category are the LightWeight Grids (LWGs). These attempt to combine low deployment costs with the strengths of a HWG. We distinguish two groups of LWGs. Low Level LWGs (LL-LWGs) mainly focus on the scalability strength of a HWG, whereas High Level LWGs (HL-LWGs) stress the support for robustness and monitoring mechanisms. LL-LWGs like H2O [15] and JINI [16] can share an enormous number of resources, but fail to connect and monitor them in a scalable way. HL-LWGs like CONDOR [17] are robust and straightforward to monitor, but need tricks like flocking to connect large numbers of resources.

Most distributed computing applications are deployed in the distributed resource consuming layer. Using a command line interface or programming language libraries, programs are split up into many parts and deployed on the different available resources by choosing a distribution technique which fulfils the user’s requirements.

Our intention is to explicitly address the needs of different kinds of users, ranging from developers to consumers, in the design of the CoBRA grid. This requirement does not fit into the scheme of Fig. 1. Monitoring and robustness are provided mainly by the HWG and HL-LWG categories, whereas the need for flexibility and adaptability only matches the LL-LWG category. It is to bridge this gap that the CoBRA grid system was developed. CoBRA provides a robust grid system based on a LL-LWG without sacrificing its flexibility and adaptability.

Fig. 1. Categories and abstraction layers in distributed computing

2. H2O as a LL-LWG

For the base layer of CoBRA we selected the H2O framework, as it was low level, almost without any undesired features, easy to use, and always one step ahead of JINI (at the time we started this project). H2O is a component-based, service oriented framework for distributed metacomputing. It is designed to cater for lightweight, general purpose, loosely coupled networks where resources are shared on a P2P basis. Adopting a provider-centric view of resource sharing, it provides a lightweight, general-purpose, Java-based, configurable platform. H2O adopts the microkernel design philosophy: resource owners host a software backplane (called a kernel) onto which owners, clients, or third-party resellers may load components (called pluglets) that deliver value added services. It is this last aspect that makes H2O unique: by separating the service provider from the resource owner, it becomes possible for any authorized third party to deploy new services in a running kernel.
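The kernel/pluglet separation can be illustrated with a small sketch. The following is purely illustrative and does not use the actual H2O API; the Kernel and Pluglet types and their methods are hypothetical stand-ins for the concepts described above.

    // Purely illustrative sketch of the microkernel idea described above.
    // Kernel, Pluglet and their methods are hypothetical, NOT the real H2O API.
    import java.util.HashMap;
    import java.util.Map;

    interface Pluglet {
        void start();                              // service entry point
    }

    class Kernel {                                 // the resource owner's software backplane
        private final Map<String, Pluglet> pluglets = new HashMap<>();

        // Any authorized party (owner, client or third-party reseller)
        // may load a new service into the running kernel.
        void deploy(String name, Pluglet p) {
            pluglets.put(name, p);
            p.start();
        }
    }

    class EchoService implements Pluglet {         // a trivial value-added service
        public void start() { System.out.println("echo service running"); }
    }

    class Demo {
        public static void main(String[] args) {
            Kernel kernel = new Kernel();          // hosted by the resource owner
            kernel.deploy("echo", new EchoService()); // deployed by a third party
        }
    }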

3. CoBRA Our intention is to explicitly address the needs of different kinds of users, ranging from developers to consumers, in the design of the CoBRA grid. This involves the introduction of a variety of interfaces, privileges, and abstraction levels, to let different types of users operate in parallel without interference.

3.1. Grid users & privileges

We distinguish three kinds of users:
- Resource Consumer: a scientist who needs the computing power of the grid.
- Grid Researcher: a researcher whose goal is to modify and improve components of the grid.
- Grid Developer: a developer of low level grid components such as file transfer, class loading, etc.

In accordance with this we introduce three levels of abstraction, pictured in Fig. 2: the Low Level Layer, the Middleware Layer and the User Layer. The Resource Consumers are allowed to extend the User Layer components and to use the Middleware Components to serve their implementation. The Grid Researcher has privileges for extending (in the object-oriented sense) User Layer and Middleware Layer entities, and can use the Low Level Layer components for this purpose. The Grid Developer is allowed to extend, use and alter every component on every level, i.e. to modify the basic building blocks and interfaces of the grid. A grid researcher can only extend existing components; the interfaces of those components stay intact and remain straightforward to use for the Resource Consumers. A sketch of this distinction follows below.
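The difference between extending and altering can be made concrete with a small example. The Scheduler class and its methods here are hypothetical, chosen only to illustrate the privilege model, not actual CoBRA components.

    // Hypothetical sketch of the privilege model; names are illustrative.
    abstract class Scheduler {                     // Middleware Layer component
        // Interface used by Resource Consumers; a Grid Researcher must keep it intact.
        abstract void submit(Runnable task);
    }

    class FifoScheduler extends Scheduler {        // shipped with the grid
        private final java.util.ArrayDeque<Runnable> queue = new java.util.ArrayDeque<>();
        @Override
        void submit(Runnable task) { queue.add(task); }
    }

    // A Grid Researcher may EXTEND a component: the submit() interface is
    // unchanged, only the scheduling policy differs.
    class DeadlineScheduler extends FifoScheduler {
        @Override
        void submit(Runnable task) {
            // ... reorder pending work by deadline (research experiment) ...
            super.submit(task);
        }
    }

    // Only a Grid Developer may ALTER Scheduler itself, e.g. change submit()'s
    // signature, since that breaks the interface Resource Consumers rely on.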

Fig. 2. Grid Layer Privileges of the users. White is “editable”, gray is “extendable”, dark gray is “usable”.

3.2. User Layer

The User Layer aims at making distributed programming straightforward for grid users by providing an abstract implementation of the two basic building blocks.

The first building block is the task. It implements an indivisible unit of a distributed program and can be computed by an available resource. The abstract Java class contains three abstract functions:
- getDataImpl(): by implementing this method the grid user defines the retrieval process of the input data.
- executeImpl(): this method implements the execution of the task. It is an independent subtask of the total computation.
- putDataImpl(): this method describes how the resulting data is returned to the appropriate storage location.

The task class contains a number of additional functions that can be used in the implementation of the above abstract functions. They provide an abstraction level on top of low level OS dependent operations like file management (further discussed in the Low Level Layer).

The second building block is called the TaskMaster (TM). This part of the User Layer implementation defines a solution to the problem distributed by the user. By implementing the three abstract functions below, the user divides the total problem into independent sub-tasks, submits those tasks, interprets their results and acts on these interpretations:
- getData(): by implementing this method the user defines the retrieval process of the input data needed by the TaskMaster.
- run(): the implementation of this abstract method, inherited from the abstract super class Runnable, defines the solution of the distributed problem. In this function the different tasks are created and submitted, and the task results are interpreted.
- putData(): this method describes how the resulting data is returned to the appropriate storage location.

The TaskMaster (TM) class contains a number of additional methods useful for the implementation of the above abstract functions. They provide an abstraction level on top of low level OS dependent or grid dependent operations like file management or task submission (further discussed in the Low Level Layer). A minimal sketch of both building blocks is given below.
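As an illustration, a minimal Resource Consumer implementation might look as follows. This is a sketch only: the base classes and helper methods below (transferIn, transferOut, submit, waitForAllTasks) are simplified stand-ins for the framework's additional functions, not the real CoBRA API.

    // Sketch of user code against the two building blocks described above.
    // The stand-in base classes stub out the framework helpers.
    abstract class Task {
        protected abstract void getDataImpl();   // retrieve input data
        protected abstract void executeImpl();   // do the actual work
        protected abstract void putDataImpl();   // store the result
        protected void transferIn(String f)  { /* framework helper (stub) */ }
        protected void transferOut(String f) { /* framework helper (stub) */ }
    }

    abstract class TaskMaster implements Runnable {
        protected abstract void getData();
        protected abstract void putData();
        protected void transferIn(String f)  { /* framework helper (stub) */ }
        protected void transferOut(String f) { /* framework helper (stub) */ }
        protected void submit(Task t)        { /* hand the task to the broker (stub) */ }
        protected void waitForAllTasks()     { /* block until all results arrived (stub) */ }
    }

    class SweepTask extends Task {
        private final String input;
        SweepTask(String input) { this.input = input; }
        protected void getDataImpl()  { transferIn(input); }
        protected void executeImpl()  { /* compute one independent subtask */ }
        protected void putDataImpl()  { transferOut(input + ".out"); }
    }

    class SweepMaster extends TaskMaster {
        protected void getData() { transferIn("sweep.cfg"); }
        public void run() {
            for (int i = 0; i < 1000; i++)
                submit(new SweepTask("input-" + i + ".dat")); // fork: independent tasks
            waitForAllTasks();                                // join: interpret results
        }
        protected void putData() { transferOut("summary.dat"); }
    }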

3.3. Middleware Layer

The Middleware abstraction level, built on top of the Low Level Layer, implements the different components needed to serve the TM and distribute the tasks (see Fig. 3). To support these User Layer requirements, three Middleware Components are essential: a Broker, a Resource and a Resource Manager.

Fig. 3. Deployment process of a distributed problem on the CoBRA grid

The Broker component is that part of the grid serving and monitoring the TaskMaster execution. It processes the TM related notification messages, and manages
user notification. Its interface, accessible by standard H2O connections, has two purposes. First of all it offers the grid users the opportunity to submit and execute an instance of a TM class. Secondly, it offers several functions to connect the abstraction of the User Layer with the Low Level Layer.

In order to control the different resources the grid contains Resource Components. The resource class is an abstract class that has to be implemented for every different kind of resource. Up to now only two kinds of resources are implemented (scheduler and worker) and a third one is under construction (GLOBUS gateway). The first one, the worker resource, enables a computer to serve a task created by a user. The standard Submit function of the resource interface places the task in a queue and serves it when its predecessors have finished. The remaining interface functions are used to connect the abstraction of the User Layer with the Low Level Layer. The second type of resource is a scheduler. This resource keeps track of the available resources in the grid and sends tasks to unoccupied available resources.

The last component of the Middleware Layer is the Resource Manager. Resource discovery is a key problem in grid infrastructures. Keeping track of all resources in a configuration file is static: it complicates adding and removing resources and prevents resource failure handling. Instead, the CoBRA grid implements a resource manager which monitors the available resources in the grid and reacts to any change. Using the low level event triggering mechanism explained below, it ensures that every affected middleware component reacts to changes in the available resources. When a resource fails, for example, the resource manager ensures that the resource’s tasks are rescheduled.

The problem of detecting and monitoring the available resources is solved by a software component [18] built on top of H2O. This combination of a pluglet and a JINI LUS offers the opportunity to register the available pluglets of H2O in a JINI LUS. This information allows the resource manager to detect the available resources by selecting those entries in the LUS which match a particular type of resource. This mechanism is more versatile than a static configuration, and is also applied to detect any component within the grid on demand. A sketch of the worker's queueing behaviour is given below.
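The worker resource's submit/queue behaviour described above could be sketched as follows; the WorkerResource class, the Resource base class and the method names are hypothetical, not the actual CoBRA interfaces.

    // Hypothetical sketch of a worker resource's submit/queue behaviour.
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    abstract class Resource {
        abstract void submit(Runnable task);
    }

    class WorkerResource extends Resource {
        // Tasks wait here until their predecessors have finished.
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        WorkerResource() {
            Thread server = new Thread(() -> {
                try {
                    while (true) {
                        queue.take().run();       // serve tasks one by one, in order
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop serving on shutdown
                }
            });
            server.setDaemon(true);
            server.start();
        }

        @Override
        void submit(Runnable task) {
            queue.add(task);                      // place the task in the queue
        }
    }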

3.4. Low Level Layer

The lowest level, built directly on top of H2O, is called the Low Level Layer. This abstraction level bridges the gap between the Grid Components and the
bare H2O software. In fact, it provides a number of capabilities that are not available in the H2O framework, but are indispensable for the construction of Middleware Components. Three key capabilities are the file system, code transfer and notification.

A robust file system is essential to manage input and output data. As scalability is an important issue in the construction of a grid, a decentralized file system is required. This leaves us with two options: either we create another abstraction layer by introducing a decentralized shared file system, or we construct a decentralized individual file system on each node. While the abstraction layer of the shared file system hides all file transfer and file location issues, it introduces parallel file access problems. We have chosen to construct the decentralized individual file system, because the latter problems are difficult to understand for the average grid user, and because of the high development cost of a shared file system.

The absence of any remote file management mechanism makes development directly on top of H2O hard to handle. To make deployment more straightforward, the file system was constructed in three individual layers on top of the standard Java file streams using the H2O technology (see Fig. 4). The H2O users and grid users use the CoBRA layer to manage files and to transfer data from one node to another. This layer manipulates the H2O file system layer to manage and transfer the data. It stores its data in a designated directory within the H2O file system. Each TM deployment on the grid will work in a different subdirectory inaccessible to other runs. The H2O file system layer manages data in a designated directory on each individual node and transfers data using the H2O File Stream layer, which extends the standard Java streams. It uses Java streams in combination with the H2O infrastructure to read and write data from/to a remote machine. Extending the streams from the standard Java streams makes the combination with buffered streams and other standard Java techniques straightforward.

Besides transferring data through the framework, there should also be a mechanism to transfer code (extensions and implementations of the Task and TaskMaster classes) through the framework. This seems straightforward, but is not. It would be easy if every middleware component had a definition of every piece of transferred code within its classpath. This would imply knowing the transferred code before deploying the Middleware Components, and adding all possible implementations of a TaskMaster and a Task to the classpath. This solution is neither dynamic nor scalable and therefore not an option.

Fig. 4. CoBRA filesystem abstraction layers

We solve this problem by adding code carriers. These carriers transport the marshaled Java code and the URI of its classpath, in combination with the system related notification infrastructure discussed below. When the carriers reach the deployment destination, they unmarshal the code using a classloader initialized with the appropriate classpath. They connect the code with the notification infrastructure and hand it over to the appropriate component (broker, worker, …) which will then serve the code. A sketch of this unmarshalling step is given below.

Notifying and acting on certain events can be implemented in different ways. We have opted for the event listener model, because this mechanism was already available in the lookup and discovery mechanism described earlier. If a component wants to act on a certain task event, it adds an implementation of the TaskListener class to the TaskCarrier. When a task is unmarshaled by its serving component, the listeners are connected to the task as described above. A TaskListener is in fact an implementation of an interface with an abstract function for every possible event. When events occur, they are handled in every component by calling the matching procedure in the listeners added to the TaskCarrier. Using the same mechanism, resource listeners are used to react to changes in the available resources. When resources die, their tasks are rescheduled on other resources; when new resources appear, they are added to the resource pool.
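The classloading step can be sketched with standard Java APIs. This is a simplified illustration, assuming a hypothetical carrier that holds the serialized task bytes and a classpath URI; the real carriers also wire up the notification infrastructure, and error handling is omitted.

    // Simplified sketch of unmarshalling carried code with a classloader
    // initialized from the carrier's classpath URI. TaskCarrier is hypothetical.
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectStreamClass;
    import java.net.URL;
    import java.net.URLClassLoader;

    class TaskCarrier implements java.io.Serializable {
        byte[] marshaledTask;   // the serialized Task instance
        URL classpath;          // where the task's class files can be fetched
    }

    class CarrierUnmarshaller {
        static Object unmarshal(TaskCarrier carrier)
                throws IOException, ClassNotFoundException {
            // Classloader that resolves the task's classes from the carried URI.
            URLClassLoader loader = new URLClassLoader(new URL[] { carrier.classpath });
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(carrier.marshaledTask)) {
                @Override
                protected Class<?> resolveClass(ObjectStreamClass desc)
                        throws IOException, ClassNotFoundException {
                    return Class.forName(desc.getName(), false, loader);
                }
            }) {
                return in.readObject(); // the Task, ready to be handed to its component
            }
        }
    }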

4. Sleep Testing

Testing distributed systems by using sleeps was introduced in [19]. The fundamental properties of the proposed testing technique are total problem range coverage, simulation correctness and easy definability. This means that it is possible to generate a simulation indistinguishable from the original problem, completely defined by a few parameters. By directly manipulating the fundamental building blocks of distribution problems, the tasks, the properties mentioned above are obtained. Fig. 5 shows a typical distribution scheme.

Fig. 5. Typical distribution scheme: x tasks of length ta distributed on y computers.

The scheme consists of ‘x’ different tasks which can all be computed independently of one another. There is no relation between the tasks and/or the computers that execute them. Different distribution problems are distinguished by their task list. Fulfilling the property of total problem range coverage comes down to dealing with this difference in task list; an easy way to generate every possible task list would solve this problem. This is done by replacing the actual tasks by a time-consuming dummy function (see Fig. 6).

Fig. 6. Dummy tasks: each task ‘a’ is replaced by a sleep of length ‘ta’.

This function occupies the processor for a given time. In the current version of our testing technique the tasks are replaced by a sleep function, which sleeps for the actual running time of the task. Applying this technique, one can build any task list one requires by generating the corresponding list of sleeps. Due to the number of tasks in a task list, each with specific properties, it is impossible to define such a list manually. To resolve this problem a test generator was built: given the appropriate input, it constructs an XML file [20] representing the intended task list. The main advantage of this approach is the possibility to test an infrastructure without generating an effective load, so it can be tested while in use by other users and/or programs. This also implies that the load generated on an idle infrastructure is produced by the distributed system itself, and can easily be measured and analyzed. A sketch of such a sleep task is given below.
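A sleep task is trivial to express in the Task abstraction of Section 3.2. The sketch below reuses the stand-in Task class from the earlier sketch; the framework's real dummy task is assumed to look similar.

    // A sleep-based dummy task; only executeImpl() does anything.
    class SleepTask extends Task {
        private final long millis;                 // the simulated running time ta

        SleepTask(long millis) { this.millis = millis; }

        protected void getDataImpl() { /* a dummy task has no input data */ }

        protected void executeImpl() {
            try {
                Thread.sleep(millis);              // stand in for ta of real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        protected void putDataImpl() { /* and no output data */ }
    }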

4.1. Setup

The infrastructure was tested using twenty-three computers. Table 1 shows the characteristics and deployed Middleware Components of the twenty-three systems. All are Pentium IV machines (1.7 GHz to 2.66 GHz), eleven with 512 MB and twelve with 1 GB of RAM. Twenty of them were used as worker machines, one as a combination of a scheduler and a resource manager, one as a broker and one to submit the code to this broker. In order to get a reliable characterization of the CoBRA grid, the infrastructure was tested with a variety of input parameters. An experiment has been conducted for each combination of the following parameters: number of tasks 100 and 1000; task lengths 1, 2, 5, 10, 20, 50, 100, 200, 500 and 1000 seconds. To obtain statistically relevant data, each experiment was repeated 10 times.

4.2. Results

In order to analyze the test results some unambiguous definitions are introduced:

- Performance P: the time needed to distribute and execute ‘x’ tasks of length ‘l’ on ‘y’ workers.

- Overhead OH: the performance P minus the amount of execution time of each task on the available workers:

  OH = P − (#tasks / #workers) × tasklength   (1)

- Percentage of overhead POH: the fraction of the performance time taken up by overhead:

  POH = OH / P   (2)

- Speedup SU: measures how much faster a distributed run is than a local run:

  SU = LocalTime / P   (3)

Table 2 shows the percentage of overhead (POH) and the speedup (SU) for 100 and 1000 tasks of different task lengths l. The higher POH with fewer tasks indicates that there is a startup cost independent of the number of tasks. The mean value of this startup overhead is about 13 seconds, but it varies from 9 to 20 seconds.

Table 2: Sleep Test Results

l (s)   POH_100   POH_1000   SU_100   SU_1000
1       80.49%    64.11%     3.90     7.18
2       59.30%    41.15%     8.14     11.77
5       40.26%    17.70%     11.95    16.46
10      22.81%    10.26%     15.44    17.95
20      15.58%    5.27%      16.88    18.95
50      6.89%     2.67%      18.62    19.47
100     3.89%     1.05%      19.22    19.79
200     2.15%     0.40%      19.57    19.92
500     0.77%     0.25%      19.85    19.95
1000    0.36%     0.13%      19.93    19.97
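As a consistency check on definitions (1)-(3), take the row l = 100 with 100 tasks on the 20 workers of the setup; a local run would take 100 × 100 = 10000 s:

  ideal distributed time = (#tasks / #workers) × tasklength = (100 / 20) × 100 = 500 s
  POH = 3.89%  ⇒  P = 500 / (1 − 0.0389) ≈ 520 s  and  OH = P − 500 ≈ 20 s
  SU = LocalTime / P = 10000 / 520 ≈ 19.2

which matches the tabulated SU_100 value of 19.22.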

Further analysis of the overhead (Fig. 7) indicates that the amount of overhead remains more or less constant when the task lengths increase. This indicates that the overhead is independent of the task length and consists only of a constant overhead per task and a startup cost. The mean overhead per task is 0.95 s and varies from ±0.7 to ±1.2 s.

Fig. 7. The performance P and overhead OH when distributing 100 sleep tasks of different task lengths.

From Table 2 one concludes that a distribution without data transfer increases the performance even for tasks with as little as 1 second of execution time. It becomes really effective, with a speedup above 19, in every situation where tasks with an execution time longer than 1 minute are used. Whenever the number of tasks increases, the startup cost tends to disappear because it is spread over many tasks; in this situation the distribution of tasks with an execution time of 20 seconds already results in an almost perfect speedup. We conclude that the overhead due to the framework is negligible when using a realistic division of the distributable problem. Even if really small tasks occur, the framework still performs well.

5. Application Based Testing

The tests of the previous section indicate that the system is performant and stable, albeit in an artificial setting. In this section we extend this demonstration to a realistic computation, a nuclear quantum physics calculation. We focus on the ease of application development and on the performance.

5.1. The Quantum Physics Problem

To obtain physical properties of quantum systems, such as atoms, nuclei or molecules, one needs to solve the so-called Schrödinger equation. In order to solve it, proper boundary conditions must be chosen. The solutions, the so-called “wave-functions”, then allow for the calculation of physical quantities. The equation and its boundary conditions are usually too complex to
be solved for many-body systems (e.g. a nucleus), and approximations have to be introduced. One approach is to expand the wave-function on a discrete, infinite-dimensional set of basis states. Substitution of this approximation in the equation and boundary conditions leads to a much simpler matrix equation in the expansion coefficients. The matrix formulation can be further simplified by choosing expansion bases with specified properties. The Modified J-Matrix (MJM) model [21] is such an approach, and has been applied to 3-cluster nuclear systems [22]. We consider it here to obtain scattering results for a 3-particle configuration of 5H: a triton and 2 neutrons.

The calculations essentially consist of two steps: (1) a CPU intensive calculation of the matrix elements in the matrix equation, and (2) the solution of the matrix equation. Step 2 can be obtained in reasonable time on a single node, and we will therefore only consider the gridification of step 1.

The oscillator expansion basis for the solution of a 3-cluster problem is enumerated by a set of indices: the hypermoment K, describing the three-cluster geometry; the relative angular momenta l1 and l2 between the three clusters, coupled to the total angular momentum L, a constant of motion; and the oscillator index n. The number of (l1,l2) combinations depends on both K and L. The essential matrix to be determined is the energy matrix, denoted by

  ⟨Ki,(l1,l2)i L,ni| Ĥ |Kj,(l1,l2)j L,nj⟩ = ⟨Ki,li,ni| Ĥ |Kj,lj,nj⟩   (1)

where Ĥ is the Hamiltonian, or energy, operator, and i and j distinguish basis states; the right hand side in (1) simplifies the notation, omitting L as an overall constant and replacing the combination (l1,l2)i by li. The theory to obtain (1) [22] is well beyond the scope of this paper, but it can be broken down to

  ⟨Ki,li| Ĥ |Kj,lj⟩ = Σ(lr,ls,t) R(Ki,li,lr,t) ⟨Ki,lr| Ĥ |Kj,ls⟩ R(Kj,ls,lj,t)   (2)

where the sum runs over lr, ls and t, and each bracket ⟨·| Ĥ |·⟩ stands for a matrix over all ni, nj indices. The R factors are so-called Raynal-Revai coefficients, as discussed in [21][22], and the nature and range of the index t depend on the nucleus (5H) and its cluster decomposition (t + n + n).

The granularity of the problem is clear from (2): it reduces the problem to a fork-and-join algorithm by calculating all independent matrices for fixed Ki, Kj and all allowed combinations lr, ls, t (the fork), followed by a summation to obtain (2) (the join). In this paper we discuss a calculation for L = 0 and K = 0, 2, 4, …, 16, a range of l1 = l2 = 0, 1, …, K/2 values, and 45 t values. All of the computational code components are implemented in Fortran90 using the Intel v7 compiler. A farmer-worker distribution algorithmic model seems
an evident choice for this problem, because all of the fork tasks are independent of one another. The tasks are file based, meaning that they get their input from a series of files and write their results into one. All input files except one, a configuration file which contains the particular indices for the current computation, have a constant content for all tasks.

Implementing this code on the CoBRA grid is straightforward, as sketched below. First of all the TaskMaster class is extended. It implements the preparation of the first step mentioned above and computes the second step. The getData function transfers the input files (gnrl.cfg, Kl1l2Combs.dat, …) to the broker machine. The run function implements the task creation: for every task the appropriate input file is created, and an instance of the task class is created and submitted. Finally, step two is conducted by launching the recombination binary on the broker machine. The implementation of the putData function puts the resulting data on the submission machine.

Secondly, the task class has to be extended. The implementation of the getData function transfers the files which are not yet available on the worker from the broker to this worker. The execute implementation runs the Fortran executable. The implementation of the putData function transfers the resulting data to the broker machine.
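In outline, and reusing the stand-in classes from the earlier sketches, the gridified step 1 could look like this. File names follow the text; exec(), generateTaskConfigs() and the binary names are illustrative placeholders, not part of the actual implementation.

    // Outline of the QF gridification described above (hypothetical helpers).
    class QfTask extends Task {
        private final String cfg;                          // per-task index file

        QfTask(String cfg) { this.cfg = cfg; }

        protected void getDataImpl() { transferIn(cfg); }  // plus constant inputs not yet on the worker
        protected void executeImpl() { exec("mjm_matrix_elements " + cfg); } // the Fortran binary
        protected void putDataImpl() { transferOut(cfg + ".out"); }          // results to the broker

        private void exec(String cmd) { /* launch the external binary (stub) */ }
    }

    class QfMaster extends TaskMaster {
        protected void getData() {
            transferIn("gnrl.cfg");                        // constant inputs to the broker
            transferIn("Kl1l2Combs.dat");
        }

        public void run() {
            // The fork: one task per allowed index combination.
            for (String cfg : generateTaskConfigs())
                submit(new QfTask(cfg));
            waitForAllTasks();
            exec("recombine");                             // the join (step 2) on the broker
        }

        protected void putData() { transferOut("results.dat"); } // to the submission machine

        private java.util.List<String> generateTaskConfigs() {
            return java.util.Collections.emptyList();      // placeholder for index enumeration
        }
        private void exec(String cmd) { /* launch the external binary (stub) */ }
    }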

5.2. Test Results

In the first step, this implementation of the QF problem tested the user friendliness and completeness of the distributed system. No parts of the framework had to be altered, no tricks had to be used to work around the design of the framework, and no design problems were encountered by the user. We therefore conclude that the CoBRA framework is easy to use and contains all the features needed to distribute a problem; as such it passes the first testing step.

The second step tests the performance. Table 3 compares a run with certain fixed parameters on a local machine with a run on a bare H2O infrastructure (tested in [23]) and one on the CoBRA grid. The first row contains the values of the input parameter K explained below. The second row lists the speedup results of the tests done with bare H2O, and the last row contains the speedups of the tests performed on the CoBRA infrastructure.

Table 3: QF Test Results

SU/K    0      2      4      6      8      10     12     14     16
H2O     0.04   0.11   0.49   3.35   6.72   14.28  17.75  19.72  19.80
CoBRA   0.05   0.14   0.54   1.98   6.17   13.72  15.95  18.30  19.26

The implemented distribution consists of a large number of tasks: the number of tasks increases from 135 for K = 0 to 18315 for K = 16, and the average task length increases from 9 ms for K = 0 to 54 s for K = 16. The large number of small tasks for a small K implies that the effort of distribution takes more time than running the individual tasks; because of this, the distributed program has a larger execution time than the local run. Whenever K is larger than 6, the mean task length becomes large enough (above the black line in Fig. 8) to achieve a speedup when distributing the problem.

Comparing the SU of the bare H2O distribution with the SU of the CoBRA grid shows that H2O is 0.58% faster than the CoBRA grid. This is caused by the extra scheduling step a task has to follow when using the CoBRA grid: in the bare H2O distribution tasks were created and sent directly to a resource, whereas in the CoBRA distribution tasks are created and sent to a scheduler, from where they are sent to the actual resource. This additional step is needed to provide scalability. This slight loss of performance is inevitable when constructing a multi-component grid, and is consequently acceptable.

Fig. 8. Speedups obtained using bare H2O and using the CoBRA grid

We can conclude that the CoBRA infrastructure passes its user oriented deployment and performance test.

6. Economic Scheduler

As a final test, the functionality of the grid was tested from a grid scientist’s point of view. Colleagues in our research group are working on economic scheduling, i.e. scheduling based on the needs and budget of a user. They extended the scheduler component and added all the functionality described in [24]. They did not encounter any design related obstacles while implementing it. This test proves the flexibility of the infrastructure and its ability to serve as a research tool. As the researchers are currently working on a more advanced implementation, a more detailed description will follow in a future paper.

7. Future work

In the near future we intend to gather additional user experience on the CoBRA grid. One project deals with a compute-intensive graphics problem, namely the rendering of diamonds; the possibility of rendering various alternative cuts of a diamond before actually cutting it has clear economic benefits. Another project creates a gateway to the Globus toolkit, making it possible to use resources managed by the Globus middleware. This link offers the opportunity to add HeavyWeight Grid resources like the BEgrid to our CoBRA grid.

We also aim to extend work on the CoBRA grid itself. In particular we intend to develop scheduler components that are capable of application level scheduling, e.g. of parameter sweep computations. This project will focus on resource mapping based on requirement prediction.

8. Conclusion

This paper has introduced a new approach to bring academic grid users and developers closer to one another. This required a grid infrastructure which suits grid users as well as grid researchers; in other words, a pluggable, scalable, flexible, robust, reliable and performant LightWeight Grid. Such a multi-purpose grid system had not been developed before. The CoBRA grid has been created and tested from the developer’s, resource consumer’s and researcher’s points of view to prove that the infrastructure satisfies their requirements. As the CoBRA grid has passed those tests without any problem, we conclude that the infrastructure is well suited for an academic environment.

9. References

[1] A.E. Walsh, ed., “UDDI, SOAP and WSDL: The Web Service Specification Reference Book”, Prentice Hall, New Jersey, 2000.
[2] HTTP, Hypertext Transfer Protocol. Available: http://www.w3.org/Protocols/
[3] IIOP, Internet Inter-ORB Protocol. Available: http://www.omg.org/library/iiop4.html
[4] M. Snir, S. Otto, S. Huss-Lederman, D. Walker and J. Dongarra, “MPI: The Complete Reference”, Volume 1, MIT Press, Cambridge, Massachusetts, 1998.
[5] Secure shell. Available: http://www.ssh.com/
[6] H. Casanova, G. Obertelli, F. Berman and R. Wolski, “The AppLeS Parameter Sweep Template: User-Level Middleware”, Proceedings of SC00, November 2000.
[7] APST, A Parameter Sweep Tool. Available: http://grail.sdsc.edu/projects/apst/
[8] Nimrod. Available: http://www.csse.monash.edu.au/~davida/nimrod/
[9] I. Foster and C. Kesselman, “The Globus Toolkit”, in The Grid: Blueprint for a New Computing Infrastructure, I. Foster and C. Kesselman, eds., Morgan Kaufmann, San Francisco, 1999.
[10] I. Foster, C. Kesselman and S. Tuecke, “The Open Grid Services Architecture”, in The Grid 2: Blueprint for a New Computing Infrastructure, I. Foster and C. Kesselman, eds., Elsevier, Amsterdam, 2004.
[11] GLOBUS. Available: http://www.globus.org/
[12] EGEE, Enabling Grids for E-sciencE. Available: http://public.eu-egee.org/
[13] BOINC, Berkeley Open Infrastructure for Network Computing. Available: http://boinc.berkeley.edu/
[14] Seti@Home, Search for Extraterrestrial Intelligence. Available: http://setiathome.ssl.berkeley.edu/
[15] H2O. Available: http://www.mathcs.emory.edu/dcl/h2o/
[16] W.K. Edwards, “Core Jini”, Prentice Hall, New Jersey, 2001.
[17] CONDOR. Available: http://www.cs.wisc.edu/condor/
[18] D. Gorissen, G. Stuer, K. Vanmechelen and J. Broeckhove, “H2O Metacomputing – Jini Lookup and Discovery”, Lecture Notes in Computer Science 3515, pp. 1072-1079.
[19] P. Hellinckx, G. Stuer, F. Hancke, D. Dewolfs, F. Arickx, J. Broeckhove and T. Dhaene, “Dynamic Problem-Independent Metacomputing Characterization Applied to the Condor System”, Proceedings ESM2003, pp. 262-269.
[20] XML. Available: http://www.xml.com/
[21] J. Broeckhove, F. Arickx, W. Vanroose and V. Vasilevsky, “The Modified J-Matrix Method for Short-Range Potentials”, J. Phys. A: Math. Gen. 37 (2004) 1-13.
[22] V.S. Vasilevsky, A.V. Nesterov, F. Arickx and J. Broeckhove, “The Algebraic Model for Scattering in Three-s-cluster Systems: Theoretical Background”, Phys. Rev. C63 (2001) 034606:1-16.
[23] P. Hellinckx, K. Vanmechelen, G. Stuer, F. Arickx and J. Broeckhove, “User Experiences with Nuclear Physics Calculations on a H2O Metacomputing System and on the BEgrid”, Lecture Notes in Computer Science 3515, pp. 1080-1088.
[24] R. Wolski, J.S. Plank, J. Brevik and T. Bryan, “Analyzing Market-based Resource Allocation Strategies for the Computational Grid”, International Journal of High Performance Computing Applications 15(3), 2001.