Metainformation and Workflow Management for Solving Complex Problems in Grid Environments

Han Yu, Xin Bai, Guoqiang Wang, Yongchang Ji, and Dan C. Marinescu
School of Computer Science, University of Central Florida
P. O. Box 162362, Orlando, FL 32816-2362
Email: {hyu, xbai, gwang, yji, dcm}@cs.ucf.edu

Abstract

In this paper we discuss the semantics of the process and case descriptions necessary for workflow management on a grid used for scientific applications, and we outline the structure of basic ontologies supporting the coordinated execution of complex tasks. We also survey the core services provided by our environment and discuss the planning service in some detail.

1 Problem Formulation

Our objective is to automate the execution of complex problems, such as data collection and analysis in the computational sciences, in a dynamic and complex environment such as a computational grid. The complexity of the problems we wish to solve has several dimensions. First, computational tasks are interspersed with data collection and human decisions, including computation steering. Second, most computational tasks are data intensive; a program may require access to very large data sets, consisting of GBytes or TBytes of data. Third, the process description, i.e., the description of the tasks to be carried out and their dependencies, includes constructs for iterative execution with a number of cycles that cannot be predetermined, for concurrent execution of coarse-grain or fine-grain computations, and for multiple choices. Fourth, the process description may change during execution, as a result of the need to: (a) modify the parameters of the computational models, (b) change the algorithms, (c) improve the quality of the solution, (d) incorporate new experimental data, or (e) adapt to the environment. Last, but not least, computational tasks require substantial amounts of resources (CPU cycles, main memory, secondary storage, network bandwidth). Some of the computational tasks are long lasting and require checkpointing. Some tasks may have soft deadlines.

In turn, the environment has a number of characteristics that make the execution of computational tasks difficult. The resource-rich environment is highly heterogeneous. The heterogeneity and the richness of the environment make matching the resource requirements of a computational task with the resources available rather difficult. For example, if a computation involves fine-grain parallelism, then a PC cluster with a high-latency, low-bandwidth switch will be a poor choice to execute it. At the same time, it is very likely that, given enough information about the resource requirements of a computational task and about the characteristics of the resources available, we could locate a better site to execute the task. Task migration is likely to be more difficult in this environment; additional data transformations, such as compression/decompression, encryption/decryption, and byte swapping, may be necessary before and/or after migrating a task. If a task has a soft deadline, e.g., it must be completed within the next 24 hours, then the search for a site with adequate resources for the execution of the task must be complemented by the ability to access historical information about past executions of the task, as well as hardware performance data. Last, but not least, the environment cannot be characterized as user-friendly; in fact, it may be non-cooperative. Even if the user knows the duration of each individual task and wishes to reserve resources for it in advance, the system may either not support resource reservations, or may impose a prohibitive cost for the advance reservation of resources. Resource acquisition on the spot market, based upon some form of resource brokerage, generally

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04)

0-7695-2132-0/04/$17.00 (C) 2004 IEEE

faces stiff competition from other groups sharing the same resources, or even from members of the same group. The heterogeneity makes some resources (e.g., those with a proven record of reliability) more desirable than others; thus, hot-spot contention cannot be discounted. As a result, brokers must maintain full information about resources with similar characteristics and group them into multiple equivalence classes based upon different sets of properties. The system consists of autonomous nodes in different administrative domains, and negotiations for the use of resources must address additional constraints, such as security, cost, and reliability, besides the strict resource requirements. The ability to recover from errors caused by the failure of individual nodes is a critical aspect of the execution of complex tasks. The absence of an automated procedure to coordinate the execution of a complex task, recover from system failures, and construct new process descriptions in response to changing conditions can only result in end-user frustration.

2 Process Description and Coordination

The need for an intelligent infrastructure is amply justified by the complexity of both the problems we wish to solve and the characteristics of the environment. The basic architecture of the multi-agent system we are currently developing is illustrated in Figure 1 and presented in detail elsewhere [13, 14]. The various services are performed by agents built upon the Jade multi-agent framework. We distinguish between core services, provided by the computing infrastructure, which are persistent and reliable, and end-user services, provided by end-users. The reliability of end-user services cannot be guaranteed; such services may be short-lived. A non-exhaustive list of core services includes: authentication, brokerage, coordination, information, ontology, matchmaking, monitoring, planning, persistent storage, scheduling, and simulation.

The authentication services contribute to the security of the environment. Brokerage services maintain information about the classes of services offered by the environment, as well as databases of past performance. Though the brokerage services make a best effort to maintain accurate information regarding the state of resources, such information may be obsolete; accurate information about the status of a resource may be obtained using monitoring services. Coordination services act as proxies for the end-user: a coordination service receives a case description and controls the enactment of the workflow. Planning services are responsible for creating the workflow. Information services play an important role; all end-user services and other core services register their offerings with the information services. Ontology services maintain and distribute ontology shells (i.e., ontologies with classes and slots but without instances) as well as ontologies populated with instances, global ontologies, and user-specific ontologies. Matchmaking services allow individual users, represented by their proxies (coordination services), to locate resources in a spot market, subject to a wide range of conditions. Individual users may only be intermittently connected to the network; persistent storage services provide access to the data needed for the execution of user tasks. Simulation services are necessary to study the scalability of the system and are also useful for end-users who wish to simulate an experiment before actually conducting it. Scheduling services provide optimal schedules for sites offering to host application containers for different end-user services. Core services are replicated to ensure an adequate level of performance and reliability, and may be organized hierarchically, in a manner similar to the DNS (Domain Name Service) in the Internet.

The coordination service is one of the most critical components of the infrastructure. It cooperates with the other core services to ensure a seamless transition among the individual activities involved. Coordination is based on both the process description and the case description of a computing task. A process description is a formal description of the complex problem the user wishes to solve. For the process description, we use a formalism similar to the one provided by Augmented Transition Networks (ATNs) [12]; the coordination service implements an abstract ATN machine. A case description provides additional information for a particular instance of the process the user wishes to perform, e.g., the location of the actual data for the computation, additional constraints, and conditions [8]. The grammar of the process description in BNF form follows. The symbol S denotes the start symbol, while e stands for an empty string.

S ::= BEGIN <activity-set> END
<activity-set> ::= <activity> | <sequential-set> | <concurrent-set> | <iterative-set> | <selective-set>
<sequential-set> ::= <activity-set> ; <activity-set>
<concurrent-set> ::= {FORK {<activity-set> ; <activity-set> ; <activity-set>} JOIN}
<iterative-set> ::= {ITERATIVE {COND {<condition-set>}} {<activity-set>}}
<selective-set> ::= {CHOICE {<activity-set>} {<conditional-activity>} <conditional-activity-set> MERGE}
<conditional-activity-set> ::= <conditional-activity> ; <conditional-activity-set> | <conditional-activity> | e
<conditional-activity> ::= {COND {<condition-set>}} {<activity-set>}
<condition-set> ::= <condition> ; <condition-set> | <condition>
<condition> ::= <data-property> <relation> <value>
<data-property> ::= <data-name> . <property-name>
<data-name> ::= String
<property-name> ::= Classification | Size | Location | ...
<relation> ::= < | > | =
<value> ::= <string> | <number>
<string> ::= <letter> | <letter> <string>
<number> ::= <digit> | <digit> <number>
<letter> ::= a | b | ... | z | A | B | ... | Z
<digit> ::= 0 | 1 | ... | 9

Figure 1. Core and end-user services. The User Interface (UI) provides access to the environment. Application Containers (ACs) host end-user services. Shown are the following core services: Coordination Service (CS), Information Service (IS), Planning Service (PS), Matching Service (MS), Brokerage Service (BS), Ontology Service (OS), Simulation Service (SimS), Scheduling Service (SchS), and Persistent Storage Service (PSS).

3 Planning

The original process description is created either manually, by an end user, or automatically, by the planning service. Process descriptions can be archived in the system knowledge base. The planning service is responsible for creating original process descriptions (also called plans) and, more often, for re-planning, i.e., adapting an existing process description to new conditions. Planning is an artificial intelligence (AI) problem with a wide range of real-world applications: given a system in an initial state, a set of actions that change the state of the system, and a set of goal specifications, we aim to construct a sequence of activities that takes the system from the given initial state to a state that meets the goal specifications of the planning problem [15].

3.1 Activities

Activities are the building blocks used by the planning service to compose plans. A plan consists of two types of activities: end-user activities and flow-control activities.

Every end-user activity corresponds to an end-user computing service available in the grid computing system. Such activities run under the control of Application Containers (ACs). Every end-user activity has preconditions and postconditions. The preconditions of an activity specify the set of data, and the specifications of those data, necessary for executing the activity; an activity is valid only if all of its preconditions are met before execution. The postconditions of an activity specify the set of conditions on the data that must hold after the execution of the activity.

Flow-control activities do not have associated computing services; they are used to control the execution of activities in a plan. We define six flow-control activities: Begin, End, Choice, Fork, Join, and Merge. Every plan starts with a Begin activity and concludes with an End activity; these two activities cannot occur anywhere else in a plan.

The direct precedence relation reflects the causality among activities. If activity B can only be executed directly after the completion of activity A, we say that A is a direct predecessor activity of B and that B is a direct successor activity of A. An activity may have a direct predecessor set of activities and a direct successor set of activities. We use the term "direct" rather than "immediate" to emphasize that there may be a gap in time between the instant an activity terminates and the instant its direct successor activity is triggered. For the sake of brevity we drop the word "direct" and refer to the predecessor activity set, or predecessor activity, and the successor activity set, or successor activity.

A Choice flow-control activity has one predecessor activity and multiple successor activities. Choice can be executed only after its predecessor activity has been executed. Following the execution of a Choice activity, only one of its successor activities may be executed. There is a one-to-one mapping between the transitions connecting a Choice activity with its successor set and a condition set that selects the unique activity from the successor set that will actually gain control. Several semantics for this decision process are possible.

A Fork flow-control activity has one predecessor activity and multiple successor activities. The difference between Fork and Choice is that after the execution of a Fork activity, all the activities in its successor set are triggered.

A Merge flow-control activity is paired with a Choice activity to support the conditional and iterative execution of activities in a plan. Merge has a predecessor set consisting of two or more activities and only one successor activity. A Merge activity is triggered after the completion of any activity in its predecessor set.

A Join flow-control activity is paired with a Fork activity to support concurrent activities in a plan. Like a Merge activity, a Join activity has multiple predecessor activities and only one successor activity. The difference is that a Join activity can be triggered only after all of its predecessor activities have completed.
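The contrast between Merge and Join triggering can be made concrete with a small sketch; the function names and set-based encoding below are illustrative assumptions, not part of the system described in this paper.

```python
# Illustrative sketch of Merge vs. Join triggering semantics.
# A Merge fires when ANY predecessor has completed; a Join fires
# only when ALL predecessors have completed.

def merge_ready(predecessors, completed):
    """A Merge activity is triggered by any completed predecessor."""
    return any(p in completed for p in predecessors)

def join_ready(predecessors, completed):
    """A Join activity waits for all of its predecessors."""
    return all(p in completed for p in predecessors)

preds = {"A", "B", "C"}
done = {"A"}
print(merge_ready(preds, done))  # True: one predecessor completed
print(join_ready(preds, done))   # False: two predecessors still pending
```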

3.2 Formalism of Planning in Grid Computing

Definition: A planning problem P in a grid computing system is defined as a 3-tuple P = {Sinit, G, T}, where:

1. Sinit is the initial state of the system, which includes all the initial data provided by an end user, together with their specifications;
2. G is the goal specification of the problem, which includes the specification of all data expected from the execution of the computing task;
3. T is the complete set of end-user activities available to the grid computing system.
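Under the simplifying assumption that states and goals are modeled as sets of data items, the 3-tuple can be sketched as follows; the class and field names are illustrative, not the paper's notation made executable.

```python
from dataclasses import dataclass

# Sketch of the planning problem P = {S_init, G, T}; data items and
# activities are reduced to strings for illustration.

@dataclass(frozen=True)
class Activity:
    name: str
    preconditions: frozenset   # data that must exist before execution
    postconditions: frozenset  # data guaranteed to exist afterwards

@dataclass
class PlanningProblem:
    s_init: set                # initial data provided by the end user
    goal: set                  # data expected from the computation
    activities: list           # T: end-user activities available

    def satisfied(self, state):
        """The goal is met when every goal item is present in the state."""
        return self.goal <= state

problem = PlanningProblem(
    s_init={"micrographs"},
    goal={"density_map"},
    activities=[Activity("reconstruct",
                         frozenset({"micrographs"}),
                         frozenset({"density_map"}))],
)
print(problem.satisfied({"micrographs", "density_map"}))  # True
```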

3.3 Planning Service

The planning service is one of the core services provided by an intelligent grid computing environment. Its function in our framework is to generate valid process descriptions for the end users. Throughout the next few sections we use the terms plan and process description with essentially the same meaning. When we talk about the

execution of a plan, we actually mean the execution of the case description based upon the process description, or plan, used for a specific computational task. The planning service accepts planning requests from the coordination service. Such a request includes: 1) the set of initial data available to the end user, 2) the goal of planning, and 3) other useful information. The goal is often expressed in terms of the results of the computation expected by the end user. Once the process description is created, the planning service sends it to the coordination service, and possibly to an archiving service. Figure 2 shows the exchange between the planning service and the coordination service for a standard planning request.

Figure 2. The interactions between the planning service and the coordination service: (1) the coordination service sends the planning task specification; (2) the planning service returns a plan.

In addition to the ab-initio generation of valid process descriptions, the planning service is involved in re-planning. Re-planning is triggered by the coordination service whenever the state of the environment is such that the execution of the current case description, based upon a valid process description, cannot continue. When re-planning is required, the coordination service sends the planning service all available data, including the initial set of data and the data modified or created during the execution of the case description. Conceptually, re-planning has the same attributes as planning, with one major difference: during re-planning, the planning service has to improve the robustness of plans. To achieve this goal, the planning service needs to interact with the runtime environment and avoid reusing, in the new plan, those activities that prevented the previous plan from executing successfully. In other words, the planning service should know whether an activity used in the new plan is executable or not. There are two possible methods of acquiring this knowledge. With the first method, the knowledge is given directly by the coordination service. With the second method, the planning service gets support from other services in the grid computing framework; this method consists of three steps. First, the planning service asks the information service for a brokerage service available in the system. Second, the planning service contacts the brokerage service to obtain a group of Application Containers that can possibly provide the execution of the activity. Third, the planning service communicates with each Application Container to check the availability of execution of the activity. The activity can be included in the new plan only if there is at least one Application Container that can provide its execution. Re-planning, however, does not fully guarantee the success of the execution of a plan, because the state of the resources needed by the various activities typically changes frequently. Figure 3 shows the flow of communications between the planning service and other services during re-planning.

Figure 3. The flow of communications between the planning service and other services during re-planning: (1) the coordination service sends the planning task specification and a list of non-executable activities; (2) the planning service asks the information service for a brokerage service; (3) a brokerage service is found; (4) the planning service asks the brokerage service for Application Containers for the activity; (5) a group of Application Containers is found; (6) the planning service asks each Application Container whether the activities are executable; (7) each Application Container replies executable or not executable; (8) a new plan is sent to the coordination service.
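The three-step executability check performed during re-planning can be sketched as follows; the service objects and their methods are hypothetical stand-ins for the information service, brokerage service, and Application Container interactions, not real interfaces from this system.

```python
# Sketch of the three-step executability check used during re-planning.
# Service objects and method names are hypothetical stand-ins.

def executable(activity, information_service):
    """Return True if at least one Application Container can run it."""
    # Step 1: locate a brokerage service via the information service.
    broker = information_service.find_brokerage_service()
    if broker is None:
        return False
    # Step 2: ask the broker for candidate Application Containers.
    candidates = broker.containers_for(activity)
    # Step 3: query each container for availability of execution.
    return any(ac.can_execute(activity) for ac in candidates)
```

An activity that fails this check would simply be excluded from the new plan, which is the robustness improvement the text describes.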

3.4 A Genetic-based Planning Approach

Applying evolutionary computation methods to planning has proved effective for traditional planning problems [6, 10, 9, 11]. We have designed a Genetic Algorithm (GA) based approach to AI planning [15]. Existing approaches, however, cannot be applied directly, due to the unique features of the grid computing planning problem; modifications are necessary to adapt them to planning in a grid computing system. In this section we present a novel genetic-based approach for the planning service in a grid computing system. We discuss the main features of this approach: the internal representation of plans, solution initialization, the genetic operators, the selection scheme, and plan evaluation.


3.4.1 Individual Representation

A simple GA uses a linear binary string to represent candidate solutions. As we use a nonlinear structure for the process description, we must use a nonlinear representation scheme to encode plans. In this approach, we use a tree structure to represent and evolve the process description. The tree representation has been widely used in Genetic Programming (GP) to evolve solutions that achieve desired functions [6]; therefore, given the representation scheme used, our approach is more GP-based than GA-based.

A plan tree consists of a group of nodes. A node is either a terminal node or a controller node. Every terminal node is a leaf of the plan tree and corresponds to an end-user activity in the process description. Controller nodes, on the other hand, are internal nodes and must have at least one child node. Controller nodes are used to direct the plan execution and correspond to the flow-control activities in the process description, although there is not a one-to-one correspondence between controller nodes in the plan and flow-control activities in the process description. We now provide some details of the semantics of each type of controller node and show how they are correlated with the flow-control activities. There are four types of controller nodes: sequential, concurrent, selective, and iterative.

1. A sequential node requires that all activities corresponding to its children be performed sequentially. The sequence of execution is specified by the relative location of each node among its siblings: activities are executed from left to right, so the leftmost child of a sequential node is executed first and the rightmost child is executed last. Only when the activity of its rightmost child completes can the block controlled by the sequential node terminate and the flow of control be transferred to the next control structure. A sequential node does not have a corresponding flow-control activity in the process description. As the arrows in the process description specify the sequence of activity execution, we can convert a sequence of activities in a process description into a tree structure with a sequential node as the root. Figure 4 gives an example of such a conversion.

Figure 4. Process description versus plan tree for sequential activities. (a) a partial process description consisting of a sequence of activities; (b) the corresponding plan tree with the sequential node as the root node.

2. A concurrent node informs the environment that all activities corresponding to its children can be executed either sequentially or concurrently. If the activities are executed sequentially, they can be executed in any order. Only after all of these activities have been executed can the execution of the concurrent block of activities be completed. Each concurrent node corresponds to a pair of Fork and Join activities. Figure 5 gives an example of a partial process description with concurrent execution of activities and the corresponding plan tree.

Figure 5. Process description versus plan tree for concurrent activities. (a) a partial process description consisting of a set of concurrent activities; (b) the corresponding plan tree with the concurrent node as the root node.

3. A selective node informs the environment that only one of the activities corresponding to its children has to be executed. The execution of a selective block is finished as soon as one of the activities is executed. Each selective node corresponds to a pair of Choice and Merge activities. Figure 6 gives an example of a partial process description with selective execution of activities and the corresponding plan tree.

Figure 6. Process description versus plan tree for selective activities. (a) a partial process description consisting of a set of selectively executed activities; (b) the corresponding plan tree with the selective node as the root node.

4. An iterative node requires that all activities corresponding to its child nodes be executed iteratively until some stopping condition is met. Each iterative node corresponds to a loop in a process description. A loop is formed when a transition in a process description terminates at an activity that has been executed before the activity at the source of the transition. When we convert a process description to a plan tree, we insert all the nodes within a loop as the children of the iterative node; the sequence of children follows the execution order of the activities in the loop. Figure 7 gives an example of this conversion.

Figure 7. Process description versus plan tree for iterative activities. (a) a partial process description consisting of a set of iteratively executed activities; (b) the corresponding plan tree with the iterative node as the root node.

When we convert a complete process description to a plan tree, the above conversion methods are applied recursively, in a top-down manner, until a complete plan tree is generated. Similar methods can be used to convert a plan tree back to a process description. The size of a plan tree is defined as the number of nodes in the tree. We set an upper limit, Smax, on the size of plan trees during the evolution of solutions. The purpose of this limit is to prevent the unbounded growth of trees, also called "bloat", a commonly observed problem in GP. The value of Smax should be set carefully, to ensure the efficiency of the search without compromising the quality of the solutions.

3.4.2 Solution Initialization

During initialization, we randomly generate a population of trees as candidate solutions. These solutions may not encode valid plans for a given problem, but they must conform to the structure of plan trees defined in Section 3.4.1, and the size of every initial tree cannot exceed Smax. The initialization of a plan tree consists of two steps. In the first step, we generate an arbitrary tree structure of a given size. In the second step, we instantiate each node in the tree: every internal node is instantiated with a controller node, randomly selected from the four controller node types, and every terminal node is instantiated with an end-user activity.

3.4.3 Genetic Operators

Genetic operators are the driving force that pushes the evolution of solutions forward. The operators we use are crossover and mutation.
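A minimal sketch of the plan-tree representation and the two-step random initialization follows; the node class, the placeholder activity names, and the value of the size cap are illustrative assumptions, not the authors' implementation.

```python
import random

# Sketch of a plan tree: internal (controller) nodes direct execution,
# leaves are end-user activities. Controller types follow Section 3.4.1.
CONTROLLERS = ["sequential", "concurrent", "selective", "iterative"]
ACTIVITIES = ["A", "B", "C", "D"]  # placeholder end-user activities

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def size(self):
        """Number of nodes in the subtree, checked against S_max."""
        return 1 + sum(c.size() for c in self.children)

def random_tree(n_nodes, rng):
    """Step 1 and 2 combined: build a random structure of exactly
    n_nodes nodes, instantiating internal nodes with controllers and
    leaves with end-user activities."""
    if n_nodes == 1:
        return Node(rng.choice(ACTIVITIES))
    # Split the remaining node budget among one or more children.
    budget = n_nodes - 1
    children = []
    while budget > 0:
        take = rng.randint(1, budget)
        children.append(random_tree(take, rng))
        budget -= take
    return Node(rng.choice(CONTROLLERS), children)

S_MAX = 15
rng = random.Random(42)
population = [random_tree(rng.randint(1, S_MAX), rng) for _ in range(10)]
print(all(t.size() <= S_MAX for t in population))  # True by construction
```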

1. Crossover takes place between a pair of plan trees. We apply a crossover method commonly used in GP, which consists of four steps. First, we select two trees as parents and decide whether they will be crossed over; the probability of two trees taking part in crossover is determined by a parameter called the "crossover rate". Second, if the two trees are not crossed over, we keep them and terminate the crossover process; otherwise, we randomly select a node from each parent. Third, we switch the subtrees associated with the selected nodes between the two parents; as a result, we create two new plan trees, each containing partial plans from both parents. Finally, we replace the parents with the new trees. If the size of a new tree exceeds Smax, the crossover fails and both parents are kept. Figure 8 shows a simple example of how crossover works on plan trees.

Figure 8. An example of crossover performed on two plan trees. (a) two original trees are selected as parents; (b) a node is selected from each parent; (c) two new plan trees are created by switching the subtrees associated with the selected nodes.

2. Each mutation consists of three steps. First, we randomly select a node in the tree to be mutated; the probability of a node being selected is determined by a parameter called the "mutation rate". Second, we randomly generate a tree, using the same method as plan initialization. Third, we replace the subtree associated with the selected node with the randomly generated tree. If, however, the new tree exceeds the size limit, the mutation fails and we keep the original tree. Figure 9 illustrates a simple example of mutation on a plan tree: a "Selective" node is selected and replaced by a randomly generated tree.

Figure 9. An example of mutation performed on a plan tree. (a) a node is selected to be mutated; (b) the subtree associated with the selected node is replaced by a randomly generated tree.

3.4.4 Plan Evaluation

The fitness of a plan is evaluated with respect to the following three aspects.

1. Plan validity fitness fv: can every activity in the plan be executed? An activity can be executed only if all of its preconditions are met by the system state right before the execution of the activity. To evaluate the plan validity fitness, we need to simulate the execution of the plan. During simulation, we follow the sequence of execution of the activities and verify their validity. For each activity, we check whether the current system state satisfies the preconditions of the activity. If the activity is valid, we update the system state to the one obtained after the activity is executed; the new system state includes all new and modified data resulting from the execution of the activity. If the activity is not valid, we do not update the system state. This process continues until the simulation of the complete plan is finished. If there are selective or iterative nodes in a plan tree, conditional execution of the plan is necessary: we need to enumerate each possible flow of execution and simulate the execution of the plan multiple times. If a single activity is simulated multiple times, each execution is counted in the validity check. The plan validity fitness is calculated as:

fv = (number of activities that are valid during execution) / (total number of activities that are executed)   (1)

2. Goal fitness fg: how well does the execution of the complete plan reach the goal specifications of the planning problem? After finishing the execution of a complete plan, we obtain the final state. The evaluation of goal fitness is usually problem specific: the "closeness" between the final state and the goal specification largely depends on the characteristics of the computing task. Generally speaking, a closer match between the final state and the goal specification results in a higher goal fitness. The following equation gives a simple goal fitness function in which each goal specification has an equal weight in evaluating the overall goal fitness:

fg = (number of goal specifications that the final state satisfies) / (total number of goals specified in the problem)   (2)

If a plan is simulated multiple times, due to the conditional execution of some activities, the goal fitness is given by the average goal fitness over the executions.

3. Efficiency of plan representation fr: the efficiency of a plan representation is determined by the number of nodes (both activity nodes and controller nodes) in the plan tree. The following equation gives a tentative plan efficiency function:

fr = 1 − (number of nodes in the plan tree) / Smax   (3)

Clearly, 0 ≤ fr < 1, and a small plan tree receives a high fr.

The overall fitness of a plan is the weighted sum of the three aspects of fitness:

f = wv × fv + wg × fg + wr × fr   (4)

where wv, wg, and wr are the weights of the three aspects of fitness, respectively, and

wv + wg + wr = 1   (5)

3.4.5 Selection

Selection is the process of picking individuals from the current population to form a new population for the next generation. Selection is typically fitness based: an individual with a relatively high fitness has a higher chance of being selected than an individual with a relatively low fitness. We use the tournament selection scheme in our approach: each time, we randomly select two individuals from the current population and compare their fitness; the individual with the higher fitness is selected and duplicated into the next generation. This simple process continues until we have selected a new population of the same size as the current population.

3.4.6 Procedure of the Genetic-based Approach

The following pseudocode describes the procedure of this approach.

1. Initialize the population;
2. While the stopping conditions are not met, do
   (a) Evaluate the current population;
   (b) Select the individuals in the current population and form a new population;
   (c) Crossover;
   (d) Mutate;
3. Select the plan that has the highest fitness as the final solution.

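The fitness evaluation (Equations 1-5) and tournament selection described above can be sketched in Python. This is a minimal sketch with assumed count-based inputs, not the authors' implementation; the weight wr = 0.3 is inferred from wv + wg + wr = 1 and the Table 1 values.

```python
import random

def overall_fitness(valid, executed, goals_met, goals_total, nodes,
                    s_max=40, wv=0.2, wg=0.5, wr=0.3):
    """Weighted plan fitness, following Equations (1)-(4)."""
    fv = valid / executed               # Eq. (1): plan validity
    fg = goals_met / goals_total        # Eq. (2): goal fitness
    fr = 1.0 - nodes / s_max            # Eq. (3): representation efficiency
    return wv * fv + wg * fg + wr * fr  # Eq. (4), with wv + wg + wr = 1

def tournament_select(population, fitness_fn):
    """Binary tournament selection: repeatedly draw two random
    individuals and copy the fitter one into the next generation."""
    next_generation = []
    while len(next_generation) < len(population):
        a, b = random.sample(population, 2)
        next_generation.append(a if fitness_fn(a) >= fitness_fn(b) else b)
    return next_generation
```

With these assumed weights, a fully valid plan that satisfies all goals with a tree of about ten nodes scores close to the average fitness of 0.928 reported in Table 2.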
4  A Case Study: A Virtual Laboratory for Computational Biology

In [7] we discuss an environment for computational biology used for the 3D reconstruction of virus structures in electron microscopy. Experimental data are collected using an electron microscope. Given a set of 2D images of a virus and an initial model of the electron density map, the goal of the computation is to construct a 3D model of the virus at the finest resolution possible given the physical limitations of the experimental instrumentation. Once we have a detailed electron density map of the virus structure, we can proceed to atomic-level modeling, namely placing groups of atoms and secondary, tertiary, or quaternary structures on the electron density maps.

The computation is composed of several steps [7]. In the first step, we extract the 2D virus projections from the micrographs. Then we determine the initial orientation of the individual views using an "ab initio" orientation determination program (POD). Next, we execute an iterative computation consisting of 3D reconstruction followed by orientation refinement; the parallel program used for reconstruction is called P3DR and the parallel program for orientation refinement is called POR. The iterative process stops when no further improvement of the electron density map at that resolution is noticeable. Then we use a correlation procedure to determine the resolution of the electron density map; the parallel program used for correlation is called PSF. The iterative computation is then repeated at a higher resolution, possibly discarding some of the input data that do not correlate well with the rest.

A new approach is now implemented: we create two streams of input data, e.g., by assigning odd-numbered virus projections to one stream and even-numbered virus projections to the second stream. Then


Table 1. Parameter settings in the experiments.

Parameter               Value
Population Size         200
Number of Generations   20
Crossover Rate          0.7
Mutation Rate           0.001
Smax                    40
wv                      0.2
wg                      0.5

we construct two models of the 3D electron density maps and determine the resolution by correlating the two models.

The process description, shown in Figure 10, consists of 7 (seven) end-user activities and 6 (six) flow-control activities. The pair of Choice and Merge activities in this workflow controls the iterative execution for resolution refinement; the computation ends when the resolution is better than the one specified as the computation goal. Figure 11 shows the corresponding plan tree.

Figure 12 shows the linkage between the ontologies describing the metainformation used by the various agents. The ontologies are created and maintained using the Protégé tool. This software package was developed at Stanford primarily for medical applications and is distributed freely [4]. Figure 13 shows the instances of the ontologies created for the above 3D reconstruction computation. These instances are used by the coordination service to automate the execution.
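The two-stream refinement loop described above can be sketched as a simple driver function. The callables standing in for the P3DR, POR, and PSF programs and the loop structure are illustrative assumptions, not the actual grid workflow.

```python
def reconstruct(projections, orientations, target_resolution,
                p3dr, por, psf, max_rounds=10):
    """Schematic driver for the two-stream 3D reconstruction loop.

    p3dr, por, and psf are callables standing in for the parallel
    programs of the same names (assumed interfaces, for illustration).
    """
    resolution = float("inf")
    model_odd = model_even = None
    for _ in range(max_rounds):
        # Split the projections into two input streams.
        odd, even = projections[0::2], projections[1::2]
        model_odd = p3dr(odd, orientations)     # 3D reconstruction, stream 1
        model_even = p3dr(even, orientations)   # 3D reconstruction, stream 2
        # Refine orientations against the current model.
        orientations = por(projections, model_odd, orientations)
        # Estimate resolution by correlating the two models.
        resolution = psf(model_odd, model_even)
        if resolution <= target_resolution:     # computation goal reached
            break
    return model_odd, resolution
```

In the real environment these steps are end-user activities dispatched on the grid; here they are plain function calls so the control flow is easy to follow.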

5  Experiments

We test the planning algorithm using the computational biology application described in Section 4 as a test case. Table 1 shows the parameter settings used in the experiment. We run the algorithm ten times and select the individual with the highest fitness in the final generation as the solution. We then calculate the average fitness, validity fitness, goal fitness, and solution size over the ten runs, shown in Table 2. The experimental results show that the planning algorithm consistently finds good solutions for the tested problem: it finds a valid and feasible plan that reaches both the optimal validity and goal fitness in every run. The generated plan trees are also efficient, with an average size of fewer than ten nodes.

Table 2. Experimental results collected from the best solutions of ten runs.

Average Fitness             0.928
Average Validity Fitness    1.0
Average Goal Fitness        1.0
Average Size of Solutions   9.7

6  Summary and Conclusions

This paper advocates the use of intelligent agents for the coordination and control of complex tasks in heterogeneous environments. At the present time, scripts written in a scripting language such as Perl or Python [5] represent the favorite approach to coordinating complex tasks in grid environments. While scripts have the advantage of simplicity, they are limited in their ability to deal with a heterogeneous environment and with very complex tasks whose process descriptions change. Lacking the ability to infer new facts from existing ones, to construct new process descriptions through planning, and possibly to learn, the current generation of scripting languages seems more suitable for coordinating the execution of well-defined tasks in homogeneous and stable environments.

In our opinion, the use of intelligent agents in heterogeneous environments is long overdue [1]. The field of intelligent agents has reached a degree of maturity. Java is a widely popular programming language, and several versatile agent platforms based upon Java, Jade among them, have gained recognition. Tuplespaces such as TSpaces from IBM [2] facilitate cooperation in multi-agent systems. Expert system shells such as Jess [3] are available.

The main obstacle to the implementation of intelligent frameworks for grid computing is the difficulty of assembling the metainformation critical to an intelligent environment. We need to construct a fair number of ontology classes to describe the various objects manipulated by the agents and define the slots identifying the relevant properties of these objects. This is indeed a daunting task. The most difficult problems we encountered in our design were the choice of a coordination language and the construction of the various ontologies. It is interesting to note that while some application domains, such as electron microscopy, have already started defining such ontologies, less progress is noticeable in computer-related areas, where ontologies for scheduling and planning, and for hardware and software resources, are in their infancy.


7  Acknowledgments

This research is supported in part by the National Science Foundation grants MCB9527131, DBI0296107, ACI0296035, and EIA0296179.

References

[1] L. Bölöni, K. Jun, K. Palacz, R. Sion, and D. C. Marinescu. The Bond agent system and applications. In Proc. of the 2nd International Symposium on Agent Systems and Applications and the 4th International Symposium on Mobile Agents (ASA/MA 2000), Lecture Notes in Computer Science, volume 1882, pages 99-112, Heidelberg, 2000. Springer-Verlag.
[2] IBM Corporation. TSpaces: intelligent connectionware.
[3] E. Friedman-Hill. Jess, the Java Expert System Shell. Technical Report SAND98-8206, Sandia National Laboratories, 1999.
[4] W. E. Grosso, H. Eriksson, R. W. Fergerson, J. H. Gennari, S. Tu, and M. A. Musen. Knowledge modeling at the millennium (the design and evolution of Protégé-2000). In Proc. of the 12th International Workshop on Knowledge Acquisition, Modeling and Management (KAW'99), Banff, Canada, 1999.
[5] J. Hugunin. Python and Java: the best of both worlds. In Proc. of the 6th International Python Conference, San Jose, California, 1997.
[6] J. R. Koza. Genetic Programming. MIT Press, Cambridge, MA, 1992.
[7] D. Marinescu and Y. Ji. A computational framework for the 3D structure determination of viruses with unknown symmetry. Journal of Parallel and Distributed Computing, 63:738-758, 2003.
[8] D. C. Marinescu. Internet-Based Workflow Management: Towards a Semantic Web. Wiley, New York, NY, 2002.
[9] I. Muslea. SINERGY: A linear planner based on genetic programming. In Proc. of the 4th European Conference on Planning. Springer, 1997.
[10] L. Spector. Genetic programming and AI planning systems. In Proc. of the 12th National Conference on Artificial Intelligence, pages 1329-1334, 1994.
[11] C. H. Westerberg and J. Levine. GenPlan: Combining genetic programming and planning. In Proc. of the 19th Workshop of the UK Planning and Scheduling Special Interest Group (PLANSIG 2000), Open University, Milton Keynes, UK, 2000.
[12] T. Winograd. Language as a Cognitive Process. Addison-Wesley, Reading, MA, 1983.
[13] B. Xin, H. Yu, G. Wang, Y. Ji, and D. C. Marinescu. Coordination in intelligent grid environments. In preparation, 2004.
[14] B. Xin, H. Yu, G. Wang, Y. Ji, and D. C. Marinescu. An intelligent environment for grid computing. In preparation, 2004.
[15] H. Yu, D. C. Marinescu, A. S. Wu, and H. J. Siegel. A genetic approach to planning in heterogeneous computing environments. In 12th Heterogeneous Computing Workshop (HCW 2003), CD-ROM Proc. of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), IEEE Computer Society Press, Los Alamitos, CA, ISBN 0-7695-1926-1, 2003.

Han Yu is a Ph.D. student in the School of Computer Science at the University of Central Florida (UCF). He received a B.S. degree from Shanghai Jiao Tong University in 1996 and an M.S. degree from the University of Central Florida in 2002. His research areas include genetic algorithms and AI planning.

Xin Bai is a Ph.D. student in the School of Computer Science at the University of Central Florida (UCF). He received a B.S. degree and an M.S. degree from Northern Jiaotong University in 1993 and 1998, respectively. His research areas include grid computing and multi-agent systems.

Guoqiang Wang is a Ph.D. student in the School of Computer Science at the University of Central Florida (UCF). He received a B.S. degree from Southeast University in 2001. His research areas include grid computing and multi-agent systems.

Yongchang Ji received his M.S. and Ph.D. degrees in computer science from the University of Science and Technology of China in 1996 and 1998, respectively. In 2000-2001 he was a Post-Doctoral Research Associate with the Department of Computer Sciences of Purdue University, West Lafayette, Indiana; he is now a Post-Doctoral Research Associate with the School of Computer Science, University of Central Florida, Orlando, Florida. His research interests include scientific computing, grid computing, and parallel and distributed system architecture, algorithms, and tools. He has co-authored several publications in these areas.

Dan C. Marinescu is Professor of Computer Science at the University of Central Florida. He conducts research in parallel and distributed computing, scientific computing, Petri nets, and software agents. He has co-authored more than 150 papers published in refereed journals and conference proceedings in these areas. He is the author of the book Internet-Based Workflow Management (Wiley, 2002) and has co-edited Process Coordination and Ubiquitous Computing (CRC Press, 2002). His most recent book, Lectures on Quantum Computing, will be published by Prentice Hall in 2004.


[Figure 10. The process description for the 3D reconstruction of virus structures. POD - "ab initio" parallel orientation determination program; P3DR - the parallel program used for 3D reconstruction; POR - the parallel program for orientation refinement; PSF - parallel program to compute the correlation of the structure factors. The workflow proceeds from BEGIN through POD and P3DR1 to a MERGE, then POR, a FORK into the concurrent activities P3DR2, P3DR3, and P3DR4, a JOIN, PSF, and a CHOICE that either loops back to the MERGE or reaches END, with transitions TR1-TR15.]

[Figure 11. The corresponding plan tree to the process description for the 3D reconstruction of virus structures, with Sequential, Concurrent, and Iterative controller nodes.]


[Figure 12. Logic view of the ontology structure used by the framework. The ontology classes and their slots are:
Task: ID, Name, Owner, Submit Location, Status, Data Set, Result Set, Case Description, Process Description, Need Planning.
Process Description: ID, Name, Location, Activity Set, Transition Set, Creator.
Case Description: ID, Name, Initial Data Set, Result Set, Constraint, Goal Condition.
Transition: ID, Source Activity, Destination Activity.
Activity: ID, Name, Task ID, Owner, Service Name, Type, Execution Location, Input Data Set, Output Data Set, Input Data Order, Output Data Order, Status, Constraint, Work Directory, Direct Predecessor Set, Direct Successor Set, Retry Count, Dispatched By.
Service: Name, Type, Time Stamp, User Set, Location, Creation Date, Version, Description, Command History, Input Condition, Output Condition, Input Data Set, Output Data Set, Input Data Order, Output Data Order, Cost, Resource.
Data: Name, Location, Time Stamp, Value, Category, Format, Owner, Creator, Size, Creation Date, Description, Latest Modified Date, Classification, Type, Access Right.
Resource: Name, Type, Location, Number of Nodes, Administration Domain, Hardware, Software, Access Set.
Software: Name, Type, Manufacturer, Version, Distribution.
Hardware: Type, Speed, Size, Bandwidth, Latency, Manufacturer, Model, Comment.]


[Figure 13. Instances of the ontologies used for enactment of the process description in Figure 10. The figure tabulates the Task instance (ID T1, name 3DSD, owner UCF), its Process Description PD-3DSD (activity set {BEGIN, POD, ..., END}, transition set {TR1, ..., TR15}), its Case Description CD-3DSD (initial data set {D1, ..., D7}, goal result set {D12}), and the Activity, Transition, Data, and Service instances, together with the following conditions and constraints:

Conditions:
C1: A.Classification = "POD-Parameter" and B.Classification = "2D Image"
C2: C.Type = "Orientation File"
C3: A.Classification = "P3DR-Parameter" and B.Classification = "2D Image" and C.Classification = "Orientation File"
C4: D.Classification = "3D Model"
C5: A.Classification = "POR-Parameter" and B.Classification = "2D Image" and C.Classification = "Orientation File" and D.Classification = "3D Model"
C6: E.Classification = "Orientation File"
C7: A.Classification = "PSF-Parameter" and B.Classification = "3D Model" and C.Classification = "3D Model"
C8: D.Classification = "Resolution File"

Constraints:
Cons1: if (D10.Classification = "Resolution File" and D10.value > 8) then Merge else End]
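The Choice/Merge loop control governed by constraint Cons1 can be illustrated with a minimal enactment sketch. The transition map below is a simplified linear version of Figure 10 (the concurrent FORK/JOIN branch is collapsed), and the functions are toy stand-ins for the coordination service, not its actual implementation.

```python
def enact(transitions, choice, start="BEGIN", end="END", max_steps=100):
    """Follow the workflow transitions from BEGIN to END, letting the
    CHOICE activity decide whether to loop back to MERGE or terminate."""
    trace, node = [], start
    for _ in range(max_steps):
        trace.append(node)
        if node == end:
            return trace
        node = choice() if node == "CHOICE" else transitions[node]
    raise RuntimeError("workflow did not terminate within max_steps")

# Simplified transition map (FORK/JOIN branch between POR and PSF collapsed):
TRANSITIONS = {"BEGIN": "POD", "POD": "P3DR1", "P3DR1": "MERGE",
               "MERGE": "POR", "POR": "PSF", "PSF": "CHOICE"}

def make_choice(resolutions, goal=8.0):
    """Mimics constraint Cons1: loop back to MERGE while the measured
    resolution value exceeds the goal, otherwise end the computation."""
    values = iter(resolutions)
    return lambda: "MERGE" if next(values) > goal else "END"
```

For example, `enact(TRANSITIONS, make_choice([12.0, 6.5]))` performs one refinement pass (resolution 12.0 > 8), then terminates when the second measurement reaches the goal.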
