Toward A Meta-Grid Middleware
Heithem Abbes1,2, Christophe Cérin1, Mohamed Jemni2, Walid Saad2
1 LIPN/UMR 7030, CNRS/Université Paris 13, 99, avenue Jean-Baptiste Clément, 93430 Villetaneuse, France
2 Research Unit UTIC, ESSTT, University of Tunis, 5, Av. Taha Hussein, B.P. 56, Bab Mnara, Tunis, Tunisia
{heithem.abbes,christophe.cerin}@lipn.fr
[email protected]
[email protected]
Abstract

Institutional Desktop Grid systems are attractive for running distributed applications with significant computational requirements. While the rapidly increasing number of users and applications running on such systems demonstrates the potential of Desktop Grids, current implementations follow the old-fashioned master-worker paradigm. Vulnerability to failures and the need for permanent administrative monitoring are obvious disadvantages of client-server architectures. Moreover, it is important to exploit existing computing systems in order to build a meta-grid middleware able to support any kind of application. To bypass these limitations, we propose a novel system, called BonjourGrid, able to orchestrate multiple instances of Institutional Desktop Grid middlewares, to remove the risk of a single source of bottleneck and failure, and to guarantee the continuity of services in a distributed manner. We choose XtremWeb-CH, Boinc and Condor as computing systems in BonjourGrid. Thus, BonjourGrid can create a specific environment for each user based on XtremWeb-CH, Boinc or Condor. In addition, BonjourGrid can be adapted to fulfill all the requirements of a decentralized job scheduler. In this paper, we set up experiments using the three cited systems, Boinc, Condor and XtremWeb-CH, in order to analyze the overhead generated by BonjourGrid and its capacity to manage multiple instances of Desktop Grids. The evaluation proves that BonjourGrid is able to manage more than 400 applications instantiated concurrently on an Institutional Desktop Grid using more than 1000 machines. Analyzing the execution of 405 applications with 2110 tasks during 3 hours demonstrates the potential of the BonjourGrid concept and shows that, compared to a classical Desktop Grid with one central master, such as Condor, Boinc or XtremWeb-CH, BonjourGrid incurs an acceptable overhead.

Keywords: Desktop Grid, Resource management, Meta Grid, Middleware.
1 Introduction

Desktop Grids have been successfully used to address large applications with significant computational requirements, including the search for extraterrestrial intelligence (SETI@Home), global climate prediction (Climateprediction.net), and the study of cosmic rays (XtremWeb). While the success of these applications demonstrates the potential of Desktop Grids, existing systems are often centralized and suffer from relying on an administrative staff to guarantee the operation of the master. Although failures of the master are infrequent and replication techniques can resolve the problem when it occurs, we still believe in the need for decentralized approaches, since not all computing departments have hardware of the same quality, and a master crash can disable the whole system. In this context, we proposed, in a previous work [2], a novel approach, called BonjourGrid, that orchestrates existing local desktop grid systems (Institutional Desktop Grids or Enterprise Desktop Grids [7][6]) in a decentralized manner. BonjourGrid dynamically creates a Computing Element (CE), i.e., a set of workers managed by one master, when a user needs to run an application. In addition, in [1], we described how BonjourGrid can act as a multi-job scheduler by creating a specific CE for each application. In the two above-mentioned works, we used only one system, XtremWeb-CH [3], to build computing elements. This paper is an extension that details the BonjourGrid concept with two other computing systems, Boinc and Condor. We have conducted several experiments, with the three computing systems XtremWeb-CH, Boinc and Condor, on the distributed platform Grid'5000, using more than 300 physical machines and 1000 virtual ones. This paper is organized as follows. Section 2 presents an overview of BonjourGrid and outlines its fundamental components. Section 3 gives a detailed description of the design of BonjourGrid and the interaction between user, services, and resources. Section 4 gives a detailed description of the applications used in the experiments;
it also describes the application generator and the workload model. Section 5 illustrates how BonjourGrid can be used as a decentralized job scheduler. Section 6 presents the experimental setup and analyzes the measurements obtained using XtremWeb-CH, Boinc and Condor. Section 7 summarizes and discusses related work. Section 8 concludes the paper and suggests some future work.
2 Overview of BonjourGrid

BonjourGrid relies on the following vision: in the same institution, all the machines have the same role and each user has a desktop machine on his desk. A user requests a computation, providing the task graph and the codes implementing his distributed application (we developed a graphical tool called SDAD to make this operation easier for the user). BonjourGrid deploys a master node, locally, on the user machine, and requests participants (workers). Negotiations to select them then take place using a publish/subscribe infrastructure. Each machine publishes its state (idle, worker, or master) when changes occur, as well as information about its local load or its cost of use, in order to provide useful metrics for choosing participants. Under these assumptions, the master node can select a subset of worker nodes according to a strategy that balances the "power" of a node against the "price" of using it. When a master node finishes its application, it becomes free, returns to the idle state and releases all the workers it owns, which also return to the idle state. When no application is submitted, all machines are in the idle state. The key idea of BonjourGrid is to build a meta-grid middleware on top of existing Institutional Desktop Grid middlewares, and to orchestrate and coordinate multiple instances through a publish/subscribe system (see Figure 1). Each instance is responsible for one application. Because the following middlewares are success stories, we chose to interface BonjourGrid with XtremWeb [4], BOINC [9] and Condor [11].
Figure 1 BonjourGrid: The user A (resp. B) Deploys Locally a Coordinator on His Machine with NA = 4 (resp. NB = 5) workers. Level 1 Shows the Infrastructure State after the Building of Two Computing Elements for A and B. Level 0 Presents the Fabric Layer of the Infrastructure.
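To fix ideas, the following minimal Python sketch ranks candidate workers by balancing the published "power" and "price" metrics mentioned above. The attribute names, the linear scoring and the weight alpha are our own illustrative assumptions; the paper does not fix a particular selection formula.

# Hypothetical scoring of candidate workers: balance the "power" of a
# node against the "price" of using it. The linear combination and the
# weight alpha are illustrative, not BonjourGrid's actual policy.

def score(worker, alpha=0.5):
    """Higher is better: favor powerful, cheap machines."""
    return alpha * worker["power"] - (1 - alpha) * worker["price"]

def select_workers(candidates, n):
    """Pick the n best-ranked idle machines for a new Computing Element."""
    return sorted(candidates, key=score, reverse=True)[:n]

idle = [
    {"name": "node-1", "power": 2.4, "price": 0.3},
    {"name": "node-2", "power": 1.8, "price": 0.1},
    {"name": "node-3", "power": 3.0, "price": 0.9},
]
print([w["name"] for w in select_workers(idle, 2)])  # e.g. ['node-1', 'node-3']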
BonjourGrid is built on top of the Bonjour protocol [14]. Bonjour is Apple's implementation of the ZeroConf protocol [10]. Its goal is to obtain a functional IP network that does not depend on any infrastructure such as DHCP or DNS servers; it relies on multicast IP. Several considerations led us to use a publish/subscribe system in general, and Bonjour in particular: (1) Bonjour is an industrial protocol validated by Apple; (2) versions for three kinds of OS (Windows, Linux, MacOS) already exist; (3) many Linux distributions include the dns-sd daemon (the basic daemon needed for Bonjour), and the same holds for MacOS; (4) with the increase in bandwidth capacity, we believe that the risk of network congestion or saturation is low. To verify this, in a previous work, we carried out experiments on Bonjour using the Grid5000 [16] platform over more than 300 AMD Opteron machines connected by a 1 Gb/s network (a representative setup for an institutional desktop grid). Measurements show that Bonjour is reliable and very efficient at resource discovery: it discovers more than 300 registered services (0% loss) published simultaneously in less than 2 seconds.
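For concreteness, here is a minimal sketch of how a machine could publish its state as a Bonjour service from Python, using the pybonjour bindings around the dns-sd API. The service type "_idle._tcp" and the port are illustrative assumptions; the paper does not specify the exact service names that BonjourGrid registers.

# Minimal sketch: announce this machine's state ("idle", "worker" or
# "coordinator") as a Bonjour service. Assumes the pybonjour bindings;
# the regtype naming scheme and the port are illustrative.
import select
import pybonjour

def register_state(machine_name, state, port=9999):
    regtype = "_%s._tcp" % state            # e.g. "_idle._tcp"
    sdRef = pybonjour.DNSServiceRegister(name=machine_name,
                                         regtype=regtype,
                                         port=port,
                                         callBack=lambda *args: None)
    # Process one event so the registration is actually announced.
    select.select([sdRef], [], [])
    pybonjour.DNSServiceProcessResult(sdRef)
    return sdRef                            # call close() to withdraw it

ref = register_state("machine-A", "idle")

To change state, a machine would close the old descriptor and register the new service, mirroring the deactivate-then-publish step described in Section 3.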
3 Service Oriented Architecture for Building a Computing Element

Each machine in BonjourGrid is in one of three states (Idle, Worker or Coordinator). Each state is associated with a service: IdleService for the idle state, WorkerService for the worker state and CoordinatorService for the coordinator state. When a machine changes its state, it publishes the corresponding service to notify its new state, after having deactivated the old one.

3.1 From Idle to Coordinator State

We suppose that each user keeps his machine connected to the desktop grid during the execution of his application, in order to avoid managing the fault-tolerance problem between coordinators, which is not in the scope of this article but for which we have developed solutions. A user submits his application through the local user interface of BonjourGrid (installed on his machine) and selects the desired computing system among XtremWeb-CH, Boinc or Condor. The user machine then changes its state to the coordinator state in order to initiate the construction phase of a new CE with the selected computing system. If this machine is a worker for another application, BonjourGrid waits until it finishes the task in progress before launching the coordinator, so as not to degrade the machine's performance. This is a preliminary choice; the next version will support the migration of a job to another idle machine. If there is no idle machine, BonjourGrid informs the coordinator to put the job in the background until it finds an
idle machine or until a worker finishes its task, so that the user does not wait for a long time, especially if his machine runs a long job. The coordinator starts by running a discovery program (or Browser) on idle machines. The discovered machines are recorded in an IdleMachineDict dictionary (BonjourGrid is developed entirely in Python). From this data structure, the coordinator selects machines that fit the application requirements and creates a new MyWorkersDict dictionary. The coordinator continues the search, i.e., its browser keeps listening on idle machines, until the size of MyWorkersDict reaches the number of required machines (i.e., the size of the CE). Thereafter, the coordinator stops the browser. Next, the coordinator checks whether the selected workers agree to work for it. Indeed, the coordinator publishes, for each selected worker, a new "RequestWorker" service using the worker's name as the service type. Note that the browser program listens on service types, not on service names. Worker "A" confirms its participation to coordinator "C" by publishing a new "MyConfirmation" service with the name of coordinator "C" as the service type. Only coordinator "C" runs a browser program on services of this type. Hence, if the browser discovers a new registration of a service of this type, it means that a new idle machine has accepted to work for it. If the number of confirmations does not reach the size of the CE, the coordinator launches the browser on idle machines again, but now only for the number of missing machines. It is possible that more than one coordinator selects the same worker "A" and publishes a "RequestWorker" service with the same service type (PTR: WorkerA). Thus, there may be contention in the access to worker "A". BonjourGrid provides a simple and efficient resolution of such conflicts: an idle machine "A" confirms its participation to the first "RequestWorker" service discovered (i.e., the first coordinator), knowing that the Bonjour protocol guarantees that the browser does not discover more than one service at the same time. In the middleware layer (i.e., XtremWeb-CH, Boinc or Condor), BonjourGrid establishes the connection between the Coordinator and the Workers which have accepted to participate, according to the policy of the middleware. Indeed, when an idle machine publishes the CoordinatorService, it also launches the Coordinator program. This program listens for new connections from workers. In the same way, when an idle machine browses the "RequestWorker" service of a coordinator and publishes the "MyConfirmation" service, it also starts the Worker program with the IP address of the coordinator (the address is stored in an attribute of the "RequestWorker" service). From the first connection of a Worker, the Coordinator submits the
first task of the application and remains ready for connections from other workers. If the coordinator does not find enough available idle machines, the application completes its execution with only the connected Workers, without reaching the required size of the CE. When the application finishes its execution, the coordinator deactivates its CoordinatorService and stops the Coordinator program. It returns to its initial state by publishing the IdleService again. Figure 2 illustrates the steps of the protocol from the idle state to the coordinator state.
Figure 2 Steps to Transform an Idle Machine into a Coordinator
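The following runnable Python sketch condenses the coordinator-side protocol just described. The register() helper is a simplified stand-in for a Bonjour registration (a real implementation would go through dns-sd, as in the sketch of Section 2); the dictionary names follow the paper, while fits_requirements() and the service-type naming are illustrative assumptions.

# Coordinator-side sketch: discover idle machines, select workers, and
# request their participation via per-worker "RequestWorker" services.

CE_SIZE = 3
COORD_NAME = "coordinator-C"

IdleMachineDict = {}   # every idle machine discovered so far
MyWorkersDict = {}     # machines selected for this Computing Element
confirmed = set()      # workers that accepted to join

def register(name, regtype):
    """Stand-in for a Bonjour registration: announce a service."""
    print("publish %s of type %s" % (name, regtype))

def fits_requirements(info):
    return True        # the real test would check load, cost, ...

def on_idle_machine(name, info):
    """Browser callback fired for each discovered idle machine."""
    IdleMachineDict[name] = info
    if fits_requirements(info) and len(MyWorkersDict) < CE_SIZE:
        MyWorkersDict[name] = info
        # Ask the machine to work for us: publish a "RequestWorker"
        # service whose *type* is the worker's name, so that only this
        # machine's browser discovers it.
        register("RequestWorker", "_%s._tcp" % name)

def on_confirmation(worker_name):
    """Browser callback on services whose type is COORD_NAME: the
    worker confirmed by publishing "MyConfirmation"."""
    confirmed.add(worker_name)

# Simulated discovery events, in place of a real Bonjour browse loop.
for machine in ("node-1", "node-2", "node-3", "node-4"):
    on_idle_machine(machine, {"state": "idle"})
for machine in list(MyWorkersDict):
    on_confirmation(machine)
print("CE ready with workers:", sorted(confirmed))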
3.2 From Idle to Worker State

Figure 3 presents the state diagram describing the steps taken by a host to change its state from idle to worker. When a host joins the BonjourGrid system, it is registered in the initial Idle state. This host listens for notifications from coordinators that need its participation. When the machine discovers a participation request, it answers this request by publishing a "MyConfirmation" service as described above. Thereafter, the machine stops the browser, removes the IdleService service and publishes the WorkerService one to announce its new state: it is no longer free, it has become a worker. The worker then runs the XW-Worker program with the IP address of the coordinator. The worker remains bound to its coordinator as long as the latter is alive. It runs a browser in order to listen for new events from its coordinator, especially its "death". When the coordinator is stopped, the worker removes the WorkerService and stops the XW-Worker process. Finally, the machine returns to the initial state by publishing the IdleService again. In this way, the coordinator does not need to contact its workers to release them.
Figure 3 Steps to Transform an Idle Machine into a Worker
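The worker-side counterpart can be sketched in the same spirit. The publish/remove helpers below are simplified stand-ins for Bonjour registrations; the service names follow the paper, and the rest (class layout, printed messages) is illustrative.

# Worker-side sketch of the transitions of Figure 3: idle -> worker on
# a participation request, and back to idle when the coordinator dies.

class Machine:
    def __init__(self, name):
        self.name = name
        self.state = "Idle"
        self.publish("IdleService")

    def publish(self, service):
        print("%s publishes %s" % (self.name, service))

    def remove(self, service):
        print("%s removes %s" % (self.name, service))

    def on_request_worker(self, coordinator_ip):
        """The browser discovered a RequestWorker service aimed at us;
        accept the first coordinator that asked."""
        self.publish("MyConfirmation")
        self.remove("IdleService")
        self.publish("WorkerService")
        self.state = "Worker"
        print("start XW-Worker pointing at %s" % coordinator_ip)

    def on_coordinator_death(self):
        """The browser noticed the coordinator's service disappeared."""
        self.remove("WorkerService")
        print("stop the XW-Worker process")
        self.state = "Idle"
        self.publish("IdleService")

m = Machine("machine-A")
m.on_request_worker("10.0.0.7")
m.on_coordinator_death()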
4 Type of Applications

In addition to bag-of-tasks applications (BoT), which do not take dependencies between tasks into account, BonjourGrid also supports distributed applications (DA) with dependencies. Indeed, this feature is offered by supported computing systems such as XtremWeb-CH and Condor. XtremWeb-CH [3] is an improved version of XtremWeb that we use in BonjourGrid as one of the computing systems. Although it has been shown that XtremWeb, Condor and Boinc support a great number of applications, we want to evaluate BonjourGrid when it manages multiple CE instances carrying out realistic jobs. Since we aim, in this work, to evaluate BonjourGrid as a decentralized scheduler, and not to evaluate XtremWeb-CH, Boinc or Condor in running complex distributed applications, we focus on the capacity of BonjourGrid to manage several instances of local desktop grids.

4.1 Workload Model

Evaluating a system with a set of specific applications does not necessarily allow evaluating its real behavior. In the same way, using a Poisson arrival pattern for the tasks does not reflect the reality of arrival models. Thus, in this work, we use the workload models proposed by Feitelson, which make it possible to generate a workload very close to reality. In fact, among the several workload programs proposed by Feitelson et al., we choose that of Feitelson and Lublin [8] because it fits our requirements.
Indeed, we need a workload in which we can specify the system size (number of machines), the period of arrival of jobs, the maximum number of parallel tasks per application and the maximum runtime of a task (see Table 1 for an example of output results). Feitelson and Lublin mention that this workload is generated for rigid jobs, i.e., jobs that specify the number of processors they need and run for a certain time using this number of processors. This matches our requirements well, since we need to construct several CEs with different sizes. An application is then a set of parallel tasks. But, in order to detect the end date of an application, we inserted in each one a fictitious task (a "gather" task) which starts its execution only after having received all the (fictitious) results of the preceding tasks. The number of tasks varies from 2 to N, which is the maximum number of machines available in the system. An application with k tasks means that k-1 tasks are parallel and the k-th task is the gather task. The tasks of a given application, except the gather, sleep for the same time, according to the runtime given by the workload model; thereafter they send a (fictitious) file to the gather.

Table 1 A Part of the Workload Output
Application ID   Arrival Time (s)   Run Time (s)   Number of parallel tasks
1                19                 4              32
2                39                 11             13
3                69                 13             16
4                98                 87             128
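As an illustration, the following Python sketch shows how one row of Table 1 can be mapped onto an application of k-1 parallel sleep tasks plus the fictitious gather task described above. The dictionary layout is our own; only the structure (parallel sleeps plus a gather barrier) follows the paper.

# Build one synthetic application from a workload entry: n_parallel
# "sleep" tasks plus a gather task that depends on all of them and
# marks the end of the application.

def make_application(app_id, arrival, runtime, n_parallel):
    tasks = [{"id": "t%d" % i, "cmd": "sleep %d" % runtime, "deps": []}
             for i in range(1, n_parallel + 1)]
    # The gather starts only after receiving the (fictitious) result
    # files of all the parallel tasks.
    gather = {"id": "gather", "cmd": "true",
              "deps": [t["id"] for t in tasks]}
    return {"id": app_id, "arrival": arrival, "tasks": tasks + [gather]}

app = make_application(4, arrival=98, runtime=87, n_parallel=128)
print(len(app["tasks"]))   # 129: 128 parallel tasks + the gather task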
4.2 A Generator of Applications

We have also developed a generator of applications in Python, which takes the workload model as a parameter and produces a set of applications (one application for each entry in the workload). Indeed, for each entry in the workload (see Table 1), the generator uses the configuration given by the entry to create a compressed file containing the XML description of the dataflow graph. This file is used by XtremWeb-CH to describe the parallel tasks and the precedence constraints between the tasks of the application. As mentioned before, the tasks of the same application carry out a sleep to simulate the runtime of the corresponding entry in the workload. The gather is a fictitious task, which serves only as a barrier to announce the end of the application. The generator also contains the necessary directives to build an application description according to the syntax of Boinc and Condor.
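A minimal sketch of this generator step is given below. The XML element names are illustrative; the paper does not reproduce the actual XtremWeb-CH dataflow syntax, only the fact that the description is an XML file shipped in a compressed archive.

# Write an (illustrative) XML dataflow description for one workload
# entry and pack it into a compressed file, as the generator does.
import xml.etree.ElementTree as ET
import zipfile

def write_dataflow(app_id, runtime, n_parallel, path="app.xml"):
    root = ET.Element("application", id=str(app_id))
    names = []
    for i in range(1, n_parallel + 1):
        ET.SubElement(root, "task", name="t%d" % i,
                      cmd="sleep %d" % runtime)
        names.append("t%d" % i)
    gather = ET.SubElement(root, "task", name="gather", cmd="true")
    for n in names:                      # precedence edges to the gather
        ET.SubElement(gather, "after", task=n)
    ET.ElementTree(root).write(path)
    with zipfile.ZipFile(path + ".zip", "w") as z:
        z.write(path)

write_dataflow(app_id=1, runtime=4, n_parallel=32)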
5 BonjourGrid as a Decentralized Job Scheduler

As mentioned before, we are interested in Institutional (or Enterprise) Desktop Grids (i.e., all machines are in the same institution). In this kind of environment, each machine is owned by one user. The experiment consists in emulating a set of users who submit applications to the BonjourGrid system. BonjourGrid acts as a decentralized job scheduler for our emulation system: it assigns an application to the first free machine, which then constructs a CE to run the application. For that, we suppose that each user has only one application, in order to create the maximum number of CE instances. Applications have different sizes (i.e., different numbers of parallel tasks and task turnaround times, see Table 1). In BonjourGrid, each user who submits an application automatically and transparently creates a coordinator on his own machine (he cannot use another free machine as a coordinator for his application), which will be responsible for the execution of his application. In our experiment, however, we make it possible for a user to use another free machine as a coordinator for his application; the selection of the coordinator, in this case, is done automatically using a list of machines, as detailed later. We aim at analyzing the behavior of BonjourGrid compared to the classical master/worker model with the same computing system (XtremWeb-CH, Boinc and Condor). We want to compute the overhead generated by the dynamic creation of a CE for each application. At this time, BonjourGrid does not have any mechanism to handle the failure of a coordinator. Indeed, at the execution level of an application, XtremWeb-CH, Boinc or Condor can manage the fault-tolerance problems caused by workers. Although these systems offer this feature, we still have the problem of a faulty coordinator. Among several alternatives, we can solve this problem using a checkpointing system to maintain the last state of the coordinator. Another alternative consists in replicating the coordinator on several machines. But we do not deal with this problem in this paper; we assume that grafting existing solutions can solve it. We focus on how to manage several CE instances with different sizes and computing systems.
6 Experimentation and Validation

From a front-end machine, we submit the set of applications. We developed a program which takes as parameters (1) the workload model, (2) the path of the applications and (3) the list of machines to be used as coordinators or workers. All the machines are initialized in the idle state. The program submits the applications following the arrival times given by the workload model. The submission of an application consists in selecting, from the list, the first free machine on which a coordinator is launched to start the construction of the CE and the execution of the application. Each machine which changes its state from idle to worker or coordinator is locked: a temporary file /tmp/occupied is created to indicate that the machine is occupied during the turnaround time of the application. The workers are released when the coordinator is deactivated, and the coordinator is deactivated automatically when the application is finished. In the current version of BonjourGrid, we give more control to the user: he may deactivate the coordinator when his application is finished or when he has no more applications to launch. To make the experiment work, we improved the initial version by adding a new layer composed of (Shell and SQL) scripts which detect the end of applications by periodically consulting the database of the coordinators to see whether the state of an application is "Finished" or not. For the analysis of the obtained results, it is important to note that, for this experiment in particular, the database of the coordinator is erased each time the coordinator is deactivated, because the same machine may be used as a coordinator for another application, which could cause an overlap in the detection of application ends. This is an implementation choice made for XtremWeb-CH and Boinc.
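A condensed, runnable sketch of this front-end program is shown below. is_locked() and run_coordinator() abstract the real remote scripts (in the actual setup, the lock is the /tmp/occupied file on each machine); the two-entry workload is illustrative.

# Front-end sketch: replay the workload arrival times and launch a
# coordinator on the first unlocked machine for each application.
import time

machines = ["node-%d" % i for i in range(128)]
locked = set()
workload = [{"id": 1, "arrival": 19}, {"id": 2, "arrival": 39}]

def is_locked(machine):
    """In the real setup: test for /tmp/occupied on the remote host."""
    return machine in locked

def run_coordinator(machine, app):
    locked.add(machine)      # the real scripts create /tmp/occupied
    print("launch a coordinator for app %d on %s" % (app["id"], machine))

start = time.time()
for app in workload:
    # Respect the arrival pattern of the workload model.
    time.sleep(max(0, app["arrival"] - (time.time() - start)))
    free = next(m for m in machines if not is_locked(m))
    run_coordinator(free, app)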
6.1 BonjourGrid and XtremWeb-CH

We carried out our first experiments on the Grid5000 platform using 128 machines (AMD Opterons) of the Orsay site. With the application generator, we created 100 applications with different numbers of parallel tasks, varying from 2 to 128 from one application to another; in total, we generated 2110 tasks for the 100 applications. The runtime of a task varies from 1 to 150 seconds, according to the workload model. The arrival pattern is spread over a period of 3 hours. This configuration is applied to BonjourGrid, which dynamically creates a CE for each application. The same configuration is used for the centralized XtremWeb-CH model, which uses only one central coordinator and 127 static workers to carry out the 100 applications. In XtremWeb-CH, if an application has N parallel tasks and there are only k free workers (k < N), the coordinator starts the execution using the k workers and waits for new "incoming" or "leaving" workers. The same principle is applied in BonjourGrid. Hence, the coordinator may finish all the application tasks with k workers (k < N) if there are not enough free machines. This detail is important for understanding the analysis of the measurements.
Among the traces obtained after the execution of the 100 applications, we show in Figure 4 the difference between the submission time of an application, i.e., the request for a machine to be used as a coordinator to launch the application, and the time at which such a machine is obtained. Indeed, if an application contains 128 parallel tasks, it occupies the whole system during its execution. If a new submission arrives before the end, it must wait until the coordinator releases the machines. Figure 5 shows the difference between the end times of the applications under XtremWeb and under BonjourGrid.
Figure 4 Time Difference between Submit Request and CE Starting in BonjourGrid. Axis Y1 (on the left) Represents the Time in Seconds and the Second Axis Y2 (on the right) Represents Tasks Number per Application
In Figure 4, we first focus on analyzing the peaks. For instance, application no. 68 comprises 128 tasks; its coordinator therefore tries to find 128 workers (the ideal case) to build the CE. Thus, this application occupies all the free machines in the system for at least the runtime of one of its tasks (108 sec) plus the communication time between the workers and the coordinator. Indeed, in the XW-CH version, a worker sends a WorkRequest message to the coordinator every 30 seconds. After a short time (39 sec), a new application submission arrives in the system; BonjourGrid then cannot find any free machine and must wait at least 212 sec (the task runtime given in the workload) to launch a coordinator on the first released machine. In general, we note that 6% of the applications took more than 1 minute to find a free machine on which BonjourGrid could launch a coordinator, but the majority (90%) of the applications required less than 30 seconds. We recall here that a CE is a master and a set of workers, which means that finding a free machine for the coordinator is not sufficient to start the execution: BonjourGrid must also find free machines for the workers. Concerning the turnaround time of applications, we measure the difference between the end date of an application and its submission date. The overhead generated by BonjourGrid includes the waiting time to find a free machine (coordinator), plus the building time of the CE (connection of the workers to the coordinator), plus the time to erase the database of the coordinator when it is released. In Figure 5, the average overhead is about 2 minutes per application. The peaks in the curves are explained by the
saturation of the system (i.e., there are no more free machines). We notice that each peak is preceded by at least one application which uses more than 120 workers. For some applications (e.g., numbers 51, 63, 68), BonjourGrid gives better turnaround times. This is explained by the fact that, in XW: (1) the coordinator is overloaded and needs more time to allocate machines and to assign tasks; (2) the master may lose connections to the workers, since it uses the same port to receive all the heartbeat signals they emit every 30 seconds to signal their presence; in contrast with XW, in BonjourGrid a coordinator is responsible for only one application at a time and listens only to the workers which carry out its application; (3) the coordinator assigns a task to a worker only when the latter sends a WorkRequest message (this mechanism, called the back-off effect, is useful to bypass firewalls). The worker sends this signal every 30 seconds. The submission date may fall just one second after such a request, in which case the master is obliged to wait for another WorkRequest signal to submit the task if no other worker is free. This is a waste of time, especially as the back-off time of XW (30 seconds) does not reflect the reality of existing Institutional Desktop Grid systems: it is too small, and a more realistic back-off time is around 5 minutes at least. In contrast with XW, in BonjourGrid the submission initiates the CE, i.e., the coordinator is started with 0 workers but already has the list of tasks in its queue; thus, from its first connection, each worker fetches a task.
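To make the back-off effect concrete, here is a small, self-contained Python sketch of the pull-based polling loop described above. The Coordinator class is a stand-in for the real XW coordinator; the 30-second interval follows the figure quoted in the text.

# Pull-based "back-off" sketch: the worker polls the coordinator with
# WorkRequest messages at a fixed interval, so a task submitted just
# after a poll waits up to one full interval before being fetched.
import time

BACKOFF = 30   # seconds between WorkRequest messages in XW-CH

class Coordinator:
    """Stand-in: hands out one queued task per WorkRequest, or None."""
    def __init__(self, tasks):
        self.tasks = list(tasks)
    def work_request(self):
        return self.tasks.pop(0) if self.tasks else None

def worker_loop(coord, polls=3):
    for _ in range(polls):
        task = coord.work_request()
        if task:
            print("running", task)
        else:
            time.sleep(BACKOFF)   # idle until the next WorkRequest

worker_loop(Coordinator(["t1", "t2"]))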
Figure 5 End Time Difference between BonjourGrid and XtremWeb. Y2 (on the right) Gives Tasks Number per Application
We carried out a second experiment to confirm our previous analysis. We added 20 machines (15%) to the 128 machines of the previous experiment, so as to relax the saturation of the system by increasing the number of free machines. Figure 6 shows that the difference in end times, compared to the curve of Figure 5, decreased considerably. In the curve of Figure 6, there is just one peak (because there are three previous applications which used more than 128 workers), with an attenuation of 140 seconds. This improvement is explained by the facts that (1) there is a higher chance of finding a free machine to launch a coordinator, hence we reduce the difference between the application submission date and the start of its CE building, and (2) as mentioned before, the back-off effect can delay the submission of application tasks in XW, but not in BonjourGrid. This improvement confirms that the
first setup, with only 128 machines and 2110 tasks, is a very stressful scenario for BonjourGrid. Figure 6 confirms that, if there are enough free machines, BonjourGrid gives an acceptable overhead. Yet, it is important to run more experiments over a large number of machines to check whether BonjourGrid scales well, i.e., whether it is able to orchestrate a great number of CEs. In that respect, we performed two other experiments using the virtualization system Vgrid [15]. Vgrid provides a mechanism to create several virtual machines on the same host; the different virtual machines then communicate through a virtual hub created on each physical machine (or real machine, denoted RM in the following). Thus, when we mention that we have 500 virtual machines, this means X VM * Y RM, for instance 10 VM * 50 RM or 5 VM * 100 RM. We used Vgrid to reach 1000 hosts.
Figure 6 End Time Difference when BonjourGrid Has 15% Supplementary Machines
Figure 7 illustrates that the use of virtual machines did not give good results. For that reason, we picked out the execution of the 50 applications already executed on 128 real machines (RM), in order to observe the impact of using virtual machines (VM). We point out that the virtual machines lacked sufficient RAM to manage many open sockets at the same time and to allocate the necessary memory for the Java virtual machine. This explains why increasing the number of machines from 128 real machines to 500
virtual ones (4 VM * 125 RM) did not improve turnaround times. Indeed, in most cases, the times obtained on many virtual machines are worse than the times obtained on fewer real ones, for both BonjourGrid and XW. Despite the fact that the use of VMs has a negative impact on turnaround times, we continued the experiments, since they allow us to test whether BonjourGrid can manage a large number of CEs over a large grid composed of hundreds of nodes.
To this end, we performed an experiment with 405 applications over 1000 virtual machines. We launched 4 virtual machines per real machine (4 VM * 250 RM). The number of parallel tasks per application varies from 2 to 128. The majority of the applications have more than 32 parallel tasks, and 20% have more than 60 parallel tasks. From Figure 8, we deduce that BonjourGrid performs better from around the 380th application. In fact, the XW-Coordinator becomes overloaded after this number of submitted applications and then takes more time to update its database, to get results from workers and to submit new tasks. Moreover, it is important to mention that, for the XW evaluation, we chose to run a specific virtual machine with 1500 MB of RAM for the coordinator. Even so, the XW-coordinator did not succeed in exceeding 500 worker connections. This limitation is caused by the low performance of the virtual machines. In contrast with XW,
for the BonjourGrid evaluation, we kept the same configuration for all virtual machines (300 MB per virtual machine), so the coordinators in BonjourGrid do not have the same performance as the XW-coordinator. Thus, the performance of the virtual machines does not give the correct turnaround times of applications. We recall that these last two experiments were performed to show that BonjourGrid can scale and manage more than 400 CEs with different sizes. The main features of BonjourGrid are self-organization and decentralization. The performance of BonjourGrid depends on the behavior of the computing system (i.e., XW in the current experiment). In that respect, we carried out other experiments with other computing systems, namely BOINC and Condor, to analyze the performance of BonjourGrid in those contexts.
Figure 7 End Time Difference between BonjourGrid and XtremWeb for 50 Applications with 500 Virtual Machines
Figure 8 End Time Difference between BonjourGrid and XtremWeb for 405 Applications with 1000 Virtual Machines
6.2 BonjourGrid and Boinc

After setting up a specific image containing all the modules required to run Boinc (server and client), we carried out experiments on the Grid'5000 testbed using more than 200 machines (AMD Opterons) of the Orsay site. With the application generator, we generated 130 applications with different numbers of parallel tasks, varying from 2 to 128. The execution time varies from 1 to 500 seconds, according to the workload model. The arrival pattern is spread over 3 hours. Figure 9 shows that 60% of the applications in BonjourGrid exhibit an overhead compared to Boinc. The overhead varies between 24 seconds (application 44) and 1277 seconds (application 56); 31% of the applications give an overhead near the lower limit (24 seconds). As with XtremWeb-CH, the overhead comprises the time needed to find a free machine to launch a coordinator, plus the building time of the computing element (creation of an account and attachment to the coordinator's project), plus the time needed to empty the database of the coordinator. We mention that, in Boinc, the creation phase of a computing element is composed of two stages: first, Boinc creates the client accounts in the project (coordinator side) and sends the client certificate (the authentication keys); then each client uses its account to attach to the project and fetch jobs. This security mechanism is essential in Boinc because access to the coordinator is certified.
BonjourGrid gives better execution times for applications composed of a large number of tasks. In Figure 9, applications 20 and 111, composed of 128 tasks, and applications 70, 72, 95, 116 and 118, composed of 64 tasks, give lower times compared to Boinc. In contrast, Boinc gives better execution times for applications with fine granularity (1 to 20 tasks) (see, in Figure 9, applications 5, 6, 7, 23, 37, 50, 55, 97, 109 and 120). We explain these two phenomena by (1) the replication method and (2) the back-off technique adopted by Boinc. In fact, Boinc replicates tasks on multiple machines to check that the obtained results are identical. Therefore, an application requiring a large number of machines will be executed on a large computing element, so BonjourGrid will be able to find machines on which to replicate a job; but for small computing elements (applications composed of a few tasks), there are not enough machines to execute the replications. In contrast to BonjourGrid, Boinc permanently has 200 machines, so it has more resources to perform replications at the same time as the execution of the tasks. Consequently, when the computing element does not have sufficient resources for replication, the overhead between BonjourGrid and Boinc is larger.
Figure 9 End Time Difference between BonjourGrid and Boinc
As mentioned in the XtremWeb-CH section, the back-off time is beneficial for BonjourGrid, in particular with Boinc. Indeed, in Boinc, the coordinator receives all the requests on the same port; thus, the coordinator can become overloaded if the number of workers is large. At this stage, a client becomes unable to join a project and download the files necessary to execute the jobs. If the coordinator is unreachable, the worker waits 7 seconds (the back-off time) before a new call (the back-off can reach 30 minutes, contrary to XtremWeb-CH, where the worker sends a request every 30 seconds). In contrast with Boinc, in BonjourGrid each application is assigned its own coordinator, so the probability of coordinator overload is very low. Thus, the coordinator is able to manage all the requests, and the communication time between coordinator and workers does not exceed 7 seconds. Finally, BonjourGrid is much more stable during the emulation, despite the overhead generated. This overhead can decrease when more than 200 machines are used.

6.3 BonjourGrid and Condor

We prepared all the configuration required to install the different packages of Condor in a dedicated image on the Grid'5000 platform. Using more than 200 machines of the Orsay site, we carried out a set of experiments to analyze the overhead generated by BonjourGrid when Condor is used to build a computing element. We generated 130 applications with precedence between tasks, each one
containing from 2 to 128 tasks. The execution time of the tasks varies from 1 to 500 seconds, and the arrival pattern is spread over 3 hours. It is the same workload model as used in the Boinc experiments. In BonjourGrid, the majority of the applications present an overhead, which varies from 18 to 676 seconds. This overhead includes the time needed to find a free machine for the coordinator, plus the configuration time of the worker and coordinator modules. Contrary to XtremWeb-CH and Boinc, in Condor it is not necessary to erase the coordinator database, because Condor does not install a static coordinator or worker module; it is, rather, based on a configuration of services (manager, submit and execute). Thus, we wrote scripts to perform a real-time configuration when we select a coordinator to launch an application and build a new computing element. This configuration phase takes about 20 seconds. Figure 10 shows that 35% of the applications executed with BonjourGrid generate an overhead of about 30 seconds. The applications that show a large overhead are preceded by applications composed of a large number of parallel tasks (more than 100 tasks); consequently, BonjourGrid could not find sufficient resources to build computing elements of the required sizes. It is also important to mention that BonjourGrid gives less overhead with Condor than with XtremWeb-CH and Boinc.
Figure 10 End Time Difference between BonjourGrid and Condor
7 Related Work

In this section, we briefly sketch the main properties of some popular desktop grid systems. Current systems follow the old-fashioned master/worker model. A master has a list of workers ready to execute tasks, and assigns a task to the worker that fits the requirements of the application task. Each worker sends a heartbeat signal to announce its availability. If a worker crashes, the master restarts the task from scratch on another worker; some systems use checkpointing techniques. Moreover, in desktop grid systems, the jobs of the machine owner have the highest priority: when a machine is executing an application task as a worker and the owner returns to his machine (e.g., he uses the mouse or the keyboard), the task is suspended and can be resumed later on the same machine. XtremWeb [4], Boinc [9] and Condor [11] follow the same general design explained above, but each one has a different implementation with some particularities. BonjourGrid does not propose a novel desktop grid system. It is, rather, a novel approach which manages, in a decentralized manner, different instances of existing local desktop grids using XtremWeb, Boinc or Condor as the computing system. For that, BonjourGrid uses a publish/subscribe announcement-based infrastructure. The OurGrid [5] system avoids the centralized server by creating the notion of the home machine, from which applications are submitted; the existence of several home machines also reduces the impact of failures. Moreover, OurGrid provides an accounting model to ensure fair resource sharing, in order to attract nodes to join the system. However, the originality of BonjourGrid compared to OurGrid is that it supports distributed applications with precedence between tasks (since XW-CH handles this type of application), while OurGrid supports only Bag-of-Tasks (BoT) applications (independent, divisible tasks). WaveGrid [12] is a P2P middleware which uses a time-zone-aware overlay network to indicate when hosts have a large block of idle time. This system reinforces the idea behind BonjourGrid, since moving from one set of workers to another depending on the time zone (WaveGrid) is analogous to creating a new CE for each application, depending on the user's requirements (BonjourGrid). To our knowledge, approaches based on publish/subscribe systems to coordinate or decentralize desktop grid infrastructures are not very numerous. The project that most resembles ours is the Xgrid project [13]. In the Xgrid system, each agent (or worker) makes itself available to a single controller. It receives computational tasks and returns the results of these
computations to the controller. Hence, the whole architecture relies on a single, static component, the controller. Moreover, Xgrid runs only on MacOS systems. In contrast with Xgrid, in BonjourGrid the coordinator is not static and is created dynamically. Furthermore, BonjourGrid is more generic, since it is possible to graft any computing system onto it (XtremWeb, Boinc, Condor), while Xgrid has its own computing system.
8 Conclusion and Future Works

In this work, we have shown how it is possible to manage several instances of local desktop grids (in the same institution). We emulate a multiple-job scheduler that runs several applications in a decentralized manner. Using the BonjourGrid middleware, the management of these instances is realized without any central server. Our meta-middleware, BonjourGrid, is a novel approach designed to orchestrate multiple instances of computing elements in a decentralized manner. BonjourGrid dynamically creates a specific environment for the user who initiates applications, without any need for a system administrator, because BonjourGrid is fully autonomous and decentralized. We conducted several experiments to demonstrate the functionality of BonjourGrid and to analyze the overhead induced by the decentralization. We conclude that BonjourGrid is extremely robust and well capable of managing more than 400 CEs in an interval of 3 hours, over 1000 nodes on the Grid5000 testbed. We succeeded in deploying and interfacing three popular computing systems with BonjourGrid: XtremWeb-CH, Boinc and Condor. We demonstrated that each user can create his environment with the desired computing system, in a decentralized manner and without any administrator intervention. Several issues must be addressed in future work. The first one is the fault tolerance of the coordinator: it is important to continue the execution of the application when the coordinator (the user machine) fails (one instance would be disconnected). The second issue is the reservation of participants: in the current version, BonjourGrid allocates available resources to a user without any reservation rules; thus, if a user demands all the available machines for a long time, BonjourGrid grants them to that user. The third issue is moving from a local to a wide area network. While the current version works only on a local network infrastructure, it is important to bypass this constraint. Grafting on the new Bonjour package, Wide Area Bonjour from Apple, which uses dynamic DNS updates and unicast DNS queries to enable wide-area service discovery, seems to be a good solution to this problem. Since the interfaces of Bonjour and Wide Area Bonjour are the same, porting our system should be easy.
Acknowledgements

Experiments presented in this paper were carried out using the Grid'5000 experimental testbed, an initiative of the French Ministry of Research through the ACI GRID incentive action, INRIA, CNRS, RENATER and other contributing partners. This work is supported by a grant from the Regional Council of Île-de-France under the 'Cotutelle internationale' (SETCI) program.
References

[1] H. Abbes, C. Cérin and M. Jemni, BonjourGrid as a Decentralised Job Scheduler, APSCC '08: Proceedings of the 2008 IEEE Asia-Pacific Services Computing Conference, Washington, DC, USA, 2008, pp. 89-94.
[2] H. Abbes, C. Cérin and M. Jemni, BonjourGrid: Orchestration of Multi-Instances of Grid Middlewares on Institutional Desktop Grids, 3rd Workshop on Desktop Grids and Volunteer Computing Systems (PCGrid 2009), in Conjunction with IPDPS 2009, Rome, Italy, May 29, 2009.
[3] N. Abdennadher and R. Boesch, Towards a Peer-to-Peer Platform for High Performance Computing, Advances in Grid and Pervasive Computing, 2007.
[4] F. Cappello, S. Djilali, G. Fedak, T. Herault, F. Magniette, V. Néri and O. Lodygensky, Computing on Large Scale Distributed Systems: XtremWeb Architecture, Programming Models, Security, Tests and Convergence with Grid, FGCS, Vol. 21, Issue 3, 2005, pp. 417-437.
[5] W. Cirne, F. Brasileiro, N. Andrade, L. Costa, A. Andrade, R. Novaes and M. Mowbray, Labs of the World, Unite!!!, Journal of Grid Computing, Vol. 4, No. 3, 2006, pp. 225-246.
[6] P. Domingues, A. Andrzejak and L. M. Silva, Using Checkpointing to Enhance Turnaround Time on Institutional Desktop Grids, 2nd IEEE Intern. Conf. on e-Science and Grid Computing, Amsterdam, Netherlands, 2006.
[7] D. Kondo, A. A. Chien and H. Casanova, Resource Management for Rapid Application Turnaround on Enterprise Desktop Grids, SC'04, Washington, DC, USA, 2004, p. 17.
[8] U. Lublin and D. G. Feitelson, The Workload on Parallel Supercomputers: Modeling the Characteristics of Rigid Jobs, Journal of Parallel and Distributed Comput., Vol. 63, Issue 11, November, 2003, pp. 1105-1122.
[9] D. P. Anderson, BOINC: A System for Public-Resource Computing and Storage, IEEE/ACM International
Workshop on Grid Computing, 2004.
[10] D. Steinberg and S. Cheshire, Zero Configuration Networking: The Definitive Guide, O'Reilly Media, Inc., First Edition, December, 2005.
[11] D. Thain and M. Livny, Building Reliable Clients and Servers, The Grid: Blueprint for a New Computing Infrastructure, 2003.
[12] D. Zhou and V. Lo, WaveGrid: A Scalable Fast-Turnaround Heterogeneous Peer-Based Desktop Grid System, 20th IPDPS, 2006.
[13] Xgrid, http://gcd.udl.cat/upload/recerca/
[14] Bonjour Protocol, http://developer.apple.com/networking/bonjour
[15] Vgrid, http://www.lri.fr/quetier/vgrid/
[16] Grid'5000, http://www.grid5000.fr
Biographies

Heithem Abbes is currently a Teaching Assistant in Computer Science at the Institut Supérieur des Arts du Multimédia (ISAM), University of Manouba, Tunisia. He received his Master degree in Computer Science in 2005 and his Engineer Diploma in Computer Science in 2003, both from the Faculty of Sciences of Tunis. Since 2006, he has been working towards his Ph.D. at the research unit UTIC in Tunisia, in collaboration with the LIPN laboratory in France. His research is focused on Grid Computing and Peer-to-Peer systems.
Christophe Cérin is currently a Full Professor in Computer Science at the University of Paris 13, France, where he leads the Grid Systems Group. His current research interests are in grid computing, distributed systems, high performance computing (multi-core machines and multithreaded libraries), resource management and reliability. He works primarily at the boundary between scheduling and middleware. He received his Ph.D. degree from the University of Paris 11, Orsay, in 1992, and his "habilitation à diriger des recherches" in 2002 from the University of Paris 6, all in computer science. He is a member of the IEEE, and he recently managed the French national ANR SafeScale project with the Universities of Grenoble, Rennes and Brest. He also acts as an international expert evaluating bi-national projects on behalf of the French Ministry of Foreign Affairs.
Mohamed Jemni is a Professor at the Ecole Supérieure des Sciences et Techniques de Tunis (ESSTT), University of Tunis, Tunisia. He obtained the HDR (Habilitation to Supervise Research) in Computer Science from the University of Versailles, France, in 2004. He received his Ph.D. in Computer Science from the University of Tunis in 1997 and the Engineer Diploma in Computer Science from the Faculty of Sciences of Tunis in 1991. He is the Head of the research laboratory UTIC, "Unit of Technologies of Information and Communication", of the University of Tunis (www.utic.rnu.tn). Since August 2008, he has been the General Chair of the Computing Center El Khawarizmi, the Internet service provider for the higher education and scientific research sector in Tunisia (www.cck.rnu.tn). He has published more than 120 papers in international journals and conferences. His research interests include high performance computing, algorithms and tools for grid computing, and advanced research in e-learning and e-accessibility.
Walid Saad is a member of the research unit UTIC in Tunisia. He received his Master degree in Computer Science in November 2009 from Ecole Supérieure des Sciences et Techniques de Tunis (ESSTT), University of Tunis, Tunisia. His research is focused on Grid Computing and Peer-to-Peer systems.