J. Parallel Distrib. Comput. 66 (2006) 1052 – 1061 www.elsevier.com/locate/jpdc
Grid computing for parallel bioinspired algorithms N. Melab∗ , S. Cahon, E-G. Talbi Laboratoire d’Informatique Fondamentale de Lille, CNRS UMR 8022, INRIA Futurs, DOLPHIN Project, Université des Sciences et Technologies de Lille, 59655, Villeneuve d’Ascq cedex, France Received 16 May 2005; received in revised form 18 October 2005; accepted 2 November 2005 Available online 20 March 2006
Abstract

This paper focuses on solving large-size combinatorial optimization problems using a Grid-enabled framework called ParadisEO–CMW (Parallel and Distributed EO on top of Condor and the Master–Worker framework). The latter is an extension of ParadisEO, an open source framework originally intended for the design and deployment of parallel hybrid meta-heuristics on dedicated clusters and networks of workstations. Relying on the Condor–MW framework, it enables the execution of these applications on volatile, heterogeneous pools of computational resources. The motivations, architecture and main features are discussed. The framework has been evaluated on a real-world problem: feature selection in near-infrared spectroscopic data mining, solved by deploying a multi-level parallel model of evolutionary algorithms. Experiments have been carried out on more than 100 PCs originally intended for education. The obtained results are convincing, both in terms of flexibility and ease of implementation, and in terms of efficiency, quality and robustness of the solutions provided at run time.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Meta-heuristics; Parallelism; Frameworks; Grid computing; Spectroscopic data mining
1. Introduction

Real-world optimization problems are often complex and NP-hard, their modeling is continuously evolving in terms of constraints and objectives, and their resolution is CPU time-consuming. Although near-optimal algorithms such as meta-heuristics (generic heuristics) reduce the temporal complexity of their resolution, they remain unsatisfactory for tackling large problems. Grid computing has recently emerged as a powerful way to deal with time-intensive problems [14]. Our challenge is the design and transparent deployment of meta-heuristics on computational grids for solving large-scale complex problems with high effectiveness and efficiency.1 Meta-heuristics are based on the iterative improvement of either a population of solutions (e.g. evolutionary

∗ Corresponding author. Fax: +33 3 28 77 85 37.
E-mail addresses: melab@lifl.fr (N. Melab), cahon@lifl.fr (S. Cahon), talbi@lifl.fr (E-G. Talbi). 1 This work is a part of the current national joint grid project GGM of ACI-Masse de Données supported by the French government. 0743-7315/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2005.11.006
algorithms or EAs) or a single solution (e.g. Tabu search) of a given optimization problem. In this paper, we focus on the first category, i.e. EAs. The design of grid-aware EAs often involves a sometimes painful apprenticeship of parallelization techniques and grid computing technologies. In order to free users unfamiliar with these advanced features from such a burden, optimization frameworks must integrate up-to-date parallelization techniques and allow their transparent exploitation and deployment on computational grids. In [9], we proposed a framework called ParadisEO dedicated to the reusable design of parallel and distributed EAs, but only for dedicated parallel hardware platforms. In this paper, we extend the framework to deal with computational grids. ParadisEO is an extensible LGPL C++ framework based on a clear conceptual separation of the meta-heuristics from the problems they are intended to solve. This separation and the large variety of implemented optimization features allow maximum code and design reuse. In addition, ParadisEO is one of the rare frameworks that provide the most
common parallel/distributed models. These models are portable to distributed-memory machines as well as shared-memory multiprocessors, as they are implemented using standard libraries such as MPI/PVM and PThreads. They can be exploited in a transparent way: one simply instantiates their associated ParadisEO components. The parallel optimization components provided in ParadisEO can be deployed only on dedicated machines. Therefore, we aim at extending the framework to allow the flexible design and deployment of meta-heuristics on non-dedicated computational grids. The new framework, called ParadisEO–CMW, is a coupling between ParadisEO and MW [17]. The latter enables a quick and easy Master–Worker parallelization of scientific computations using Condor [24] on grids. The Condor–MW system provides low-level solutions to crucial issues such as the volatility of machines, scheduling, etc. In particular, it provides application-specific logical checkpoint handling functions to deal with fault tolerance. These functions allow the user to save the state of an application to permanent storage (disk). We have analyzed the major meta-heuristics and defined, for each of them and each of their associated parallel/distributed models, the state required for checkpointing. In ParadisEO–CMW, the system-level middleware as well as the checkpointing mechanism are transparent to the user. For validation, ParadisEO–CMW has been evaluated on a real-world application: near-infrared spectroscopic data mining [20]. The different exploited parallel models and their implementation and performance issues are discussed.

The remainder of the paper is organized as follows. Section 2 highlights the principles of EAs and their parallel models. In Section 3, we describe the major design features and architecture of ParadisEO and the related work. Section 4 presents the design and implementation of ParadisEO on top of Condor–MW.
Section 5 presents and comments on some experimental results obtained with ParadisEO–CMW on the near-infrared spectroscopic data mining application. In Section 6, we conclude the paper and outline some perspectives of the presented work.
2. Parallel evolutionary algorithms

2.1. Principles of EAs

EAs are stochastic search techniques that have been successfully applied to many real and complex problems. An EA is an iterative technique that applies stochastic operators to a pool of individuals, the population (see Algorithm 2.1). Every individual in the population is the encoded version of a tentative solution. Initially, this population is generated randomly. At each generation of the algorithm, solutions are selected, paired and recombined in order to generate new solutions that replace worse ones according to some criteria, and so on. An evaluation function associates a fitness value with every individual, indicating its suitability to the problem (selection criterion).
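The generational scheme just described can be made concrete with a short, self-contained sketch. This is not ParadisEO code, merely a minimal C++ illustration of the loop of Algorithm 2.1 on the classic OneMax problem (maximize the number of 1-bits); all names are illustrative.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Minimal generational EA on OneMax (illustrative only, not ParadisEO code).
using Individual = std::vector<int>;

static int fitness(const Individual& ind) {          // evaluation function
    int f = 0; for (int b : ind) f += b; return f;
}

static Individual tournament(const std::vector<Individual>& pop) {
    const Individual& a = pop[std::rand() % pop.size()];
    const Individual& b = pop[std::rand() % pop.size()];
    return fitness(a) > fitness(b) ? a : b;          // binary tournament selection
}

int runEA(int popSize, int genes, int generations) {
    std::srand(42);
    std::vector<Individual> pop(popSize, Individual(genes));
    for (auto& ind : pop)                            // Generate(P(0))
        for (auto& g : ind) g = std::rand() % 2;
    for (int t = 0; t < generations; ++t) {          // Termination_Criterion(P(t))
        std::vector<Individual> next;
        while ((int)next.size() < popSize) {
            Individual p1 = tournament(pop), p2 = tournament(pop);
            int cut = std::rand() % genes;           // one-point crossover
            Individual child(p1.begin(), p1.begin() + cut);
            child.insert(child.end(), p2.begin() + cut, p2.end());
            child[std::rand() % genes] ^= 1;         // bit-flip mutation
            next.push_back(child);
        }
        pop.swap(next);                              // Replace(P(t), P'(t))
    }
    int best = 0;
    for (const auto& ind : pop) best = std::max(best, fitness(ind));
    return best;
}
```

Each concrete EA subclass (GA, GP, ES) specializes the representation and the reproduction operators while keeping this overall loop.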
Algorithm 2.1. EA pseudo-code.

    Generate(P(0));
    t := 0;
    while not Termination_Criterion(P(t)) do
        Evaluate(P(t));
        P'(t) := Selection(P(t));
        P'(t) := Apply_Reproduction_Ops(P'(t));
        P(t+1) := Replace(P(t), P'(t));
        t := t + 1;
    endwhile

The above pseudo-code shows the generic components of any EA. There exist several well-accepted subclasses of EAs, depending on the representation of the individuals and on how each step of the algorithm is designed. The main subclasses are genetic algorithms (GAs) [22], genetic programming (GP) [5], evolution strategies (ES) [6], etc. A survey of the history and state of the art of evolutionary computation can be found in [4].

2.2. Parallel models of EAs

For non-trivial problems, executing the reproductive cycle of a simple EA on long individuals and/or large populations requires substantial computational resources. In general, evaluating the fitness function for every individual is the most costly operation of the EA. Consequently, a variety of algorithmic issues are studied to design efficient EAs: new operators, hybrid algorithms, parallel models, and so on. Parallelism arises naturally when dealing with populations, since each individual is an independent unit; the performance of population-based algorithms is therefore especially improved by parallel execution. A review of the parallel paradigms is given in [10] for GAs and in [2] for EAs. Basically, three major parallel models for EAs can be distinguished: the island (a)synchronous cooperative model (see Fig. 1), the parallel evaluation of the population (see Fig. 2), and the distributed evaluation of a single solution (see Fig. 3):
• Island (a)synchronous cooperative model. Different EAs are deployed simultaneously and cooperate to compute better and more robust solutions. They exchange genetic material, synchronously or asynchronously, to diversify the search.
The objective is to delay the global convergence, especially when the EAs are heterogeneous regarding the variation operators. The migration of individuals follows a policy defined by a few parameters: the migration decision criterion, the exchange topology, the number of emigrants, the emigrant selection policy, and the replacement/integration policy.
• Parallel evaluation of the population. This step is parallelized because it is in general the most time-consuming. The parallel evaluation follows the Master–Worker model. The Master applies the selection, transformation and replacement operations, as they require a global management of the population. At each generation, it distributes the set of new solutions among the different workers. These evaluate the solutions and return them with their associated quality. An efficient execution is often obtained when the evaluation of each solution is costly.
• Distributed evaluation of a single solution. The quality of each solution is evaluated in a parallel, centralized way. That model is particularly interesting when the evaluation function can itself be parallelized, being CPU time-consuming and/or I/O-intensive. In that case, the function can be viewed as an aggregation of a number of partial functions. The partial functions may be identical, for example when the problem is a data mining one; the evaluation is then data-parallel and the database accesses are performed in parallel. Furthermore, a reduction operation is performed on the results returned by the partial functions. In summary, for this model the user has to supply a set of partial functions and an aggregation operator over them.

Fig. 1. The cooperative island model of evolutionary algorithms.

Fig. 2. The parallel evaluation step of the population.

Fig. 3. The distributed evaluation of a single solution.

3. The ParadisEO framework

3.1. Related work
Several extensible frameworks for the reusable design of parallel and distributed meta-heuristics have been proposed, and most of them are available on the Web. Some are restricted to parallel and distributed EAs. The main ones are the following: DREAM 2 [3], ECJ 3 [19], JDEAL 4 [11] and Distributed BEAGLE 5 [15]. These frameworks are reusable as they are based on a clear object-oriented conceptual separation between solution methods and optimization problems. They are also portable, as they are developed in Java, except the last one, which is coded in C++. However, they are limited regarding the parallel distributed models. Indeed, in DREAM and ECJ only the island model is implemented, using Java threads and TCP/IP sockets. JDEAL provides only the Master–Slave parallel evaluation of the population model, using TCP/IP sockets. Distributed BEAGLE provides this latter model and the synchronous migration-based island model. Except for DREAM, none of these frameworks allows the deployment of the provided algorithms on computational grids or peer-to-peer systems. The DREAM (Distributed Resource Evolutionary Algorithm Machine) project [3] (http://www.world-wide-dream.org) is a peer-to-peer software infrastructure devoted to supporting info-habitants evolving in an open and scalable way. It considers a virtual pool of distributed computing resources, where the different steps of an EA are automatically and transparently processed. However, only the island model is provided, and this model does not need a great amount of resources. Most existing frameworks [12,21] for implementing single solution-based methods do not allow parallel distributed development. Those enabling parallelism/distribution are often dedicated to only one solution method. For instance, [7] provides parallel skeletons for the TS method.
Two skeletons are provided and implemented in C++/MPI: an independent runs (multi-start) model with search strategies, and a Master–Slave model with neighborhood partitioning. The two models can be exploited by the user in a transparent way. To the best of our knowledge, there exists no grid-enabled framework for the design and deployment of single solution-oriented methods. Few frameworks available on the Web are devoted to both parallel and distributed single solution- and population-based methods, and their hybridization. MALLBA 6 [1] and ParadisEO–CMW are good examples of such frameworks. MALLBA and ParadisEO–CMW have numerous similarities. They are C++/MPI open source frameworks. They provide all the previously presented distributed models, and different hybridization mechanisms. However, they are quite different, as we believe that ParadisEO–CMW is more flexible because the granularity of its classes is finer. Moreover, from the Grid computing point of view, unlike MALLBA, ParadisEO–CMW relies on Condor–MW, which is portable and widely used. In addition, MALLBA is usable only in a dedicated wide area network [1], whereas ParadisEO–CMW allows deployment on dedicated computational pools as well as on non-dedicated and volatile environments. In MALLBA, communications are based on NetStream, an ad hoc flexible and object-oriented message passing service on top of MPI.

2 Distributed Resource Evolutionary Algorithm Machine: http://www.world-wide-dream.org.
3 Java Evolutionary Computation: http://www.cs.umd.edu/projects/plus/ec/ecj/.
4 Java Distributed Evolutionary Algorithms Library: http://laseeb.isr.ist.utl.pt/sw/jdeal/.
5 Distributed Beagle Engine Advanced Genetic Learning Environment: http://www.gel.ulaval.ca/∼beagle.

3.2. ParadisEO: an extended EO

The "EO" part of ParadisEO means Evolving Objects, because ParadisEO is basically an extension of the Evolving Objects (EO) [16] framework. EO is an LGPL C++ open source framework downloadable from http://eodev.sourceforge.net. The framework is originally the result of a European joint work [16]. EO includes a paradigm-free Evolutionary Computation library (EOlib) dedicated to the flexible design of EAs through evolving objects, superseding the most common dialects (genetic algorithms, evolution strategies, evolutionary programming and genetic programming). Flexibility is enabled through the use of object-oriented technology. Templates are used to model the EA features: coding structures, transformation operators, stopping criteria, etc. These templates can be instantiated by the user according to his/her problem-dependent parameters. Object-oriented mechanisms such as inheritance and polymorphism are powerful ways to design new algorithms or evolve existing ones.
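To give the flavor of this template-based style, the toy sketch below mimics the idea of framework components instantiated with problem-dependent parameters. The class names are hypothetical and are not the actual EO/ParadisEO API: the framework "provides" the invariant loop, while the user "supplies" the problem-specific evaluation by specialization.

```cpp
#include <vector>

// Toy illustration of the provided/required class split (hypothetical names,
// not the real EO/ParadisEO API).
template <typename EOT>
struct EvalFunc {                       // required class: abstract evaluator
    virtual double operator()(const EOT&) const = 0;
    virtual ~EvalFunc() = default;
};

template <typename EOT>
struct PopEval {                        // provided class: invariant code
    explicit PopEval(const EvalFunc<EOT>& e) : eval(e) {}
    double best(const std::vector<EOT>& pop) const {
        double b = -1e300;
        for (const auto& ind : pop) { double f = eval(ind); if (f > b) b = f; }
        return b;
    }
    const EvalFunc<EOT>& eval;
};

// User-side specialization for a bit-string individual (OneMax-style).
struct OneMaxEval : EvalFunc<std::vector<int>> {
    double operator()(const std::vector<int>& x) const override {
        double s = 0; for (int b : x) s += b; return s;
    }
};
```

Swapping the evaluator (or any other fine-grained component) does not touch the provided loop, which is the flexibility argument made above.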
Furthermore, EO integrates several services making it easier to use, including visualization facilities, on-line definition of parameters, application checkpointing, etc. In its original version, EO does not enable the design of local search methods such as descent search or hill-climbing, simulated annealing, Tabu search or gradient-based search. Moreover, it offers no facility for parallelism and distribution, or for hybridization. Overcoming those limitations was the main objective in the design and development of ParadisEO. In the following, we focus only on the parallel/distributed mechanisms.

The ParadisEO framework [9] is dedicated to the reusable design of parallel hybrid meta-heuristics by providing a broad range of features including EAs, local search methods, parallel and distributed models, different hybridization mechanisms, etc.

6 MAlaga+La Laguna+BArcelona: http://neo.lcc.uma.es/mallba/mallba.html.

ParadisEO is a C++ LGPL extensible open source framework based on a clear conceptual separation of the meta-heuristics from the problems they are intended to solve. This separation
and the large variety of implemented optimization features allow maximum code and design reuse. This conceptual separation is expressed at the implementation level by splitting the classes into two categories: provided classes and required classes. The provided classes constitute a hierarchy of classes implementing the invariant part of the code; expert users can extend the framework by inheritance/specialization. The required classes, coding the problem-specific part, are abstract classes that have to be specialized and implemented by the user. The classes of the framework are fine-grained and instantiated as evolving objects, each embodying only one method. This is a design choice adopted in EO and ParadisEO. The heavy use of these small classes allows more independence, and thus higher flexibility, compared to other frameworks: existing components can be changed and new ones added without impacting the rest of the application. ParadisEO is one of the rare frameworks that provide the most common parallel and distributed models quoted in Section 2.2. These models are portable to distributed-memory machines and shared-memory multiprocessors, as they are implemented using standard libraries such as MPI, PVM and PThreads. The models can be exploited in a transparent way: one simply instantiates their associated ParadisEO components. The user can choose MPI or PVM as the communication layer through a simple instantiation. The models have been validated on academic and industrial problems, and the experimental results demonstrate their efficiency [9].

4. Grid-enabled ParadisEO

The first release of ParadisEO allows a transparent exploitation of the parallel models presented in Section 2.2 in dedicated environments. In this paper, the focus is on their re-design and deployment on large-scale non-dedicated computational grids.
This is a great challenge since, to the best of our knowledge, no effective grid-enabled framework for meta-heuristics exists today.

4.1. Architecture of ParadisEO–CMW

To Grid-enable ParadisEO, one first needs a Grid middleware and two interfaces: an infrastructure interface and a Grid Application Programming Interface. The former provides communication and resource management tools. Our approach consists in using the Condor high-throughput computing system as the Grid infrastructure, with the MW abstract programming framework supplying the two required interfaces.

Condor [24] is a High-Throughput Computing (HTC) system that deals with heterogeneous computing resources and multiple users. It manages non-dedicated and volatile resources, deciding their availability from both the average CPU load and information about the recent use of peripherals such as the keyboard and mouse. An environment including such resources is said to be adaptive, since tasks are scheduled on idle resources and dynamically migrated when resources become busy or fail. In addition, Condor uses sophisticated techniques [18] such as matchmaking and checkpointing. These allow, respectively, to associate job requirements and policies with resource owners, and to periodically save the state of running jobs and restart them from that state after a failure.

MW [17] is a software framework developed within the "Meta-computing Environments for Optimization" (MetaNEOS) project [25]. It allows an easy development of Master–Worker applications for computational grids. MW is a set of C++ abstract classes including interfaces for application programmers and Grid-infrastructure developers. Grid-enabling an application with MW, or porting MW to a new grid software toolkit, consists in re-implementing a small number of virtual functions. In MW, the infrastructure interface provides access to communication and resource management. Communication is performed between the master and the workers. Resource management encompasses available-resource request and detection, infrastructure querying to get information about resources, fault detection, and remote execution. These basic resource management services can be provided by Condor.

The architecture of ParadisEO–CMW is layered, as illustrated in Fig. 4. From top to bottom, the first level supplies the optimization problems to be solved using the framework. The second level represents the ParadisEO framework, including optimization solvers embedding single- and multi-objective meta-heuristics (evolutionary algorithms and local searches). The third level provides interfaces for Grid-enabled programming and for access to the Condor infrastructure. The fourth and lowest level supplies communication and resource management services.

Fig. 4. A layered architecture of ParadisEO–CMW.

4.2. Checkpointing-based fault tolerance
An important issue to deal with in the gridification of ParadisEO is fault tolerance. MW automatically reschedules unfinished tasks that were running on Worker Nodes whose processors failed. However, this cannot be applied to the master process, which launches and controls the tasks on the Worker Nodes. Instead, MW provides a couple of primitives to fold up or unfold the whole application, enabling the user to save/restore its state to/from a file stream. Note that only the Master Node is concerned by these checkpointing operations. For meta-heuristics these functionalities are easily exploited, as checkpointing most meta-heuristics is straightforward: it consists in saving the current solution(s), the best solution found since the beginning of the search, the continuation criterion (e.g. the current iteration for a generational counter), and some additional parameters controlling the behavior of the heuristic (e.g. the temperature for simulated annealing, the tabu list for the Tabu search method, etc.). In ParadisEO–CMW, default checkpoint policies are initially associated with the deployed meta-heuristics and can be exploited by the users in a transparent way. In addition, these policies can be extended with more application-specific features.

4.3. A case study: parallel evaluation of an EA's population

ParadisEO–CMW is illustrated in Fig. 5 through a UML-based [8] scenario that shows the design and implementation of the parallel evaluation of an EA's population model. The EA eoEA owns the parallel evaluator component eoDistPopLoopEval, which implements the parallel evaluation of the population. The EA may thus trigger the execution of that component, and will be informed of its completion. The different steps of the parallel evaluation process are the following: (1) The object eoDistPopLoopEval informs the main controller eoDriver that it is ready to deploy n new tasks, where n is the population size.
Each task consists in the evaluation of one individual. (2) As soon as a Worker Node is available, eoDriver prepares one of these tasks to be spawned. The preparation consists in packing the data of the considered task, and registering the corresponding individual being evaluated (the next not-yet-evaluated solution in the pool of waiting solutions) together with the task identifier, an integer automatically generated by the Register to designate the task. This information is used when the result (the fitness value of the evaluated individual) is returned. (3) The task is then scheduled to the Worker Node at the latter's request. (4) On arrival of the task at the Worker Node, the component eoWorker running on that node extracts from the
Fig. 5. The parallel evaluation of a population model provided in ParadisEO–CMW.
information associated with the task the identifier of the latter. The identifier allows eoWorker to find, in the referential of procedures (the local Register), the component able to execute the task. In this case study, the component eoAlgo is selected to evaluate the fitness of the individual. (5) eoWorker invokes eoAlgo, which extracts the individual from the information associated with the task, applies the fitness evaluation function, and packs the result to be returned to the Master Node. (6) At the Master Node, eoDriver updates the corresponding registered non-evaluated individual with the returned result (fitness value). When the fitness value of the nth (last) individual is returned, the eoDistPopLoopEval component resumes. The eoEA is then notified of that event, which allows it to move on to the next step of the evolution process, i.e. the replacement phase. The checkpointing mechanism consists, in this case, in saving at the Master Node level the returned results of the different spawned tasks, i.e. the different individuals and their associated fitness values.

5. Application to NIR spectroscopic data mining

5.1. Problem formulation and resolution

The problem consists in discovering, from a set of data samples, a predictive mathematical model for the concentration of
sugar in beet. Each data sample contains a real measure (obtained by chemical analysis) of the concentration of sugar in one beet sample, and the set of its absorbances at 1024 NIR wavelengths. The full dataset (1800 samples) has been provided by the LASIR labs 7 at Lille. According to the Beer–Lambert law, the problem is linear. Partial Least Squares (PLS) regression is known to be the most efficient technique for such a problem. However, the number of wavelengths in the resulting predictive model is often high, which decreases its understandability. A statistical analysis of the data highlighted that many absorbances are correlated with one another (redundancy issue) while being less correlated with the concentration (irrelevance issue). In order to withdraw both irrelevant and redundant wavelengths, a feature selection has to be performed. As feature selection is an NP-hard problem, we use a GA to solve it (see Fig. 6). The individuals of the population are bit-strings of 1024 bits, each designating a selection of wavelengths: a bit set to 1 means that the corresponding wavelength is selected for the computation of the predictive model. The PLS method is used as the fitness function of the GA; more precisely, the prediction error (RMSEP) returned by the PLS method is the fitness value of the individuals (selections). The traditional crossover and mutation operators are used as reproduction operators. A full and detailed description of the GA can be found in [20].
7 http://lasir.univ-lille1.fr.
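The coupling between the bit-string individuals and the PLS-based fitness described above can be sketched as follows. This is an illustrative C++ fragment, not the paper's implementation: the PLS regression itself is abstracted as a user-supplied callback returning an RMSEP, and all names are hypothetical.

```cpp
#include <functional>
#include <vector>

// Sketch of the GA individual/fitness coupling (illustrative, not the
// actual GA of the paper). A real deployment would run a PLS regression
// on the selected wavelengths; here PLS is a stand-in callback.
using Selection = std::vector<int>;           // 1024-bit wavelength selection
using PlsRmsep  = std::function<double(const Selection&)>;

// Fitness of a selection: the prediction error returned by PLS
// (lower RMSEP means a better individual).
double fitness(const Selection& sel, const PlsRmsep& pls) {
    return pls(sel);
}

// Number of selected wavelengths (bits set to 1) in an individual.
int selectedCount(const Selection& sel) {
    int n = 0; for (int b : sel) n += b; return n;
}
```

In the real application the callback would wrap the matrix-heavy PLS computation, which is precisely why the evaluation is worth distributing.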
The hybridization GA–PLS is CPU time-consuming because the fitness evaluation is particularly costly: the PLS method handles large matrices. Parallelism is thus required to get results at short notice. The parallel island model and the parallel evaluation of the population described in Section 2.2 are exploited in our implementation of the GA based on ParadisEO–CMW.

5.2. Experimentation with ParadisEO–CMW

We have implemented and experimented with some parallel models provided by ParadisEO on top of Condor–MW. In the following, we experimentally evaluate the influence of the granularity of parallelism and of asynchronism on the efficiency of execution for the parallel evaluation of the population model. We also evaluate the effect of the genetic feature selection on the building of predictive models in spectrum analysis. This effect is measured in terms of efficacy (quality of the discovered predictive models) and robustness (gap between different executions). Finally, we present some measurements related to the volatility of the Grid.

The efficiency of the parallel evaluation of the population model depends strongly on the granularity of the fitness function (the PLS procedure). The granularity is defined as the selection rate of the features (wavelengths) composing the individuals. Fig. 7 shows the evolution of the speed-up for the asynchronous parallel evaluation model as a function of the feature selection rate. The speed-up is defined in the traditional way as the ratio between the serial execution time and the parallel execution time. The speed-up varies from 41.6 to 98.27 on 100 machines for a feature selection rate between 10% and 90%. This result shows that the parallel evaluation of the population model provided by ParadisEO–CMW can be very efficient for optimization problems in which the fitness function execution time is about 5 s or more. The scalability of the parallel models depends on the way they are implemented.
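As a quick aid for interpreting these figures, the speed-up and efficiency metrics used here can be computed with a trivial helper (illustrative code, not part of the framework):

```cpp
// Classic parallel performance metrics:
//   speed-up   Sp = Tserial / Tparallel
//   efficiency E  = Sp / p   (p = number of processors)
double speedup(double serialTime, double parallelTime) {
    return serialTime / parallelTime;
}

double efficiency(double serialTime, double parallelTime, int processors) {
    return speedup(serialTime, parallelTime) / processors;
}
```

For instance, the reported speed-up of 98.27 on 100 machines corresponds to a parallel efficiency close to 0.98, while 41.6 on the same pool corresponds to about 0.42.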
Fig. 6. The GA-based feature selection method.

Fig. 7. Speed-up obtained with the asynchronous parallel evaluation model as a function of the feature selection rate.

Fig. 8. Speed-ups obtained with the synchronous and asynchronous parallel evaluation models.

There are two modes for the implementation of these models: synchronous and asynchronous. We particularly focus here on the parallel evaluation of the population
model. In the synchronous model, the individuals are evaluated generation by generation, and the termination of all the parallel evaluations of a generation must be synchronized. When these evaluations are highly variable in terms of consumed CPU time, this synchronization increases the waiting (wasted) time of the workers. To evaluate the impact of the heterogeneity of the computations on scalability, we experimented with the application using a variable feature selection rate. The individuals of the population are generated with a feature selection rate between 0.1 and 0.9, corresponding to a fitness execution time between 0.69 and 5.62 s. These experiments were conducted on a cluster of 100 IBM e326 Opteron 2.4 GHz machines with 4 GB RAM and 80 GB disks, interconnected by a Gigabit Ethernet network. Fig. 8 confirms that the asynchronous model is by far more scalable than the synchronous model.

To evaluate the impact of genetic feature selection on the quality and robustness of the discovery process of predictive models, we have deployed a hierarchical parallel model. The hybrid model combines the island model and the parallel evaluation of the population model. As the two models are often
Table 1
The computational pool, with a peak CPU power of 149,877 MIPS or 29.4 GFLOPS

OS      Arch.   CPU (GHz)   Number
Linux   Intel   2.4         36
Linux   Intel   2.4         28
Linux   Intel   2.0         8
Linux   Intel   1.4         8
Linux   Intel   0.7         14
Linux   Intel   0.45        28
Total                       122
limited to, respectively, a few islands and about one hundred evaluations, their combination increases the degree of parallelism enough to justify the use of computational grids. The island model is multi-threaded, each island being managed by a single thread. Even if the islands are deployed on only one machine, their fitness evaluation phase is performed in parallel using the second model (parallel evaluation of the population). As this evaluation phase is the most time-consuming part of an EA, the parallelism provided by the island model is effectively exploited. The combined model has been experimented with during working days (volatile grid) on the education network of the Polytech'Lille engineering school. The hardware platform is composed of over 120 heterogeneous Linux Debian PCs (see Table 1) originally used for education. Fig. 9 illustrates the evolution of the fitness of the best computed selection of wavelengths over the generations for 1, 4 and 16 island(s). The quality of the discovered predictive model is improved compared to the quality obtained without selection. Indeed, the Root Mean Square Error of Prediction (RMSEP) associated with the model computed by PLS regression without (respectively, with) preliminary feature selection is 0.150 (respectively, 0.1035). The predictive model is thus far more accurate with feature selection, and the accuracy improvement is at least 41.91% (with only one island). In addition, the execution of the parallel GA is robust, as the average RMSEP and deviation over 10 runs are, respectively, 0.1 and 0.0009.

Another issue arises with the synchronous model on volatile grids such as the one investigated in this work. When a worker fails, its assigned task is re-scheduled, and this re-scheduling operation requires several seconds; the global efficiency on a volatile pool is thus dramatically decreased. That is why the asynchronous model is preferred, although it is known to be generally more elitist. However, its combination with the cooperative island model makes it possible to bypass this problem. Table 2 presents some execution statistics obtained with a complete run of the application exploiting simultaneously the island model and the parallel evaluation of the population. The application is composed of 16 EAs cooperating in a mesh topology, all performing the asynchronous evaluation step in parallel. The experiments have been conducted on the computational pool described in Table 1.
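The cost of the synchronization barrier can be made concrete with a toy simulation (an assumed workload model, not the paper's measurements, though the per-evaluation cost range is the 0.69–5.62 s reported in the text): each generation, the synchronous scheme pays the cost of its slowest evaluation, while an idealized asynchronous scheme keeps every worker busy.

```python
import random

random.seed(0)

WORKERS = 10
GENERATIONS = 50
# Per-evaluation cost range taken from the experiments in the text.
T_MIN, T_MAX = 0.69, 5.62

def gen_tasks(n):
    """One random evaluation time per individual (assumed uniform)."""
    return [random.uniform(T_MIN, T_MAX) for _ in range(n)]

sync_time = 0.0
total_work = 0.0
for _ in range(GENERATIONS):
    tasks = gen_tasks(WORKERS)      # one evaluation per worker
    total_work += sum(tasks)
    sync_time += max(tasks)         # barrier: every worker waits for the slowest

# Idealized asynchronous execution: no barrier, no idle time.
async_time = total_work / WORKERS

print(round(sync_time, 1), round(async_time, 1))
```

Under this model the asynchronous makespan approaches total_work / WORKERS, whereas the synchronous one is inflated by the per-generation maximum; on a volatile pool, failure-induced re-scheduling widens the gap further.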
Fig. 9. Evolution over the generations of the RMSEP associated with the best found feature selection, with 1, 4 and 16 island(s).
Table 2
Some performance measurements

Number of workers                      122
Total wall clock time                  36,953 s (about 10 h)
Cumulative CPU time of workers         2,363,571 s (about 27 days)
Average number of available workers    115
Average number of active workers       78
Parallel efficiency                    0.82
The heterogeneous and dynamic nature of the Grid makes application performance difficult to assess. Therefore, different metrics, such as the parallel efficiency, have been defined in [17] for non-dedicated environments. Let I be the set of workers and J the set of tasks; the parallel efficiency [17] is then defined as

    parallel efficiency = ( Σ_{j∈J} t(j) ) / ( Σ_{i∈I} U(i) ),

where:
• U(i) is the wall-clock time during which worker i (i ∈ I) is available;
• t(j) is the CPU time spent in solving task j (j ∈ J).

The parallel efficiency is about 82%, meaning that the available workers (their associated machines) are exploited in a very good proportion. In the computation of the parallel efficiency, the CPU cost of the marshaling/unmarshaling of individuals has not been taken into account, as it is negligible compared to the execution time of the fitness function. Unlike standard performance measures, the parallel efficiency as defined in [17] makes it possible to separate the performance of the application code from that of the computing platform. Indeed, this can be checked through Table 3, which reports some statistics on the use of the computational pool during one hour of our experimentation: the Condor system and ParadisEO–CMW together use over 90% of the CPU time, and Table 2 shows that 82% is consumed by the application.
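As a rough arithmetic cross-check of Table 2 (under our own assumption, not stated in the paper, that Σ U(i) is approximated by the average number of active workers times the total wall-clock time):

```python
# Figures taken from Table 2 of the paper.
wall_clock = 36_953          # total wall-clock time, s
cumulative_cpu = 2_363_571   # cumulative CPU time of workers, s
active_workers = 78          # average number of active workers

# Hedged reading: the reported 0.82 is reproduced when sum_i U(i)
# is approximated by active_workers * wall_clock.
efficiency = cumulative_cpu / (active_workers * wall_clock)
print(round(efficiency, 2))  # → 0.82
```

This is only one way to make the reported figure add up; the exact bookkeeping of U(i) over the volatile pool is internal to the MW statistics of [17].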
Table 3
Use of the computational pool during 1 h

            Owner average   Condor average   Idle average   Owner peak   Condor peak
Total (%)   8.1             89.9             2.1            10           90
6. Conclusion

ParadisEO [9] is a C++ LGPL extensible open source framework dedicated to the rapid, reusable design and implementation of parallel hybrid meta-heuristics on dedicated machines. In this paper, we have extended the framework to enable its transparent and easy use in non-dedicated, volatile, large-scale environments. To do so, ParadisEO has been ported onto Condor–MW, a C++ software framework that eases the development of Master–Worker applications for computational grids. The coupling consists in re-implementing some base classes, which is facilitated by the fact that the two frameworks are written in the same language, i.e. C++. The resulting framework, called ParadisEO–CMW, is to the best of our knowledge the first Grid-enabling framework proposed for parallel hybrid meta-heuristics. The island model is not a Master–Worker model and is therefore implemented in a multi-threaded way. Even if its islands are deployed on the same processor, only the control part is executed concurrently on this processor; indeed, the evaluation of the populations of the different islands, which is the most costly part, is performed in parallel. The two other models are Master–Worker models in nature, so their MW-based implementation is straightforward.

ParadisEO–CMW has been experimented on a real-world NIR spectroscopic data mining application, which consists in applying an island-based GA to feature selection with a wrapper approach. That is to say, the evaluation of the quality of the solutions (predictive models) is based on a data mining method (PLS in this work). This evaluation is already the most costly part of the GA, and its cost is further increased by the wrapper approach; the exploitation of computational grids is therefore needed. The experimentation of the model on a non-dedicated network of over 120 PCs has provided several promising results. The island model makes it possible to delay the convergence of the evolution process and thus to obtain better results in terms of the reliability of the discovered predictive model. However, as it requires intensive CPU time, the computational grid is necessary to deploy it. On the other hand, the model based on the parallel evaluation of the population must be exploited in an asynchronous way for more scalability. Finally, the results show that our framework efficiently exploits the available resources through the use of Condor–MW.

In the future, we plan to investigate the Condor Flocking mechanism [23] and the coupling of ParadisEO–CMW with Globus [13] to harness pools of resources belonging to different institutions and administrative domains. The Grid5000 experimental platform (https://www.grid5000.fr/) will serve as a wide-area testbed for the framework. We will also experiment with ParadisEO–CMW on a real-world radio network design application, a very hard practical multi-objective problem in which many layers of parallelism and cooperation are considered.

References
[1] E. Alba, the MALLBA group, MALLBA: a library of skeletons for combinatorial optimization, in: B. Monien, R. Feldmann (Eds.), Proc. of Euro-Par 2002, Lecture Notes in Computer Science, vol. 2400, Paderborn, Springer, Berlin, 2002, pp. 927–932.
[2] E. Alba, M. Tomassini, Parallelism and evolutionary algorithms, IEEE Trans. Evol. Comput. 6 (5) (2002) 443–462.
[3] M.G. Arenas, P. Collet, A.E. Eiben, M. Jelasity, J.J. Merelo, B. Paechter, M. Preuß, M. Schoenauer, A framework for distributed evolutionary algorithms, in: Proc. of PPSN VII, September 2002.
[4] T. Bäck, U. Hammel, H.-P. Schwefel, Evolutionary computation: comments on the history and current state, IEEE Trans. Evol. Comput. 1 (1) (1997) 3–17.
[5] W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone, Genetic Programming—An Introduction: On the Automatic Evolution of Computer Programs and its Applications, Morgan Kaufmann, Los Altos, CA, 1998.
[6] H.-G. Beyer, H.-P. Schwefel, Evolution strategies: a comprehensive introduction, Natur. Comput. 1 (1) (2002) 3–52.
[7] M.J. Blesa, Ll. Hernàndez, F. Xhafa, Parallel skeletons for tabu search method, Kyongju City, Korea, IEEE Computer Society Press, 2001, pp. 23–28.
[8] G. Booch, J. Rumbaugh, I. Jacobson, The Unified Modeling Language User Guide, Addison-Wesley Professional, 1999.
[9] S. Cahon, N. Melab, E.-G. Talbi, Building with ParadisEO reusable parallel and distributed evolutionary algorithms, Parallel Comput. 30 (5–6) (2004) 677–697.
[10] E. Cantú-Paz, Efficient and Accurate Parallel Genetic Algorithms, Kluwer Academic Publishers, Dordrecht, 2002.
[11] J. Costa, N. Lopes, P. Silva, JDEAL: the Java Distributed Evolutionary Algorithms Library, 2000.
[12] L. Di Gaspero, A. Schaerf, EasyLocal++: an object-oriented framework for the flexible design of local-search algorithms, Softw. Pract. Exper. 33 (8) (2003) 733–765.
[13] I. Foster, C. Kesselman, Globus: a metacomputing infrastructure toolkit, Internat. J. Supercomput. Appl. 11 (2) (1997) 115–128.
[14] I. Foster, C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, 1999.
[15] C. Gagné, M. Parizeau, M. Dubreuil, Distributed BEAGLE: an environment for parallel and distributed evolutionary computations, in: Proc. of the 17th Annu. Internat. Symp. on High Performance Computing Systems and Applications (HPCS 2003), May 11–14, 2003.
[16] M. Keijzer, J.J. Merelo, G. Romero, M. Schoenauer, Evolving Objects: a general purpose evolutionary computation library, in: Proc. of the Fifth Internat. Conf. on Artificial Evolution (EA'01), Le Creusot, France, October 2001. http://eodev.sourceforge.net.
[17] J. Linderoth, S. Kulkarni, J.-P. Goux, M. Yoder, An enabling framework for master–worker applications on the computational grid, in: Proc. of the Ninth IEEE Symp. on High Performance Distributed Computing (HPDC9), Pittsburgh, PA, August 2000, pp. 43–50. http://www.cs.wisc.edu/condor/mw/.
[18] M. Livny, J. Basney, R. Raman, T. Tannenbaum, Mechanisms for high throughput computing, SPEEDUP J. 11 (1) (1997). http://www.cs.wisc.edu/condor/.
[19] S. Luke, L. Panait, Z. Skolicki, J. Bassett, R. Hubley, A. Chircop, ECJ: a Java-based evolutionary computation and genetic programming research system, 2002.
[20] N. Melab, S. Cahon, E.-G. Talbi, L. Duponchel, Parallel genetic algorithm based wrapper feature selection for spectroscopic data mining, in: BioSP3 Workshop on Biologically Inspired Solutions to Parallel Processing Problems, IEEE IPDPS 2002 (Internat. Parallel and Distributed Processing Symp.), Fort Lauderdale, USA, IEEE Press, April 2002, p. 201.
[21] L. Michel, P. Van Hentenryck, Localizer++: an open library for local search, Technical Report CS-01-02, Brown University, Computer Science, 2001.
[22] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, 1996.
[23] D. Thain, T. Tannenbaum, M. Livny, Condor and the grid, in: F. Berman, G. Fox, T. Hey (Eds.), Grid Computing: Making the Global Infrastructure a Reality, Wiley, New York, 2002.
[24] D. Thain, T. Tannenbaum, M. Livny, Distributed computing in practice: the Condor experience, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Ltd., 2004.
[25] The MetaNEOS project, Metacomputing environments for optimization, http://www.mcs.anl.gov/metaneos, 2000.

Nordine Melab received the Master's, Ph.D. and HDR degrees in computer science from the Laboratoire d'Informatique Fondamentale de Lille (LIFL, Université de Lille 1). He is an Associate Professor at Polytech'Lille, a member of the OPAC team at LIFL, and is involved in the DOLPHIN project of INRIA Futurs. In particular, he is a member of the Steering Committee of the French nation-wide project Grid5000. His major research interests include parallel and grid computing, combinatorial optimization algorithms and applications, and software frameworks.
Sébastien Cahon received the Master's and Ph.D. degrees in computer science from the Laboratoire d'Informatique Fondamentale de Lille (LIFL, Université de Lille 1). He currently holds a post-doctoral position within the OPAC team at LIFL and is involved in the DOLPHIN project of INRIA Futurs. His major research interests include parallel and grid computing, combinatorial optimization algorithms and applications, and software frameworks.

El-Ghazali Talbi received the Master's and Ph.D. degrees in computer science from the Institut National Polytechnique de Grenoble. He is presently Professor in computer science at Polytech'Lille (Université de Lille 1) and a researcher at the Laboratoire d'Informatique Fondamentale de Lille. He leads the OPAC team at LIFL and the DOLPHIN project of INRIA Futurs. He has taken part in several CEC Esprit and national research projects. His current research interests are mainly parallel and grid computing, combinatorial optimization algorithms and applications, and software frameworks.