Comparing Parallelization of an ACO: Message ... - Semantic Scholar

Comparing Parallelization of an ACO: Message Passing vs. Shared Memory

Pierre Delisle¹, Marc Gravel¹, Michaël Krajecki², Caroline Gagné¹, and Wilson L. Price³

¹ Département d’informatique et de mathématique, Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada, G7H 2B1
{pierre_delisle, marc_gravel, caroline_gagne}@uqac.ca
² Département de Mathématiques et Informatique, Université de Reims Champagne-Ardenne, F-51687 Reims Cedex 2, France
[email protected]
³ Faculté des Sciences de l’administration, Université Laval, Québec, Canada, G1K 7P4
[email protected]

Abstract. We present a shared memory approach to the parallelization of the Ant Colony Optimization (ACO) metaheuristic and compare its performance with that of an existing message passing implementation. Our aim is to show that the shared memory approach is a competitive strategy for parallelizing ACO algorithms. We first describe the sequential ACO algorithm on which both parallelization schemes are based, followed by the parallelization strategies themselves. Through experiments, we compare speedup and efficiency on four TSP instances ranging from 318 to 657 cities. We then discuss factors that explain the difference in performance between the two approaches. Further experiments show the performance of the shared memory implementation when varying numbers of ants are distributed among the available processors. In this last set of experiments, the solution quality obtained is taken into account when analyzing speedup and efficiency.

M.J. Blesa et al. (Eds.): HM 2005, LNCS 3636, pp. 1–11, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction

Many interesting combinatorial optimization problems are NP-hard [1]: no exact algorithm, sequential or parallel, is known that solves them in polynomial time. Metaheuristics offer an approach that, without guaranteeing optimality, is generally able to produce good solutions. Because these methods often require long computation times and considerable working memory, parallelization is a promising avenue for improving their performance. The field of parallel metaheuristics is, however, young, and the transition from a conventional sequential metaheuristic to an efficient parallel implementation is neither simple nor obvious. Work on parallel approaches to combinatorial optimization is generally based on one of two paradigms: message passing and shared memory. However, most of the recent literature is based on the message passing model, which is better known and more mature. The recent emergence of shared memory multiprocessors such as SMPs (Symmetric Multiprocessors) has revived interest in the shared memory model, but shared memory implementations of metaheuristics are rare, which makes it difficult to evaluate the potential of this approach.

The aim of this paper is to propose a shared memory approach to the parallelization of the Ant Colony Optimization (ACO) metaheuristic and to compare its performance with a message passing implementation described by Randall & Lewis [2]. The ACO algorithm they use as the basis for their implementation is very similar to ours, as is their algorithmic approach to parallelization. For these reasons, we found a comparison of the experimental results of the two approaches particularly interesting and appropriate.

2 The ACO and Literature Review on Its Parallelization

The version of the ACO used by Randall & Lewis is described in Fig. 1. It is the ACO proposed by Dorigo & Gambardella [3] for the travelling salesman problem (TSP), except that Randall & Lewis use no candidate list in the transition rule to limit computations and incorporate no local search. Specifying exactly which version of the algorithm is used is important to ensure a fair comparison of the two parallel implementations. Briefly, the metaheuristic works as follows. Each cycle starts with a random choice of the starting city for each ant. Each ant then constructs a tour by adding cities one at a time. A number m of ants, which may be considered independent agents, construct their tours simultaneously. Each city is added to the tour according to a transition rule (Expression 1) that takes into account the visibility, i.e. the inverse of the distance d, as well as the accumulated pheromone trail τ, which acts as a memory of the quality of the solutions found so far. As each new city is added to a tour, a local update of the pheromone trail (Expression 2) reduces the probability that other ants repeat the same choice. When all ants have completed a tour, a global update of the pheromone trail (Expression 3) is made using the best tour (L+) found in the cycle. If required, the best solution found so far is updated and a new cycle begins.

There are few parallel implementations of the ACO in the literature because it is a relatively recent metaheuristic. Using the message passing model, Bullnheimer et al. [4] proposed two parallel versions of the Ant System: a synchronous version and a partially asynchronous version.
The parallel synchronous version may be thought of as a low-level parallelization whose aim is to accelerate the algorithm by generating ants in parallel in a master-slave fashion. At each cycle, the master broadcasts the pheromone trail matrix to the slaves, and each slave constructs a tour which, along with its length, is reported to the master. The process is synchronized at each iteration. To reduce the communication costs of the synchronous version, the authors developed a partially asynchronous version in which a number of cycles of the sequential algorithm are carried out independently on different processors. Following these “local” iterations, a global update is carried out by the master. After comparing the performance of the two versions, the authors concluded that the partially asynchronous version is preferable because it greatly reduces the frequency and volume of communications. Note, however, that these algorithms were not implemented on a parallel architecture, so it is difficult to assess the efficiency of their parallelization scheme.

Set the pheromone trail matrix τ(0) to $\tau_0 = (n \cdot L_{nn})^{-1}$ for each pair of cities (i, j);
FOR t = 0 to tMax DO
  *** Ant construction and local update of the pheromone trail ***
  FOR k = 1 to m DO
    Place ant k on a randomly chosen city;
  End FOR
  FOR i = 1 to n-1 DO
    FOR k = 1 to m DO
      Choose the next city $j \in J_i^k$ to visit according to:

$$
j = \begin{cases}
\arg\max_{l \notin \mathit{Tabu}_k} \left\{ [\tau_{il}(t)] \cdot \left[ \frac{1}{d_{il}} \right]^{\beta} \right\} & \text{if } q \le q_0 \\
J & \text{if } q > q_0
\end{cases}
\tag{1}
$$

      where q is a random number drawn uniformly from [0, 1], Tabu_k is the set of cities already visited by ant k, and J is chosen according to the probability:

$$
p_{ij}^{k}(t) = \frac{[\tau_{ij}(t)] \cdot \left[ \frac{1}{d_{ij}} \right]^{\beta}}{\sum_{l \notin \mathit{Tabu}_k} [\tau_{il}(t)] \cdot \left[ \frac{1}{d_{il}} \right]^{\beta}}
$$

      Local update of the pheromone trail for (i, j):

$$
\tau_{ij}(t) = (1 - \rho_l) \cdot \tau_{ij}(t) + \rho_l \cdot \tau_0
\tag{2}
$$

      Update the tour length Lk with the addition of city j;
    End FOR
  End FOR
  *** Best solution and global update of the pheromone trail ***
  Update L+, the best solution found so far;
  Global update of the pheromone trail using L+:

$$
\tau_{ij}(t+1) = (1 - \rho_g) \cdot \tau_{ij}(t) + \rho_g \cdot \Delta\tau_{ij}(t)
\tag{3}
$$

End FOR
Fig. 1. Description of the sequential ACO (Randall & Lewis)
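To make Fig. 1 concrete, the following Python sketch implements one reading of it for a symmetric TSP. It is our illustration, not Randall & Lewis's code, and the function names are hypothetical. Several details are assumptions: τ0 = 1/(n·L_nn), with L_nn the length of a nearest-neighbour tour as in Dorigo & Gambardella's ACS; ρ_l = 0.1, since Fig. 1 gives no value; pheromone updates are applied to both τ_ij and τ_ji; and Eq. (3) is applied to every edge with Δτ_ij = 1/L+ on edges of the best tour and 0 elsewhere, matching the edge-by-edge loop of Fig. 4. The remaining defaults (β = 2, q0 = 0.9, ρg = 0.1) are the values reported for Randall & Lewis's experiments.

```python
import math
import random


def tour_length(tour, d):
    """Length of a closed tour that returns to its starting city."""
    n = len(tour)
    return sum(d[tour[a]][tour[(a + 1) % n]] for a in range(n))


def nearest_neighbour_length(d):
    """Length of the nearest-neighbour tour from city 0 (L_nn in tau0 = 1 / (n * L_nn))."""
    tour, unvisited = [0], set(range(1, len(d)))
    while unvisited:
        j = min(unvisited, key=lambda c: d[tour[-1]][c])
        tour.append(j)
        unvisited.remove(j)
    return tour_length(tour, d)


def choose_next(i, unvisited, tau, d, beta, q0, rng):
    """Transition rule (1): argmax with probability q0, otherwise draw J from p_ij^k."""
    scores = {l: tau[i][l] * (1.0 / d[i][l]) ** beta for l in unvisited}
    if rng.random() <= q0:
        return max(scores, key=scores.get)
    r, acc = rng.random() * sum(scores.values()), 0.0   # roulette-wheel selection
    for l, s in scores.items():
        acc += s
        if acc >= r:
            return l
    return l  # only reached through floating-point round-off


def sequential_acs(d, m, t_max, beta=2.0, q0=0.9, rho_l=0.1, rho_g=0.1, seed=0):
    """One reading of Fig. 1 for a symmetric TSP; returns (best tour, its length L+)."""
    rng = random.Random(seed)
    n = len(d)
    tau0 = 1.0 / (n * nearest_neighbour_length(d))
    tau = [[tau0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _t in range(t_max):
        tours = [[rng.randrange(n)] for _ in range(m)]          # random start for each ant
        unvisited = [set(range(n)) - {tour[0]} for tour in tours]
        for _step in range(n - 1):                              # ants advance in lockstep
            for k in range(m):
                i = tours[k][-1]
                j = choose_next(i, unvisited[k], tau, d, beta, q0, rng)
                tours[k].append(j)
                unvisited[k].remove(j)
                # Local update (2); both directions kept equal for a symmetric TSP.
                tau[i][j] = tau[j][i] = (1 - rho_l) * tau[i][j] + rho_l * tau0
        for tour in tours:                                      # update L+ (lengths computed here)
            length = tour_length(tour, d)
            if length < best_len:
                best_tour, best_len = tour, length
        best_edges = {frozenset((best_tour[a], best_tour[(a + 1) % n])) for a in range(n)}
        for i in range(n):                                      # global update (3) on every edge
            for j in range(n):
                delta = 1.0 / best_len if frozenset((i, j)) in best_edges else 0.0
                tau[i][j] = (1 - rho_g) * tau[i][j] + rho_g * delta
    return best_tour, best_len
```

For instance, with `d` a 12 × 12 Euclidean distance matrix (e.g. built with `math.dist`), `sequential_acs(d, m=4, t_max=50)` returns a permutation of the 12 cities and its closed-tour length. Tour lengths are computed once per tour rather than incrementally as in Fig. 1, which gives the same value.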

Stützle [5] presents two parallelization strategies for a message passing architecture: the execution of multiple copies of the same algorithm, and the acceleration of a single copy of the algorithm. In the first case, the parallel runs may use the same search parameters or different ones. Although the author observed no significant difference between these two options when applying the MAX-MIN Ant System to the travelling salesman problem, he suggests that performance differences could arise for other problems. The second strategy, aimed at accelerating a single run of the algorithm, is similar to the synchronous version of Bullnheimer et al., but also uses a local search procedure. An efficient parallelization can then be obtained by applying local search in parallel to previously generated solutions.

Talbi et al. [6] successfully solved the quadratic assignment problem using a parallel approach. They proposed a parallel model of the ACO similar to the synchronous model of Bullnheimer et al., but used Tabu Search as a local improvement method. Michel & Middendorf [7] proposed an island model of the ACO in which colonies exchange the best solutions they have found after a fixed number of cycles. When a colony receives a solution that is better than its own best solution, it updates its best solution; because the pheromone trail updates are carried out using this solution, the new information influences the colony's search. Middendorf et al. [8] studied four strategies for exchanging information among multiple ant colonies. They showed that it may be advantageous for colonies to avoid exchanging too much information and to avoid exchanging it too frequently. Abandoning the idea of exchanging complete pheromone trail matrices, they based their strategies on exchanging a single solution at a time, and thereby obtained efficient parallel implementations. Randall & Lewis [2] proposed five strategies for the parallelization of the ACO and gave detailed results for one of them. In the following section, we describe this strategy; the results obtained by Randall & Lewis then serve as the basis for comparison with the shared memory parallelization approach that we propose.

3 The Randall & Lewis Message Passing Parallelization of the ACO

Randall & Lewis [2] developed a parallel ACO to solve the travelling salesman problem on a distributed memory architecture. Their approach, of the “parallel ant” type, is an internal parallelization with a master processor and multiple slave processors. Each slave is assigned to an ant and is responsible for constructing its tour. The master receives the input from the user, assigns a starting city to each ant, carries out the local and global updates of the pheromone trail and produces the output. Figs. 2 and 3 describe the activities of the master and the slaves in pseudo-code. Note that the algorithm of Randall & Lewis assigns only one ant to each processor. In their numerical experiments, the number of processors varies from 2 to 8, ρg = 0.1, q0 = 0.9, tMax = 1000 and β = 2.



Broadcast the algorithm parameters to each ant
Broadcast the d matrix to each ant
FOR t = 0 to tMax DO
  Place each ant k on a randomly chosen city
  Send each ant its initial city
  FOR i = 1 to n-1 DO
    Receive each ant's next city and add it to the corresponding tour
    Update the pheromone matrix τ using the local update rule
    Broadcast the m pheromone updates of the matrix τ to each ant
  End FOR
  Receive the tour length L from each ant
  Update L+, the best solution found so far
  Global update of the pheromone trail using L+
  Broadcast the pheromone update of the matrix τ to each ant
End FOR
Broadcast the termination condition to each ant
Print the shortest tour and its length L+
Fig. 2. Pseudo-code for the master processor

Receive the algorithm parameters from the master
Receive the matrix d from the master
Initialize the pheromone matrix τ
WHILE the termination condition is not met DO
  Initial_City = City = initial city received from the master
  L = 0
  FOR i = 1 to n-1 DO
    Next_City = choose the next city according to Equation (1)
    Send Next_City to the master
    L = L + d(City, Next_City)
    City = Next_City
    Receive the m pheromone updates of the matrix τ from the master
  End FOR
  L = L + d(Next_City, Initial_City)
  Send L to the master
  Receive the pheromone update of the matrix τ from the master
  Receive the termination information signal from the master
End WHILE
Fig. 3. Pseudo-code for the slave processors
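To give a rough sense of the communication implied by Figs. 2 and 3, the back-of-the-envelope sketch below counts the messages and blocking rounds of one cycle. It is our own illustration, not part of Randall & Lewis's work, and it rests on two counting assumptions: a broadcast to the m slaves counts as m messages, and the m local updates of one construction step reach each slave in a single message.

```python
def messages_per_cycle(n, m):
    """Messages exchanged in one cycle of the master/slave scheme of Figs. 2 and 3.

    Counting assumptions (ours): a broadcast to the m slaves counts as m messages,
    and the m local updates of one construction step travel in one message per slave.
    """
    initial_cities = m                   # master -> each ant: its starting city
    per_step = m + m                     # each ant -> master: next city; master -> each ant: local updates
    construction = (n - 1) * per_step
    tour_lengths = m                     # each ant -> master: tour length L
    global_update = m                    # master -> each ant: globally updated pheromone trail
    return initial_cities + construction + tour_lengths + global_update


def blocking_rounds_per_cycle(n):
    """Rounds in which every slave waits for the master: one per construction step, plus the global update."""
    return (n - 1) + 1


# Largest configuration of Table 1 (d657, 8 processors, one ant per processor).
print(messages_per_cycle(657, 8), blocking_rounds_per_cycle(657))  # -> 10520 657
```

Under these assumptions, the tMax = 1000 cycles of a d657 run on 8 processors involve about ten million messages and 657,000 rounds in which every slave waits for the master.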

4 A Shared Memory Parallelization of the ACO

In this section, we propose a shared memory parallelization approach of the internal, fine-grained type. Certain inherent constraints on the sequence of operations must be respected in the parallel version to preserve the integrity of the search. The distribution of ants to processors must allow solutions to be constructed simultaneously, and the local update must be carried out each time a city is added to a tour. A local update strictly equivalent to that of the sequential algorithm would require onerous synchronization and would severely limit the potential of the parallelization. However, in experiments on problems with as many as 2,000 cities [10], we found that relaxing this requirement does not adversely affect the search process; we therefore relax it. Moreover, certain precedence relations for the update of the best solution found so far must be preserved: to avoid data integrity problems, this update can only be done by one processor at a time. Finally, all processors must have constructed, evaluated and compared their solutions to the best known solution before the global update of a given cycle is undertaken.

The parallelization we propose is described in Fig. 4. It distinguishes three groups of operations that are treated separately within a single parallel region.

Initialize τ
FOR t = 0 to tMax DO
  Randomly choose the first city for each of the m ants
  *** Start a parallel region with p processors ***
  NoProc = index of the processor
  FOR i = 1 to n DO
    FOR k = 1 to m DO
      If NoProc == k modulo p
        Choose the next city to visit for ant k
        Update the tour length Lk with the city added to the tour
        *** Critical zone ***
        Local update of τ for the edge just added by ant k
    End FOR
  End FOR
  FOR k = 1 to m DO
    If NoProc == k modulo p
      If Lk < L+
        *** Critical zone ***
        Update L+, the best solution found so far
  End FOR
  *** Synchronization barrier ***
  *** Global update of the pheromone trail using L+ ***
  FOR i = 1 to n DO
    If NoProc == i modulo p
      FOR j = 1 to n DO
        Apply Equation (3) to edge (i, j)
      End FOR
  End FOR
  *** End the parallel region ***
End FOR
Print L+
Fig. 4. Pseudo-code for the parallelization in shared memory

The first group of parallel operations distributes the ants equitably among the processors and desynchronizes the local update of the pheromone trail. The second group updates the best solution found so far. Because the ants are again distributed equitably among the processors, a processor that has finished constructing its solutions can update the best known solution even if the other processors have not yet completed the first group of operations. The processor compares its best solution with the best solution stored in shared memory and updates the latter if required. This update must, however, be carried out within a critical zone to guarantee that only one processor writes to this data structure at a time. Finally, the third group of parallel operations carries out the global update of the pheromone trail by distributing the rows of the matrix uniformly among the processors. This step must be preceded by a synchronization barrier, which ensures that all processors have finished processing all of their ants and that the best solution has indeed been updated. In all three groups of operations, the computing load of the tasks carried out in parallel is regular, so static scheduling is sufficient. This shared memory parallelization may be used with any number of ants; with as many ants as processors, we obtain the configuration used by Randall & Lewis. The numerical experiments were carried out under the same conditions as the message passing experiments so as to obtain a valid comparison.
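The three groups of operations of Fig. 4 can be sketched in Python with the `threading` module: one thread per processor, a lock for each critical zone, and a `threading.Barrier` before the row-partitioned global update. This is an illustration of the synchronization structure under our own assumptions, not the authors' OpenMP code: the function names are hypothetical, τ0 is supplied by the caller, ρ_l = 0.1 is assumed, and Eq. (3) is applied to every edge with Δτ_ij = 1/L+ on edges of the best tour and 0 elsewhere. Because of CPython's global interpreter lock, the sketch shows where the locks and the barrier go rather than delivering real speedup; in an OpenMP program, the same roles would be played by the `parallel`, `critical` and `barrier` directives. One deliberate difference from Fig. 4: the test Lk < L+ is performed inside the critical zone, so that no thread can overwrite a better L+ written by another thread after its own test.

```python
import math
import random
import threading


def tour_length(tour, d):
    n = len(tour)
    return sum(d[tour[a]][tour[(a + 1) % n]] for a in range(n))


def choose_next(i, unvisited, tau, d, beta, q0, rng):
    """Transition rule (1). tau is read without a lock, so values may be slightly stale."""
    scores = {l: tau[i][l] * (1.0 / d[i][l]) ** beta for l in unvisited}
    if rng.random() <= q0:
        return max(scores, key=scores.get)
    r, acc = rng.random() * sum(scores.values()), 0.0
    for l, s in scores.items():
        acc += s
        if acc >= r:
            return l
    return l


def shared_memory_acs(d, m, p, t_max, tau0, beta=2.0, q0=0.9, rho_l=0.1, rho_g=0.1, seed=0):
    """Structure of Fig. 4 with p threads; returns (best tour, L+)."""
    n = len(d)
    tau = [[tau0] * n for _ in range(n)]                    # shared pheromone matrix
    best = {"tour": None, "len": math.inf}                  # shared best-so-far solution
    tau_lock, best_lock = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(p)

    def parallel_region(no_proc, starts, cycle):
        rng = random.Random(seed * 1_000_003 + cycle * p + no_proc)
        mine = [k for k in range(m) if k % p == no_proc]    # static assignment: ant k -> k mod p
        tours = {k: [starts[k]] for k in mine}
        unvisited = {k: set(range(n)) - {starts[k]} for k in mine}
        # Group 1: construction; local updates of different threads are not synchronized.
        for _step in range(n - 1):
            for k in mine:
                i = tours[k][-1]
                j = choose_next(i, unvisited[k], tau, d, beta, q0, rng)
                tours[k].append(j)
                unvisited[k].remove(j)
                with tau_lock:                              # critical zone: local update (2)
                    tau[i][j] = tau[j][i] = (1 - rho_l) * tau[i][j] + rho_l * tau0
        # Group 2: update L+ as soon as this thread's own ants are finished.
        for k in mine:
            length = tour_length(tours[k], d)
            with best_lock:                                 # critical zone: best solution
                if length < best["len"]:
                    best["tour"], best["len"] = tours[k], length
        barrier.wait()                                      # all ants built, L+ is final
        # Group 3: global update (3); row i of tau is handled by thread i mod p.
        bt, blen = best["tour"], best["len"]
        best_edges = {frozenset((bt[a], bt[(a + 1) % n])) for a in range(n)}
        for i in range(no_proc, n, p):
            for j in range(n):
                delta = 1.0 / blen if frozenset((i, j)) in best_edges else 0.0
                tau[i][j] = (1 - rho_g) * tau[i][j] + rho_g * delta

    master_rng = random.Random(seed)
    for cycle in range(t_max):
        starts = [master_rng.randrange(n) for _ in range(m)]  # chosen before the parallel region
        threads = [threading.Thread(target=parallel_region, args=(q, starts, cycle))
                   for q in range(p)]
        for th in threads:
            th.start()
        for th in threads:                                  # end of the parallel region
            th.join()
    return best["tour"], best["len"]
```

Because each thread keeps its own ants' tours and only the pheromone matrix and the best solution are shared, the two locks are the only points where threads contend during construction; a thread with no ants (m < p) simply waits at the barrier and then takes its share of rows in the global update.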

5 Experimentation and Results

Randall & Lewis carried out their experiments on eight travelling salesman problems with between 24 and 657 cities, using an IBM SP2 computer with 18 RS6000 model 590 processors at 266 MHz. In the current comparison, only the four largest problems were retained. The shared memory experiments were carried out using 16 Power3 processors at 375 MHz on an NH2 node of an IBM/P 1600. Because we seek an approach that is as general as possible and not tied to a particular type of computer, we implemented the parallelization strategy with OpenMP, a relatively new standard for developing programs that are portable across shared memory parallel computers. Table 1 compares, for both approaches, speedup and efficiency, the two performance measures used by Randall & Lewis in their paper. Note, however, that our performance measures were not produced by following exactly the same guidelines. For each problem, Randall & Lewis performed a single run with a fixed seed and calculated speedup with the following formula:

$$
\text{Speedup} = \frac{\text{time to solve a problem with the fastest serial code on a specific parallel computer}}{\text{time to solve the problem with the parallel code using } p \text{ processors on the same computer}}
$$

The numerator was measured as CPU time and the denominator as wall-clock time. We used the same formula and time measures to calculate speedup. However, our results are the averages of ten trials with different seeds, which seemed a realistic setup for stochastic algorithms such as the ACO. Randall & Lewis's method of calculating speedup follows the guidelines of Barr and Hickman [9] more strictly, but our method is also generally accepted. For this reason, the results presented here should be interpreted as showing trends rather than as a strict numerical comparison.

Table 1. Speedup and efficiency

                                 Number of processors
Problem               Measure    2     3     4     5     6     7     8

Message passing (Randall & Lewis)
lin318 (318 cities)   Speedup    1.20  1.44  1.44  1.59  1.61  1.53  1.58
                      Efficiency 0.60  0.48  0.36  0.32  0.27  0.22  0.20
pcb442 (442 cities)   Speedup    1.42  1.62  1.93  2.18  2.31  2.31  2.35
                      Efficiency 0.71  0.54  0.48  0.44  0.38  0.33  0.29
rat575 (575 cities)   Speedup    1.56  1.78  2.10  2.55  2.77  3.02  3.08
                      Efficiency 0.78  0.59  0.52  0.51  0.46  0.43  0.38
d657 (657 cities)     Speedup    1.67  1.95  2.32  2.89  3.25  3.29  3.30
                      Efficiency 0.83  0.65  0.58  0.58  0.54  0.47  0.41

Shared memory
lin318 (318 cities)   Speedup    1.65  2.39  3.09  3.64  4.15  4.63  4.77
                      Efficiency 0.83  0.80  0.77  0.73  0.69  0.66  0.60
pcb442 (442 cities)   Speedup    1.71  2.47  3.26  4.02  4.57  5.24  5.55
                      Efficiency 0.86  0.82  0.81  0.80  0.76  0.75  0.69
rat575 (575 cities)   Speedup    1.78  2.54  3.23  3.95  4.62  5.14  5.74
                      Efficiency 0.89  0.85  0.81  0.79  0.77  0.73  0.72
d657 (657 cities)     Speedup    1.74  2.62  3.39  4.12  4.83  5.36  6.14
                      Efficiency 0.87  0.87  0.85  0.82  0.80  0.77  0.77
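The efficiencies in Table 1 are the speedups divided by the number of processors, E = S/p, the usual definition, which the reported values satisfy to within rounding. The short Python check below recomputes E from the d657 speedups; the function and variable names are ours, and the numbers are copied from Table 1.

```python
def speedup(serial_cpu_time, parallel_wall_time):
    """Speedup as defined above: serial CPU time divided by parallel wall-clock time."""
    return serial_cpu_time / parallel_wall_time


def efficiency(s, p):
    """Efficiency: speedup divided by the number of processors p."""
    return s / p


# (speedup, efficiency) pairs for d657 from Table 1, for p = 2, ..., 8.
MESSAGE_PASSING_D657 = [(1.67, 0.83), (1.95, 0.65), (2.32, 0.58), (2.89, 0.58),
                        (3.25, 0.54), (3.29, 0.47), (3.30, 0.41)]
SHARED_MEMORY_D657 = [(1.74, 0.87), (2.62, 0.87), (3.39, 0.85), (4.12, 0.82),
                      (4.83, 0.80), (5.36, 0.77), (6.14, 0.77)]

for label, rows in (("message passing", MESSAGE_PASSING_D657),
                    ("shared memory", SHARED_MEMORY_D657)):
    for p, (s, e) in enumerate(rows, start=2):
        print(f"{label:<15} p={p}: S/p = {efficiency(s, p):.3f}, reported E = {e:.2f}")
```

The small gaps (at most about 0.005) presumably come from the speedups having been rounded to two decimals before publication.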

We note that the shared memory approach performs better in all cases. Its speedups are larger and increase more rapidly as the number of processors grows, and its efficiencies are higher and decrease more slowly. A number of factors can explain these differences in performance:

• In the algorithm of Randall & Lewis, for each local update, all of the slaves send a message to the master and wait for the master's broadcast of the updates before continuing the construction of their respective solutions. This processor inactivity is minimized in the shared memory approach by the desynchronization of the local update;
• Managing shared memory accesses and synchronizations may be more efficient, at both the software and hardware levels, than message passing routines for this type of parallelization;
• The parallel computer used in the shared memory experiments is more technologically advanced and is probably better at managing parallel computations;
• Various other technical factors, such as the compiler and the quality of the code, may influence performance.

Even if it is not possible to draw firm conclusions about the actual relevance of each of these factors, it seems reasonable to believe that some combination of them explains the better performance of the shared memory approach.

In Delisle et al. [10], it was established that it is generally preferable to increase the workload of the processors by assigning several ants to each. It is therefore interesting to consider experiments that do not use a single ant per processor, as Randall & Lewis did. The 657-city problem (d657) was therefore solved using 10, 20 and 40 ants distributed among the processors. The number of cycles carried out in each experiment was, however, reduced so as to keep the total number of operations equal to that of the previous experiments. Table 2 presents a comparison of the results obtained. Increasing the number of ants assigned to each processor can significantly increase parallel performance, most consistently with m = 40: in that case, efficiency ranges from 82% to 92% regardless of the number of processors used.

Table 2. Solution quality (% of the best known solution), speedup and efficiency obtained by varying the number of ants per processor for the shared memory approach (Problem d657)

                        Number of processors
m         Measure       2      3      4      5      6      7      8
m = p     Quality (%)   0.294  0.289  0.280  0.292  0.294  0.303  0.303
          Speedup       1.74   2.62   3.39   4.12   4.83   5.36   6.14
          Efficiency    0.87   0.87   0.85   0.82   0.80   0.77   0.77
10        Quality (%)   0.342  0.344  0.343  0.333  0.348  0.324  0.321
          Speedup       1.82   2.34   2.99   4.09   4.13   4.18   4.29
          Efficiency    0.91   0.78   0.75   0.82   0.69   0.60   0.54
20        Quality (%)   0.368  0.357  0.340  0.355  0.341  0.334  0.332
          Speedup       1.87   2.61   3.58   4.29   4.49   5.63   5.65
          Efficiency    0.94   0.87   0.90   0.86   0.75   0.80   0.71
40        Quality (%)   0.371  0.362  0.352  0.344  0.356  0.348  0.348
          Speedup       1.84   2.66   3.56   4.35   4.97   5.74   6.71
          Efficiency    0.92   0.89   0.89   0.87   0.83   0.82   0.84
(m = p: number of ants equal to the number of processors)
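Under the static scheduling of Fig. 4, ant k is handled by processor k mod p, so when p does not divide m some processors build one more tour per cycle than others. The Python sketch below computes the resulting per-processor loads and a simple upper bound, m / ⌈m/p⌉, on the speedup of the construction phase. It is an idealized illustration of ours, not a model used by the authors: it assumes every tour costs the same and ignores the critical zones, the barrier and the global update. The function names are hypothetical.

```python
import math


def ants_per_processor(m, p):
    """Ants handled by each processor under the static rule 'ant k goes to processor k mod p'."""
    return [len(range(q, m, p)) for q in range(p)]


def construction_speedup_bound(m, p):
    """Idealized bound m / ceil(m / p): equal-cost tours, no locks, barrier or global update."""
    return m / math.ceil(m / p)


for m in (10, 20, 40):
    bounds = [round(construction_speedup_bound(m, p), 2) for p in range(2, 9)]
    print(m, ants_per_processor(m, 8), bounds)
```

For example, with m = 10 and p = 8, two processors build two tours per cycle while the other six build one, so under these assumptions the construction phase cannot run more than 10 / 2 = 5 times faster than on a single processor.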

Table 3. Solution quality (% of the best known solution), speedup and efficiency obtained by varying the number of ants per processor for the shared memory approach with a constant number of cycles (Problem d657)

                        Number of processors
m         Measure       2      3      4      5      6      7      8
10        Quality (%)   0.316  0.311  0.300  0.319  0.304  0.327  0.320
          Speedup       1.84   2.34   3.04   4.14   4.18   4.21   4.31
          Efficiency    0.92   0.78   0.76   0.83   0.70   0.60   0.54
20        Quality (%)   0.320  0.321  0.325  0.309  0.320  0.331  0.318
          Speedup       1.87   2.65   3.63   4.49   4.65   5.78   5.79
          Efficiency    0.93   0.88   0.91   0.90   0.77   0.83   0.72
40        Quality (%)   0.322  0.324  0.329  0.321  0.315  0.320  0.313
          Speedup       1.88   2.69   3.68   4.56   5.17   5.96   7.07
          Efficiency    0.94   0.90   0.92   0.91   0.86   0.85   0.88

As to solution quality, reducing the number of cycles to keep the number of operations constant slightly reduces the average quality of the solutions obtained. The quality of a solution is measured as the percentage gap between the value of the best known solution for this problem (48912) and the value of the solution considered. With 10 ants, keeping the number of evaluations constant means that the number of cycles ranges from 200 (for 2 processors) to 800 (for 8 processors). With 20 ants, for the same reason, it ranges from 100 (for 2 processors) to 400 (for 8 processors), and with 40 ants, from 50 cycles (for 2 processors) to 200 cycles (for 8 processors). When the number of cycles is instead held constant at 1000, whatever the number of processors, Table 3 shows that efficiency increases further and that solution quality improves as well. The configuration with m = 10 yields solutions close in quality to those obtained in Table 2 when the number of ants equals the number of processors. In any case, experiments in which the parameter values are varied would allow firmer conclusions to be drawn about solution quality. In Delisle et al. [10], it was also shown that the use of local search methods improved solution quality while preserving good efficiency and speedup values.
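The cycle counts quoted above follow from holding the number of constructed tours fixed at that of the runs with one ant per processor, which used tMax = 1000 cycles: 1000 · p = cycles · m. The Python helper below reproduces them; its name is hypothetical, and it counts one unit of work per constructed tour, ignoring the per-cycle global update.

```python
def cycles_for_constant_work(m, p, base_cycles=1000):
    """Cycles needed with m ants to build as many tours as base_cycles cycles with m = p ants."""
    total_tours = base_cycles * p          # tours built by the runs with one ant per processor
    if total_tours % m:
        raise ValueError("m must divide base_cycles * p for the work to match exactly")
    return total_tours // m


for m in (10, 20, 40):
    print(m, [cycles_for_constant_work(m, p) for p in range(2, 9)])
```

With m = 10, for instance, this gives 100 · p cycles, from 200 for 2 processors to 800 for 8 processors, as stated in the text.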

6 Conclusion

The objective of the comparisons presented in this work is not to establish the superiority of one parallel implementation over another. As pointed out in this article, many technical factors can influence performance and prevent definitive conclusions about the quality of the parallelization approaches discussed. By reproducing as closely as possible the context of a message passing parallelization of the ACO, we sought to show that the shared memory architecture offers a competitive avenue for parallelizing this metaheuristic. We also wanted to show that good results can be obtained in such an environment while respecting the constraints of the sequential approach. We further showed that a more realistic choice of the number of ants used by the algorithm has a positive impact on performance when the number of processors varies from 2 to 8, a configuration found on currently available, reasonably priced shared memory parallel computers.

The comparisons in this paper use a form of internal parallelization, but another popular parallelization strategy for the ACO is the use of multiple colonies, which may be cooperative or independent. This approach has already been studied in the message passing context [5][8]. In future work, we will explore a parallelization using multiple colonies in a shared memory environment and compare the results with existing approaches. We also suggest creating hybrid shared memory/message passing algorithms. Such an approach would allow, for example, a number of cooperative colonies to evolve while communicating by message passing between the nodes of an SMP cluster, with each colony internally parallelized on its shared memory multiprocessor node. Such architectures could make it possible to solve large-scale problems in reasonable computing times.

References

1. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., New York (1979)
2. Randall, M., Lewis, A.: A Parallel Implementation of Ant Colony Optimization. Journal of Parallel and Distributed Computing, Academic Press, 62, 2 (2002) 1421-1432
3. Dorigo, M., Gambardella, L.M.: Ant Colonies for the Traveling Salesman Problem. BioSystems 43 (1997) 73-81
4. Bullnheimer, B., Kotsis, G., Strauss, C.: Parallelization Strategies for the Ant System. In: De Leone, R., Murli, A., Pardalos, P., Toraldo, G. (eds.): High Performance Algorithms and Software in Nonlinear Optimization. Kluwer Academic Publishers (1998) 87-100
5. Stützle, T.: Parallelization Strategies for Ant Colony Optimization. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H.-P. (eds.): Parallel Problem Solving from Nature, PPSN V, Amsterdam. Lecture Notes in Computer Science, Springer-Verlag (1998) 722-731
6. Talbi, E.-G., Roux, O., Fonlupt, C., Robillard, D.: Parallel Ant Colonies for Combinatorial Optimization Problems. In: Rolim, J. (ed.): BioSP3 Workshop on Biologically Inspired Solutions to Parallel Processing Systems, IEEE IPPS/SPDP'99 (Int. Parallel Processing Symposium / Symposium on Parallel and Distributed Processing), San Juan, Puerto Rico, USA. Springer-Verlag (1999)
7. Michel, R., Middendorf, M.: An Ant System for the Shortest Common Supersequence Problem. In: Corne, D., Dorigo, M., Glover, F. (eds.): New Ideas in Optimization (1999) 51-61
8. Middendorf, M., Reischle, F., Schmeck, H.: Information Exchange in Multi Colony Ant Algorithms. In: Rolim, J., Chiola, G., Conte, G., Mancini, L.V., Ibarra, O.H., Nakano, H. (eds.): Parallel and Distributed Computing, Proceedings of the 15th IPDPS 2000 Workshops, Cancun, Mexico. Lecture Notes in Computer Science, Springer-Verlag (2000) 645-652
9. Barr, R.S., Hickman, B.L.: Reporting Computational Experiments with Parallel Algorithms: Issues, Measures, and Experts' Opinions. ORSA Journal on Computing 5 (1993) 2-18
10. Delisle, P., Gravel, M., Krajecki, M., Gagné, C., Price, W.L.: A Shared Memory Parallel Implementation of Ant Colony Optimization. Working Paper, Université du Québec à Chicoutimi (2005)
