Survey and Comparison of Parallelization Techniques for Genetic Algorithms
CS 294-1 Final Project
Marat Boshernitsan
Computer Science Division, University of California, Berkeley
Berkeley, CA 94720
May 11, 1996
Abstract

This paper surveys a number of parallelization techniques for genetic algorithms, focusing on two: the distributed genetic algorithm and distributed fitness computation. While parallelization of serial algorithms usually requires a certain amount of tricks and tweaks, genetic algorithms are parallel by nature. This feature of genetic algorithms is widely exploited, and there exist a number of ways in which a given algorithm may be parallelized. In addition to examining several parallel designs, this paper compares the performance of two implementations (the distributed genetic algorithm and distributed fitness computation) on a dual-processor SunSPARC 20 for optimization of the simple function f(x) = (x/c)^10 (described in detail in [6]).
The author can be reached at [email protected]. An on-line version of this paper can be found on the WWW page: http://www.cs.berkeley.edu/~maratb/cs294-1/writeup.html.
1 Introduction

Like any other evolutionary algorithm, genetic algorithms (GAs) are conceptually based on simulating the evolution of individual structures via processes of selection, mutation, and reproduction [8]. Since GAs are based on the mechanics of natural selection and natural genetics, it is only "natural" that they are readily parallelizable. Much like their biological counterparts, genetic algorithms operate by the principle of survival of the fittest, and hence require a considerable amount of computation to decide which solutions are indeed the fittest and which shall perish and be forgotten. By the very nature of evolutionary computation, the fitness of an individual solution depends only on the properties of that particular individual, and hence the computation of fitnesses for each generation can be carried out independently for every solution. This property of genetic algorithms forms the basis for what John Holland calls implicit parallelism [9].

Because GAs have traditionally been used to perform searches and optimizations that would take super-polynomial time (NP) [5], the property of implicit parallelism is very important. The class of NP problems is vast and their applications are numerous, and so, to achieve reasonable performance, one needs to use a non-deterministic Turing Machine (NTM) [10] simulator. Such a machine operates by performing guesses along its execution path which (eventually) lead to a correct solution. An ideal non-deterministic Turing Machine would traverse all possible paths in parallel, thereby solving the problem in polynomial time. In practice, various randomized techniques are used to simulate an NTM. Not surprisingly, GAs come very close to being one of the best simulation techniques precisely because of implicit parallelism.

An attractive alternative to exploiting the implicit parallelism of GAs is the notion of so-called island or network parallelism. This concept, examined by Grefenstette in [7], is based on a seemingly simple idea: several genetic algorithms run with independent memories, independent genetic operations, and independent function evaluations. Such processes work normally, except that after each generation a certain number of "best" individuals in every population is selected and migrated to other populations over the network. A number of variations on this technique have been developed, some of which are examined later in this paper; a sketch of the basic migration step is given at the end of this section.

The advances in parallel computing in recent years provide solid ground for experimenting with various kinds of parallelism in genetic algorithms. The availability of relatively inexpensive multi-processor workstations and appropriate software makes it simpler than ever to evaluate and compare different methods.

This paper proceeds to discuss various tools (hardware, software, simulators) for GA parallelization, formulate the efficiency problems inherent in serial implementations of genetic algorithms, suggest and evaluate two implementations of the simple genetic algorithm (SGA) [6], and finally discuss variations on the suggested implementations.
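To make the migration step concrete, the following plain-C sketch shows one exchange between two islands: the best few individuals of one population replace the worst of another. The names (Individual, migrate_best) and the replace-worst policy are illustrative assumptions for this example only; in the experiments below the actual exchange is handled by GALOPPS and PVM.

/* Sketch of one migration step in an island-model GA (hypothetical names). */
#include <stdio.h>
#include <stdlib.h>

#define POP_SIZE   30   /* population size per island */
#define N_MIGRANTS  2   /* individuals exported each generation */

typedef struct {
    unsigned long chrom;    /* 30-bit chromosome */
    double        fitness;
} Individual;

/* qsort comparator: descending fitness, so the best individuals come first. */
static int by_fitness_desc(const void *a, const void *b)
{
    double fa = ((const Individual *)a)->fitness;
    double fb = ((const Individual *)b)->fitness;
    return (fa < fb) - (fa > fb);
}

/* Copy the N_MIGRANTS best individuals of 'from' over the worst of 'to'. */
static void migrate_best(Individual from[POP_SIZE], Individual to[POP_SIZE])
{
    int i;
    qsort(from, POP_SIZE, sizeof(Individual), by_fitness_desc);
    qsort(to,   POP_SIZE, sizeof(Individual), by_fitness_desc);
    for (i = 0; i < N_MIGRANTS; i++)
        to[POP_SIZE - 1 - i] = from[i];     /* replace the worst of 'to' */
}

int main(void)
{
    Individual a[POP_SIZE], b[POP_SIZE];
    int i;

    /* Fill two islands with random chromosomes and dummy fitnesses. */
    for (i = 0; i < POP_SIZE; i++) {
        a[i].chrom = (unsigned long)rand() & 0x3FFFFFFFUL;
        a[i].fitness = (double)rand() / RAND_MAX;
        b[i].chrom = (unsigned long)rand() & 0x3FFFFFFFUL;
        b[i].fitness = (double)rand() / RAND_MAX;
    }
    migrate_best(a, b);     /* a's best two individuals now live in b */
    printf("best immigrant fitness: %f\n", b[POP_SIZE - 1].fitness);
    return 0;
}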
2 Parallelization Tools
2.1 Hardware
Single instruction/single data (SISD) computers are characterized by carrying out a single instruction on a single datum at any given instant. While this serial (von Neumann) architecture is still prevalent among desktop computers (there are very few "true" von Neumann machines around these days; almost all computers use some small-scale parallelism, such as pre-fetching of instructions, but this is usually hidden from the user), the increasing availability of relatively inexpensive parallel computers is starting to change the look of scientific computing [12].
Figure 1: Popular parallel computer architectures.

Figure 1 shows two popular organizations for parallel computers. In the SIMD (Single Instruction/Multiple Data) architecture, many simple processing elements (PEs) execute the same instruction, but on different data. The PEs are coordinated by a master CPU which broadcasts program instructions to the PEs. Each PE is equipped with its own storage and may broadcast its data to other PEs and/or to the master CPU. The MIMD (Multiple Instructions/Multiple Data) design is characterized by several interconnected, equally powerful CPUs, each executing its own program (in practice, these processors often execute the same program, but follow different execution paths). When the number of CPUs is relatively small, all processors share the same address space and thus communicate by loading and storing data in memory; such architectures are dubbed shared-memory multiprocessors (SMMPs). The performance of both SIMD and MIMD computers depends heavily on interconnection bandwidth; however, this is usually not an issue because the distance between processing units is kept relatively small. As the speed of computer networks rises rapidly, distributed systems of workstations such as NOWs [1] are proving to be an even more inexpensive and attractive alternative to the massively-parallel processor (MPP) architectures described above. All simulations in this paper were performed on a SunSPARC 20, a dual-processor SMMP, and on a NOW consisting of similar machines. Although this proved to be a severely limited environment, certain results and conclusions can already be drawn, as outlined in Sections 6 and 7.
2.2 Software
A host of software systems and libraries exists for the development of parallel programs. Of particular interest is the Parallel Virtual Machine (PVM) [11], which provides a unified framework consisting of a programming library and a fully developed parallel process coordinator. PVM is mainly characterized by its portability and availability on various systems ranging from MPPs to NOWs. Figure 2 depicts a very high-level view of the PVM architecture. Each parallel program that is to run under PVM is linked against the PVM library, which provides basic calls for process control, message passing, etc. A special process called the "PVM daemon" runs on each workstation that is to be included in the simulated virtual machine. A particular feature of PVM is that it functions successfully on multiprocessor workstations, thus creating a coherent environment for experimenting with various paradigms.
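As an illustration of the programming style PVM encourages, the following sketch shows a master program that spawns a single slave task and exchanges one message with it. It assumes the PVM 3 C interface; the slave executable name ("slave") and the message tags are arbitrary choices made for this example.

/* Minimal PVM 3 master: spawn one slave and exchange a message.
 * Assumes a slave executable named "slave" is installed where PVM can find it. */
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int slave_tid, n = 42, reply;

    pvm_mytid();                               /* enroll this process in PVM */
    if (pvm_spawn("slave", NULL, PvmTaskDefault, "", 1, &slave_tid) != 1) {
        fprintf(stderr, "failed to spawn slave\n");
        pvm_exit();
        return 1;
    }

    pvm_initsend(PvmDataDefault);              /* start a new message buffer */
    pvm_pkint(&n, 1, 1);                       /* pack one integer */
    pvm_send(slave_tid, 1);                    /* send it with message tag 1 */

    pvm_recv(slave_tid, 2);                    /* block for the reply (tag 2) */
    pvm_upkint(&reply, 1, 1);
    printf("slave replied: %d\n", reply);

    pvm_exit();                                /* leave the virtual machine */
    return 0;
}

The corresponding slave would call pvm_mytid() and pvm_parent() to enroll and locate the master, then use pvm_recv(), pvm_upkint(), pvm_initsend(), pvm_pkint(), and pvm_send() symmetrically.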
Figure 2: Overview of the PVM architecture.
Figure 3: Models of "crowd" computation.

A particularly useful feature of PVM is its support for various models of parallel computation. Of particular interest is the so-called "crowd" model of computation, which can be further categorized as follows:
- The master-slave (or host-node) model (Figure 3A), in which a special control program called the "master" coordinates initialization, collection of results, load balancing, etc. The actual computational work is performed by "slave" processes which deliver their results to the master. In the context of genetic algorithms, this model is best suited for taking advantage of implicit parallelism and distributing the computation of fitness functions across multiple slave processes; a sketch of such a fitness-evaluation slave appears after this list. This approach led to the implementation described in Section 4.

- The node-only model (Figure 3B), where multiple instances of a single program execute in parallel, perhaps exchanging data and otherwise helping each other's computations. This approach is the best fit for "network" parallelization of genetic algorithms and is studied further in Section 5.
2.3 GA simulations with GALOPPS
All of the simulations in this paper were performed using GALOPPS (Genetic ALgorithm Optimized for Portability and Parallelism System) [4]. GALOPPS is a distant descendant of the Simple Genetic Algorithm (SGA) described by David Goldberg in [6].
Figure 4: The function f(x) = (x / (2^30 - 1))^10.

The GALOPPS system is capable of simulating both serial and parallel implementations of genetic algorithms. The parallel implementation emulates network parallelism (see Section 5) and is built on a sophisticated checkpoint/restart system. Of particular interest is the PVM extension to GALOPPS implemented by Vera Bakic [3]. This extension allows GALOPPS to simulate the network model by actually running a number of genetic algorithms in parallel and exchanging individuals along predefined paths. In order to collect timing information about the execution of the algorithm, we use a simple timing mechanism which allows us to place checkpoints at arbitrary points in the execution path. The timing data is collected in memory and written to disk at appropriate points in the execution.
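The timing mechanism itself needs nothing elaborate; the following is a minimal sketch of such an in-memory checkpoint module (hypothetical names, not the code actually linked into GALOPPS), assuming a POSIX system with gettimeofday().

/* Sketch of in-memory timing checkpoints used for profiling (hypothetical names). */
#include <stdio.h>
#include <sys/time.h>

#define MAX_CHECKPOINTS 4096

static struct { const char *label; double t; } log_buf[MAX_CHECKPOINTS];
static int n_checkpoints = 0;

/* Record a labelled wall-clock timestamp; costs one gettimeofday() call. */
void checkpoint(const char *label)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    if (n_checkpoints < MAX_CHECKPOINTS) {
        log_buf[n_checkpoints].label = label;
        log_buf[n_checkpoints].t = tv.tv_sec + tv.tv_usec / 1e6;
        n_checkpoints++;
    }
}

/* Flush all recorded checkpoints to disk at a convenient point. */
void checkpoint_dump(const char *filename)
{
    int i;
    FILE *fp = fopen(filename, "w");
    if (!fp)
        return;
    for (i = 0; i < n_checkpoints; i++)
        fprintf(fp, "%s %.6f\n", log_buf[i].label, log_buf[i].t);
    fclose(fp);
}

A call such as checkpoint("generation done") inside the main loop and checkpoint_dump("timings.dat") at exit keeps the measurement overhead to a single system call per checkpoint.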
3 Serial Genetic Algorithm

In this section we profile the execution of the simple genetic algorithm on the same function Goldberg uses in [6], namely f(x) = (x/c)^n, where c is chosen to normalize x and n is taken to be 10. We use a chromosome of length 30 bits, so the normalizing coefficient c is 2^30 - 1. The graph of this function is shown in Figure 4. We use the same parameters for our genetic algorithm as Goldberg, who in turn follows De Jong's suggestions in [2]:

probability of mutation = 0.0333
probability of crossover = 0.6
population size = 30
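For reference, decoding a 30-bit chromosome and evaluating this objective function amounts to only a few lines; the following is an illustrative sketch (not the GALOPPS objective-function code), assuming the chromosome is stored one bit per byte, most significant bit first.

/* Sketch of decoding a 30-bit chromosome and evaluating f(x) = (x/c)^10,
 * with c = 2^30 - 1 (not the GALOPPS objective-function code). */
#include <math.h>

#define CHROM_LEN 30
#define NORM_C ((double)((1UL << CHROM_LEN) - 1))   /* 2^30 - 1 */

/* bits[0..29] hold one bit each, most significant bit first. */
double eval_chromosome(const unsigned char bits[CHROM_LEN])
{
    unsigned long x = 0;
    int i;
    for (i = 0; i < CHROM_LEN; i++)
        x = (x << 1) | (bits[i] & 1u);       /* assemble the integer value */
    return pow((double)x / NORM_C, 10.0);    /* normalize and raise to n = 10 */
}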
Figure 5: Best-of-generation and generation-average results for the function f(x) = (x / (2^30 - 1))^10.

Since we are interested in collecting profiling information, we let the algorithm run for 50 generations, even though that many generations are frequently unnecessary. Indeed, looking at Figure 5 we notice that the algorithm gets reasonably close to the maximum around the 30th generation (to be more precise, we obtain x = 0.999961844, which is within 0.0038% of the maximum value of 1.0). [XXX: graphs and timings go here]
4 Exploring Implicit Parallelism

5 Network Parallelism in GAs

6 Future Work

7 Conclusion

References

[1] Anderson, T. E., Culler, D. E., and Patterson, D. A. "A Case for NOW (Networks of Workstations)". IEEE Micro, 1994.

[2] De Jong, K. A. "An analysis of the behavior of a class of genetic adaptive systems." Doctoral dissertation, University of Michigan. Dissertation Abstracts International 36(10), 5140B. (University Microfilms No. 76-9381.)
[3] Bakic, V. The PVM GALOPPS System 3.0 User's Guide. Michigan State University, 1995.

[4] Goodman, E. D. GALOPPS: The "Genetic ALgorithm Optimized for Portability and Parallelism" System. Technical Report 95-06-01, Michigan State University, 1995.

[5] Garey, M. R. and Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.

[6] Goldberg, D. E. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.

[7] Grefenstette, J. J. "Parallel adaptive algorithms for function optimization". Technical Report CS-81-19, Computer Science Department, Vanderbilt University, 1981.

[8] Heitkotter, Jorg and Beasley, David, eds. The Hitch-Hiker's Guide to Evolutionary Computation: A List of Frequently Asked Questions (FAQ). USENET: comp.ai.genetic.

[9] Holland, J. H. Adaptation in Natural and Artificial Systems. The University of Michigan Press, 1975.

[10] Hopcroft, J. E. and Ullman, J. D. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.

[11] Geist, Al, Beguelin, Adam, Dongarra, Jack, Jiang, Weicheng, Manchek, Robert, and Sunderam, Vaidy. PVM (Parallel Virtual Machine): A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press, 1994. Also available at http://www.netlib.org/pvm3/book/pvmbook.html.

[12] Trew, Arthur and Wilson, Greg, eds. Past, Present, Parallel: A Survey of Available Parallel Systems. Springer-Verlag, 1991.