2016 2nd IEEE International Conference on Computer and Communications

Towards an Autonomously Configured Parallel Genetic Algorithm: Deme Size and Deme Number

Bjorn Johnson, Yanzhen Qu School of Computer Science Colorado Technical University Colorado Springs, USA e-mail: [email protected]; [email protected]

Abstract-Parallel genetic algorithms have been used to optimize many functions in science and engineering. However, they have not received wide acceptance due to their complex and often inefficient configuration requirements. Parallel genetic algorithms have been ported to general purpose graphical processing units on a limited basis, again due to the difficulty in configuration. Most applications of parallel genetic algorithms on graphical processing units take only computational performance into account. This is in contrast with traditional parallel genetic algorithm work, which takes only convergence into consideration. Each genre has a strong following, but there is no combined algorithm that takes both convergence and performance into account. This paper presents a method to determine an efficient deme size and deme number that can be calculated at runtime with no human interaction. It takes convergence theory as the basis to center the deme size and deme number on a well-converging algorithm, then makes accommodations for graphical processing unit architecture to allow for efficient computational performance. To this end, this study locates a common optimum between convergence and performance, then presents a method to incorporate the remaining variables into the function.

Keywords-genetic algorithms; graphical processing unit; parallel processing; deme size; deme number; convergence; performance; configuration automation

I. INTRODUCTION

Genetic algorithms (GAs) are stochastic algorithms that take direct advantage of the principles found in nature for genetics and natural selection. GAs use the genetic operations of crossover, selection, and mutation to create new alleles in each genome in order to randomly cover the search space. The outcome of each generation is determined by testing each genome against a fitness function; those most likely to lead to the solution are then afforded the ability to reproduce to create the next generation. The introduction of parallelization has led to the migration of parallel algorithms to this new architecture. As such, research in parallel GA processing has led to advances in efficiency and speed of processing. One key attribute in the adoption of parallel processing for GAs is the selection of the number of demes, or subpopulations, in the population. The size of each deme is also important because it determines whether the GA will be able to converge to a good solution as well as the time it takes to converge [1].

Deme size is essential to a parallel GA's ability to converge. If the deme size is too small, not enough of the solution space will be searched and the algorithm may not converge to the global optimum. If the deme size is too large, the algorithm will waste time evaluating superfluous genomes, resulting in extended and inefficient run times. The problem then arises of how to select this GA parameter in order to assure both convergence and efficiency. A prominent solution to this problem was provided by Cantu-Paz [2]; it is discussed below and will serve as a boundary condition for the control group in this experimental study, since it proves superior to traditional trial and error and early predictive methods. Initial work on parallel GAs was conducted on multiple-computer systems such as Beowulf clusters. Therefore, the predictive algorithms for deme number and size were based on the island method for GAs and assume a single processor for each subpopulation. The emergence of General Purpose Graphical Processing Unit (GPGPU) computation has provided an architecture for highly parallel computation on commodity hardware. This has led to increased research in parallel GAs and their application to a multitude of problems. The main difficulty with designing an algorithm for the GPGPU architecture is matching the algorithm to the hardware. As discussed in detail below, many studies using GPGPU acceleration choose deme number and size based on the number of compute blocks available. This limits the portability of the algorithm and has the propensity to nullify the predictive calculations of Cantu-Paz.

The purpose of this study is to measure the effects of deme number and size on the processing of GAs, comprising considerations for convergence as well as GPU architecture, and to demonstrate an increase or decrease of convergence and efficiency due to the static GPGPU architecture by comparing the results of a GA experiment using Cantu-Paz's predictive method and one based on the GPGPU architecture as the two boundary conditions. A thorough discussion of the effects of setting predictive variables to static values will be provided, and then a new set of predictive algorithms will be derived to provide a closer model for fitting parallel GAs to the GPGPU architecture based on principles provided in the experimental group.
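The generational loop of crossover, selection, and mutation described above can be made concrete with a minimal, single-population sketch. This is a toy illustration only, not the parallel implementation studied in this paper; the OneMax fitness (`sum` of a bit string) and all parameter values are hypothetical choices:

```python
import random

def run_ga(fitness, genome_len=20, pop_size=30, generations=60, p_mut=0.02, seed=1):
    """Minimal generational GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # selection: the fitter half of the population reproduces
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)                        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]   # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = run_ga(sum)  # OneMax: fitness is the number of 1-bits in the genome
```

A parallel (multi-deme) GA runs many such loops concurrently and migrates individuals between them, which is where the deme size and deme number parameters studied here enter.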

978-1-4673-9026-2/16/$31.00 ©2016 IEEE

II. TOPIC OVERVIEW

Problems in science can be related to the Smith-Nash conflict when considering multiple factors. Smith proposed an economic solution for game theory competition by deciding that the best outcome will come when each member makes decisions that benefit their own interests over those of the group [3]. Nash revised this theory by proving that the best results come when members make decisions that benefit both themselves and the group [4]. This is paralleled in the study of deme size in parallel genetic algorithms. On one side of the problem there is a consortium of researchers who posit that the best genetic algorithm deme size will arrive from configuring genetic algorithms for best convergence. This argument does not consider computational performance. The other side of the argument posits that the best genetic algorithm is one that will perform best when configured to the architecture of the computational media. They consider only temporal performance and do not take convergence efficiency into consideration [2].

In the first case, most deme size decisions are based on probability statistics considering the genome building blocks for best convergence. Building blocks in a genetic algorithm are segments of the genome that have a propensity to find a solution. Through statistical methods and probability theory, deme number and size are derived based on providing the highest number of genomes with the best building block material. Though this should reduce the number of generations and thus the computational time, no active decisions are made based on system architecture. The second case bases the deme size solely on the desire for the best computational performance on the system. This method bases deme number and size on the number of compute nodes available and the amount of memory sharing and message passing that needs to occur. Convergence is not a factor in this method, though its effect can be substantial.
Consider a genetic algorithm that is designed very well for a GPU memory architecture. It does not pass unnecessary data to global memory and utilizes the minimum number of compute blocks. Even though it may run as efficiently as possible, it will still suffer performance delays if the deme number and size do not support efficient convergence to the solution. One of the main benefits of using genetic algorithms is that they are not limited as other search algorithms are. Unlike random search algorithms, the GA is directed towards the optimum rather than meandering through the search space. Unlike calculus-based search algorithms, the function does not need to be continuous and a derivative does not need to be found. This gives GAs a fundamental advantage over their contemporary rivals.

III. PURPOSE, PROBLEM AND HYPOTHESIS

A. Research Purpose

This research is being pursued to further the effort to produce an autonomously configured parallel genetic algorithm. There are six parameters crucial to the performance of a GA: deme number and size, migration rate, migration frequency, topology, interconnectivity, and migration selection. The solution to the first of the five problems in setting GA parameters addressed in this research will allow for a much broader adoption of the GA in research, academia and industry, as it is spurious deme numbers and sizes that often cause GAs to fail. The importance of this research lies in the ability to provide an algorithm that removes the trial and error methodology from configuring deme size and number. This will allow for the autonomous configuration of deme size and number and make the algorithms much more user friendly.

For example, in the field of vertical cavity surface emitting lasers, such as those used in fiber optic transmission, developing the epi-layer of the silicon wafer which defines the quantum well requires simultaneously optimizing the field equation, the heat equation, Poisson's equation, and Schrodinger's equation. This allows the developer to identify the material composition of the required layers and how those layers are alternated. Currently used commercial systems employing calculus-based search algorithms or brute-force iterative methods do not refine to a close enough solution, and the company is required to test several different material set-ups. This poses a great cost to the company. Parallel genetic algorithms are able to solve each of these equations and optimize them in concert with each other. However, so far genetic algorithms have not been used in this case because configuring the genetic algorithm by trial and error is too cumbersome a process. When the solution space contains several trillion possible solutions, having to run multiple genetic algorithms just to optimize their configuration is not practically tenable for the company.

B. Problem Statement

The configuration of parallel genetic algorithms is left to trial and error methods for deme size and number determination that do not take into account both convergence and computational performance; this causes genetic algorithms to be difficult to set up and configure and often results in algorithms that do not converge to a solution or are computationally untenable, thus excluding them from frequent use.

C. Hypothesis Statement

A continuous function of the deme size and deme number, f(x_dn, x_ds), for a parallel genetic algorithm exists which allows for the optimization of convergence and performance of a genetic algorithm run on a graphical processing unit.

D. Research Questions

There are three research questions that can be answered through conducting the experiments. First, how does convergence react to changes in deme size and number from the lower to the upper bound? Since the Cantu-Paz method does not take architecture into account, overuse of the global memory can slow down the algorithm considerably. Therefore, it can be hypothesized that traditional use of the Cantu-Paz method will not perform as well as the combined method presented in this research. Conversely, the GPU method does not take convergence into consideration. Second, how will performance react to variation in deme size and deme number from the upper to the lower bound? Discovery of this will provide validity to the need for a combined method. Finally, are the two results indirectly proportional, in so much that they provide an intersection which can be exploited as an optimization for both convergence and performance? As stated in the hypothesis, the presence of this intersection is likely, due to the expected difference in slope of the convergence and performance surfaces.

IV. RELATED WORKS

Genetic algorithms were created by John Holland in the 1960s to study the evolution of species in biology [5]. They were later implemented to solve scientific problems in other areas of science and engineering [6]. This branching of genetic algorithms to other areas of science has led to the development of research directly on the fundamental abilities of genetic algorithms and how they can be better configured and implemented. Parallel genetic algorithms have received much attention in research and application in the past decade due to their ability to approximate difficult and sometimes untenable problems [7]. A fundamental problem with GAs is the method for deciding the parameters that govern their performance. This is complicated in the parallel paradigm since there are many more parameters that need to be determined. The main issues to be determined for a parallel GA are (1) the size and number of the demes, (2) the topology and interconnectivity of the demes, (3) the migration rate for controlling the number of individuals that will migrate, (4) the migration rate frequency, or the parameter that determines how often individual genomes migrate, and (5) the policy that determines which individuals migrate, where they migrate, and which genomes are replaced at the migration terminus location [8]. Since this study is focused on question (1), pertaining to deme size and number, a rigorous literature review was conducted to identify and evaluate the body of knowledge concerning this problem.

The earliest proposal for multiple demes was offered by Bossert [9]. The proposed populations competed against each other for survival. In a manner analogous with Wright's demes [10], the survival of the demes was based on fitness. To accomplish this competition, two mechanisms were established. First, Bossert eliminated the least fit deme at random intervals and replaced it with arbitrary populations. The second was to introduce diversity through migration and by altering the fitness measure over time. The dynamic environment created by this shifting balance model often led to increased problems with the GA; however, Grefenstette [11] and Oppacher and Wineberg [12] were able to show that the judicious use of shifting balance was effective in improving GA performance. Grefenstette [13] developed a multiple deme system where the best individuals migrated each generation to all populations in the system. His work drew interesting questions about the effects of migration, population size, topology, number of populations, and migration rate and frequency. These questions, presented above, are the grounds for most work in parallel GA parameterization.

Grosso [14] was the first to observe several of the phenomena that resulted from the variance of these parameters. He noticed that convergence was faster in a population separated into demes than in a single panmictic population. However, he showed that if demes remained isolated for the entire experiment, the solution quality was poorer than that of the single panmictic population. In biology, population size has been directly correlated with a species' ability to adapt to external environmental changes and ensure survivability. Theoretical and empirical validation of this can be found in the fields of evolutionary biology [15, 16], population genetics [17] and conservation biology [18]. Computational evolution is directly analogous to biological evolution in its dependence on population size. In the computational paradigm, population size ensures a comprehensive search of the solution space. Insufficiently sized populations may not search enough of the solution space to converge to an acceptable solution, whilst populations that are too large will expend computational resources at an excessive and inefficient level [19]. Goldberg, Deb and Clark [20] proposed that the deme size can be estimated according to the complexity of the problem to be solved. Though this approach suggests a general principle for the optimal deme size, it has not been successful in application due to the inability to estimate the complexity of many problems. In the absence of a general finite optimization size, Arabas et al. [21] proposed a varying population method. The algorithm tracked the age of individuals and replaced older individuals with new random ones. This caused a modulating fluctuation in the population size over time. This modulation was refined by Back et al. [22], who developed a steady-state out-flux of old individuals and in-flux of new ones. In their Adaptive Population size Genetic Algorithm (APGA) they proposed a varying deme size and showed it did increase convergence probability.

They still, however, had the initial problem of identifying the original deme size and number. If the original size and number were not accurate, steady-state variance still would not perform optimally. APGA was expanded upon by Fernandes and Rosa [23] by basing the steady-state variance on a diversity-driven reproduction factor. Smith et al. [24] used selection error to trim or expand population size. Goldberg [6] determined performance can be increased by using a pool of elite individuals with which to augment populations. The algorithm would randomly re-initialize the populations with a combination of elite and randomly generated individuals. Though this method did help to reduce the likelihood of convergence to a local optimum, it had the propensity to destroy high-fitness individuals that had not yet made it into the elite pool. In this same manner, Harik, Lobo and Goldberg [25] re-initialize the population based on a parameter-less GA. Several varying-sized demes would run in parallel, with less fit smaller populations being replaced by individuals from more fit larger populations. This, however, could lead to the loss of highly fit individuals that are temporarily stuck in low-fitness demes. This loss of information was recognized by Stanley and Miikkulainen [26], who later developed a method called speciation in which individuals would be weighted based on their time in the system and their ancestry. The tracking of ancestry allowed individuals that had the propensity to develop acceptable solutions to remain in the deme long enough to develop. The consistent trend in the above research is to adapt the deme size dynamically at runtime. This is an advantageous approach that can lead to trimming and sizing of the GA for optimal performance, but it does not address the problem of estimating the initial size and number, which is randomly chosen in most cases. Cantu-Paz (2001) [1] derives a general algorithm for the initial size and number of demes for a parallel genetic algorithm.

V. RESEARCH DESIGN

This research follows an experimental methodology to find a continuous function that optimizes both convergence and performance. The experiment consists of repeated trials of each method against the test function. Each method is encoded into an algorithm executed at runtime to compute the number of demes and the deme size. The boundaries for the experiments are the two methods for deciding deme size and deme number described in detail below.

A. Cantu-Paz Method for Deme Size and Deme Number

The control groups for this study are the predictive algorithms derived by Cantu-Paz for determining the best deme size and number, and the GPU method for optimizing demes to the architecture. The premise of these algorithms is that each allele has a set of building blocks that lead it to convergence. A building block is the smallest segment of the genome that provides for the convergence of a satisfactory solution. The probability of a genome in a deme having adequate building block material is the starting point for the derivation. A full derivation can be found in Cantu-Paz (1999) [2]. The resulting algorithms for the deme size and number are based on a fully connected topology with a high migration rate. Since this work is to study the effects of deme size, these migratory and topological parameters will be held fixed throughout the experimentation for consistency. Equation (1) shows the formulation for determining the optimal number of demes for convergence:

r* = sqrt((A·B·g/8)·(T_f/T_c)) − 1 = sqrt(A·B·g·γ/8) − 1    (1)

where A and B are domain-dependent constants, g is the number of generations for convergence, and γ is the ratio between T_f, the time to evaluate, and T_c, the estimated communication time. This algorithm was also chosen on the premise of a lack of consolidating algorithms in the literature and as a basis for deme number that can be built upon in a hybrid algorithm.

The algorithm for determining the size of each deme is given in Equation (2):

n_d = 2^k · ln(1 − P) / ln(q/p)    (2)

where k is the order of the building block, P is the target quality, p is the single-trial probability and q is the required solution quality. This equation is chosen since it is the only calculable form of a deme size algorithm present in the literature, verified in the work of Lawrence Davis's Handbook of Genetic Algorithms [27]. It provides a basis to begin deme sizing and a beginning point for a hybrid method that incorporates convergence considerations. Thus it follows that the optimal deme size in terms of the optimal deme number is given by Equation (3):

n* = A·(r*)^B    (3)

The Cantu-Paz algorithm for deme size and number begins with a common parallel problem in computer science, the Gambler's Ruin. Cantu-Paz begins his derivation with the idea that the number of quality building blocks in a solution will determine the propensity for convergence. The first question is to determine the target quality required in each deme, P. He takes a conservative approach and uses the required solution quality, Q, but states that with multiple demes the chance that at least one of them succeeds increases with the number of demes utilized; therefore the per-deme target quality can be relaxed. Thus the quality of the solution is the number of demes that converge correctly, Q. This probability, P_bb, is exemplified in the Gambler's Ruin problem and depends on the fact that the m partitions are independent of each other. This results in a binomial distribution with parameters m and P_bb. This line of thought then produces an expected solution quality based on the distribution, represented by Equation (4):

E(Q) = m·P_bb    (4)

with variance

Var(Q) = m·P_bb·(1 − P_bb)    (5)

Since there is no closed-order form for a set containing more than 5 units, a Gaussian distribution was utilized to normalize the number of correct partitions as:

Q ~ N(m·P_bb, m·P_bb·(1 − P_bb))    (6)

It should be noted that, due to the distribution, some demes reach better solutions than others. This allowed Cantu-Paz to write the qualities of the solutions of the r demes in ascending order. This represents the ordered statistics of the solution quality. Thus obtained, the expected quality of the best deme reduces to E(Z_r:r) = μ_r:r, where μ_r:r denotes the mean of the highest-order statistic of the Gaussian distribution. In this case μ_r:r grows very slowly with increases of r, and P_bb·(1 − P_bb) is maximal at 0.5, thus giving the solution quality the form:

E(Q_r:r) = m·P_bb + μ_r:r·sqrt(m·P_bb·(1 − P_bb))    (7)

Ignoring the inequalities and solving for P_bb leads to the realization that P decreases very slowly with respect to r. Thus, using his derivation from the Gambler's Ruin problem, Cantu-Paz solves for n_d and develops the equation for the required number of demes to reach an acceptable solution, Eq. (7). This bounds our deme size and number to the solutions of the Cantu-Paz algorithm, which are definite. In fact, since the tests will be done on a single function, the results of the Cantu-Paz algorithm will be the same for each of the trials. The variance then comes from the random selection of the initial chromosomes in each deme. This ensures that the trivial solution for the Cantu-Paz method, where each trial is a copy of the first, does not occur, and each trial will have variance.
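The gambler's-ruin sizing idea behind the Cantu-Paz derivation can be illustrated numerically. The sketch below uses the exact gambler's-ruin form n = 2^k · ln(1 − P) / ln(q/p) popularized by Harik et al., taking q = 1 − p; the inputs (k = 4, P = 0.95, p = 0.6) are hypothetical values for illustration, not constants from this paper:

```python
import math

def gamblers_ruin_deme_size(k, target_quality, p):
    """Deme size from the gambler's ruin model: n = 2^k * ln(1 - P) / ln(q/p), with q = 1 - p.

    k: building-block order, target_quality: probability P of getting the block right,
    p: probability the better building block wins a single competition."""
    q = 1.0 - p
    return (2 ** k) * math.log(1.0 - target_quality) / math.log(q / p)

n = gamblers_ruin_deme_size(4, 0.95, 0.6)
deme_size = math.ceil(n)  # round up, since a fraction of an individual is meaningless
```

With these assumed inputs the model calls for a deme of roughly 119 individuals; raising the target quality P or the block order k grows the required deme size, which is the qualitative behavior the paper relies on.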

VI. EXPERIMENT RESULTS

A. Experimental Design

For the experimental runs, the control and experimental groups will be evaluated for their ability to solve the test function shown in Equation (8):

f(x, y) = J_0(x^2 + y^2) + 0.1·|1 − x| + 0.1·|1 − y|    (8)

where J_0 is the Bessel function and x and y are coordinates in the solution space. This Bessel function provides an infinitely large search space and also an infinitely large number of local optima. Figure 1 provides a graphical depiction of the solution space, truncated to −20 < x, y < 20, which lies within −100 < x, y < 100 at the boundaries. The boundaries were selected in order to provide a long enough compute time for meaningful evaluation. The minimum in the solution space is f(1.0, 1.6606) = −0.3356. This function is difficult for many function optimization strategies due to the level of oscillation. In general, hill climbing strategies that begin their climbs at randomly chosen locations rapidly become trapped at a local optimum in suboptimal oscillations. Since only one of the multitude of oscillations holds the global optimum, the chance of choosing this oscillation at random is small and shrinks as the solution space grows. The parameters for each run were determined by the corresponding sizing algorithm for the traditional, GPU and new methods. The use of the Bessel function allows for the generalization of the hybrid procedure to all similar problems. In the Bessel function the required inputs are a set of coordinates, in this research two. This is not a bound, and a function with n dimensions will be able to benefit from this algorithm. The only requirement is that it has a definable fitness function; in the case of optimization this is only a comparison among individuals for which is higher or lower.

B. Deme Sizing for GPU Architecture

The methodology for assigning deme size and number for GPUs is very straightforward and based on the GPU architecture. Given the number of available blocks and the number of cores in each block, the deme number is chosen to utilize these fully [28, 29]. This is prominent in the later literature on applications of genetic algorithms. Most applications set the deme number to the number of cores available, with each core designated to one genome. The deme size in the performance model is based on the size of the GPU memory. The goal is to limit the use of global memory and size the genomes and demes so that all memory calls only travel to the local cache in each compute block [30]. This increases the efficiency of memory usage but has the side effect of using all the compute blocks for each application. There could be cases, especially in mobile applications, where one would not want to continuously use the entire GPU, in order to save power.
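This block-and-memory rule of thumb can be written down directly. The device figures used below (512 compute blocks, local memory holding 768 two-byte genomes) mirror the boundary values quoted later in this paper, but the helper itself is only an illustrative sketch, not a vendor API:

```python
def gpu_deme_config(compute_blocks, local_mem_bytes, genome_bytes):
    """Architecture-driven sizing: one deme per compute block, with the deme size
    capped so that a whole deme fits in the block's local (shared) memory."""
    deme_number = compute_blocks
    deme_size = local_mem_bytes // genome_bytes
    return deme_number, deme_size

# 512 blocks, local memory sized for 768 genomes of 2 bytes (a 16-bit genome)
demes, size = gpu_deme_config(512, 768 * 2, 2)
```

Because the inputs come entirely from the hardware specification, this method yields the same configuration for every problem, which is exactly the portability and convergence concern raised above.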


C. Design of Experiments to Answer the Research Questions

To answer the first research question, the experiment will be designed to collect convergence data over the entire problem space from the lower to the upper bound. This data will represent a surface that can then be analyzed and described by an equation. The equation will allow for differentiation of the surface to determine the slope, if any, and how it trends with increasing deme size and deme number. To answer the second research question, the experiment will also be designed to collect performance data over the entire problem space from the upper to the lower bound. Similar to the convergence data, this data will be analyzed and described via an equation. The equation can then be differentiated to provide the slope of the surface. This slope will in turn tell how the performance reacts relative to the deme size and deme number. Finally, to answer the third research question, and thus prove the hypothesis, the two data sets will be compared to show they are indirectly proportional. The proportionality of the two data sets should provide an intersection of a form able to be equated. This intersection will be the maximum optimization of convergence and performance available. The data sets are expected to be two planes, thus providing an intersection that is relatively linear. This linearity can then be exploited to provide a basis for determining deme size from deme number or vice versa.
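If the two fitted surfaces are indeed close to planar, z = a·x + b·y + c, the locus where they meet follows from equating the two planes; the coefficients below are made-up placeholders for illustration, not fitted values from the experiment:

```python
def plane_intersection(p1, p2):
    """Given two planes z = a*x + b*y + c as (a, b, c) tuples, return (A, B, C)
    describing their intersection projected onto the deme plane: A*x + B*y + C = 0."""
    (a1, b1, c1), (a2, b2, c2) = p1, p2
    return (a1 - a2, b1 - b2, c1 - c2)

# convergence improving along x while performance degrades along x (opposite slopes):
# z = 1*x + 0*y + 0 versus z = -1*x + 0*y + 2 meet where 2x - 2 = 0, i.e. x = 1
line = plane_intersection((1.0, 0.0, 0.0), (-1.0, 0.0, 2.0))
```

The returned line is exactly the linear deme-size-versus-deme-number trade-off the third research question looks for: opposite-signed slopes guarantee a non-degenerate intersection.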

B. Population and Sample

Figure I.

Graphical representation of the Besel test function.

Since genetic algorithms lend themselves to almost all optimization problems, the population for this research is the complete set of parallel genetic algorithms used to solve any optimization problem. This popUlation is quite untenable to

282

First, and overall time will be taken on each trial. This overall time will begin at the first generation population creation. It was originally planned not to include original population creation, as this is a one-time set up function. Later analysis determined that this is a crucial time interval and valid to the experiment since the number and size of the demes to be used will either lengthen or shorten this initial step. Second, a data point for the overall fitness of a trial population will be taken at each data location. This will allow the comparison and determination of best convergence. We will also track the best average fitness for each trial as well as the best overall time for each trial. These two data points will further prove our hypothesis by providing a confidence factor of 95% for the data sets as well as present unexpected results that can lead to further research such as population trimming, migration control and topology.

research completely. Due to the delimitation of this research, though, the limited nature of the scope allows the assumption that deme size will affect all genetic algorithms the same. In order to conduct a valid study of the problem space a sample population needed to be created. The sample population is the genetic algorithm developed to solve the particular optimization problem. In this case the Bessel function described above. This is a typical type problem a GA would be set against and generalizes the method across all genetic algorithms.

1)

Sampling procedure

The samples will be generated via the experimental program. To determine the sample size an estimate equation will be used from Triola [31]. Equation (9) shows how the sample size is to be calculated. n

=

[Za�2ar

(9)

C. Data Collection

Data was collected in several vectors during program runtime. These vectors were identified by generation and saved throughout the entire experimental process. Final values were sent to XML files on the hard drive for data analysis. The complete number, form and identification of the vectors were determined during experimentation programming. It has been determined that vector population and XML file generation will not be included in the overall time since they are administrative functions of data collection and not genetic algorithm functioning. Also, due to the large amount of data points being collected, over 17.2 million, the individual trials at each data location were averaged during runtime and a single value exported into the results matrix.

where z is the confidence factor, sigma is the population standard deviation and E is the desired margin of error. This function was used in order to determine the number of trials needed to provide a 95% confidence factor. In this case the number of samples is not dependent on the population size but on the confidence factor. For this study I will use a confident factor of 95% which equates to a z=1.96. E is 0.10 because I want the sample mean to be within 0.50 of the population mean. Since the range of convergence is approximately 25 (most convergences from previous studies were within 25 generations of each other), according to Triola [31] we can estimate the population standard deviation to 6.25. This results in a sample size of 600.25 trials. This experiment will round down to 600 samples since conducting 0.25 percent of a trial is not useful. This results in 600 trials for each of the three methods. Considering that each method could take as long as 5 minutes, each experimental run should take no more than two days compute time which is reasonable for the system hardware and the scope of this research. The sample size allows me to claim 95% confidence in the accuracy of the sample to its population. This is important and adds validity to the study. Larger sample sizes would waste time and money without much more granularity in detail while smaller samples could result in a sample that does not correctly reflect the population. For this study a sample element will be considered the entire genetic algorithm process to converge to within 10% of the solution. This means that there will be variance in the number of generation for each sample element. This variance is precisely what the study is designed to measure and this sampling procedure works well with the designed goals and required outcomes of the study. This then bounds the number of trials based on a proven population sampling methodology and an objective sample size equation. 2)

D. Experimental Structure and Boundary Conditions

The experiment, as noted above, set the genetic algorithm against the Bessel test function. Each set of independent variables received 600 trials. The trials were averaged during runtime to reduce and condition the data prior to export. The data was then passed to MatLab for analysis. Prior to running the experiment, boundary conditions were set using the Cantu-Paz and GPU methods for determining deme size and number. Solving the Cantu-Paz method for an 85% probability of quality building blocks, the lower bound of the experiment was 356 demes with each deme containing 583 members. According to Cantu-Paz, this number allows for the best convergence of the genetic algorithm. To set the upper bound, the hardware specifications of the GPU card were assessed. First, the maximum number of compute blocks was determined, setting the number of demes for the upper bound at 512. Then the local memory was evaluated. For a 16-bit genome, the local memory can hold up to 768 genomes before having to spill to global memory. The need to reduce global memory calls is a major contributor to fast performance. Thus the upper bounds are 512 demes with each deme having 768 members. Given these bounds, the experiment ran trials at each deme number and deme size; a 186x157 matrix for performance and convergence was generated for analysis.
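The resulting trial grid can be sketched directly from these bounds. The numbers are the study's own; only the range arithmetic is shown here.

```python
# Experimental bounds from the study: Cantu-Paz lower bound of 356 demes
# of 583 members; GPU upper bound of 512 compute blocks and 768 genomes
# per deme in local memory.
DEME_NUM_LOW, DEME_NUM_HIGH = 356, 512
DEME_SIZE_LOW, DEME_SIZE_HIGH = 583, 768

deme_numbers = range(DEME_NUM_LOW, DEME_NUM_HIGH + 1)
deme_sizes = range(DEME_SIZE_LOW, DEME_SIZE_HIGH + 1)

print(len(deme_sizes), len(deme_numbers))   # 186 157 -> the 186x157 matrix
print(len(deme_sizes) * len(deme_numbers))  # 29202 cells
# 29202 cells x 600 trials each is about 17.5 million data points,
# matching the "over 17.2 million" figure quoted above.
```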

E. Instrumentation

Instrumentation for this study is embedded in the experimental program code. There are several points of interest that need to be monitored in order to provide adequate data for the study.


The data sets were limited to two standard deviations to assist in smoothing the data and maintaining well-formed surfaces. This was necessary due to the fine granularity of the data and the expected planar structure. Presentation of the data will be through several figures. First, each data set, convergence and performance, will be presented and discussed. Notable items will be pointed out and anomalies will be explained. Following this discussion, the data sets will be combined to show an intersection and solution space on which to formulate a function that can be optimized. In order to perform this, a surface was fit to each data set. This surface is the basis of the optimization function and key to its discovery, since the discrete data points do not intersect. The surfaces allow for the extrapolation of the data points to intersection. The surface fitting was conducted using MatLab's cftool, which best evaluates the surface to be equidistant to the data points. Finally, using topological methods, the intersection is transformed into a function that fully describes the space and allows for extrapolation beyond the current boundaries. This helps to generalize the theorem and lessens the limitations noted above. All efforts were made to ensure a continuous function. The continuity of the function allows the first derivative to be taken and an optimum identified at the area of minimal slope; ideally this would be zero.
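The surface-fitting step can be sketched as an ordinary least-squares plane fit; this is an illustrative stand-in for what MatLab's cftool produced, and the sample points are not the study's data.

```python
import numpy as np

# Hedged sketch of the surface-fitting step: a least-squares plane
# z = a*x + b*y + c fit to (x, y, z) data points, analogous to the
# cftool surfaces fit to the convergence and performance data sets.
def fit_plane(x, y, z):
    A = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs  # a, b, c

# Illustrative data lying exactly on z = 2x - 3y + 5.
x = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
z = 2.0 * x - 3.0 * y + 5.0
a, b, c = fit_plane(x, y, z)
print(round(a, 6), round(b, 6), round(c, 6))  # 2.0 -3.0 5.0
```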

VII. EXPERIMENT RESULTS ANALYSIS

Our experiment results have shown that there is a common optimization of deme size and deme number that provides sufficient convergence and performance. In this section we evaluate and present a method for taking this new information and combining it with the remaining GA parameters to provide a unified, continuous algorithm for determining the optimized parameters for GAs.

A. Convergence Data

For the convergence, each deme size and number was evaluated to find the best fitness after 100 generations. The data was plotted in Figure 2. This plot shows a gradual decrease in convergence as the deme size and number increase. This was expected due to the results of Cantu-Paz [32] and confirmed in the work of Hu et al. [8]. This decrease, though gradual, was significant: at the upper bound, less than half of the demes held quality building blocks, whereas at the lower, Cantu-Paz, bound almost 93% of demes contained members with quality building blocks that could lead to a solution. The primary mechanism for this disparity is the fact that with a larger population the number of members goes up exponentially. Since each member is randomly set at the beginning of the trial, the solution space is directly proportional to the number of demes and the size of the demes. This increase in deme size and number slows the evolution, as more members and demes must be evaluated. It should be noted that the best convergence was with a low deme size and a high deme number. This is due to the fact that more solution space is being evaluated at one time while the deme size remains low, making the growth linear, not exponential. This low deme size, high deme number concentrates member fitness in a smaller area; thus more members in each deme have quality building blocks.

Figure 2. Graphical plot of convergence data.

B. Performance Data

In a similar fashion to the convergence data, the performance data was collected, smoothed and exported for analysis. The performance data shows an inversely proportional relationship with the convergence data. This relationship is beneficial, as it provides the opportunity for intersection of the two data sets.

The data was graphed in three dimensions to acquire a surface plot. Figure 3 shows the graphed data set. The performance data shows all the same characteristics as the convergence data. The performance of the genetic algorithm increases as the deme size and number move toward the full capacity of the GPU. This is due to the ability of the GPU to pass large amounts of data and computation across the compute blocks. A key component of this is the setting of the deme size to the local memory cache, thus limiting global memory calls. This linear increase in performance is confirmed in the work of Pospichal [33, 34] and Wong [30]. Both groups of researchers showed that performance increases as you approach the upper limits of the GPU, thus validating the performance data collected for this study.

Figure 3. Graphical representation of performance data.

C. Data Intersection: Optimization of the Convergence and Performance


The two data sets show well-formed data that concurs with the expectations of this study and the results of previous research on this subject. To prove the hypothesis, an intersection of the two data sets must be found. This intersection describes the point, or points, where convergence and performance are optimized in relation to each other. To demonstrate this, the two data sets were plotted against each other. Figure 4 shows the plot of performance and convergence. An intersection of the two data sets is clearly visible, supporting the hypothesis. In order to find the intersection, the data was converted into continuous planes. Appendices 1 and 2 provide the derivation for each plane. The two planes were confirmed to be best-fit functions of the data in MatLab. The equation for the convergence solution space thus becomes:

Z = (-14.4x + 75.5y - 8628.3) / 28860                                    (10)

The equation for the performance solution space thus becomes:

Z = (83.6x - 46y - 14975.9) / 28860                                      (11)

These two planes were again plotted against each other to reveal a continuous intersection. Figure 4 shows the plot of the convergence surface and the performance surface. The solution for the optimization of both convergence and performance is then described as:

f(x_dn, x_ds) = (83.6x_dn - 46x_ds - 14975.9) / 28860 ∪ (-14.4x_dn + 75.5x_ds - 8628.3) / 28860      (12)

Figure 4. Graphical representation of the intersection of the convergence and performance data.

Figure 5. 2D intersection of convergence and performance.
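The two fitted planes can be evaluated directly. As a sketch, setting them equal re-derives a line of equal height; this is a simple algebraic step, not the paper's topological method, and the coefficients are the paper's fitted values.

```python
# Hedged sketch: the convergence plane (Eq. (10)) and performance plane
# (Eq. (11)), plus the line obtained by setting the two heights equal.
def convergence_plane(x, y):
    return (-14.4 * x + 75.5 * y - 8628.3) / 28860.0

def performance_plane(x, y):
    return (83.6 * x - 46.0 * y - 14975.9) / 28860.0

def equal_height_y(x):
    # -14.4x + 75.5y - 8628.3 = 83.6x - 46y - 14975.9  =>  121.5y = 98x - 6347.6
    return (98.0 * x - 6347.6) / 121.5

x = 450.0
y = equal_height_y(x)
print(abs(convergence_plane(x, y) - performance_plane(x, y)))  # difference is ~0 on the line
```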

The intersection in Eq. (13) is represented by a straight line through the solution space that incorporates the values of deme number and deme size. This line will provide the optimal deme size given the deme number and vice versa. The equation for the line of intersection is below:

x_ds = (1/1.76) x_dn - 517                                               (13)

Figure 5 shows an extrapolated plot of this function for deme sizes from zero to 200. The overarching goal of this research, though, is to provide a continuous equation that will describe the solution space for the optimization of deme size and deme number. The continuous equation was solved and is presented below:

f(x_dn, x_ds) = (1/1.76) x_dn - x_ds - 517,  from x_dn = 583, x_ds = 356 to x_dn = 768, x_ds = 512      (14)

This space represents the optimization space for the convergence and performance of the genetic algorithm. Appendix E presents a full derivation of the solution intersection. Any point on the line will provide a good solution and performance. This is important in later sections, when this line is combined with the solution space of the other variables in future work. In order to combine this solution with other variables, it is extrapolated over all deme sizes and deme numbers, making it general, as follows:

f(x_dn, x_ds) = C1 x_dn - x_ds - C2                                      (15)

In this case C1 and C2 are domain-dependent constants. This general form can now be used for other hardware with more or fewer compute blocks, as well as for functions that require fewer or smaller demes in accordance with the Cantu-Paz algorithm. Our hypothesis was validated by the production of a descriptive equation linking deme size and deme number with performance and convergence. It was further supported by the ability to derive a continuous equation that optimized both convergence and performance. Considering the limitations of this study, though, the remaining independent variables play a major role in the convergence and performance of the GA. Though the purpose of this study was to focus on deme size and number, the next several paragraphs present a path to completing the overall goal of a unified method for the parameterization of GAs.

With the goal of a unified theory in mind, the outcome of a continued line of research should provide a continuous function of the form f(x_dn, x_ds, x_mr, x_ms, x_mi, x_t, x_c), which represents the full set of parameters for a GA. In order to provide the methodology for generating this equation, previous studies on the remaining variables were evaluated to generate example equations. In cases where there was no solid data for a variable, sample data was generated. The first variables to look at involve migration. Migration encompasses three parameters: rate, interval and selection. The rate is how many individuals migrate at each migration event, the interval is how often a migration event occurs, and the selection determines which individuals migrate. The selection of individuals is trivial in most cases, as the best genetic material should be sent to other demes. This leaves the rate and the interval. Tanese [35] performed a detailed study of migration in GAs. Though it focuses only on performance, it is a solid basis for representing migration information. Performing a similar analysis on that data as was conducted for deme size and number, a solution space representation was developed in Eqs. (16)-(18). Similar equations were generated for the remaining variables. It should be noted that the domain constants are numbered consecutively throughout the set of equations to make combining easier.

The next step is to combine the equations into a singular equation. Adding the equations was considered first; however, due to the multiple dimensions of the equations, multiplying the separate equations together provides a better solution space. Also, in determining the constants for each case, a vector of the experimental variables was provided; for instance, deme size for the combined equation was evaluated at 583-768. Given this information, a continuous equation can be developed from each of the variable equations to provide a solution space that can be optimized.
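As a small usage sketch of the deme relation, either parameter can be solved from the other along the optimal line. The constants C1 = 1/1.76 and C2 = 517 are the reconstructed values from Eq. (14); the helper names are illustrative.

```python
# Hedged sketch of the general form f(x_dn, x_ds) = C1*x_dn - x_ds - C2
# (Eq. (15)) with the study's deme constants.  Points where f = 0 lie on
# the optimal line, so either parameter determines the other.
C1 = 1 / 1.76
C2 = 517.0

def optimal_deme_size(deme_number: float) -> float:
    # Solve C1*x_dn - x_ds - C2 = 0 for x_ds.
    return C1 * deme_number - C2

def optimal_deme_number(deme_size: float) -> float:
    # Solve the same relation for x_dn.
    return (deme_size + C2) / C1

ds = optimal_deme_size(1000.0)
print(ds, optimal_deme_number(ds))  # round-trips back to 1000.0
```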

Figure 6. Graphical representation of the unified solution space from simulated data. Unitless axes result from the combination of variables.

This equation can then be used to select the best set of parameters that will provide for convergence and performance of the GA. Figure 6 presents a graphical representation of the solution space given the above parameters and constants from the experimentation data. As apparent in Figure 6, the solution space is continuous and has many local minima and maxima. This surface, though, provides a solution for each of the variables. For purposes of this demonstration, the variables were assigned a range of 1-200 in order to represent the space in 3D. Due to the multidimensionality of the equation, a true graphical representation would not normally be tenable. This multidimensionality is not an issue, as it is commonly found in many fields of applied mathematics such as finance and physics.
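The combination-and-search step can be sketched as follows. The first function uses the reconstructed deme constants; the second is a purely hypothetical stand-in for a migration-variable equation (its coefficients are illustrative, not from the paper or from Tanese's data).

```python
import numpy as np

# Hedged sketch of the combination step: per-variable solution-space
# functions are multiplied into one surface, which is then scanned for
# its minimum over the demonstration range 1-200.
def f1(x, y):
    return (1 / 1.76) * x - y - 517.0   # deme solution space, Eq. (15) form

def f2(x, y):
    return 0.5 * x - y - 100.0          # hypothetical migration-space term

x = np.linspace(1, 200, 200)
y = np.linspace(1, 200, 200)
X, Y = np.meshgrid(x, y)
combined = np.abs(f1(X, Y) * f2(X, Y))  # multiplied solution space
i, j = np.unravel_index(np.argmin(combined), combined.shape)
print(X[i, j], Y[i, j])                 # grid point nearest a zero of the product
```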


VIII. CONCLUSION AND FUTURE WORK

This study used an experimental method to evaluate the convergence and performance of a genetic algorithm based on varying deme size and deme number. It has shown that deme size and deme number produce convergence and performance results that are inversely proportional to each other. This proportionality was then used to find an intersection that optimizes both parameters to the highest extent. The data was then used to derive an equation for choosing one parameter given the other. Figure 5 demonstrates this ability: for instance, given the deme number you can easily pick the deme size for the best convergence and performance; vice versa, you can solve the equation for deme size and then pick the best deme number to optimize convergence and performance. The final analysis of the data provided a continuous and general equation for optimizing both variables simultaneously. This equation provides a solution space that can be searched for the best combination of deme size and deme number. This continuous equation leads to the possibility of a unified equation for all parameters. The final findings presented a methodology for determining this unified equation and the remaining future work needed to see it to fruition.

Fundamental to the future of this research and the fruition of a unified theory, each of the remaining variables must be studied for its optimization with regard to performance and convergence. For instance, the example data used in the demonstration of a methodology for creating a unified theory was drawn from studies that mainly focused on performance. A study similar to the one performed in this research must be conducted on each of the remaining variables to provide validated solution-space equations for combination. Once this is complete, research into generalizing the unified theory needs to be conducted; experimenting on several different GPU architectures can carry this out. There also needs to be experimentation on several more test functions to generalize the theory over a wide range of problem sets. This should incorporate continuous and non-continuous functions to demonstrate the efficacy of the unified theory in varying solution spaces.

REFERENCES

[1] E. Cantu-Paz, "Migration policies, selection pressure, and parallel evolutionary algorithms", Journal of Heuristics, 7(4), pp. 311-334, 2001.
[2] E. Cantu-Paz, "Designing efficient and accurate parallel genetic algorithms", 1999.
[3] L. Montes, "Adam Smith in context: A critical reassessment of some central components of his thought", Palgrave Macmillan, 2004.
[4] X. Vives, "Nash equilibrium with strategic complementarities", Journal of Mathematical Economics, 19(3), pp. 305-321, 1990.
[5] J. H. Holland, "Genetic algorithms and the optimal allocation of trials", SIAM Journal on Computing, 2(2), pp. 88-105, 1973.
[6] D. E. Goldberg, "Genetic Algorithms in Search, Optimization and Machine Learning", Addison-Wesley, New York, 1989.
[7] Q. Yu, C. Chen, and Z. Pan, "Parallel genetic algorithms on programmable graphics hardware", in Advances in Natural Computation, pp. 1051-1059, Springer Berlin Heidelberg, 2005.
[8] T. Hu, S. Harding, and W. Banzhaf, "Variable population size and evolution acceleration: a case study with a parallel evolutionary algorithm", Genetic Programming and Evolvable Machines, 11(2), pp. 205-225, 2010.
[9] G. P. Wagner and L. Altenberg, "Perspective: Complex adaptations and the evolution of evolvability", Evolution, pp. 967-976, 1996.
[10] H. Muhlenbein, "Evolution in time and space - the parallel genetic algorithm", in Foundations of Genetic Algorithms, 1991.
[11] E. Alba and J. M. Troya, "A survey of parallel distributed genetic algorithms", Complexity, 4(4), pp. 31-52, 1999.
[12] M. Wineberg and F. Oppacher, "Enhancing the GA's Ability to Cope with Dynamic Environments", in GECCO, pp. 3-10, 2000.
[13] D. E. Goldberg and J. H. Holland, "Genetic algorithms and machine learning", Machine Learning, 3(2), pp. 95-99, 1988.
[14] M. Grosso, "The final choice: Playing the survival game", Stillpoint Publishing, 1985.
[15] J. Kennedy and W. M. Spears, "Matching algorithms to problems: an experimental test of the particle swarm and some genetic algorithms on the multimodal problem generator", in Evolutionary Computation Proceedings, 1998 IEEE World Congress on Computational Intelligence, pp. 78-83, IEEE, 1998.
[16] T. Ohta, "Population size and rate of evolution", Journal of Molecular Evolution, 1(4), pp. 305-314, 1972.
[17] S. Wright, "Evolution and the genetics of populations, volume 3: experimental results and evolutionary deductions", University of Chicago Press, 1984.
[18] M. E. Soule and B. A. Wilcox, "Conservation biology: An evolutionary-ecological perspective", Sinauer Associates, Inc., 1980.
[19] F. G. Lobo and C. F. Lima, "A review of adaptive population sizing schemes in genetic algorithms", in Proceedings of the 2005 Workshops on Genetic and Evolutionary Computation, pp. 228-234, ACM, 2005.
[20] D. E. Goldberg, K. Deb, and J. H. Clark, "Accounting for Noise in the Sizing of Populations", in FOGA, pp. 127-140, 1992.
[21] J. Arabas, Z. Michalewicz, and J. Mulawka, "GAVaPS - a genetic algorithm with varying population size", in Evolutionary Computation, 1994 IEEE World Congress on Computational Intelligence, Proceedings of the First IEEE Conference on, pp. 73-78, IEEE, June 1994.
[22] F. Herrera and M. Lozano, "Gradual distributed real-coded genetic algorithms", IEEE Transactions on Evolutionary Computation, 4(1), pp. 43-63, 2000.
[23] C. Fernandes and A. Rosa, "Self-regulated population size in evolutionary algorithms", in Parallel Problem Solving from Nature - PPSN IX, pp. 920-929, Springer Berlin Heidelberg, 2006.
[24] R. E. Smith, S. Forrest, and A. S. Perelson, "Searching for diverse, cooperative populations with genetic algorithms", Evolutionary Computation, 1(2), pp. 127-149, 1993.
[25] G. R. Harik, F. G. Lobo, and D. E. Goldberg, "The compact genetic algorithm", IEEE Transactions on Evolutionary Computation, 3(4), pp. 287-297, 1999.
[26] K. O. Stanley and R. Miikkulainen, "Evolving neural networks through augmenting topologies", Evolutionary Computation, 10(2), pp. 99-127, 2002.
[27] L. Davis, "Handbook of genetic algorithms", Van Nostrand Reinhold, New York, 1991.
[28] M. Oiso, Y. Matsumura, T. Yasuda, and K. Ohkura, "Implementing genetic algorithms to CUDA environment using data parallelization", Tehnicki vjesnik, 18(4), pp. 511-517, 2011.
[29] S. Debattisti, N. Marlat, L. Mussi, and S. Cagnoni, "Implementation of a simple genetic algorithm within the CUDA architecture", in The Genetic and Evolutionary Computation Conference, 2009.
[30] M. L. Wong, T. T. Wong, and K. L. Fok, "Parallel evolutionary algorithms on graphics processing unit", in Evolutionary Computation, 2005 IEEE Congress on, Vol. 3, pp. 2286-2293, IEEE, 2005.
[31] M. F. Triola, W. M. Goodman, G. LaBute, R. Law, and L. MacKay, "Elementary Statistics", Pearson/Addison-Wesley, 2006.
[32] E. Cantu-Paz, "Efficient and accurate parallel genetic algorithms" (Vol. 1), Springer, 2000.
[33] P. Pospichal and J. Jaros, "GPU-based acceleration of the genetic algorithm", GECCO competition, 2009.
[34] P. Pospichal, J. Jaros, and J. Schwarz, "Parallel genetic algorithm on the CUDA architecture", in Applications of Evolutionary Computation, pp. 442-451, Springer Berlin Heidelberg, 2010.
[35] R. Tanese, "Distributed genetic algorithms", in Proceedings of the Third International Conference on Genetic Algorithms, pp. 434-439, Morgan Kaufmann Publishers Inc., 1989.