A Search for Counterexamples to Two Conjectures on the Simple Genetic Algorithm

Alden H. Wright

Computer Science Dept., The University of Montana, Missoula, MT 59812-1008, [email protected]

Garrett Bidwell (this work was done while this author was a Master's student at the University of Montana)

BBN Systems and Technologies, Cambridge, MA, [email protected]

Abstract

We empirically searched for cycling and chaotic behavior in the infinite population Simple Genetic Algorithm. We found examples of period 2 cycling (which we expected) and long-period cycling (which we did not expect). These examples had mutation and crossover distributions that do not correspond to the way mutation and crossover are normally used in practice. We also searched, unsuccessfully, for stable polymorphic fixed points in the zero-mutation case.

1 Introduction

Vose (1990) introduced a rigorous dynamical system model for the binary-representation genetic algorithm with proportional selection, under the simplifying assumption of an infinite population size. This model has been further extended in Vose & Liepins (1991), Vose & Wright (1994), and Vose (1996). If the string length is $\ell$, the model is defined in terms of a differentiable mapping $\mathcal{G}$ from $\mathbb{R}^n$ into itself, where $n = 2^\ell$. The mapping $\mathcal{G}$ describes how a population changes from one generation to the next.

Conjecture 1: The iterates of $\mathcal{G}$ always converge to a fixed point.

This conjecture was posed in Vose & Wright (1995b). The conjecture is known to be true when the fitness function is linear or very close to linear, as shown in Vose & Wright (1994). Hastings (1981) gives an empirically derived counterexample for the diploid discrete-time model of biological population genetics, and Akin (1982) proves the existence of cycling for the continuous-time diploid model.

The conjecture is obviously false for general mutation. To illustrate, consider a 1-bit model with a mutation rate of 1. Then in each generation, each 0 individual is mutated to a 1 individual and vice versa. Thus, under any nonzero fitnesses, $\mathcal{G}$ maps the population vector $[1, 0]$ to $[0, 1]$ and vice versa, giving a stable cycle of period 2. We empirically show the existence of more complex cycling behavior.

Conjecture 2: If mutation is zero, the only asymptotically stable fixed points of $\mathcal{G}$ correspond to populations containing a single type of individual.

This conjecture was also stated as Conjecture 4.4 in Vose and Wright (1995b). A fixed point $x$ of $\mathcal{G}$ is asymptotically stable if there is a neighborhood $V$ of $x$ such that $\lim_{t \to \infty} \mathcal{G}^t(y) = x$ for all $y \in V$. If every eigenvalue of the differential $d\mathcal{G}_x$ has modulus less than one, then $x$ is asymptotically stable; see Guckenheimer and Holmes (1983). Our empirical search for counterexamples to this conjecture was unsuccessful.

There has been extensive work on methods for maintaining diversity in genetic algorithms; the dissertation of Mahfoud (1995) gives an excellent survey of this work. If Conjecture 2 is true, this suggests that a "standard" genetic algorithm cannot maintain a stable population that includes individuals near two or more local maxima of the fitness function. (While the results of this paper are for infinite populations and no mutation, one should expect that increasing the population size and removing mutation would decrease noise and hence improve the ability to maintain stable populations near two or more local maxima of the fitness function.) This gives justification for special techniques, such as sharing and niching methods, for maintaining stable diverse populations on multimodal fitness landscapes.
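The 1-bit example can be checked numerically. The following is a minimal NumPy sketch, assuming that a mutation rate of 1 simply swaps the two components of the population vector after proportional selection; `one_bit_generation` is a hypothetical helper name and the fitnesses are arbitrary positive values.

```python
import numpy as np

def one_bit_generation(x, f):
    """One generation of the 1-bit model with mutation rate 1 (hypothetical helper).

    x: population vector [x_0, x_1]; f: positive fitnesses [f(0), f(1)].
    """
    selected = f * x / np.dot(f, x)  # proportional selection
    return selected[::-1]            # mutation rate 1 turns every 0 into a 1 and vice versa

f = np.array([2.0, 5.0])             # arbitrary nonzero fitnesses
x0 = np.array([1.0, 0.0])
x1 = one_bit_generation(x0, f)
x2 = one_bit_generation(x1, f)
print(x1, x2)                        # [0. 1.] [1. 0.]
```

The vertex $[1, 0]$ returns to itself after two generations, whatever positive fitnesses are chosen.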

2 The Simple Genetic Algorithm

The Simple Genetic Algorithm is a standard genetic algorithm over fixed-length binary strings that uses proportional selection and various types of crossover and mutation.

2.1 The Vose Model

We consider a generalization of the infinite population model of the Simple Genetic Algorithm introduced in Vose (1990), which is the theoretical framework used in Vose (1996). Other references include Vose and Wright (1994) and Vose (1995). The domain $\Omega$ is the set of length-$\ell$ binary strings. Let $n = 2^\ell$ and note that elements of $\Omega$ correspond to integers in the range $[0, n)$. They are thereby thought of interchangeably as integers, as bit strings, or as column vectors with entries from $\{0, 1\}$. Let $\mathbf{1}$ denote the vector of all ones (which corresponds to the integer $n - 1$). Let $e_k$ denote the $k$th column of the $n \times n$ identity matrix. Let $\oplus$ denote the bitwise exclusive-OR operation, and let $\otimes$ denote the bitwise AND operation on $\Omega$. For $x \in \Omega$, the one's complement of $x$ is denoted by $\bar{x}$. Note that $\bar{x} = \mathbf{1} \oplus x$. If expr is an expression that is either true or false, then $[\text{expr}] = 1$ if expr is true and $[\text{expr}] = 0$ otherwise. Let $\delta_{i,j} = [i = j]$. The $n \times n$ permutation matrix whose $(i, j)$th entry is $\delta_{k,\, i \oplus j}$ is denoted by $\sigma_k$. Note that $(\sigma_k x)_i = x_{k \oplus i}$.

A population is a real-valued probability vector $x$ indexed over $\Omega$; the probability (or fraction) of string $i$ in population $x$ is $x_i$. The set of all populations is the unit simplex $\Lambda = \{x \in (\mathbb{R}_{\geq 0})^n : \mathbf{1}^T x = 1\}$, where $\mathbb{R}_{\geq 0}$ denotes the non-negative reals. The $e_k$ are vertices of $\Lambda$ and correspond to populations consisting entirely of one string type (namely $k$).
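The XOR-based indexing can be made concrete with bit operations on integers. The following minimal NumPy sketch builds $\sigma_k$ from its definition and checks the identity $(\sigma_k x)_i = x_{k \oplus i}$; the helper `sigma` is a hypothetical name, and Python's `^` and `&` operators play the roles of $\oplus$ and $\otimes$.

```python
import numpy as np

def sigma(k, n):
    """n x n permutation matrix sigma_k whose (i, j) entry is [i XOR j == k]."""
    i, j = np.indices((n, n))
    return ((i ^ j) == k).astype(float)

ell = 3
n = 2 ** ell
x = np.arange(n, dtype=float)        # any vector indexed by the strings 0..n-1
k = 0b101
# (sigma_k x)_i = x_{k XOR i}
assert np.allclose(sigma(k, n) @ x, x[np.arange(n) ^ k])
ones = n - 1                         # the all-ones string 111
print(ones ^ 0b010)                  # one's complement of 010 is 101, printed as 5
```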

An $n \times n$ mixing matrix $M$ encodes mutation and crossover. $M$ is defined so that $M_{i,j}$ is the probability of obtaining the 0 string by doing mutation and crossover on the parent strings $i$ and $j$. Thus, $x^T M x$ is the probability (or fraction) of the 0 string as the result of doing crossover and mutation to population $x$. The formula for the $M$ matrix is given in Vose (1996) and is repeated here for completeness. Considering $k \in \Omega$ as a crossover mask used with parents $i, j \in \Omega$, the children are $(i \otimes k) \oplus (j \otimes \bar{k})$ and $(j \otimes k) \oplus (i \otimes \bar{k})$. We assume one child is kept (with equal probability). Let $\chi_k$ denote the probability that mask $k$ is used. Let $\mu_k$, $k \in \Omega$, be the probability that a string $x$ is mutated into the string $x \oplus k$. Under general crossover and mutation, referred to as mixing, $M$ is given by:

$$M_{i,j} = \sum_{u,v,k \in \Omega} \mu_u \mu_v \frac{\chi_k + \chi_{\bar{k}}}{2} \left[ ((i \oplus u) \otimes k) \oplus ((j \oplus v) \otimes \bar{k}) = 0 \right] \qquad (1)$$
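Equation (1) can be evaluated directly by looping over $u$, $v$, and $k$. The following is a minimal NumPy sketch; the function name `mixing_matrix` is hypothetical, not from the paper, and the direct summation costs $O(n^5)$ operations, so it is only practical for small $\ell$.

```python
import numpy as np

def mixing_matrix(mu, chi):
    """Mixing matrix of equation (1).

    mu[u]:  probability that mutation turns a string x into x XOR u
    chi[k]: probability that crossover mask k is used
    Direct O(n^5) summation, intended only for small string lengths.
    """
    n = len(mu)
    ones = n - 1                          # the all-ones string
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for u in range(n):
                for v in range(n):
                    for k in range(n):
                        kbar = ones ^ k   # one's complement of the mask
                        # child of mutated parents i^u, j^v under mask k is the 0 string
                        if ((i ^ u) & k) ^ ((j ^ v) & kbar) == 0:
                            M[i, j] += mu[u] * mu[v] * (chi[k] + chi[kbar]) / 2
    return M

# No mutation and no crossover on 2-bit strings: a child copies one parent,
# so M[i, j] = ([i == 0] + [j == 0]) / 2.
e0 = np.array([1.0, 0.0, 0.0, 0.0])
print(mixing_matrix(e0, e0))
```

Symmetry of $M$ follows from the $\chi_k + \chi_{\bar{k}}$ weighting, and the sketch can be used to confirm it on random distributions.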

Note that $M$ is symmetric and nonnegative. We will refer to the vector $\chi \in \Lambda$ as a crossover probability distribution, or just as a crossover distribution. The crossover distribution is determined by the type of crossover. For example, one-point crossover with a crossover rate of $C$ corresponds to the crossover distribution

$$\chi_i = \begin{cases} 1 - C & \text{if } i = 0 \\ C/(\ell - 1) & \text{if } i = 2^k - 1 \text{ for some } k \in \{1, 2, \ldots, \ell - 1\} \\ 0 & \text{otherwise} \end{cases}$$

Similarly, uniform crossover with a crossover rate of $C$ is given by

$$\chi_i = \begin{cases} 1 - C + C\, 2^{-\ell} & \text{if } i = 0 \\ C\, 2^{-\ell} & \text{if } i > 0 \end{cases}$$
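The two crossover distributions above can be tabulated as follows. This is a minimal NumPy sketch; the function names are hypothetical, and `one_point_crossover` assumes $\ell \geq 2$ because of the division by $\ell - 1$.

```python
import numpy as np

def one_point_crossover(ell, C):
    """Crossover distribution chi for one-point crossover with rate C (needs ell >= 2)."""
    chi = np.zeros(2 ** ell)
    chi[0] = 1 - C                        # mask 0: no crossover
    for k in range(1, ell):
        chi[2 ** k - 1] = C / (ell - 1)   # mask with the k low-order bits set
    return chi

def uniform_crossover(ell, C):
    """Crossover distribution chi for uniform crossover with rate C."""
    n = 2 ** ell
    chi = np.full(n, C / n)               # C * 2**(-ell) for every mask
    chi[0] += 1 - C
    return chi

print(one_point_crossover(3, 0.6))        # only masks 000, 001 and 011 get weight
print(uniform_crossover(2, 0.5))
```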

We will refer to the vector $\mu \in \Lambda$ as a mutation probability distribution, or just as a mutation distribution. The mutation distribution is determined by the type of mutation. For example, mutation defined by a per-bit mutation rate $R$ corresponds to the mutation distribution given by

$$\mu_i = R^{\mathbf{1}^T i} (1 - R)^{\ell - \mathbf{1}^T i}$$

where $\mathbf{1}^T i$ is the number of ones in the bit string $i$.
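The per-bit mutation distribution only depends on the number of ones in each mask. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def per_bit_mutation(ell, R):
    """mu_i = R**(1^T i) * (1 - R)**(ell - 1^T i) for a per-bit mutation rate R."""
    ones_in = np.array([bin(i).count("1") for i in range(2 ** ell)])  # 1^T i for each mask i
    return R ** ones_in * (1 - R) ** (ell - ones_in)

print(per_bit_mutation(2, 0.1))   # masks 00, 01, 10, 11
```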

The mixing function $\mathcal{M} : \Lambda \to \Lambda$ gives the population after crossover and mutation. (If $x$ corresponds to a finite population, then $\mathcal{M}(x)$ is the expected population after crossover and mutation.) $\mathcal{M}$ has component functions defined by:

$$\mathcal{M}_i(x) = (\sigma_i x)^T M \sigma_i x$$

The mixing function can also be written in the form:

$$\mathcal{M}_i(x) = e_i^T \mathcal{M}(x) = \sum_{u,v \in \Omega} x_{u \oplus i}\, x_{v \oplus i}\, M_{u,v} = \sum_{u,v \in \Omega} x_u x_v M_{u \oplus i,\, v \oplus i} \qquad (2)$$

It is shown in Vose (1996) that $\mathcal{M}_k(x)$ can be interpreted as the probability of string $k$ in the population that is obtained by applying mutation and crossover to population $x$. Note that $\mathcal{M}$ is homogeneous of degree 2, i.e., $\mathcal{M}(\alpha x) = \alpha^2 \mathcal{M}(x)$ for any scalar $\alpha$.
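Equation (2) translates almost literally into code. The sketch below is a hedged illustration assuming NumPy; `mixing` is a hypothetical helper. It evaluates $\mathcal{M}_i(x)$ as a quadratic form in a re-indexed copy of $M$ and tries it on the "no crossover, no mutation" matrix $M_{i,j} = ([i = 0] + [j = 0])/2$, which is what equation (1) gives when $\mu = \chi = e_0$. For that matrix mixing should leave every population unchanged.

```python
import numpy as np

def mixing(M, x):
    """Mixing function of equation (2): entry i is sum_{u,v} x_u x_v M[u XOR i, v XOR i]."""
    idx = np.arange(len(x))
    return np.array([x @ M[np.ix_(idx ^ i, idx ^ i)] @ x for i in range(len(x))])

# Mixing matrix for "no crossover, no mutation" on 2-bit strings (a child copies a parent).
n = 4
i, j = np.indices((n, n))
M_none = ((i == 0).astype(float) + (j == 0)) / 2

x = np.array([0.1, 0.2, 0.3, 0.4])
print(mixing(M_none, x))    # should reproduce x up to rounding
```

Scaling `x` by 3 multiplies the result by 9, which is the degree-2 homogeneity noted above.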

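The formula for the entries of the differential $d\mathcal{M}_x$ follows by direct computation from equation (2):

$$\begin{aligned}
(d\mathcal{M}_x)_{i,j} &= \frac{\partial}{\partial x_j} \sum_{u,v \in \Omega} x_u x_v M_{u \oplus i,\, v \oplus i} \\
&= \sum_{u,v} (\delta_{u,j} x_v + \delta_{v,j} x_u) M_{u \oplus i,\, v \oplus i} \\
&= \sum_v x_v M_{j \oplus i,\, v \oplus i} + \sum_u x_u M_{u \oplus i,\, j \oplus i} \\
&= 2 \sum_u x_u M_{i \oplus j,\, u \oplus i}
\end{aligned}$$

where $\delta_{u,j} = 1$ if $u = j$ and $\delta_{u,j} = 0$ otherwise; the last step uses the symmetry of $M$.

Assuming a fitness function $f : \Omega \to \mathbb{R}^+$, proportional selection is the mapping $\mathcal{F} : \Lambda \to \Lambda$ defined by $\mathcal{F}(x) = Fx / \mathbf{1}^T F x$, where $F$ is the $n \times n$ diagonal matrix $F_{i,j} = \delta_{i,j} f(i)$. The fitness function is regarded as a vector through the correspondence $f_i = f(i)$. The transition from one generation to the next of the infinite population simple genetic algorithm is given by the mapping $\mathcal{G} = \mathcal{M} \circ \mathcal{F} : \Lambda \to \Lambda$.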
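Composing selection with mixing gives one generation of the model. The following NumPy sketch is a minimal illustration with hypothetical helper names. It uses the 1-bit mixing matrix for mutation rate 1 and no crossover (equation (1) with $\mu = [0, 1]$, $\chi = [1, 0]$), so it reproduces the period-2 example from the introduction.

```python
import numpy as np

def selection(f, x):
    """Proportional selection F(x) = Fx / 1^T F x with F = diag(f)."""
    Fx = f * x
    return Fx / Fx.sum()

def mixing(M, x):
    """Equation (2): entry i is sum_{u,v} x_u x_v M[u XOR i, v XOR i]."""
    idx = np.arange(len(x))
    return np.array([x @ M[np.ix_(idx ^ i, idx ^ i)] @ x for i in range(len(x))])

def G(M, f, x):
    """One generation of the infinite-population SGA: G = M o F."""
    return mixing(M, selection(f, x))

# 1-bit mixing matrix for mutation rate 1 and no crossover.
M_flip = np.array([[0.0, 0.5],
                   [0.5, 1.0]])
f = np.array([3.0, 1.0])
print(G(M_flip, f, np.array([1.0, 0.0])))   # the all-0 population becomes the all-1 population
```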

The Vose infinite population model described above has a close relationship to earlier population genetics models. For example, Geiringer (1944) described a multilocus model based on recombination (crossover) only. She introduced the concept of a "linkage distribution", which is essentially another name for a probability distribution over crossover masks. Some more recent papers that describe related work in population genetics include Karlin and Liberman (1978), (1979), and (1990). These models are diploid models that do not include mutation, whereas the Vose model is haploid and does include mutation.

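2.2 The differential of $\mathcal{G}$

In order to determine the stability of fixed points of $\mathcal{G}$, we need to be able to compute the differential of $\mathcal{G}$ at fixed points. The material in this section is derived from Vose (1996). Let $P_x$ denote the projection

$$P_x = I - \frac{x \mathbf{1}^T}{\mathbf{1}^T x}.$$

Lemma 2.1

$$d\mathcal{G}_x = \frac{1}{\mathbf{1}^T F x}\, d\mathcal{M}_{Fx / \mathbf{1}^T F x}\, P_{Fx}\, F$$

Proof. Consider the function $h(x) = x / \mathbf{1}^T x$. Its differential is

$$dh_x = \frac{I\, \mathbf{1}^T x - x \mathbf{1}^T}{(\mathbf{1}^T x)^2} = \frac{1}{\mathbf{1}^T x} P_x.$$

Since $\mathcal{G}(x) = \mathcal{M}(h(Fx))$, the chain rule gives

$$d\mathcal{G}_x = d\mathcal{M}_{h(Fx)}\, dh_{Fx}\, F = d\mathcal{M}_{h(Fx)}\, \frac{I\, \mathbf{1}^T F x - F x \mathbf{1}^T}{(\mathbf{1}^T F x)^2}\, F = \frac{1}{\mathbf{1}^T F x}\, d\mathcal{M}_{h(Fx)}\, P_{Fx}\, F.$$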
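Lemma 2.1, together with the entry formula for $d\mathcal{M}_x$, can be sanity-checked against finite differences. The following NumPy sketch is a hedged check rather than part of the paper's method: it uses an arbitrary random symmetric matrix in place of a true mixing matrix (both derivations only use the symmetry of $M$) and compares the lemma's formula with central differences of $\mathcal{G}$, evaluated slightly off the simplex. All function names and test data are hypothetical.

```python
import numpy as np

def mixing(M, x):
    idx = np.arange(len(x))
    return np.array([x @ M[np.ix_(idx ^ i, idx ^ i)] @ x for i in range(len(x))])

def d_mixing(M, x):
    """(dM_x)[i, j] = 2 * sum_u x_u M[i XOR j, u XOR i]; assumes M is symmetric."""
    n = len(x)
    idx = np.arange(n)
    return np.array([[2 * x @ M[i ^ j, idx ^ i] for j in range(n)] for i in range(n)])

def G(M, f, x):
    Fx = f * x
    return mixing(M, Fx / Fx.sum())

def dG(M, f, x):
    """Lemma 2.1: dG_x = (1 / 1^T F x) dM_{Fx / 1^T F x} P_{Fx} F."""
    n = len(x)
    Fx = f * x
    s = Fx.sum()
    P = np.eye(n) - np.outer(Fx, np.ones(n)) / s   # projection P_{Fx}
    return d_mixing(M, Fx / s) @ P @ np.diag(f) / s

rng = np.random.default_rng(0)
n = 8
A = rng.uniform(size=(n, n))
M = (A + A.T) / 2                                  # arbitrary symmetric test matrix
f = rng.uniform(0.5, 2.0, size=n)
x = rng.dirichlet(np.ones(n))
h = 1e-6
numeric = np.column_stack([(G(M, f, x + h * e) - G(M, f, x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.max(np.abs(numeric - dG(M, f, x))))       # should be tiny (finite-difference error only)
```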

3 Constructing Fixed Points

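Let $y$ be a fixed point of $\mathcal{G}$. Then we have:

$$\mathcal{G}(y) = \mathcal{M}\left(\frac{Fy}{\mathbf{1}^T F y}\right) = y$$

Let $x = Fy / (\mathbf{1}^T F y)^2$. Then $\mathbf{1}^T x = \mathbf{1}^T F y / (\mathbf{1}^T F y)^2 = 1 / \mathbf{1}^T F y$ and, since $\mathcal{M}$ is homogeneous of degree 2,

$$\mathcal{M}(x) = \mathcal{M}\left(\frac{Fy}{(\mathbf{1}^T F y)^2}\right) = \frac{1}{(\mathbf{1}^T F y)^2}\, \mathcal{M}\left(\frac{Fy}{\mathbf{1}^T F y}\right) = \frac{y}{(\mathbf{1}^T F y)^2} = F^{-1} \frac{Fy}{(\mathbf{1}^T F y)^2} = F^{-1} x \qquad (3)$$

Let $X$ be the $n \times n$ diagonal matrix $X_{i,j} = \delta_{i,j} x_i$, and let $g$ be the $n$-vector $g_i = 1/f_i = 1/f(i)$. Then this equation can be rewritten as

$$\mathcal{M}(x) = F^{-1} x = X g.$$

Our method for generating random fixed points is to start with a random point $x$ in the simplex. (The detailed methodology for choosing $x$ is given in Section 4.1.) Given $x$, we can compute $g$ by $g = X^{-1} \mathcal{M}(x)$. The fitnesses are easily computed by $f(i) = g_i^{-1}$. From $x$, we also compute the corresponding fixed point $y$ by $y = F^{-1} x / (\mathbf{1}^T x)^2$.

Finally, we note that if a point $y$ of the simplex is to be a fixed point of $\mathcal{G}$, then $x = Fy / (\mathbf{1}^T F y)^2$ must satisfy equation (3), so the fitness $f$ as computed above must be the unique fitness that will make $y$ a fixed point.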
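The construction of this section can be sketched in a few lines. The following NumPy code is a minimal sketch assuming a small string length ($\ell = 3$) so that equation (1) can be summed directly, and using the first random-vector method of Section 4.1 for $\mu$, $\chi$, and $x$; all helper names are hypothetical. It computes $g = X^{-1}\mathcal{M}(x)$, $f(i) = g_i^{-1}$, and $y = F^{-1}x/(\mathbf{1}^T x)^2$, then checks numerically that $\mathcal{G}(y) = y$.

```python
import numpy as np

def mixing_matrix(mu, chi):
    """Mixing matrix of equation (1) by direct O(n^5) summation (small n only)."""
    n = len(mu)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for u in range(n):
                for v in range(n):
                    for k in range(n):
                        kbar = (n - 1) ^ k  # one's complement of mask k
                        if ((i ^ u) & k) ^ ((j ^ v) & kbar) == 0:
                            M[i, j] += mu[u] * mu[v] * (chi[k] + chi[kbar]) / 2
    return M

def mixing(M, x):
    """Equation (2): entry i is sum_{u,v} x_u x_v M[u XOR i, v XOR i]."""
    idx = np.arange(len(x))
    return np.array([x @ M[np.ix_(idx ^ i, idx ^ i)] @ x for i in range(len(x))])

def G(M, f, y):
    """One generation: proportional selection followed by mixing."""
    Fy = f * y
    return mixing(M, Fy / Fy.sum())

def random_simplex_point(n, rng):
    """First method of Section 4.1: uniform [0, 1] components, then normalize."""
    v = rng.uniform(0.0, 1.0, size=n)
    return v / v.sum()

def fitness_and_fixed_point(M, x):
    """Section 3: from x in the simplex, return the fitness f and fixed point y."""
    g = mixing(M, x) / x           # g = X^{-1} M(x)
    f = 1.0 / g                    # f(i) = 1 / g_i
    y = x / (f * x.sum() ** 2)     # y = F^{-1} x / (1^T x)^2
    return f, y

rng = np.random.default_rng(1)
n = 8                              # string length 3
M = mixing_matrix(random_simplex_point(n, rng), random_simplex_point(n, rng))
x = random_simplex_point(n, rng)
f, y = fitness_and_fixed_point(M, x)
print(np.max(np.abs(G(M, f, y) - y)))  # expected to be at rounding-error level
```

Because $x$ lies in the simplex, the computed fitness satisfies $Fy = x$ and $\mathbf{1}^T F y = 1$, so one generation maps $y$ to $\mathcal{M}(x) = y$.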

4 Searching for Cyclic Behavior

4.1 Methodology

Hastings (1981) demonstrated stable cyclic behavior in a discrete-time, constant-fitness, two-locus, two-allele, diploid, infinite-population model. This model includes crossover but not mutation. His methodology was to do a numerical search for a stable Hopf bifurcation. In other words, he searched over populations and crossover rates for a point where the differential of the transition map had a real eigenvalue between $-1$ and 1 and a pair of complex eigenvalues with modulus close to 1. For each population and crossover rate, he generated a corresponding fitness. When he found a stable Hopf bifurcation, he parameterized the system over the crossover rate, and then iterated the system to look for stable cycling behavior. He found several examples, some of which are described in detail in Hastings (1981).

Our procedure for looking for a counterexample to Conjecture 1 followed a methodology similar to that of Hastings. For each trial, we generated random mutation and crossover distributions and a random population vector $x$. We computed the mixing matrix $M$ that corresponds to the mutation and crossover distributions, and followed the procedure given in Section 3 to compute the fitness and fixed point $y$ corresponding to $M$ and $x$. We found the eigenvalues of $d\mathcal{G}_y$ using the QR algorithm as implemented following Chapter 11 of Press et al. (1992). If all real eigenvalues had absolute value less than 1 and some complex eigenvalue had modulus greater than 1, we iterated the system 30 times starting from perturbations of the fixed point $y$. We detected three kinds of outcomes from the iteration: convergence to a fixed point, convergence to a cycle of period 2, and non-convergence. In cases of non-convergence, we then looked in more detail for interesting behavior.

The intuitive motivation for this procedure is as follows. We are searching for an unstable fixed point where the behavior near the fixed point is cyclic and repelling in 2 (or possibly more) dimensions and contracting in the remaining dimensions. The behavior would be cyclic and repelling in the eigenspace of the complex eigenvalues with modulus greater than 1, and contracting in the eigenspace of the remaining eigenvalues. The hope is that trajectories starting near this unstable fixed point will converge to a stable periodic cycle.

We used two procedures for generating random mutation/crossover distributions and random population vectors. The first method was to generate the $n$ components of the vector randomly from a uniform distribution over $[0, 1]$, and then normalize the vector so that the sum of the components was 1. For the second method, we first randomly chose between 1 and $n - 2$ components of the vector to be zero, and then chose the remaining components as with the first method. When we applied this method to choosing the population vector $x$, the "zero" components were set to $10^{-8}$, which avoided overflow problems when finding eigenvalues.
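The random-vector generation and the eigenvalue screen described above might look like the following NumPy sketch. It is an illustration under stated assumptions: the helper names are hypothetical, the tolerance used to decide that an eigenvalue is real is an assumption, and the $10^{-8}$ "zero" components are inserted before normalization, which the text does not specify. In place of the Numerical Recipes QR routine, `numpy.linalg.eigvals` could supply the eigenvalues of $d\mathcal{G}_y$.

```python
import numpy as np

def random_distribution(n, rng, sparse=False, zero_value=0.0):
    """Random probability vector following the two methods described above.

    sparse=False: components uniform on [0, 1], then normalized.
    sparse=True:  first force between 1 and n - 2 randomly chosen components to
                  zero_value (assumption: this happens before normalization).
    """
    v = rng.uniform(0.0, 1.0, size=n)
    if sparse:
        m = rng.integers(1, n - 1)                  # number of "zero" components, 1..n-2
        v[rng.choice(n, size=m, replace=False)] = zero_value
    return v / v.sum()

def looks_like_hopf_candidate(eigs, tol=1e-12):
    """Screen: every real eigenvalue has |lambda| < 1 and some complex
    eigenvalue has modulus greater than 1 (tol is an assumed threshold)."""
    eigs = np.asarray(eigs)
    is_real = np.abs(eigs.imag) < tol
    return bool(np.all(np.abs(eigs[is_real]) < 1) and np.any(np.abs(eigs[~is_real]) > 1))

rng = np.random.default_rng(2)
mu = random_distribution(8, rng, sparse=True)                   # mutation distribution with zeros
x = random_distribution(8, rng, sparse=True, zero_value=1e-8)   # population vector
print(looks_like_hopf_candidate(np.array([0.4, -0.7, 1.05 + 0.3j, 1.05 - 0.3j])))  # True
```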

4.2 Results

We found examples of both approximate period 2 cycling and longer-period cycling. All examples of cycling were for genetic algorithms with 3 or more bits, and were generated using mutation distributions that had many zero components. To guard against the possibility that these examples were artifacts of finite-precision floating-point arithmetic, they were run in Maple using 200-decimal-digit arithmetic. It should be noted that all quantities involved in the computation are positive, so there should be no possibility of "catastrophic cancellation". To guard against the possibility of small divergent components in the results, the examples were run for 100,000 generations. After a convergence period of a few thousand generations, the graphs look the same at the beginning of the simulation as at the end. Of course, floating-point computations cannot mathematically prove the existence of cycling.

We give two specific examples. The first is a 3-bit example that exhibits both approximate period 2 cycling and longer-period cycling. Figure 1 shows the long-period cycling by plotting the odd-numbered generations from generation 1 to generation 1500. The long period is approximately 800 generations. A plot of the even-numbered generations would look essentially the same, except shifted by half of the long period. Figure 2 shows the odd-numbered generations from 98,500 to 100,000. This example was generated using the following parameters:

Mutation distribution: [0, 0, 0, 0, 0, 0.92431012295, 0.07568987705, 0]
Crossover distribution: [0, 0, 0.00045333862, 0.18649248799, 0.402363733, 0, 0, 0.403053606]
Fitness: [1.14095516458, 1.54743922498, 0.21910897310, 0.11026697012, 0.60205233386, 1.11500442255, 0.43395246731, 3.27024330619]
Fixed point: [0.20829345545, 0.15689801943, 0.09281778305, 0.03711709950, 0.22026606341, 0.19491636472, 0.05222667592, 0.03746453852]

We did not find any non-convergent 3-bit examples whose mutation distribution had a nonzero first component.

The second example, shown in Figure 3, is a 4-bit example that exhibits long-period cycling without period 2 cycling. Figure 3 shows every generation from generation 99,500 to generation 100,000. The period is approximately 200 generations. This example was generated using the following parameters:

Mutation distribution: [0.163082130, 0, 0, 0, 0.56124617479, 0, 0.27574300391, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Crossover distribution: [0.07413297401, 0, 0, 0, 0.06496689448, 0.17438274378, 0, 0, 0.12760739880, 0, 0.27383219607, 0.28507779287, 0, 0, 0, 0]
Fitness: [0.26931619062, 0.00000011160, 0.82567205904, 0.00000013171, 3.73619740388, 5.41096045807, 0.00000044376, 5.94155773254, 0.00000007687, 0.06241539118, 0.00000008612, 0.00000012359, 3.75146639703, 6.96087252897, 5.18366848032, 0.00000060560]
Fixed point: [0.08262271198, 0.08960499858, 0.06337908163, 0.07592443302, 0.02850140995, 0.02188019848, 0.02253466654, 0.01719816329, 0.13009475647, 0.14208881473, 0.11611842899, 0.08091208872, 0.04073704669, 0.03601416632, 0.03587653907, 0.01651249549]

Unlike the 3-bit example, this example has a nonzero first component of the mutation distribution.

5 Stable Fixed Points with No Mutation

We followed almost the same methodology as in the previous section. The mutation distribution was always $[1, 0, 0, \ldots, 0]$. We generated the crossover distribution, the random population vector $x$, and the fitness in the same way. The fixed point generated in this way always had more than one nonzero component, and so was not at a vertex of the simplex. We looked for cases where all eigenvalues had modulus less than or equal to 1 and the components of the fitness vector were not all equal. In other words, this was a direct search for counterexamples, rather than the more indirect search of the previous section. We ran 4,000,000 trials with each of 3, 4, and 5 bits. We did not find any counterexamples to Conjecture 2.
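With the mutation distribution fixed at $[1, 0, \ldots, 0]$, every vertex $e_k$ (a population of a single string) is a fixed point of $\mathcal{G}$, which is why the search targets fixed points away from the vertices. The following NumPy sketch is a hedged illustration: it checks the vertex property numerically for random crossover and fitness vectors and encodes the acceptance test described above. The helper names and the floating-point tolerance are assumptions.

```python
import numpy as np

def mixing_matrix(mu, chi):
    """Equation (1) by direct summation (small n only)."""
    n = len(mu)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for u in range(n):
                for v in range(n):
                    for k in range(n):
                        kbar = (n - 1) ^ k
                        if ((i ^ u) & k) ^ ((j ^ v) & kbar) == 0:
                            M[i, j] += mu[u] * mu[v] * (chi[k] + chi[kbar]) / 2
    return M

def G(M, f, x):
    """G = M o F: proportional selection followed by mixing (equation (2))."""
    z = f * x / np.dot(f, x)
    idx = np.arange(len(z))
    return np.array([z @ M[np.ix_(idx ^ i, idx ^ i)] @ z for i in range(len(z))])

def passes_search_test(eigs, f, tol=1e-9):
    """Acceptance test: all eigenvalue moduli <= 1 and fitnesses not all equal."""
    return bool(np.all(np.abs(eigs) <= 1 + tol) and np.max(f) - np.min(f) > tol)

rng = np.random.default_rng(3)
n = 8                                    # 3-bit strings
mu = np.zeros(n)
mu[0] = 1.0                              # zero mutation: mu = [1, 0, ..., 0]
chi = rng.uniform(size=n)
chi /= chi.sum()
f = rng.uniform(0.5, 2.0, size=n)
M = mixing_matrix(mu, chi)
for k in range(n):                       # every vertex e_k is a fixed point of G
    e_k = np.eye(n)[k]
    assert np.allclose(G(M, f, e_k), e_k)
```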

We did find a number of "almost" counterexamples: fixed points in the interior of the simplex with real eigenvalues whose absolute values were equal to 1. In these examples, all nonzero components of the fixed point vector corresponded to components of the fitness vector with value 1, which was the maximum fitness value. These fixed points correspond to populations consisting entirely of individuals of equal and maximum fitness. These fixed points are "nonhyperbolic"; in other words, the linear approximation determined by the differential does not determine their stability. We would guess that there is a space (such as a curve or surface) of such fixed points that connects the vertices of the simplex corresponding to the strings of maximum fitness.

6 Conclusion

We searched for counterexamples to Conjectures 1 and 2. We found empirical counterexamples to Conjecture 1, namely a number of instances of stable cycling of both period 2 and longer periods for the infinite population simple genetic algorithm with mutation. These examples have mutation distributions that do not correspond to the way that mutation is normally used in genetic algorithms. We did not find any counterexamples to Conjecture 2. Thus, our results give empirical evidence that both conjectures are true for models that correspond to genetic algorithms as they are normally used in practice.

A finite population GA that used the same parameters (crossover and mutation distributions and fitness function) as the infinite population cycling examples should also exhibit cycling behavior if the population size is sufficiently large. It is unclear whether this could happen for population sizes as they are currently used in practice. It may well be that the "noise" due to the finite population size would quickly move the system away from the stable cyclic attractor of the infinite population system. This work suggests that a rigorous proof of cycling behavior might be possible.

Acknowledgments

This research was partially supported by a grant from Montana's NSF EPSCoR Program and a University of Montana grant.

References

Akin, E. (1982). "Cycling in Simple Genetic Systems", J. Math. Biology, 13, 305-324.

Geiringer, H. (1944). "On the probability theory of linkage in Mendelian heredity", Annals of Mathematical Statistics, 15, 25-57.

Guckenheimer, J. & Holmes, P. (1983). Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, Springer Verlag.

Hastings, A. (1981). "Stable cycling in discrete-time genetic models", Proc. Nat. Acad. Sci. USA, 78, 7224-7225.

Karlin, S. & Liberman, U. (1978). "Classifications and comparisons of multilocus recombination distributions", Proc. Nat. Acad. Sci. USA, 75, 6332-6336.

Karlin, S. & Liberman, U. (1979). "Central equilibria in multilocus systems I. Generalized nonepistatic selection regimes", Genetics, 91, 777-798.

Karlin, S. & Liberman, U. (1990). "Global convergence properties in multilocus viability selection models: the additive model and the Hardy-Weinberg law", Mathematical Biology, 29, 161-176.

Mahfoud, S. (1995). "Niching Methods for Genetic Algorithms", IlliGAL Technical Report No. 95001, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801.

Nagylaki, T. (1992). Introduction to Theoretical Population Genetics, Springer Verlag.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical Recipes in C: The Art of Scientific Computing, second edition, Cambridge University Press.

Vose, M. D. (1990). "Formalizing Genetic Algorithms", Proc. IEEE Workshop on G.A.s, N.N.s, & S.A. applied to problems in Signal & Image Processing, May 1990, Glasgow, U.K.

Vose, M. D. (1995). "Modeling Simple Genetic Algorithms", Evolutionary Computation, 3, 453-472.

Vose, M. D. (1996). The Simple Genetic Algorithm: Foundations and Theory. MIT Press (to appear).

Vose, M. D. & Liepins, G. E. (1991). "Punctuated Equilibria in Genetic Search", Complex Systems, 5, 31-44.

Vose, M. D. & Wright, A. H. (1994). "Simple Genetic Algorithms with Linear Fitness", Evolutionary Computation, 2(4), 347-368.

Wright, A. H. & Vose, M. D. (1995a). "Finiteness of the Fixed Point Set for the Simple Genetic Algorithm", Evolutionary Computation, 3(3).

Vose, M. D. & Wright, A. H. (1995b). "Stability of Vertex Fixed Points and Applications", Foundations of Genetic Algorithms 3, edited by L. D. Whitley and M. D. Vose, Morgan Kaufmann Publishers, Inc., San Francisco, CA.
