Optimizing an Arbitrary Function is Hard for the Genetic Algorithm

William E. Hart

Richard K. Belew


Cognitive Computer Science Research Group, Computer Science and Engineering (C-014), University of California, San Diego, La Jolla, CA 92093

Abstract

The Genetic Algorithm (GA) is generally portrayed as a search procedure which can optimize pseudo-boolean functions based on a limited sample of the function's values. There have been many attempts to analyze the computational behavior of the GA. For the most part, these attempts have tacitly assumed that the algorithmic parameters of the GA (e.g. population size, choice of genetic operators, etc.) can be isolated from the characteristics of the class of functions being optimized. In the following, we demonstrate why this assumption is inappropriate. We consider the class, F, of all deterministic pseudo-boolean functions whose values range over the integers. We then consider the Genetic Algorithm as a combinatorial optimization problem over {0,1}^l and demonstrate that the computational problem it attempts to solve is NP-hard relative to this class of functions. Using standard performance measures, we also give evidence that the Genetic Algorithm will not be able to efficiently approximate this optimization problem. These results imply that there does not exist a fixed set of algorithmic parameters which enables the GA to optimize an arbitrary function in F. We conclude that theoretical and experimental analyses of the GA which do not specify the class of functions being optimized can make few claims regarding the efficiency of the genetic algorithm for an arbitrary fitness function. When analyzing the computational complexity of the Genetic Algorithm, classes (or distributions) of functions should be analyzed relative to the algorithmic parameters chosen for the GA.

In Proc. Fourth Intl. Conf. on Genetic Algorithms, R. K. Belew and L. B. Booker (eds.), Morgan Kaufmann, San Mateo, CA, 1991, pp. 190-195.


1 Introduction

The Genetic Algorithm [11, 7] is a method of stochastic optimization which has attracted significant attention in recent years. There have been many attempts to analyze the computational behavior of the GA, with Holland's schema theorem [11] central to much of this analysis. Using it, we can justify how and why certain bit patterns (schemata) will be propagated from one generation to the next. This can be used to analyze the effectiveness of different genetic operators (see for example [15]). Related analysis with Walsh functions has also proven very rewarding. Walsh functions can be used to analyze the effectiveness of genetic operators, as well as the difficulty of the function being optimized [5, 6].

While these approaches provide some understanding of the class of functions which the GA can efficiently optimize, they fall short of providing an analysis of the computational complexity of the GA. Holland's Building-Block Hypothesis suggests that the GA will do well on functions in which low order schemata correctly predict the values of high order schemata. Conversely, the work on deceptivity uses Walsh functions to measure how hard a function might be for the GA. Unfortunately, neither of these analyses has been able to specify exactly what class of functions the GA does efficiently optimize.

Any discussion of the computational complexity of the GA must be relative to a specific class of functions. The assumptions that can be made about this class of functions are often critical to establishing interesting complexity bounds. To illustrate the importance of selecting an appropriate class of functions, we analyze the computational complexity of the GA relative to a very broad class of functions. We consider F, the class of all deterministic pseudo-boolean functions, i.e., functions f such that f : {0,1}^l → R. To allow for our complexity analysis, we restrict F to functions assuming integer values. We present a formalism which theoretically describes the optimization problem which the GA is solving for functions in F. We then demonstrate that unless NP = RP, there is no version of the GA which allows it to efficiently optimize an arbitrary function from this class. At this point, one could argue that while the GA is not able to solve the problem exactly, it might offer a reasonable approximation which performs acceptably well. However, we consider several standard performance guarantees and show that unless NP = RP, there can exist no version of the GA which can satisfy any of them. We conclude by noting that the GA must be analyzed relative to much smaller classes of functions.

Before proceeding, we introduce some notation and definitions. We let B = {0,1} and let B^l refer to the binary strings of length l. We assume that the reader is familiar with formal language theory; we generally follow the notational conventions of [12]. In particular, expressions surrounded by angle brackets ⟨ ⟩ refer to encodings used when defining formal languages; for an integer k, ⟨k⟩ refers to its encoding. We remind the reader that P refers to the class of formal languages which can be recognized by a deterministic Turing machine (TM) in polynomial time. Additionally, both NP and RP refer to classes of formal languages which can be recognized by nondeterministic Turing machines in polynomial time. The distinction between the two is that for languages in NP there must exist at least one path of computation (sequence of machine states) which leads to acceptance, whereas for languages in RP at least half of all computation paths must lead to accepting states. It is known that P ⊆ NP, RP ⊆ NP, and P ∩ RP ≠ ∅, and it is widely believed that the two inclusions are proper.¹ We consider an algorithm to be efficient if it completes its computation in polynomial time. In other words, a TM M is efficient if the language it accepts is in P.

A formal definition of the Genetic Algorithm is not necessary for the following presentation. We refer the interested reader to [7] for an excellent presentation of the GA and its applications.

¹ The reader is referred to [3] for an excellent discussion of the complexity differences between P and NP, and to [4] for an exposition of probabilistic computation.
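To make the class F concrete, the following sketch (our own toy example, not one drawn from the paper) shows a deterministic pseudo-boolean function f : {0,1}^l → Z whose value an ordinary program, standing in for a Turing machine, computes in time polynomial in l:

    # Illustrative only: a toy member of the class F described above, i.e. a
    # deterministic pseudo-boolean function f : {0,1}^l -> Z computable in
    # time polynomial in l.  The weights, the prefix bonus, and the function
    # name are our own invention, not taken from the paper.
    def toy_fitness(x: str) -> int:
        assert set(x) <= {"0", "1"}
        weights = [(i % 3) + 1 for i in range(len(x))]         # O(l) work
        value = sum(w for w, bit in zip(weights, x) if bit == "1")
        if x.startswith("101"):                                # a small nonlinearity
            value += 5
        return value

    print(toy_fitness("10110"))    # evaluates f at one point of B^l; prints 10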

2 Statement of the Problem

To analyze the computational complexity of the Genetic Algorithm, we need to formalize the task which it accomplishes. Intuitively, the Genetic Algorithm takes a fixed fitness function (usually chosen by the experimenter) and searches for a population which is most fit with respect to this function. There are three important elements to this rough description:

- the definition of the class of functions from which the fitness function might be chosen,
- the definition of the search space of populations which is being searched, and
- the definition of fitness for elements (populations) in that search space.

We consider each of these separately before formalizing the problem which the Genetic Algorithm attempts to solve.

As we noted before, we will consider F, the class of deterministic pseudo-boolean functions which assume integer values. We further restrict F to functions which are computable in polynomial time. It would be unreasonable to expect the GA to efficiently find optimal values for functions which could not compute those values efficiently themselves.

There are a number of ways in which we can characterize the GA's search space. Perhaps the most natural of these is the set of populations P of binary strings of length l, with |P| = k. This is a reasonable search space since both the population size and the string length are typically fixed for a GA simulation. If, however, the fitness of the population is not dependent on the size of the population itself, the set of binary strings of length l may suffice.


The meaning of a `most fit' population within this search space is somewhat ambiguous. Simulations of the GA are typically halted at the researcher's discretion and not at some well defined convergence criterion. Often several measures of the population's fitness are provided. Perhaps the two most common of these are (1) the value of the maximally fit individual in the population and (2) the average value of all individuals in the population. In what follows, we define the fitness of a population to be the fitness of the maximally fit individual in that population (relative to a fixed fitness function).

We use a search space consisting of binary strings of length l. This choice of a search space is appropriate since we have chosen a fitness criterion which is not dependent on the size of the population. Furthermore, it is worth noting that our choice of a search space does not preclude the generality of our results. An analysis using the first search space is virtually identical to ours. We choose the alternative search space for ease of presentation alone, since the population size would become an extraneous parameter in our presentation.

Having defined the search space for the Genetic Algorithm as well as the fitness criterion it uses for a population, we can now formalize the problem which the Genetic Algorithm attempts to solve as a combinatorial optimization problem, DGA-MAX (following the format of [13]).


Definition 1 DGA-MAX

The Genetic Algorithm combinatorial maximization problem which (1) uses a deterministic fitness function f and (2) assigns the fitness of the maximally fit individual in a population as the fitness of the population itself. An instance of DGA-MAX consists of the following two parts:

1) an integer l defining the combinatorial space B^l;
2) an encoding of a TM M_f which defines a function f : B^l → Z.
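A minimal sketch of how an instance of Definition 1 and its population fitness might be represented in code; the data-structure names are ours, and a Python callable stands in for the encoded machine M_f:

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class DGAMaxInstance:
        l: int                       # defines the combinatorial space B^l
        f: Callable[[str], int]      # stands in for the encoded TM M_f

    def population_fitness(inst: DGAMaxInstance, population: List[str]) -> int:
        # Fitness of a population = fitness of its maximally fit individual.
        assert all(len(x) == inst.l for x in population)
        return max(inst.f(x) for x in population)

    inst = DGAMaxInstance(l=4, f=lambda x: x.count("1"))          # toy fitness
    print(population_fitness(inst, ["0000", "0110", "1011"]))     # prints 3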

3 Complexity Results

In order to determine the complexity of DGA-MAX, we need to define a version of this problem as a formal language (using the format of [3]).

Definition 2 DGA-MAX
INSTANCE: A string encoding an integer l, an integer k, and a Turing machine M_f which computes a function f : B^l → Z in polynomial time.
QUESTION: Does there exist an x ∈ B^l such that f(x) > k?

Note that the optimization version of DGA-MAX is more powerful than the formal language version of DGA-MAX. Given a TM which solves the optimization version, we can clearly solve the formal language version. However, it is unknown whether the opposite is true (see [13] for further details). Thus, the optimization version is at least as difficult as the formal language version of DGA-MAX.

We now demonstrate that the formal language version of DGA-MAX is very difficult to solve. To do this, we use the following definition (recall that SAT is NP-complete).

Definition 3 SATISFIABILITY (SAT)
INSTANCE: A set U of variables and a collection C of clauses over U.
QUESTION: Is there a satisfying truth assignment?

Proposition 1 DGA-MAX is NP-complete.

Proof: First observe that DGA-MAX is in NP. We can construct a NTM M which does the following on input w = ⟨l, k, M_f⟩: M randomly guesses a string x, evaluates it with the function f, and compares the result to k. Since f is computable in time polynomial in l, M runs in time polynomial in |w|.

Next we reduce SAT to DGA-MAX. We use a TM M' which, on input w, outputs the encoding ⟨l, 0, M_f⟩. Here, l is the number of variables in U. On input w' ∈ B^l, the Turing machine M_f assigns the value of the ith bit of w' to the ith variable in U. It then evaluates the boolean expression φ constructed from the conjunction of the clauses in C. This reduction only requires memory to store l. Clearly this requires no more than O(log |w|), so M' is a log-space reduction.

Now if M'(w) ∈ DGA-MAX, then there exists a w' such that f(w') = 1. But this implies that w' encodes a satisfying truth assignment for φ, so w ∈ SAT. Conversely, if w ∈ SAT then there is an assignment to the l variables which satisfies φ, so there exists a w' ∈ B^l such that f(w') = 1. But then M'(w) ∈ DGA-MAX, since B^l has an element (namely w') whose value exceeds the threshold of zero. We conclude that w ∈ SAT if and only if M'(w) ∈ DGA-MAX.

If P ≠ NP, as is widely suspected, this result implies that there does not exist an efficient Turing machine which recognizes DGA-MAX.

Corollary 1 The optimization version of DGA-MAX is NP-hard.

Proof: As we noted above, the optimization version of DGA-MAX is at least as difficult as the formal language version. Since the formal language version is NP-complete, the optimization version must be NP-hard.

This last result indicates that there probably does not exist an efficient algorithm to solve the optimization version of DGA-MAX. However, this result only applies to deterministic algorithms. Since the Genetic Algorithm is nondeterministic, it could be the case that its nondeterminism allows it to efficiently solve either of the versions of DGA-MAX. For example, it is known that there are languages which can be decided more efficiently by probabilistic Turing machines than by deterministic Turing machines [4]. The following corollary demonstrates that even though Genetic Algorithms are stochastic, they still require super-polynomial time to solve DGA-MAX unless RP = NP.

Corollary 2 If RP ≠ NP, then DGA-MAX is not in RP.

Proof: This follows immediately, since the NP-hardness of DGA-MAX implies that DGA-MAX cannot be in RP unless RP = NP.

The remainder of the paper considers only the optimization version of DGA-MAX.
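The fitness function built by the reduction in the proof of Proposition 1 can be sketched as follows. This is a minimal sketch under our own CNF encoding (a clause is a list of nonzero integers, +i for variable i and -i for its negation): f(x) is 1 when the assignment encoded by x satisfies every clause of φ and 0 otherwise, so asking whether some x ∈ B^l has f(x) > 0 asks exactly whether φ is satisfiable.

    def make_reduction_fitness(clauses):
        # clauses: list of clauses; each clause is a list of nonzero ints,
        # +i meaning variable i and -i meaning its negation.
        def f(x: str) -> int:
            assign = {i + 1: bit == "1" for i, bit in enumerate(x)}
            satisfied = all(
                any(assign[abs(lit)] == (lit > 0) for lit in clause)
                for clause in clauses
            )
            return 1 if satisfied else 0
        return f

    # phi = (x1 or not x2) and (x2 or x3)
    f = make_reduction_fitness([[1, -2], [2, 3]])
    print(f("101"), f("010"))    # prints: 1 0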

4 Performance Guarantees

The previous section demonstrated that it is highly unlikely that there exists an efficient algorithm which solves DGA-MAX, whether it be deterministic or nondeterministic. Given this, it is important to consider what other performance guarantees can or cannot be made for DGA-MAX. It is often the case that even if a problem is NP-complete, one can still guarantee certain performance bounds which allow it to be effectively solved in practice. As an example, consider the Traveling Salesman Problem. Here the goal is to find a circuit (tour) through n cities at minimal cost. If the costs between the cities satisfy the triangle inequality (e.g. let the cost be the Euclidean distance between the cities) then the problem remains NP-complete. Even so, there exists an algorithm which can produce a tour through the cities whose cost is at most twice the value of the optimal tour [3].
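For concreteness, a factor-two guarantee of the kind mentioned above is achieved by the classical spanning-tree heuristic: build a minimum spanning tree of the cities and visit them in preorder. The sketch below is our own illustration and assumes dist is a symmetric matrix satisfying the triangle inequality:

    def tsp_two_approx(dist):
        # Returns a tour (a permutation of city indices) whose cost is at most
        # twice the optimum, provided dist obeys the triangle inequality.
        n = len(dist)
        in_tree = {0}
        children = {i: [] for i in range(n)}
        while len(in_tree) < n:                      # Prim's algorithm for the MST
            u, v = min(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                       key=lambda e: dist[e[0]][e[1]])
            in_tree.add(v)
            children[u].append(v)
        tour, stack = [], [0]
        while stack:                                 # preorder walk of the MST
            u = stack.pop()
            tour.append(u)
            stack.extend(reversed(children[u]))
        return tour

    # Four points on a grid under the Manhattan metric.
    dist = [[0, 2, 3, 5],
            [2, 0, 5, 3],
            [3, 5, 0, 2],
            [5, 3, 2, 0]]
    print(tsp_two_approx(dist))    # e.g. [0, 1, 2, 3]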

Before continuing, we introduce some notation (from [3]). Let Opt(I) refer to the optimal value for instance I, and let A(I) refer to the value that algorithm A returns for instance I (we assume that A is an efficient algorithm). We are considering a maximization problem, so Opt(I) ≥ A(I) for every algorithm A, and we assume that A(I) ≥ 0 for all algorithms and all instances. There are a number of performance guarantees defined in the literature. Among them are the following (from [3, 14]):

- relative error: |A(I) − Opt(I)| / Opt(I)
- absolute error: |A(I) − Opt(I)|
- convergence rates
- absolute and asymptotic performance ratios

In the following, we consider the absolute and asymptotic performance ratios to analyze the difficulty of DGA-MAX. We take the following definitions from [3]. Let R_A(I) = Opt(I) / A(I). We define

- the Absolute Performance Ratio R_A:
  R_A = inf { r ≥ 1 | R_A(I) ≤ r for all instances I of DGA-MAX };

- the Asymptotic Performance Ratio R_A^∞:
  R_A^∞ = inf { r ≥ 1 | there exists an N ∈ Z, N > 0, such that R_A(I) ≤ r for all instances I of DGA-MAX with Opt(I) ≥ N };

- the Best Achievable Asymptotic Performance Ratio R_MIN(DGA-MAX):
  R_MIN(DGA-MAX) = inf { r ≥ 1 | there exists a polynomial time algorithm A for DGA-MAX with R_A^∞ = r }.

R_A^∞ indicates whether we can bound R_A(I) for instances whose optima lie above some value N, while R_A indicates whether we can bound R_A(I) with N = 0. R_MIN(DGA-MAX) is the smallest value of R_A^∞ over all possible polynomial time algorithms A. It is this last performance ratio that we analyze. In the following, we show that R_MIN(DGA-MAX) = ∞, which implies that no deterministic polynomial time algorithm can guarantee that R_A(I) is less than some fixed r. This is true even if we consider only instances which have optima above fixed thresholds.
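As a toy illustration of the quantities just defined (entirely our own example, not from the paper), the ratio R_A(I) = Opt(I)/A(I) can be computed for a single small instance by comparing a naive greedy heuristic A against a brute-force optimum:

    from itertools import product

    def brute_force_opt(l, f):
        # Opt(I): exhaustive search over B^l, feasible only because l is tiny.
        return max(f("".join(bits)) for bits in product("01", repeat=l))

    def greedy(l, f):
        # A(I): a naive heuristic that flips each bit once if doing so helps.
        x = ["0"] * l
        for i in range(l):
            trial = x.copy()
            trial[i] = "1"
            if f("".join(trial)) > f("".join(x)):
                x = trial
        return f("".join(x))

    f = lambda x: (int(x, 2) * 5) % 11        # an arbitrary integer-valued fitness
    opt, approx = brute_force_opt(4, f), greedy(4, f)
    print(opt, approx, round(opt / approx, 2))   # prints: 10 7 1.43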

Proposition 2 If P ≠ NP, then R_MIN(DGA-MAX) = ∞.

Proof: We assume towards a contradiction that R_MIN(DGA-MAX) < ∞. Thus there exists a polynomial time algorithm A and a fixed K such that R_A ≤ K [3]. We now show that we can solve SAT. Recall that an instance of SAT consists of a set U of variables together with a set C of J clauses over these variables. Consider the TM M which does the following. On input w, M outputs a reduction of SAT to DGA-MAX on a spare tape. Then M simulates A using this tape as its input tape. The reduction used is very similar to that of Proposition 1. On input w, the reduction M' (1) counts the number of variables in U, stores this value (as l), and outputs it, and (2) outputs an encoding of a Turing machine M_f; on input w' ∈ B^l, M_f assigns the ith variable the value of the ith bit of w', counts the number of satisfied clauses of C, and, if all J clauses are satisfied, multiplies this count by K. If the value generated by A is at least J, M accepts; otherwise it rejects.

The reduction M' only requires memory to store l. This is clearly no more than O(log |w|), so M' runs in polynomial time. Since A is assumed to run in polynomial time, M also runs in polynomial time. Let I = M'(w). Now, if w is satisfiable then Opt(I) = J·K; otherwise Opt(I) ≤ J − 1 < J·K. But A is guaranteed to achieve R_A(I) = Opt(I)/A(I) ≤ K, so A(I) ≥ J implies Opt(I) ≥ J, which is only possible when Opt(I) = J·K, i.e., when w is satisfiable. Conversely, if w is satisfiable then Opt(I) = J·K, so A(I) ≥ Opt(I)/K = J and M accepts. Thus M accepts if and only if w is satisfiable.

It is important to note that R_A^∞ is one of the weakest performance guarantees discussed in the literature. Hence, it is significant that we have shown that even R_A^∞ cannot be satisfied by any polynomial time algorithm. Given this result, we can easily demonstrate that other, stronger performance results are not possible.

Corollary 3 If P ≠ NP, then no polynomial time algorithm A can guarantee that Opt(I) − A(I) ≤ ε for all I, for any constant ε ≥ 0.

Proof: We assume towards a contradiction that such an algorithm A exists. Then Opt(I)/A(I) ≤ ε/A(I) + 1. Since A(I) assumes only integer values, R_A(I) = Opt(I)/A(I) ≤ ε + 1 whenever A(I) > 0. Thus R_A^∞ ≤ ε + 1, which contradicts the fact that R_MIN(DGA-MAX) = ∞. We conclude that no such TM A exists.
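The gap-creating fitness function used against a K-approximate algorithm in the proof of Proposition 2 can be sketched as follows, under our reading of the construction (the CNF encoding and the names are ours): an assignment is worth the number of clauses it satisfies, promoted to J·K when all J clauses are satisfied. Opt(I) then equals J·K exactly when the formula is satisfiable, any algorithm with Opt(I)/A(I) ≤ K must return a value of at least J on satisfiable instances, and no assignment of an unsatisfiable instance is worth more than J − 1.

    def make_gap_fitness(clauses, K):
        # clauses: CNF as in the earlier sketch; K: the assumed approximation ratio.
        J = len(clauses)
        def f(x: str) -> int:
            assign = {i + 1: bit == "1" for i, bit in enumerate(x)}
            sat = sum(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses)
            return J * K if sat == J else sat
        return f

    f = make_gap_fitness([[1, -2], [2, 3]], K=10)
    print(f("101"), f("010"))    # prints: 20 1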


The previous two proofs considered deterministic performance guarantees. They were based on the assumption that P ≠ NP and are not directly applicable to an analysis of the Genetic Algorithm. Fortunately, the proofs for the probabilistic case are virtually identical to those above. The only change we need to make in the definitions of our performance guarantees is that A(I) is the value generated by A with probability greater than 1/2. The only difference in the proofs of the following results lies in the justification that the reductions created are in RP.

Proposition 3 If RP ≠ NP, then R_MIN(DGA-MAX) = ∞.

Corollary 4 If RP ≠ NP, then no probabilistic polynomial time algorithm A can guarantee that Opt(I) − A(I) ≤ ε for all I, for any constant ε ≥ 0.

5 Discussion

To analyze the computational complexity of the Genetic Algorithm, we have had to make a number of assumptions. While we have carefully cast the Genetic Algorithm as both an optimization problem and a formal language, doing so required that we restrict our analysis to the class of pseudo-boolean functions whose ranges lie in the integers. This is clearly undesirable, though for the moment it will have to suffice. Only recently have models been proposed in automata theory which can appropriately consider computation over the real numbers (see for example [1]). Another important assumption is that we consider only deterministic functions. There is no particular reason our analysis should exclude the probabilistic case, though we expect it would require different proof techniques from those presented here.

The key to the hardness of DGA-MAX is that we have allowed the Turing machine M_f to be chosen from a very large class of Turing machines. Because we choose M_f from such a broad class, we can make few assumptions regarding the nature of f. Thus, it is very difficult to efficiently optimize an arbitrary function from this class. In some sense, our analysis of DGA-MAX provides a worst case scenario for a complexity analysis of the GA. In fact, it would be surprising to find that the GA, or any other algorithm, could efficiently optimize over such a broad class of functions. The conclusion we should draw here is that our complexity analyses of the GA must incorporate the class of functions for which the complexity is analyzed. As we noted earlier, this element is missing from current computational analyses of the Genetic Algorithm.

Finally, we comment on the performance guarantees we have considered. Each of the performance guarantees analyzed above is a `worst case' guarantee: each bounds only the performance on the worst instance of the class. Other performance measures which consider `average case' performance may prove much more interesting. Empirically, the Genetic Algorithm seems to perform well on a broad range of functions. Thus we expect that average case analyses will prove a valuable technique for the GA.

6 Conclusion

We have considered the Genetic Algorithm as a combinatorial optimization problem and have demonstrated that the computational problem it attempts to solve is NP-hard relative to a very broad class of functions, F. Additionally, we have shown that a number of important performance guarantees cannot be satisfied relative to this class. In light of these negative results, it should not be surprising that there are classes of deceptive functions [5, 6] in F for which the Genetic Algorithm is a poor optimizer. In fact, there will be subclasses of F which are deceptive relative to any set of algorithmic parameters, simply because the underlying combinatorial space is so difficult to optimize over. For example, while genetic operators like bitwise reordering may help the GA to optimize functions which it would otherwise find deceptive [8], there will necessarily exist other functions which this modified GA will not be able to efficiently optimize.

These results suggest that future analyses of the Genetic Algorithm must pay close attention to the relationship between the algorithmic parameters of the GA and the function space from which the fitness function is selected. Since no fixed set of algorithmic parameters can enable the GA to efficiently optimize an arbitrary function in a broad class like the one we have considered, we must consider smaller classes or distributions of functions for which those parameters are most appropriate. Alternatively, if we have a particular function to optimize, we must carefully select our algorithmic parameters if we wish to optimize the function efficiently with the Genetic Algorithm.

We conclude by noting that the literature on pseudo-boolean optimization has already examined the computational complexity of a large number of pseudo-boolean function classes [10, 2, 9]. This literature should prove a valuable resource to GA researchers, since the computational complexity of many of these function classes has already been established.

Acknowledgements

This work was supported in part through INCOR Grant #486209-26961 through Los Alamos National Laboratory and the University of California. We would like to thank Christos Papadimitriou and Brian Bartell for their helpful discussions of these ideas. We would also like to thank our reviewers for their comments, particularly those pointing out the similarity between pseudo-boolean optimization and our presentation of DGA-MAX.

References

[1] Lenore Blum. Lectures on a theory of computation and complexity over the reals (or an arbitrary ring). In Erica Jen, editor, 1989 Lectures in Complex Systems, pages 1-48. Addison-Wesley, 1990.

[2] Yves Crama. Recognition problems for special classes of polynomials in 0-1 variables. Mathematical Programming, 44:139-155, 1989.

[3] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., 1979.

[4] John Gill. Computational complexity of probabilistic Turing machines. SIAM Journal on Computing, 6(4):675-695, 1977.

[5] D.E. Goldberg. Genetic algorithms and Walsh functions: Part I, a gentle introduction. Complex Systems, 3:129-152, 1989.

[6] D.E. Goldberg. Genetic algorithms and Walsh functions: Part II, deception and its analysis. Complex Systems, 3:153-171, 1989.

[7] D.E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Co., Inc., 1989.

[8] D.E. Goldberg and C.L. Bridges. An analysis of a reordering operator on a GA-hard problem. Biological Cybernetics, 62:397-405, 1990.

[9] P.L. Hammer, P. Hansen, and B. Simeone. Roof duality, complementation and persistency in quadratic 0-1 optimization. Mathematical Programming, 28:121-155, 1984.

[10] Pierre Hansen and Bruno Simeone. Unimodular functions. Discrete Applied Mathematics, 14:269-281, 1986.

[11] J.H. Holland. Adaptation in Natural and Artificial Systems. The University of Michigan Press, 1976.

[12] John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Pub. Co., 1979.

[13] Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice Hall, Inc., 1982.

[14] Alex Rinnooy Kan. Probabilistic analysis of algorithms. Annals of Discrete Mathematics, 31:365-384, 1987.

[15] Gilbert Syswerda. Uniform crossover in genetic algorithms. In Proceedings of the Third International Conference on Genetic Algorithms, pages 2-9, 1989.

