Single Parent Genetic Programming

Wendy Ashlock
Roseheart Biomaths
41 Lambert Crescent
Guelph, Ontario N1G 2R4, Canada
[email protected]

Dan Ashlock
Mathematics and Statistics
University of Guelph
Guelph, Ontario N1G 2W1, Canada
[email protected]

ABSTRACT

The most controversial part of genetic programming is its highly disruptive and potentially innovative subtree crossover operator. The clearest problem with the crossover operator is its potential to induce defensive metaselection for large parse trees, a process usually termed "bloat." Single parent genetic programming is a form of genetic programming in which bloat is reduced by doing subtree crossover with a fixed population of ancestor trees. Analysis of mean tree size growth demonstrates that this fixed and limited set of crossover partners provides implicit, automatic control on tree size in the evolving population, reducing the need for additionally disruptive trimming of large trees. The choice of ancestor trees can also incorporate expert knowledge into the genetic programming system. The system is tested on four problems: plus-one-recall-store (PORS), odd parity, plus-times-half (PTH), and a bioinformatic model fitting problem (NIPs). The effectiveness of the technique varies with the problem and choice of ancestor set. At the extremes, improvements in time to solution in excess of 4700-fold were observed for the PORS problem, while no significant improvement was observed for the PTH problem.

I. INTRODUCTION

This study presents a new technique for use with genetic programming. It was inspired by a science fiction novel [7] about an all-male society in which children had only a father, and the female contribution came from artificial womb technology and egg cultures donated when the society was founded. In single parent genetic programming, children have only one parent, and crossover is done with a fixed set of unchanging ancestors. This technique simultaneously limits parse tree growth in the evolving population (since the ancestors don't grow) and provides a means of embedding either expert knowledge or the results of previous evolutionary runs into an evolving population. Selection still causes the parse trees to grow, but the single parent technique slows that growth significantly. Since the ancestor set doesn't change, no information in it is ever lost, which can help keep the algorithm from getting stuck and can speed time to solution. Four test problems are treated in this study: plus-one-recall-store (PORS), odd parity, plus-times-half (PTH), and a bioinformatic model fitting problem (NIPs).

A. Subtree crossover and bloat

Genetic programming [14], [11], [12] is a type of evolutionary computation that uses a variable-sized representation, typically in the form of a parse tree, which represents a mathematical formula. The binary variation operator most commonly used in genetic programming is subtree crossover. During selection and reproduction, a pair of parent trees are picked by the selection algorithm and copied. A subtree of each parent is selected at random from the set of all its subtrees, and the two subtrees are exchanged in the copies. The number of possible outcomes is equal to the product of the numbers of nodes in the two trees. The sizes of the new trees vary from one node to the sum of the numbers of nodes in the starting trees minus one. The upper bound on the size of trees in a population grows exponentially, and in many experiments trees approach that upper bound [18]. In practice, this means that the average size of trees in a population grows sufficiently that the total size or depth of trees in a population must be controlled to keep the computer's memory from being overrun and to ensure that the program terminates in a reasonable amount of time [8]. Any size control measure is, itself, a new source of disruption in reproduction.

Bloat [6] is characterized by the growth of parse trees in a population to near whatever bound is placed on their size. Bloated trees typically have a large amount of material that does not contribute directly to their fitness. Bloat is a result of the shape of the underlying search space (more long solutions than short ones) and of the need to shield against destructive crossover [15]. A tree might, for example, have the functionally equivalent form: good solution + 0 * ineffective material.
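The subtree crossover described above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: trees are encoded as nested Python lists of our own devising, and for brevity the sketch never swaps at the root.

```python
import random

def subtree_slots(tree, slots=None):
    """Collect (parent, index) pairs addressing every non-root subtree."""
    if slots is None:
        slots = []
    if isinstance(tree, list):
        for i in range(1, len(tree)):   # children start at index 1; index 0 is the operation
            slots.append((tree, i))
            subtree_slots(tree[i], slots)
    return slots

def crossover(a, b, rng=random):
    """Exchange a randomly chosen subtree of a with one of b, in place."""
    pa, ia = rng.choice(subtree_slots(a))
    pb, ib = rng.choice(subtree_slots(b))
    pa[ia], pb[ib] = pb[ib], pa[ia]

# Trees encoded as [operation, child, child]; terminals are plain values.
a = ['+', ['*', 'x', 'x'], 1]
b = ['-', 'x', ['+', 1, 1]]
crossover(a, b)  # the total node count of a and b together is conserved
```

Note that the operator conserves total material: nodes only move between the two copies, which is why the maximum size after one mating event is bounded by the sum of the parents' sizes minus one, as stated above.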
Subtree crossover in the ineffective material does not change the solution's fitness. Bloat has at least two bad side effects. First, it ensures that computer memory, and the time spent managing it, is wasted to an extent close to the maximum possible within the bounds set by the programmer. Second, it is a means whereby the evolving population manages to reduce the power of subtree crossover as a search operator [16], [6]. The ineffective material causing

bloat may retain useful chunks of material that can be restored to function via crossover, and so bloat need not be all bad [17], but on average it hampers search. In addition, if the goal of a given exercise in genetic programming is to understand solutions to a problem, then bloat substantially complicates analysis.

B. Single parents and gentle size control

The presence of bloat indicates selection pressure for larger trees. In order to generate large trees, the subtrees participating in crossover must be large at least some of the time. In single parent genetic programming, every tree placed back in the population is the result of crossover with a tree of unchanging size. Thus, the size increase of the trees is placed under far tighter control than in a standard genetic programming environment. In Section II, we show that the upper bound on tree size changes from exponential to linear in this case.

In single parent genetic programming a standard genetic programming environment is modified as follows. A collection of trees called the ancestor set is chosen. This ancestor set may be selected from past evolutionary runs or may be designed by the user. The genetic programming environment proceeds as usual, except that any subtree crossover is between a member of the population and an ancestor. Selection of the population member is as usual, and the ancestor is chosen at random from the ancestor set. Crossover produces only one tree: the population member with one of its subtrees replaced by a subtree of the ancestor. This results in a variation operator which is not quite crossover and not quite mutation; it uses information from two individuals, but only one is a member of the population. Notice that any material available in the ancestors is available indefinitely. This means that the system is constructively unable to lose any operation, terminal, or "building block" present in the ancestor set.
Single parent genetic programming is reminiscent of Angeline and Pollack's module acquisition [1] but is substantially simpler.

II. THE THEORY OF SINGLE PARENT SIZE CONTROL

Suppose we are using a steady-state evolutionary algorithm to do genetic programming. Pairs of parents are selected, offspring generated, and then those offspring are placed back in the population before the next set of parents are selected. Each such selection and replacement is called a mating event.

Theorem 1: In standard genetic programming without any form of tree trimming, the maximum possible size of a tree is an exponential function of the number of mating events. In single parent genetic programming, this bound is linear.

Proof: Suppose we start with a population of trees in which the largest tree has n nodes. This tree can produce another tree the same size or larger in one mating event. As a result, we may assume two trees of size n are possible after one mating event. In the next generation, these two trees can produce a tree of size 2n − 1 via subtree crossover in which the root of one tree is exchanged with a leaf of the other. This means that a sequence f(t) of tree sizes can be generated that obeys the recursion

f(t + 2) = 2f(t) − 1.

Solving this recursion we obtain the formula

f(t) = A · (√2)^t + 1.

Thus, the maximum possible size of a tree after t mating events must be at least as large as this exponential function:

A · (√2)^t + 1 ≤ maximum tree size.

The greatest possible size increase in a single mating event comes from taking the largest two trees, of sizes n and m ≤ n, and crossing the root of one into a leaf of the other. Thus the maximum increase in size must be less than a doubling. This yields an upper bound of g(t) = 2^t on maximum tree size:

maximum tree size ≤ 2^t.

Therefore, the maximum possible tree size in standard genetic programming is an exponential function of the number of mating events t, lying between the functions f(t) and g(t):

A · (√2)^t + 1 ≤ maximum tree size ≤ 2^t.

In single parent genetic programming, trees are crossed over with ancestors. As a result, the maximum size increase comes from crossing the root of an ancestor into a leaf of the single parent. This size increase is one less than the size z of the largest ancestor. Tree size is thus bounded above by h(t) = n + t(z − 1), where n is the size of the largest tree present in the initial population:

maximum tree size ≤ n + t(z − 1).

Therefore, the growth of trees in single parent genetic programming is at most linear. □

III. EXPERIMENTS

Single parent genetic programming was compared to standard genetic programming on four problems: the plus-one-recall-store (PORS) problem [3], odd parity [14], plus-times-half (PTH), and a bioinformatic model-fitting problem [9]. To keep things simple and to concentrate on the impact of the single parent technique, standard parse trees were used without any other techniques known to be useful, like automatically defined functions (ADFs) [14].
Populations which failed to find solutions after a fixed number of mating events were said to “timeout,” and their results were not included in the averages. This means that results that include many timeouts are underestimates of the actual mean time to solution.
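The two growth regimes from Section II can be checked numerically. A minimal sketch (the function names are ours, not the paper's): the first function tabulates the worst-case sizes obeying the recursion f(t + 2) = 2f(t) − 1, and the second evaluates the linear single parent bound h(t) = n + t(z − 1).

```python
def worst_case_sizes(n, t):
    """Sizes obeying f(k+2) = 2*f(k) - 1 with f(0) = f(1) = n (standard GP lower bound)."""
    f = [n, n]
    for k in range(2, t + 1):
        f.append(2 * f[k - 2] - 1)
    return f

def single_parent_bound(n, z, t):
    """Linear bound n + t*(z - 1) on tree size after t mating events,
    where z is the size of the largest ancestor."""
    return n + t * (z - 1)

print(worst_case_sizes(10, 6))         # grows like (sqrt 2)^t: [10, 10, 19, 19, 37, 37, 73]
print(single_parent_bound(10, 10, 6))  # grows linearly: 64
```

Even for these small parameters the exponential sequence overtakes the linear bound within a handful of mating events.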

TABLE I
n    mean baseline   mean single parent   baseline timeouts   ratio
14   65,000          667                  0                   97.5
15   76,300          880                  0                   86.7
16   62,600          3330                 0                   18.8
17   1,420,000       1400                 38                  1010
18   8,758,000       1850                 350                 4730
19   241,000         4320                 1                   55.8
Ratios of mean time to solution for the baseline PORS runs and the single parent PORS runs for n = 14, 15, . . . , 19. For n ≥ 17 these are underestimates of the ratio because some baseline runs timeout at 10,000,000 mating events.

Fig. 1. 95% confidence intervals for time to solution for the baseline PORS problem for n = 14, 15, and 16 as well as the results of using single parent genetic programming with the ancestor set A={(+ (Sto (+ (Sto (+ (Sto (+ 1 1)) Rcl)) Rcl)) Rcl)} for n = 14, 15, . . . , 25. Baseline runs are labeled "Base N" while the single parent runs are denoted "A4 N".

Fig. 2. 95% confidence intervals for time to solution for the PORS problem using single parent genetic programming with the ancestor set A={(+ (Sto (+ (Sto (+ (Sto (+ 1 1)) Rcl)) Rcl)) Rcl)} for n = 14, 15, . . . , 25.

A. Plus-one-recall-store

To test the ability of single parent genetic programming to embed expert information in the population, we chose a simple test problem for which solutions are known and for which the relationships between solutions for one value of n and the solutions for other values of n are known. The plus-one-recall-store (PORS) problem is described in detail in [3]. It is a maximum problem with a small operation set and a calculator-style memory. The goal of the test problem, called the PORS efficient node use problem, is to find parse trees with a fixed maximum number of nodes that generate the largest integer result possible. The language has two operations: integer addition + and a store operation Sto that places its argument in an external memory location and returns the value it stores. It has two terminals: the integer 1 and recall Rcl from the external memory.

Standard genetic programming experiments were run for n = 14 to 19 nodes. (The hardest baseline case is n = 18; the easiest is n = 16.) Fitness was the value of the parse tree. The initial population was composed of randomly generated trees with exactly n nodes. A tree that evaluated to the largest possible number was considered successful (these numbers are computed in [3]). Crossover was performed by the usual subtree exchange [13]. If this produced a tree with more than n nodes, then a subtree of the root node iteratively replaced the tree until it had fewer than n nodes. This size control operation is called chopping; it was chosen to avoid the problem other size control methods have of limiting the effects of crossover to nodes far from the root [10]. Both the baseline and single parent versions used chopping. Mutation was performed by replacing a subtree picked uniformly at random with a new random subtree of the same size for each new tree produced. All experiments used double tournament selection with tournament size 7. Populations timed out at 10,000,000 mating events. 400 populations were run for each problem case.
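The chopping operation described above can be sketched as follows. This is an illustrative sketch, not the paper's code; trees are encoded as nested lists of the form [op, child, ...] with terminals as plain values.

```python
import random

def size(tree):
    """Number of nodes in a parse tree stored as [op, child, ...] or a terminal."""
    return 1 + sum(size(c) for c in tree[1:]) if isinstance(tree, list) else 1

def chop(tree, n, rng=random):
    """Iteratively replace an oversized tree by a random subtree of its root
    until it fits within n nodes."""
    while isinstance(tree, list) and size(tree) > n:
        tree = rng.choice(tree[1:])
    return tree

big = ['+', ['+', ['+', 1, 1], 1], ['+', 1, 1]]  # 9 nodes
print(size(chop(big, 5)) <= 5)  # → True
```

Because chopping promotes a subtree of the root to be the whole tree, it preserves material near the leaves while discarding the root structure, which is why it avoids biasing crossover away from the root.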

In choosing the ancestor set for the single parent version, knowledge about the character of the problem from [3] was used. The solution to the PORS efficient node use problem varies according to the congruence class (mod 3) of the number of nodes permitted. When n ≡ 0 (mod 3), there is a unique solution made up of building blocks that look like (+ (Sto (+ 1 1)) Rcl). For n ≡ 1 (mod 3) or n ≡ 2 (mod 3) there are multiple solutions which use the (+ (Sto (+ 1 1)) Rcl) building block and also building blocks that look like (+ Rcl Rcl). This means that the solution for n=12 contains building blocks needed for all n, and solutions for n=11 contain all the building blocks needed for all solutions for all n. At first it was thought that solutions for n=11 would make the best ancestors, since they had all the building blocks. However, it turned out that the solution for n=12 was an excellent ancestor, because its building blocks are the most common in all the solutions, and crossover and mutation could easily construct the other building blocks. The solution for n = 12 is as follows: (+ (Sto (+ (Sto (+ (Sto (+ 1 1)) Rcl)) Rcl)) Rcl).
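A minimal evaluator for the PORS language makes it easy to verify the value of this ancestor (the tuple encoding and function name here are ours, chosen for illustration):

```python
def eval_pors(tree, mem):
    """Evaluate a PORS parse tree; mem is the single external memory cell, passed as [value]."""
    if tree == 1:
        return 1
    if tree == 'Rcl':
        return mem[0]
    if tree[0] == '+':                      # left argument evaluates before the right
        return eval_pors(tree[1], mem) + eval_pors(tree[2], mem)
    if tree[0] == 'Sto':                    # store the argument's value and return it
        mem[0] = eval_pors(tree[1], mem)
        return mem[0]

# (+ (Sto (+ (Sto (+ (Sto (+ 1 1)) Rcl)) Rcl)) Rcl) -- the n = 12 solution above
tree = ('+', ('Sto', ('+', ('Sto', ('+', ('Sto', ('+', 1, 1)), 'Rcl')), 'Rcl')), 'Rcl')
print(eval_pors(tree, [0]))  # → 16
```

Each (+ (Sto ...) Rcl) layer doubles the stored value, so the 12-node tree reaches 2 → 4 → 8 → 16.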

Fig. 3. Confidence intervals for mean time to solution for the single-parent n=14 PORS experiment using four different ancestor sets. Ancestor set #1 contains 4 nonoptimal PORS parse trees with 11 nodes; ancestor set #2 contains 3 optimal PORS parse trees with 11 nodes; ancestor set #3 contains 3 optimal PORS parse trees with 11 nodes and 6 optimal PORS parse trees with 12 nodes; ancestor set #4 contains 1 optimal PORS parse tree with 12 nodes.

Fig. 4. Ancestor set for 3-parity consisting of modified solutions for 2-parity.

Figure 1 shows 95% confidence intervals on the time to solution for the baseline and single parent runs with the above ancestor. Table I compares mean times to solution and shows the number of timeouts. The single parent runs are shown by themselves in Figure 2 so that the compression of vertical scale caused by the baseline runs does not prevent their comparison. The baseline results for n > 16 are not pictured due to problems of scale and the substantial number of timeouts for those cases; none of the single parent experiments had timeouts. Figure 2 shows that the single parent technique using this ancestor changes which problem cases are harder: n=16 becomes harder than n=18. Clearly, the choice of ancestors is very important. Figure 3 shows the results for four different ancestor sets for n=14. It is possible to choose ancestors which degrade the performance of the algorithm. For example, for n=14 an ancestor set with four nonoptimal parse trees with 11 nodes has an average time to solution of 70,015 mating events as compared to 64,967 mating events for the baseline and 666 mating events for the single parent algorithm using the solution to n=12 as the ancestor set.

TABLE II
n   mean baseline   mean single parent   ratio   baseline timeouts   single parent timeouts
3   29292           14945                1.96    0.5%                0.5%
4   305710          116583               2.62    48.5%               17.5%
5   477197          210780               2.26    96.0%               31.0%
Comparison of time to solution for odd parity for n = 3, 4, 5 with and without using the single parent technique. Note that the chance of finding a solution before timing out is much higher using the single parent technique.

Fig. 5. 95% confidence intervals for time to solution for single parent and baseline versions of the n-parity problem.

B. Odd parity

Because PORS trees must use all their nodes to achieve maximum fitness, they have no problem with bloat. The odd parity problem was chosen to test the size control feature of the single parent technique. The odd parity problem is a standard logic function induction problem. It maps a collection of n boolean variables {x0, x1, . . . , xn−1} onto the truth value of the proposition that an odd number of the variables are true. It is a standard test problem for genetic programming [13]. Experiments were performed for standard and single parent genetic programming on the odd parity problem. The operations used were and, or, nand, and nor, and the terminals were the n boolean input variables. The experiments used double tournament selection with tournament size 7. Mutation was performed by replacing a subtree picked uniformly at random with a new random subtree of the same size for each new tree produced. The timeout limit was 1,000,000. 400 populations were run for the single parent experiments and for the baseline experiments for n=3 and n=4; for the baseline for n=5, 200 populations were run. The ancestor set for the single parent version for n was created by making n copies of several short solutions generated for n − 1 and modifying each copy so that it used a different subset of n − 1 variables out of the n variables. The ancestor set for 3-parity appears in Figure 4. In the first experiment, the times to solution were compared for the baseline and single parent versions for n = 3, 4, and 5. The chop operator for the baseline was set so that the solutions would be of similar length to those produced by the single parent experiments (50 for n = 3; 400 for n = 4; 2000 for n = 5). For the single parent version, when crossover resulted
in trees larger than 1000 nodes for n=3 and n=4 and 2000 nodes for n=5, the parents were returned to the population unchanged. The times to solution were better for the single parent version, as shown in Table II. However, the difference was not as large as for the PORS problem, probably because the ancestor set did not contain as much information. The chance of finding a solution before timing out was greatly improved, especially for n = 5. For n = 5 the baseline ran very slowly and timed out most of the time, resulting in just 8 solutions out of 200 runs.

Table III shows the sizes of the solutions found. For the baseline version, changing the chop operator to allow the parse trees to be larger causes the solutions to be larger, and changing the chop operator to keep the parse trees smaller causes the solutions to be smaller. For example, for 3-parity, if the chop operator is set to 400 instead of 50, then the best solutions average size 175 (instead of 41); if it is set to 31, then the best solutions average size 28. There is no similar way to tweak the size for the single parent version. In the baseline version, trees quickly grow and the space of parse trees close to the chop limit is well explored. In the single parent version, the trees grow more slowly, so there is a greater chance of finding solutions of a variety of sizes. For example, for 5-parity, the single parent version found solutions ranging from 201 nodes to 1995 nodes, while the baseline (chopped at 2000) found solutions ranging from 1101 nodes to 1971 nodes.

TABLE III
n   baseline avg. size   baseline 95% CI   single parent avg. size   single parent 95% CI
3   41                   (40.4, 41.6)      42                        (39.7, 44.3)
4   320                  (312, 328)        238                       (221, 255)
5   1696                 (1503, 1889)      854                       (798, 910)
Comparison of the average size of solutions found by baseline and single parent versions for n-parity. Chop set to 50 for n=3; 400 for n=4; 2000 for n=5.

Fig. 6. 95% confidence intervals of parse tree sizes for single parent and baseline versions of 3-parity and 4-parity.

C. Plus-times-half (PTH)

A problem on which the single parent technique did not seem to help much was the plus-times-half (PTH) problem. This problem uses trees with limited depth and, like PORS, requires that all possible nodes be used to achieve an optimal solution. The object is to find a parse tree of depth n which evaluates to the maximum number possible using the binary arithmetic operations plus and times and the constant one-half as a terminal. Our experiments used double tournament selection with tournament size 7. Mutation was performed by replacing a subtree picked uniformly at random with a new random subtree of the same size for each new tree produced. The timeout limit was 1,000,000. 400 populations were run for each problem case. Trees were not allowed to grow beyond depth n. If crossover resulted in a parse tree of depth greater than n, then any subtree at depth n was replaced with the terminal one-half.

We tried various ancestor sets, none of which improved performance significantly. The ancestor set with which one would most expect to improve performance consists of the solution itself. Remember that the ancestor is not part of the population, so this does not lead to an instant result. For PORS, the optimum was always found quickly when given this type of ancestor set. Not so for PTH. As can be seen in Table IV, even feeding the PTH problem its own solution as an ancestor did not significantly improve performance. The best ancestor sets included parse trees which had depth greater than n. For example, for n=4 using an optimal parse tree of depth 7 as an ancestor resulted in a mean time to solution of 551 (compare to the baseline time of 814); using the same ancestor for n=6 resulted in a mean time to solution of 5541 (compare to the baseline time of 7334). A reason that the single parent technique doesn't work well for this problem could be that the solution depends on the structure of the entire tree (pluses near the bottom, times on top) instead of having various subtrees contributing parts of the solution. (See Figure 7 for an example solution.)

TABLE IV
n   mean baseline   mean single parent   ratio
3   58              49                   1.18
4   814             585                  1.39
5   2769            1950                 1.42
6   7334            5935                 1.24
Comparison of the time to solution found by baseline and single parent versions of PTH for trees of depth n, using the solution as the ancestor.

Fig. 7. An optimal PTH tree of depth 4; it evaluates to 16. Plus is represented by +; times is represented by *; one-half is represented by h.
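The value of an optimal PTH tree can be checked with a one-line recurrence, assuming (as the depth-4 example in Figure 7 suggests) that the best tree of depth d combines two copies of the best tree of depth d − 1; the function name is ours.

```python
def pth_max(depth):
    """Largest value reachable by a full PTH tree of the given depth
    (depth 0 is a single one-half terminal)."""
    v = 0.5
    for _ in range(depth):
        v = max(v + v, v * v)   # pluses win near the leaves, times near the root
    return v

print([pth_max(d) for d in range(5)])  # → [0.5, 1, 2, 4, 16]
```

The switch from plus to times happens once the running value reaches 2, which matches the "pluses near the bottom, times on top" structure described above.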



(Mul 85.3 (Nex (Mul 0.1 X1)))
(Mul 85.3 (Nex (Mul 0.2 X1)))
(Mul 85.3 (Nex (Mul 0.3 X1)))
(Mul 85.3 (Nex (Mul 0.4 X1)))
(Mul 85.3 (Nex (Mul 0.5 X1)))
(Mul (Add 85.3 (Mul 55 X1)) (Nex (Mul 0.2 X1)))
(Mul (Add 85.3 (Mul 55 X1)) (Nex (Mul 0.3 X1)))
(Mul (Add 85.3 (Mul 55 X1)) (Nex (Mul 0.4 X1)))
(Mul (Add 85.3 (Mul 55 X1)) (Nex (Mul 0.5 X1)))
(Mul (Add 85.3 (Mul 55 X1)) (Nex (Mul 0.6 X1)))

Fig. 8. Ancestor set for the NIPs experiment.

Fig. 9. Comparison of fitnesses for single parent and baseline experiments for the NIPs problem with the number of nodes restricted to 40.

D. Modeling nearly identical paralogs

Paralogs are pairs or sets of genes thought to be copies of a single ancestral gene. Duplication of genes and subsequent variation of the copies is thought to be an important source of new genetic function. In [9], a collection of paralogs differing in a very small number of base positions were documented and biologically validated. Biological validation was required to ensure that the apparent paralogs were real rather than the result of sequencing errors. (These nearly identical paralogs, or NIPs, were discovered because of a coincident pattern of apparent sequencing errors in a large-scale genomic assembly.) The existence of NIPs implies the existence of paralogs with no sequence divergence, totally identical paralogs or TIPs. In an effort to estimate the number of TIPs, the available NIPs data were modeled in [9] using a simple parameter estimation evolutionary algorithm. The NIPs data is given as the count of NIPs which vary in a given number of positions. The model is thus the number of NIPs as a function of the number of positions in which they vary, and the zero of the model is its estimate of the number of TIPs. The model used for the NIPs data was of the form

N(p) = e^(−rp) f(p),

where N(p) is the number of NIPs exhibiting p polymorphisms and f(p) is a polynomial. The results of the evolutionary algorithm estimating parameters for this model were used in the ancestor set for the GP evolutionary algorithm. The parse trees used the unary operations Neg (negate), Sin (sine), Cos (cosine), Atn (arctangent), Sqr (square), and Nex (e^(−|x|)), and the binary operations Add (addition), Sub (subtraction), Mul (multiplication), and Div (division). There was one variable and ephemeral constants ranging from -100 to 100. Experiments used double tournament selection with tournament size 7. Mutation was performed by replacing a subtree picked uniformly at random with a new random subtree of the same size for each new tree produced. 100 populations were run for each problem case. The fitness function was RMS error (to be minimized). The timeout limit was 1,000,000. The ancestor set, shown in Figure 8, for the single parent version consisted of 10 ancestors generated from the best result from an evolutionary algorithm attempting to fit the model N(p) = A·e^(−rp) using a gene with two parameters, A and r, and the best result from an evolutionary algorithm attempting to fit the model N(p) = (Ap + B)·e^(−rp) using a gene with three parameters, A, B, and r. The results were rounded to

the nearest tenth, and five versions of each solution were used with r varying by one-tenth centered on the solution. The algorithm was run for 2000 generations. Parse tree size was limited to 20 nodes in one experiment and 40 nodes in another. In the baseline version, parse trees were chopped when they grew too large; in the single parent version, when crossover resulted in a parse tree which was too large, the parent tree was returned to the population unchanged. The single parent technique proved to be effective both in improving fitness and in limiting size. The mean RMS error with the single parent version was 1.843 as compared to 2.105 with the baseline version. Confidence intervals are shown in Figure 9. The size of the trees averaged 38.3 for the baseline, but only 31.2 for the single parent version. Confidence intervals are shown in Figure 10. The character of the solutions found was different for the baseline and single parent experiments. Table V shows the proportion of each operation found in the solutions as well as the proportions in the ancestors. The single parent solutions favor the operations found in the ancestors (especially Mul and

Fig. 10. Comparison of tree sizes for single parent and baseline experiments for the NIPs problem with the number of nodes restricted to 40.







(Sub (Div (Add -12.8846 (Mul (Div X1 (Sub X1 (Div 46.246 22.1128))) X1)) -2.64225) (Div -77.3788 X1))
(Sub (Div 81.2515 X1) (Div (Neg (Cos (Div X1 5.96716))) (Sqr (Sqr (Div (Sub (Div X1 X1) X1) X1)))))
(Mul 108.207 (Div (Cos (Nex (Mul (Mul (Mul X1 X1) (Mul (Sin (Sqr X1)) (Nex X1))) (Mul X1 X1)))) X1))

Fig. 11. Examples of baseline results for NIPs data fit using a chop operator with a node limit of 20.

(Mul (Add 77.2381 (Div 29.7946 X1)) (Nex (Mul 0.689292 (Mul 0.4 X1))))
(Sub (Mul 84.579 (Nex (Mul 0.309174 (Mul 1.05926 X1)))) (Div -19.926 X1))
(Mul 109.477 (Nex (Mul (Mul 1.00154 X1) (Mul 0.2 1.66559))))

Fig. 12. Examples of single parent results for NIPs data fit with parse trees restricted to no more than 20 nodes.
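The ancestors in Figure 8 all instantiate the form c·e^(−|r·x|). A minimal sketch of evaluating one of them (the Python function names are ours, chosen for illustration):

```python
import math

def nex(x):
    """The Nex operation from the NIPs experiments: e^(-|x|)."""
    return math.exp(-abs(x))

def ancestor(x1):
    """(Mul 85.3 (Nex (Mul 0.3 X1))), the third ancestor in Figure 8."""
    return 85.3 * nex(0.3 * x1)

print(ancestor(0))  # → 85.3
```

At p = 0 the model returns its leading constant, which is the ancestor's implicit estimate of the number of TIPs; the value decays toward zero as the number of polymorphisms grows.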

TABLE V
Op    Baseline   Single Parent   Ancestors
Add   16%        16%             12.5%
Sub   18%        7%              0%
Mul   12%        40%             62.5%
Div   23%        8%              0%
Sqr   6%         2%              0%
Nex   3%         14%             25%
Sin   4%         5%              0%
Cos   6%         4%              0%
Atn   6%         2%              0%
Neg   6%         2%              0%
Operation use in ancestors and solutions to the NIPs problem using standard and single parent genetic programming.

Nex); the baseline solutions use more Div and Sub operations. Examination of the results from the single parent runs shows that the building blocks in the ancestors are preserved. Every solution found by the single parent algorithm used a Nex operation, in contrast to the baseline in which only 26% of the solutions used a Nex operation. In the single parent runs, both (Mul 106.899 (Nex (Mul 0.32675 X1))) and (Mul 109.202 (Nex (Mul -0.332429 X1))) appeared as solutions. These have exactly the same form as the ancestors, just with different constants. Solutions similar to each other as well as similar to the ancestors were found by different single parent runs, for example, (Add (Mul 84.7384 (Nex (Mul -0.32777 X1))) (Div 19.8396 X1)) and (Add (Mul 84.1693 (Nex (Mul -0.325772 X1))) (Div 20.0159 X1)). Such duplicates did not seem to be present in the baseline results.

The solution with the best fitness (1.079) found by the single parent algorithm limited to 20 nodes (written in more standard notation, with constants rounded and simplified) was

(105 + 7·X·cos(75/sin X)) · e^(−|0.3·X|).

In the single parent algorithm limited to 40 nodes, the solution with best fitness (0.807) was

3.33·(26 + cos(−100 + sin(86·e^(−X²) − sin X))²·2X) · e^(−|0.3·X|).

In contrast, the solution with best fitness (1.700) for the baseline version with the chop operator set to 20 was

−39 + (arctan(11 − X) − 1)·1.6/(X − X²) + 80/X,

and the solution with best fitness (0.958) for the baseline with the chop operator set to 40 was

−sin(−80 + arctan(−0.3X))·0.4·X − ((−13·cos X − 49)·sin²(X² − X) − 80)/X.

Not all the single parent solutions are so similar to the ancestors. For example, this solution with fitness 1.867 looks somewhat different:

(81 + X − X/10 + X) · e^(−|0.3(cos(arctan(X)−X)−X)|).

The single parent algorithm is doing a more directed search than the baseline. Instead of exploring the entire space of possible solutions encoded by its parse trees, it is looking only for solutions of the general form

f(x) · e^(−|g(x)|).

It is important to note that some of the models found by this algorithm with good fitnesses may not actually be good models of the data. They may be overfitted to the particular data set they evolved to model. To find the best models an additional step needs to be taken: cross validation with other NIPs data.

E. Discussion

The single parent technique seems to be useful both as a way of limiting bloat with low computational cost and as a way of incorporating expert information into the evolutionary computation. It is also useful as a way of preserving building blocks and of directing the algorithm's search. The information in the ancestors is never lost to the algorithm. This can speed time to solution and reduce the number of timeouts. It can also affect the degree of exploration vs. exploitation; the single parent technique sways the balance toward exploration near ancestors.

As the results for the PTH problem show, the single parent technique is not appropriate for every problem. The choice of ancestor set is crucial. Some ancestor sets degrade performance, others enhance it. Focusing the search near the ancestors is a two-edged sword that can buy rapid results at the cost of breadth of search.

The greatest advantage of the technique seems to be in controlling the growth of parse trees. In the baseline experiments, the parse trees tended to quickly grow as large as possible, meaning that most of the search was of large parse trees. The single parent technique caused the parse trees to grow more slowly, allowing a better search of diverse tree sizes.

The choice of baseline for comparison (standard genetic programming with chopping used for size control) was somewhat arbitrary. Future experiments could compare single parent genetic programming with other size control measures like incorporating size into the fitness evaluation [8], [18] or only using the smaller of the two children produced by crossover [10]. Single parent genetic programming has the advantage of computational simplicity and of also incorporating expert knowledge into the system. It would also be interesting to compare the single parent technique to other techniques which improve performance in genetic programming, such as ADFs [14] and population seeding.
Experiments with ADFs would be particularly interesting because the ADFs for one problem case might make good ancestors for another problem case. It would also be interesting to test the single parent technique with other kinds of genetic programming, such as ISAc lists [2], finite state automata, or graph-based evolutionary algorithms [4], and to categorize which sorts of problems work well with the single parent technique and which do not.

Single parent genetic programming could also be used with competitive problems (such as Prisoner's Dilemma [5]) instead of optimization problems. Would a population of single parent creatures beat a population of standard creatures, or vice versa? Would the solutions be more or less robust?

Another interesting direction would be to allow the ancestor set to change. One idea is to promote highly fit creatures to the ancestor set as evolution progresses. Another is to let the ancestor set itself evolve, with the fitness of an ancestor defined as the average change in fitness of creatures in the main population when crossed over with it. However, this might reduce the size control effect of the technique unless the criterion for promotion included small size.
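The proposed scoring rule for an evolving ancestor set can be made precise with a small sketch. Everything here is hypothetical scaffolding around the idea stated above: `fitness` and `crossover` stand in for whatever evaluation and single parent crossover the system uses, and an ancestor's score is the mean fitness change it induces across the main population.

```python
# Hypothetical sketch of the proposed ancestor-scoring rule: an ancestor's
# fitness is the average change in fitness of main-population creatures
# when each is crossed over with that ancestor.

def ancestor_score(ancestor, population, fitness, crossover):
    """Mean fitness delta induced by crossing each creature with ancestor."""
    deltas = [fitness(crossover(p, ancestor)) - fitness(p) for p in population]
    return sum(deltas) / len(deltas)
```

As noted above, selecting ancestors on this score alone could erode the implicit size control, so a real promotion criterion would likely also need to reward small size.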

REFERENCES

[1] Peter J. Angeline and Jordan B. Pollack. Coevolving high-level representations. In Christopher Langton, editor, Artificial Life III, volume 17 of Santa Fe Institute Studies in the Sciences of Complexity, pages 55–71, Reading, 1994. Addison-Wesley.
[2] D. Ashlock and M. Joenks. ISAc lists: A different program induction method. In Genetic Programming 1998, pages 18–26, 1998.
[3] Daniel Ashlock and James I. Lathrop. A fully characterized test suite for genetic programming. In Proceedings of the Seventh Annual Conference on Evolutionary Programming, pages 537–546, 1998.
[4] Daniel A. Ashlock. Evolutionary Computation for Modeling and Optimization. Springer-Verlag New York, Inc., Secaucus, NJ, 2005.
[5] Robert Axelrod. The Evolution of Cooperation. Basic Books, New York, 1984.
[6] Wolfgang Banzhaf, Peter Nordin, Robert E. Keller, and Frank D. Francone. Genetic Programming: An Introduction. Morgan Kaufmann, San Francisco, 1998.
[7] Lois McMaster Bujold. Ethan of Athos. Baen Books, Riverdale, NY, 1986.
[8] Edwin D. DeJong and Jordan B. Pollack. Multi-objective methods for tree size control. Genetic Programming and Evolvable Machines, 4:211–233, 2003.
[9] Yan Fu, Tsui-Jung Wen, Ling Guo, Debbie Chen, Karthik Viswanathan, Mu Zhang, Yefim Ronin, David Mester, Abraham Korol, Daniel A. Ashlock, and Patrick S. Schnable. Genetic structure analysis of maize recombinant inbred lines. In preparation, 2005.
[10] Chris Gathercole and Peter Ross. An adverse interaction between crossover and restricted tree depth in genetic programming. In Genetic Programming 1996, pages 291–296, 1996.
[11] Kenneth Kinnear. Advances in Genetic Programming. The MIT Press, Cambridge, MA, 1994.
[12] Kenneth Kinnear and Peter Angeline. Advances in Genetic Programming, Volume 2. The MIT Press, Cambridge, MA, 1996.
[13] John R. Koza. Genetic Programming. The MIT Press, Cambridge, MA, 1992.
[14] John R. Koza. Genetic Programming II. The MIT Press, Cambridge, MA, 1994.
[15] W.B. Langdon, T. Soule, R. Poli, and J.A. Foster. The evolution of size and shape. In Lee Spector, William B. Langdon, Una-May O'Reilly, and Peter J. Angeline, editors, Advances in Genetic Programming, volume 3, pages 163–190. The MIT Press, Cambridge, MA, 1999.
[16] W.B. Langdon. Quadratic bloat in genetic programming. In Proceedings of the Genetic and Evolutionary Computation Conference 2000, pages 451–458, 2000.
[17] Peter Nordin, Frank Francone, and Wolfgang Banzhaf. Explicitly defined introns and destructive crossover in genetic programming. In Peter Angeline and Kenneth E. Kinnear Jr., editors, Advances in Genetic Programming, volume 2, pages 111–134. The MIT Press, Cambridge, MA, 1996.
[18] Terence Soule, James A. Foster, and John Dickinson. Code growth in genetic programming. In Genetic Programming 1996, pages 215–223, 1996.