Developmental Evaluation in Genetic Programming: the Preliminary Results

McKay, RI (Bob)1, Hoang, Tuan Hao2, Essam, Daryl2, Nguyen, Xuan Hoai3

1 School of Computer Science & Engineering, College of Engineering, Seoul National University, San 56-1, Sinlim-dong, Gwanak-gu, Seoul 151-744, Korea. Email: [email protected]
2 School of ITEE, University of New South Wales @ Australian Defence Force Academy, Canberra, Australia. Email: Hao: [email protected] Daryl: [email protected]
3 Vietnamese Military Technical Academy, Hanoi, Vietnam. Email: [email protected]

Abstract. This paper investigates developmental evaluation in Genetic Programming (GP). Extant GP systems, including developmental GP systems, typically exhibit modular and hierarchical structure only to the degree that it is built in by the designer; by contrast, biological systems exhibit a high degree of organization in their genotypes. We hypothesise that even when GP systems are subject to changing environments, for which the adaptability arising from modular structure would be advantageous, the benefit is at the species rather than the individual level, so that selection is very weak. By contrast, biological systems are selected repeatedly throughout their development process. We suggest that this difference is crucial: if an individual is evaluated multiple times throughout its development, then modular structure can provide an adaptive advantage to that individual, and hence can be selected for by evolution. We investigate this hypothesis using Tree Adjoining Grammar Guided Genetic Programming (TAG3P) [1], which has good properties for supporting evaluation during incremental development. Our preliminary results show that developmental TAG3P outperforms both the original TAG3P and standard tree-based GP on an appropriate problem, in ways which suggest that modular solutions may have been developed.

1 Introduction

Genetic Programming (GP) was developed by Koza [2] in 1992. It is an automatic programming methodology that uses simulated evolution to discover functional programs that solve a problem. Genetic programming breeds a population of trial solutions using biologically inspired operators, which include reproduction, crossover (sexual recombination), mutation, and forms of natural selection. In essence, it uses evolutionary search methods to search for solutions to given problems within an in-principle unbounded space of expressions. However, the solutions found are generally poorly structured and highly disorganised, exhibiting no hierarchical or modular structure. An individual in a genetic programming system is generally expected to solve problems immediately, without the benefit of a developmental phase. By contrast, the natural evolutionary systems on which it is based are able to evolve hierarchical modular structure (e.g. the homeobox gene complex). Generating hierarchical, modular structures would greatly benefit GP, potentially dramatically increasing the scalability of GP applications, as well as the adaptability of GP solutions.

There has been a wide range of approaches to solving this problem in GP. For example, Angeline [3] developed a technique called Module Acquisition, which is based on the creation and administration of a library of modules for the automatic generation of subroutines. Other studies have investigated Automatically Defined Functions (ADF) [4], which is probably the most popular of the modularization methods used in GP. Rosca investigated an Adaptive Representation [5], which is based on the discovery of useful building blocks of code. This approach greatly improved search efficiency on the problems considered. However, all these approaches involve some level of programmer intervention, thus imposing a level of modularity that nature has been able to evolve for itself.

Recently, interest in developmental approaches in Evolvable Hardware has begun to increase. Haddow et al. [6] used Lindenmayer systems for digital circuit design, while Miller [7] developed Cartesian Genetic Programming for the automatic evolution of digital circuits, and attempted to evolve a cell that could construct a larger program by iteration of the cell's program. Nevertheless, modular structure has not been clearly demonstrated in existing developmental GP systems.
We argue that this is because modular structure, if used for a single evaluation as in most artificial developmental systems, only has adaptive advantages to entire species, not to particular individuals, and hence imposes very weak selection pressure in evolution. In developmental biological systems, on the other hand, evaluation is continuous throughout development (if the individual is insufficiently fit to survive at a particular stage of development, the fitness it would exhibit at later stages is immaterial). A modular structure, which allows biological sub-systems to develop in synchrony throughout development, can thus provide a selective advantage to the individual. Our working hypothesis is that, if the individual is evaluated on multiple problems at different stages of development, then modular structure can provide an adaptive advantage to that particular individual, and hence can be selected for by evolution.

This hypothesis is investigated using the Tree Adjoining Grammar Guided GP (TAG3P) representation, which has ideal properties for supporting evaluation during incremental development. In particular, this representation has a feasibility property, allowing any expression tree to be evaluated, regardless of the detachment of any number of its sub-trees. This means that smaller sections of the tree can easily be tested on simpler problems, providing a straightforward way to test our hypothesis at relatively low implementation cost. In these experiments, the developmental process is extremely naïve, consisting in effect of undirected growth of each individual (in implementation, we evolve the whole tree but evaluate increasing portions of it). We do not propose this as a serious developmental model; we deliberately use a minimal developmental model to emphasise the crux of this paper, namely the effect of evaluation during development.

The paper outline is as follows. The next section briefly describes Tree Adjoining Grammars (TAGs) and TAG based Genetic Programming. Section 3 introduces our Developmental Evaluation method based on Tree Adjoining Grammar Guided Genetic Programming (DEVTAG). Experimental setups are described in Section 4. Sections 5 and 6 provide the results and discussion. Conclusions and future work are laid out in the last section.

2 Tree Adjoining Grammar, TAG based Genetic Programming

The following section gives a brief, somewhat intuitive introduction to TAG; a fuller description of TAG may be found in [1].

2.1 Tree Adjoining Grammars (TAGs)

TAGs are tree-generating and analysis systems, first proposed by Joshi [8] for Natural Language Processing (NLP) purposes. The aim of TAG is to represent the structure of natural languages more directly than is possible in Chomsky languages, and in particular, to represent the process by which natural language sentences can be built up from a relatively small set of basic linguistic units by inclusion of insertable sub-structures. Thus 'The cat sat on the mat' becomes 'The black cat sat lazily on the mat' by the subsequent insertion of the elements 'black' and 'lazily'.

In more detail, a tree-adjoining grammar is a quintuple (T, V, I, A, S), where:
- T is a finite set of terminal symbols.
- V is a finite set of non-terminal symbols (T ∩ V = ∅).
- S ∈ V is a distinguished symbol called the start symbol.
- I is a set of initial trees, characterised by all interior nodes being labeled by non-terminal symbols, while the nodes on the frontier are labeled by terminals.
- A is a set of auxiliary trees, characterised by all internal nodes being labeled by non-terminal symbols, while nodes on the frontier are labeled by terminals, except for one special node called the foot node. A foot node must be labeled with the same non-terminal symbol as that labeling the tree's root node. The convention of marking the foot node with an asterisk (*) is followed here.

The trees in E = I ∪ A are called elementary trees. Initial trees and auxiliary trees are denoted α and β respectively. A tree with root labeled by non-terminal symbol X is called an X-type elementary tree.

The key operation used with TAG is adjunction. Adjunction builds a new tree γ from an auxiliary tree β and a tree α by inserting β into α at a specified place. Adjunction is illustrated in Figure 1. More formally, if a tree α has an interior node labeled A, and β is an A-type tree, the adjunction of β into α to produce γ is as follows. Firstly, the sub-tree α1 rooted at A is temporarily disconnected from α (Figure 1.a). Next, β is attached to α to replace the sub-tree α1 (Figure 1.b). Finally, the process of building γ is completed when α1 is attached back to the foot node of β (Figure 1.c).
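The three steps of adjunction can be sketched in a few lines of Python. This is only an illustrative sketch, not the TAG3P implementation: the `Node` class, the `address` convention (a tuple of child indices leading to the adjunction node) and the two toy trees are assumptions introduced here, and the toy trees are not the elementary trees of the grammar used in this paper.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    is_foot: bool = False  # True only for the foot node of an auxiliary tree


def find_foot(tree: Node) -> Node | None:
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if tree.is_foot:
        return tree
    for child in tree.children:
        foot = find_foot(child)
        if foot is not None:
            return foot
    return None


def adjoin(alpha: Node, address: tuple[int, ...], beta: Node) -> Node:
    """Adjoin beta at the node of alpha reached by following the child
    indices in address. Both inputs are left unmodified."""
    gamma, new_beta = copy.deepcopy(alpha), copy.deepcopy(beta)
    parent, target = None, gamma
    for index in address:
        parent, target = target, target.children[index]
    foot = find_foot(new_beta)
    if foot is None or not (target.label == new_beta.label == foot.label):
        raise ValueError("beta must be an auxiliary tree of the target's type")
    # Steps (a) and (c): detach the sub-tree rooted at the target node and
    # hang it from the foot node of beta.
    foot.children, foot.is_foot = target.children, target.is_foot
    # Step (b): beta takes the place of the detached sub-tree.
    if parent is None:
        return new_beta
    parent.children[address[-1]] = new_beta
    return gamma


def frontier(tree: Node) -> list[str]:
    """Read the leaf labels from left to right."""
    if not tree.children:
        return [tree.label]
    return [leaf for child in tree.children for leaf in frontier(child)]


# Hypothetical toy trees, chosen only to make the example readable.
alpha = Node("EXP", [Node("VAR", [Node("X")])])
beta = Node("EXP", [Node("EXP", is_foot=True),
                    Node("OP", [Node("+")]),
                    Node("VAR", [Node("X")])])
gamma = adjoin(alpha, (), beta)
print(" ".join(frontier(gamma)))
```

Adjoining the toy auxiliary tree at the root of the toy initial tree turns the frontier X into X + X, and the same auxiliary tree can be adjoined again at any EXP node of the result.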

Fig. 1. An example of the adjunction operator

2.2 TAG based Genetic Programming

Tree Adjoining Grammar Guided Genetic Programming (TAG3P) [1] is a grammar guided genetic programming system. One of the most important of the TAG representation's properties is a feasibility property, namely that any rooted sub-tree of a valid TAG tree is also a valid TAG tree. Thanks to the feasibility property, in growing a derivation tree from the root, one can stop at any time and still have a valid derivation tree as well as a valid derived tree. For example, if a derivation tree consisted of β1 adjoined to α (from Figure 3), we could either stop at α before considering β1, generating the derived tree x, or consider the entire tree and generate x+x.
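The feasibility property means that cutting a derivation tree off at any depth still leaves something that can be evaluated. A minimal sketch of such a cut follows, under stated assumptions: the `DerivationNode` class is hypothetical, the root is counted as depth 1, adjunction addresses are omitted for brevity, and the node name `beta2` is invented purely to give the example a third level.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DerivationNode:
    tree_name: str  # name of the elementary tree used at this node
    children: list[DerivationNode] = field(default_factory=list)


def truncate(node: DerivationNode, max_depth: int) -> DerivationNode:
    """Return a copy of the rooted sub-tree that keeps only the nodes at
    depth <= max_depth (the root is at depth 1)."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    if max_depth == 1:
        kept = []
    else:
        kept = [truncate(child, max_depth - 1) for child in node.children]
    return DerivationNode(node.tree_name, kept)


def size(node: DerivationNode) -> int:
    """Count the nodes of a derivation tree."""
    return 1 + sum(size(child) for child in node.children)


# beta1 adjoined to alpha, with a hypothetical further adjunction below it.
full = DerivationNode("alpha", [DerivationNode("beta1", [DerivationNode("beta2")])])
for depth in (1, 2, 3):
    print(depth, size(truncate(full, depth)))
```

Each truncated copy is a rooted sub-tree of the original, so by the feasibility property it corresponds to a valid derived tree in its own right.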

3 Developmental Evaluation based on TAG3P

The problem chosen for investigating our hypothesis is the symbolic regression problem with simple polynomials as target functions. This kind of symbolic regression problem is well-known for its increasing difficulty with polynomial degree [2, 9]. In particular we experimented with the following series of polynomial functions:

F1 = X
F2 = X^2 + X
F3 = X^3 + X^2 + X
F4 = X^4 + X^3 + X^2 + X
…
F9 = X^9 + X^8 + X^7 + X^6 + X^5 + X^4 + X^3 + X^2 + X

We expect that this increasing difficulty will allow us to exploit the developmental evaluation approach.

To fulfil the requirement of tackling increasingly difficult problems throughout development, the individual is separated into multiple layers, with more of the individual being used for the more difficult fitness functions. Specifically, the individual is separated as below:

Depth 2 for function F1 = X
Depth 4 for function F2 = X^2 + X
…
Depth 18 for function F9 = X^9 + X^8 + X^7 + X^6 + X^5 + X^4 + X^3 + X^2 + X

We use tournament selection, which only requires a fitness ordering of individuals. For DEVTAG, we use a special multi-stage comparison to generate this ordering. Corresponding to the insight that later-stage fitness is only important if the individual survives earlier stages, we compare individuals on simpler problems first; only if they are roughly equivalent on the simpler problems do we evaluate them on more complex ones. We denote the fitness of an individual I evaluated at stage j by F(I, j). For two individuals (I1, I2), the comparison process (for minimisation) is:

    i := 1;
    while (i < 9) and (|F(I1, i) - F(I2, i)| < δ) do
        i := i + 1;
    if F(I1, i) < F(I2, i) then I1 wins else I2 wins

Here δ is the threshold below which two fitness values are treated as roughly equivalent, and the bound i < 9 stops the scan at the final stage when the two individuals are roughly equivalent at every stage.

An example of this algorithm is shown in Figure 2, comparing the individuals I1 and I2 with fitness value arrays (corresponding to the 9 different stages) I1 (10.05, 14.67, … , 20.35) and I2 (10.06, 14.66, … , 10.35). In this case, I2 would be chosen for evolution.

Fig. 2. An example of comparing two individuals in DEVTAG
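The multi-stage comparison can be sketched as a small Python function. This is a hedged illustration rather than the DEVTAG code: the function name `staged_winner`, the `fitness(individual, stage)` callable and the tolerance value 0.05 are assumptions made for the example, and the explicit stop at the last stage is added so that the scan cannot run past the final problem. Only the three fitness values quoted in the text for each individual are used, so the example has three stages instead of nine.

```python
from typing import Callable, Sequence


def staged_winner(
    a: Sequence[float],
    b: Sequence[float],
    fitness: Callable[[Sequence[float], int], float],
    n_stages: int,
    tol: float,
) -> Sequence[float]:
    """Return the winner of a two-way comparison under minimisation.

    Stages are scanned from the simplest problem upwards. A later stage is
    evaluated only when the two individuals are roughly equivalent (their
    fitness values differ by less than tol) on every earlier stage.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    for stage in range(1, n_stages + 1):
        fa, fb = fitness(a, stage), fitness(b, stage)
        if abs(fa - fb) >= tol or stage == n_stages:
            # Ties go to the second individual, as in the "else" branch
            # of the comparison described in the text.
            return a if fa < fb else b
    raise AssertionError("unreachable")


# Fitness values quoted in the text (first two stages and the last one);
# here an "individual" is simply its list of per-stage fitness values.
i1 = [10.05, 14.67, 20.35]
i2 = [10.06, 14.66, 10.35]


def lookup(individual: Sequence[float], stage: int) -> float:
    return individual[stage - 1]


winner = staged_winner(i1, i2, lookup, n_stages=3, tol=0.05)
print("I2 wins" if winner is i2 else "I1 wins")
```

Because `fitness` is called lazily, an individual that is clearly better or worse on an early stage is never evaluated on the later, more expensive stages.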

The context-free grammar G for this problem has a function set of unary and binary operators {+, -, *, /, sin, cos, log, exp}. The terminal set is {X}. Formally, G = (N, T, P, S), where:

S = EXP is the start symbol
N = {EXP, PRE, OP, VAR}
T = {X, sin, cos, lg, ep, +, -, *, /} (ep is the exponential function, lg is the log function)

The corresponding LTAG Glex is:

Glex = (N = {EXP, PRE, OP, VAR}, T = {X, sin, cos, lg, ep, +, -, *, /, (, )}, I, A), where I ∪ A is as in Figure 3.

Fig. 3. Elementary trees for Glex

4 Experimental setups

Table 1. Parameter settings for the symbolic regression problem

Objective: Find a function that exactly fits a given sample of 20 (xi, yi) data points
Success predicate: Sum of errors over 20 points < ε = 0.01
Terminal set: X, the independent variable
Operators (function set): +, -, *, /, sin, cos, exp, log
Fitness cases: A sample of 20 points in the interval [-1, +1]
Fitness: Sum of the errors over the 20 fitness cases
Genetic operators: Tournament selection (size 3); sub-tree crossover and sub-tree mutation for TAG3P; standard crossover and mutation for GP
Parameters: Crossover probability 0.9; mutation probability 0.1
Min/max initial size for TAG3P: 2 to 1000
Max depth for GP:
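The fitness measure and success predicate in Table 1 can be illustrated with a short sketch. Several details are assumptions made only for this example: errors are taken as absolute differences, the 20 fitness cases are evenly spaced over [-1, +1] (the paper does not say how they were sampled), and the helper names `target`, `sample_points`, `fitness` and `is_success` are hypothetical.

```python
from typing import Callable, List


def target(k: int, x: float) -> float:
    """F_k(x) = x^k + ... + x^2 + x, the k-th polynomial of the series."""
    return sum(x ** i for i in range(1, k + 1))


def sample_points(n: int = 20, lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """n evenly spaced fitness cases in [lo, hi] (spacing is an assumption)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def fitness(candidate: Callable[[float], float], k: int, xs: List[float]) -> float:
    """Sum of absolute errors of the candidate against F_k over the cases."""
    return sum(abs(candidate(x) - target(k, x)) for x in xs)


def is_success(error: float, eps: float = 0.01) -> bool:
    """Success predicate: the summed error is below the threshold."""
    return error < eps


xs = sample_points()
exact_f2 = fitness(lambda x: x * x + x, 2, xs)  # a perfect solution to F2
only_x = fitness(lambda x: x, 2, xs)            # a solution to F1, scored on F2
print(is_success(exact_f2), is_success(only_x))
```

Scoring the F1 solution against F2 shows why a correct lower-stage individual still fails the next stage: its summed error is the sum of x^2 over the fitness cases, which is far above the threshold.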


To investigate the effect of developmental evaluation on TAG3P, three treatments were run at four population sizes (POPSIZE = 100, 250, 500 and 1000), with the maximum number of generations (MAXGEN) adjusted correspondingly to keep a constant budget of 229,500 (9 × 51 × 500) function evaluations:

1. DEVTAG: using developmental evaluation, as described above.
2. GP: a standard Koza-style tree-based GP run for evolving F9, using population size POPSIZE, evolving until the evaluation budget is used.
3. TAG: this treatment is designed to address a potential issue, that any differences might arise from differences in representation. The GP experiment is repeated using the TAG representation, but otherwise with a standard tree-based GP algorithm (the TAG3P system).
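The budget arithmetic can be checked with a short sketch. It rests on an interpretation that the text does not spell out: that one DEVTAG generation is charged 9 function evaluations per individual (one per stage) and that MAXGEN is rounded up. Under that reading a population of 1000 gets 26 generations, which matches the figure quoted in the discussion; the values for the other population sizes are derived here and are not taken from the paper. The function name is hypothetical.

```python
BUDGET = 9 * 51 * 500  # 229,500 function evaluations


def devtag_max_generations(pop_size: int, n_stages: int = 9) -> int:
    """Generations affordable within BUDGET if every individual is charged
    one evaluation per stage in each generation (ceiling division)."""
    per_generation = n_stages * pop_size
    return -(-BUDGET // per_generation)


for pop_size in (100, 250, 500, 1000):
    print(pop_size, devtag_max_generations(pop_size))
```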

5 Results

Table 2. Successful runs (from 100 runs)

                DEVTAG   TAG   GP
POPSIZE=100       13      3     0
POPSIZE=250       33      8     0
POPSIZE=500       27      9     0
POPSIZE=1000       3      4     0


[Plot: vertical axis "Cumulative Frequency", horizontal axis "Generations", curves labelled F2 to F9]

Fig. 4. Cumulative success frequency of DEVTAG and TAG against number of function evaluations

Fig. 5. Cumulative success frequency of DEVTAG on each of the 9 problems

Table 2 shows the absolute number of successful runs out of 100 for each of the three treatments and four different population sizes. Note that the 0 entries mean that the GP runs were never successful. Figure 4 shows the cumulative probability of success of the two successful treatments (for the setting where the population size is 250), plotted against the number of function evaluations used in the evolution. To help in understanding how DEVTAG solves the problems incrementally, Figure 5 shows the cumulative probability of success of DEVTAG on all 9 symbolic regression problems, for the particular case of population size 250.

6 Discussion

From Table 2, it is clear that developmental evaluation is very effective at finding exact or near-exact solutions to the problem, over a wide range of population size settings (for population size 1000, the very small number of generations, 26, gives DEVTAG no realistic chance of finding all nine functions, F1 through F9). At population size 250, DEVTAG's probability of success was 33%, well above that achieved by the other treatments. It is also clear that this is an extremely difficult task for standard GP. It is worth noting that DEVTAG gives us solutions to all the other eight functions, at no additional computational cost.

From Figure 4, we see that it takes DEVTAG some time to find solutions at all, but once it does so, it rapidly finds more. We interpret this as DEVTAG needing a number of evaluations to get evolution running well at the lower levels, but once it does, solutions to F9 follow rapidly. We note that the stepped evaluation method of DEVTAG means that many of the higher functions do not need to be evaluated in the earlier generations, so that DEVTAG does not actually use its whole budget of evaluations. This is why the DEVTAG plot stops early in Figure 4.

In fact, the contrast in total computational cost is even greater than these figures suggest. TAG3P generates far larger individuals than DEVTAG. For example, the average size of the phenotype of the best-of-run individual for the TAG3P runs (i.e. what in Koza-style GP is known as the s-expression tree) with population 250 was 533.2 nodes, while that for DEVTAG was 31.88 nodes. Since the computational cost of evaluating an s-expression is generally proportional to the number of nodes evaluated, it is already clear that DEVTAG has much lower computational cost, per evaluation, than TAG3P (or GP). Yet even so, DEVTAG has a further computational cost advantage. An individual is only ever fully evaluated if it has near-identical function values for the lower level functions with some individual that it meets in a tournament. Since this is an unusual occurrence in the dynamic phases of evolution, most individuals will only ever be partially evaluated, reducing the computational cost still further. These parsimony issues will be investigated in greater detail in later papers, where we hope to present detailed results on the number of nodes actually evaluated in a run. They also raise interesting issues regarding optimal tournament size and diversity mechanisms for DEVTAG, which we plan to investigate in future work.

Figure 5 appears to confirm our interpretation of gradually finding lower-level solutions, with the solutions of higher complexity following fairly rapidly. More detailed analyses, which we have insufficient room to present here, show that DEVTAG virtually never finds a solution to function Fi without previously having found a solution to Fi-1. Further, the very closeness of the curves strongly suggests that once DEVTAG has found building blocks for lower-level solutions, they are quickly assembled to form higher-level solutions. We conjecture that DEVTAG is achieving this by replicating building blocks and creating modularity; testing this hypothesis primarily awaits our determining an adequate empirical test for modularity. At the very least, the results strongly support the view that incremental learning of a family of increasingly difficult functions has been demonstrated.

7 Conclusions and future work

The results of developmental evaluation using the TAG representation clearly demonstrate a form of problem-driven incremental learning. DEVTAG has been provided with a family of related problems of increasing difficulty, and it has proceeded to solve them incrementally. We believe this is a landmark in itself. The computational cost of the approach is also worth noting (though it is not the primary focus of this work): DEVTAG is much less expensive than the other approaches in computational cost, and it yields much more (a family of functions rather than just one) in return for that computational investment.

Equally important, the results strongly suggest that developmental evaluation has promoted the evolution of modular structure, and this is certainly our impression on viewing the evolved genotypes. Confirming it is primarily a matter of developing an operational measure for modularity applicable to the TAG representation. Hornby [10] has recently considered this question, but his metrics are based on an assumption of explicit representation of modularity, hence it is not easy to see how to extend them to our work. Finding an appropriate metric for modularity and code re-use is our primary short-term goal.

The work reported here is primarily a pilot study for a larger-scale approach with a more sophisticated developmental process. The TAG representation is crucial to this, because it removes any difficulty in ensuring that intermediate developmental stages can be evaluated. We plan to replace DEVTAG's trivial developmental process with a more sophisticated approach based on a TAG analogue to L-systems. We aim to apply this system to a range of problems, and to analyse its behaviour, particularly in terms of the modularity and complexity of evolved solutions.

