Grammars in Genetic Programming: A Brief Review
MCKAY, Robert I1, NGUYEN, Xuan Hoai1,2, WHIGHAM, Peter Alexander3 & SHAN, Yin1
(1 School of Information Technology and Electrical Engineering, University of New South Wales at the Australian Defence Force Academy, Canberra 2600, Australia)
(2 Department of Computer Science, Army Technical Academy, Hanoi, Viet Nam)
(3 Department of Information Science, University of Otago, Dunedin, New Zealand)
Abstract: Grammars have become one of the most important formalisms in the field of genetic programming. We briefly survey the history of their application in genetic programming, and discuss their relevance to the field.
Key words: genetic programming, grammar, context free, tree adjunct, estimation of distributions
1 Introduction
Genetic Programming (GP) was first described by Cramer1 and fully developed by Koza2. Over the succeeding years, it has become increasingly important, and has been applied to a wide range of problems. In essence, GP uses evolutionary search methods to search an in-principle unbounded space of expressions for solutions to problems. GP is important as one of the few modelling techniques able to learn solutions of potentially unbounded complexity, as opposed to the bounded complexity assumed by most methods. The other approaches able to handle unbounded complexity naturally are Inductive Logic Programming (ILP) and Recurrent Neural Networks (RNNs) – in the ILP literature, the distinction is often drawn by contrasting first-order expression languages with zeroth-order, or attribute-based, expression languages. Along with ILP, GP offers white-box, human-interpretable solutions, as opposed to the black-box solutions provided by RNNs, so that ILP forms its most direct competitor in domains where human-comprehensible solutions are desired. While the relative strengths of ILP and GP have not been studied extensively (with the limited exception of Tang3), principally because their different representation languages render it difficult to guarantee fair comparisons, it appears that ILP has been most successfully applied to learning in relatively cleanly constrained domains, while GP has demonstrated strength primarily in noisy, complex, and poorly delineated search domains. GP has been successfully applied to a range of real-world problems, ranging from ecological modelling4 to software cost estimation5, from radar antenna design6 to quantum circuit design7, and from DNA motif detection8 to digital logic design8.

Grammars have been used with GP almost from its inception. At first, grammar-based approaches formed a small segment of the GP world, but their role has expanded to the point where Grammatical Evolution (GE)9 is now one of the most widely applied GP methods, and in particular areas, notably Estimation of Distribution Algorithms for GP, grammar-based approaches constitute the majority of the current research effort. However the literature on grammar-based GP is widely scattered, with significant portions only available in difficult-to-obtain conference proceedings, resulting in many researchers (including ourselves) re-inventing the wheel. It seems that grammar-based GP has now reached a stage of maturity justifying – indeed, almost demanding – a concerted effort to draw together the threads of the field.

In section 2, we will provide a historical framework for the field, and fit the published work into this framework. In section 3, we will attempt to unravel the advantages – and disadvantages – that grammar-based approaches as a whole, and particular implementations in more detail, bring to GP. We will summarise our conclusions in section 4.
2 History
2.1 Foreshadowings
Antonisse10 appears to have been the first researcher to propose the use of context-free grammars (CFGs) to define the chromosomes of a GA. The proposed system, known as the grammar-based GA, uses as chromosomes the strings of the language generated by a CFG G. When crossover is to be carried out between two chosen individuals, they must first be parsed into two derivation trees of G. The crossover then acts in a similar way to that of the tree-based Grammar-Guided GP (GGGP) systems described in the next subsection.

The use of the string set as the chromosome, and the consequent need to parse strings, creates two important difficulties in this approach (perhaps explaining the dearth of implementations). Firstly, parsing is an expensive (and, as will subsequently be seen, unnecessary) process, greatly increasing the computational cost of this approach. Secondly, if the grammar is ambiguous, a string may have more than one derivation tree. Antonisse therefore suggested that the use of ambiguous grammars be forbidden. He argued that this restriction was not problematic, as ambiguous grammars would be bad for evolution in any case – a conclusion cast in considerable doubt by a recent publication of Nguyen et al11.

Some years after Antonisse's proposal, a number of researchers suggested that grammars might be used to control the structure of programs in GP. Stefanski12 proposed the use of abstract syntax trees to set a declarative bias for GP; Roston13 demonstrated how a formal grammar might be used to specify constraints for GP in the context of engineering design; and Mizoguchi et al14 suggested the use of production rules to generate hardware language descriptions during the evolutionary process. Roston used the grammar to generate the initial population only, while the other proposals gave only a theoretical description and do not appear to have been implemented. None of these systems maintained the derivation trees of the grammar, either re-parsing from the string set as for Antonisse (Stefanski) or combining with Strongly Typed GP15 (Roston).
2.2 Grammar-Guided Genetic Programming
2.2.1 Tree-Based GGGP
The first fully-fledged GGGP systems were invented independently, at about the same time, by three different researchers. Whigham16,17 proposed the CFG-GP system, in which a context-free grammar is used to generate the population, consisting of derivation trees of the CFG. Schultz18 derived his very similar GGGP system for learning knowledge rules for expert systems; it differs mainly in the algorithm used to initialise the population19. Wong and Leung20-26 proposed the LOGENPRO system, which uses PROLOG Definite Clause Grammars (DCGs) to generate (logic) programs to learn first-order relations. DCGs are somewhat more expressive than CFGs, being able to generate some context-sensitive languages. This is the primary difference between the systems; in other respects, LOGENPRO and CFG-GP are very similar. Given the similarity of the three systems, we base the description below primarily on CFG-GP. The five basic components of a GGGP system (several of which are illustrated in the code sketch at the end of this subsection) are as follows:

1. Program Representation: Each program is a derivation tree generated by a grammar G.

2. Procedure for Initialising the Population: Whigham17 proposed a simple algorithm to generate random derivation trees up to a depth bound, based on a procedure for labelling the production rules with the minimum tree depth required to produce a string of terminals. Bohm and Schultz19 derived an algorithm for initialising the population based on the derivation-step-uniform distribution. The initialisation procedure for LOGENPRO was 'embedded in PROLOG'27.

3. Fitness Evaluation: Fitness evaluation is carried out on the individuals by reading the expression tree from the leaves of the grammar derivation tree, then evaluating it as in standard GP.

4. Genetic Operators: The genetic operators are the selection mechanism, reproduction, crossover, and mutation. Selection and reproduction are as in standard GP.
• In crossover, two internal nodes labelled with the same nonterminal symbol of the grammar are chosen at random, and the two sub-derivation trees underneath them are exchanged. In other words, the crossover acts the same as standard GP subtree crossover, with the additional constraint that crossover points are required to have the same grammar label. Genetic recombinations in biological systems are usually homologous – the genetic materials are not exchanged in completely random fashion, but between genes that have similar function28. By analogy, we term crossovers exchanging sub-derivation trees stemming from the same nonterminal symbols 'homologous crossovers'.
• Mutation is performed by selecting an internal node at random. The nonterminal on this node is noted, and the sub-derivation tree rooted there is deleted. A new sub-derivation tree starting from the same nonterminal, generated according to the grammar, is randomly created to replace the deleted one.

5. Parameters: Parameters include population size (POPSIZE), maximal number of generations (MAXGEN), maximal depth for the individuals, and probabilities for the application of the operators.

Subsequent to these systems, there have been a number of similar grammar-guided genetic programming systems using derivation trees from a grammar as the representation for individuals. Gruau29 presented some strong arguments for the use of context-free grammars as a tool to set language bias on individuals. His resultant grammar-guided genetic programming system is very similar to Whigham's CFG-GP; however, no depth bound was used to restrict the size of programs, potentially leading to code bloat problems. Keijzer30 and Ratle and Sebag31 introduced a grammar-guided genetic programming system similar to Whigham's CFG-GP, but with a different initialisation process, to solve some industrial problems; the resultant system was known as dimensionally-aware genetic programming. Some recent variants of these approaches were implemented in specific programming or web languages, with the CFG represented in Backus-Naur Form (BNF), by Tanev32-34 and MacCallum35. Rodrigues and Pozo36 introduced Automatically Defined Functions (ADFs) into GGGP.
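Returning to the basic components above, the following minimal Python sketch generates random derivation trees under a depth bound, using Whigham-style minimum-depth labelling of the productions, and performs homologous crossover. The toy grammar, function names, and parameter values are our own illustrative assumptions, not taken from any of the cited systems.

```python
import random

# Toy CFG: nonterminals are the dict keys; all other symbols are terminals.
GRAMMAR = {
    "EXP": [["EXP", "OP", "EXP"], ["VAR"]],
    "OP":  [["+"], ["-"], ["*"]],
    "VAR": [["x"], ["1.0"]],
}

def min_depths():
    """Label each nonterminal with the minimum derivation-tree depth needed
    to reach an all-terminal string (computed by fixed-point iteration)."""
    d = {nt: float("inf") for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            best = 1 + min(max((d.get(s, 0) for s in p), default=0)
                           for p in prods)
            if best < d[nt]:
                d[nt], changed = best, True
    return d

MIN_DEPTH = min_depths()

def grow(sym, bound):
    """Randomly expand sym, using only productions that can still terminate
    within the remaining depth bound (component 2)."""
    if sym not in GRAMMAR:
        return (sym,)                       # terminal leaf
    feasible = [p for p in GRAMMAR[sym]
                if max(MIN_DEPTH.get(s, 0) for s in p) < bound]
    prod = random.choice(feasible)
    return (sym,) + tuple(grow(s, bound - 1) for s in prod)

def crossover(a, b):
    """Homologous crossover: swap sub-derivation trees rooted at nodes
    carrying the same nonterminal label (component 4)."""
    def nodes(t, path=()):
        found = [(path, t)] if t[0] in GRAMMAR else []
        for i, c in enumerate(t[1:], 1):
            found += nodes(c, path + (i,))
        return found
    def replace(t, path, new):
        if not path:
            return new
        i = path[0]
        return t[:i] + (replace(t[i], path[1:], new),) + t[i + 1:]
    pa, sa = random.choice(nodes(a))
    matches = [(p, s) for p, s in nodes(b) if s[0] == sa[0]]
    if not matches:                 # no homologous point: crossover fails
        return a, b
    pb, sb = random.choice(matches)
    return replace(a, pa, sb), replace(b, pb, sa)

parent1, parent2 = grow("EXP", 6), grow("EXP", 6)
child1, child2 = crossover(parent1, parent2)
```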
2.2.2 Alternative Grammar Representations
2.2.2.1 CFG Extensions
GGGP systems using CFGs are the commonest, and in some sense the most basic, form. However there has been a wide range of grammar representations used. In many, the major extension is the use of a more expressive grammar representation, so as to represent a wider class of problem spaces than Context-Free Languages (CFLs) – although it is noticeable that few of the problems studied with these systems actually go beyond CFLs. Of course, the most notable of these, in terms of both research priority and breadth of practical application, is the LOGENPRO system previously discussed. Others include Bruhn and Schultz's37 CFGs with linear constraints; Hussain and Browse's38,39 and Zvada and Vanyi's40 attribute grammars; and Ross's41 Definite Clause Translation Grammars (DCTGs) – an extension of the DCG grammars used in Wong and Leung's LOGENPRO system to incorporate semantics.
2.2.2.2 Tree Adjoining Grammar Representation
The grammar representations previously mentioned are all closely related to the Chomsky grammars of the 1950s. The differences between them relate essentially to the sub-class of formal languages that they can represent, and the means by which they may incorporate semantic constraints. However another form of representation, Tree Adjoining Grammars (TAGs), has become increasingly important in Natural Language Processing (NLP) since their introduction in the 1970s by Joshi et al42. The aim of TAG representation is to represent the structure of natural languages more directly than is possible in Chomsky languages, and in particular, to represent the process by which natural language sentences can be built up from a relatively small set of basic linguistic units by inclusion of insertable sub-structures. Thus 'The cat sat on the mat' becomes 'The big black cat sat lazily on the comfortable mat which it had commandeered' by insertion of the elements 'big', 'black', 'lazily', 'comfortable', and 'which it had commandeered'. In CFG representation, the relationship between these two sentences can only be discerned by detailed analysis of their derivation trees; in TAG representation, the derivation tree of the latter simply extends the frontier of the former. To put it another way, the edit distance between the derivation trees of these closely related sentences is much smaller in TAG representation than in CFG representation. The valuable properties which TAG representation introduces into NLP are arguably also of value in GP; in a series of papers, Nguyen has introduced two TAG-based GGGP systems, using both linear43 and tree-structured44-47 TAG representation.
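The insertion process above is TAG's adjunction operation. The following minimal sketch (with toy trees and labels of our own devising, not drawn from the cited work) shows how adjoining an auxiliary tree for 'big' at a noun node extends the tree for 'The cat sat on the mat' locally, leaving the rest of the structure untouched.

```python
# Trees are nested tuples (label, child, ...); "*" marks the auxiliary
# tree's foot node. All trees and labels here are illustrative assumptions.

# Simplified initial tree for "the cat sat on the mat".
CAT = ("S",
       ("NP", ("the",), ("N", ("cat",))),
       ("VP", ("V", ("sat",)),
              ("PP", ("on",), ("NP", ("the",), ("N", ("mat",))))))

# Auxiliary tree for 'big': root and foot are both N, so it may adjoin
# at any N node.
BIG = ("N", ("big",), ("N*",))

def contains(tree, label):
    return tree[0] == label or any(contains(c, label) for c in tree[1:])

def plug(aux, subtree):
    """Re-attach the displaced subtree at the auxiliary tree's foot node."""
    if aux[0] == subtree[0] + "*":
        return subtree
    return (aux[0],) + tuple(plug(c, subtree) for c in aux[1:])

def adjoin(tree, aux, label):
    """Adjoin aux at the first node labelled `label`: that node's subtree
    is detached, the auxiliary tree takes its place, and the subtree is
    plugged back in at the foot node."""
    if tree[0] == label:
        return plug(aux, tree)
    out, done = [tree[0]], False
    for child in tree[1:]:
        if not done and contains(child, label):
            out.append(adjoin(child, aux, label))
            done = True
        else:
            out.append(child)
    return tuple(out)

# "The cat sat on the mat" -> "The big cat sat on the mat": one local
# insertion, where the corresponding CFG derivation trees differ throughout.
BIGGER = adjoin(CAT, BIG, "N")
```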
2.2.3 Linearised GGGP
Linear representations offer a number of seductive advantages over tree-based representations, and especially over the highly constrained tree-based representations arising from Chomsky grammars. Most obviously, linear representations permit the application of a vast background of both theory and practice from the much wider fields of evolutionary computation with linear representations, notably Genetic Algorithms (GAs) and Evolution Strategies (ES). Thus linear representations have become increasingly important in the non-grammar-based GP world, including such approaches as linear GP48, machine-code GP49 and stack-based GP50. Hence it is not surprising that there have been a number of approaches to linearising the representation of programs in GGGP. All use a genotype-to-phenotype mapping, where the genotype is a linear sequence, and the phenotype is the derivation tree of the grammar.

In the earliest work of Keller and Banzhaf51, Paterson and Livesey52,53 and Freeman54, the genotype was a fixed-length string used to encode the indices of derivation rules in G. In these methods, the translation from genotype to phenotype is carried out from left to right, and the phenotype (a derivation tree of G) is built correspondingly. At each step, if there is still an uncompleted branch in the phenotype marked by a nonterminal A, a gene (a number of bits) is read and interpreted as a key, indicating which, among the rules in the rule set P of G having left-hand side A, will be used to extend that branch of the phenotype. If the phenotype is completed while there are still unused genes in the genotype, they are ignored (considered as introns). In the event that the translation uses all the genes, but the phenotype is still incomplete, some random or default sub-derivation trees are used to complete the phenotype. Because the genotype is now a linear string, the system can simply use any genetic operators from GAs.

The most widely used linear GGGP system is Grammatical Evolution (GE)9,55,56. GE is an extension of the GGGP systems with linear representation described above. Three innovations were incorporated in GE: variable length, redundancy using the MOD rule, and the wrapping operation. The chromosome in GE has variable length rather than being fixed. Each gene is an 8-bit binary number, which is used, as in the previous systems, to determine the rule for a nonterminal symbol when it is expanded; however the modulo operator is now used to bring the key within range of the number of rules with the given LHS. The wrapping operation is used when the translation from genotype to phenotype has run out of genes while the phenotype is still incomplete: the translation process re-uses the genes from left to right. If the number of wrappings exceeds a predetermined maximal bound, the translation finishes and the (invalid) individual is assigned a very low fitness.

Since it was proposed, there has been a wide range of ongoing research to develop, extend, and apply GE in many ways, including study of the effect of GE crossover on the phenotype57, alternatives to the MOD rule in genotype-phenotype translation58, different search strategies59, new representations based on the GE representation aiming to reduce the effects of positional dependence60, implementation of GAs through GE using an attribute grammar61, and so on.

There are two key issues with the GE representation. Firstly, an apparently valid genotype may code for an infeasible (i.e. not fitness-calculable) phenotype.
Although the problem can be handled by assigning these individuals very low fitness values, it introduces a source of irregularity into the search space, and constitutes an obstacle for evolution and search on the GE representation if the proportion of genotypes coding for infeasible phenotypes is large. A second issue with GE is that it does not fulfil the causality principle: that small changes in genotype should result in small changes in phenotype. Although it is easy to define GE operators that make small (bounded) and controllable changes on the genotype, the resulting changes on the phenotype are unpredictable (and uncontrollable). A small change in a gene at one position may completely change the expressiveness (coding or non-coding), or even the meaning (if there is more than one nonterminal in the grammar), of all the genes following that position. In the extreme, it may change the corresponding phenotype from feasible to infeasible.
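The genotype-phenotype mapping underlying this discussion is compact enough to sketch directly. The following minimal Python sketch translates a variable-length integer genome into a sentence using the MOD rule and wrapping, returning None for an infeasible genotype; the toy grammar and parameters are our own assumptions, and details such as exactly when a gene is consumed vary between GE implementations.

```python
# Toy grammar; dict keys are nonterminals. Illustrative assumption only.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>":   [["+"], ["-"], ["*"]],
    "<var>":  [["x"], ["1.0"]],
}

def ge_map(genome, start="<expr>", max_wraps=2):
    """GE genotype-to-phenotype mapping: each gene, taken MOD the number of
    applicable rules, picks the production for the leftmost nonterminal.
    Returns None when the wrapping bound is exceeded (infeasible)."""
    sentence, pending = [], [start]
    gene, wraps = 0, 0
    while pending:
        sym = pending.pop(0)               # leftmost derivation
        if sym not in GRAMMAR:
            sentence.append(sym)           # terminal: emit it
            continue
        if gene == len(genome):            # out of genes: wrap around
            gene, wraps = 0, wraps + 1
            if wraps > max_wraps:
                return None                # invalid: assign very low fitness
        rules = GRAMMAR[sym]
        choice = genome[gene] % len(rules) # the MOD rule
        gene += 1
        pending = rules[choice] + pending
    return " ".join(sentence)

print(ge_map([3, 0, 1, 7, 2, 1, 0]))       # prints 'x'
```

Note how the non-causality issue discussed above is visible here: changing the first gene from 3 to 2 flips the very first production choice, and every later gene is then reinterpreted against a different sequence of nonterminals.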
2.3 Estimation of Distribution Algorithms using Grammars
If grammar representations have a substantial presence in GP, it is in the extension of Estimation of Distribution Algorithms (EDAs) to GP problem spaces that grammar representations have become predominant. EDAs, and the closely related ant algorithms, rely on a probability model to record the progress of evolution. Stochastic grammars have shown themselves to be an excellent candidate for this probability model.
2.3.1 Background
2.3.1.1 Linear Estimation of Distribution Algorithms
One common theory to explain the behaviour of GAs, and of GP, is that through the genetic operations – especially crossover – building blocks, which are high-quality partial solutions, are discovered and combined to form better solutions62,63. The underlying assumption of EDAs is that, if we can learn these building blocks directly, instead of applying the semi-blind genetic operators (which may destroy building blocks), we may hope to significantly improve the performance of GAs and GP64,65,66; a probabilistic model of the set of solutions may also, in many cases, be more useful than a single solution. EDA builds upon this idea. It uses a probabilistic model to estimate the distribution of promising solutions, and thus to guide further exploration of the search space. Through iteratively building and sampling this probabilistic model, the distribution of good solutions is (hopefully) approximated. Since the true distribution is unknown, we must introduce a probabilistic model to approximate it, and the form of the probabilistic model is strongly related to specific assumptions about building blocks.

EDAs work as follows. Assume Z is the vector of variables we are interested in, and let D_H(Z) be the probability distribution of individuals whose fitnesses exceed some threshold H. If we knew D_Hopt(Z) for the optimal fitness Hopt, we could find a solution by simply drawing a sample from this distribution. However, usually we do not know this distribution, so we start from a uniform distribution. In the commonest form of the algorithm, we generate a population P of n individuals, and then select a set of good individuals G from P. Since G contains only selected individuals, it represents that subset of the search space which is worth further investigation. We now estimate a probabilistic model M(ζ, θ) from G, where ζ is the structure of the model M and θ is the associated parameter vector. With this model M, we can obtain an approximation D̄_H^M(Z) of the true distribution D_Hopt(Z). To further explore the search space, we sample the distribution D̄_H^M(Z), and the new samples are then re-integrated into population P by some replacement mechanism. This starts a new iteration.

There are many variants of EDA. For example, each iteration may create a new population, or may simply replace part of the old population with newly generated individuals. The system may learn a model from scratch or simply update the previous model. Because we have no prior knowledge of the true distribution, we have to assume that the distribution follows some well-studied model; the accuracy of the chosen model with respect to the true model is therefore vital to the EDA algorithm, and much current EDA research is devoted to finding appropriate models. Indeed, their probabilistic models are one of the natural ways to differentiate EDA methods. Most EDA research is concerned with linear string models similar to those used in Genetic Algorithms. Since our concern is with the use of grammars in GP search spaces, we will concentrate on that small body of research which has explored these spaces.
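To make the loop concrete, here is a minimal sketch of one EDA iteration scheme using the simplest possible model – independent per-variable probabilities over bit strings, in the style of PBIL64. The onemax fitness function and all parameter values are illustrative assumptions.

```python
import random

N_BITS, POP_SIZE, N_SELECT, N_ITERS, LR = 20, 100, 25, 50, 0.2

def fitness(x):                  # onemax: a stand-in fitness function
    return sum(x)

probs = [0.5] * N_BITS           # uniform initial distribution
for _ in range(N_ITERS):
    # Sample a population P from the current model.
    pop = [[int(random.random() < p) for p in probs]
           for _ in range(POP_SIZE)]
    # Select the good set G (truncation selection).
    good = sorted(pop, key=fitness, reverse=True)[:N_SELECT]
    # Re-estimate the model parameters theta from G (the structure zeta is
    # fixed here: all variables are assumed independent).
    for i in range(N_BITS):
        freq = sum(x[i] for x in good) / N_SELECT
        probs[i] += LR * (freq - probs[i])

print(max(map(fitness, pop)))
```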
2.3.1.2 PIPE and Related Non-Grammar EDA-GP
The earliest application of EDA to GP came very early in the history of EDA: Probabilistic Incremental Program Evolution (PIPE)67. PIPE is heavily motivated by the corresponding work in conventional EDA. Further studies based on the probabilistic model proposed in PIPE have followed, notably ECGP68 and EDP69. Interestingly, PIPE and the related GP-based works fit very directly into the same framework as conventional EDA. The prototype tree of PIPE is a model assuming independence among random variables, so that the tree structure has no effect on the probability estimation phase of the EDA algorithm (of course, it does affect the sampling phase). EDP extends this to pairwise dependences, while ECGP extends the dependency model to multivariate dependence.

PIPE uses the Probabilistic Prototype Tree (PPT) to represent the probability distribution of tree-form programs. The PPT is a full tree of maximum arity and maximum depth. Each node in the tree holds a probability table containing probabilities for all the symbols of the language. The algorithm is essentially that of PBIL64. PIPE iteratively generates successive populations of tree-form programs using the distribution encoded by the PPT. Starting at the tree root, PIPE generates individual trees by sampling the possible symbols according to the probabilities at the corresponding nodes of the PPT. Having sampled the distribution in this way, it then uses truncation selection to select a sub-population, and updates the probabilities of the PPT to increase the probability of generating this sub-population. Thus PIPE can be regarded as a form of PBIL in which the evaluation function is built up by treating the separate genes as generating a tree structure, and then evaluating this tree structure. PIPE implicitly assumes that GP building blocks are position-dependent: in the PPT, unlike most GP systems, the useful subtrees/building blocks are attached to specific positions, and cannot be easily moved to other positions.

Estimation of Distribution Programming (EDP)69 extends PIPE by modeling the conditional dependency between adjacent nodes in the PPT. Yanai and Iba argue that strong dependence is likely to exist between a node and its parent, grandparent and sibling nodes. For computational cost reasons, the implemented system actually restricts consideration to pairwise parent-child dependencies.

ECGP68 further extends the PIPE modeling, applying the approach of ECGA70 to a tree representation. The probabilistic model is based on the PPT; each node of the PPT is a random variable. Marginal product models (MPMs) are used to model the population of genetic programming. MPMs are formed as a product of marginal distributions on a partition of the random variables. For example, in ECGA, the MPM [1,3][2][4] for a four-bit problem represents that the first and third genes interact intensively, while the second and fourth are independent. ECGP partitions the PPT into subtrees, and the MPM factorises the joint probabilities of nodes of the PPT into a product of marginal distributions on this partition. The optimal MPM is found by a greedy search using a Minimum Message Length heuristic; since the partitions are subtrees, it appears that the search operator is spatial extension of a subtree to include a neighbouring node.

PPT-based EDA-GP systems are able to effectively represent a population of GP-like solution trees.
The PPT model is closely analogous to the underlying tree representation, and thus provides a clear analogy to GP. However it is not so clear that there is a close correspondence between the evolution of GP building blocks and the learning which occurs in a PPT-based EDA system. What is explicitly learnt in a PPT-based EDA system is subtrees in particular locations in the PPT (of size 1 in PIPE, size 2 in EDP, and variable size in ECGP). This corresponds relatively closely to the rooted schemata of Rosca71 and Poli72,73. However these rooted schemata were chosen largely because they are amenable to mathematical treatment, and can yield exact schema theorems as opposed to the inexact schema theorems which result from the non-rooted schemata of O'Reilly74 and of Whigham75, rather than because they capture the entire effects of building blocks in the behaviour of GP. In fact, the success in GP of modularisation methods such as Automatically Defined Functions76 and Automatically Defined Macros77 demonstrates the importance in GP of relocation and replication of building blocks. These effects cannot be captured in PPT-based models in their present forms: each building block must be learnt in a particular location, and if a similar structure occurs in a different location in the PPT, it must be learnt separately. Such models have lost the relocation and replication power of crossover (and for this reason, we would predict that crossover would make a useful local search operator in PPT-based GP systems).
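The following minimal sketch shows the two PPT operations described above – sampling a tree-form program top-down, and shifting node probabilities toward a selected program. The symbol set, arities, and learning rate are illustrative assumptions, not PIPE's exact choices.

```python
import random

ARITY = {"+": 2, "*": 2, "x": 0, "1": 0}   # toy symbol set (an assumption)
MAX_DEPTH = 4
MAX_ARITY = max(ARITY.values())

def new_node(depth=0):
    """A PPT node: a probability table over symbols plus child nodes; the
    full tree has maximum arity, with only terminals at maximum depth."""
    syms = [s for s, a in ARITY.items() if depth < MAX_DEPTH or a == 0]
    table = {s: 1 / len(syms) for s in syms}
    kids = ([new_node(depth + 1) for _ in range(MAX_ARITY)]
            if depth < MAX_DEPTH else [])
    return {"table": table, "kids": kids}

def sample(node):
    """Draw one tree-form program from the PPT's node distributions."""
    (sym,) = random.choices(list(node["table"]),
                            weights=list(node["table"].values()))
    return (sym,) + tuple(sample(node["kids"][i])
                          for i in range(ARITY[sym]))

def reinforce(node, program, lr=0.1):
    """PBIL-style update: move each visited node's table toward the symbol
    the selected program used there (a sketch of PIPE's update direction,
    not its exact formula)."""
    sym = program[0]
    for s in node["table"]:
        target = 1.0 if s == sym else 0.0
        node["table"][s] += lr * (target - node["table"][s])
    for i, sub in enumerate(program[1:]):
        reinforce(node["kids"][i], sub, lr)

ppt = new_node()
selected = sample(ppt)    # in PIPE: the fittest of a sampled population
reinforce(ppt, selected)
```

The position dependence discussed above is immediate from the structure: `reinforce` only ever strengthens a subtree at the exact PPT location where it was sampled.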
2.3.1.3 Ant-Based GP
At least in the context of GP, ant-based algorithms and EDA algorithms are very closely related, usually differing only in their probability update policies (statistically based in EDAs, biologically based on pheromone deposition and evaporation in ant algorithms). In parallel with the EDA-GP developments, a number of ant-based GP algorithms have been proposed in recent years. We consider the non-grammar variants here. The earliest application known to us is Roux and Fonlupt's Ant Programming78. Although the terminology is different, Ant Programming appears to us very similar to PIPE, differing only in the update policy for the probability tables. Two subsequent ant-based algorithms, Boryczka and Czech's Ant Colony Programming (ACP)79 and Green et al's Ant Colony Optimisation (ACO)80, use predefined grids labeled with GP symbols as probability models, with ant algorithms being used to determine paths representing good programs. In Ant Colony Programming, one variant has a single fully connected graph node per language symbol, so that solutions requiring the same symbol to be used multiple times with different arguments will lead to convergence problems. Another variant permits multiple occurrences of language symbols; it is not clear from our reading whether these graph structures are initialised by hand or randomly created. ACO uses a randomly initialized graph with a chosen level of symbol redundancy and connectivity. Both approaches rely on the predefined graph to determine the dependency relationships between symbols, and intrinsically can only encode pairwise dependency relationships at any given location. Rojas and Bentley's Grid Ant Colony Programming81 resembles ACP and ACO in some ways, but introduces a temporal index, corresponding closely to the depth constraint of some of the grammar-based systems described below.
2.3.2 Grammar-Based GP-EDAs
Grammar-based GP-EDAs use some form of grammar as the probabilistic model on which the distribution learning is based. The commonest grammar form is a stochastic CFG (SCFG), in which each grammar production has an attached probability; when the grammar is used to generate a tree, productions are applied stochastically, driven by the attached probabilities. Grammar-based EDAs fall naturally into two camps: EDAs using a fixed grammar as the probability model, with only the probabilities changing (analogous to PIPE), and EDAs in which the grammar model, as well as the probabilities, changes as search progresses.

Grammar-based GP-EDA was prefigured in Whigham's grammar learning82. It was originally described from the then-dominant perspective of a GP system, as a grammar-based GP system with a non-uniform (learnt) probability distribution over the grammar productions, with inductive modification of the grammar during search. From an EDA perspective, it could just as readily be described as an SCFG-based EDA with genetic local search and ad-hoc grammar model update. Tanev's system83 is similarly described as GP with an added Stochastic Context-Sensitive Grammar (SCSG) learning component, but can also be regarded as a grammar-based EDA with contextual structure learning similar to that of PEEL84.

However the first avowedly EDA-based GP work was the Stochastic Grammar Genetic Programming (SG-GP) of Ratle and Sebag85,86. The model used in SG-GP is an SCFG, initialised to a uniform distribution over the productions for each nonterminal. SG-GP uses both positive and negative reinforcement, increasing the probabilities used to produce fit individuals, and decreasing those used to produce unfit individuals. SG-GP came in two variants, called the scalar and vectorial versions. In the scalar version, there is a single probability table for the grammar. However this version suffers from convergence problems similar to those of the first version of ACP described above: the initial grammar may not be sufficiently complex to record the structure of the problem solution, leading to convergence problems as different parts of good solutions impose countervailing pressures on the probability values for a given nonterminal. SG-GP alleviates this problem by imposing a partial positional constraint: in vectorial SG-GP, there is a depth dimension to the probability distribution. This is implemented by attaching a vector (of length equal to the maximum permitted program depth) to each grammar production. The elements of this vector represent the probabilities of applying the given production at the corresponding depths in the derivation tree. Thus a given rule may be applied with high probability at one depth in the derivation tree, but with a low probability at another. Vectorial SG-GP thus has partial positional dependence (resembling PIPE in this respect), but the positional dependence is only on depth, not on breadth: building blocks may (indeed must) be shared between nodes at the same depth.

A remarkably similar mechanism was used as the basis of the (independently derived) Program Evolution with Explicit Learning (PEEL)84, which also uses SCFGs with a depth constraint – though described in that publication as a stochastic parametric L-system. However PEEL also adds a location constraint – initially empty (hence equivalent to SG-GP), but gradually evolving to incorporate a context (ie parent and possibly deeper ancestor nodes) in which the production may be applied. In this way, PEEL's initial bias is to share building blocks at a given depth, but it may learn that particular building blocks are not to be shared. The context learning algorithm relies upon a heuristic that detects when particular nonterminals are both highly influential and poorly converged. The underlying assumption is that if a nonterminal is heavily used, yet its probability does not converge, it is likely that there are conflicting uses of the nonterminal which require different probability distributions, and that these conflicts may be resolved by determining the context in which the nonterminal is used.

Bosman and de Jong87 also make use of an SCFG with a depth constraint, independently deriving a very similar approach to SG-GP and PEEL (but with additional complications arising from maintaining as the population only the derived sentences of the grammar, as in Antonisse10, rather than the derivation trees as in many later GGGP systems – this imposes the computational cost of parsing in each generation, and requires grammar restrictions to avoid ambiguous grammars).
Like PEEL, Bosman and de Jong learn the structure of the grammar as well as the probability distribution, using greedy search with rule expansion as the search operator, and a Minimum Description Length (MDL) metric as the search heuristic.

A rather different SCFG-based system is Generalised Ant Programming (GAP)88. GAP records the whole path an ant has visited, building path-based rather than content-based transition tables, resulting in a grammar-based system somewhat resembling EDP (in that the probabilities represent pairwise interactions between nodes). Ant-TAG89,90 resembles scalar SG-GP in using only a single grammar probability table, but uses a stochastic TAG rather than an SCFG as the probability model. Possibly as a consequence, it was not particularly successful as a pure system, but gave good results when local search based on the crossover operator was added.

Grammar Model-based Program Evolution (GMPE)91 takes a different approach. Whereas the structure-learning systems described above learn the grammar model incrementally, using ad-hoc heuristics to expand the grammar, GMPE uses standard grammar learning methods from NLP, and learns the grammar model anew from the selected population each generation. It starts from a very specific grammar, in which the rules directly describe the selected population, and uses stochastic hill-climbing search, with rule merging as the sole operator and a carefully derived Minimum Message Length (MML) metric as the search metric. GMPE has demonstrated extraordinarily good search performance in terms of the number of fitness evaluations required to find a solution, but is not suitable for application to practical problems because of the very high computational overhead embodied in the grammar learning algorithm. GMPE's computational cost arises because complex MML calculations are required for each merge, to ensure that the merge does not over-generalise.

A very recent, and less conservative, variant – simple GMPE (sGMPE)92 – relies on the observation that the outer EDA algorithm is a specialising algorithm (the specialisation resulting from the selection step), so that a small degree of over-generalisation in the learning phase can be tolerated, so long as it is not sufficiently large as to counteract the specialisation resulting from selection (which would lead to stagnation). Some types of GMPE merges (unifications) will always lead to a reduction in MML, since they do not cause any generalisation. sGMPE first deterministically carries out all available unifications, then stochastically applies a user-tunable number of generalising merges, the idea being that the user tunes the number of merges to give satisfactory performance without over-generalisation. sGMPE has delivered a slight performance improvement over GMPE in terms of number of evaluations per solution, but a halving of the computational cost of the learning phase, at the cost of adding an extra tuning parameter to the algorithm; performance is relatively robust to changes in this parameter.
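The stochastic-grammar probability model that SG-GP, PEEL, and related systems share is easy to sketch. Below is a minimal illustration, under our own assumptions (toy grammar, learning rate, and update rule), of an SCFG with vectorial (depth-indexed) production probabilities: sampling a derivation tree, then reinforcing the productions a selected individual used, at the depths where it used them.

```python
import random

GRAMMAR = {  # toy grammar: an illustrative assumption
    "EXP": [["EXP", "+", "EXP"], ["x"], ["1.0"]],
}
MAX_DEPTH = 6

# prob[nt][d] is a distribution over nt's productions at derivation depth d.
prob = {nt: [[1 / len(ps)] * len(ps) for _ in range(MAX_DEPTH)]
        for nt, ps in GRAMMAR.items()}

def derive(sym, depth=0):
    """Sample a derivation tree using the depth-indexed distributions;
    nodes are (nonterminal, rule-index, children...) or (terminal,)."""
    if sym not in GRAMMAR:
        return (sym,)
    prods = GRAMMAR[sym]
    weights = list(prob[sym][depth])
    if depth == MAX_DEPTH - 1:           # depth bound: only terminal rules
        weights = [0.0 if any(s in GRAMMAR for s in p) else w
                   for w, p in zip(weights, prods)]
    (i,) = random.choices(range(len(prods)), weights=weights)
    return (sym, i) + tuple(derive(s, depth + 1) for s in prods[i])

def reinforce(tree, depth=0, lr=0.1):
    """Positive reinforcement: raise the probability, at this depth, of each
    production used in a fit individual's derivation, then renormalise.
    (SG-GP also applies the inverse update for unfit individuals.)"""
    if tree[0] not in GRAMMAR:
        return
    sym, i = tree[0], tree[1]
    row = prob[sym][depth]
    row[i] += lr
    total = sum(row)
    prob[sym][depth] = [w / total for w in row]
    for child in tree[2:]:
        reinforce(child, depth + 1, lr)

fit_individual = derive("EXP")           # in SG-GP: selected by fitness
reinforce(fit_individual)
```

Because the table is indexed by depth alone, a production reinforced at depth d is shared by every node at that depth, which is exactly the breadth-sharing property of vectorial SG-GP noted above.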
3 What do Grammars Bring to GP?
The important place of grammars in modern GP suggests that grammars bring some valuable benefits; however there are also costs associated with these benefits. What are these benefits and costs, and how do they affect grammar-based GP systems? We consider, first, properties which apply to most or all grammar representations, and then some specific properties which are limited to only a few representations.
3.1 Properties of Grammars in General
3.1.1 Declarative Search Space Restriction
Perhaps the most obvious consequence of using grammars in GP is the ability they provide to restrict the search space. This has been the primary justification for the use of GGGP almost from the start. The primary benefit of restricting the search space is, of course, to reduce the search cost to the minimum necessary to find a solution, but it comes with the concomitant risk that the solution may not be within the search space defined by the grammar, or, perhaps more seriously, that the solution may be isolated by the grammar constraints and may be difficult to reach. One of the commonest uses of grammars in GP applications has been to impose type restrictions; when used in this way, the approach is equivalent to Strongly Typed Genetic Programming (STGP)15. Another common use is to exclude solutions of a particular form – for some problems, solutions of particular forms may have high fitness but low usefulness, so the simplest way to ensure that they do not confuse the search is to exclude them from the search space entirely. In most GGGP systems, the grammar is one of the inputs to the system – provided in Backus-Naur Form or a similar formalism. Thus the search space is readily altered, simply by changing the grammar input. A common mode of operation with GGGP systems is to start with a very general grammar, and then to iteratively refine the grammar to narrow the search space as the results of earlier searches are obtained. This interactive style of use permits the user to influence the search process to a greater degree than is possible with many other forms of GP.
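For concreteness, a hypothetical BNF grammar of the kind such systems accept as input might look like the following. It restricts a symbolic-regression search space so that boolean conditions and arithmetic expressions can never be interchanged, making every generated tree type-correct by construction (the specific rules are our own illustration, not drawn from any cited system).

```
<expr> ::= <expr> + <expr> | <expr> * <expr>
         | if <cond> then <expr> else <expr>
         | x | 1.0
<cond> ::= <expr> > <expr>
         | <cond> and <cond> | not <cond>
```

Narrowing the search further – for example, forbidding nested conditionals – requires only editing these rules, with no change to the GP system itself.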
3.1.2 Problem Structure
A number of GP problem domains are themselves directly grammar-related. For example, protein motifs are essentially regular expressions delineating (with some level of reliability) a family of proteins. It is not surprising that grammar-based GP systems have figured prominently in motif discovery research.
3.1.3 Homologous Operators
To the extent that the grammar used reflects the semantic structure of the domain, the homologous operators provided by GGGP replace one component of a solution with another with similar function. This homology may be expected to result in more effective search.
3.1.4 Solution Models
Grammars can define not only the very general search spaces required to describe the problem domain, but also more restricted spaces, even down to a single solution. Hence grammars, in common with some other representations, provide a mechanism to delineate the gradually narrowing set of potential solutions. However grammars were specifically designed as a way to represent constrained contextual relationships within families of trees, so it is not surprising that they have shown strengths in this area, providing the ability to incrementally learn the structure of problem solution spaces.
3.1.5 Opportunities for Search Space Transformation
Since the publication of the messy GA93, search space transformation has been a core area of GA research. Outside grammar-based GP, it appears to have been less studied in the GP literature, perhaps because the constraints imposed by GP tree structure limit the available transformations. On the other hand, a wide range of transformations have been proposed for grammar-based representations, transforming the grammar derivation trees to linear strings. All rely on the restrictions imposed by the grammar rules to impose a numbering system on the productions of the grammar. These transformations offer the advantages of a linear genotype – the ability to apply the wide range of tools and methods developed for GAs, and reduced constraint allowing much more flexible genetic operators. They also appear to offer parsimony advantages, with less bloat than is observed in tree-based GP systems. However these advantages come at a cost. For one thing, the operators are no longer homologous after the linearisation transformations. In all these transformations, an alteration early in the genotype often changes the interpretation of the subsequent genotype, so that (depending on the grammar) the proportion of highly disruptive operators may be much higher than in the underlying tree-based representation.
3.1.6 Feasibility Constraints
Grammar-based approaches also bring with them some disadvantages. As with standard GP, grammar-guided genetic programming is still far from resembling the GA in its algorithmic flexibility, despite some significant efforts in this direction. In tree-based GGGP systems, it is even more difficult than in GP to design operators, especially operators making small local changes to the genotype: in addition to the constraints imposed by the GP tree structure, the derivation trees in GGGP are further constrained by the rewrite rules of the formalism. For GGGP systems with linear representation (including, we believe, the recent extensions of GE), the problem is partly solved, in the sense that it is easy to design new operators, including local operators, in the genotype space. However, the non-causality of the genotype-phenotype mapping means that this benefit disappears in the phenotype space.
3.1.7 Turing Incompleteness
The Turing completeness of a GGGP system depends upon the semantics of the grammar used, so that GGGP may deal with both Turing-complete and Turing-incomplete problems. As a result, GGGP systems typically do not intrinsically offer any additional support for specific computational paradigms (recursion, iteration etc) such as is provided in some other GP systems.
3.2 Properties of Specific Grammar Formalisms
3.2.1 Incorporating Semantics
A number of GGGP systems – Ross's DCTG-based system41 and the attribute-grammar systems38,39,40 – incorporate semantics along with the syntax of the grammar. As a result, these systems are highly retargetable. Applying them to a new problem requires only specification of the new grammar and semantics, and definition of the fitness function. The latter is generally the only programming required, and may typically be less than ten lines of code. In incremental learning, where the grammar and fitness function may be altered by the user to guide search, it is often the case that no re-programming is required at all.
3.2.2 Operators
As discussed above, defining new operators – including local search operators and biologically motivated operators – is extremely difficult in standard GGGP systems because of the constrained search space, so most tree-based GGGP systems rely solely on subtree mutation and crossover. The linearisation transformation of GE and like systems makes it easy to define such new operators, and to control their effect in the genotype space, but the relatively uncontrolled rescaling resulting from the linearisation transformation means that the effects in the phenotype space may be very different. One of the key advantages claimed for TAG-based grammar systems46 is that the ease with which new operators may be defined, combined with the lack of disruption of the genotype-phenotype transformation, means that many of the advantages of the GA representation can be recaptured for GP.
3.2.3 Long-Distance Dependencies
One of the key benefits of grammar-based GP is the homology of the operators, which can be viewed as providing less disruptive, and more meaningful, transformation of building blocks. In standard GGGP, the building blocks are connected sub-graphs of the derivation tree. While connectedness is clearly an important aspect of building blocks, it is arguable that many of the important structures in human-generated programs are not locally connected, and require long-distance dependencies just like those in natural language. The TAG transformation permits local dependencies in the genotype space (ie the TAG derivation tree) to map to long-distance dependencies in the phenotype space (CFG derivation trees) in a controlled way corresponding to the structure of the grammar (in TAG representation, the number dependence between 'cat' and 'sits' in 'The cat which has just had a very filling lunch sits on the mat' is a local dependence, whereas it is long-distance in the corresponding CFG representation).
4 Conclusions
Grammars offer many advantages in Genetic Programming, and have been widely used for this purpose. The field is still developing rapidly, with many important research directions currently being explored. Perhaps the key areas are in representation and in search. In representation, there is important current work on alternative linearization transformations, and on alternative grammar representations. In alternative search algorithms (ie EDA and ant-based algorithms), grammar representations offer clear advantages, with the result that grammar-based systems form one of the main threads in this field.
References 1
N.L. Cramer, “A Representation for the Adaptive Generation of Sequential Programs”, in J.J. Grefenstette, editor ”Proceedings of an International Conference on Genetic Algorithms and the Applications”, 183-187, 1985. 2 J.R. Koza, “Genetic Programming: On the Programming of Computers by Natural Selection”, MIT Press, MA, 1992. 3
Lappoon R. Tang, Mary Elaine Califf, and Raymond J. Mooney “An Experimental Comparison of Genetic Programming and Inductive Logic Programming on Learning Recursive List Functions” Technical Report AI 98-271, Artificial Intelligence Lab, University of Texas at Austin, May 1998
4
Whigham, P. A. (2000). Induction of a marsupial density model using genetic programming and spatial relationships. Ecological Modelling 131, 299-317. 5 Shan Y, McKay, R I, Lokan C J and Essam D L "Software Project Effort Estimation using Genetic Programming", IEEE International Conference on Communications, Circuits and Systems, Chengdu, China, July 2002, Pp 1108 - 1112 6 J. D. Lohn, D. S. Linden, G. D. Hornby, W. F. Kraus, A. Rodriguez, S. Seufert, "Evolutionary Design of an X-Band Antenna for NASA's Space Technology 5 Mission," Proc. 2004 IEEE Antenna and Propagation Society International Symposium and USNC/URSI National Radio Science Meeting, Vol. 3, pp. 2313-2316, 2004. 7 Spector, Lee, Barnum, Howard, and Bernstein, Herbert J. 1998. Genetic programming for quantum computers. In Koza, John R., Banzhaf, Wolfgang, Chellapilla, Kumar, Deb, Kalyanmoy, Dorigo, Marco, Fogel, David B., Garzon, Max H., Goldberg, David E., Iba, Hitoshi, and Riolo, Rick. (editors). Genetic Programming 1998: Proceedings of the Third Annual Conference. San Francisco, CA: Morgan Kaufmann. Pages 365–373 8 Koza, John R., Bennett III, Forrest H, Andre, David, and Keane, Martin A. 1999a. Genetic Programming III: Darwinian Invention and Problem Solving. San Francisco, CA: Morgan Kaufmann. 9 M. O'Neill and C. Ryan, Grammatical Evolution, IEEE Transactions on Evolutionary Computation, 5(4), 349-358, 2001. 10
H.J. Antonisse, A Grammar-Based Genetic Algorithm, in G.J.E. Rawlins, editor, Foundations of Genetic Algorithms, Morgan Kaufmann, 1991. 11 Nguyen Xuan Hoai, Y. Shan, and R. I. McKay, Is Ambiguity is Useful or Problematic for Genetic Programming? A Case Study, in Proceedings of 4th Asia-Pacific Conference on Evolutionary Computation and Simulated Learning (SEAL02),IEEE Press, 449-453, 2002. 12 P.A. Stefanski, Genetic Programming Using Abstract Syntax Trees. Notes from the Genetic Programming Workshop (ICGA'93), 1993. 13 G.P. Roston, A Genetic Design Methodology for Configuration Design, PhD. Thesis, Carnegie Mellon University, 1994. 14
J. Mizoguchi, H. Hemmi, and K. Shimohara, Production Genetic Algorithms for Automated Hardware Design through Evolutionary Process, in Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE Press, 85-90, 1994. 15 D.J. Montana, Strongly-Typed Genetic Programming, Evolutionary Computation 3(2), 199-230, 1995. 16
P.A. Whigham, Context-Free Grammar and Genetic Programming, Technical Report CS20/94 Department of Computer Science, ADFA, University of New South Wales, Australia, 1994. 17 P.A. Whigham, Grammatically-Based Genetic Programming, in Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications}, Morgan Kaufmann, 33-41, 1995. 18 A.G. Schultz, Fuzzy Rule-Based Expert Systems and Genetic Machine Learning}, Physica-Verlag, 1995. 19
W. Bohm and A.G. Schultz, Exact Uniform Initialization for Genetic Programming, in R.K. Belew and M.D. Vose, editors, Foundations of Genetic Algorithms 4, Morgan Kaufmann, 379-408, 1997. 20 M.L. Wong and K.S. Leung, Learning First-order Relations from Noisy Databases using Genetic Algorithms,in Proceedings of the Singapore Second International Conference on Intelligent Systems, 159-164, 1994. 21
M.L. Wong and K.S. Leung, An Adaptive Inductive Logic Programming System using Genetic Programming, in Proceedings of the Fourth Annual Conference on Evolutionary Programming, MIT Press, 737-752, 1995. 22 M.L. Wong and K.S. Leung, Combining Genetic Programming and Inductive Logic Programming using Logic Grammars, in Proceedings of the IEEE International Conference on Evolutionary Computing, IEEE Press, 733-736, 1995. 23 M.L. Wong and K.S. Leung, Applying Logic Grammars to Induce sub- functions in Genetic Programming, in Proceedings of the IEEE International Conference on Evolutionary Computing, IEEE Press, 737- 740, 1995. 24 M.L. Wong and K.S. Leung, Genetic Logic Programming and Applications, IEEE Expert 10(5), 68-76, 1995. 25 M.L. Wong and K.S. Leung, Evolutionary Program Induction Directed by Logic Grammars, Evolutionary Computation, 5(2), 143-180, 1997. 26 M.L. Wong and K.S. Leung, Data Mining Using Grammar Based Genetic Programming and Applications, Kluwer Academic Publishers, 2000 27 M. L. Wong, personal communication, 2003 28
M. Ridley, Evolution, Second Edition, Blackwell Science, London, 1996. F. Gruau, On Syntactical Constraints with Genetic Programming, in P. Angeline, and K.E. Kinear Jr, editors, Advances in Genetic Programming 2, MIT Press, 402-417, 1996. 30 M. Keijzer and V. Babovic, Dimensionally Aware Genetic Programming, in W. Banzhaf et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'1999), Morgan Kaufmann, 1069-1076, 1999. 31 A. Ratle and M. Sebag, Genetic Programming and Domain Knowledge: Beyond the Limitations of Grammar-Guided Machine Discovery, in PPSN2000, 211-220, 2000. 32 I. Tanev, DOM/XML-Based Portable Genetic Representation of Morphology, Behavior and Communication Abilities of Evolvable Agents, in Proceedings of the 8th International Symposium on Artificial Life and Robotics (AROB-03)}, 185-188, 2003. 33 I. Tanev and K. Shimohara, On the Role of Implicit Interation and Explicit Communications in Emergence of Social Behaviour in Continuous Predators-Prey Pursuit Problem, {\em Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'2003)}, Springer-Verlag, 74-85, 2003. 29
34
I. Tanev and M. Brozozowski, The Effect of Explicit Communications on the Generality and Robustness of Evolved Teams of Agents in Predator-Prey Problem, in S.B. Cho, Nguyen Xuan Hoai, and S. Yin, editors, Proceedings of the First Asia-Pacific Workshop on Genetic Programming (ASPGP'2003), 31-37. 35 R. M. MacCallum, Introducing a Perl Genetic Programming System: and Can Meta-evolution Solve the Bloat Problem?,in Proceedings of the 6th European Conference on Genetic Programming (EuroGP'2003), LNCS 2610, Springer-Verlag, 369-378, 2003. 36
Ernesto Rodrigues and Aurora Pozo, Grammar-Guided Genetic Programming and Automatically Defined Functions, in Advances in Artificial Intelligence: 16th Brazilian Symposium on Artificial Intelligence, SBIA 2002, Proceedings, LNAI, Vol. 2507, pp. 324-333, 11-14 November 2002. 37 P. Bruhn and A.G. Schulz, Genetic Programming over Context-Free Languages with Linear Constraints for the Knapsack Problem: First Results, Evolutionary Computation 10(1), 51-74, 2002. 38 T.S. Hussain and R.A. Browse, Attribute Grammars for Genetic Representations of Neural Networks and Syntactic Constraints on Genetic Programming,{\em Proceedings of AIVNGI'98: Workshop on Evolutionary Computation, 1998. 39 T.S. Hussain and R. A. Browse, Genetic Operators with Dynamic Biases that Operate on Attribute Grammar Representations of Neural Networks, in Proceedings of Workshop on Advanced Grammar Techniques Within Genetic Programming and Evolutionary Computation, 83-86, 1999 40 S. Zvada and R. Vanyi, Improving Grammar Based Evolution Algorithms via Attributed Derivation Trees, in Proceedings of the 7th European Conference (EuroGP'2004), LNCS 3003, Springer-Verlag, 208-219, 2004. 41
B.J. Ross, Logic-based Genetic Programming with Definite Clause Translation Grammars, New Generation Computing, 19(4), 313-337, 2001. 42 A.K. Joshi, L. S. Levy,and M. Takahashi, Tree Adjunct Grammars, Journal of Computer and System Sciences, 21 (2), 136-163, 1975. 43 Nguyen Xuan Hoai, Solving the Symbolic Regression Problem with Tree Adjunct Grammar Guided Genetic Programming: The Preliminary Result, in Proceedings of 5th Australasia-Japan Workshop in Evolutionary and Intelligent Systems, 52-61, 2001. 44 Nguyen Xuan Hoai, R.I. McKay, and D. Essam, Some Experimental Results with Tree Adjunct Grammar Guided Genetic Programming, In J.A. Foster, E.Lutton, J. Miller, C. Ryan, and A.G.B. Tettamanzi, editors, Proceedings of the Fifth European Conference on Genetic Programming, LNCS 2278, Springer-Verlag, 328-337, 2002. 45
Nguyen Xuan Hoai, R.I. McKay, and H.A. Abbass, Tree Adjoining Grammars, Language Bias, and Genetic Programming, in Ryan C. et al, editors, Proceedings of EuroGP 2003, LNCS 2610, Springer-Verlag, 335-344, 2003. 46 Nguyen Xuan Hoai and R.I. McKay, Softening the Structural Difficulty with TAG-based Representation and Insertion/Deletion Operators, K. Deb et al, editors, Proceedings of Genetic and Evolionary Computation Conference (GECCO2004), LNCS 3103,605-616, 2004. 47 Nguyen Xuan Hoai, R.I. McKay, D. Essam, Genetic Tranposition in Tree Adjoining Grammar-Guided Genetic Programming: The Relocation Operator, in Proceedings of the 5th International Conference on Simulated Evolution and Learning (SEAL'04)}, 2004. 48 Paul Holmes and Peter J. Barclay, Functional Languages on Linear Chromosomes, Genetic Programming 1996: Proceedings of the First Annual Conference, p. 427, MIT Press, 28-31 July 1996 49 P. Nordin, W. Banzhaf and F. Francone, Efficient Evolution of Machine Code for CISC Architectures using Blocks and Homologous Crossover, in L. Spector, W. Langdon, U. O'Reilly and P. Angeline (eds), Advances in Genetic Programming III, MIT Press, Cambridge, MA, 1999, pp. 275 -- 299 50 Lee Spector and Kilian Stoffel Ontogenetic Programming, Genetic Programming 1996: Proceedings of the First Annual Conference, pp. 394-399, MIT Press, 28-31 July 1996. 51 R. Keller and W. Banzhaf, GP Using Mutation, Reproduction and Genotype-Phenotype Mapping from Linear Binary Genomes into Linear LALR Phenotypes, in Genetic Programming 96, MIT Press, 116-122, 1996. 52 N. R. Paterson and M. Livesey, Distinguishing Genotype and Phenotype in Genetic Programming, Late Breaking Papers at the Genetic Programming 1996 Conference, 141-150, 1996. 53 N. Paterson and M. Livesey, Evolving Caching Algorithms in C by GP, in Proceedings of Genetic Programming 97, MIT Press, 262-267, 1997. 54 J.J. Freeman, A Linear Representation for GP using Context Free Grammars, in Proceedings of Genetic Programming 1998, the Third Annual Conference on Genetic Programming, Morgan Kaufmann, 72-77, 1998. 55 C. Ryan, J. J. Collins, and M. O'Neill, Grammatical Evolution: Evolving Programs for an Arbitrary Language, in Proceedings of the First European Workshop on Genetic Programming, LNCS 1391, Springer-Verlag, 83-95, 1998 56 M. O'Neill and C. Ryan, Grammatical Evolution: Evolutionary Automatic Programming in an Arbitrary Language}, Genetic programming, 4, Kluwer Academic Publishers, 2003. 57 M. O'Neill, C. Ryan, M. Keijzer, and M. Cattolico, Crossover in Grammatical Evolution: The Search Continues, in Proceedings of the Fouth European Conference on Genetic Programming (EuroGP'2001), LNCS 2038, Springer-Verlag, 337-347, 2001. 58 M. Keijzer, M. O'Neill, C. Ryan, and M. Cattolico, Grammatical Evolution Rules: The Mod and the Bucket Rule, in Proceedings of the 5th European Conference on Genetic Programming (EuroGP'2002), LNCS 2278, Springer-Verlag, 123-130, 2002. 59 J. O'Sullivan and C. Ryan, An investigation into the use of different search strategies with Grammatical Evolution, in Proceedings of the 5th European Conference on Genetic Programming (EuroGP'2002), LNCS 2278, 268-277, Springer-Verlag, 2002.
60 C. Ryan, A. Azad, A. Sheahan and M. O'Neill, No Coercion and No Prohibition, A Position Independent Encoding Scheme for Evolutionary Algorithms - The Chorus System, in Proceedings of the 5th European Conference on Genetic Programming (EuroGP 2002), LNCS 2278, Springer-Verlag, 131-141, 2002.
61 C. Ryan, M. Nicolau and M. O'Neill, Genetic Algorithms Using Grammatical Evolution, in Proceedings of the 5th European Conference on Genetic Programming (EuroGP 2002), LNCS 2278, Springer-Verlag, 278-287, 2002.
62 J.H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, University of Michigan Press, 1975.
63 D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, MA, 1989.
64 S. Baluja, Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning, Technical Report CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, PA, 1994.
65 P. Bosman and D. Thierens, An Algorithmic Framework for Density Estimation Based Evolutionary Algorithms, Technical Report UU-CS-1999-46, Utrecht University, 1999.
66 M. Pelikan, D. Goldberg and E. Cantu-Paz, BOA: The Bayesian Optimization Algorithm, in W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar, M. Jakiela and R.E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-99), Vol. 1, Orlando, Florida, Morgan Kaufmann, San Francisco, 525-532, 1999.
67 R.P. Salustowicz and J. Schmidhuber, Probabilistic Incremental Program Evolution, Evolutionary Computation, 5(2), 123-141, 1997.
68 K. Sastry and D.E. Goldberg, Probabilistic Model Building and Competent Genetic Programming, in R.L. Riolo and B. Worzel, editors, Genetic Programming Theory and Practice, Kluwer, 205-220, 2003.
69 K. Yanai and H. Iba, Program Evolution Using Bayesian Network, in S.-B. Cho, Nguyen Xuan Hoai and Y. Shan, editors, Proceedings of the First Asian-Pacific Workshop on Genetic Programming, Canberra, Australia, 16-23, 2003.
70 G.R. Harik, F.G. Lobo and D.E. Goldberg, The Compact Genetic Algorithm, IEEE Transactions on Evolutionary Computation, 3(4), 287-297, 1999.
71 J.P. Rosca, Analysis of Complexity Drift in Genetic Programming, in J.R. Koza, K. Deb, M. Dorigo, D.B. Fogel, M. Garzon, H. Iba and R. Riolo, editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, Morgan Kaufmann, San Francisco, CA, 286-294, 1997.
72 R. Poli and N.F. McPhee, Exact Schema Theorems for GP with One-Point and Standard Crossover Operating on Linear Structures and their Application to the Study of the Evolution of Size, in J.F. Miller, M. Tomassini, P.L. Lanzi, C. Ryan, A.G.B. Tettamanzi and W.B. Langdon, editors, Proceedings of EuroGP'2001, LNCS 2038, Springer-Verlag, 126-142, 2001.
73 W.B. Langdon and R. Poli, Foundations of Genetic Programming, Springer, 2002.
74 U.M. O'Reilly and F. Oppacher, The Troubling Aspects of a Building Block Hypothesis for Genetic Programming, in L. Whitley and M.D. Vose, editors, Foundations of Genetic Algorithms 3, Morgan Kaufmann, 73-88, 1995.
75 P.A. Whigham, A Schema Theorem for Context-Free Grammars, in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, IEEE Press, 178-182, 1995.
76 J.R. Koza, Genetic Programming 2: Automatic Discovery of Reusable Programs, MIT Press, Cambridge, Mass., 1994.
77 L. Spector, Evolving Control Structures with Automatically Defined Macros, in Working Notes for the AAAI Symposium on Genetic Programming, AAAI, 99-105, 1995.
78 O. Roux and C. Fonlupt, Ant Programming: or How to Use Ants for Automatic Programming, in Proceedings of the Second International Conference on Ant Algorithms (ANTS2000), 2000.
79 M. Boryczka and Z.J. Czech, Solving Approximation Problems by Ant Colony Programming, in W.B. Langdon et al., editors, GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann, San Francisco, CA, 2002.
80 J. Green, J.L. Whalley and C.G. Johnson, Automatic Programming with Ant Colony Optimization, in M. Withall and C. Hinde, editors, Proceedings of the 2004 UK Workshop on Computational Intelligence, 70-77, 2004.
81 S.A. Rojas and P.J. Bentley, A Grid-based Ant Colony System for Automatic Program Synthesis, in M. Keijzer, editor, Late Breaking Papers at the 2004 Genetic and Evolutionary Computation Conference, 2004.
82 P.A. Whigham, Inductive Bias and Genetic Programming, in A.M.S. Zalzala, editor, Proceedings of GALESIA 1995, 461-466, 1995.
83 I. Tanev, Implications of Incorporating Learning Probabilistic Context-sensitive Grammar in Genetic Programming on Evolvability of Adaptive Locomotion Gaits of Snakebot, in Proceedings of GECCO 2004, 2004.
84 Y. Shan, R.I. McKay, H.A. Abbass and D. Essam, Program Evolution with Explicit Learning: A New Framework for Program Automatic Synthesis, in Proceedings of the 2003 Congress on Evolutionary Computation, Canberra, IEEE Press, 2003.
85 A. Ratle and M. Sebag, Avoiding the Bloat with Probabilistic Grammar-Guided Genetic Programming, in Proceedings of the 5th International Conference on Artificial Evolution, LNCS 2310, Springer-Verlag, 255-266, 2001.
86 A. Ratle and M. Sebag, A Novel Approach to Machine Discovery: Genetic Programming and Stochastic Grammars, in Proceedings of the Twelfth International Conference on Inductive Logic Programming, LNCS 2583, Springer-Verlag, 207-222, 2002.
87 P.A.N. Bosman and E.D. de Jong, Grammar Transformations in an EDA for Genetic Programming, in M. Pelikan, editor, GECCO Special Session: OBUPM - Optimization by Building and Using Probabilistic Models, GECCO 2004, 2004.
88 C. Keber and M.G. Schuster, Option Valuation with Generalized Ant Programming, in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2002), Morgan Kaufmann, 74-81, 2002.
89 H.A. Abbass, N.X. Hoai and R.I. McKay, AntTAG: A New Method to Compose Computer Programs Using Colonies of Ants, in Proceedings of the IEEE Congress on Evolutionary Computation, IEEE Press, 1654-1659, 2002.
90 Y. Shan, H. Abbass, R.I. McKay and D. Essam, AntTAG: A Further Study, in R. Sarker and B. McKay, editors, Proceedings of the Sixth Australia-Japan Joint Workshop on Intelligent and Evolutionary Systems, Canberra, Australia, 2002.
91 Y. Shan, R.I. McKay, R. Baxter, H. Abbass, D. Essam and H. Nguyen, Grammar Model-based Program Evolution, in Proceedings of the 2004 IEEE Congress on Evolutionary Computation, IEEE Press, 478-485, 2004.
92 Y. Shan, Program Distribution Estimation with Grammar Models, PhD Thesis, University of New South Wales at the Australian Defence Force Academy, Canberra, Australia, 2005.
93 D.E. Goldberg, B. Korb and K. Deb, Messy Genetic Algorithms: Motivation, Analysis, and First Results, Complex Systems, 3(5), 493-530, 1989.