The Role of Syntactic and Semantic Locality of Crossover in Genetic Programming

Nguyen Quang Uy1, Nguyen Xuan Hoai2, Michael O’Neill1, and Bob McKay3

1 Natural Computing Research & Applications Group, University College Dublin, Ireland
2 Faculty of Information Technology, Hanoi University, Vietnam
3 School of Computer Science and Engineering, Seoul National University, Korea
[email protected], [email protected], [email protected], [email protected]

Abstract. This paper investigates the role of syntactic locality and semantic locality of crossover in Genetic Programming (GP). First, we propose a novel crossover using syntactic locality, Syntactic Similarity based Crossover (SySC). We test this crossover on a number of real-valued symbolic regression problems. A comparison is undertaken with Standard Crossover (SC) and with a recently proposed crossover for improving semantic locality, Semantic Similarity based Crossover (SSC). The metrics analysed include GP performance, GP code bloat and the effect on the ability of GP to generalise. The results show that improving syntactic locality reduces code bloat, which leads to a slight improvement in the ability to generalise. By comparison, improving semantic locality significantly enhances GP performance, reduces code bloat and substantially improves the ability of GP to generalise. These results confirm the more important role of semantic locality for crossover in GP.

Key words: Genetic Programming, Semantics, Syntactic, Crossover.

1 Introduction

Locality is important in all search methods. Our only justification for non-random methods is the assumption that there is some correlation between distance and fitness in the semantic space (otherwise random search is provably optimal). If we use a separate syntactic representation for the search algorithm, this requirement carries over: if there is no correlation between syntactic and semantic representation, we might as well use pure random search. Thus locality (continuity: small changes in genotype corresponding to small changes in phenotype) has long been seen as a crucial property of Evolutionary Computation (EC) representations [9, 10, 20, 21].

Assuming a continuous genotype-phenotype mapping, one may then ask whether it is better to design operators to control locality in genotype or phenotype space. On the side of the genotype space lies the advantage of simplicity: it is easy to measure and control locality directly in the space where the operators are applied. Thus virtually all such work has relied on genotypic distance through syntactic metrics. On the other hand, at the cost of greater complexity, one might argue that phenotypic distances, being (presumably) more closely correlated with fitness, might lead to better metrics. Is this so? Is it worth the extra complication of designing semantically-based control of operators? This paper examines this question, comparing our own recent work [24] on semantics-based control of crossover in Genetic Programming (GP) with a new, syntactically-based form.

Recent GP research has paid much attention to incorporating semantics in search [26, 11, 12, 3, 5, 13, 15, 14, 2, 23]. In recent work [24], Uy et al. presented Semantic Similarity based Crossover (SSC), which improves the semantic locality of crossover by paying attention to the scale of semantic differences between two subtrees. The results reported in [24] show that SSC significantly improves the performance of GP in solving a family of real-valued symbolic regression problems. However, it also raises an important question about the relationship between syntactic and semantic locality: which of the two is more important? We compare a crossover designed to directly improve syntactic locality with one relying on semantic locality, examining their effects on three aspects of GP: bloat, performance and generalisation. We show that syntactic locality plays a role in GP code bloat, but semantic locality is even more important in improving GP performance and GP’s ability to generalise.

The remainder of the paper is organised as follows. In the next section, we give a review of related work on semantic based crossovers in GP and a brief review of locality in Evolutionary Computation (EC). Section 3 describes SSC and a novel crossover for improving syntactic locality. The experimental settings are detailed in Section 4. The results of the experiments are presented and discussed in Section 5. Section 6 concludes the paper and highlights some potential future work.

2 Related Work

2.1 Semantics in Genetic Programming

Recently, semantics in GP has been addressed by a number of researchers. The work falls into three main strands:

1. using formal methods [11–13, 15, 14]
2. using grammars [26, 3, 5]
3. using structures such as GP trees [2, 23, 24]

The first approach was advocated by Johnson in a series of papers [11–13]. In these methods, semantic information extracted from formal methods (e.g., Abstract Interpretation and Model Checking) is used to quantify fitness in problems where it is difficult to measure by sample point fitness. Katz and Peled consequently used model checking to solve the Mutual Exclusion problem [15, 14]. Again, individual fitness is measured through model checking. These formal methods have a strict mathematical foundation, which may potentially aid GP. Perhaps because of their high complexity, however, they have seen only limited research. Their main application to date has been in evolving control strategies.

The second category represents semantics by using Attribute Grammars. Attributes added to a grammar can generate some useful semantic information about individuals, which can be used to eliminate bad individuals [5], or to prevent generating semantically invalid ones [26, 3]. The attributes used to represent semantics are, however, problem dependent, and it is not always easy to design such attributes for a new problem.

In the last category, semantics has mainly been used to control the GP operators. In [2], the authors investigated the effect of semantic diversity on Boolean domains, checking the semantic equivalence between offspring and parents by transformation to a canonical form, Reduced Ordered Binary Decision Diagrams (ROBDDs) [6]. This information is used to decide whether the offspring are copied to the next generation. The method improved GP performance, presumably because it increased semantic diversity. Uy et al. [23] proposed Semantics Aware Crossover (SAC), another crossover operator promoting semantic diversity, based on checking semantic equivalence of subtrees. It showed limited improvement on some real-valued problems. This crossover was then extended to Semantic Similarity based Crossover (SSC) [24] by improving its semantic locality. The experimental results showed improved performance of SSC over both SC and SAC [24]. Our aim here is to investigate the effectiveness of semantic locality through a comparison of SSC with a crossover designed to improve syntactic locality.


2.2 Locality in Evolutionary Computation

In the field of GP in particular and Evolutionary Computation in general, locality (small change in genotype corresponding to small change in phenotype) plays a crucial role in the efficiency of an algorithm [9, 10, 20, 21]. Intuitively, maintaining diversity should be accompanied by improving locality. It should, however, be noted that this previous work on locality focused on the locality of the genotype-phenotype mapping. Rothlauf [21] investigated the locality of representations in Evolutionary Computation (EC). To determine the locality of a genotype-phenotype mapping, we must define two metrics, one in the genotype space and one in the phenotype space. He argued that a representation with high locality is necessary for efficient evolutionary search. Although a representation with high locality is desirable, it may be very difficult to achieve. Thus many current GP representations are of low locality, so that small syntactic changes in genotype can cause large changes in phenotype. In this paper, we extend the concept of representation locality [21] to the locality of an operator. We consider locality in both syntactic and semantic domains.

3 Methods

This section briefly presents SSC before giving details of the new crossover based on syntactic locality.

3.1 Semantic Similarity based Crossover

SSC as used here is almost identical to that of Uy et al. [24], with the exception of a slight change in the definition of the distance measure. We start with a clear definition of (sub)tree semantics. Formally, the Sampling Semantics (SS) of a (sub)tree is defined as follows:

Let $F$ be a function expressed by a (sub)tree $T$ on a domain $D$. Let $P$ be a sequence of points sampled from domain $D$, $P = (p_1, p_2, \ldots, p_N)$. Then, the Sampling Semantics of $T$ on $P$ in domain $D$ is the corresponding sequence $S = (s_1, s_2, \ldots, s_N)$ where $s_i = F(p_i)$, $i = 1, 2, \ldots, N$.

The optimal choice of $N$ and $P$ depends on the problem; we follow the approach of [24] in setting the number of points for evaluating the semantics equal to the number of fitness cases (20 points; see Section 4) and in choosing the sequence of points $P$ uniformly randomly from the problem domain.

Based on SS, we define a Sampling Semantics Distance (SSD) between two subtrees. It differs from that in [24] in using the mean absolute difference in SS values, rather than (as before) the sum of absolute differences. Let $U = (u_1, u_2, \ldots, u_N)$ and $V = (v_1, v_2, \ldots, v_N)$ represent the SSs of two subtrees, $S_1$ and $S_2$; then the SSD between $S_1$ and $S_2$ is defined in equation 1:

$$\mathrm{SSD}(S_1, S_2) = \frac{\sum_{i=1}^{N} |u_i - v_i|}{N} \qquad (1)$$

We follow [24] in defining a semantic relationship, Semantic Similarity (SSi), on the basis that the exchange of subtrees is most likely to be beneficial if they are not semantically identical, but also not too different. Two subtrees are semantically similar if their SSD lies within a positive interval. The formal definition of SSi between subtrees $S_1$ and $S_2$ is given in the following equation:

$$\mathrm{SSi}(S_1, S_2) = \mathrm{TruthValue}(\alpha < \mathrm{SSD}(S_1, S_2) < \beta)$$


where $\alpha$ and $\beta$ are two predefined constants, the lower and upper bounds for semantic sensitivity. In general, the best values for the lower and upper bound semantic sensitivities are problem dependent. In this work we set $\alpha = 10^{-3}$, and several values of $\beta$ were tested.
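The definitions of SS, SSD and SSi above can be illustrated with a minimal Python sketch. The function names, the sampling domain [-1, 1], the random seed and the value beta = 0.4 are assumptions chosen purely for illustration; only N = 20 and alpha = 10^-3 come from the text.

```python
import random


def sampling_semantics(f, points):
    """Sampling Semantics: the values s_i = F(p_i) at each sample point."""
    return [f(p) for p in points]


def ssd(u, v):
    """Sampling Semantics Distance: mean absolute difference of two SS vectors."""
    if len(u) != len(v) or not u:
        raise ValueError("semantics must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def semantically_similar(u, v, alpha=1e-3, beta=0.4):
    """SSi: true when alpha < SSD < beta (beta=0.4 is an illustrative value)."""
    return alpha < ssd(u, v) < beta


# 20 points drawn uniformly at random; the domain [-1, 1] is an assumption.
rng = random.Random(0)
points = [rng.uniform(-1.0, 1.0) for _ in range(20)]

# Three hypothetical subtrees, written directly as the functions they express.
s1 = sampling_semantics(lambda x: x * x, points)
s2 = sampling_semantics(lambda x: x * x + 0.1, points)  # close to s1
s3 = sampling_semantics(lambda x: x * x + 5.0, points)  # far from s1

print(semantically_similar(s1, s2))  # SSD is about 0.1, inside (alpha, beta)
print(semantically_similar(s1, s1))  # SSD is 0, below alpha
print(semantically_similar(s1, s3))  # SSD is about 5, above beta
```

Under these assumed bounds, only the first pair counts as semantically similar: identical subtrees fall below the lower bound and very different ones exceed the upper bound.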

Algorithm 1: Semantic Similarity based Crossover
    select Parent 1 P1;
    select Parent 2 P2;
    Count=0;
    while Count
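The listing of Algorithm 1 is cut off in this copy, so the following Python sketch is not a reconstruction of it. It only illustrates one plausible way to combine a trial counter with the SSi test: subtree pairs are sampled at random until a semantically similar pair is found or a trial limit is reached. The nested-list tree encoding, the helper names, the `max_trial` and `beta` values, and the fallback of swapping the last sampled pair are all assumptions.

```python
import copy
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def evaluate(tree, x):
    """Evaluate a tree: 'x', a numeric constant, or [op, left, right]."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def paths(tree, prefix=()):
    """Yield the index path to every node (root first)."""
    yield prefix
    if isinstance(tree, list):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))


def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, sub):
    """Return a copy of tree with the node at path replaced by a copy of sub."""
    if not path:
        return copy.deepcopy(sub)
    new = copy.deepcopy(tree)
    node = new
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(sub)
    return new


def ssd(a, b, points):
    """Mean absolute difference of two subtrees over the sample points."""
    return sum(abs(evaluate(a, p) - evaluate(b, p)) for p in points) / len(points)


def similarity_crossover(p1, p2, points, rng, alpha=1e-3, beta=0.4, max_trial=12):
    """Swap a semantically similar subtree pair; max_trial must be at least 1."""
    for _ in range(max_trial):
        path1 = rng.choice(list(paths(p1)))
        path2 = rng.choice(list(paths(p2)))
        s1, s2 = get(p1, path1), get(p2, path2)
        if alpha < ssd(s1, s2, points) < beta:
            break
    # Assumed fallback: if no similar pair was found, the last random pair is swapped.
    return put(p1, path1, s2), put(p2, path2, s1)


rng = random.Random(1)
points = [rng.uniform(-1.0, 1.0) for _ in range(20)]
parent1 = ["+", ["*", "x", "x"], 0.5]
parent2 = ["-", ["*", "x", 2.0], "x"]
child1, child2 = similarity_crossover(parent1, parent2, points, rng)
print(child1, child2)
```

Because `put` works on copies, the parents are left unchanged, and swapping one subtree in each direction keeps the total number of nodes across the two children equal to that of the two parents.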
